validity test msvt: Topics by Science.gov

Sample records for validity test msvt

Evaluating the Medical Symptom Validity Test (MSVT) in a Sample of Veterans Between the Ages of 18 to 64.

PubMed

Reslan, Summar; Axelrod, Bradley N

2017-01-01

The purpose of the current study was to compare three potential profiles of the Medical Symptom Validity Test (MSVT; Pass, Genuine Memory Impairment Profile [GMIP], and Fail) on other freestanding and embedded performance validity tests (PVTs). Notably, a quantitatively computed version of the GMIP was utilized in this investigation. Data obtained from veterans referred for a neuropsychological evaluation in a metropolitan Veterans Affairs medical center were included (N = 494). Individuals age 65 and older were not included to exclude individuals with dementia from this investigation. The sample revealed 222 (45%) in the Pass group. Of the 272 who failed the easy subtests of the MSVT, 221 (81%) met quantitative criteria for the GMIP and 51 (19%) were classified as Fail. The Pass group failed fewer freestanding and embedded PVTs and obtained higher raw scores on all PVTs than both GMIP and Fail groups. The differences in performances of the GMIP and Fail groups were minimal. Specifically, GMIP protocols failed fewer freestanding PVTs than the Fail group; failure on embedded PVTs did not differ between GMIP and Fail. The MSVT GMIP incorporates the presence of clinical correlates of disability to assist with this distinction, but future research should consider performances on other freestanding measures of performance validity to differentiate cognitive impairment from invalidity.
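The group percentages in the record above can be re-derived from the counts it reports. The following is a minimal Python sketch that uses only the figures stated in the abstract (N = 494, 222 Pass, 221 GMIP, 51 Fail); the variable and function names are illustrative and do not come from the study.

```python
# Counts as stated in the abstract above.
total = 494
passed = 222
gmip = 221
fail = 51

# Everyone outside the Pass group failed the easy MSVT subtests.
failed_easy = total - passed


def pct(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


pass_pct = pct(passed, total)       # share of the full sample
gmip_pct = pct(gmip, failed_easy)   # share of those failing the easy subtests
fail_pct = pct(fail, failed_easy)   # share of those failing the easy subtests

print(failed_easy, pass_pct, gmip_pct, fail_pct)
```

Note that the Pass percentage is taken over the full sample, while the GMIP and Fail percentages are taken over the 272 who failed the easy subtests, which is why the three figures do not sum to 100.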
Combining the test of memory malingering trial 1 with behavioral responses improves the detection of effort test failure.

PubMed

Denning, John Henry

2014-01-01

Validity measures derived from the Test of Memory Malingering Trial 1 (TOMM1) and errors across the first 10 items of TOMM1 (TOMMe10) may be further enhanced by combining these scores with "embedded" behavioral responses while patients complete these measures. In a sample of nondemented veterans (n = 151), five possible behavioral responses observed during completion of the first 10 items of the TOMM were combined with TOMM1 and TOMMe10 to assess any increased sensitivity in predicting Medical Symptom Validity Test (MSVT) performance. Both TOMM1 and TOMMe10 alone were highly accurate overall in predicting MSVT performance (TOMM1 [area under the curve (AUC)] = .95, TOMMe10 [AUC] = .92). The combination of TOMM measures and behavioral responses did not increase overall accuracy rates; however, when specificity was held at approximately 90%, there was a slight increase in sensitivity (+7%) for both TOMM measures when combined with the number of "point and name" responses. Examples are provided demonstrating that at a given TOMM score (TOMM1 or TOMMe10), with an increase in "point and name" responses, there is an incremental increase in the probability of failing the MSVT. Exploring the utility of combining freestanding or embedded validity measures with behavioral features during test administration should be encouraged.
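The record above reports accuracy as an area under the curve (AUC) and as sensitivity with specificity held at approximately 90%. The sketch below shows one common way such quantities can be computed from two groups of scores. It is not the study's analysis: the scores are invented purely for illustration, the function names are hypothetical, and it assumes that lower scores indicate a higher chance of failing the criterion test.

```python
def auc(fail_scores, pass_scores):
    """AUC as the probability that a randomly chosen failer scores lower
    than a randomly chosen passer; ties count as one half."""
    wins = 0.0
    for f in fail_scores:
        for p in pass_scores:
            if f < p:
                wins += 1.0
            elif f == p:
                wins += 0.5
    return wins / (len(fail_scores) * len(pass_scores))


def sensitivity_at_specificity(fail_scores, pass_scores, min_specificity=0.90):
    """Scan every observed score as a cutoff (score <= cutoff is flagged)
    and return (cutoff, sensitivity, specificity) for the most sensitive
    cutoff whose specificity is at least min_specificity."""
    best = None
    for cutoff in sorted(set(fail_scores) | set(pass_scores)):
        spec = sum(p > cutoff for p in pass_scores) / len(pass_scores)
        sens = sum(f <= cutoff for f in fail_scores) / len(fail_scores)
        if spec >= min_specificity and (best is None or sens > best[1]):
            best = (cutoff, sens, spec)
    return best


# INVENTED scores for illustration only; these are not data from the study.
pass_scores = [50, 49, 48, 47, 46, 45, 44, 43, 42, 38]  # criterion passers
fail_scores = [41, 39, 37, 36, 35, 33, 30, 44]          # criterion failers

print(auc(fail_scores, pass_scores))
print(sensitivity_at_specificity(fail_scores, pass_scores))
```

Fixing specificity first and then reading off sensitivity mirrors the way the abstract frames its comparison: two measures can have similar overall accuracy yet differ in sensitivity at the same false-positive rate.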
An exploratory study into the effect of time-restricted internet access on face-validity, construct validity and reliability of postgraduate knowledge progress testing

PubMed Central

2013-01-01

Background: Yearly formative knowledge testing (also known as progress testing) was shown to have a limited construct-validity and reliability in postgraduate medical education. One way to improve construct-validity and reliability is to improve the authenticity of a test. As easily accessible internet has become inseparably linked to daily clinical practice, we hypothesized that allowing internet access for a limited amount of time during the progress test would improve the perception of authenticity (face-validity) of the test, which would in turn improve the construct-validity and reliability of postgraduate progress testing. Methods: Postgraduate trainees taking the yearly knowledge progress test were asked to participate in a study where they could access the internet for 30 minutes at the end of a traditional pen and paper test. Before and after the test they were asked to complete a short questionnaire regarding the face-validity of the test. Results: Mean test scores increased significantly for all training years. Trainees indicated that the face-validity of the test improved with internet access and that they would like to continue to have internet access during future testing. Internet access did not improve the construct-validity or reliability of the test. Conclusion: Improving the face-validity of postgraduate progress testing, by adding the possibility to search the internet for a limited amount of time, positively influences test performance and face-validity. However, it did not change the reliability or the construct-validity of the test. PMID:24195696
10 CFR 26.131 - Cutoff levels for validity screening and initial validity tests.

Code of Federal Regulations, 2010 CFR

2010-01-01

... 10 Energy 1 2010-01-01 2010-01-01 false Cutoff levels for validity screening and initial validity tests. 26.131 Section 26.131 Energy NUCLEAR REGULATORY COMMISSION FITNESS FOR DUTY PROGRAMS Licensee Testing Facilities § 26.131 Cutoff levels for validity screening and initial validity tests. (a) Each...
10 CFR 26.131 - Cutoff levels for validity screening and initial validity tests.

Code of Federal Regulations, 2011 CFR

2011-01-01

... 10 Energy 1 2011-01-01 2011-01-01 false Cutoff levels for validity screening and initial validity tests. 26.131 Section 26.131 Energy NUCLEAR REGULATORY COMMISSION FITNESS FOR DUTY PROGRAMS Licensee Testing Facilities § 26.131 Cutoff levels for validity screening and initial validity tests. (a) Each...
What tests should you use to assess small intestinal bacterial overgrowth in systemic sclerosis?

PubMed

Braun-Moscovici, Yolanda; Braun, Marius; Khanna, Dinesh; Balbir-Gurman, Alexandra; Furst, Daniel E

2015-01-01

Small intestinal bacterial overgrowth (SIBO) plays a major role in the pathogenesis of malabsorption in SSc patients and is a source of great morbidity and even mortality in those patients. This manuscript reviews which tests are valid and should be used in SSc when evaluating SIBO. We performed systematic literature searches in PubMed, Embase and the Cochrane library from 1966 up to November 2014 for English language, published articles examining bacterial overgrowth in SSc (e.g. malabsorption tests, breath tests, xylose test, etc.). Articles obtained from these searches were reviewed for additional references. The validity of the tests was evaluated according to the OMERACT principles of truth, discrimination and feasibility. From a total of 65 titles, 22 articles were reviewed and 20 were ultimately extracted to examine the validity of tests for GI morphology, bacterial overgrowth and malabsorption in SSc. Only 1 test (hydrogen and methane breath tests) is fully validated. Four tests are partially validated, including jejunal cultures, xylose, lactulose tests, and the 72-hour fecal fat test. Only 1 of a total of 5 GI tests of bacterial overgrowth (see above) is fully validated in SSc. For clinical trials, fully validated tests are preferred, although some investigators use partially validated tests (4 tests). Further validation of GI tests in SSc is needed.
Construct Validity of the Nepalese School Leaving English Reading Test

ERIC Educational Resources Information Center

Dawadi, Saraswati; Shrestha, Prithvi N.

2018-01-01

There has been a steady interest in investigating the validity of language tests in the last decades. Despite numerous studies on construct validity in language testing, there are not many studies examining the construct validity of a reading test. This paper reports on a study that explored the construct validity of the English reading test in…
14 CFR 91.1041 - Aircraft proving and validation tests.

Code of Federal Regulations, 2014 CFR

2014-01-01

... 14 Aeronautics and Space 2 2014-01-01 2014-01-01 false Aircraft proving and validation tests. 91... Ownership Operations Program Management § 91.1041 Aircraft proving and validation tests. (a) No program... tests. However, pilot flight training may be conducted during the proving tests. (d) Validation testing...
14 CFR 91.1041 - Aircraft proving and validation tests.

Code of Federal Regulations, 2012 CFR

2012-01-01

... 14 Aeronautics and Space 2 2012-01-01 2012-01-01 false Aircraft proving and validation tests. 91... Ownership Operations Program Management § 91.1041 Aircraft proving and validation tests. (a) No program... tests. However, pilot flight training may be conducted during the proving tests. (d) Validation testing...
14 CFR 91.1041 - Aircraft proving and validation tests.

Code of Federal Regulations, 2013 CFR

2013-01-01

... 14 Aeronautics and Space 2 2013-01-01 2013-01-01 false Aircraft proving and validation tests. 91... Ownership Operations Program Management § 91.1041 Aircraft proving and validation tests. (a) No program... tests. However, pilot flight training may be conducted during the proving tests. (d) Validation testing...
14 CFR 91.1041 - Aircraft proving and validation tests.

Code of Federal Regulations, 2011 CFR

2011-01-01

... 14 Aeronautics and Space 2 2011-01-01 2011-01-01 false Aircraft proving and validation tests. 91... Ownership Operations Program Management § 91.1041 Aircraft proving and validation tests. (a) No program... tests. However, pilot flight training may be conducted during the proving tests. (d) Validation testing...
14 CFR 91.1041 - Aircraft proving and validation tests.

Code of Federal Regulations, 2010 CFR

2010-01-01

... 14 Aeronautics and Space 2 2010-01-01 2010-01-01 false Aircraft proving and validation tests. 91... Ownership Operations Program Management § 91.1041 Aircraft proving and validation tests. (a) No program... tests. However, pilot flight training may be conducted during the proving tests. (d) Validation testing...
49 CFR 40.89 - What is validity testing, and are laboratories required to conduct it?

Code of Federal Regulations, 2013 CFR

2013-10-01

... PROCEDURES FOR TRANSPORTATION WORKPLACE DRUG AND ALCOHOL TESTING PROGRAMS Drug Testing Laboratories § 40.89 What is validity testing, and are laboratories required to conduct it? (a) Specimen validity testing is... 49 Transportation 1 2013-10-01 2013-10-01 false What is validity testing, and are laboratories...
49 CFR 40.89 - What is validity testing, and are laboratories required to conduct it?

Code of Federal Regulations, 2011 CFR

2011-10-01

... PROCEDURES FOR TRANSPORTATION WORKPLACE DRUG AND ALCOHOL TESTING PROGRAMS Drug Testing Laboratories § 40.89 What is validity testing, and are laboratories required to conduct it? (a) Specimen validity testing is... 49 Transportation 1 2011-10-01 2011-10-01 false What is validity testing, and are laboratories...
49 CFR 40.89 - What is validity testing, and are laboratories required to conduct it?

Code of Federal Regulations, 2010 CFR

2010-10-01

... PROCEDURES FOR TRANSPORTATION WORKPLACE DRUG AND ALCOHOL TESTING PROGRAMS Drug Testing Laboratories § 40.89 What is validity testing, and are laboratories required to conduct it? (a) Specimen validity testing is... 49 Transportation 1 2010-10-01 2010-10-01 false What is validity testing, and are laboratories...
49 CFR 40.89 - What is validity testing, and are laboratories required to conduct it?

Code of Federal Regulations, 2012 CFR

2012-10-01

... PROCEDURES FOR TRANSPORTATION WORKPLACE DRUG AND ALCOHOL TESTING PROGRAMS Drug Testing Laboratories § 40.89 What is validity testing, and are laboratories required to conduct it? (a) Specimen validity testing is... 49 Transportation 1 2012-10-01 2012-10-01 false What is validity testing, and are laboratories...
49 CFR 40.89 - What is validity testing, and are laboratories required to conduct it?

Code of Federal Regulations, 2014 CFR

2014-10-01

... PROCEDURES FOR TRANSPORTATION WORKPLACE DRUG AND ALCOHOL TESTING PROGRAMS Drug Testing Laboratories § 40.89 What is validity testing, and are laboratories required to conduct it? (a) Specimen validity testing is... 49 Transportation 1 2014-10-01 2014-10-01 false What is validity testing, and are laboratories...
Validation of sterilizing grade filtration.

PubMed

Jornitz, M W; Meltzer, T H

2003-01-01

Validation consideration of sterilizing grade filters, namely 0.2 micron, changed when FDA voiced concerns about the validity of Bacterial Challenge tests performed in the past. Such validation exercises are nowadays considered to be filter qualification. Filter validation requires more thorough analysis, especially Bacterial Challenge testing with the actual drug product under process conditions. To do so, viability testing is a necessity to determine the Bacterial Challenge test methodology. Additionally to these two compulsory tests, other evaluations like extractable, adsorption and chemical compatibility tests should be considered. PDA Technical Report # 26, Sterilizing Filtration of Liquids, describes all parameters and aspects required for the comprehensive validation of filters. The report is a most helpful tool for validation of liquid filters used in the biopharmaceutical industry. It sets the cornerstones of validation requirements and other filtration considerations.
Validation of alternative methods for toxicity testing.

PubMed Central

Bruner, L H; Carr, G J; Curren, R D; Chamberlain, M

1998-01-01

Before nonanimal toxicity tests may be officially accepted by regulatory agencies, it is generally agreed that the validity of the new methods must be demonstrated in an independent, scientifically sound validation program. Validation has been defined as the demonstration of the reliability and relevance of a test method for a particular purpose. This paper provides a brief review of the development of the theoretical aspects of the validation process and updates current thinking about objectively testing the performance of an alternative method in a validation study. Validation of alternative methods for eye irritation testing is a specific example illustrating important concepts. Although discussion focuses on the validation of alternative methods intended to replace current in vivo toxicity tests, the procedures can be used to assess the performance of alternative methods intended for other uses. PMID:9599695
Official Position of the American Academy of Clinical Neuropsychology Social Security Administration Policy on Validity Testing: Guidance and Recommendations for Change.

PubMed

Chafetz, M D; Williams, M A; Ben-Porath, Y S; Bianchini, K J; Boone, K B; Kirkwood, M W; Larrabee, G J; Ord, J S

2015-01-01

The milestone publication by Slick, Sherman, and Iverson (1999) of criteria for determining malingered neurocognitive dysfunction led to extensive research on validity testing. Position statements by the National Academy of Neuropsychology and the American Academy of Clinical Neuropsychology (AACN) recommended routine validity testing in neuropsychological evaluations. Despite this widespread scientific and professional support, the Social Security Administration (SSA) continued to discourage validity testing, a stance that led to a congressional initiative for SSA to reevaluate their position. In response, SSA commissioned the Institute of Medicine (IOM) to evaluate the science concerning the validation of psychological testing. The IOM concluded that validity assessment was necessary in psychological and neuropsychological examinations (IOM, 2015). The AACN sought to provide independent expert guidance and recommendations concerning the use of validity testing in disability determinations. A panel of contributors to the science of validity testing and its application to the disability process was charged with describing why the disability process for SSA needs improvement, and indicating the necessity for validity testing in disability exams. This work showed how the determination of malingering is a probability proposition, described how different types of validity tests are appropriate, provided evidence concerning non-credible findings in children and low-functioning individuals, and discussed the appropriate evaluation of pain disorders typically seen outside of mental consultations. A scientific plan for validity assessment that additionally protects test security is needed in disability determinations and in research on classification accuracy of disability decisions.

A Note on Economic Content and Test Validity.

ERIC Educational Resources Information Center

Soper, John C.; Brenneke, Judith Staley

1987-01-01

Offers practical tips on how teachers can determine whether classroom tests are actually measuring what they are designed to measure. Discusses criterion-related validity, construct validity, and content validity. Demonstrates how to determine the degree of content validity a particular test may have for a particular course or unit. (Author/DH)
On the Validity of Useless Tests

ERIC Educational Resources Information Center

Sireci, Stephen G.

2016-01-01

A misconception exists that validity may refer only to the "interpretation" of test scores and not to the "uses" of those scores. The development and evolution of validity theory illustrate test score interpretation was a primary focus in the earliest days of modern testing, and that validating interpretations derived from test…
On Validity Theory and Test Validation

ERIC Educational Resources Information Center

Sireci, Stephen G.

2007-01-01

Lissitz and Samuelsen (2007) propose a new framework for conceptualizing test validity that separates analysis of test properties from analysis of the construct measured. In response, the author of this article reviews fundamental characteristics of test validity, drawing largely from seminal writings as well as from the accepted standards. He…
14 CFR 135.145 - Aircraft proving and validation tests.

Code of Federal Regulations, 2011 CFR

2011-01-01

... 14 Aeronautics and Space 3 2011-01-01 2011-01-01 false Aircraft proving and validation tests. 135... Aircraft and Equipment § 135.145 Aircraft proving and validation tests. (a) No certificate holder may...) Validation testing is required to determine that a certificate holder is capable of conducting operations...
14 CFR 135.145 - Aircraft proving and validation tests.

Code of Federal Regulations, 2013 CFR

2013-01-01

... 14 Aeronautics and Space 3 2013-01-01 2013-01-01 false Aircraft proving and validation tests. 135... Aircraft and Equipment § 135.145 Aircraft proving and validation tests. (a) No certificate holder may...) Validation testing is required to determine that a certificate holder is capable of conducting operations...
14 CFR 135.145 - Aircraft proving and validation tests.

Code of Federal Regulations, 2010 CFR

2010-01-01

... 14 Aeronautics and Space 3 2010-01-01 2010-01-01 false Aircraft proving and validation tests. 135... Aircraft and Equipment § 135.145 Aircraft proving and validation tests. (a) No certificate holder may...) Validation testing is required to determine that a certificate holder is capable of conducting operations...
How Is Testing Supposed to Improve Schooling?

ERIC Educational Resources Information Center

Haertel, Edward

2013-01-01

Validation research for educational achievement tests is often limited to an examination of intended test score interpretations. This article calls for an expansion of validation research in three dimensions. First, validation must attend to actual test use and its consequences, not just score meaning. Second, validation must attend to unintended…
14 CFR 135.145 - Aircraft proving and validation tests.

Code of Federal Regulations, 2014 CFR

2014-01-01

... 14 Aeronautics and Space 3 2014-01-01 2014-01-01 false Aircraft proving and validation tests. 135... Aircraft and Equipment § 135.145 Aircraft proving and validation tests. (a) No certificate holder may...) Validation testing is required to determine that a certificate holder is capable of conducting operations...
14 CFR 135.145 - Aircraft proving and validation tests.

Code of Federal Regulations, 2012 CFR

2012-01-01

... 14 Aeronautics and Space 3 2012-01-01 2012-01-01 false Aircraft proving and validation tests. 135... Aircraft and Equipment § 135.145 Aircraft proving and validation tests. (a) No certificate holder may...) Validation testing is required to determine that a certificate holder is capable of conducting operations...
Student mathematical imagination instruments: construction, cultural adaptation and validity

NASA Astrophysics Data System (ADS)

Dwijayanti, I.; Budayasa, I. K.; Siswono, T. Y. E.

2018-03-01

Imagination has an important role as the center of sensorimotor activity of the students. The purpose of this research is to construct an instrument for assessing students’ mathematical imagination in understanding the concept of algebraic expressions. The researchers establish validity using questionnaire and test techniques and analyze the data using a descriptive method. The stages performed include: 1) constructing the embodiment of the imagination; 2) determining the learning style questionnaire; 3) constructing the instruments; 4) translating into Indonesian and adapting the learning style questionnaire content to student culture; 5) performing content validation. The results show that the constructed instrument is valid by content validation and empirical validation, so that it can be used with revisions. Content validation involved Indonesian linguists, English linguists and mathematics material experts. Empirical validation was done through a legibility test (10 students), which showed that in general the language used can be understood. In addition, a questionnaire trial (86 students) was analyzed using the point-biserial correlation technique, resulting in 16 valid items, with a KR-20 reliability test meeting the medium reliability criterion. The trial of the test instrument (32 students) found that all items are valid, with a KR-21 reliability of 0.62.
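The record above refers to the Kuder-Richardson reliability coefficients KR-20 and KR-21 without stating them. For k dichotomous (0/1) items, KR-20 = k/(k-1) * (1 - sum(p_i * q_i) / s^2), where p_i is the proportion answering item i correctly, q_i = 1 - p_i, and s^2 is the variance of total scores; KR-21 = k/(k-1) * (1 - M(k - M) / (k * s^2)) uses only the mean total score M. The sketch below is a minimal Python illustration under two stated assumptions: the response matrix is invented for the example (it is not the study's data), and the population variance is used, although some texts use the sample variance.

```python
from statistics import mean, pvariance


def kr20(responses):
    """KR-20 for a list of rows, each row a list of 0/1 item scores."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)  # assumption: population variance
    pq_sum = 0.0
    for i in range(k):
        p = mean(row[i] for row in responses)  # proportion correct on item i
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)


def kr21(responses):
    """KR-21, which treats all items as equally difficult."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)  # assumption: population variance
    m = mean(totals)
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var_total))


# INVENTED 0/1 responses (5 examinees x 4 items), for illustration only.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(kr20(responses), kr21(responses))
```

KR-21 never exceeds KR-20 on the same data, because it replaces the item-level term with one that assumes equal item difficulty; the two coincide only when every item has the same proportion correct.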
Concurrent validity and clinical usefulness of several individually administered tests of children's social-emotional cognition.

PubMed

McKown, Clark

2007-03-01

In this study, the validity of 5 tests of children's social-emotional cognition, defined as their encoding, memory, and interpretation of social information, was tested. Participants were 126 clinic-referred children between the ages of 5 and 17. All 5 tests were evaluated in terms of their (a) concurrent validity, (b) incremental validity, and (c) clinical usefulness in predicting social functioning. Tests included measures of nonverbal sensitivity, social language, and social problem solving. Criterion measures included parent and teacher report of social functioning. Analyses support the concurrent validity of all measures, and the incremental validity and clinical usefulness of tests of pragmatic language and problem solving.
Educational testing validity and reliability in pharmacy and medical education literature.

PubMed

Hoover, Matthew J; Jung, Rose; Jacobs, David M; Peeters, Michael J

2013-12-16

To evaluate and compare the reliability and validity of educational testing reported in pharmacy education journals with that reported in the medical education literature. Descriptions of validity evidence sources (content, construct, criterion, and reliability) were extracted from articles that reported educational testing of learners' knowledge, skills, and/or abilities. Using educational testing, the findings of 108 pharmacy education articles were compared to the findings of 198 medical education articles. For pharmacy educational testing, 14 articles (13%) reported more than 1 validity evidence source while 83 articles (77%) reported 1 validity evidence source and 11 articles (10%) did not have evidence. Among validity evidence sources, content validity was reported most frequently. Compared with pharmacy education literature, more medical education articles reported both validity and reliability (59%; p<0.001). While there were more scholarship of teaching and learning (SoTL) articles in pharmacy education than in medical education, validity and reliability reporting was limited in the pharmacy education literature.
Validation through Understanding Test-Taking Strategies: An Illustration With the CELPIP-General Reading Pilot Test Using Structural Equation Modeling

ERIC Educational Resources Information Center

Wu, Amery D.; Stone, Jake E.

2016-01-01

This article explores an approach for test score validation that examines test takers' strategies for taking a reading comprehension test. The authors formulated three working hypotheses about score validity pertaining to three types of test-taking strategy (comprehending meaning, test management, and test-wiseness). These hypotheses were…
Evidence of Construct Validity in Published Achievement Tests.

ERIC Educational Resources Information Center

Nolet, Victor; Tindal, Gerald

Valid interpretation of test scores is the shared responsibility of the test designer and the test user. Test publishers must provide evidence of the validity of the decisions their tests are intended to support, while test users are responsible for analyzing this evidence and subsequently using the test in the manner indicated by the publisher.…
Validation of the Lollipop Test: A Diagnostic Screening Test of School Readiness.

ERIC Educational Resources Information Center

Chew, Alex L.; Morris, John D.

1984-01-01

The validity of the Lollipop Test: A Diagnostic Screening Test of School Readiness was examined using the Metropolitan Readiness Test (MRT), Level I, Form Q, as the criterion. Appreciable concurrent validity was found across test batteries. Implications for school readiness screening are discussed. (Author/BS)
The Concurrent Validity of Four Tests of Metalinguistic Awareness.

ERIC Educational Resources Information Center

Day, Kaaren C.; Day, H. D.

1991-01-01

Examines the concurrent validity of four metalinguistic awareness tests (Written Language Awareness Test, Test of Early Reading Ability, Linguistic Awareness in Reading Readiness Test, and the Concepts about Print Test). Finds rather low concurrent validity coefficients which suggests that further work is needed to clarify the operations required…
Reliability and validity of the revised Gibson Test of Cognitive Skills, a computer-based test battery for assessing cognition across the lifespan.

PubMed

Moore, Amy Lawson; Miller, Terissa M

2018-01-01

The purpose of the current study is to evaluate the validity and reliability of the revised Gibson Test of Cognitive Skills, a computer-based battery of tests measuring short-term memory, long-term memory, processing speed, logic and reasoning, visual processing, as well as auditory processing and word attack skills. This study included 2,737 participants aged 5-85 years. A series of studies was conducted to examine the validity and reliability using the test performance of the entire norming group and several subgroups. The evaluation of the technical properties of the test battery included content validation by subject matter experts, item analysis and coefficient alpha, test-retest reliability, split-half reliability, and analysis of concurrent validity with the Woodcock Johnson III Tests of Cognitive Abilities and Tests of Achievement. Results indicated strong sources of evidence of validity and reliability for the test, including internal consistency reliability coefficients ranging from 0.87 to 0.98, test-retest reliability coefficients ranging from 0.69 to 0.91, split-half reliability coefficients ranging from 0.87 to 0.91, and concurrent validity coefficients ranging from 0.53 to 0.93. The Gibson Test of Cognitive Skills-2 is a reliable and valid tool for assessing cognition in the general population across the lifespan.
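The record above lists split-half reliability among its coefficients. A common way to obtain one is to correlate scores on two halves of the test and then apply the Spearman-Brown correction, r_full = 2r / (1 + r), to estimate reliability at full test length. The Python sketch below illustrates that calculation only; the abstract does not say how the halves were formed, so the odd/even split is an assumption, and the item scores are invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)


def split_half(rows):
    """Return (half-test correlation, Spearman-Brown corrected estimate)
    using an odd/even item split (an assumption, not the study's method)."""
    first = [sum(row[0::2]) for row in rows]   # items 1, 3, 5, ...
    second = [sum(row[1::2]) for row in rows]  # items 2, 4, 6, ...
    r = pearson(first, second)
    return r, 2 * r / (1 + r)


# INVENTED item scores (5 examinees x 4 items), for illustration only.
rows = [
    [4, 5, 4, 5],
    [3, 3, 2, 3],
    [5, 4, 5, 5],
    [2, 1, 2, 2],
    [3, 4, 3, 3],
]

r_half, r_full = split_half(rows)
print(r_half, r_full)
```

The correction is needed because each half has only half as many items as the full test, so the raw correlation between halves understates the reliability of the whole instrument.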
Performance and Symptom Validity Testing as a Function of Medical Board Evaluation in U.S. Military Service Members with a History of Mild Traumatic Brain Injury.

PubMed

Armistead-Jehle, Patrick; Cole, Wesley R; Stegman, Robert L

2018-02-01

The study was designed to replicate and extend previous findings demonstrating the high rates of invalid neuropsychological testing in military service members (SMs) with a history of mild traumatic brain injury (mTBI) assessed in the context of a medical evaluation board (MEB). Two hundred thirty-one active duty SMs (61 of whom were undergoing an MEB) underwent neuropsychological assessment. Performance validity (Word Memory Test) and symptom validity (MMPI-2-RF) test data were compared across those evaluated within disability (MEB) and clinical contexts. As with previous studies, there were significantly more individuals in an MEB context that failed performance (MEB = 57%, non-MEB = 31%) and symptom validity testing (MEB = 57%, non-MEB = 22%), and performance validity testing had a notable effect on cognitive test scores. Performance and symptom validity test failure rates did not vary as a function of the reason for disability evaluation when divided into behavioral versus physical health conditions. These data are consistent with past studies, and extend those studies by including symptom validity testing and investigating the effect of reason for MEB. This and previous studies demonstrate that more than 50% of SMs seen in the context of an MEB will fail performance validity tests and over-report on symptom validity measures. These results emphasize the importance of using both performance and symptom validity testing when evaluating SMs with a history of mTBI, especially if they are being seen for disability evaluations, in order to ensure the accuracy of cognitive and psychological test data. Published by Oxford University Press 2017. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Validation of NASA Thermal Ice Protection Computer Codes. Part 1; Program Overview

NASA Technical Reports Server (NTRS)

Miller, Dean; Bond, Thomas; Sheldon, David; Wright, William; Langhals, Tammy; Al-Khalil, Kamel; Broughton, Howard

1996-01-01

The Icing Technology Branch at NASA Lewis has been involved in an effort to validate two thermal ice protection codes developed at the NASA Lewis Research Center: LEWICE/Thermal (electrothermal deicing & anti-icing) and ANTICE (hot-gas & electrothermal anti-icing). The Thermal Code Validation effort was designated as a priority during a 1994 'peer review' of the NASA Lewis Icing program, and was implemented as a cooperative effort with industry. During April 1996, the first of a series of experimental validation tests was conducted in the NASA Lewis Icing Research Tunnel (IRT). The purpose of the April 1996 test was to validate the electrothermal predictive capabilities of both LEWICE/Thermal and ANTICE. A heavily instrumented test article was designed and fabricated for this test, with the capability of simulating electrothermal de-icing and anti-icing modes of operation. Thermal measurements were then obtained over a range of test conditions, for comparison with analytical predictions. This paper will present an overview of the test, including a detailed description of: (1) the validation process; (2) test article design; (3) test matrix development; and (4) test procedures. Selected experimental results will be presented for de-icing and anti-icing modes of operation. Finally, the status of the validation effort at this point will be summarized. Detailed comparisons between analytical predictions and experimental results are contained in the following two papers: 'Validation of NASA Thermal Ice Protection Computer Codes: Part 2- The Validation of LEWICE/Thermal' and 'Validation of NASA Thermal Ice Protection Computer Codes: Part 3-The Validation of ANTICE'.
10 CFR 26.139 - Reporting initial validity and drug test results.

Code of Federal Regulations, 2014 CFR

2014-01-01

... 10 Energy 1 2014-01-01 2014-01-01 false Reporting initial validity and drug test results. 26.139... § 26.139 Reporting initial validity and drug test results. (a) The licensee testing facility shall... permitted under § 26.75(h), positive test results from initial drug tests at the licensee testing facility...

10 CFR 26.139 - Reporting initial validity and drug test results.

Code of Federal Regulations, 2012 CFR

2012-01-01

... 10 Energy 1 2012-01-01 2012-01-01 false Reporting initial validity and drug test results. 26.139... § 26.139 Reporting initial validity and drug test results. (a) The licensee testing facility shall... permitted under § 26.75(h), positive test results from initial drug tests at the licensee testing facility...
Performance Evaluation of a Data Validation System

NASA Technical Reports Server (NTRS)

Wong, Edmond (Technical Monitor); Sowers, T. Shane; Santi, L. Michael; Bickford, Randall L.

2005-01-01

Online data validation is a performance-enhancing component of modern control and health management systems. It is essential that performance of the data validation system be verified prior to its use in a control and health management system. A new Data Qualification and Validation (DQV) Test-bed application was developed to provide a systematic test environment for this performance verification. The DQV Test-bed was used to evaluate a model-based data validation package known as the Data Quality Validation Studio (DQVS). DQVS was employed as the primary data validation component of a rocket engine health management (EHM) system developed under NASA's NGLT (Next Generation Launch Technology) program. In this paper, the DQVS and DQV Test-bed software applications are described, and the DQV Test-bed verification procedure for this EHM system application is presented. Test-bed results are summarized and implications for EHM system performance improvements are discussed.
Application of validity theory and methodology to patient-reported outcome measures (PROMs): building an argument for validity.

PubMed

Hawkins, Melanie; Elsworth, Gerald R; Osborne, Richard H

2018-07-01

Data from subjective patient-reported outcome measures (PROMs) are now being used in the health sector to make or support decisions about individuals, groups and populations. Contemporary validity theorists define validity not as a statistical property of the test but as the extent to which empirical evidence supports the interpretation of test scores for an intended use. However, validity testing theory and methodology are rarely evident in the PROM validation literature. Application of this theory and methodology would provide structure for comprehensive validation planning to support improved PROM development and sound arguments for the validity of PROM score interpretation and use in each new context. This paper proposes the application of contemporary validity theory and methodology to PROM validity testing. The validity testing principles will be applied to a hypothetical case study with a focus on the interpretation and use of scores from a translated PROM that measures health literacy (the Health Literacy Questionnaire or HLQ). Although robust psychometric properties of a PROM are a pre-condition to its use, a PROM's validity lies in the sound argument that a network of empirical evidence supports the intended interpretation and use of PROM scores for decision making in a particular context. The health sector is yet to apply contemporary theory and methodology to PROM development and validation. The theoretical and methodological processes in this paper are offered as an advancement of the theory and practice of PROM validity testing in the health sector.
Students' Initial Knowledge State and Test Design: Towards a Valid and Reliable Test Instrument

ERIC Educational Resources Information Center

CoPo, Antonio Roland I.

2015-01-01

Designing a good test instrument involves specifications, test construction, validation, try-out, analysis and revision. The initial knowledge state of forty (40) tertiary students enrolled in a Business Statistics course was determined, and the same test instrument underwent validation. The designed test instrument did not only reveal the baseline…
Evaluating the Content Validity of Multistage-Adaptive Tests

ERIC Educational Resources Information Center

Crotts, Katrina; Sireci, Stephen G.; Zenisky, April

2012-01-01

Validity evidence based on test content is important for educational tests to demonstrate the degree to which they fulfill their purposes. Most content validity studies involve subject matter experts (SMEs) who rate items that comprise a test form. In computerized-adaptive testing, examinees take different sets of items and test "forms"…
Validation of a Videoconferenced Speaking Test

ERIC Educational Resources Information Center

Kim, Jungtae; Craig, Daniel A.

2012-01-01

Videoconferencing offers new opportunities for language testers to assess speaking ability in low-stakes diagnostic tests. To be considered a trusted testing tool in language testing, a test should be examined employing appropriate validation processes [Chapelle, C.A., Jamieson, J., & Hegelheimer, V. (2003). "Validation of a web-based ESL…
Construct Validity: Advances in Theory and Methodology

PubMed Central

Strauss, Milton E.; Smith, Gregory T.

2008-01-01

Measures of psychological constructs are validated by testing whether they relate to measures of other constructs as specified by theory. Each test of relations between measures reflects on the validity of both the measures and the theory driving the test. Construct validation concerns the simultaneous process of measure and theory validation. In this chapter, we review the recent history of validation efforts in clinical psychological science that has led to this perspective, and we review five recent advances in validation theory and methodology of importance for clinical researchers. These are: the emergence of nonjustificationist philosophy of science; an increasing appreciation for theory and the need for informative tests of construct validity; valid construct representation in experimental psychopathology; the need to avoid representing multidimensional constructs with a single score; and the emergence of effective new statistical tools for the evaluation of convergent and discriminant validity. PMID:19086835
Performance validity testing in neuropsychology: a clinical guide, critical review, and update on a rapidly evolving literature.

PubMed

Lippa, Sara M

2018-04-01

Over the past two decades, there has been much research on measures of response bias and myriad measures have been validated in a variety of clinical and research samples. This critical review aims to guide clinicians through the use of performance validity tests (PVTs) from test selection and administration through test interpretation and feedback. Recommended cutoffs and relevant test operating characteristics are presented. Other important issues to consider during test selection, administration, interpretation, and feedback are discussed including order effects, coaching, impact on test data, and methods to combine measures and improve predictive power. When interpreting performance validity measures, neuropsychologists must use particular caution in cases of dementia, low intelligence, English as a second language/minority cultures, or low education. PVTs provide valuable information regarding response bias and, under the right circumstances, can provide excellent evidence of response bias. Only after consideration of the entire clinical picture, including validity test performance, can concrete determinations regarding the validity of test data be made.
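
The test operating characteristics mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a performance validity test on which lower scores are worse and a score at or below the cutoff counts as a failure. The function name, scores, and cutoffs are hypothetical and are not taken from the review.

```python
def operating_characteristics(scores, is_invalid, cutoff):
    """Return (sensitivity, specificity) for a 'score <= cutoff means fail' rule.

    scores     -- one PVT score per examinee (hypothetical data)
    is_invalid -- True where the reference criterion says performance was invalid
    """
    true_pos = false_neg = true_neg = false_pos = 0
    for score, invalid in zip(scores, is_invalid):
        failed = score <= cutoff
        if invalid and failed:
            true_pos += 1      # invalid performance correctly flagged
        elif invalid:
            false_neg += 1     # invalid performance missed
        elif failed:
            false_pos += 1     # valid performance wrongly flagged
        else:
            true_neg += 1      # valid performance correctly passed
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical scores, for illustration only.
scores = [30, 35, 40, 45, 48, 50]
is_invalid = [True, True, True, False, False, False]
for cutoff in (30, 40, 45):
    sens, spec = operating_characteristics(scores, is_invalid, cutoff)
    print(f"cutoff <= {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the cutoff in this toy data set trades specificity for sensitivity, which is the kind of trade-off a recommended cutoff has to balance.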
15 CFR 995.27 - Format validation software testing.

Code of Federal Regulations, 2013 CFR

2013-01-01

... 15 Commerce and Foreign Trade 3 2013-01-01 2013-01-01 false Format validation software testing... of NOAA ENC Products § 995.27 Format validation software testing. Tests shall be performed verifying, as far as reasonable and practicable, that CEVAD's data testing software performs the checks, as...
15 CFR 995.27 - Format validation software testing.

Code of Federal Regulations, 2014 CFR

2014-01-01

... 15 Commerce and Foreign Trade 3 2014-01-01 2014-01-01 false Format validation software testing... of NOAA ENC Products § 995.27 Format validation software testing. Tests shall be performed verifying, as far as reasonable and practicable, that CEVAD's data testing software performs the checks, as...
15 CFR 995.27 - Format validation software testing.

Code of Federal Regulations, 2012 CFR

2012-01-01

... 15 Commerce and Foreign Trade 3 2012-01-01 2012-01-01 false Format validation software testing... of NOAA ENC Products § 995.27 Format validation software testing. Tests shall be performed verifying, as far as reasonable and practicable, that CEVAD's data testing software performs the checks, as...
15 CFR 995.27 - Format validation software testing.

Code of Federal Regulations, 2011 CFR

2011-01-01

... 15 Commerce and Foreign Trade 3 2011-01-01 2011-01-01 false Format validation software testing... of NOAA ENC Products § 995.27 Format validation software testing. Tests shall be performed verifying, as far as reasonable and practicable, that CEVAD's data testing software performs the checks, as...
Embedded performance validity testing in neuropsychological assessment: Potential clinical tools.

PubMed

Rickards, Tyler A; Cranston, Christopher C; Touradji, Pegah; Bechtold, Kathleen T

2018-01-01

The article aims to suggest clinically useful tools in neuropsychological assessment for efficient use of embedded measures of performance validity. To accomplish this, we integrated available validity-related and statistical research from the literature, consensus statements, and survey-based data from practicing neuropsychologists. We provide recommendations for 1) cutoffs for embedded performance validity tests, including Reliable Digit Span, California Verbal Learning Test (Second Edition) Forced Choice Recognition, Rey-Osterrieth Complex Figure Test Combination Score, Wisconsin Card Sorting Test Failure to Maintain Set, and the Finger Tapping Test; 2) selecting the number of performance validity measures to administer in an assessment; and 3) hypothetical clinical decision-making models for use of performance validity testing in a neuropsychological assessment, collectively considering behavior, patient reporting, and data indicating invalid or noncredible performance. Performance validity testing helps inform the clinician about an individual's general approach to tasks: response to failure, task engagement and persistence, and compliance with task demands. The data-driven clinical suggestions provide a resource for clinicians and are intended to instigate conversation within the field, support more uniform and testable decisions, and guide future research in this area.
40 CFR 86.1341-90 - Test cycle validation criteria.

Code of Federal Regulations, 2011 CFR

2011-07-01

... 40 Protection of Environment 19 2011-07-01 2011-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-90 Test cycle validation criteria. (a) To minimize the biasing effect of the time lag... brake horsepower-hour. (c) Regression line analysis to calculate validation statistics. (1) Linear...
40 CFR 86.1341-90 - Test cycle validation criteria.

Code of Federal Regulations, 2013 CFR

2013-07-01

... 40 Protection of Environment 20 2013-07-01 2013-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-90 Test cycle validation criteria. (a) To minimize the biasing effect of the time lag... brake horsepower-hour. (c) Regression line analysis to calculate validation statistics. (1) Linear...
40 CFR 86.1341-90 - Test cycle validation criteria.

Code of Federal Regulations, 2012 CFR

2012-07-01

... 40 Protection of Environment 20 2012-07-01 2012-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-90 Test cycle validation criteria. (a) To minimize the biasing effect of the time lag... brake horsepower-hour. (c) Regression line analysis to calculate validation statistics. (1) Linear...
Validity evidence based on test content.

PubMed

Sireci, Stephen; Faulkner-Bond, Molly

2014-01-01

Validity evidence based on test content is one of the five forms of validity evidence stipulated in the Standards for Educational and Psychological Testing developed by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. In this paper, we describe the logic and theory underlying such evidence and describe traditional and modern methods for gathering and analyzing content validity data. A comprehensive review of the literature and of the aforementioned Standards is presented. For educational tests and other assessments targeting knowledge and skill possessed by examinees, validity evidence based on test content is necessary for building a validity argument to support the use of a test for a particular purpose. By following the methods described in this article, practitioners have a wide arsenal of tools available for determining how well the content of an assessment is congruent with and appropriate for the specific testing purposes.
Development and Validation of Targeted Next-Generation Sequencing Panels for Detection of Germline Variants in Inherited Diseases.

PubMed

Santani, Avni; Murrell, Jill; Funke, Birgit; Yu, Zhenming; Hegde, Madhuri; Mao, Rong; Ferreira-Gonzalez, Andrea; Voelkerding, Karl V; Weck, Karen E

2017-06-01

The number of targeted next-generation sequencing (NGS) panels for genetic diseases offered by clinical laboratories is rapidly increasing. Before an NGS-based test is implemented in a clinical laboratory, appropriate validation studies are needed to determine the performance characteristics of the test. This article provides examples of assay design and validation of targeted NGS gene panels for the detection of germline variants associated with inherited disorders. The approaches used by 2 clinical laboratories for the development and validation of targeted NGS gene panels are described. Important design and validation considerations are examined. Clinical laboratories must validate performance specifications of each test prior to implementation. Test design specifications and validation data are provided, outlining important steps in validation of targeted NGS panels by clinical diagnostic laboratories.
Performance Validity Testing in Neuropsychology: Scientific Basis and Clinical Application-A Brief Review.

PubMed

Greher, Michael R; Wodushek, Thomas R

2017-03-01

Performance validity testing refers to neuropsychologists' methodology for determining whether neuropsychological test performances completed in the course of an evaluation are valid (ie, the results of true neurocognitive function) or invalid (ie, overly impacted by the patient's effort/engagement in testing). This determination relies upon the use of either standalone tests designed for this sole purpose, or specific scores/indicators embedded within traditional neuropsychological measures that have demonstrated this utility. In response to a greater appreciation for the critical role that performance validity issues play in neuropsychological testing and the need to measure this variable to the best of our ability, the scientific base for performance validity testing has expanded greatly over the last 20 to 30 years. As such, the majority of current day neuropsychologists in the United States use a variety of measures for the purpose of performance validity testing as part of everyday forensic and clinical practice and address this issue directly in their evaluations. The following is the first article of a 2-part series that will address the evolution of performance validity testing in the field of neuropsychology, both in terms of the science as well as the clinical application of this measurement technique. The second article of this series will review performance validity tests in terms of methods for development of these measures, and maximizing of diagnostic accuracy.
Evaluation of tools used to measure calcium and/or dairy consumption in adults.

PubMed

Magarey, Anthea; Baulderstone, Lauren; Yaxley, Alison; Markow, Kylie; Miller, Michelle

2015-05-01

The aim was to identify and critique tools for the assessment of Ca and/or dairy intake in adults, in order to ascertain the most accurate and reliable tools available. A systematic review of the literature was conducted using defined inclusion and exclusion criteria. Articles reporting on originally developed tools or testing the reliability or validity of existing tools that measure Ca and/or dairy intake in adults were included. Author-defined criteria for reporting reliability and validity properties were applied. Setting: studies conducted in Western countries. Subjects: adults. Thirty papers, utilising thirty-six tools assessing intake of dairy, Ca or both, were identified. Reliability testing was conducted on only two dairy and five Ca tools, with results indicating that only one dairy and two Ca tools were reliable. Validity testing was conducted for all but four Ca-only tools. Validity testing relied heavily on lower-order tests such as correlation and often failed to differentiate between statistically and clinically meaningful differences. Results of the validity testing suggest one dairy and five Ca tools are valid. Thus one tool was considered both reliable and valid for the assessment of dairy intake and only two tools proved reliable and valid for the assessment of Ca intake. While several tools are reliable and valid, their application across adult populations is limited by the populations in which they were tested. These results indicate a need for tools that assess Ca and/or dairy intake in adults to be rigorously tested for reliability and validity.

The influence of validity criteria on Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) test-retest reliability among high school athletes.

PubMed

Brett, Benjamin L; Solomon, Gary S

2017-04-01

Research findings to date on the stability of Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) Composite scores have been inconsistent, requiring further investigation. The use of test validity criteria across these studies also has been inconsistent. Using multiple measures of stability, we examined test-retest reliability of repeated ImPACT baseline assessments in high school athletes across various validity criteria reported in previous studies. A total of 1146 high school athletes completed baseline cognitive testing using the online ImPACT test battery at two time points approximately two years apart. No participant sustained a concussion between assessments. Five forms of validity criteria used in previous test-retest studies were applied to the data, and differences in reliability were compared. Intraclass correlation coefficients (ICCs) ranged in composite scores from .47 (95% confidence interval, CI [.38, .54]) to .83 (95% CI [.81, .85]) and showed little change across a two-year interval for all five sets of validity criteria. Regression-based methods (RBMs) examining the test-retest stability demonstrated a lack of significant change in composite scores across the two-year interval for all forms of validity criteria, with no cases falling outside the expected range of 90% confidence intervals. The application of more stringent validity criteria does not alter test-retest reliability, nor does it account for some of the variation observed across previously performed studies. As such, the ImPACT manual validity criteria should be used in the determination of test validity and in the individualized approach to concussion management. Potential future efforts to improve test-retest reliability are discussed.
Establishing the Test-Retest Reliability & Concurrent Validity for the Repeat Ice Skating Test (RIST) in Adolescent Male Ice Hockey Players

ERIC Educational Resources Information Center

Power, Allan; Faught, Brent E.; Przysucha, Eryk; McPherson, Moira; Montelpare, William

2012-01-01

In this study the authors examine the test-retest reliability and concurrent validity of the Repeat Ice Skating Test (RIST). This was an on-ice field anaerobic test that measured average peak power and was validated with 3 anaerobic lab tests: (a) vertical jump, (b) the Margaria-Kalamen stair test, and (c) the Wingate Anaerobic Test. The…
78 FR 20695 - Walk-Through Metal Detectors and Hand-Held Metal Detectors Test Method Validation

Federal Register 2010, 2011, 2012, 2013, 2014

2013-04-05

... Detectors and Hand-Held Metal Detectors Test Method Validation AGENCY: National Institute of Justice, DOJ... ensure that the test methods in the standards are properly documented, NIJ is requesting proposals (including price quotes) for test method validation efforts from testing laboratories. NIJ is also seeking...
10 CFR 26.139 - Reporting initial validity and drug test results.

Code of Federal Regulations, 2013 CFR

2013-01-01

... 10 Energy 1 2013-01-01 2013-01-01 false Reporting initial validity and drug test results. 26.139 Section 26.139 Energy NUCLEAR REGULATORY COMMISSION FITNESS FOR DUTY PROGRAMS Licensee Testing Facilities § 26.139 Reporting initial validity and drug test results. (a) The licensee testing facility shall...
10 CFR 26.139 - Reporting initial validity and drug test results.

Code of Federal Regulations, 2011 CFR

2011-01-01

... 10 Energy 1 2011-01-01 2011-01-01 false Reporting initial validity and drug test results. 26.139 Section 26.139 Energy NUCLEAR REGULATORY COMMISSION FITNESS FOR DUTY PROGRAMS Licensee Testing Facilities § 26.139 Reporting initial validity and drug test results. (a) The licensee testing facility shall...
10 CFR 26.139 - Reporting initial validity and drug test results.

Code of Federal Regulations, 2010 CFR

2010-01-01

... 10 Energy 1 2010-01-01 2010-01-01 false Reporting initial validity and drug test results. 26.139 Section 26.139 Energy NUCLEAR REGULATORY COMMISSION FITNESS FOR DUTY PROGRAMS Licensee Testing Facilities § 26.139 Reporting initial validity and drug test results. (a) The licensee testing facility shall...
Validation in Support of Internationally Harmonised OECD Test Guidelines for Assessing the Safety of Chemicals.

PubMed

Gourmelon, Anne; Delrue, Nathalie

Ten years have elapsed since the OECD published the Guidance document on the validation and international regulatory acceptance of test methods for hazard assessment. Much experience has been gained since then in validation centres, in countries and at the OECD on a variety of test methods that were subjected to validation studies. This chapter reviews validation principles and highlights common features that appear to be important for further regulatory acceptance across studies. Existing OECD-agreed validation principles will most likely generally remain relevant and applicable to address challenges associated with the validation of future test methods. Some adaptations may be needed to take into account the level of technique introduced in test systems, but demonstration of relevance and reliability will continue to play a central role as a prerequisite for regulatory acceptance. Demonstration of relevance will become more challenging for test methods that form part of a set of predictive tools and methods, and that do not stand alone. OECD is keen on ensuring that while these concepts evolve, countries can continue to rely on valid methods and harmonised approaches for efficient testing and assessment of chemicals.
Construct Validity of Neuropsychological Tests in Schizophrenia.

ERIC Educational Resources Information Center

Allen, Daniel N.; Aldarondo, Felito; Goldstein, Gerald; Huegel, Stephen G.; Gilbertson, Mark; van Kammen, Daniel P.

1998-01-01

The construct validity of neuropsychological tests in patients with schizophrenia was studied with 39 patients who were evaluated with a battery of six tests assessing attention, memory, and abstract reasoning abilities. Results support the construct validity of the neuropsychological tests in patients with schizophrenia. (SLD)
15 CFR 995.27 - Format validation software testing.

Code of Federal Regulations, 2010 CFR

2010-01-01

... 15 Commerce and Foreign Trade 3 2010-01-01 2010-01-01 false Format validation software testing... CERTIFICATION REQUIREMENTS FOR NOAA HYDROGRAPHIC PRODUCTS AND SERVICES CERTIFICATION REQUIREMENTS FOR... of NOAA ENC Products § 995.27 Format validation software testing. Tests shall be performed verifying...
Validity of the Eating Attitudes Test and the Eating Disorders Inventory in Bulimia Nervosa.

ERIC Educational Resources Information Center

Gross, Janet; And Others

1986-01-01

Assessed criterion and concurrent validity of the Eating Attitudes Test and the Eating Disorder Inventory in 82 women with bulimia nervosa. Both tests demonstrated criterion validity by discriminating bulimia nervosa subjects from normals. Only weak support was found for concurrent validity within bulimia subjects. Recommends combination of…
Eye-Tracking as a Tool in Process-Oriented Reading Test Validation

ERIC Educational Resources Information Center

Solheim, Oddny Judith; Uppstad, Per Henning

2011-01-01

The present paper addresses the continuous need for methodological reflection on how to validate inferences made on the basis of test scores. Validation is a process that requires many lines of evidence. In this article we discuss the potential of eye tracking methodology in process-oriented reading test validation. Methodological considerations…
40 CFR 86.1341-98 - Test cycle validation criteria.

Code of Federal Regulations, 2012 CFR

2012-07-01

... 40 Protection of Environment 20 2012-07-01 2012-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-98 Test cycle validation criteria. Section 86.1341-98 includes text that specifies...-90 (d)(4), shall be excluded from both cycle validation and the integrated work used for emissions...
40 CFR 86.1341-98 - Test cycle validation criteria.

Code of Federal Regulations, 2013 CFR

2013-07-01

... 40 Protection of Environment 20 2013-07-01 2013-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-98 Test cycle validation criteria. Section 86.1341-98 includes text that specifies...-90 (d)(4), shall be excluded from both cycle validation and the integrated work used for emissions...
40 CFR 86.1341-98 - Test cycle validation criteria.

Code of Federal Regulations, 2011 CFR

2011-07-01

... 40 Protection of Environment 19 2011-07-01 2011-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-98 Test cycle validation criteria. Section 86.1341-98 includes text that specifies...-90 (d)(4), shall be excluded from both cycle validation and the integrated work used for emissions...
Design and validation of a comprehensive fecal incontinence questionnaire.

PubMed

Macmillan, Alexandra K; Merrie, Arend E H; Marshall, Roger J; Parry, Bryan R

2008-10-01

Fecal incontinence can have a profound effect on quality of life. Its prevalence remains uncertain because of stigma, lack of consistent definition, and dearth of validated measures. This study was designed to develop a valid clinical and epidemiologic questionnaire, building on current literature and expertise. Patients and experts undertook face validity testing. Construct validity, criterion validity, and test-retest reliability were assessed. Construct validity comprised factor analysis and internal consistency of the quality of life scale. The validity of known groups was tested against 77 control subjects by using regression models. Questionnaire results were compared with a stool diary for criterion validity. Test-retest reliability was calculated from repeated questionnaire completion. The questionnaire achieved good face validity. It was completed by 104 patients. The quality of life scale had four underlying traits (factor analysis) and high internal consistency (overall Cronbach alpha = 0.97). Patients and control subjects answered the questionnaire significantly differently (P < 0.01) in known-groups validity testing. Criterion validity assessment found mean differences close to zero. Median reliability for the whole questionnaire was 0.79 (range, 0.35-1). This questionnaire compares favorably with other available instruments, although the interpretation of stool consistency requires further research. Its sensitivity to treatment still needs to be investigated.
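
The internal-consistency statistic reported in the abstract above (Cronbach alpha) can be sketched with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the response data below are hypothetical and are not the study's data. Sample variances (n - 1 denominator) are assumed.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Uses sample variances (n - 1 denominator) for the items and for the
    respondents' total scores.
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 4 respondents x 3 items, for illustration only.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
]
print(round(cronbach_alpha(responses), 3))
```

Values near 1 indicate that the items vary together, and values near 0 indicate that they do not.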
Predictive Validity Study of the APS Writing and Reading Tests [and] Validating Placement Rules for the APS Writing Test.

ERIC Educational Resources Information Center

College of the Canyons, Valencia, CA. Office of Institutional Development.

California's College of the Canyons has used the College Board Assessment and Placement Services (APS) test to assess students' abilities in basic and college English since spring 1993. These two reports summarize data from a May 1994 study of the predictive validity of the APS writing and reading tests and a June 1994 effort to validate the cut…
Validation of the breast evaluation questionnaire for breast hypertrophy and breast reduction.

PubMed

Lewin, Richard; Elander, Anna; Lundberg, Jonas; Hansson, Emma; Thorarinsson, Andri; Claudelin, Malin; Bladh, Helena; Lidén, Mattias

2018-06-13

There is a lack of published, validated questionnaires for evaluating psychosocial morbidity in patients with breast hypertrophy undergoing breast reduction surgery. The objective was to validate the breast evaluation questionnaire (BEQ), originally developed for the assessment of breast augmentation patients, for the assessment of psychosocial morbidity in patients with breast hypertrophy undergoing breast reduction surgery. Design: validation study. Subjects: women with macromastia. Methods: The validation of the BEQ, adapted to breast reduction, was performed in several steps. Content validity, reliability, construct validity and responsiveness were assessed. The original version was adjusted according to the results for content validity, which led to item reduction and a modified BEQ (mBEQ) that was then assessed for reliability, construct validity and responsiveness. Internal and external validation was performed for the modified BEQ. Convergent validity was tested against Breast-Q (reduction) and discriminant validity was tested against the SF-36. Known-groups validation revealed significant differences between the normal population and patients undergoing breast reduction surgery. The BEQ showed good reliability by test-retest analysis and high responsiveness. The modified BEQ may be a reliable, valid and responsive instrument for assessing women who undergo breast reduction.
How to test validity in orthodontic research: a mixed dentition analysis example.

PubMed

Donatelli, Richard E; Lee, Shin-Jae

2015-02-01

The data used to test the validity of a prediction method should be different from the data used to generate the prediction model. In this study, we explored whether an independent data set is mandatory for testing the validity of a new prediction method and how validity can be tested without independent new data. Several validation methods were compared in an example using the data from a mixed dentition analysis with a regression model. The validation errors of real mixed dentition analysis data and simulation data were analyzed for increasingly large data sets. The validation results of both the real and the simulation studies demonstrated that the leave-1-out cross-validation method had the smallest errors. The largest errors occurred in the traditional simple validation method. The differences between the validation methods diminished as the sample size increased. The leave-1-out cross-validation method seems to be an optimal validation method for improving the prediction accuracy in a data set with limited sample sizes. Copyright © 2015 American Association of Orthodontists. Published by Elsevier Inc. All rights reserved.
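
The leave-1-out cross-validation described in the abstract above can be sketched for a one-predictor linear regression. Each observation is held out in turn, the line is refitted on the rest, and the held-out prediction error is recorded. The function names, the mean-absolute-error metric, and the data are assumptions for illustration and are not the authors' mixed dentition data or their exact procedure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_mean_abs_error(xs, ys):
    """Leave-1-out cross-validation: mean absolute error on held-out points."""
    errors = []
    for i in range(len(xs)):
        # Fit on every observation except the i-th one.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        # Score the model on the single observation it never saw.
        errors.append(abs(ys[i] - (slope * xs[i] + intercept)))
    return sum(errors) / len(errors)


# Hypothetical predictor/outcome pairs, for illustration only.
xs = [21.0, 22.5, 23.0, 24.5, 25.0, 26.5]
ys = [20.1, 21.0, 21.6, 22.3, 22.9, 23.8]
print(f"LOOCV mean absolute error: {loocv_mean_abs_error(xs, ys):.3f}")
```

Because every observation serves once as test data and the rest of the time as training data, the procedure needs no separate validation sample, which is why it suits small data sets.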
Validation of Alternative In Vitro Methods to Animal Testing: Concepts, Challenges, Processes and Tools.

PubMed

Griesinger, Claudius; Desprez, Bertrand; Coecke, Sandra; Casey, Warren; Zuang, Valérie

This chapter explores the concepts, processes, tools and challenges relating to the validation of alternative methods for toxicity and safety testing. In general terms, validation is the process of assessing the appropriateness and usefulness of a tool for its intended purpose. Validation is routinely used in various contexts in science, technology, the manufacturing and services sectors. It serves to assess the fitness-for-purpose of devices, systems, software up to entire methodologies. In the area of toxicity testing, validation plays an indispensable role: "alternative approaches" are increasingly replacing animal models as predictive tools and it needs to be demonstrated that these novel methods are fit for purpose. Alternative approaches include in vitro test methods, non-testing approaches such as predictive computer models up to entire testing and assessment strategies composed of method suites, data sources and decision-aiding tools. Data generated with alternative approaches are ultimately used for decision-making on public health and the protection of the environment. It is therefore essential that the underlying methods and methodologies are thoroughly characterised, assessed and transparently documented through validation studies involving impartial actors. Importantly, validation serves as a filter to ensure that only test methods able to produce data that help to address legislative requirements (e.g. EU's REACH legislation) are accepted as official testing tools and, owing to the globalisation of markets, recognised on international level (e.g. through inclusion in OECD test guidelines). Since validation creates a credible and transparent evidence base on test methods, it provides a quality stamp, supporting companies developing and marketing alternative methods and creating considerable business opportunities. 
Validation of alternative methods is conducted through scientific studies assessing two key hypotheses, reliability and relevance of the test method for a given purpose. Relevance encapsulates the scientific basis of the test method, its capacity to predict adverse effects in the "target system" (i.e. human health or the environment) as well as its applicability for the intended purpose. In this chapter we focus on the validation of non-animal in vitro alternative testing methods and review the concepts, challenges, processes and tools fundamental to the validation of in vitro methods intended for hazard testing of chemicals. We explore major challenges and peculiarities of validation in this area. Based on the notion that validation per se is a scientific endeavour that needs to adhere to key scientific principles, namely objectivity and appropriate choice of methodology, we examine basic aspects of study design and management, and provide illustrations of statistical approaches to describe predictive performance of validated test methods as well as their reliability.
Safety validation test equipment operation

NASA Astrophysics Data System (ADS)

Kurosaki, Tadaaki; Watanabe, Takashi

1992-08-01

An overview of the activities conducted on safety validation test equipment operation for materials used for NASA manned missions is presented. Safety validation tests, such as flammability, odor, offgassing, and so forth, were conducted in accordance with NASA-NHB-8060.1C using test subjects common with those used by NASA, and the equipment used was qualified for its functions and performance in accordance with NASDA-CR-99124 'Safety Validation Test Qualification Procedures.' Test procedure systems were established by preparing 'Common Procedures for Safety Validation Test' as well as test procedures for flammability, offgassing, and odor tests. The test operation organization chaired by the General Manager of the Parts and Material Laboratory of NASDA (National Space Development Agency of Japan) was established, and the test leaders and operators in the organization were qualified in accordance with the specified procedures. One hundred and one tests had been conducted so far by the Parts and Material Laboratory according to the requests submitted by the manufacturers through the Space Station Group and the Safety and Product Assurance for Manned Systems Office.

Testing expert systems

NASA Technical Reports Server (NTRS)

Chang, C. L.; Stachowitz, R. A.

1988-01-01

Software quality is of primary concern in all large-scale expert system development efforts. Building appropriate validation and test tools for ensuring software reliability of expert systems is therefore required. The Expert Systems Validation Associate (EVA) is a validation system under development at the Lockheed Artificial Intelligence Center. EVA provides a wide range of validation and test tools to check correctness, consistency, and completeness of an expert system. Testing is a major function of EVA. It means executing an expert system with test cases with the intent of finding errors. In this paper, we describe many different types of testing such as function-based testing, structure-based testing, and data-based testing. We describe how appropriate test cases may be selected in order to perform good and thorough testing of an expert system.
Psychological testing and psychological assessment. A review of evidence and issues.

PubMed

Meyer, G J; Finn, S E; Eyde, L D; Kay, G G; Moreland, K L; Dies, R R; Eisman, E J; Kubiszyn, T W; Reed, G M

2001-02-01

This article summarizes evidence and issues associated with psychological assessment. Data from more than 125 meta-analyses on test validity and 800 samples examining multimethod assessment suggest 4 general conclusions: (a) Psychological test validity is strong and compelling, (b) psychological test validity is comparable to medical test validity, (c) distinct assessment methods provide unique sources of information, and (d) clinicians who rely exclusively on interviews are prone to incomplete understandings. Following principles for optimal nomothetic research, the authors suggest that a multimethod assessment battery provides a structured means for skilled clinicians to maximize the validity of individualized assessments. Future investigations should move beyond an examination of test scales to focus more on the role of psychologists who use tests as helpful tools to furnish patients and referral sources with professional consultation.
Validation Test Report For The CRWMS Analysis and Logistics Visually Interactive Model Calvin Version 3.0, 10074-Vtr-3.0-00

DOE Office of Scientific and Technical Information (OSTI.GOV)

S. Gillespie

2000-07-27

This report describes the tests performed to validate the CRWMS "Analysis and Logistics Visually Interactive" Model (CALVIN) Version 3.0 (V3.0) computer code (STN: 10074-3.0-00). To validate the code, a series of test cases was developed in the CALVIN V3.0 Validation Test Plan (CRWMS M&O 1999a) that exercises the principal calculation models and options of CALVIN V3.0. Twenty-five test cases were developed: 18 logistics test cases and 7 cost test cases. These cases test the features of CALVIN in a sequential manner, so that the validation of each test case is used to demonstrate the accuracy of the input to subsequent calculations. Where necessary, the test cases utilize reduced-size data tables to make the hand calculations used to verify the results more tractable, while still adequately testing the code's capabilities. Acceptance criteria were established for the logistics and cost test cases in the Validation Test Plan (CRWMS M&O 1999a). The logistics test cases were developed to test the following CALVIN calculation models: Spent nuclear fuel (SNF) and reactivity calculations; Options for altering reactor life; Adjustment of commercial SNF (CSNF) acceptance rates for fiscal year calculations and mid-year acceptance start; Fuel selection, transportation cask loading, and shipping to the Monitored Geologic Repository (MGR); Transportation cask shipping to and storage at an Interim Storage Facility (ISF); Reactor pool allocation options; and Disposal options at the MGR. Two types of cost test cases were developed: cases to validate the detailed transportation costs, and cases to validate the costs associated with the Civilian Radioactive Waste Management System (CRWMS) Management and Operating Contractor (M&O) and Regional Servicing Contractors (RSCs). For each test case, values calculated using Microsoft Excel 97 worksheets were compared to CALVIN V3.0 scenarios with the same input data and assumptions. All of the test case results compare with the CALVIN V3.0 results within the bounds of the acceptance criteria. Therefore, it is concluded that the CALVIN V3.0 calculation models and options tested in this report are validated.
Meeting Report: Validation of Toxicogenomics-Based Test Systems: ECVAM–ICCVAM/NICEATM Considerations for Regulatory Use

PubMed Central

Corvi, Raffaella; Ahr, Hans-Jürgen; Albertini, Silvio; Blakey, David H.; Clerici, Libero; Coecke, Sandra; Douglas, George R.; Gribaldo, Laura; Groten, John P.; Haase, Bernd; Hamernik, Karen; Hartung, Thomas; Inoue, Tohru; Indans, Ian; Maurici, Daniela; Orphanides, George; Rembges, Diana; Sansone, Susanna-Assunta; Snape, Jason R.; Toda, Eisaku; Tong, Weida; van Delft, Joost H.; Weis, Brenda; Schechtman, Leonard M.

2006-01-01

This is the report of the first workshop “Validation of Toxicogenomics-Based Test Systems” held 11–12 December 2003 in Ispra, Italy. The workshop was hosted by the European Centre for the Validation of Alternative Methods (ECVAM) and organized jointly by ECVAM, the U.S. Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM), and the National Toxicology Program (NTP) Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM). The primary aim of the workshop was for participants to discuss and define principles applicable to the validation of toxicogenomics platforms as well as validation of specific toxicologic test methods that incorporate toxicogenomics technologies. The workshop was viewed as an opportunity for initiating a dialogue between technologic experts, regulators, and the principal validation bodies and for identifying those factors to which the validation process would be applicable. It was felt that to do so now, as the technology is evolving and associated challenges are identified, would be a basis for the future validation of the technology when it reaches the appropriate stage. Because of the complexity of the issue, different aspects of the validation of toxicogenomics-based test methods were covered. The three focus areas include a) biologic validation of toxicogenomics-based test methods for regulatory decision making, b) technical and bioinformatics aspects related to validation, and c) validation issues as they relate to regulatory acceptance and use of toxicogenomics-based test methods. In this report we summarize the discussions and describe in detail the recommendations for future direction and priorities. PMID:16507466
Meeting report: Validation of toxicogenomics-based test systems: ECVAM-ICCVAM/NICEATM considerations for regulatory use.

PubMed

Corvi, Raffaella; Ahr, Hans-Jürgen; Albertini, Silvio; Blakey, David H; Clerici, Libero; Coecke, Sandra; Douglas, George R; Gribaldo, Laura; Groten, John P; Haase, Bernd; Hamernik, Karen; Hartung, Thomas; Inoue, Tohru; Indans, Ian; Maurici, Daniela; Orphanides, George; Rembges, Diana; Sansone, Susanna-Assunta; Snape, Jason R; Toda, Eisaku; Tong, Weida; van Delft, Joost H; Weis, Brenda; Schechtman, Leonard M

2006-03-01

This is the report of the first workshop "Validation of Toxicogenomics-Based Test Systems" held 11-12 December 2003 in Ispra, Italy. The workshop was hosted by the European Centre for the Validation of Alternative Methods (ECVAM) and organized jointly by ECVAM, the U.S. Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM), and the National Toxicology Program (NTP) Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM). The primary aim of the workshop was for participants to discuss and define principles applicable to the validation of toxicogenomics platforms as well as validation of specific toxicologic test methods that incorporate toxicogenomics technologies. The workshop was viewed as an opportunity for initiating a dialogue between technologic experts, regulators, and the principal validation bodies and for identifying those factors to which the validation process would be applicable. It was felt that to do so now, as the technology is evolving and associated challenges are identified, would be a basis for the future validation of the technology when it reaches the appropriate stage. Because of the complexity of the issue, different aspects of the validation of toxicogenomics-based test methods were covered. The three focus areas include a) biologic validation of toxicogenomics-based test methods for regulatory decision making, b) technical and bioinformatics aspects related to validation, and c) validation issues as they relate to regulatory acceptance and use of toxicogenomics-based test methods. In this report we summarize the discussions and describe in detail the recommendations for future direction and priorities.
Alternative Vocabularies in the Test Validity Literature

ERIC Educational Resources Information Center

Markus, Keith A.

2016-01-01

Justification of testing practice involves moving from one state of knowledge about the test to another. Theories of test validity can (a) focus on the beginning of the process, (b) focus on the end, or (c) encompass the entire process. Analyses of four case studies test and illustrate three claims: (a) restrictions on validity entail a supplement…
Content Validity Index and Intra- and Inter-Rater Reliability of a New Muscle Strength/Endurance Test Battery for Swedish Soldiers

PubMed Central

Larsson, Helena; Tegern, Matthias; Monnier, Andreas; Skoglund, Jörgen; Helander, Charlotte; Persson, Emelie; Malm, Christer; Broman, Lisbet; Aasa, Ulrika

2015-01-01

The objective of this study was to examine the content validity of commonly used muscle performance tests in military personnel and to investigate the reliability of a proposed test battery. For the content validity investigation, the thirty selected tests were those described in the literature and/or commonly used in the Nordic and North Atlantic Treaty Organization (NATO) countries. Nine selected experts rated, on a four-point Likert scale, the relevance of these tests in relation to five different work tasks: lifting, carrying equipment on the body or in the hands, climbing, and digging. Thereafter, a content validity index (CVI) was calculated for each work task. The results showed excellent CVI (≥0.78) for sixteen tests, each covering one or more of the military work tasks. Three of the tests (the functional lower-limb loading test, known as the Ranger test; dead-lift with kettlebells; and back extension) showed excellent content validity for four of the work tasks. For the development of a new muscle strength/endurance test battery, these three tests were further supplemented with two other tests, namely, the chins and side-bridge test. The inter-rater reliability was high (intraclass correlation coefficient, ICC2,1 0.99) for all five tests. The intra-rater reliability was good to high (ICC3,1 0.82–0.96) with an acceptable standard error of mean (SEM), except for the side-bridge test (SEM%>15). Thus, the final suggested test battery for a valid and reliable evaluation of soldiers’ muscle performance comprised the following four tests: the Ranger test, dead-lift with kettlebells, chins, and back extension test. The criterion-related validity of the test battery should be further evaluated for soldiers exposed to varying physical workload. PMID:26177030
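As a rough illustration of how a content validity index can be obtained from expert ratings, the sketch below uses the common convention of counting ratings of 3 or 4 on a four-point scale as "relevant" and dividing by the number of experts. The abstract does not state the exact computation used, so this convention, the function name, and the example ratings are assumptions.

```python
def content_validity_index(ratings, relevant_min=3):
    """Share of experts whose rating is at or above `relevant_min`.

    `ratings` holds one rating per expert on a 1-4 relevance scale.
    Treating 3 and 4 as 'relevant' is an assumed convention.
    """
    if not ratings:
        raise ValueError("at least one expert rating is required")
    relevant = sum(1 for r in ratings if r >= relevant_min)
    return relevant / len(ratings)


# Hypothetical ratings from nine experts for one test and one work task.
example_ratings = [4, 4, 3, 3, 4, 2, 4, 3, 4]
cvi = content_validity_index(example_ratings)
print(f"CVI = {cvi:.2f}")
```

With nine experts, a CVI threshold near 0.78 corresponds to roughly seven or more experts rating the test as relevant.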
English Placement Testing, Multiple Measures, and Disproportionate Impact: An Analysis of the Criterion- and Content-Related Validity Evidence for the Reading & Writing Placement Tests in the San Diego Community College District.

ERIC Educational Resources Information Center

Armstrong, William B.

As part of an effort to statistically validate the placement tests used in California's San Diego Community College District (SDCCD) a study was undertaken to review the criteria- and content-related validity of the Assessment and Placement Services (APS) reading and writing tests. Evidence of criteria and content validity was gathered from…
A Historical Overview on the Concept of Validity in Language Testing

ERIC Educational Resources Information Center

Hamavandy, Mehraban; Kiany, Gholam Reza

2014-01-01

This article provides an overview on language test validation theories, especially the Messickian view on construct validity and the way it's been translated into practice. First, a brief historical synopsis will be set forth, followed by recent views on test validity as advanced by Messick and Kane. The review goes on to lay out the similarities…
Validating Test Score Meaning and Defending Test Score Use: Different Aims, Different Methods

ERIC Educational Resources Information Center

Cizek, Gregory J.

2016-01-01

Advances in validity theory and alacrity in validation practice have suffered because the term "validity" has been used to refer to two incompatible concerns: (1) the degree of support for specified interpretations of test scores (i.e. intended score meaning) and (2) the degree of support for specified applications (i.e. intended test…
40 CFR 86.1341-98 - Test cycle validation criteria.

Code of Federal Regulations, 2010 CFR

2010-07-01

... 40 Protection of Environment 19 2010-07-01 2010-07-01 false Test cycle validation criteria. 86...) Emission Regulations for New Otto-Cycle and Diesel Heavy-Duty Engines; Gaseous and Particulate Exhaust Test Procedures § 86.1341-98 Test cycle validation criteria. Section 86.1341-98 includes text that specifies...
Test Takers and the Validity of Score Interpretations

ERIC Educational Resources Information Center

Kopriva, Rebecca J.; Thurlow, Martha L.; Perie, Marianne; Lazarus, Sheryl S.; Clark, Amy

2016-01-01

This article argues that test takers are as integral to determining validity of test scores as defining target content and conditioning inferences on test use. A principled sustained attention to how students interact with assessment opportunities is essential, as is a principled sustained evaluation of evidence confirming the validity or calling…
Recommendations for elaboration, transcultural adaptation and validation process of tests in Speech, Hearing and Language Pathology.

PubMed

Pernambuco, Leandro; Espelt, Albert; Magalhães, Hipólito Virgílio; Lima, Kenio Costa de

2017-06-08

To present a guide with recommendations for translation, adaptation, elaboration and process of validation of tests in Speech and Language Pathology. The recommendations were based on international guidelines with a focus on the elaboration, translation, cross-cultural adaptation and validation process of tests. The recommendations were grouped into two Charts, one of them with procedures for translation and transcultural adaptation and the other for obtaining evidence of validity, reliability and measures of accuracy of the tests. A guide with norms for the organization and systematization of the process of elaboration, translation, cross-cultural adaptation and validation process of tests in Speech and Language Pathology was created.
Validity Tests of the Adolescent Domain Screening Inventory (ADSI) with Older Adolescents

ERIC Educational Resources Information Center

Corrigan, Matthew J.; Forte, James; Bulgaris, Sarah

2017-01-01

The purpose of this replication study is to test the validity of the Adolescent Domain Screening Inventory (ADSI) on an older adolescent population. This cross sectional study used a convenience sample to preliminarily test the validity of the ADSI. Concurrent validity correlations ranged from a high of 0.924 to a low of 0.760. The known…
Test Anxiety and the Validity of Cognitive Tests: A Confirmatory Factor Analysis Perspective and Some Empirical Findings

ERIC Educational Resources Information Center

Wicherts, Jelte M.; Scholten, Annemarie Zand

2010-01-01

The validity of cognitive ability tests is often interpreted solely as a function of the cognitive abilities that these tests are supposed to measure, but other factors may be at play. The effects of test anxiety on the criterion related validity (CRV) of tests was the topic of a recent study by Reeve, Heggestad, and Lievens (2009) (Reeve, C. L.,…
The validity of three tests of temperament in guppies (Poecilia reticulata).

PubMed

Burns, James G

2008-11-01

Differences in temperament (consistent differences among individuals in behavior) can have important effects on fitness-related activities such as dispersal and competition. However, evolutionary ecologists have put limited effort into validating their tests of temperament. This article attempts to validate three standard tests of temperament in guppies: the open-field test, emergence test, and novel-object test. Through multiple reliability trials, and comparison of results between different types of test, this study establishes the confidence that can be placed in these temperament tests. The open-field test is shown to be a good test of boldness and exploratory behavior; the open-field test was reliable when tested in multiple ways. There were problems with the emergence test and novel-object test, which leads one to conclude that the protocols used in this study should not be considered valid tests for this species. (PsycINFO Database Record (c) 2008 APA, all rights reserved).
Validation of Linguistic and Communicative Oral Language Tests for Spanish-English Bilingual Programs.

ERIC Educational Resources Information Center

Politzer, Robert L.; And Others

1983-01-01

The development, administration, and scoring of a communicative test and its validation with tests of linguistic and sociolinguistic competence in English and Spanish are reported. Correlation with measures of home language use and school achievement are also presented, and issues of test validation for bilingual programs are discussed. (MSE)
Development and Validation of Diagnostic Economics Test for Secondary Schools

ERIC Educational Resources Information Center

Eleje, Lydia I.; Esomonu, Nkechi P. M.; Agu, Ngozi N.; Okoye, Romy O.; Obasi, Emma; Onah, Frederick E.

2016-01-01

A diagnostic test in economics to help teachers determine students' specific weak content areas was developed and validated. Five research questions guided the study. Preliminary validation was done by two experienced teachers in the content area of secondary economics and two experts in test construction. The pilot testing was conducted for…
40 CFR 86.1341-90 - Test cycle validation criteria.

Code of Federal Regulations, 2010 CFR

2010-07-01

... 40 Protection of Environment 19 2010-07-01 2010-07-01 false Test cycle validation criteria. 86...) Emission Regulations for New Otto-Cycle and Diesel Heavy-Duty Engines; Gaseous and Particulate Exhaust Test Procedures § 86.1341-90 Test cycle validation criteria. (a) To minimize the biasing effect of the time lag...
An Integrated Approach to Establish Validity and Reliability of Reading Tests

ERIC Educational Resources Information Center

Razi, Salim

2012-01-01

This study presents the processes of developing and establishing reliability and validity of a reading test by administering an integrative approach, as conventional reliability and validity measures only superficially reveal the difficulty of a reading test. In this respect, analysing vocabulary frequency of the test is regarded as a more eligible way…

Validation of the Simple Shoulder Test in a Portuguese-Brazilian population. Is the latent variable structure and validation of the Simple Shoulder Test Stable across cultures?

PubMed

Neto, Jose Osni Bruggemann; Gesser, Rafael Lehmkuhl; Steglich, Valdir; Bonilauri Ferreira, Ana Paula; Gandhi, Mihir; Vissoci, João Ricardo Nickenig; Pietrobon, Ricardo

2013-01-01

The validation of widely used scales facilitates the comparison across international patient samples. The objective of this study was to translate, culturally adapt and validate the Simple Shoulder Test into Brazilian Portuguese. We also test the stability of factor analysis across different cultures. The Simple Shoulder Test was translated from English into Brazilian Portuguese, translated back into English, and evaluated for accuracy by an expert committee. It was then administered to 100 patients with shoulder conditions. Psychometric properties were analyzed including factor analysis, internal reliability, test-retest reliability at seven days, and construct validity in relation to the Short Form 36 health survey (SF-36). Factor analysis demonstrated a three factor solution. Cronbach's alpha was 0.82. Test-retest reliability index as measured by intra-class correlation coefficient (ICC) was 0.84. Associations were observed in the hypothesized direction with all subscales of SF-36 questionnaire. The Simple Shoulder Test translation and cultural adaptation to Brazilian-Portuguese demonstrated adequate factor structure, internal reliability, and validity, ultimately allowing for its use in the comparison with international patient samples.
Validation of the Simple Shoulder Test in a Portuguese-Brazilian Population. Is the Latent Variable Structure and Validation of the Simple Shoulder Test Stable across Cultures?

PubMed Central

Neto, Jose Osni Bruggemann; Gesser, Rafael Lehmkuhl; Steglich, Valdir; Bonilauri Ferreira, Ana Paula; Gandhi, Mihir; Vissoci, João Ricardo Nickenig; Pietrobon, Ricardo

2013-01-01

Background: The validation of widely used scales facilitates the comparison across international patient samples. Objective: The objective of this study was to translate, culturally adapt and validate the Simple Shoulder Test into Brazilian Portuguese. We also test the stability of factor analysis across different cultures. Methods: The Simple Shoulder Test was translated from English into Brazilian Portuguese, translated back into English, and evaluated for accuracy by an expert committee. It was then administered to 100 patients with shoulder conditions. Psychometric properties were analyzed including factor analysis, internal reliability, test-retest reliability at seven days, and construct validity in relation to the Short Form 36 health survey (SF-36). Results: Factor analysis demonstrated a three factor solution. Cronbach’s alpha was 0.82. Test-retest reliability index as measured by intra-class correlation coefficient (ICC) was 0.84. Associations were observed in the hypothesized direction with all subscales of SF-36 questionnaire. Conclusion: The Simple Shoulder Test translation and cultural adaptation to Brazilian-Portuguese demonstrated adequate factor structure, internal reliability, and validity, ultimately allowing for its use in the comparison with international patient samples. PMID:23675436
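Internal reliability in studies like this one is typically summarized with Cronbach's alpha. The following is a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the function name and the response matrix are hypothetical and do not reproduce the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table with one row per respondent
    and one column per questionnaire item."""
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose to per-item columns
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical yes/no (1/0) answers from five respondents to four items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print("Cronbach's alpha:", round(cronbach_alpha(responses), 3))
```

Sample variances are used for both the items and the total score; using population variances throughout would give the same alpha because the correction factors cancel.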
Functional performance testing of the hip in athletes: a systematic review for reliability and validity.

PubMed

Kivlan, Benjamin R; Martin, Robroy L

2012-08-01

The purpose of this study was to systematically review the literature for functional performance tests with evidence of reliability and validity that could be used for a young, athletic population with hip dysfunction. A search of the PubMed and SPORTDiscus databases was performed to identify movement, balance, hop/jump, or agility functional performance tests from the current peer-reviewed literature used to assess function of the hip in young, athletic subjects. The single-leg stance, deep squat, single-leg squat, and star excursion balance tests (SEBT) demonstrated evidence of validity and normative data for score interpretation. The single-leg stance test and SEBT have evidence of validity with association to hip abductor function. The deep squat test demonstrated evidence as a functional performance test for evaluating femoroacetabular impingement (FAI). Hop/jump tests and agility tests have no reported evidence of reliability or validity in a population of subjects with hip pathology. Use of functional performance tests in the assessment of hip dysfunction has not been well established in the current literature. Diminished squat depth and provocation of pain during the single-leg balance test have been associated with patients diagnosed with FAI and gluteal tendinopathy, respectively. The SEBT and single-leg squat tests provided evidence of convergent validity through an analysis of kinematics and muscle function in normal subjects. Reliability of functional performance tests has not been established in patients with hip dysfunction. Further study is needed to establish reliability and validity of functional performance tests that can be used in a young, athletic population with hip dysfunction. 2b (Systematic Review of Literature).
The validity and reliability of a dynamic neuromuscular stabilization-heel sliding test for core stability.

PubMed

Cha, Young Joo; Lee, Jae Jin; Kim, Do Hyun; You, Joshua Sung H

2017-10-23

Core stabilization plays an important role in the regulation of postural stability. To overcome shortcomings associated with pain and severe core instability during conventional core stabilization tests, we recently developed the dynamic neuromuscular stabilization-based heel sliding (DNS-HS) test. The purpose of this study was to establish the criterion validity and test-retest reliability of the novel DNS-HS test. Twenty young adults with core instability completed both the bilateral straight leg lowering test (BSLLT) and DNS-HS test for the criterion validity study and repeated the DNS-HS test for the test-retest reliability study. Criterion validity was determined by comparing hip joint angle data that were obtained from BSLLT and DNS-HS measures. The test-retest reliability was determined by comparing hip joint angle data. Criterion validity was (ICC2,3) = 0.700 (p< 0.05), suggesting a good relationship between the two core stability measures. Test-retest reliability was (ICC3,3) = 0.953 (p< 0.05), indicating excellent consistency between the repeated DNS-HS measurements. Criterion validity data demonstrated a good relationship between the gold standard BSLLT and DNS-HS core stability measures. Test-retest reliability data suggest that the DNS-HS test is a reliable test for core stability. Clinically, the DNS-HS test is useful to objectively quantify core instability and allow early detection and evaluation.
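The ICC2,k and ICC3,k notation used above refers to two-way intraclass correlation coefficients. A minimal sketch of how these can be computed from a subjects-by-measurements table, using the usual mean squares from a two-way analysis of variance, is given below. The function name and the angle values are hypothetical; the study's own software and data are not described in the abstract.

```python
def icc_two_way(data):
    """Two-way ICCs for a table with one row per subject and
    one column per rater or repeated measurement."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-subjects mean square
    msc = ss_cols / (k - 1)                 # between-raters mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square

    return {
        "ICC(2,1)": (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n),
        "ICC(3,1)": (msr - mse) / (msr + (k - 1) * mse),
        "ICC(2,k)": (msr - mse) / (msr + (msc - mse) / n),
        "ICC(3,k)": (msr - mse) / msr,
    }


# Hypothetical hip joint angles (degrees): five subjects, three trials each.
angles = [
    [42.0, 44.0, 43.0],
    [55.0, 53.0, 56.0],
    [38.0, 39.0, 37.5],
    [61.0, 60.0, 62.0],
    [47.0, 49.0, 48.0],
]
for name, value in icc_two_way(angles).items():
    print(name, round(value, 3))
```

The single-measure forms, ICC(2,1) and ICC(3,1), describe the reliability of one measurement, while the average-measure forms describe the reliability of the mean of k measurements and are therefore at least as large for the same data.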
Assessment of a condition-specific quality-of-life measure for patients with developmentally absent teeth: validity and reliability testing.

PubMed

Akram, A J; Ireland, A J; Postlethwaite, K C; Sandy, J R; Jerreat, A S

2013-11-01

This article describes the process of validity and reliability testing of a condition-specific quality-of-life measure for patients with hypodontia presenting for orthodontic treatment. The development of the instrument is described in a previous article. Setting: Royal Devon and Exeter NHS Foundation Trust & Musgrove Park Hospital, Taunton. The child perception questionnaire was used as a standard against which to test criterion validity. The Bland and Altman method was used to check agreement between the two questionnaires. Construct validity was tested using principal component analysis on the four sections of the questionnaire. Test-retest reliability was tested using the intraclass correlation coefficient and the Bland and Altman method. Cronbach's alpha was used to test internal consistency reliability. Overall the questionnaire showed good reliability, criterion and construct validity. This together with previous evidence of good face and content validity suggests that the instrument may prove useful in clinical practice and further research. This study has demonstrated that the newly developed condition-specific quality-of-life questionnaire is both valid and reliable for use in young patients with hypodontia. © 2013 John Wiley & Sons A/S. Published by Blackwell Publishing Ltd.
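The Bland and Altman method mentioned above summarizes agreement between two measurements of the same subjects as a mean difference (bias) with 95% limits of agreement at the bias plus or minus 1.96 standard deviations of the differences. A minimal sketch follows; the function name and the paired scores are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)   # half-width of the 95% limits of agreement
    return bias, bias - spread, bias + spread


# Hypothetical paired questionnaire scores for six patients.
new_measure = [31.0, 45.0, 28.0, 52.0, 39.0, 41.0]
reference = [29.0, 47.0, 27.0, 50.0, 40.0, 43.0]
bias, lower, upper = bland_altman(new_measure, reference)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

Unlike a correlation coefficient, this summary exposes systematic offsets between the two measurements, which is why it is often reported alongside the ICC.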
Applying modern psychometric techniques to melodic discrimination testing: Item response theory, computerised adaptive testing, and automatic item generation.

PubMed

Harrison, Peter M C; Collins, Tom; Müllensiefen, Daniel

2017-06-15

Modern psychometric theory provides many useful tools for ability testing, such as item response theory, computerised adaptive testing, and automatic item generation. However, these techniques have yet to be integrated into mainstream psychological practice. This is unfortunate, because modern psychometric techniques can bring many benefits, including sophisticated reliability measures, improved construct validity, avoidance of exposure effects, and improved efficiency. In the present research we therefore use these techniques to develop a new test of a well-studied psychological capacity: melodic discrimination, the ability to detect differences between melodies. We calibrate and validate this test in a series of studies. Studies 1 and 2 respectively calibrate and validate an initial test version, while Studies 3 and 4 calibrate and validate an updated test version incorporating additional easy items. The results support the new test's viability, with evidence for strong reliability and construct validity. We discuss how these modern psychometric techniques may also be profitably applied to other areas of music psychology and psychological science in general.
Identification student’s misconception of heat and temperature using three-tier diagnostic test

NASA Astrophysics Data System (ADS)

Suliyanah; Putri, H. N. P. A.; Rohmawati, L.

2018-03-01

The objective of this research is to develop a Three-Tier Diagnostic Test (TTDT) to identify students' misconceptions of heat and temperature. The stages of development are analysis, planning, design, development, evaluation, and revision. The results of this study show that (1) the quality of the three-tier diagnostic test instrument is good, with the following details: (a) internal validity of 88.19%, which belongs to the valid category; (b) external validity, from an empirical construct validity test using the Pearson product-moment correlation, of 0.43, with false positives of 6.1% and false negatives of 5.9%, so the instrument was valid; (c) test reliability, using Cronbach's alpha, of 0.98, which is acceptable; (d) a difficulty level of 80%, which is quite difficult. (2) On test II, student misconceptions on the temperature, heat, and displacement materials were highest at 84% and lowest at 21%, with non-misconceptions at 7%. (3) The most frequent cause of misconception among students is associative thinking (22%) and the least frequent is incomplete reasoning (11%). The Three-Tier Diagnostic Test (TTDT) could identify students' misconceptions of heat and temperature.
Validity and reliability of the NAB Naming Test.

PubMed

Sachs, Bonnie C; Rush, Beth K; Pedraza, Otto

2016-05-01

Confrontation naming is commonly assessed in neuropsychological practice, but few standardized measures of naming exist and those that do are susceptible to the effects of education and culture. The Neuropsychological Assessment Battery (NAB) Naming Test is a 31-item measure used to assess confrontation naming. Despite adequate psychometric information provided by the test publisher, there has been limited independent validation of the test. In this study, we investigated the convergent and discriminant validity, internal consistency, and alternate forms reliability of the NAB Naming Test in a sample of adults (Form 1: n = 247, Form 2: n = 151) clinically referred for neuropsychological evaluation. Results indicate adequate-to-good internal consistency and alternate forms reliability. We also found strong convergent validity as demonstrated by relationships with other neurocognitive measures. We found preliminary evidence that the NAB Naming Test demonstrates a more pronounced ceiling effect than other commonly used measures of naming. To our knowledge, this represents the largest published independent validation study of the NAB Naming Test in a clinical sample. Our findings suggest that the NAB Naming Test demonstrates adequate validity and reliability and merits consideration in the test arsenal of clinical neuropsychologists.
Measuring verbal and non-verbal communication in aphasia: reliability, validity, and sensitivity to change of the Scenario Test.

PubMed

van der Meulen, Ineke; van de Sandt-Koenderman, W Mieke E; Duivenvoorden, Hugo J; Ribbers, Gerard M

2010-01-01

This study explores the psychometric qualities of the Scenario Test, a new test to assess daily-life communication in severe aphasia. The test is innovative in that it: (1) examines the effectiveness of verbal and non-verbal communication; and (2) assesses patients' communication in an interactive setting, with a supportive communication partner. To determine the reliability, validity, and sensitivity to change of the Scenario Test and discuss its clinical value. The Scenario Test was administered to 122 persons with aphasia after stroke and to 25 non-aphasic controls. Analyses were performed for the entire group of persons with aphasia, as well as for a subgroup of persons unable to communicate verbally (n = 43). Reliability (internal consistency, test-retest reliability, inter-judge, and intra-judge reliability) and validity (internal validity, convergent validity, known-groups validity) and sensitivity to change were examined using standard psychometric methods. The Scenario Test showed high levels of reliability. Internal consistency (Cronbach's alpha = 0.96; item-rest correlations = 0.58-0.82) and test-retest reliability (ICC = 0.98) were high. Agreement between judges in total scores was good, as indicated by the high inter- and intra-judge reliability (ICC = 0.86-1.00). Agreement in scores on the individual items was also good (square-weighted kappa values 0.61-0.92). The test demonstrated good levels of validity. A principal component analysis for categorical data identified two dimensions, interpreted as general communication and communicative creativity. Correlations with three other instruments measuring communication in aphasia, that is, Spontaneous Speech interview from the Aachen Aphasia Test (AAT), Amsterdam-Nijmegen Everyday Language Test (ANELT), and Communicative Effectiveness Index (CETI), were moderate to strong (0.50-0.85) suggesting good convergent validity. 
Group differences were observed between persons with aphasia and non-aphasic controls, as well as between persons with aphasia unable to use speech to convey information and those able to communicate verbally; this indicates good known-groups validity. The test was sensitive to changes in performance, measured over a period of 6 months. The data support the reliability and validity of the Scenario Test as an instrument for examining daily-life communication in aphasia. The test focuses on multimodal communication; its psychometric qualities enable future studies on the effect of Alternative and Augmentative Communication (AAC) training in aphasia.
[Do Current German-Language Intelligence Tests Take into Consideration the Special Needs of Children with Disabilities?].

PubMed

Mickley, Manfred; Renner, Gerolf

2015-01-01

Do Current German-Language Intelligence Tests Take into Consideration the Special Needs of Children with Disabilities? A review of 23 German intelligence test manuals shows that test-authors do not exclude the use of their tests for children with disabilities. However, these special groups play a minor role in the construction, standardization, and validation of intelligence tests. There is no sufficient discussion and reflection concerning the issue which construct-irrelevant requirements may reduce the validity of the test or which individual test-adaptations are allowed or recommended. Intelligence testing of children with disabilities needs more empirical evidence on objectivity, reliability, and validity of the assessment-procedures employed. Future test construction and validation should systematically analyze construct-irrelevant variance in item format, the special needs of handicapped children, and should give hints for useful test-adaptations.
Development of a framework for international certification by OIE of diagnostic tests validated as fit for purpose.

PubMed

Wright, P; Edwards, S; Diallo, A; Jacobson, R

2006-01-01

Historically, the OIE has focused on test methods applicable to trade and the international movement of animals and animal products. With its expanding role as the World Organisation for Animal Health, the OIE has recognised the need to evaluate test methods relative to specific diagnostic applications other than trade. In collaboration with its international partners, the OIE solicited input from experts through consultants' meetings on the development of guidelines for validation and certification of diagnostic assays for infectious animal diseases. Recommendations from the first meeting were formally adopted and have subsequently been acted upon by the OIE. A validation template has been developed that specifically requires a test to be fit or suited for its intended purpose (e.g. as a screening or a confirmatory test). This is a key criterion for validation. The template incorporates four distinct stages of validation, each of which has bearing on the evaluation of fitness for purpose. The OIE has just recently created a registry for diagnostic tests that fulfil these validation requirements. Assay developers are invited to submit validation dossiers to the OIE for evaluation by a panel of experts. Recognising that validation is an incremental process, test methods achieving at least the first stages of validation may be provisionally accepted. To provide additional confidence in assay performance, the OIE, through its network of Reference Laboratories, has embarked on the development of evaluation panels. These panels would contain specially selected test samples that would assist in verifying fitness for purpose.
Development of a framework for international certification by the OIE of diagnostic tests validated as fit for purpose.

PubMed

Wright, P; Edwards, S; Diallo, A; Jacobson, R

2007-01-01

Historically, the OIE has focussed on test methods applicable to trade and the international movement of animals and animal products. With its expanding role as the World Organisation for Animal Health, the OIE has recognised the need to evaluate test methods relative to specific diagnostic applications other than trade. In collaboration with its international partners, the OIE solicited input from experts through consultants' meetings on the development of guidelines for validation and certification of diagnostic assays for infectious animal diseases. Recommendations from the first meeting were formally adopted and have subsequently been acted upon by the OIE. A validation template has been developed that specifically requires a test to be fit or suited for its intended purpose (e.g. as a screening or a confirmatory test). This is a key criterion for validation. The template incorporates four distinct stages of validation, each of which has bearing on the evaluation of fitness for purpose. The OIE has just recently created a registry for diagnostic tests that fulfil these validation requirements. Assay developers are invited to submit validation dossiers to the OIE for evaluation by a panel of experts. Recognising that validation is an incremental process, test methods achieving at least the first stages of validation may be provisionally accepted. To provide additional confidence in assay performance, the OIE, through its network of Reference Laboratories, has embarked on the development of evaluation panels. These panels would contain specially selected test samples that would assist in verifying fitness for purpose.
Beyond Faith and Face Validity: The Multitrait-Multimethod Matrix and the Convergent and Discriminant Validity of Oral Proficiency Tests.

ERIC Educational Resources Information Center

Stevenson, Douglas K.

Recently there has been a renewed international interest in direct oral proficiency measures such as the oral interview. There has also been a growing awareness among some language testing specialists that all proficiency tests must be subjected to construct validation. It seems that the high face validity of oral interviews tends to cloud and…
Development and validation of a knowledge test for health professionals regarding lifestyle modification.

PubMed

Talip, Whadi-ah; Steyn, Nelia P; Visser, Marianne; Charlton, Karen E; Temple, Norman

2003-09-01

We wanted to develop and validate a test that assesses the knowledge and practices of health professionals (HPs) with regard to the role of nutrition, physical activity, and smoking cessation (lifestyle modification) in chronic diseases of lifestyle. A descriptive cross-sectional validation study was carried out. The validation design consisted of two phases, namely 1) test planning and development and 2) test evaluation. The study sample consisted of five groups of HPs: dietitians, dietetic interns, general practitioners, medical students, and nurses. The overall response rate was 58%, resulting in a sample size of 186 participants. A test was designed to evaluate the knowledge and practices of HPs. The test was first evaluated by an expert group to ensure content, construct, and face validity. Thereafter, the questionnaire was tested on five groups of HPs to test for criterion validity. Internal consistency was evaluated by Cronbach's alpha. An expert panel ensured content, construct, and face validity of the test. Groups with the most training and exposure to nutrition (dietitians and dietetic interns) had the highest group mean score, ranging from 61% to 88%, whereas those with limited nutrition training (general practitioners, medical students, and nurses) had significantly lower scores, ranging from 26% to 80%. This result demonstrated criterion validity. Internal consistency of the overall test demonstrated a Cronbach's alpha of 0.99. Most HPs identified the mass media as their main source of information on lifestyle modification. These HPs also identified lack of time, lack of patient compliance, and lack of knowledge as barriers that prevent them from providing counseling on lifestyle modification. The results of this study showed that this test instrument identifies groups of health professionals with adequate training (knowledge) in lifestyle modification and those who require further training (knowledge).
Development and validation of a new questionnaire for the assessment of subjective physical performance in adult patients with haemophilia--the HEP-Test-Q.

PubMed

von Mackensen, S; Czepa, D; Herbsleb, M; Hilberg, T

2010-01-01

Specific research studies for the investigation of physical performance in haemophilic patients are rare. However, these instruments become increasingly more important to evaluate therapeutic treatments. Within the frame of the Haemophilia & Exercise Project (HEP), a new questionnaire, namely HEP-Test-Q, has been developed for the assessment of subjective physical performance in haemophilic adults. In this article, the development and validation of the HEP-Test-Q is described. The development consisted of different phases including item collection, pilot testing and field testing. The preliminary version was pilot-tested in 24 German HEP-participants. Following evaluation and preliminary psychometric analysis, the HEP-Test-Q was revised. The final version consists of 25 items pertaining to the domains 'mobility', 'strength & coordination', 'endurance' and 'body perception', which was administered to 43 German haemophilic patients (43.8 +/- 11.2 years). Psychometric analysis included reliability and validity testing. Convergent validity was tested correlating the HEP-Test-Q with SF-36, Haem-A-QoL, HAL and the Orthopaedic Joint Score. Discriminant validity tested different clinical subgroups. Patients accepted the questionnaire and found it easy to fill in. Psychometric testing revealed good values for reliability in terms of internal consistency (Cronbach's alpha = 0.96) and test-retest reliability (r = 0.90) as well as for convergent validity correlating highly with Haem-A-QoL, HAL and SF-36. Discriminant validity testing showed significant differences for age, hepatitis A and hepatitis B and the number of target joints. HEP-Test-Q is a short and well-accepted questionnaire, assessing subjective physical performance of haemophiliacs, which might be combined with objective assessments to reveal aspects, which cannot be measured objectively, such as body perception.
Construction of Valid and Reliable Test for Assessment of Students

ERIC Educational Resources Information Center

Osadebe, P. U.

2015-01-01

The study was carried out to construct a valid and reliable test in Economics for secondary school students. Two research questions were drawn to guide the establishment of validity and reliability for the Economics Achievement Test (EAT). It is a multiple choice objective test of five options with 100 items. A sample of 1000 students was randomly…
Test of Achievement in Quantitative Economics for Secondary Schools: Construction and Validation Using Item Response Theory

ERIC Educational Resources Information Center

Eleje, Lydia I.; Esomonu, Nkechi P. M.

2018-01-01

A test to measure achievement in quantitative economics among secondary school students was developed and validated in this study. The test is made up of 20 multiple choice test items constructed based on quantitative economics sub-skills. Six research questions guided the study. Preliminary validation was done by two experienced teachers in…
The Need, Development, and Validation of the Innovation Test Instrument

ERIC Educational Resources Information Center

Wheadon, Jacob; Wright, Geoff A.; West, Richard E.; Skaggs, Paul

2017-01-01

This study discusses the need, development, and validation of the Innovation Test Instrument (ITI). This article outlines how the researchers identified the content domain of the assessment and created test items. Then, it describes initial validation testing of the instrument. The findings suggest that the ITI is a good first step in creating an…
Validity and Reliability of the Arabic Token Test for Children

ERIC Educational Resources Information Center

Alkhamra, Rana A.; Al-Jazi, Aya B.

2016-01-01

Background: The Token Test for Children (2nd edition) (TTFC) is a measure for assessing receptive language. In this study we describe the translation process, validity and reliability of the Arabic Token Test for Children (A-TTFC). Aims: The aim of this study is to translate, validate and establish the reliability of the Arabic Token Test for…
Construction and Evaluation of Reliability and Validity of Reasoning Ability Test

ERIC Educational Resources Information Center

Bhat, Mehraj A.

2014-01-01

This paper is based on the construction and evaluation of the reliability and validity of a reasoning ability test for secondary school students. In this paper an attempt was made to evaluate validity and reliability and to determine the appropriate standards to interpret the results of the reasoning ability test. The test includes 45 items to measure six types…

Conceptualizing Essay Tests' Reliability and Validity: From Research to Theory

ERIC Educational Resources Information Center

Badjadi, Nour El Imane

2013-01-01

The current paper on writing assessment surveys the literature on the reliability and validity of essay tests. The paper aims to examine the two concepts in relationship with essay testing as well as to provide a snapshot of the current understandings of the reliability and validity of essay tests as drawn in recent research studies. Bearing in…
The Validity and Responsiveness of Isometric Lower Body Multi-Joint Tests of Muscular Strength: a Systematic Review.

PubMed

Drake, David; Kennedy, Rodney; Wallace, Eric

2017-12-01

Researchers and practitioners working in sports medicine and science require valid tests to determine the effectiveness of interventions and enhance understanding of mechanisms underpinning adaptation. Such decision making is influenced by the supportive evidence describing the validity of tests within current research. The objective of this study is to review the validity of lower body isometric multi-joint tests ability to assess muscular strength and determine the current level of supporting evidence. Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines were followed in a systematic fashion to search, assess and synthesize existing literature on this topic. Electronic databases such as Web of Science, CINAHL and PubMed were searched up to 18 March 2015. Potential inclusions were screened against eligibility criteria relating to types of test, measurement instrument, properties of validity assessed and population group and were required to be published in English. The Consensus-based Standards for the Selection of health Measurement Instruments (COSMIN) checklist was used to assess methodological quality and measurement property rating of included studies. Studies rated as fair or better in methodological quality were included in the best evidence synthesis. Fifty-nine studies met the eligibility criteria for quality appraisal. The ten studies that rated fair or better in methodological quality were included in the best evidence synthesis. The most frequently investigated lower body isometric multi-joint tests for validity were the isometric mid-thigh pull and isometric squat. The validity of each of these tests was strong in terms of reliability and construct validity. The evidence for responsiveness of tests was found to be moderate for the isometric squat test and unknown for the isometric mid-thigh pull. No tests using the isometric leg press met the criteria for inclusion in the best evidence synthesis. 
Researchers and practitioners can use the isometric squat and isometric mid-thigh pull with confidence in terms of reliability and construct validity. Further work to investigate other validity components such as criterion validity, smallest detectable change and responsiveness to resistance exercise interventions may be beneficial to the current level of evidence.
An alternative to the balance error scoring system: using a low-cost balance board to improve the validity/reliability of sports-related concussion balance testing.

PubMed

Chang, Jasper O; Levy, Susan S; Seay, Seth W; Goble, Daniel J

2014-05-01

Recent guidelines advise sports medicine professionals to use balance tests to assess sensorimotor status in the management of concussions. The present study sought to determine whether a low-cost balance board could provide a valid, reliable, and objective means of performing this balance testing. The design comprised criterion validity testing relative to a gold standard and 7-day test-retest reliability testing, carried out in a university biomechanics laboratory with thirty healthy young adults. Balance ability was assessed on 2 days separated by 1 week using (1) a gold standard measure (ie, scientific grade force plate), (2) a low-cost Nintendo Wii Balance Board (WBB), and (3) the Balance Error Scoring System (BESS). Validity of the WBB center of pressure path length and BESS scores was determined relative to the force plate data. Test-retest reliability was established based on intraclass correlation coefficients. Composite scores for the WBB had excellent validity (r = 0.99) and test-retest reliability (R = 0.88). Both the validity (r = 0.10-0.52) and test-retest reliability (r = 0.61-0.78) were lower for the BESS. These findings demonstrate that a low-cost balance board can provide improved balance testing accuracy/reliability compared with the BESS. This approach provides a potentially more valid/reliable, yet affordable, means of assessing sports-related concussion compared with current methods.
[Design and validation of a questionnaire for psychosocial nursing diagnosis in Primary Care].

PubMed

Brito-Brito, Pedro Ruymán; Rodríguez-Álvarez, Cristobalina; Sierra-López, Antonio; Rodríguez-Gómez, José Ángel; Aguirre-Jaime, Armando

2012-01-01

To develop a valid, reliable and easy-to-use questionnaire for a psychosocial nursing diagnosis. The study was performed in two phases: first phase, questionnaire design and construction; second phase, validity and reliability tests. A bank of items was constructed using the NANDA classification as a theoretical framework. Each item was assigned a Likert scale or dichotomous response. The combination of responses to the items constituted the diagnostic rules to assign up to 28 labels. A group of experts carried out the validity test for content. Other validated scales were used as reference standards for the criterion validity tests. Forty-five nurses provided the questionnaire to the patients on three separate occasions over a period of three weeks, and the other validated scales only once to 188 randomly selected patients in Primary Care centres in Tenerife (Spain). Validity tests for construct confirmed the six dimensions of the questionnaire with 91% of total variance explained. Validity tests for criterion showed a specificity of 66%-100%, and showed high correlations with the reference scales when the questionnaire was assigning nursing diagnoses. Reliability tests showed agreement of 56%-91% (P<.001), and a 93% internal consistency. The Questionnaire for Psychosocial Nursing Diagnosis was called CdePS, and included 61 items. The CdePS is a valid, reliable and easy-to-use tool in Primary Care centres to improve the assigning of a psychosocial nursing diagnosis. Copyright © 2011 Elsevier España, S.L. All rights reserved.
[Comparison of the Wechsler Memory Scale-III and the Spain-Complutense Verbal Learning Test in acquired brain injury: construct validity and ecological validity].

PubMed

Luna-Lario, P; Pena, J; Ojeda, N

2017-04-16

To perform an in-depth examination of the construct validity and the ecological validity of the Wechsler Memory Scale-III (WMS-III) and the Spain-Complutense Verbal Learning Test (TAVEC). The sample consists of 106 adults with acquired brain injury who were treated in the Area of Neuropsychology and Neuropsychiatry of the Complejo Hospitalario de Navarra and displayed memory deficit as the main sequela, measured by means of specific memory tests. The construct validity is determined by examining the tasks required in each test over the basic theoretical models, comparing the performance according to the parameters offered by the tests, contrasting the severity indices of each test and analysing their convergence. The external validity is explored through the correlation between the tests and by using regression models. According to the results obtained, both the WMS-III and the TAVEC have construct validity. The TAVEC is more sensitive and captures not only the deficits in mnemonic consolidation, but also in the executive functions involved in memory. The working memory index of the WMS-III is useful for predicting the return to work at two years after the acquired brain injury, but none of the instruments anticipates the disability and dependence at least six months after the injury. We reflect upon the construct validity of the tests and their insufficient capacity to predict functionality when the sequelae become chronic.
The Teenage Nonviolence Test: Concurrent and Discriminant Validity.

ERIC Educational Resources Information Center

Konen, Kristopher; Mayton, Daniel M., II; Delva, Zenita; Sonnen, Melinda; Dahl, William; Montgomery, Richard

This study was designed to document the validity of the Teenage Nonviolence Test (TNT). The study assessed the concurrent validity of the TNT in various ways, its validity using known groups, and its discriminant validity by evaluating its relationships with other psychological constructs. The results showed that the…
FUNCTIONAL PERFORMANCE TESTING OF THE HIP IN ATHLETES: A SYSTEMATIC REVIEW FOR RELIABILITY AND VALIDITY

PubMed Central

Martin, RobRoy L.

2012-01-01

Purpose/Background: The purpose of this study was to systematically review the literature for functional performance tests with evidence of reliability and validity that could be used for a young, athletic population with hip dysfunction. Methods: A search of the PubMed and SPORTDiscus databases was performed to identify movement, balance, hop/jump, or agility functional performance tests from the current peer-reviewed literature used to assess function of the hip in young, athletic subjects. Results: The single-leg stance, deep squat, single-leg squat, and star excursion balance tests (SEBT) demonstrated evidence of validity and normative data for score interpretation. The single-leg stance test and SEBT have evidence of validity with association to hip abductor function. The deep squat test demonstrated evidence as a functional performance test for evaluating femoroacetabular impingement. Hop/jump tests and agility tests have no reported evidence of reliability or validity in a population of subjects with hip pathology. Conclusions: Use of functional performance tests in the assessment of hip dysfunction has not been well established in the current literature. Diminished squat depth and provocation of pain during the single-leg balance test have been associated with patients diagnosed with FAI and gluteal tendinopathy, respectively. The SEBT and single-leg squat tests provided evidence of convergent validity through an analysis of kinematics and muscle function in normal subjects. Reliability of functional performance tests has not been established in patients with hip dysfunction. Further study is needed to establish reliability and validity of functional performance tests that can be used in a young, athletic population with hip dysfunction. Level of Evidence: 2b (Systematic Review of Literature) PMID:22893860
Development and Validation of Extract the Base: An English Derivational Morphology Test for Third through Fifth Grade Monolingual Students and Spanish-Speaking English Language Learners

ERIC Educational Resources Information Center

Goodwin, Amanda P.; Huggins, A. Corinne; Carlo, Maria; Malabonga, Valerie; Kenyon, Dorry; Louguit, Mohammed; August, Diane

2012-01-01

This study describes the development and validation of the Extract the Base test (ETB), which assesses derivational morphological awareness. Scores on this test were validated for 580 monolingual students and 373 Spanish-speaking English language learners (ELLs) in third through fifth grade. As part of the validation of the internal structure,…
Statistical methodology: II. Reliability and validity assessment in study design, Part B.

PubMed

Karras, D J

1997-02-01

Validity measures the correspondence between a test and other purported measures of the same or similar qualities. When a reference standard exists, a criterion-based validity coefficient can be calculated. If no such standard is available, the concepts of content and construct validity may be used, but quantitative analysis may not be possible. The Pearson and Spearman tests of correlation are often used to assess the correspondence between tests, but do not account for measurement biases and may yield misleading results. Techniques that measure intertest differences may be more meaningful in validity assessment, and the kappa statistic is useful for analyzing categorical variables. Questionnaires often can be designed to allow quantitative assessment of reliability and validity, although this may be difficult. Inclusion of homogeneous questions is necessary to assess reliability. Analysis is enhanced by using Likert scales or similar techniques that yield ordinal data. Validity assessment of questionnaires requires careful definition of the scope of the test and comparison with previously validated tools.
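The point above about correlation and measurement bias can be made concrete with a small sketch. It assumes a hypothetical new test that reads exactly 5 units higher than the reference: Pearson's r is perfect, while a difference-based summary (a Bland and Altman style mean difference with 95% limits of agreement) exposes the bias. Cohen's kappa is included as one common form of the kappa statistic for categorical ratings. All data and helper names here are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bland_altman(x, y):
    """Mean difference (bias) and 95% limits of agreement for y minus x."""
    diffs = [b - a for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def cohen_kappa(r1, r2):
    """Chance-corrected agreement between two raters' categorical labels."""
    n = len(r1)
    labels = set(r1) | set(r2)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    p_exp = sum((r1.count(c) / n) * (r2.count(c) / n) for c in labels)
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical scores: the new test adds a constant 5 units to the reference.
reference = [10.0, 12.0, 15.0, 18.0, 20.0]
new_test = [v + 5.0 for v in reference]
r = pearson_r(reference, new_test)
bias, lower, upper = bland_altman(reference, new_test)

# Hypothetical pass/fail classifications from two raters.
rater_a = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass"]
rater_b = ["pass", "fail", "fail", "fail", "pass", "fail", "pass", "pass"]
kappa = cohen_kappa(rater_a, rater_b)

print(f"r = {r:.2f}, bias = {bias:.1f}, kappa = {kappa:.2f}")
```

Here the correlation alone would suggest the two tests are interchangeable, whereas the difference-based summary shows they disagree by a fixed amount on every subject.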
Item Development and Validity Testing for a Self- and Proxy Report: The Safe Driving Behavior Measure

PubMed Central

Classen, Sherrilene; Winter, Sandra M.; Velozo, Craig A.; Bédard, Michel; Lanford, Desiree N.; Brumback, Babette; Lutz, Barbara J.

2010-01-01

OBJECTIVE We report on item development and validity testing of a self-report older adult safe driving behaviors measure (SDBM). METHOD On the basis of theoretical frameworks (Precede–Proceed Model of Health Promotion, Haddon’s matrix, and Michon’s model), existing driving measures, and previous research and guided by measurement theory, we developed items capturing safe driving behavior. Item development was further informed by focus groups. We established face validity using peer reviewers and content validity using expert raters. RESULTS Peer review indicated acceptable face validity. Initial expert rater review yielded a scale content validity index (CVI) rating of 0.78, with 44 of 60 items rated ≥0.75. Sixteen unacceptable items (≤0.5) required major revision or deletion. The next CVI scale average was 0.84, indicating acceptable content validity. CONCLUSION The SDBM has relevance as a self-report to rate older drivers. Future pilot testing of the SDBM comparing results with on-road testing will define criterion validity. PMID:20437917
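The abstract does not say how the content validity index was computed. The sketch below assumes one common convention: each expert rates an item's relevance on a 4-point scale, an item's CVI is the share of experts rating it 3 or 4, and the scale CVI is the average of the item CVIs. The function names and ratings are hypothetical; the 0.5 cutoff for flagging items echoes the threshold reported in the abstract.

```python
def item_cvi(ratings, relevant_min=3):
    """Share of experts who rated the item as relevant (assumed: 3 or 4 of 4)."""
    return sum(r >= relevant_min for r in ratings) / len(ratings)


def scale_cvi_average(items):
    """Average of the item-level CVIs across all items on the scale."""
    cvis = [item_cvi(r) for r in items]
    return sum(cvis) / len(cvis)


# Hypothetical ratings: three items, each rated by four experts.
expert_ratings = [
    [4, 4, 3, 4],  # item CVI = 1.00
    [3, 4, 2, 4],  # item CVI = 0.75
    [2, 1, 3, 2],  # item CVI = 0.25
]
scale_cvi = scale_cvi_average(expert_ratings)
# Items at or below 0.5 would be candidates for major revision or deletion.
flagged = [i for i, r in enumerate(expert_ratings) if item_cvi(r) <= 0.5]
print(round(scale_cvi, 2), flagged)
```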
Alphabus Mechanical Validation Plan and Test Campaign

NASA Astrophysics Data System (ADS)

Calvisi, G.; Bonnet, D.; Belliol, P.; Lodereau, P.; Redoundo, R.

2012-07-01

A joint team of the two leading European satellite companies (Astrium and Thales Alenia Space) worked with the support of ESA and CNES to define a product line able to efficiently address the upper segment of communications satellites: Alphabus. Starting in 2009 and up to 2011, the mechanical validation of the Alphabus platform was obtained through static tests performed on a dedicated static model and environmental tests performed on the first satellite based on Alphabus: Alphasat I-XL. The mechanical validation of the Alphabus platform presented an excellent opportunity to improve the validation and qualification process with respect to the static, sine vibration, acoustic and L/V shock environments, minimizing the recurrent cost of manufacturing, integration and testing. A main driver of mechanical testing is that mechanical acceptance testing at satellite level will be performed with empty tanks due to technical constraints (limitation of existing vibration devices) and programmatic advantages (test risk reduction, test schedule minimization). In this paper the impacts that such testing logic has on the validation plan are briefly recalled and its actual application to the Alphasat PFM mechanical test campaign is detailed.
Contemporary Test Validity in Theory and Practice: A Primer for Discipline-Based Education Researchers

PubMed Central

Reeves, Todd D.; Marbach-Ad, Gili

2016-01-01

Most discipline-based education researchers (DBERs) were formally trained in the methods of scientific disciplines such as biology, chemistry, and physics, rather than social science disciplines such as psychology and education. As a result, DBERs may have never taken specific courses in the social science research methodology—either quantitative or qualitative—on which their scholarship often relies so heavily. One particular aspect of (quantitative) social science research that differs markedly from disciplines such as biology and chemistry is the instrumentation used to quantify phenomena. In response, this Research Methods essay offers a contemporary social science perspective on test validity and the validation process. The instructional piece explores the concepts of test validity, the validation process, validity evidence, and key threats to validity. The essay also includes an in-depth example of a validity argument and validation approach for a test of student argument analysis. In addition to DBERs, this essay should benefit practitioners (e.g., lab directors, faculty members) in the development, evaluation, and/or selection of instruments for their work assessing students or evaluating pedagogical innovations. PMID:26903498
The reliability and validity of the SF-8 with a conflict-affected population in northern Uganda.

PubMed

Roberts, Bayard; Browne, John; Ocaka, Kaducu Felix; Oyok, Thomas; Sondorp, Egbert

2008-12-02

The SF-8 is a health-related quality of life instrument that could provide a useful means of assessing general physical and mental health amongst populations affected by conflict. The purpose of this study was to test the validity and reliability of the SF-8 with a conflict-affected population in northern Uganda. A cross-sectional multi-staged, random cluster survey was conducted with 1206 adults in camps for internally displaced persons in Gulu and Amuru districts of northern Uganda. Data quality was assessed by analysing the number of incomplete responses to SF-8 items. Response distribution was analysed using aggregate endorsement frequency. Test-retest reliability was assessed in a separate smaller survey using the intraclass correlation test. Construct validity was measured using principal component analysis, and the Pearson Correlation test for item-summary score correlation and inter-instrument correlations. Known groups validity was assessed using a two sample t-test to evaluate the ability of the SF-8 to discriminate between groups known to have, and not have, physical and mental health problems. The SF-8 showed excellent data quality. It showed acceptable item response distribution based upon analysis of aggregate endorsement frequencies. Test-retest showed a good intraclass correlation of 0.61 for PCS and 0.68 for MCS. The principal component analysis indicated strong construct validity and concurred with the results of the validity tests by the SF-8 developers. The SF-8 also showed strong construct validity between the 8 items and PCS and MCS summary score, moderate inter-instrument validity, and strong known groups validity. This study provides evidence on the reliability and validity of the SF-8 amongst IDPs in northern Uganda.
The reliability and validity of the SF-8 with a conflict-affected population in northern Uganda

PubMed Central

Roberts, Bayard; Browne, John; Ocaka, Kaducu Felix; Oyok, Thomas; Sondorp, Egbert

2008-01-01

Background The SF-8 is a health-related quality of life instrument that could provide a useful means of assessing general physical and mental health amongst populations affected by conflict. The purpose of this study was to test the validity and reliability of the SF-8 with a conflict-affected population in northern Uganda. Methods A cross-sectional multi-staged, random cluster survey was conducted with 1206 adults in camps for internally displaced persons in Gulu and Amuru districts of northern Uganda. Data quality was assessed by analysing the number of incomplete responses to SF-8 items. Response distribution was analysed using aggregate endorsement frequency. Test-retest reliability was assessed in a separate smaller survey using the intraclass correlation test. Construct validity was measured using principal component analysis, and the Pearson Correlation test for item-summary score correlation and inter-instrument correlations. Known groups validity was assessed using a two sample t-test to evaluate the ability of the SF-8 to discriminate between groups known to have, and not have, physical and mental health problems. Results The SF-8 showed excellent data quality. It showed acceptable item response distribution based upon analysis of aggregate endorsement frequencies. Test-retest showed a good intraclass correlation of 0.61 for PCS and 0.68 for MCS. The principal component analysis indicated strong construct validity and concurred with the results of the validity tests by the SF-8 developers. The SF-8 also showed strong construct validity between the 8 items and PCS and MCS summary score, moderate inter-instrument validity, and strong known groups validity. Conclusion This study provides evidence on the reliability and validity of the SF-8 amongst IDPs in northern Uganda. PMID:19055716
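Test-retest reliability by intraclass correlation, as described in the SF-8 records above, can be sketched as follows. The abstract does not state which ICC form was used, so this example assumes the one-way random-effects, single-measurement form, often written ICC(1,1). The function name and the scores are hypothetical.

```python
def icc_oneway(scores):
    """One-way random-effects ICC(1,1).

    `scores` is a list of per-subject lists, one value per occasion,
    with the same number of occasions (k >= 2) for every subject.
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 subjects and 2 equal-length occasions")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    # Between-subject mean square.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square.
    msw = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical test and retest scores for three respondents.
retest_scores = [[1, 2], [3, 4], [5, 6]]
icc = icc_oneway(retest_scores)
print(round(icc, 3))
```

Other ICC forms (two-way models, averaged measurements) give different values on the same data, so a reported ICC is best read together with the model that produced it.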
Face Validity of Test and Acceptance of Generalized Personality Interpretations

ERIC Educational Resources Information Center

Delprato, Dennis J.

1975-01-01

The degree to which variations in the face validity of psychological tests affected students' willingness to accept personality interpretations was studied. Acceptance of personality interpretations was compared for four types of tests which varied in face validity. The relationship between judged accuracy and rated likability of the…
Clarifying the Consensus Definition of Validity

ERIC Educational Resources Information Center

Newton, Paul E.

2012-01-01

The 1999 "Standards for Educational and Psychological Testing" defines validity as the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests. Although quite explicit, there are ways in which this definition lacks precision, consistency, and clarity. The history of validity has taught us…
The Validity of IQ Scores Derived from Readiness Screening Tests

ERIC Educational Resources Information Center

Telegdy, Gabriel A.

1976-01-01

The Screening Test of Academic Readiness (STAR) and the Peabody Picture Vocabulary Test (PPVT) were administered to 52 kindergarten children to reveal the convergent validity of IQ scores derived from the STAR. The findings raise doubts about the validity of the deviation IQs derived from the STAR. (Author)
Designing the Nuclear Energy Attitude Scale.

ERIC Educational Resources Information Center

Calhoun, Lawrence; And Others

1988-01-01

Presents a refined method for designing a valid and reliable Likert-type scale to test attitudes toward the generation of electricity from nuclear energy. Discusses various tests of validity that were used on the nuclear energy scale. Reports results of administration and concludes that the test is both reliable and valid. (CW)
Construct Validation of the Fairy Tale Test--Standardization Data.

ERIC Educational Resources Information Center

Coulacoglou, Carina

2002-01-01

Studied the construct validity of the Fairy Tale Test (C. Coulacoglu, 1993), a personality projective test for children, in a sample of 800 Greek children aged 8, 10, and 12. Factor analysis led to identification of eight primary factors, and correlations with other measures provide construct validity evidence. (SLD)
Evaluating Test Validity: Reprise and Progress

ERIC Educational Resources Information Center

Shepard, Lorrie A.

2016-01-01

The AERA, APA, NCME Standards define validity as "the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests". A century of disagreement about validity does not mean that there has not been substantial progress. This consensus definition brings together interpretations and use so that it…

Initial Teacher Licensure Testing in Tennessee: Test Validation.

ERIC Educational Resources Information Center

Bowman, Harry L.; Petry, John R.

In 1988 a study was conducted to determine the validity of candidate teacher licensure examinations for use in Tennessee under the 1984 Comprehensive Education Reform Act. The Department of Education conducted a study to determine the validity of 11 previously unvalidated or extensively revised tests for certification and to make recommendations…
Determination of the criterion-related validity of hip joint angle test for estimating hamstring flexibility using a contemporary statistical approach.

PubMed

Sainz de Baranda, Pilar; Rodríguez-Iniesta, María; Ayala, Francisco; Santonja, Fernando; Cejudo, Antonio

2014-07-01

To examine the criterion-related validity of the horizontal hip joint angle (H-HJA) test and vertical hip joint angle (V-HJA) test for estimating hamstring flexibility measured through the passive straight-leg raise (PSLR) test using contemporary statistical measures. Validity study. Controlled laboratory environment. One hundred thirty-eight professional trampoline gymnasts (61 women and 77 men). Hamstring flexibility. Each participant performed 2 trials of H-HJA, V-HJA, and PSLR tests in a randomized order. The criterion-related validity of H-HJA and V-HJA tests was measured through the estimation equation, typical error of the estimate (TEEST), validity correlation (β), and their respective confidence limits. The findings from this study suggest that although H-HJA and V-HJA tests showed moderate to high validity scores for estimating hamstring flexibility (standardized TEEST = 0.63; β = 0.80), the TEEST statistic reported for both tests was not narrow enough for clinical purposes (H-HJA = 10.3 degrees; V-HJA = 9.5 degrees). Subsequently, the predicted likely thresholds for the true values that were generated were too wide (H-HJA = predicted value ± 13.2 degrees; V-HJA = predicted value ± 12.2 degrees). The results suggest that although the HJA test showed moderate to high validity scores for estimating hamstring flexibility, the prediction intervals between the HJA and PSLR tests are not strong enough to suggest that clinicians and sport medicine practitioners should use the HJA and PSLR tests interchangeably as gold standard measurement tools to evaluate and detect short hamstring muscle flexibility.
Determining the Appropriateness of the "What If" Situations Test (WIST) with Turkish Pre-Schoolers.

PubMed

Citak Tunc, Gulseren; Gorak, Gulay; Ozyazicioglu, Nurcan; Ak, Bedriye; Isil, Ozlem; Vural, Pinar

2018-04-01

Measurement instruments are needed to assess child sexual abuse prevention programs. The purpose of the study was to determine the reliability and validity of the WIST (What If Situations Test) for Turkish culture. Participants were children of the 3-6 age group attending pre-school education institutions and the sample size was identified by means of a power analysis. Seventy children were identified as the sample with 0.85 power and 0.05 type I error according to the power analysis. Language validity, content validity, internal validity coefficient (Cronbach alpha coefficient), and test-retest analyses were conducted in terms of validity and reliability in the scope of efforts for adaptation to Turkish culture. Firstly, Kendall W = 0.83 was the score for the expert opinions concerning the content validity of the language validity scale. It was found that the Cronbach alpha coefficients were between 0.68 and 0.90 for the scale sub-dimensions of appropriate and inappropriate recognition, saying, doing, telling, and reporting. The test-retest reliability of the scale was found to be r = 0.89 and the test-retest reliabilities for the sub-dimensions (appropriate recognition, inappropriate recognition, say skills, do skills, tell skills, and reporting skills) were between r = 0.48 and r = 0.92. The test-retest reliability for the Personal Safety Questionnaire (PSQ), which has items complementary to the WIST, was found to be r = 0.82. The reliability and validity analysis of the 'What If' Situations Test (WIST), used to evaluate pre-schoolers' skills regarding self-protection against sexual abuse, showed that the Test's adaptation to Turkish culture was reliable and valid.
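The Cronbach alpha coefficients reported in this abstract summarize internal consistency. A minimal sketch of the standard formula follows, assuming complete data and sample variances; the function name and the response matrix are hypothetical and unrelated to the WIST data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha from rows of respondents and columns of items."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: four respondents answering three items.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [3, 3, 4],
    [5, 5, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```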
The Chinese version of the Outcome Expectations for Exercise scale: validation study.

PubMed

Lee, Ling-Ling; Chiu, Yu-Yun; Ho, Chin-Chih; Wu, Shu-Chen; Watson, Roger

2011-06-01

Estimates of the reliability and validity of the English nine-item Outcome Expectations for Exercise (OEE) scale have been tested and found to be valid for use in various settings, particularly among older people, with good internal consistency and validity. Data on the use of the OEE scale among older Chinese people living in the community and how cultural differences might affect the administration of the OEE scale are limited. To test the validity and reliability of the Chinese version of the Outcome Expectations for Exercise scale among older people. A cross-sectional validation study was designed to test the Chinese version of the OEE scale (OEE-C). Reliability was examined by testing both the internal consistency for the overall scale and the squared multiple correlation coefficient for the single item measure. The validity of the scale was tested on the basis of both a traditional psychometric test and a confirmatory factor analysis using structural equation modelling. The Mokken Scaling Procedure (MSP) was used to investigate if there were any hierarchical, cumulative sets of items in the measure. The OEE-C scale was tested in a group of older people in Taiwan (n=108, mean age=77.1). There was acceptable internal consistency (alpha=.85) and model fit in the scale. Evidence of the validity of the measure was demonstrated by the tests for criterion-related validity and construct validity. There was a statistically significant correlation between exercise outcome expectations and exercise self-efficacy (r=.34, p<.01). An analysis of the Mokken Scaling Procedure found that nine items of the scale were all retained in the analysis and the resulting scale was reliable and statistically significant (p=.0008). The results obtained in the present study provided acceptable levels of reliability and validity evidence for the Chinese Outcome Expectations for Exercise scale when used with older people in Taiwan. 
Future testing of the OEE-C scale needs to be carried out to see whether these results are generalisable to older Chinese people living in urban areas. Copyright © 2010 Elsevier Ltd. All rights reserved.
Diagnostic validity of physical examination tests for common knee disorders: An overview of systematic reviews and meta-analysis.

PubMed

Décary, Simon; Ouellet, Philippe; Vendittoli, Pascal-André; Roy, Jean-Sébastien; Desmeules, François

2017-01-01

More evidence on diagnostic validity of physical examination tests for knee disorders is needed to lower frequently used and costly imaging tests. To conduct a systematic review of systematic reviews (SR) and meta-analyses (MA) evaluating the diagnostic validity of physical examination tests for knee disorders. A structured literature search was conducted in five databases until January 2016. Methodological quality was assessed using the AMSTAR. Seventeen reviews were included with mean AMSTAR score of 5.5 ± 2.3. Based on six SR, only the Lachman test for ACL injuries is diagnostically valid when individually performed (Likelihood ratio (LR+):10.2, LR-:0.2). Based on two SR, the Ottawa Knee Rule is a valid screening tool for knee fractures (LR-:0.05). Based on one SR, the EULAR criteria had a post-test probability of 99% for the diagnosis of knee osteoarthritis. Based on two SR, a complete physical examination performed by a trained health provider was found to be diagnostically valid for ACL, PCL and meniscal injuries as well as for cartilage lesions. When individually performed, common physical tests are rarely able to rule in or rule out a specific knee disorder, except the Lachman for ACL injuries. There is low-quality evidence concerning the validity of combining history elements and physical tests. Copyright © 2016 Elsevier Ltd. All rights reserved.
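The likelihood ratios quoted for the Lachman test (LR+ 10.2, LR- 0.2) only become a probability once a pre-test probability is supplied. The sketch below applies the usual odds form of Bayes' rule; the 30% pre-test probability and the function name are assumptions made for illustration and do not come from the review.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Assumed (hypothetical) pre-test probability of an ACL injury.
pre_test = 0.30
after_positive = post_test_probability(pre_test, 10.2)  # LR+ from the abstract
after_negative = post_test_probability(pre_test, 0.2)   # LR- from the abstract
print(round(after_positive, 3), round(after_negative, 3))
```

Under this assumed pre-test probability, a positive result raises the probability to roughly 81% and a negative result lowers it to roughly 8%; a different starting probability would give different figures.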
Validation of novel recipes for double-blind, placebo-controlled food challenges in children and adults.

PubMed

Vlieg-Boerstra, B J; Herpertz, I; Pasker, L; van der Heide, S; Kukler, J; Jansink, C; Vaessen, W; Beusekamp, B J; Dubois, A E J

2011-07-01

In double-blind, placebo-controlled food challenges (DBPCFCs), the use of challenge materials in which blinding is validated is a prerequisite for obtaining true blinded conditions during the test procedure. Therefore, the aim of this study was to enlarge the available range of validated recipes for DBPCFCs to facilitate oral challenge tests in all age groups, including young children, while maximizing the top dose in an acceptable volume. Recipes were developed and subsequently validated by a panel recruited by a matching sensory test. The best 30% of candidates were selected to participate in sensory testing using the paired comparison test. For young children, three recipes with cow's milk and one recipe with peanut could be validated which may be utilized in DBPCFCs. For children older than 4 years and adults, one recipe with egg, two with peanut, one with hazelnut, and one with cashew nut were validated for use in DBPCFCs. All recipes contained larger amounts of allergenic foods than previously validated. These recipes increase the range of validated recipes for use in DBPCFCs in adults and children. © 2011 John Wiley & Sons A/S.
Independent validation of the MMPI-2-RF Somatic/Cognitive and Validity scales in TBI Litigants tested for effort.

PubMed

Youngjohn, James R; Wershba, Rebecca; Stevenson, Matthew; Sturgeon, John; Thomas, Michael L

2011-04-01

The MMPI-2 Restructured Form (MMPI-2-RF; Ben-Porath & Tellegen, 2008) is replacing the MMPI-2 as the most widely used personality test in neuropsychological assessment, but additional validation studies are needed. Our study examines MMPI-2-RF Validity scales and the newly created Somatic/Cognitive scales in a recently reported sample of 82 traumatic brain injury (TBI) litigants who either passed or failed effort tests (Thomas & Youngjohn, 2009). The restructured Validity scales FBS-r (restructured symptom validity), F-r (restructured infrequent responses), and the newly created Fs (infrequent somatic responses) were not significant predictors of TBI severity. FBS-r was significantly related to passing or failing effort tests, and Fs and F-r showed non-significant trends in the same direction. Elevations on the Somatic/Cognitive scales profile (MLS-malaise, GIC-gastrointestinal complaints, HPC-head pain complaints, NUC-neurological complaints, and COG-cognitive complaints) were significant predictors of effort test failure. Additionally, HPC had the anticipated paradoxical inverse relationship with head injury severity. The Somatic/Cognitive scales as a group were better predictors of effort test failure than the RF Validity scales, which was an unexpected finding. MLS arose as the single best predictor of effort test failure of all RF Validity and Somatic/Cognitive scales. Item overlap analysis revealed that all MLS items are included in the original MMPI-2 Hy scale, making MLS essentially a subscale of Hy. This study validates the MMPI-2-RF as an effective tool for use in neuropsychological assessment of TBI litigants.
Measuring social alienation in adolescence: translation and validation of the Jessor and Jessor Social Alienation Scale.

PubMed

Safipour, Jalal; Tessma, Mesfin Kassaye; Higginbottom, Gina; Emami, Azita

2010-12-01

The objective of the study is to translate and examine the reliability and validity of the Jessor and Jessor Social Alienation Scale for use in a Swedish context. The study involved four phases of testing: (1) Translation and back-translation; (2) a pilot test to evaluate the translation; (3) reliability testing; and (4) a validity test. Main participants of this study were 446 students (Age = 15-19, SD = 1.01, Mean = 17). Results from the reliability test showed high internal consistency and stability. Face, content and construct validity were demonstrated using experts and confirmatory factor analysis. The results of testing the Swedish version of the alienation scale revealed an acceptable level of reliability and validity, and is appropriate for use in the Swedish context. © 2010 The Authors. Scandinavian Journal of Psychology © 2010 The Scandinavian Psychological Associations.
Validation of science virtual test to assess 8th grade students' critical thinking on living things and environmental sustainability theme

NASA Astrophysics Data System (ADS)

Rusyati, Lilit; Firman, Harry

2017-05-01

This research was motivated by the importance of multiple-choice questions that indicate the elements and sub-elements of critical thinking and by the implementation of computer-based testing. The method used was descriptive research for profiling the validation of a science virtual test to measure students' critical thinking in junior high school. The participants were 8th grade junior high school students (14 years old), with a science teacher and an expert acting as validators. The instruments used to capture the necessary data were an expert judgment sheet, a legibility test sheet, and a science virtual test package in multiple-choice form with four possible answers. There are four steps to validate the science virtual test to measure students' critical thinking on the theme of "Living Things and Environmental Sustainability" in 7th grade Junior High School. These steps are analysis of core competence and basic competence based on curriculum 2013, expert judgment, legibility test, and trial test (limited and large trial test). Based on the trial test, test items were classified as accepted, accepted but in need of revision, or rejected. The reliability of the test is α = 0.747, which is categorized as 'high', meaning that the test instrument is reliable and highly consistent. The validity of Rxy = 0.63 means that the validity of the instrument was categorized as 'high' according to the interpretation value of Rxy (correlation).
Implementation and Initial Validation of the APS English Test [and] The APS English-Writing Test at Golden West College: Evidence for Predictive Validity.

ERIC Educational Resources Information Center

Isonio, Steven

In May 1991, Golden West College (California) conducted a validation study of the English portion of the Assessment and Placement Services for Community Colleges (APS), followed by a predictive validity study in July 1991. The initial study was designed to aid in the implementation of the new test at GWC by comparing data on APS use at other…
Validity of the Timed Up and Go Test as a Measure of Functional Mobility in Persons With Multiple Sclerosis.

PubMed

Sebastião, Emerson; Sandroff, Brian M; Learmonth, Yvonne C; Motl, Robert W

2016-07-01

To examine the validity of the timed Up and Go (TUG) test as a measure of functional mobility in persons with multiple sclerosis (MS) by using a comprehensive framework based on construct validity (ie, convergent and divergent validity). Cross-sectional study. Hospital setting. Community-residing persons with MS (N=47). Not applicable. Main outcome measures included the TUG test, timed 25-foot walk test, 6-minute walk test, Multiple Sclerosis Walking Scale-12, Late-Life Function and Disability Instrument, posturography evaluation, Activities-specific Balance Confidence scale, Symbol Digits Modalities Test, Expanded Disability Status Scale, and the number of steps taken per day. The TUG test was strongly associated with other valid outcome measures of ambulatory mobility (Spearman rank correlation, rs=.71-.90) and disability status (rs=.80), moderately to strongly associated with balance confidence (rs=.66), and weakly associated with postural control (ie, balance) (rs=.31). The TUG test was moderately associated with cognitive processing speed (rs=.59), but not associated with other nonambulatory measures (ie, Late-Life Function and Disability Instrument-upper extremity function). Our findings support the validity of the TUG test as a measure of functional mobility. This warrants its inclusion in patients' assessment alongside other valid measures of functional mobility in both clinical and research practice in persons with MS. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
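The Spearman rank correlations (rs) reported in this abstract are Pearson correlations computed on ranks. A minimal sketch follows; the function names and the paired timing values are hypothetical and are not data from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired times in seconds for five participants.
tug_times = [8.1, 9.4, 12.0, 7.5, 15.2]
walk_times = [4.9, 6.1, 5.6, 4.2, 9.0]
rs = spearman(tug_times, walk_times)
print(round(rs, 2))
```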
Validity Semantics in Educational and Psychological Assessment

ERIC Educational Resources Information Center

Hathcoat, John D.

2013-01-01

The semantics, or meaning, of validity is a fluid concept in educational and psychological testing. Contemporary controversies surrounding this concept appear to stem from the proper location of validity. Under one view, validity is a property of score-based inferences and entailed uses of test scores. This view is challenged by the…
Validity of the modified back-saver sit-and-reach test: a comparison with other protocols.

PubMed

Hui, S S; Yuen, P Y

2000-09-01

Studies have shown that the classical sit-and-reach (CSR) test, the modified sit-and-reach (MSR), and the newly developed back-saver sit-and-reach (BS) test have poor criterion-related validity in estimating low-back flexibility but yielded moderate criterion-related validity in hamstring flexibility. The V sit-and-reach (VSR) test was found to be practical but the validity has not been established. The purpose of this study was to propose a modified back-saver sit-and-reach (MBS) test, which incorporated all advantages of the various protocols, and to compare the criterion-related validity and reliability of all these tests. 158 college students (F = 96, and M = 62; age = 20.77 +/- 2.51) performed CSR, VSR, BS (left and right leg), and MBS (left and right leg) tests in a randomized order. Scores from each test were then correlated with the criterion measures. For all sit-and-reach tests, intraclass reliability (single trial) was very high (r = 0.89-0.98). MBS yielded significant and highest r with low-back and hamstring criterion for men (r = 0.47-0.67) and women (r = 0.23-0.54). The low-back and right hamstring validity of MBS for men were significantly (P < 0.01) higher than those from BS and CSR, whereas no differences in criterion-related validity were found between the MBS and other protocols in women. The ratings of perceived comfort among the sit-and-reach protocols were significantly different (P < 0.001) from each other. The MBS was rated the most comfortable test compared with the other protocols. The MBS test is not only a reliable test for hamstring and low-back flexibility, it is also more practical, with improved validity for hamstring and low-back flexibility in men compared with previous protocols.
Validity and Reliability of Published Comprehensive Theory of Mind Tests for Normal Preschool Children: A Systematic Review.

PubMed

Ziatabar Ahmadi, Seyyede Zohreh; Jalaie, Shohreh; Ashayeri, Hassan

2015-09-01

Theory of mind (ToM) or mindreading is an aspect of social cognition that evaluates mental states and beliefs of oneself and others. Validity and reliability are very important criteria when evaluating standard tests; and without them, these tests are not usable. The aim of this study was to systematically review the validity and reliability of published English comprehensive ToM tests developed for normal preschool children. We searched MEDLINE (PubMed interface), Web of Science, Science direct, PsycINFO, and also evidence base Medicine (The Cochrane Library) databases from 1990 to June 2015. Search strategy was Latin transcription of 'Theory of Mind' AND test AND children. Also, we manually studied the reference lists of all final searched articles and carried out a search of their references. Inclusion criteria were as follows: Valid and reliable diagnostic ToM tests published from 1990 to June 2015 for normal preschool children; and exclusion criteria were as follows: the studies that only used ToM tests and single tasks (false belief tasks) for ToM assessment and/or had no description about structure, validity or reliability of their tests. Methodological quality of the selected articles was assessed using the Critical Appraisal Skills Programme (CASP). In primary searching, we found 1237 articles in total databases. After removing duplicates and applying all inclusion and exclusion criteria, we selected 11 tests for this systematic review. There were a few valid, reliable and comprehensive ToM tests for normal preschool children. However, we had limitations concerning the included articles. The defined ToM tests were different in populations, tasks, mode of presentations, scoring, mode of responses, times and other variables. Also, they had various validities and reliabilities. Therefore, it is recommended that the researchers and clinicians select the ToM tests according to their psychometric characteristics, validity and reliability.
Validity and Reliability of Published Comprehensive Theory of Mind Tests for Normal Preschool Children: A Systematic Review

PubMed Central

Ziatabar Ahmadi, Seyyede Zohreh; Jalaie, Shohreh; Ashayeri, Hassan

2015-01-01

Objective: Theory of mind (ToM) or mindreading is an aspect of social cognition that evaluates mental states and beliefs of oneself and others. Validity and reliability are very important criteria when evaluating standard tests; and without them, these tests are not usable. The aim of this study was to systematically review the validity and reliability of published English comprehensive ToM tests developed for normal preschool children. Method: We searched MEDLINE (PubMed interface), Web of Science, Science direct, PsycINFO, and also evidence base Medicine (The Cochrane Library) databases from 1990 to June 2015. Search strategy was Latin transcription of ‘Theory of Mind’ AND test AND children. Also, we manually studied the reference lists of all final searched articles and carried out a search of their references. Inclusion criteria were as follows: Valid and reliable diagnostic ToM tests published from 1990 to June 2015 for normal preschool children; and exclusion criteria were as follows: the studies that only used ToM tests and single tasks (false belief tasks) for ToM assessment and/or had no description about structure, validity or reliability of their tests. Methodological quality of the selected articles was assessed using the Critical Appraisal Skills Programme (CASP). Result: In primary searching, we found 1237 articles in total databases. After removing duplicates and applying all inclusion and exclusion criteria, we selected 11 tests for this systematic review. Conclusion: There were a few valid, reliable and comprehensive ToM tests for normal preschool children. However, we had limitations concerning the included articles. The defined ToM tests were different in populations, tasks, mode of presentations, scoring, mode of responses, times and other variables. Also, they had various validities and reliabilities. 
Therefore, it is recommended that the researchers and clinicians select the ToM tests according to their psychometric characteristics, validity and reliability. PMID:27006666
Implementation of the validation testing in MPPG 5.a "Commissioning and QA of treatment planning dose calculations-megavoltage photon and electron beams".

PubMed

Jacqmin, Dustin J; Bredfeldt, Jeremy S; Frigo, Sean P; Smilowitz, Jennifer B

2017-01-01

The AAPM Medical Physics Practice Guideline (MPPG) 5.a provides concise guidance on the commissioning and QA of beam modeling and dose calculation in radiotherapy treatment planning systems. This work discusses the implementation of the validation testing recommended in MPPG 5.a at two institutions. The two institutions worked collaboratively to create a common set of treatment fields and analysis tools to deliver and analyze the validation tests. This included the development of a novel, open-source software tool to compare scanning water tank measurements to 3D DICOM-RT Dose distributions. Dose calculation algorithms in both Pinnacle and Eclipse were tested with MPPG 5.a to validate the modeling of Varian TrueBeam linear accelerators. The validation process resulted in more than 200 water tank scans and more than 50 point measurements per institution, each of which was compared to a dose calculation from the institution's treatment planning system (TPS). Overall, the validation testing recommended in MPPG 5.a took approximately 79 person-hours for a machine with four photon and five electron energies for a single TPS. Of the 79 person-hours, 26 person-hours required time on the machine, and the remainder involved preparation and analysis. The basic photon, electron, and heterogeneity correction tests were evaluated with the tolerances in MPPG 5.a, and the tolerances were met for all tests. The MPPG 5.a evaluation criteria were used to assess the small field and IMRT/VMAT validation tests. Both institutions found the use of MPPG 5.a to be a valuable resource during the commissioning process. The validation testing in MPPG 5.a showed the strengths and limitations of the TPS models. In addition, the data collected during the validation testing is useful for routine QA of the TPS, validation of software upgrades, and commissioning of new algorithms. © 2016 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
Validation of Survivability Validation Protocols

DTIC Science & Technology

1993-05-01

simulation fidelity. Physical testing of P.i SOS, in either aboveground tests (AGTs) or underground tests (UGTs), will usually be impossible, due...with some simulation fidelity compromises) are possible in UGTs and/or AGTs. Hence proof tests, if done in statistically significant numbers, can...level. Simulation fidelity and AGT/UGT/threat correlation will be validation issues here. Extrapolation to threat environments will be done via modeling
Reliability and validity of the closed kinetic chain upper extremity stability test.

PubMed

Lee, Dong-Rour; Kim, Laurentius Jongsoon

2015-04-01

[Purpose] The purpose of this study was to examine the reliability and validity of the Closed Kinetic Chain Upper Extremity Stability (CKCUES) test. [Subjects and Methods] A sample of 40 subjects (20 males, 20 females) with and without pain in the upper limbs was recruited. The subjects were tested twice, three days apart, to assess the reliability of the CKCUES test. The CKCUES test was performed four times, and the average was calculated from the last three trials. To test the validity of the CKCUES test, peak torque of internal/external shoulder rotation was measured using an isokinetic dynamometer, maximum grip strength was measured using a hand dynamometer, and their Pearson correlation coefficients with the average values of the CKCUES test were calculated. [Results] The reliability of the CKCUES test was very high (ICC=0.97). The correlations between the CKCUES test and maximum grip strength (r=0.78-0.79) and the peak torque of internal/external shoulder rotation (r=0.87-0.94) were high, indicating its validity. [Conclusion] The reliability and validity of the CKCUES test were high. The CKCUES test is expected to serve as a low-cost clinical test of upper limb stability.
Constructing a question bank based on script concordance approach as a novel assessment methodology in surgical education.

PubMed

Aldekhayel, Salah A; Alselaim, Nahar A; Magzoub, Mohi Eldin; Al-Qattan, Mohammad M; Al-Namlah, Abdullah M; Tamim, Hani; Al-Khayal, Abdullah; Al-Habdan, Sultan I; Zamakhshary, Mohammed F

2012-10-24

Script Concordance Test (SCT) is a new assessment tool that reliably assesses clinical reasoning skills. Previous descriptions of developing SCT-question banks were merely subjective. This study addresses two gaps in the literature: 1) conducting the first phase of a multistep validation process of SCT in Plastic Surgery, and 2) providing an objective methodology to construct a question bank based on SCT. After developing a test blueprint, 52 test items were written. Five validation questions were developed and a validation survey was established online. Seven reviewers were asked to answer this survey. They were recruited from two countries, Saudi Arabia and Canada, to improve the test's external validity. Their ratings were transformed into percentages. Analysis was performed to compare reviewers' ratings by looking at correlations, ranges, means, medians, and overall scores. Scores of reviewers' ratings were between 76% and 95% (mean 86% ± 5). We found poor correlations between reviewers (Pearson's: +0.38 to -0.22). Ratings of individual validation questions ranged between 0 and 4 (on a scale 1-5). Means and medians of these ranges were computed for each test item (mean: 0.8 to 2.4; median: 1 to 3). A subset of test items comprising 27 items was generated based on a set of inclusion and exclusion criteria. This study proposes an objective methodology for validation of SCT-question bank. Analysis of validation survey is done from all angles, i.e., reviewers, validation questions, and test items. Finally, a subset of test items is generated based on a set of criteria.
Process Skill Assessment Instrument: Innovation to measure student’s learning result holistically

NASA Astrophysics Data System (ADS)

Azizah, K. N.; Ibrahim, M.; Widodo, W.

2018-01-01

Science process skills (SPS) are very important skills for students. However, it is undeniable that SPS are not a main concern in primary school learning. This research aimed to develop a valid, practical, and effective assessment instrument to measure students’ SPS. The assessment instruments comprise a worksheet and a test. This development research used a one-group pre-test post-test design. Data were obtained with validation, observation, and test methods to investigate the validity, practicality, and effectiveness of the instruments. Results showed that the assessment instruments were rated very valid and categorized as reliable, student SPS activities had a high percentage, and there was a significant improvement in students’ SPS scores. It can be concluded that the SPS assessment instruments are valid, practical, and effective for measuring students’ SPS results.

Solar Sail Models and Test Measurements Correspondence for Validation Requirements Definition

NASA Technical Reports Server (NTRS)

Ewing, Anthony; Adams, Charles

2004-01-01

Solar sails are being developed as a mission-enabling technology in support of future NASA science missions. Current efforts have advanced solar sail technology sufficient to justify a flight validation program. A primary objective of this activity is to test and validate solar sail models that are currently under development so that they may be used with confidence in future science mission development (e.g., scalable to larger sails). Both system and model validation requirements must be defined early in the program to guide design cycles and to ensure that relevant and sufficient test data will be obtained to conduct model validation to the level required. A process of model identification, model input/output documentation, model sensitivity analyses, and test measurement correspondence is required so that decisions can be made to satisfy validation requirements within program constraints.
Characterizing the GOES-R (GOES-16) Geostationary Lightning Mapper (GLM) On-Orbit Performance

NASA Technical Reports Server (NTRS)

Rudlosky, Scott D.; Goodman, Steven J.; Koshak, William J.; Blakeslee, Richard J.; Buechler, Dennis E.; Mach, Douglas M.; Bateman, Monte

2017-01-01

Two overlapping efforts help to characterize the GLM performance, the Post Launch Test (PLT) phase to validate the predicted pre-launch instrument performance and the Post Launch Product Test (PLPT) phase to validate the lightning detection product used in forecast and warning decision-making. This paper documents the calibration and validation plans and activities for the first 6 months of GLM on-orbit testing and validation commencing with first light on 4 January 2017. The PLT phase addresses image quality, on-orbit calibration, RTEP threshold tuning, image navigation, noise filtering, and solar intrusion assessment, resulting in a GLM calibration parameter file. The PLPT includes four main activities, the Reference Data Comparisons (RDC), Algorithm Testing (AT), Instrument Navigation and Registration Testing (INRT), and Long Term Baseline Testing (LTBT). Field campaigns are also designed to contribute valuable insights into the GLM performance capabilities. The PLPT tests each contribute to the beta, provisional, and fully validated GLM data.
Content validity and reliability of test of gross motor development in Chilean children

PubMed Central

Cano-Cappellacci, Marcelo; Leyton, Fernanda Aleitte; Carreño, Joshua Durán

2016-01-01

ABSTRACT OBJECTIVE To validate a Spanish version of the Test of Gross Motor Development (TGMD-2) for the Chilean population. METHODS Descriptive, transversal, non-experimental validity and reliability study. Four translators, three experts and 92 Chilean children aged five to 10 years, students from a primary school in Santiago, Chile, participated. The Committee of Experts carried out translation, back-translation and revision processes to determine the translinguistic equivalence and content validity of the test, using the content validity index in 2013. In addition, a pilot implementation was carried out to determine the reliability of the test in Spanish, using the intraclass correlation coefficient and the Bland-Altman method. We evaluated whether the results differed significantly when the bat was replaced with a racket, using the t-test. RESULTS We obtained a content validity index higher than 0.80 for language clarity and relevance of the TGMD-2 for children. There were significant differences in the object control subtest when comparing the results with bat and racket. The intraclass correlation coefficient for inter-rater, intra-rater and test-retest reliability was greater than 0.80 in all cases. CONCLUSIONS The TGMD-2 has appropriate content validity to be applied in the Chilean population. The reliability of this test is within the appropriate parameters and its use could be recommended in this population after the establishment of normative data, setting a further precedent for the validation in other Latin American countries. PMID:26815160
Comment on Hall et al. (2017), "How to Choose Between Measures of Tinnitus Loudness for Clinical Research? A Report on the Reliability and Validity of an Investigator-Administered Test and a Patient-Reported Measure Using Baseline Data Collected in a Phase IIa Drug Trial".

PubMed

Sabour, Siamak

2018-03-08

The purpose of this letter, in response to Hall, Mehta, and Fackrell (2017), is to provide important knowledge about methodology and statistical issues in assessing the reliability and validity of an audiologist-administered tinnitus loudness matching test and a patient-reported tinnitus loudness rating. The author uses reference textbooks and published articles regarding scientific assessment of the validity and reliability of a clinical test to discuss the statistical test and the methodological approach in assessing validity and reliability in clinical research. Depending on the type of the variable (qualitative or quantitative), well-known statistical tests can be applied to assess reliability and validity. The qualitative variables of sensitivity, specificity, positive predictive value, negative predictive value, false positive and false negative rates, likelihood ratio positive and likelihood ratio negative, as well as odds ratio (i.e., ratio of true to false results), are the most appropriate estimates to evaluate validity of a test compared to a gold standard. In the case of quantitative variables, depending on distribution of the variable, Pearson r or Spearman rho can be applied. Diagnostic accuracy (validity) and diagnostic precision (reliability or agreement) are two completely different methodological issues. Depending on the type of the variable (qualitative or quantitative), well-known statistical tests can be applied to assess validity.
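The validity estimates listed in this letter for a qualitative (binary) test can all be derived from a 2x2 table against the gold standard. The following is a minimal sketch under stated assumptions: the function name and the cell counts are hypothetical and chosen only for illustration, and the odds ratio is computed as the usual diagnostic odds ratio, which is one reading of "ratio of true to false results".

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Validity estimates for a binary test scored against a gold standard.

    tp, fp, fn, tn are the four cells of the 2x2 table. All cells are
    assumed to be non-zero so that every ratio below is defined.
    """
    sensitivity = tp / (tp + fn)  # positives detected among true cases
    specificity = tn / (tn + fp)  # negatives detected among non-cases
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "false_positive_rate": 1 - specificity,
        "false_negative_rate": 1 - sensitivity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn),
    }


# Hypothetical counts, not taken from any study cited here.
metrics = diagnostic_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reliability (agreement) would need a separate analysis, for example an intraclass correlation on repeated measurements; the table above addresses validity only.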
Aerocapture, Entry, Descent and Landing (AEDL) Human Planetary Landing Systems. Section 10: AEDL Analysis, Test and Validation Infrastructure

NASA Technical Reports Server (NTRS)

Arnold, J.; Cheatwood, N.; Powell, D.; Wolf, A.; Guensey, C.; Rivellini, T.; Venkatapathy, E.; Beard, T.; Beutter, B.; Laub, B.

2005-01-01

Contents include the following: Listing of critical capabilities (knowledge, procedures, training, facilities) and metrics for validating that they are mission ready. Examples of critical capabilities and validation metrics: ground test and simulations. Flight testing to prove capabilities are mission ready. Issues and recommendations.
Reliability and Validity of Information about Student Achievement: Comparing Large-Scale and Classroom Testing Contexts

ERIC Educational Resources Information Center

Cizek, Gregory J.

2009-01-01

Reliability and validity are two characteristics that must be considered whenever information about student achievement is collected. However, those characteristics--and the methods for evaluating them--differ in large-scale testing and classroom testing contexts. This article presents the distinctions between reliability and validity in the two…
The adolescent child health and illness profile. A population-based measure of health.

PubMed

Starfield, B; Riley, A W; Green, B F; Ensminger, M E; Ryan, S A; Kelleher, K; Kim-Harris, S; Johnston, D; Vogel, K

1995-05-01

This study was designed to test the reliability and validity of an instrument to assess adolescent health status. Reliability and validity were examined by administration to adolescents (ages 11-17 years) in eight schools in two urban areas, one area in Appalachia, and one area in the rural South. Integrity of the domains and subdomains and construct validity were tested in all areas. Test/retest stability, criterion validity, and convergent and discriminant validity were tested in the two urban areas. Iterative testing has resulted in the final form of the CHIP-AE (Child Health and Illness Profile-Adolescent Edition) having 6 domains with 20 subdomains. The domains are Discomfort, Disorders, Satisfaction with Health, Achievement (of age-appropriate social roles), Risks, and Resilience. Tested aspects of reliability and validity have achieved acceptable levels for all retained subdomains. The CHIP-AE in its current form is suitable for assessing the health status of populations and subpopulations of adolescents. Evidence from test-retest stability analyses suggests that the CHIP-AE also can be used to assess changes occurring over time or in response to health services interventions targeted at groups of adolescents.
Item validity vs. item discrimination index: a redundancy?

NASA Astrophysics Data System (ADS)

Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.

2018-03-01

In the literature on evaluation and test analysis, it is common to find calculations of both item validity and the item discrimination index (D), each with a different formula. Meanwhile, other sources state that the item discrimination index can be obtained by calculating the correlation between the testee’s score on a particular item and the testee’s score on the overall test, which is the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and item discrimination index in the instrument analysis. These concepts may overlap, because both reflect the quality of the test in measuring the examinees’ ability. In this paper, examples of data-processing results for item validity and item discrimination index are compared. The paper discusses whether one of the two can represent both, or whether it is better to present both calculations in simple test analyses, especially in undergraduate theses that include test analyses.
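The two quantities compared in this abstract can be computed side by side. The sketch below assumes the common textbook definitions, which the abstract does not spell out: D as the difference in item difficulty between the top and bottom 27% of examinees by total score, and item validity as the Pearson correlation between the item score and the total score. The function names and the data are hypothetical.

```python
from statistics import mean, pstdev


def discrimination_index(item, total, fraction=0.27):
    """Upper-lower group index D: p(upper group) - p(lower group)."""
    order = sorted(range(len(total)), key=lambda i: total[i])
    k = max(1, round(len(total) * fraction))
    lower, upper = order[:k], order[-k:]
    return mean(item[i] for i in upper) - mean(item[i] for i in lower)


def item_total_correlation(item, total):
    """Pearson correlation between item scores and total scores."""
    mi, mt = mean(item), mean(total)
    cov = mean((x - mi) * (y - mt) for x, y in zip(item, total))
    return cov / (pstdev(item) * pstdev(total))


# Hypothetical data: ten examinees, one dichotomous item, total test scores.
item_scores = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
total_scores = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

d_index = discrimination_index(item_scores, total_scores)
r_item_total = item_total_correlation(item_scores, total_scores)
print(f"D = {d_index:.2f}, item-total r = {r_item_total:.2f}")
```

Both statistics are high for this item, which illustrates the overlap the authors discuss; they are not numerically identical, because D uses only the extreme groups while the correlation uses every examinee.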
Construct validity of the Free and Cued Selective Reminding Test in older adults with memory complaints.

PubMed

Clerici, Francesca; Ghiretti, Roberta; Di Pucchio, Alessandra; Pomati, Simone; Cucumo, Valentina; Marcone, Alessandra; Vanacore, Nicola; Mariani, Claudio; Cappa, Stefano Francesco

2017-06-01

The Free and Cued Selective Reminding Test (FCSRT) is the memory test recommended by the International Working Group on Alzheimer's disease (AD) for the detection of amnestic syndrome of the medial temporal type in prodromal AD. Assessing the construct validity and internal consistency of the Italian version of the FCSRT is thus crucial. The FCSRT was administered to 338 community-dwelling participants with memory complaints (57% females, age 74.5 ± 7.7 years), including 34 with AD, 203 with Mild Cognitive Impairment, and 101 with Subjective Memory Impairment. Internal Consistency was estimated using Cronbach's alpha coefficient. To assess convergent validity, five FCSRT scores (Immediate Free Recall, Immediate Total Recall, Delayed Free Recall, Delayed Total Recall, and Index of Sensitivity of Cueing) were correlated with three well-validated memory tests: Story Recall, Rey Auditory Verbal Learning test, and Rey Complex Figure (RCF) recall (partial correlation analysis). To assess divergent validity, a principal component analysis (an exploratory factor analysis) was performed including, in addition to the above-mentioned memory tasks, the following tests: Word Fluencies, RCF copy, Clock Drawing Test, Trail Making Test, Frontal Assessment Battery, Raven Coloured Progressive Matrices, and Stroop Colour-Word Test. Cronbach's alpha coefficients for immediate recalls (IFR and ITR) and delayed recalls (DFR and DTR) were, respectively, .84 and .81. All FCSRT scores were highly correlated with those of the three well-validated memory tests. The factor analysis showed that the FCSRT does not load on the factors saturated by non-memory tests. These findings indicate that the FCSRT has a good internal consistency and has an excellent construct validity as an episodic memory measure. © 2015 The British Psychological Society.
The Air Force Officer Qualifying Test: Validity, Fairness, and Bias

DTIC Science & Technology

2010-01-01

scores. The Standards for Educational and Psychological Testing (AERA, APA, and NCME, 1999) provides a set of guidelines published and endorsed by the...determining the validity and bias of selection tests falls upon professionals in the discipline of industrial/organizational psychology and closely related fields (e.g., educational psychology and
Effort, symptom validity testing, performance validity testing and traumatic brain injury.

PubMed

Bigler, Erin D

2014-01-01

To understand the neurocognitive effects of brain injury, valid neuropsychological test findings are paramount. This review examines the research on what has been referred to as symptom validity testing (SVT). Performance above a designated cut-score signifies a 'passing' SVT performance, which is likely the best indicator of valid neuropsychological test findings. Likewise, substantially below cut-point performance that nears chance or is at chance signifies invalid test performance. Significantly below chance is the sine qua non neuropsychological indicator for malingering. However, the interpretative problems with SVT performance below the cut-point yet far above chance are substantial, as pointed out in this review. This intermediate, border-zone performance on SVT measures is where substantial interpretative challenges exist. Case studies are used to highlight the many areas where additional research is needed. Historical perspectives are reviewed along with the neurobiology of effort. Reasons why performance validity testing (PVT) may be better than the SVT term are reviewed. Advances in neuroimaging techniques may be key in better understanding the meaning of border zone SVT failure. The review demonstrates the problems with rigidity in interpretation with established cut-scores. A better understanding of how certain types of neurological, neuropsychiatric and/or even test conditions may affect SVT performance is needed.
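The notion of "significantly below chance" can be made concrete with a one-sided binomial test. This is a minimal sketch, assuming a two-alternative forced-choice format (chance = 0.5); the item count, the scores, and the function name are hypothetical and do not correspond to any specific SVT or published cut-score.

```python
from math import comb


def below_chance_p(correct, n_items, p_chance=0.5):
    """One-sided probability of scoring `correct` or fewer by guessing alone."""
    return sum(
        comb(n_items, k) * p_chance**k * (1 - p_chance) ** (n_items - k)
        for k in range(correct + 1)
    )


# Hypothetical 50-item forced-choice test.
p_low = below_chance_p(17, 50)   # well below the chance score of 25
p_mid = below_chance_p(25, 50)   # exactly at the chance score

print(f"17/50 correct: p = {p_low:.4f}")
print(f"25/50 correct: p = {p_mid:.4f}")
```

A score such as 40/50 would sit far above chance, so this test says nothing about it; that is the border zone the review describes, where a protocol can fall below a cut-score without being statistically below chance.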
Commentary on "Validating the Interpretations and Uses of Test Scores"

ERIC Educational Resources Information Center

Brennan, Robert L.

2013-01-01

Kane's paper "Validating the Interpretations and Uses of Test Scores" is the most complete and clearest discussion yet available of the argument-based approach to validation. At its most basic level, validation as formulated by Kane is fundamentally a simply-stated two-step enterprise: (1) specify the claims inherent in a particular interpretation…
Validity and Reliability Testing of an e-learning Questionnaire for Chemistry Instruction

NASA Astrophysics Data System (ADS)

Guspatni, G.; Kurniawati, Y.

2018-04-01

The aim of this paper is to examine validity and reliability of a questionnaire used to evaluate e-learning implementation in chemistry instruction. 48 questionnaires were filled in by students who had studied chemistry through e-learning system. The questionnaire consisted of 20 indicators evaluating students’ perception on using e-learning. Parametric testing was done as data were assumed to follow normal distribution. Item validity of the questionnaire was examined through item-total correlation using Pearson’s formula while its reliability was assessed with Cronbach’s alpha formula. Moreover, convergent validity was assessed to see whether indicators building a factor had theoretically the same underlying construct. The result of validity testing revealed 19 valid indicators while the result of reliability testing revealed Cronbach’s alpha value of .886. The result of factor analysis showed that questionnaire consisted of five factors, and each of them had indicators building the same construct. This article shows the importance of factor analysis to get a construct valid questionnaire before it is used as research instrument.
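Cronbach's alpha, used above for reliability, follows directly from the item variances and the variance of the total score. The sketch below is an illustration only: the function name and the small response matrix are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix.

    rows: list of per-respondent lists, one score per item.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical Likert-type responses: five respondents, three indicators.
responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 2, 3],
    [4, 4, 4],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Sample variances are used for both the items and the total; using population variances throughout would give the same alpha, because the correction factor cancels in the ratio.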
The validity and reliability of the Functional Strength Measurement (FSM) in children with intellectual disabilities.

PubMed

Aertssen, W F M; Steenbergen, B; Smits-Engelsman, B C M

2018-06-07

There is a lack of valid and reliable field-based tests for assessing functional strength in young children with mild intellectual disabilities (IDs). The aim of this study was to investigate the test-retest reliability and construct validity of the Functional Strength Measurement in children with ID (FSM-ID). Fifty-two children with mild ID (40 boys and 12 girls, mean age 8.48 years, SD = 1.48) were tested with the FSM. Test-retest reliability (n = 32) was examined by a two-way intraclass correlation coefficient for agreement (ICC 2.1A). Standard error of measurement and smallest detectable change were calculated. Construct validity was determined by calculating correlations between the FSM-ID and handheld dynamometry (HHD) (convergent validity), the FSM-ID and the strength subtest of the Bruininks-Oseretsky test of motor proficiency - second edition (BOT-2) (convergent validity), and the FSM-ID and the balance subtest of the BOT-2 (discriminant validity). Test-retest reliability ICC ranged 0.89-0.98. Correlation between the items of the FSM-ID and HHD ranged 0.39-0.79 and between FSM-ID and BOT-2 (strength items) 0.41-0.80. Correlation between items of the FSM-ID and BOT-2 (balance items) ranged 0.41-0.70. The FSM-ID showed good test-retest reliability and good convergent validity with the HHD and BOT-2 subtest strength. The correlations assessing discriminant validity were higher than expected. Poor levels of postural control and core stability in children with mild IDs may be the underlying factor of those higher correlations. © 2018 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.
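The standard error of measurement (SEM) and smallest detectable change (SDC) mentioned above are often derived from the ICC and the between-subject standard deviation. The abstract does not say which formulas the authors used, so the sketch below assumes the widely used forms SEM = SD x sqrt(1 - ICC) and SDC = 1.96 x sqrt(2) x SEM; the input values and function names are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and a test-retest ICC (assumed formula)."""
    return sd * sqrt(1 - icc)


def smallest_detectable_change(sem, z=1.96):
    """SDC at the 95% level for a difference between two measurements."""
    return z * sqrt(2) * sem


# Hypothetical item: SD of 10 units and a test-retest ICC of 0.91.
sem = standard_error_of_measurement(sd=10.0, icc=0.91)
sdc = smallest_detectable_change(sem)
print(f"SEM = {sem:.2f}, SDC = {sdc:.2f}")
```

Under these assumptions, a change between two sessions smaller than the SDC cannot be distinguished from measurement error for an individual child.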
TESTING BALANCE AND FALL RISK IN PERSONS WITH PARKINSON DISEASE, AN ARGUMENT FOR ECOLOGICALLY VALID TESTING

PubMed Central

Foreman, K. Bo; Addison, Odessa; Kim, Han S.; Dibble, Leland E.

2010-01-01

Introduction Despite clear deficits in postural control, most clinical examination tools lack accuracy in identifying persons with Parkinson disease (PD) who have fallen or are at risk for falls. We assert that this is in part due to the lack of ecological validity of the testing. Methods To test this assertion, we examined the responsiveness and predictive validity of the Functional Gait Assessment (FGA), the Pull test, and the Timed up and Go (TUG) during clinically defined ON and OFF medication states. To address responsiveness, ON/OFF medication performance was compared. To address predictive validity, areas under the curve (AUC) of receiver operating characteristic (ROC) curves were compared. Comparisons were made using separate non-parametric tests. Results Thirty-six persons (24 male, 12 female) with PD (22 fallers, 14 non-fallers) participated. Only the FGA was able to detect differences between fallers and non-fallers for both ON/OFF medication testing. The predictive validity of the FGA and the TUG for fall identification was higher during OFF medication compared to ON medication testing. The predictive validity of the FGA was higher than the TUG and the Pull test during ON and OFF medication testing. Discussion In order to most accurately identify fallers, clinicians should test persons with PD in ecologically relevant conditions and tasks. In this study, interpretation of the OFF medication performance and use of the FGA provided more accurate prediction of those who would fall. PMID:21215674
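The area under the ROC curve used here for predictive validity equals the probability that a randomly chosen faller receives a higher risk score than a randomly chosen non-faller. A minimal sketch of that pairwise computation follows; the scores and the function name are hypothetical, and the scores are oriented so that higher means greater fall risk (a scale on which lower scores indicate worse balance would need to be negated first).

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical risk scores, not study data.
fallers = [0.9, 0.8, 0.6, 0.55]
non_fallers = [0.7, 0.4, 0.3]

auc = auc_from_scores(fallers, non_fallers)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of fallers from non-fallers.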
Development and validation of challenge materials for double-blind, placebo-controlled food challenges in children.

PubMed

Vlieg-Boerstra, Berber J; Bijleveld, Charles M A; van der Heide, Sicco; Beusekamp, Berta J; Wolt-Plompen, Saskia A A; Kukler, Jeanet; Brinkman, Joep; Duiverman, Eric J; Dubois, Anthony E J

2004-02-01

The use of double-blind, placebo-controlled food challenges (DBPCFCs) is considered the gold standard for the diagnosis of food allergy. Despite this, materials and methods used in DBPCFCs have not been standardized. The purpose of this study was to develop and validate recipes for use in DBPCFCs in children by using allergenic foods, preferably in their usual edible form. Recipes containing milk, soy, cooked egg, raw whole egg, peanut, hazelnut, and wheat were developed. For each food, placebo and active test food recipes were developed that met the requirements of acceptable taste, allowance of a challenge dose high enough to elicit reactions in an acceptable volume, optimal matrix ingredients, and good matching of sensory properties of placebo and active test food recipes. Validation was conducted on the basis of sensory tests for difference by using the triangle test and the paired comparison test. Recipes were first tested by volunteers from the hospital staff and subsequently by a professional panel of food tasters in a food laboratory designed for sensory testing. Recipes were considered to be validated if no statistically significant differences were found. Twenty-seven recipes were developed and found to be valid by the volunteer panel. Of these 27 recipes, 17 could be validated by the professional panel. Sensory testing with appropriate statistical analysis allows for objective validation of challenge materials. We recommend the use of professional tasters in the setting of a food laboratory for best results.
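The sensory tests named above are commonly analyzed with a one-sided binomial test against the guessing probability: 1/3 for the triangle test (pick the odd sample out of three) and 1/2 for the paired comparison test. The sketch below assumes that standard analysis, which the abstract does not detail; the panel size, counts, and function name are hypothetical.

```python
from math import comb


def upper_tail_p(correct, n_tasters, p_guess):
    """Probability of `correct` or more right answers if everyone guesses."""
    return sum(
        comb(n_tasters, k) * p_guess**k * (1 - p_guess) ** (n_tasters - k)
        for k in range(correct, n_tasters + 1)
    )


# Hypothetical panel of 30 tasters, 17 of whom answer correctly.
p_triangle = upper_tail_p(17, 30, 1 / 3)  # triangle test
p_paired = upper_tail_p(17, 30, 1 / 2)    # paired comparison test

print(f"triangle: p = {p_triangle:.4f}")
print(f"paired comparison: p = {p_paired:.4f}")
```

With the same count, the triangle test detects a difference at the 0.05 level while the paired comparison does not, because guessing is less likely to succeed in the triangle format. Under the criterion in the abstract, a recipe counts as validated only when no significant difference is found.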
Development of diagnostic test instruments to reveal level student conception in kinematic and dynamics

NASA Astrophysics Data System (ADS)

Handhika, J.; Cari, C.; Suparmi, A.; Sunarno, W.; Purwandari, P.

2018-03-01

The purpose of this research was to develop a diagnostic test instrument to reveal students' conceptions in kinematics and dynamics. The diagnostic test was developed based on content indicators for the concepts of (1) displacement and distance, (2) instantaneous and average velocity, (3) zero and constant acceleration, (4) gravitational acceleration, (5) Newton's first law, and (6) Newton's third law. The diagnostic test development model includes: diagnostic test requirement analysis, formulating test-making objectives, developing tests, checking content validity and reliability, and application of the tests. The Content Validation Index (CVI) was 0.85, in the highly relevant category. Three questions received a negative Content Validation Ratio (CVR) of -0.6; after the distractors were revised and the visual presentation was clarified, the CVR became 1 (highly relevant). When the test was applied, 16 valid test items were obtained, with a Cronbach's alpha of 0.80. It can be concluded that the diagnostic test can be used to reveal the level of students' conceptions in kinematics and dynamics.
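A content validation ratio is usually computed with Lawshe's formula, CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating an item as essential and N is the panel size. The abstract does not report the panel size, so the sketch below assumes a hypothetical panel of 10 experts purely to show how values of -0.6 and 1 can arise.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR: ranges from -1 (no expert agrees) to +1 (all agree)."""
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 experts (panel size not given in the abstract).
cvr_before = content_validity_ratio(n_essential=2, n_experts=10)
cvr_after = content_validity_ratio(n_essential=10, n_experts=10)

print(f"CVR before revision: {cvr_before:.1f}")
print(f"CVR after revision:  {cvr_after:.1f}")
```

A negative CVR means fewer than half of the panel rated the item as essential, which is why such items are typically revised or dropped.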
An Efficient Data Partitioning to Improve Classification Performance While Keeping Parameters Interpretable.

PubMed

Korjus, Kristjan; Hebart, Martin N; Vicente, Raul

2016-01-01

Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier's generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach that we term "Cross-validation and cross-testing" improving this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do.
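The standard workflow the abstract contrasts itself with (cross-validation on one portion of the data, then a single evaluation on a separate test set) can be sketched as an index-splitting routine. This illustrates only the conventional baseline, not the authors' "cross-validation and cross-testing" method; the function name, the 20% test fraction, and the fold count are assumptions.

```python
import random


def split_for_cv_and_test(n_samples, test_fraction=0.2, k=5, seed=0):
    """Return (cv_splits, test_idx) for the conventional CV-then-test workflow.

    cv_splits is a list of k (train_idx, val_idx) pairs drawn only from the
    development portion; test_idx is held out and never used for tuning.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    n_test = int(round(n_samples * test_fraction))
    test_idx = indices[:n_test]
    dev_idx = indices[n_test:]

    # Deal the development indices into k roughly equal folds.
    folds = [dev_idx[i::k] for i in range(k)]

    cv_splits = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        cv_splits.append((train_idx, val_idx))
    return cv_splits, test_idx


if __name__ == "__main__":
    splits, held_out = split_for_cv_and_test(50)
    print(len(splits), len(held_out))
```

The trade-off described in the abstract is visible in `test_fraction`: a larger held-out set gives a more precise generalization estimate but leaves fewer samples for choosing parameters.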
A Comparison of Validity Rates between Paper-and-Pencil and Computerized Testing with the MMPI-2

ERIC Educational Resources Information Center

Blazek, Nicole L.; Forbey, Johnathan D.

2011-01-01

Although the use of computerized testing in psychopathology assessment has increased in recent years, limited research has examined the impact of this format in terms of potential differences in test validity rates. The current study explores potential differences in the rates of valid and invalid Minnesota Multiphasic Personality Inventory--2…
A Proposal on the Validation Model of Equivalence between PBLT and CBLT

ERIC Educational Resources Information Center

Chen, Huilin

2014-01-01

The validity of the computer-based language test is possibly affected by three factors: computer familiarity, audio-visual cognitive competence, and other discrepancies in construct. Therefore, validating the equivalence between the paper-and-pencil language test and the computer-based language test is a key step in the procedure of designing a…

Validity and Reliability of a Medicine Ball Explosive Power Test.

ERIC Educational Resources Information Center

Stockbrugger, Barry A.; Haennel, Robert G.

2001-01-01

Evaluated the validity and reliability of a medicine ball throw test to evaluate explosive power. Data on competitive sand volleyball players who performed a medicine ball throw and a standard countermovement jump indicated that the medicine ball throw test was a valid and reliable way to assess explosive power for an analogous total-body movement…
Understanding Student Teachers' Behavioural Intention to Use Technology: Technology Acceptance Model (TAM) Validation and Testing

ERIC Educational Resources Information Center

Wong, Kung-Teck; Osman, Rosma bt; Goh, Pauline Swee Choo; Rahmat, Mohd Khairezan

2013-01-01

This study sets out to validate and test the Technology Acceptance Model (TAM) in the context of Malaysian student teachers' integration of their technology in teaching and learning. To establish factorial validity, data collected from 302 respondents were tested against the TAM using confirmatory factor analysis (CFA), and structural equation…
Validation of a Computerized Cognitive Assessment System for Persons with Stroke: A Pilot Study

ERIC Educational Resources Information Center

Yip, Chi Kwong; Man, David W. K.

2009-01-01

This study investigates the validity of a newly developed computerized cognitive assessment system (CCAS) that is equipped with rich multimedia to generate simulated testing situations and considers both test item difficulty and the test taker's ability. It is also hypothesized that better predictive validity of the CCAS in self-care of persons…
Improving the quality of discrete-choice experiments in health: how can we assess validity and reliability?

PubMed

Janssen, Ellen M; Marshall, Deborah A; Hauber, A Brett; Bridges, John F P

2017-12-01

The recent endorsement of discrete-choice experiments (DCEs) and other stated-preference methods by regulatory and health technology assessment (HTA) agencies has placed a greater focus on demonstrating the validity and reliability of preference results. Areas covered: We present a practical overview of tests of validity and reliability that have been applied in the health DCE literature and explore other study qualities of DCEs. From the published literature, we identify a variety of methods to assess the validity and reliability of DCEs. We conceptualize these methods to create a conceptual model with four domains: measurement validity, measurement reliability, choice validity, and choice reliability. Each domain consists of three categories that can be assessed using one to four procedures (for a total of 24 tests). We present how these tests have been applied in the literature and direct readers to applications of these tests in the health DCE literature. Based on a stakeholder engagement exercise, we consider the importance of study characteristics beyond traditional concepts of validity and reliability. Expert commentary: We discuss study design considerations to assess the validity and reliability of a DCE, consider limitations to the current application of tests, and discuss future work to consider the quality of DCEs in healthcare.
Validating use of a critical thinking test for the dental admission test.

PubMed

Tsai, Tsung-Hsun

2014-04-01

The purpose of this study was to validate the use of a test to assess dental school applicants' critical thinking abilities. The intent was to include this test on the Dental Admission Test (DAT) if it was shown to enhance the DAT's validity. Correlation and regression analyses of undergraduate and dental school performance with scores on each of the tests on the DAT battery and the California Critical Thinking Skills Test (CCTST) were performed. Data were collected from 439 third- and fourth-year dental students who consented to participate and were enrolled at one of the ten accredited dental schools included in the study. These ten dental schools were from most regions of the United States. This study concluded that including the CCTST on the DAT did not significantly enhance the DAT's validity.
Voices from Test-Takers: Further Evidence for Language Assessment Validation and Use

ERIC Educational Resources Information Center

Cheng, Liying; DeLuca, Christopher

2011-01-01

Test-takers' interpretations of validity as related to test constructs and test use have been widely debated in large-scale language assessment. This study contributes further evidence to this debate by examining 59 test-takers' perspectives in writing large-scale English language tests. Participants wrote about their test-taking experiences in…
Coverage of the Test of Memory Malingering, Victoria Symptom Validity Test, and Word Memory Test on the Internet: is test security threatened?

PubMed

Bauer, Lyndsey; McCaffrey, Robert J

2006-01-01

In forensic neuropsychological settings, maintaining test security has become critically important, especially in regard to symptom validity tests (SVTs). Coaching, which can entail providing patients or litigants with information about the cognitive sequelae of head injury, or teaching them test-taking strategies to avoid detection of symptom dissimulation has been examined experimentally in many research studies. Emerging evidence supports that coaching strategies affect psychological and neuropsychological test performance to differing degrees depending on the coaching paradigm and the tests administered. The present study sought to examine Internet coverage of SVTs because it is potentially another source of coaching, or information that is readily available. Google searches were performed on the Test of Memory Malingering, the Victoria Symptom Validity Test, and the Word Memory Test. Results indicated that there is a variable amount of information available about each test that could threaten test security and validity should inappropriately interested parties find it. Steps that could be taken to improve this situation and limitations to this exploration are discussed.
Valid methods: the quality assurance of test method development, validation, approval, and transfer for veterinary testing laboratories.

PubMed

Wiegers, Ann L

2003-07-01

Third-party accreditation is a valuable tool to demonstrate a laboratory's competence to conduct testing. Accreditation, internationally and in the United States, has been discussed previously. However, accreditation is only 1 part of establishing data credibility. A validated test method is the first component of a valid measurement system. Validation is defined as confirmation by examination and the provision of objective evidence that the particular requirements for a specific intended use are fulfilled. The international and national standard ISO/IEC 17025 recognizes the importance of validated methods and requires that laboratory-developed methods or methods adopted by the laboratory be appropriate for the intended use. Validated methods are therefore required and their use agreed to by the client (i.e., end users of the test results such as veterinarians, animal health programs, and owners). ISO/IEC 17025 also requires that the introduction of methods developed by the laboratory for its own use be a planned activity conducted by qualified personnel with adequate resources. This article discusses considerations and recommendations for the conduct of veterinary diagnostic test method development, validation, evaluation, approval, and transfer to the user laboratory in the ISO/IEC 17025 environment. These recommendations are based on those of nationally and internationally accepted standards and guidelines, as well as those of reputable and experienced technical bodies. They are also based on the author's experience in the evaluation of method development and transfer projects, validation data, and the implementation of quality management systems in the area of method development.
Establishing the Validity of TOEIC Bridge™ Test Scores for Students in Colombia, Chile, and Ecuador. Research Report. ETS RR-08-58

ERIC Educational Resources Information Center

Sinharay, Sandip; Feng, Ying; Saldivia, Luis; Powers, Donald E.; Ginuta, Anthony; Simpson, Annabelle; Weng, Vincent

2008-01-01

The validity of TOEIC Bridge™ scores as a measure of English language skill was examined from the standpoint of a unified concept of test validity. In this study, more than 6,000 test takers in 3 Latin American countries (Chile, Colombia, and Ecuador) took 1 form of the TOEIC Bridge test, and their scores were compared to additional information…
Converting Hangar High Expansion Foam Systems to Prevent Cockpit Damage: Full-Scale Validation Tests

DTIC Science & Technology

2017-09-01

Report AFCEC-CO-TY-TR-2018-0001: Converting Hangar High Expansion Foam Systems to Prevent Cockpit Damage: Full-Scale Validation Tests. Gerard G... Final Test Report, 09-2017; May 2017. N00173-15-D
Validation of the Narrowing Beam Walking Test in Lower Limb Prosthesis Users.

PubMed

Sawers, Andrew; Hafner, Brian

2018-04-11

Objective: To evaluate the content, construct, and discriminant validity of the Narrowing Beam Walking Test (NBWT), a performance-based balance test for lower limb prosthesis users. Design: Cross-sectional study. Setting: Research laboratory and prosthetics clinic. Participants: Unilateral transtibial and transfemoral prosthesis users (N=40). Interventions: Not applicable. Main Outcome Measures: Content validity was examined by quantifying the percentage of participants receiving maximum or minimum scores (ie, ceiling and floor effects). Convergent construct validity was examined using correlations between participants' NBWT scores and scores or times on existing clinical balance tests regularly administered to lower limb prosthesis users. Known-groups construct validity was examined by comparing NBWT scores between groups of participants with different fall histories, amputation levels, amputation etiologies, and functional levels. Discriminant validity was evaluated by analyzing the area under each test's receiver operating characteristic (ROC) curve. Results: No minimum or maximum scores were recorded on the NBWT. NBWT scores demonstrated strong correlations (ρ=.70‒.85) with scores/times on performance-based balance tests (timed Up and Go test, Four Square Step Test, and Berg Balance Scale) and a moderate correlation (ρ=.49) with the self-report Activities-specific Balance Confidence scale. NBWT performance was significantly lower among participants with a history of falls (P=.003), transfemoral amputation (P=.011), and a lower mobility level (P<.001). The NBWT also had the largest area under the ROC curve (.81) and was the only test to exhibit an area that was statistically significantly >.50 (ie, chance). Conclusions: The results provide strong evidence of content, construct, and discriminant validity for the NBWT as a performance-based test of balance ability. The evidence supports its use to assess balance impairments and fall risk in unilateral transtibial and transfemoral prosthesis users. Copyright © 2018 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Measuring Nutrition Literacy in Spanish-Speaking Latinos: An Exploratory Validation Study.

PubMed

Gibbs, Heather D; Camargo, Juliana M T B; Owens, Sarah; Gajewski, Byron; Cupertino, Ana Paula

2017-11-21

Nutrition is important for preventing and treating chronic diseases highly prevalent among Latinos, yet no tool exists for measuring nutrition literacy among Spanish speakers. This study aimed to adapt the validated Nutrition Literacy Assessment Instrument for Spanish-speaking Latinos. This study was developed in two phases: adaptation and validity testing. Adaptation included translation, expert item content review, and interviews with Spanish speakers. For validity testing, 51 participants completed the Short Assessment of Health Literacy-Spanish (SAHL-S), the Nutrition Literacy Assessment Instrument in Spanish (NLit-S), and a socio-demographic questionnaire. Validity and reliability statistics were analyzed. Content validity was confirmed with a Scale Content Validity Index of 0.96. Validity testing demonstrated NLit-S scores were strongly correlated with SAHL-S scores (r = 0.52, p < 0.001). Entire reliability was substantial at 0.994 (CI 0.992-0.996) and internal consistency was excellent (Cronbach's α = 0.92). The NLit-S demonstrates validity and reliability for measuring nutrition literacy among Spanish speakers.
The Hyper-X Flight Systems Validation Program

NASA Technical Reports Server (NTRS)

Redifer, Matthew; Lin, Yohan; Bessent, Courtney Amos; Barklow, Carole

2007-01-01

For the Hyper-X/X-43A program, the development of a comprehensive validation test plan played an integral part in the success of the mission. The goal was to demonstrate hypersonic propulsion technologies by flight testing an airframe-integrated scramjet engine. Preparation for flight involved both verification and validation testing. By definition, verification is the process of assuring that the product meets design requirements; whereas validation is the process of assuring that the design meets mission requirements for the intended environment. This report presents an overview of the program with emphasis on the validation efforts. It includes topics such as hardware-in-the-loop, failure modes and effects, aircraft-in-the-loop, plugs-out, power characterization, antenna pattern, integration, combined systems, captive carry, and flight testing. Where applicable, test results are also discussed. The report provides a brief description of the flight systems onboard the X-43A research vehicle and an introduction to the ground support equipment required to execute the validation plan. The intent is to provide validation concepts that are applicable to current, follow-on, and next generation vehicles that share the hybrid spacecraft and aircraft characteristics of the Hyper-X vehicle.
Translation, Cultural Adaptation and Validation of the Simple Shoulder Test to Spanish

PubMed Central

Arcuri, Francisco; Barclay, Fernando; Nacul, Ivan

2015-01-01

Background: The validation of widely used scales facilitates the comparison across international patient samples. Objective: The objective was to translate, culturally adapt and validate the Simple Shoulder Test into Argentinian Spanish. Methods: The Simple Shoulder Test was translated from English into Argentinian Spanish by two independent translators, translated back into English and evaluated for accuracy by an expert committee to correct possible discrepancies. It was then administered to 50 patients with different shoulder conditions. Psychometric properties were analyzed, including internal consistency, measured with Cronbach's alpha, and test-retest reliability at 15 days, measured with the intraclass correlation coefficient (ICC). Results: The internal consistency was a Cronbach's alpha of 0.808, evaluated as good. The test-retest reliability as measured by the ICC was 0.835, evaluated as excellent. Conclusion: The Simple Shoulder Test translation and its cultural adaptation to Argentinian Spanish demonstrated adequate internal reliability and validity, ultimately allowing for its use in comparisons with international patient samples.
Development and validation of a new instrument for testing functional health literacy in Japanese adults.

PubMed

Nakagami, Katsuyuki; Yamauchi, Toyoaki; Noguchi, Hiroyuki; Maeda, Tohru; Nakagami, Tomoko

2014-06-01

This study aimed to develop a reliable and valid measure of functional health literacy in a Japanese clinical setting. Test development consisted of three phases: generation of an item pool, consultation with experts to assess content validity, and comparison with external criteria (the Japanese Health Knowledge Test) to assess criterion validity. A trial version of the test was administered to 535 Japanese outpatients. Internal consistency reliability, calculated by Cronbach's alpha, was 0.81, and concurrent validity was moderate. Receiver Operating Characteristics and Item Response Theory were used to classify patients as having adequate, marginal, or inadequate functional health literacy. Both inadequate and marginal functional health literacy were associated with older age, lower income, lower educational attainment, and poor health knowledge. The time required to complete the test was 10-15 min. This test should enable health workers to better identify patients with inadequate health literacy. © 2013 Wiley Publishing Asia Pty Ltd.
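Internal consistency figures such as the Cronbach's alpha reported above are conventionally computed as alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. A minimal sketch of that textbook formula follows; the response matrix is hypothetical and is not the study's data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item scores.

    `responses` is a list of rows, one per respondent, each row holding
    that respondent's score on every item (same item order in every row).
    """
    k = len(responses[0])
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical answers from five respondents to a four-item test.
    data = [
        [1, 1, 1, 0],
        [1, 0, 1, 1],
        [0, 0, 0, 0],
        [1, 1, 1, 1],
        [0, 1, 0, 0],
    ]
    print(cronbach_alpha(data))
```

The function assumes at least two items and some variation in total scores; if every respondent has the same total, the denominator is zero and alpha is undefined.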
Statistical validation of normal tissue complication probability models.

PubMed

Xu, Cheng-Jian; van der Schaaf, Arjen; Van't Veld, Aart A; Langendijk, Johannes A; Schilstra, Cornelis

2012-09-01

To investigate the applicability and value of double cross-validation and permutation tests as established statistical approaches in the validation of normal tissue complication probability (NTCP) models. A penalized regression method, LASSO (least absolute shrinkage and selection operator), was used to build NTCP models for xerostomia after radiation therapy treatment of head-and-neck cancer. Model assessment was based on the likelihood function and the area under the receiver operating characteristic curve. Repeated double cross-validation showed the uncertainty and instability of the NTCP models and indicated that the statistical significance of model performance can be obtained by permutation testing. Repeated double cross-validation and permutation tests are recommended to validate NTCP models before clinical use. Copyright © 2012 Elsevier Inc. All rights reserved.
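The permutation-testing idea can be sketched as follows: the outcome labels are shuffled many times, the area under the ROC curve (AUC) is recomputed for each shuffle, and the p-value is the share of shuffles that match or beat the observed AUC. This is a generic illustration on fixed prediction scores, not the authors' procedure, which would also refit the LASSO model inside each permutation and cross-validation loop. All names and data are hypothetical.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: chance that a positive case outscores a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_permutations=1000, seed=0):
    """Return (observed AUC, permutation p-value) for the given scores."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any link between scores and outcomes
        if auc(scores, shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical predicted complication probabilities and observed outcomes.
    predicted = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.65, 0.7, 0.8, 0.9]
    outcome = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
    print(permutation_p_value(predicted, outcome))
```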
Proposal and validation of a clinical trunk control test in individuals with spinal cord injury.

PubMed

Quinzaños, J; Villa, A R; Flores, A A; Pérez, R

2014-06-01

One of the problems that arise in spinal cord injury (SCI) is alteration in trunk control. Despite the need for standardized scales, these do not exist for evaluating trunk control in SCI. To propose and validate a trunk control test in individuals with SCI. National Institute of Rehabilitation, Mexico. The test was developed and later evaluated for reliability and criterion, content, and construct validity. We carried out 531 tests on 177 patients and found high inter- and intra-rater reliability. In terms of criterion validity, analysis of variance demonstrated a statistically significant difference in the test score of patients with adequate or inadequate trunk control according to the assessment of a group of experts. A receiver operating characteristic curve was plotted for optimizing the instrument's cutoff point, which was determined at 13 points, with a sensitivity of 98% and a specificity of 92.2%. With regard to construct validity, the correlation between the proposed test and the spinal cord independence measure (SCIM) was 0.873 (P=0.001) and that with the evolution time was 0.437 (P=0.001). For testing the hypothesis with qualitative variables, the Kruskal-Wallis test was performed, which resulted in a statistically significant difference between the scores in the proposed scale of each group defined by these variables. It was proven experimentally that the proposed trunk control test is valid and reliable. Furthermore, the test can be used for all patients with SCI regardless of the type and level of injury.
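Sensitivity and specificity at a cutoff, as reported above for the 13-point threshold, follow directly from a 2x2 table of test result against the reference judgment. The sketch below assumes that a score at or above the cutoff counts as a positive result (adequate trunk control); the abstract does not state the direction, and the scores and expert ratings shown are hypothetical.

```python
def sensitivity_specificity(scores, reference_positive, cutoff):
    """Return (sensitivity, specificity) for a score cutoff.

    A score >= cutoff is treated as a positive test result (an assumption);
    `reference_positive` holds the reference-standard judgment for each case.
    """
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, reference_positive):
        predicted = score >= cutoff
        if truth and predicted:
            tp += 1
        elif truth and not predicted:
            fn += 1
        elif not truth and not predicted:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical test scores and expert ratings (True = adequate control).
    scores = [5, 9, 12, 13, 14, 18, 20, 11, 16, 22]
    experts = [False, False, False, True, True, True, True, True, False, True]
    print(sensitivity_specificity(scores, experts, cutoff=13))
```

Repeating this calculation over every candidate cutoff yields the points of the ROC curve from which an optimal threshold can be chosen.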
Practical color vision tests for air traffic control applicants: en route center and terminal facilities.

PubMed

Mertens, H W; Milburn, N J; Collins, W E

2000-12-01

Two practical color vision tests were developed and validated for use in screening Air Traffic Control Specialist (ATCS) applicants for work at en route center or terminal facilities. The development of the tests involved careful reproduction/simulation of color-coded materials from the most demanding, safety-critical color task performed in each type of facility. The tests were evaluated using 106 subjects with normal color vision and 85 with color vision deficiency. The en route center test, named the Flight Progress Strips Test (FPST), required the identification of critical red/black coding in computer printing and handwriting on flight progress strips. The terminal option test, named the Aviation Lights Test (ALT), simulated red/green/white aircraft lights that must be identified in night ATC tower operations. Color-coding is a non-redundant source of safety-critical information in both tasks. The FPST was validated by direct comparison of responses to strip reproductions with responses to the original flight progress strips and a set of strips selected independently. Validity was high; Kappa = 0.91 with original strips as the validation criterion and 0.86 with different strips. The light point stimuli of the ALT were validated physically with a spectroradiometer. The reliabilities of the FPST and ALT were estimated with Cronbach's alpha as 0.93 and 0.98, respectively. The high job-relevance, validity, and reliability of these tests increase the effectiveness and fairness of ATCS color vision testing.
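The abstract reports agreement as "Kappa" without naming the variant. Assuming Cohen's kappa, the statistic is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two classifications and p_e the agreement expected by chance. The pass/fail outcomes in the example are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two paired lists of categorical outcomes."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement from each classification's marginal category counts.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2

    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical outcomes on reproduced strips versus original strips.
    reproduction = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
    original = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "pass"]
    print(cohen_kappa(reproduction, original))
```

Kappa is undefined when chance agreement equals 1, which happens if both classifications place every case in the same single category.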
Validity of Selected Lab and Field Tests of Physical Working Capacity.

ERIC Educational Resources Information Center

Burke, Edmund J.

The validity of selected lab and field tests of physical working capacity was investigated. Forty-four male college students were administered a series of lab and field tests of physical working capacity. Lab tests include a test of maximum oxygen uptake, the PWC 170 test, the Harvard Step Test, the Progressive Pulse Ratio Test, Margaria Test of…
Self-esteem among nursing assistants: reliability and validity of the Rosenberg Self-Esteem Scale.

PubMed

McMullen, Tara; Resnick, Barbara

2013-01-01

To establish the reliability and validity of the Rosenberg Self-Esteem Scale (RSES) when used with nursing assistants (NAs). Testing the RSES used baseline data from a randomized controlled trial testing the Res-Care Intervention. Female NAs were recruited from nursing homes (n = 508). Validity testing for the positive and negative subscales of the RSES was based on confirmatory factor analysis (CFA) using structural equation modeling and Rasch analysis. Estimates of reliability were based on Rasch analysis and the person separation index. Evidence supports the reliability and validity of the RSES in NAs although we recommend minor revisions to the measure for subsequent use. Establishing reliable and valid measures of self-esteem in NAs will facilitate testing of interventions to strengthen workplace self-esteem, job satisfaction, and retention.

Mars Exploration Rover Mission: Entry, Descent, and Landing System Validation

NASA Technical Reports Server (NTRS)

Mitcheltree, Robert A.; Lee, Wayne; Steltzner, Adam; SanMartin, Alejanhdro

2004-01-01

System validation for a Mars entry, descent, and landing system is not simply a demonstration that the electrical system functions in the associated environments. The function of this system is its interaction with the atmospheric and surface environment. Thus, in addition to traditional test-bed, hardware-in-the-loop, testing, a validation program that confirms the environmental interaction is required. Unfortunately, it is not possible to conduct a meaningful end-to-end test of a Mars landing system on Earth. The validation plan must be constructed from an interconnected combination of simulation, analysis and test. For the Mars Exploration Rover mission, this combination of activities and the logic of how they combined to the system's validation was explicitly stated, reviewed, and tracked as part of the development plan.
Agility performance in high-level junior basketball players: the predictive value of anthropometrics and power qualities.

PubMed

Sisic, Nedim; Jelicic, Mario; Pehar, Miran; Spasic, Miodrag; Sekulic, Damir

2016-01-01

In basketball, anthropometric status is an important factor when identifying and selecting talents, while agility is one of the most vital motor performances. The aim of this investigation was to evaluate the influence of anthropometric variables and power capacities on different preplanned agility performances. The participants were 92 high-level, junior-age basketball players (16-17 years of age; 187.6±8.72 cm in body height, 78.40±12.26 kg in body mass), randomly divided into a validation and a cross-validation subsample. The predictor set consisted of 16 anthropometric variables and three tests of power capacities (Sargent-jump, broad-jump and medicine-ball-throw). The criteria were three tests of agility: a T-Shape-Test, a Zig-Zag-Test, and a test of running with a 180-degree turn (T180). Forward stepwise multiple regressions were calculated for the validation subsample and then cross-validated. Cross-validation included correlations between observed and predicted scores, a dependent-samples t-test between predicted and observed scores, and Bland-Altman graphics. Analysis of variance showed that centres were superior in most of the anthropometric indices and in the medicine-ball-throw (all at P<0.05), with no significant between-position differences for the other motor performances studied. Multiple regression models originally calculated for the validation subsample were then cross-validated, and confirmed for the Zig-Zag-Test (R of 0.71 and 0.72 for the validation and cross-validation subsample, respectively). Anthropometrics were not strongly related to agility performance, but leg length was found to be negatively associated with performance in basketball-specific agility. Power capacities were confirmed to be an important factor in agility. The results highlighted the importance of sport-specific tests when studying pre-planned agility performance in basketball. The improvement in power capacities will probably result in an improvement in agility in basketball athletes, while anthropometric indices should be used in order to identify those athletes who can achieve superior agility performance.
Validity and Reliability of the 8-Item Work Limitations Questionnaire.

PubMed

Walker, Timothy J; Tullar, Jessica M; Diamond, Pamela M; Kohl, Harold W; Amick, Benjamin C

2017-12-01

Purpose: To evaluate factorial validity, scale reliability, test-retest reliability, convergent validity, and discriminant validity of the 8-item Work Limitations Questionnaire (WLQ) among employees from a public university system. Methods: A secondary analysis using de-identified data from employees who completed an annual Health Assessment between the years 2009-2015 tested research aims. Confirmatory factor analysis (CFA) (n = 10,165) tested the latent structure of the 8-item WLQ. Scale reliability was determined using a CFA-based approach while test-retest reliability was determined using the intraclass correlation coefficient. Convergent/discriminant validity was tested by evaluating relations between the 8-item WLQ with health/performance variables for convergent validity (health-related work performance, number of chronic conditions, and general health) and demographic variables for discriminant validity (gender and institution type). Results: A 1-factor model with three correlated residuals demonstrated excellent model fit (CFI = 0.99, TLI = 0.99, RMSEA = 0.03, and SRMR = 0.01). The scale reliability was acceptable (0.69, 95% CI 0.68-0.70) and the test-retest reliability was very good (ICC = 0.78). Low-to-moderate associations were observed between the 8-item WLQ and the health/performance variables while weak associations were observed between the demographic variables. Conclusions: The 8-item WLQ demonstrated sufficient reliability and validity among employees from a public university system. Results suggest the 8-item WLQ is a usable alternative for studies when the more comprehensive 25-item WLQ is not available.
The predictive value of the sacral base pressure test in detecting specific types of sacroiliac dysfunction

PubMed Central

Mitchell, Travis D.; Urli, Kristina E.; Breitenbach, Jacques; Yelverton, Chris

2007-01-01

Objective: This study aimed to evaluate the validity of the sacral base pressure test in diagnosing sacroiliac joint dysfunction. It also determined the predictive powers of the test in determining which type of sacroiliac joint dysfunction was present. Methods: This was a double-blind experimental study with 62 participants. The results from the sacral base pressure test were compared against a cluster of previously validated tests of sacroiliac joint dysfunction to determine its validity and predictive powers. The external rotation of the feet, occurring during the sacral base pressure test, was measured using a digital inclinometer. Results: There was no statistically significant difference in the results of the sacral base pressure test between the types of sacroiliac joint dysfunction. In terms of the results of validity, the sacral base pressure test was useful in identifying positive values of sacroiliac joint dysfunction. It was fairly helpful in correctly diagnosing patients with negative test results; however, it had only a “slight” agreement with the diagnosis for κ interpretation. Conclusions: In this study, the sacral base pressure test was not a valid test for determining the presence of sacroiliac joint dysfunction or the type of dysfunction present. Further research comparing the agreement of the sacral base pressure test or other sacroiliac joint dysfunction tests with a criterion standard of diagnosis is necessary. PMID:19674694
Validation of Metagenomic Next-Generation Sequencing Tests for Universal Pathogen Detection.

PubMed

Schlaberg, Robert; Chiu, Charles Y; Miller, Steve; Procop, Gary W; Weinstock, George

2017-06-01

Metagenomic sequencing can be used for detection of any pathogens using unbiased, shotgun next-generation sequencing (NGS), without the need for sequence-specific amplification. Proof-of-concept has been demonstrated in infectious disease outbreaks of unknown causes and in patients with suspected infections but negative results for conventional tests. Metagenomic NGS tests hold great promise to improve infectious disease diagnostics, especially in immunocompromised and critically ill patients. To discuss challenges and provide example solutions for validating metagenomic pathogen detection tests in clinical laboratories. A summary of current regulatory requirements, largely based on prior guidance for NGS testing in constitutional genetics and oncology, is provided. Examples from 2 separate validation studies are provided for steps from assay design, and validation of wet bench and bioinformatics protocols, to quality control and assurance. Although laboratory and data analysis workflows are still complex, metagenomic NGS tests for infectious diseases are increasingly being validated in clinical laboratories. Many parallels exist to NGS tests in other fields. Nevertheless, specimen preparation, rapidly evolving data analysis algorithms, and incomplete reference sequence databases are idiosyncratic to the field of microbiology and often overlooked.
Why Lessons Learned from the Past Require Haertel's Expanded Scope for Test Validation

ERIC Educational Resources Information Center

Shepard, Lorrie A.

2013-01-01

In his article, Haertel (this issue) asks a fundamental question about how use of a test is expected to cause improvements in the educational system and in learning. He also considers how test validity should be investigated and argues for a more expansive view of validity that does not stop with scoring or generalization (the more technical and…
Examining the Reliability and Validity of ADEPT and CELDT: Comparing Two Assessments of Oral Language Proficiency for English Language Learners

ERIC Educational Resources Information Center

Chavez, Gina

2013-01-01

Few classroom measures of English language proficiency have been evaluated for reliability and validity. This research examined the concurrent and predictive validity of an oral language test, titled A Developmental English Language Proficiency Test (ADEPT), and the relationship to the California English Language Development Test (CELDT) in the…
Impact on Participation and Autonomy: Test of Validity and Reliability for Older Persons.

PubMed

Hammar, Isabelle Ottenvall; Ekelund, Christina; Wilhelmson, Katarina; Eklund, Kajsa

2014-11-06

In research and healthcare it is important to measure older persons' self-determination in order to improve their possibilities to decide for themselves in daily life. The questionnaire Impact on Participation and Autonomy (IPA) assesses self-determination, but is not constructed for older persons. The aim of this study was to examine the validity and reliability of the IPA-S questionnaire for persons aged 70 years and older. The study was performed in two steps: first a validity test of the Swedish version of the questionnaire, IPA-S, followed by a reliability test-retest of an adjusted version. The validity was tested with focus groups and individual interviews on persons aged 77-88 years, and the reliability on persons aged 70-99 years. The validity test result showed that IPA-S is valid for older persons but it was too extensive and the phrasing of the items needed adjustments. The reliability test-retest on the adjusted questionnaire, IPA-Older persons (IPA-O), showed that 15 of 22 items had high agreement. IPA-O can be used to measure older persons' self-determination in their care and rehabilitation.
Examining the Validity of GED[R] Tests Scores with Scheduling and Setting Accommodations. GED Testing Service Research Studies, 2004-1

ERIC Educational Resources Information Center

George-Ezzelle, Carol E.; Skaggs, Gary

2004-01-01

Current testing standards call for test developers to provide evidence that testing procedures and test scores, and the inferences made based on the test scores, show evidence of validity and are comparable across subpopulations (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on…
Implementation and Initial Validation of the MDTP Tests at Golden West College.

ERIC Educational Resources Information Center

Isonio, Steven

In 1992, a study was conducted at Golden West College (California) to determine the predictive validity of the Math Diagnostic Testing Project (MDTP) tests. A total of 1,137 students were tested in-class; 601 took the Algebra Readiness test, 376 took the Elementary Algebra test, and 160 took the Intermediate Algebra test. Two correlation…
Dynamic testing in schizophrenia: does training change the construct validity of a test?

PubMed

Wiedl, Karl H; Schöttke, Henning; Green, Michael F; Nuechterlein, Keith H

2004-01-01

Dynamic testing typically involves specific interventions for a test to assess the extent to which test performance can be modified, beyond level of baseline (static) performance. This study used a dynamic version of the Wisconsin Card Sorting Test (WCST) that is based on cognitive remediation techniques within a test-training-test procedure. From results of previous studies with schizophrenia patients, we concluded that the dynamic and static versions of the WCST should have different construct validity. This hypothesis was tested by examining the patterns of correlations with measures of executive functioning, secondary verbal memory, and verbal intelligence. Results demonstrated a specific construct validity of WCST dynamic (i.e., posttest) scores as an index of problem solving (Tower of Hanoi) and secondary verbal memory and learning (Auditory Verbal Learning Test), whereas the impact of general verbal capacity and selective attention (Verbal IQ, Stroop Test) was reduced. It is concluded that the construct validity of the test changes with dynamic administration and that this difference helps to explain why the dynamic version of the WCST predicts functional outcome better than the static version.
Screening for cognitive impairment in older individuals. Validation study of a computer-based test.

PubMed

Green, R C; Green, J; Harrison, J M; Kutner, M H

1994-08-01

This study examined the validity of a computer-based cognitive test that was recently designed to screen the elderly for cognitive impairment. Criterion-related validity was examined by comparing test scores of impaired patients and normal control subjects. Construct-related validity was computed through correlations between computer-based subtests and related conventional neuropsychological subtests. University center for memory disorders. Fifty-two patients with mild cognitive impairment by strict clinical criteria and 50 unimpaired, age- and education-matched control subjects. Control subjects were rigorously screened by neurological, neuropsychological, imaging, and electrophysiological criteria to identify and exclude individuals with occult abnormalities. Using a cut-off total score of 126, this computer-based instrument had a sensitivity of 0.83 and a specificity of 0.96. Using a prevalence estimate of 10%, predictive values, positive and negative, were 0.70 and 0.96, respectively. Computer-based subtests correlated significantly with conventional neuropsychological tests measuring similar cognitive domains. Thirteen (17.8%) of 73 volunteers with normal medical histories were excluded from the control group, with unsuspected abnormalities on standard neuropsychological tests, electroencephalograms, or magnetic resonance imaging scans. Computer-based testing is a valid screening methodology for the detection of mild cognitive impairment in the elderly, although this particular test has important limitations. Broader applications of computer-based testing will require extensive population-based validation. Future studies should recognize that normal control subjects without a history of disease who are typically used in validation studies may have a high incidence of unsuspected abnormalities on neurodiagnostic studies.
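The predictive values quoted above follow from sensitivity, specificity, and an assumed prevalence via Bayes' rule. The sketch below works through that arithmetic with the rounded figures given in the abstract (sensitivity 0.83, specificity 0.96, prevalence 10%). The function name is illustrative and not taken from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # P(impaired | positive screen)
    npv = true_neg / (true_neg + false_neg)   # P(unimpaired | negative screen)
    return ppv, npv


ppv, npv = predictive_values(sensitivity=0.83, specificity=0.96, prevalence=0.10)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With these rounded inputs the positive predictive value comes out at about 0.70, in line with the abstract. The negative predictive value comes out nearer 0.98 than the reported 0.96; the abstract does not give the underlying counts, so the source of that difference cannot be determined from the text alone.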
The validity of upper-limb neurodynamic tests for detecting peripheral neuropathic pain.

PubMed

Nee, Robert J; Jull, Gwendolen A; Vicenzino, Bill; Coppieters, Michel W

2012-05-01

The validity of upper-limb neurodynamic tests (ULNTs) for detecting peripheral neuropathic pain (PNP) was assessed by reviewing the evidence on plausibility, the definition of a positive test, reliability, and concurrent validity. Evidence was identified by a structured search for peer-reviewed articles published in English before May 2011. The quality of concurrent validity studies was assessed with the Quality Assessment of Diagnostic Accuracy Studies tool, where appropriate. Biomechanical and experimental pain data support the plausibility of ULNTs. Evidence suggests that a positive ULNT should at least partially reproduce the patient's symptoms and that structural differentiation should change these symptoms. Data indicate that this definition of a positive ULNT is reliable when used clinically. Limited evidence suggests that the median nerve test, but not the radial nerve test, helps determine whether a patient has cervical radiculopathy. The median nerve test does not help diagnose carpal tunnel syndrome. These findings should be interpreted cautiously, because diagnostic accuracy might have been distorted by the investigators' definitions of a positive ULNT. Furthermore, patients with PNP who presented with increased nerve mechanosensitivity rather than conduction loss might have been incorrectly classified by electrophysiological reference standards as not having PNP. The only evidence for concurrent validity of the ulnar nerve test was a case study on cubital tunnel syndrome. We recommend that researchers develop more comprehensive reference standards for PNP to accurately assess the concurrent validity of ULNTs and continue investigating the predictive validity of ULNTs for prognosis or treatment response.
Validity and reliability of Patient-Reported Outcomes Measurement Information System (PROMIS) Instruments in Osteoarthritis

PubMed Central

Broderick, Joan E.; Schneider, Stefan; Junghaenel, Doerte U.; Schwartz, Joseph E.; Stone, Arthur A.

2013-01-01

Objective Evaluation of known group validity, ecological validity, and test-retest reliability of four domain instruments from the Patient Reported Outcomes Measurement System (PROMIS) in osteoarthritis (OA) patients. Methods Recruitment of an osteoarthritis sample and a comparison general population (GP) through an Internet survey panel. Pain intensity, pain interference, physical functioning, and fatigue were assessed for 4 consecutive weeks with PROMIS short forms on a daily basis and compared with same-domain Computer Adaptive Test (CAT) instruments that use a 7-day recall. Known group validity (comparison of OA and GP), ecological validity (comparison of aggregated daily measures with CATs), and test-retest reliability were evaluated. Results The recruited samples matched (age, sex, race, ethnicity) the demographic characteristics of the U.S. sample for arthritis and the 2009 Census for the GP. Compliance with repeated measurements was excellent: > 95%. Known group validity for CATs was demonstrated with large effect sizes (pain intensity: 1.42, pain interference: 1.25, and fatigue: .85). Ecological validity was also established through high correlations between aggregated daily measures and weekly CATs (≥ .86). Test-retest reliability (7-day) was very good (≥ .80). Conclusion PROMIS CAT instruments demonstrated known group and ecological validity in a comparison of osteoarthritis patients with a general population sample. Adequate test-retest reliability was also observed. These data provide encouraging initial data on the utility of these PROMIS instruments for clinical and research outcomes in osteoarthritis patients. PMID:23592494
The Validity and Incremental Validity of Knowledge Tests, Low-Fidelity Simulations, and High-Fidelity Simulations for Predicting Job Performance in Advanced-Level High-Stakes Selection

ERIC Educational Resources Information Center

Lievens, Filip; Patterson, Fiona

2011-01-01

In high-stakes selection among candidates with considerable domain-specific knowledge and experience, investigations of whether high-fidelity simulations (assessment centers; ACs) have incremental validity over low-fidelity simulations (situational judgment tests; SJTs) are lacking. Therefore, this article integrates research on the validity of…
Overview of Heat Addition and Efficiency Predictions for an Advanced Stirling Convertor

NASA Technical Reports Server (NTRS)

Wilson, Scott D.; Reid, Terry; Schifer, Nicholas; Briggs, Maxwell

2011-01-01

Past methods of predicting net heat input needed to be validated. Validation effort pursued with several paths including improving model inputs, using test hardware to provide validation data, and validating high fidelity models. Validation test hardware provided direct measurement of net heat input for comparison to predicted values. Predicted value of net heat input was 1.7 percent less than measured value and initial calculations of measurement uncertainty were 2.1 percent (under review). Lessons learned during validation effort were incorporated into convertor modeling approach which improved predictions of convertor efficiency.
Criterion-Related Validity of Sit-and-Reach Tests for Estimating Hamstring and Lumbar Extensibility: a Meta-Analysis

PubMed Central

Mayorga-Vega, Daniel; Merino-Marban, Rafael; Viciana, Jesús

2014-01-01

The main purpose of the present meta-analysis was to examine the scientific literature on the criterion-related validity of sit-and-reach tests for estimating hamstring and lumbar extensibility. For this purpose relevant studies were searched from seven electronic databases dated up through December 2012. Primary outcomes of criterion-related validity were Pearson's zero-order correlation coefficients (r) between sit-and-reach tests and hamstrings and/or lumbar extensibility criterion measures. Then, from the included studies, the Hunter-Schmidt's psychometric meta-analysis approach was conducted to estimate population criterion-related validity of sit-and-reach tests. Firstly, the corrected correlation mean (rp), unaffected by statistical artefacts (i.e., sampling error and measurement error), was calculated separately for each sit-and-reach test. Subsequently, the three potential moderator variables (sex of participants, age of participants, and level of hamstring extensibility) were examined by a partially hierarchical analysis. Of the 34 studies included in the present meta-analysis, 99 correlation values across eight sit-and-reach tests and 51 across seven sit-and-reach tests were retrieved for hamstring and lumbar extensibility, respectively. The overall results showed that all sit-and-reach tests had a moderate mean criterion-related validity for estimating hamstring extensibility (rp = 0.46-0.67), but they had a low mean for estimating lumbar extensibility (rp = 0.16-0.35). Generally, females, adults and participants with high levels of hamstring extensibility tended to have greater mean values of criterion-related validity for estimating hamstring extensibility. When the use of angular tests is limited such as in a school setting or in large scale studies, scientists and practitioners could use the sit-and-reach tests as a useful alternative for hamstring extensibility estimation, but not for estimating lumbar extensibility.
Key Points Overall sit-and-reach tests have a moderate mean criterion-related validity for estimating hamstring extensibility, but they have a low mean validity for estimating lumbar extensibility. Among all the sit-and-reach test protocols, the Classic sit-and-reach test seems to be the best option to estimate hamstring extensibility. End scores (e.g., the Classic sit-and-reach test) are a better indicator of hamstring extensibility than the modifications that incorporate fingers-to-box distance (e.g., the Modified sit-and-reach test). When angular tests such as straight leg raise or knee extension tests cannot be used, sit-and-reach tests seem to be a useful field test alternative to estimate hamstring extensibility, but not to estimate lumbar extensibility. PMID:24570599
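The corrected correlation mean (rp) described in this record can be illustrated with one common form of the Hunter-Schmidt procedure: each observed correlation is divided by an attenuation factor built from the reliabilities of the two measures, and the corrected values are averaged with weights equal to sample size times the squared attenuation factor. This is a sketch under assumptions. The abstract does not state which correction variant was used, and all study values below are hypothetical.

```python
from math import sqrt


def corrected_mean_correlation(studies):
    """Weighted mean of correlations corrected for measurement error.

    studies: iterable of (n, r, rxx, ryy), where n is the sample size, r the
    observed correlation, and rxx / ryy the reliabilities of the two measures.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for n, r, rxx, ryy in studies:
        attenuation = sqrt(rxx * ryy)     # factor by which r is shrunk
        r_corrected = r / attenuation     # disattenuated correlation
        weight = n * attenuation ** 2     # less reliable studies count less
        weighted_sum += weight * r_corrected
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical studies: (sample size, observed r, reliability x, reliability y)
example = [(50, 0.50, 0.90, 0.85), (120, 0.60, 0.95, 0.90), (80, 0.45, 0.88, 0.80)]
print(round(corrected_mean_correlation(example), 3))
```

When every reliability is 1.0 the function reduces to the plain sample-size-weighted mean correlation, which is a quick check on the weighting.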
Translation, cultural adaptation and validation of the Diabetes Attitudes Scale - third version into Brazilian Portuguese

PubMed Central

Vieira, Gisele de Lacerda Chaves; Pagano, Adriana Silvino; Reis, Ilka Afonso; Rodrigues, Júlia Santos Nunes; Torres, Heloísa de Carvalho

2018-01-01

ABSTRACT Objective: to perform the translation, adaptation and validation of the Diabetes Attitudes Scale - third version instrument into Brazilian Portuguese. Methods: methodological study carried out in six stages: initial translation, synthesis of the initial translation, back-translation, evaluation of the translated version by the Committee of Judges (27 Linguists and 29 health professionals), pre-test and validation. The pre-test and validation (test-retest) steps included 22 and 120 health professionals, respectively. The Content Validity Index, the analyses of internal consistency and reproducibility were performed using the R statistical program. Results: in the content validation, the instrument presented good acceptance among the Judges with a mean Content Validity Index of 0.94. The scale presented acceptable internal consistency (Cronbach’s alpha = 0.60), while the correlation of the total score at the test and retest moments was considered high (Polychoric Correlation Coefficient = 0.86). The Intra-class Correlation Coefficient, for the total score, presented a value of 0.65. Conclusion: the Brazilian version of the instrument (Escala de Atitudes dos Profissionais em relação ao Diabetes Mellitus) was considered valid and reliable for application by health professionals in Brazil. PMID:29319739
Issues in cross-cultural validity: example from the adaptation, reliability, and validity testing of a Turkish version of the Stanford Health Assessment Questionnaire.

PubMed

Küçükdeveci, Ayse A; Sahin, Hülya; Ataman, Sebnem; Griffiths, Bridget; Tennant, Alan

2004-02-15

Guidelines have been established for cross-cultural adaptation of outcome measures. However, invariance across cultures must also be demonstrated through analysis of Differential Item Functioning (DIF). This is tested in the context of a Turkish adaptation of the Health Assessment Questionnaire (HAQ). Internal construct validity of the adapted HAQ is assessed by Rasch analysis; reliability, by internal consistency and the intraclass correlation coefficient; external construct validity, by association with impairments and American College of Rheumatology functional stages. Cross-cultural validity is tested through DIF by comparison with data from the UK version of the HAQ. The adapted version of the HAQ demonstrated good internal construct validity through fit of the data to the Rasch model (mean item fit 0.205; SD 0.998). Reliability was excellent (alpha = 0.97) and external construct validity was confirmed by expected associations. DIF for culture was found in only 1 item. Cross-cultural validity was found to be sufficient for use in international studies between the UK and Turkey. Future adaptation of instruments should include analysis of DIF at the field testing stage in the adaptation process.
Validation of Social Cognition Rating Tools in Indian Setting (SOCRATIS): A new test-battery to assess social cognition.

PubMed

Mehta, Urvakhsh M; Thirthalli, Jagadisha; Naveen Kumar, C; Mahadevaiah, Mahesh; Rao, Kiran; Subbakrishna, Doddaballapura K; Gangadhar, Bangalore N; Keshavan, Matcheri S

2011-09-01

Social cognition is a cognitive domain that is under substantial cultural influence. There are no culturally appropriate standardized tools in India to comprehensively test social cognition. This study describes validation of tools for three social cognition constructs: theory of mind, social perception and attributional bias. Theory of mind tests included adaptations of, (a) two first order tasks [Sally-Anne and Smarties task], (b) two second order tasks [Ice cream van and Missing cookies story], (c) two metaphor-irony tasks and (d) the faux pas recognition test. Internal, Personal, and Situational Attributions Questionnaire (IPSAQ) and Social Cue Recognition Test were adapted to assess attributional bias and social perception, respectively. These tests were first modified to suit the Indian cultural context without changing the constructs to be tested. A panel of experts then rated the tests on Likert scales as to (1) whether the modified tasks tested the same construct as in the original and (2) whether they were culturally appropriate. The modified tests were then administered to groups of actively symptomatic and remitted schizophrenia patients as well as healthy comparison subjects. All tests of the Social Cognition Rating Tools in Indian Setting had good content validity and known groups validity. In addition, the social cue recognition test in Indian setting had good internal consistency and concurrent validity. Copyright © 2011 Elsevier B.V. All rights reserved.

On Nomological Validity and Auxiliary Assumptions: The Importance of Simultaneously Testing Effects in Social Cognitive Theories Applied to Health Behavior and Some Guidelines

PubMed Central

Hagger, Martin S.; Gucciardi, Daniel F.; Chatzisarantis, Nikos L. D.

2017-01-01

Tests of social cognitive theories provide informative data on the factors that relate to health behavior, and the processes and mechanisms involved. In the present article, we contend that tests of social cognitive theories should adhere to the principles of nomological validity, defined as the degree to which predictions in a formal theoretical network are confirmed. We highlight the importance of nomological validity tests to ensure theory predictions can be disconfirmed through observation. We argue that researchers should be explicit on the conditions that lead to theory disconfirmation, and identify any auxiliary assumptions on which theory effects may be conditional. We contend that few researchers formally test the nomological validity of theories, or outline conditions that lead to model rejection and the auxiliary assumptions that may explain findings that run counter to hypotheses, raising potential for ‘falsification evasion.’ We present a brief analysis of studies (k = 122) testing four key social cognitive theories in health behavior to illustrate deficiencies in reporting theory tests and evaluations of nomological validity. Our analysis revealed that few articles report explicit statements suggesting that their findings support or reject the hypotheses of the theories tested, even when findings point to rejection. We illustrate the importance of explicit a priori specification of fundamental theory hypotheses and associated auxiliary assumptions, and identification of the conditions which would lead to rejection of theory predictions. We also demonstrate the value of confirmatory analytic techniques, meta-analytic structural equation modeling, and Bayesian analyses in providing robust converging evidence for nomological validity. We provide a set of guidelines for researchers on how to adopt and apply the nomological validity approach to testing health behavior models. PMID:29163307
Economic analysis of model validation for a challenge problem

DOE PAGES

Paez, Paul J.; Paez, Thomas L.; Hasselman, Timothy K.

2016-02-19

It is now commonplace for engineers to build mathematical models of the systems they are designing, building, or testing. And, it is nearly universally accepted that phenomenological models of physical systems must be validated prior to use for prediction in consequential scenarios. Yet, there are certain situations in which testing only or no testing and no modeling may be economically viable alternatives to modeling and its associated testing. This paper develops an economic framework within which benefit–cost can be evaluated for modeling and model validation relative to other options. The development is presented in terms of a challenge problem. As a result, we provide a numerical example that quantifies when modeling, calibration, and validation yield higher benefit–cost than a testing only or no modeling and no testing option.
Psychometric Arabic Sino-Nasal Outcome Test-22: validation and translation in chronic rhinosinusitis patients.

PubMed

Alanazy, Fatma; Dousary, Surayie Al; Albosaily, Ahmed; Aldriweesh, Turki; Alsaleh, Saad; Aldrees, Turki

2018-01-01

The Sino-Nasal Outcome Test (SNOT)-22 has multiple items that reflect how nasal disease affects quality of life. Currently, no validated Arabic version of the SNOT-22 is available. To develop an Arabic-validated version of SNOT-22. Prospective. Tertiary care center. This single-center validation study was conducted between 2015 and 2017 at King Abdul-Aziz University Hospital, Riyadh, Saudi Arabia. The SNOT-22 English version was translated into Arabic by the forward and backward method. The test and retest reliability, internal consistency, responsiveness to surgical treatment, discriminant validity, sensitivity and specificity all were tested. Validated Arabic version of the SNOT-22. Of 265 individuals, 171 were healthy volunteers and 94 were chronic rhinosinusitis patients. The Arabic version showed high internal consistency (Cronbach's alpha of 0.94), and the ability to differentiate between diseased and healthy volunteers (P < .001). The translated versions demonstrated the ability to detect the change scores significantly in response to intervention (P < .001). This is the first validated Arabic version of SNOT-22. The instrument can be used among the Arabic population. No subjects from other Arab countries.
Several submaximal exercise tests are reliable, valid and acceptable in people with chronic pain, fibromyalgia or chronic fatigue: a systematic review.

PubMed

Ratter, Julia; Radlinger, Lorenz; Lucas, Cees

2014-09-01

Are submaximal and maximal exercise tests reliable, valid and acceptable in people with chronic pain, fibromyalgia and fatigue disorders? Systematic review of studies of the psychometric properties of exercise tests. People older than 18 years with chronic pain, fibromyalgia and chronic fatigue disorders. Studies of the measurement properties of tests of physical capacity in people with chronic pain, fibromyalgia or chronic fatigue disorders were included. Studies were required to report: reliability coefficients (intraclass correlation coefficient, alpha reliability coefficient, limits of agreement and Bland-Altman plots); validity coefficients (intraclass correlation coefficient, Spearman's correlation, Kendall's tau coefficient, Pearson's correlation); or dropout rates. Fourteen studies were eligible: none had low risk of bias, 10 had unclear risk of bias and four had high risk of bias. The included studies evaluated: Åstrand test; modified Åstrand test; Lean body mass-based Åstrand test; submaximal bicycle ergometer test following a protocol other than the Åstrand test; 2-km walk test; 5-minute, 6-minute and 10-minute walk tests; shuttle walk test; and modified symptom-limited Bruce treadmill test. None of the studies assessed maximal exercise tests. Where they had been tested, reliability and validity were generally high. Dropout rates were generally acceptable. The 2-km walk test was not recommended in fibromyalgia. Moderate evidence was found for reliability, validity and acceptability of submaximal exercise tests in patients with chronic pain, fibromyalgia or chronic fatigue. There is no evidence about maximal exercise tests in patients with chronic pain, fibromyalgia and chronic fatigue. Copyright © 2014. Published by Elsevier B.V.
Reliability, Validity, and Cross-Cultural Adaptation of the Turkish Version of the Bournemouth Questionnaire.

PubMed

Gunaydin, Gurkan; Citaker, Seyit; Meray, Jale; Cobanoglu, Gamze; Gunaydin, Ozge Ece; Hazar Kanik, Zeynep

2016-11-01

Validation of a self-report questionnaire. The purpose of this study was to investigate adaptation, validity, and reliability of the Turkish version of the Bournemouth Questionnaire. Low back pain is one of the most frequent disorders leading to activity limitation. This pain affects most people during their lives. Evaluating a patient's functional abilities and deciding on a successful therapy procedure depend above all on administering the assessment questionnaires precisely. One hundred ten patients with chronic low back pain were included in the present study. To assess reliability, test-retest and internal consistency analyses were applied. The results of test-retest analysis were assessed by using the Intraclass Correlation Coefficient method (95% confidence interval). For internal consistency, the Cronbach alpha value was calculated. Validity of the questionnaire was assessed in terms of construct validity. For construct validity, factor analysis and convergent validity were tested. For convergent validity, total points of the Bournemouth Questionnaire were assessed with the total points of the Quebec Back Pain Disability Scale and the Roland Morris Disability Questionnaire by using Pearson correlation coefficient analysis. The Cronbach alpha value was 0.914, showing that this questionnaire has high internal consistency. The results of test-retest analysis ranged from 0.851 to 0.927, which shows that test-retest results are highly correlated. Factor analysis indicated that this questionnaire had one factor. The Pearson correlation coefficient of the Bournemouth Questionnaire was 0.703 with the Roland Morris Disability Questionnaire and 0.659 with the Quebec Back Pain Disability Scale. These results showed that the Bournemouth Questionnaire correlates well with the Roland Morris Disability Questionnaire and the Quebec Back Pain Disability Scale. The Turkish version of the Bournemouth Questionnaire is valid and reliable. 3.
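Internal consistency in this record is summarized by a Cronbach alpha value (0.914). As a minimal sketch of what that statistic measures, the function below computes alpha from a respondents-by-items score matrix using the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The response data are hypothetical and have no connection to the study's sample.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))                    # one tuple per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: 5 respondents answering 4 items on a 0-10 scale.
responses = [
    [2, 3, 2, 4],
    [5, 5, 6, 5],
    [7, 8, 7, 9],
    [3, 2, 4, 3],
    [9, 8, 9, 8],
]
print(round(cronbach_alpha(responses), 3))
```

Alpha approaches 1 when items rise and fall together across respondents, so the total-score variance is much larger than the sum of the individual item variances.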
Reliability and validity of Kano Test for Social Nicotine Dependence (KTSND), and development of its revised scale assessing the psychosocial acceptability of smoking among university students.

PubMed

Kitada, Masako; Musashi, Manabu; Kano, Masato

2011-08-01

To examine reliability and validity of Kano Test for Social Nicotine Dependence (KTSND), a scale assessing the psychosocial acceptability of smoking, and to develop a new version when validity or reliability of KTSND was not acceptable. We carried out a self-administered cross-sectional survey on undergraduate university students. The participants completed the KTSND, and supplemented three questions on the attitudes toward tobacco control policies and smoking states. Using daily smokers, we examined the relationship between the KTSND and Fagerström Test for Nicotine Dependence (FTND). In each study, we examined test-retest reliability and construct validity, discriminant and convergent validity, and factor validity. Although the KTSND had high internal consistency (Cronbach's alpha 0.82) and high test-retest reliability (r=0.72), the results of factor analysis were unacceptable; we expected three factors to be extracted, however, only two factors of "Overestimate of smoking usefulness" and "Allege smoking as a taste and/or culture" were extracted. Using the Kano's Test for Assessing Acceptability of Smoking (KTAAS), the new version of KTSND in which a question was replaced with another one, the third factor of "Neglect of harm of tobacco smoking" was extracted adding to the above-mentioned two. KTAAS had also both high internal consistency (Cronbach's alpha 0.82) and test-retest reliability (r=0.66). Overall, the KTSND and the KTAAS score differed according to smoking states, and the nonsmokers' scores were the lowest. The KTSND was a popular questionnaire in Japan, however, its validity assessed using factor analysis was not acceptable, while KTAAS had sufficient reliability and validity, and might assess the cognition and attitude affirming or accepting tobacco smoking among university students.
Test-retest reliability and construct validity of the ENERGY-child questionnaire on energy balance-related behaviours and their potential determinants: the ENERGY-project.

PubMed

Singh, Amika S; Vik, Froydis N; Chinapaw, Mai J M; Uijtdewilligen, Léonie; Verloigne, Maïté; Fernández-Alvira, Juan M; Stomfai, Sarolta; Manios, Yannis; Martens, Marloes; Brug, Johannes

2011-12-09

Insight in children's energy balance-related behaviours (EBRBs) and their determinants is important to inform obesity prevention research. Therefore, reliable and valid tools to measure these variables in large-scale population research are needed. To examine the test-retest reliability and construct validity of the child questionnaire used in the ENERGY-project, measuring EBRBs and their potential determinants among 10-12 year old children. We collected data among 10-12 year old children (n = 730 in the test-retest reliability study; n = 96 in the construct validity study) in six European countries, i.e. Belgium, Greece, Hungary, the Netherlands, Norway, and Spain. Test-retest reliability was assessed using the intra-class correlation coefficient (ICC) and percentage agreement comparing scores from two measurements, administered one week apart. To assess construct validity, the agreement between questionnaire responses and a subsequent face-to-face interview was assessed using ICC and percentage agreement. Of the 150 questionnaire items, 115 (77%) showed good to excellent test-retest reliability as indicated by ICCs > .60 or percentage agreement ≥ 75%. Test-retest reliability was moderate for 34 items (23%) and poor for one item. Construct validity appeared to be good to excellent for 70 (47%) of the 150 items, as indicated by ICCs > .60 or percentage agreement ≥ 75%. From the other 80 items, construct validity was moderate for 39 (26%) and poor for 41 items (27%). Our results demonstrate that the ENERGY-child questionnaire, assessing EBRBs of the child as well as personal, family, and school-environmental determinants related to these EBRBs, has good test-retest reliability and moderate to good construct validity for the large majority of items.
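The "good to excellent" rule stated in this abstract (ICC > .60 or percentage agreement ≥ 75%) can be sketched as follows. The helper names are hypothetical, the ICC is taken as a precomputed input because the abstract does not say which ICC form was used, and the cut-offs for "moderate" and "poor" are not given in the abstract and are therefore not modelled here.

```python
def percentage_agreement(first, second):
    """Percentage of paired answers that are identical across two measurements."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty answer lists of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


def good_to_excellent(icc, agreement_pct):
    """Apply the threshold quoted in the abstract: ICC > .60 or agreement >= 75%."""
    return icc > 0.60 or agreement_pct >= 75.0


# Hypothetical answers to one item from four children, one week apart.
week1 = [1, 2, 3, 4]
week2 = [1, 2, 3, 5]
pct = percentage_agreement(week1, week2)
print(pct, good_to_excellent(icc=0.50, agreement_pct=pct))
```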
Test-retest reliability and construct validity of the ENERGY-child questionnaire on energy balance-related behaviours and their potential determinants: the ENERGY-project

PubMed Central

2011-01-01

Background Insight in children's energy balance-related behaviours (EBRBs) and their determinants is important to inform obesity prevention research. Therefore, reliable and valid tools to measure these variables in large-scale population research are needed. Objective To examine the test-retest reliability and construct validity of the child questionnaire used in the ENERGY-project, measuring EBRBs and their potential determinants among 10-12 year old children. Methods We collected data among 10-12 year old children (n = 730 in the test-retest reliability study; n = 96 in the construct validity study) in six European countries, i.e. Belgium, Greece, Hungary, the Netherlands, Norway, and Spain. Test-retest reliability was assessed using the intra-class correlation coefficient (ICC) and percentage agreement comparing scores from two measurements, administered one week apart. To assess construct validity, the agreement between questionnaire responses and a subsequent face-to-face interview was assessed using ICC and percentage agreement. Results Of the 150 questionnaire items, 115 (77%) showed good to excellent test-retest reliability as indicated by ICCs > .60 or percentage agreement ≥ 75%. Test-retest reliability was moderate for 34 items (23%) and poor for one item. Construct validity appeared to be good to excellent for 70 (47%) of the 150 items, as indicated by ICCs > .60 or percentage agreement ≥ 75%. From the other 80 items, construct validity was moderate for 39 (26%) and poor for 41 items (27%). Conclusions Our results demonstrate that the ENERGY-child questionnaire, assessing EBRBs of the child as well as personal, family, and school-environmental determinants related to these EBRBs, has good test-retest reliability and moderate to good construct validity for the large majority of items. PMID:22152048
Vacuum decay container closure integrity leak test method development and validation for a lyophilized product-package system.

PubMed

Patel, Jayshree; Mulhall, Brian; Wolf, Heinz; Klohr, Steven; Guazzo, Dana Morton

2011-01-01

A leak test performed according to ASTM F2338-09 Standard Test Method for Nondestructive Detection of Leaks in Packages by Vacuum Decay Method was developed and validated for container-closure integrity verification of a lyophilized product in a parenteral vial package system. This nondestructive leak test method is intended for use in manufacturing as an in-process package integrity check, and for testing product stored on stability in lieu of sterility tests. Method development and optimization challenge studies incorporated artificially defective packages representing a range of glass vial wall and sealing surface defects, as well as various elastomeric stopper defects. Method validation required 3 days of random-order replicate testing of a test sample population of negative-control, no-defect packages and positive-control, with-defect packages. Positive-control packages were prepared using vials each with a single hole laser-drilled through the glass vial wall. Hole creation and hole size certification was performed by Lenox Laser. Validation study results successfully demonstrated the vacuum decay leak test method's ability to accurately and reliably detect those packages with laser-drilled holes greater than or equal to approximately 5 μm in nominal diameter. All development and validation studies were performed at Whitehouse Analytical Laboratories in Whitehouse, NJ, under the direction of consultant Dana Guazzo of RxPax, LLC, using a VeriPac 455 Micro Leak Test System by Packaging Technologies & Inspection (Tuckahoe, NY). Bristol Myers Squibb (New Brunswick, NJ) fully subsidized all work. A leak test performed according to ASTM F2338-09 Standard Test Method for Nondestructive Detection of Leaks in Packages by Vacuum Decay Method was developed and validated to detect defects in stoppered vial packages containing lyophilized product for injection. 
This nondestructive leak test method is intended for use in manufacturing as an in-process package integrity check, and for testing product stored on stability in lieu of sterility tests. Test method validation study results proved the method capable of detecting holes laser-drilled through the glass vial wall greater than or equal to 5 μm in nominal diameter. Total test time is less than 1 min per package. All method development and validation studies were performed at Whitehouse Analytical Laboratories in Whitehouse, NJ, under the direction of consultant Dana Guazzo of RxPax, LLC, using a VeriPac 455 Micro Leak Test System by Packaging Technologies & Inspection (Tuckahoe, NY). Bristol Myers Squibb (New Brunswick, NJ) fully subsidized all work.
A structured interview for the DSM-III personality disorders. A preliminary report.

PubMed

Stangl, D; Pfohl, B; Zimmerman, M; Bowers, W; Corenthal, C

1985-06-01

With few exceptions, published studies fail to indicate that the DSM-III personality disorders can be distinguished from each other with respect to etiology, prognosis, treatment response, or family history. The Structured Interview for the DSM-III Personality Disorders (SIDP) was developed to improve axis II diagnostic reliability, and hence allow validity testing of axis II. Sixty-three subjects were independently rated by two interviewers using the SIDP. The kappa coefficients for interrater agreement reached .70 or higher for histrionic, borderline, and dependent personalities. While it is impossible to separate the validity testing of the SIDP from validity testing of the DSM-III personality criteria themselves, preliminary results from 102 inpatient SIDP interviews suggest some criterion-based validity with respect to standard personality rating scales and some construct validity with respect to the dexamethasone suppression test.
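Interrater agreement between two interviewers is commonly summarised with Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The sketch below assumes that unweighted Cohen's kappa is the statistic meant (the abstract only says "kappa coefficients"); the function name and the ratings are illustrative.

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters labelling the same subjects."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater1)
    # Observed proportion of subjects on which the raters agree.
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_chance = sum(c1[label] * c2[label] for label in c1.keys() | c2.keys()) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical diagnoses (1 = disorder present, 0 = absent) for four subjects.
print(cohen_kappa([1, 1, 0, 0], [1, 1, 0, 0]))  # perfect agreement
print(cohen_kappa([1, 1, 0, 0], [1, 0, 1, 0]))  # agreement at chance level
```

Kappa is undefined when chance agreement equals 1 (both raters use a single identical label for everyone); the sketch lets that case raise a division error.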
34 CFR 462.11 - What must an application contain?

Code of Federal Regulations, 2010 CFR

2010-07-01

... the methodology and procedures used to measure the reliability of the test. (h) Construct validity... previous test, and results from validity, reliability, and equating or standard-setting studies undertaken... NRS educational functioning levels (content validity). Documentation of the extent to which the items...
Development and validation of the Cancer Exercise Stereotypes Scale.

PubMed

Falzon, Charlène; Sabiston, Catherine; Bergamaschi, Alessandro; Corrion, Karine; Chalabaev, Aïna; D'Arripe-Longueville, Fabienne

2014-01-01

The objective of this study was to develop and validate a French-language questionnaire measuring stereotypes related to exercise in cancer patients: The Cancer Exercise Stereotypes Scale (CESS). Four successive steps were carried out with 806 participants. First, a preliminary version was developed on the basis of the relevant literature and qualitative interviews. A test of clarity then led to the reformulation of six of the 30 items. Second, based on the modification indices of the first confirmatory factorial analysis, 11 of the 30 initial items were deleted. A new factorial structure analysis showed a good fit and validated a 19-item instrument with five subscales. Third, the stability of the instrument was tested over time. Last, tests of construct validity were conducted to examine convergent validity and discriminant validity. The French-language CESS appears to have good psychometric qualities and can be used to test theoretical tenets and inform intervention strategies on ways to foster exercise in cancer patients.
Validity of Scientific Based Chemistry Android Module to Empower Science Process Skills (SPS) in Solubility Equilibrium

NASA Astrophysics Data System (ADS)

Antrakusuma, B.; Masykuri, M.; Ulfa, M.

2018-04-01

The evolution of Android technology can be applied to chemistry learning. One complex chemistry concept is solubility equilibrium, which requires science process skills (SPS). This study aims to determine: 1) the characteristics of a scientific-based chemistry Android module for empowering SPS, and 2) the validity of the module based on content validity and a feasibility test. This research uses a Research and Development (R&D) approach. Research subjects were 135 students and three teachers at three high schools in Boyolali, Central Java. Content validity of the module was tested by seven experts using Aiken's V technique, and the module's feasibility was tested with students and teachers in each school. A characteristic of the chemistry module is that it can be accessed using an Android device. Validation of the module contents yielded V = 0.89 (valid), and the feasibility test yielded 81.63% (students) and 73.98% (teachers), indicating that the module met the "good" criteria.
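Aiken's V, the content-validity index used in this record, is the sum of each expert's rating minus the lowest possible rating, divided by n(c − 1), where n is the number of experts and c the number of rating categories. A minimal sketch, assuming an integer 1 to 5 rating scale (the abstract does not state the scale) and made-up ratings from seven experts:

```python
def aikens_v(ratings, lowest, highest):
    """Aiken's V for one item rated by several experts on an integer scale.

    V = sum(r - lowest) / (n * (c - 1)), where c - 1 equals highest - lowest
    for an integer scale running from `lowest` to `highest`.
    """
    if not ratings or highest <= lowest:
        raise ValueError("need ratings and a scale with highest > lowest")
    if any(r < lowest or r > highest for r in ratings):
        raise ValueError("rating outside the declared scale")
    n = len(ratings)
    return sum(r - lowest for r in ratings) / (n * (highest - lowest))


# Hypothetical ratings from seven experts on an assumed 1-5 scale.
expert_ratings = [4, 5, 4, 3, 5, 4, 4]
print(round(aikens_v(expert_ratings, lowest=1, highest=5), 2))
```

V ranges from 0 (all experts give the lowest rating) to 1 (all give the highest); the threshold for calling an item "valid" depends on the number of raters and categories and is not specified here.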
The Method Effect in Communicative Testing.

ERIC Educational Resources Information Center

Canale, Michael

1981-01-01

A focus on test validity includes a consideration of the way a test measures that which it proposes to test; in other words, the validity of a test depends on method as well as content. This paper examines three areas of concern: (1) some features of communication that test method should reflect, (2) the main components of method, and (3) some…
The Unified Language Testing Plan: Speaking Proficiency Test. Russian Pilot Validation Studies. Report Number 2.

ERIC Educational Resources Information Center

Thornton, Julie A.

The report describes one segment of the Federal Language Testing Board's Unified Language Testing Plan (ULTP), the validation of the speaking proficiency test in Russian. The ULTP is a project to increase standardization of foreign language proficiency measurement and promote sharing of resources among testing programs in the federal government.…
The Unified Language Testing Plan: Speaking Proficiency Test. Spanish and English Pilot Validation Studies. Report Number 1.

ERIC Educational Resources Information Center

Thornton, Julie A.

This report describes one segment of the Federal Language Testing Board's Unified Language Testing Plan (ULTP), the validation of speaking proficiency tests in Spanish and English. The ULTP is a project to increase standardization of foreign language proficiency measurement and promote sharing of resources among testing programs in the federal…
CAPTIONALS: A computer aided testing environment for the verification and validation of communication protocols

NASA Technical Reports Server (NTRS)

Feng, C.; Sun, X.; Shen, Y. N.; Lombardi, Fabrizio

1992-01-01

This paper covers the verification and protocol validation for distributed computer and communication systems using a computer aided testing approach. Validation and verification make up the so-called process of conformance testing. Protocol applications which pass conformance testing are then checked to see whether they can operate together. This is referred to as interoperability testing. A new comprehensive approach to protocol testing is presented which addresses: (1) modeling for inter-layer representation for compatibility between conformance and interoperability testing; (2) computational improvement to current testing methods by using the proposed model inclusive of formulation of new qualitative and quantitative measures and time-dependent behavior; and (3) analysis and evaluation of protocol behavior for interactive testing without extensive simulation.
Veggie and the VEG-01 Hardware Validation Test

NASA Technical Reports Server (NTRS)

Massa, Gioia; Wheeler, Ray; Smith, Trent

2015-01-01

This presentation presents a brief overview of KSC plant science hardware for space and then details the Veggie hardware and the VEG-01 hardware validation test. The test results and future plans are discussed.
An Efficient Data Partitioning to Improve Classification Performance While Keeping Parameters Interpretable

PubMed Central

Korjus, Kristjan; Hebart, Martin N.; Vicente, Raul

2016-01-01

Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier’s generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach that we term “Cross-validation and cross-testing” improving this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do. PMID:27564393
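The "standard approach" this abstract contrasts itself with, cross-validation for parameter selection followed by a single evaluation on a separate test set, can be sketched as an index-splitting routine. This illustrates only that baseline, not the authors' "cross-validation and cross-testing" procedure; the function name and the deterministic (unshuffled) fold assignment are assumptions made for brevity.

```python
def holdout_then_kfold(n_samples, n_test, k):
    """Reserve the last n_test indices as a test set, then k-fold the rest.

    Returns (splits, test) where splits is a list of (train, validation)
    index lists used for parameter selection, and test is touched only once,
    after the parameters have been fixed.
    """
    if not 0 < n_test < n_samples or not 1 < k <= n_samples - n_test:
        raise ValueError("invalid split sizes")
    indices = list(range(n_samples))
    test = indices[n_samples - n_test:]
    rest = indices[:n_samples - n_test]
    folds = [rest[i::k] for i in range(k)]       # interleaved folds
    splits = []
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, validation))
    return splits, test


splits, test = holdout_then_kfold(n_samples=10, n_test=2, k=4)
for train, validation in splits:
    print("train", sorted(train), "validate", validation)
print("held-out test", test)
```

In practice the indices would be shuffled (and often stratified by class) before splitting; the trade-off the abstract describes comes from `n_test` samples being unavailable for training and parameter selection.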
High Fidelity Modeling of Field-Reversed Configuration (FRC) Thrusters (Briefing Charts)

DTIC Science & Technology

2017-05-24

Converged Math → Irrelevant Solutions? Validation: Fluids Example Stoke’s Flow MARTIN, SOUSA, TRAN (AFRL/RQRS) DISTRIBUTION A - APPROVED FOR PUBLIC RELEASE...Convergence Tests Converged Math → Irrelevant Solutions? Must be Aware of Valid Assumption Regions Validation: Fluids Example Stoke’s Flow Potential...AND VALIDATION Verification: Asymptotic Models → Analytical Solutions Yields Exact Convergence Tests Converged Math → Irrelevant Solutions? Must be

Construct Validity of the Nutrition and Activity Knowledge Scale in a French Sample of Adolescents with Mild to Moderate Intellectual Disability

ERIC Educational Resources Information Center

Maiano, Christophe; Begarie, Jerome; Morin, Alexandre J. S.; Garbarino, Jean-Marie; Ninot, Gregory

2010-01-01

The purpose of this study was to test the reliability (i.e. internal consistency and test-retest reliability) and construct validity (i.e. content validity, factor validity, measurement invariance, and latent mean invariance) of the Nutrition and Activity Knowledge Scale (NAKS) in a sample of French adolescents with mild to moderate Intellectual…
Reliability and validity of the Children's Fear Survey Schedule-Dental Subscale for Arabic-speaking children: a cross-sectional study.

PubMed

El-Housseiny, Azza A; Alsadat, Farah A; Alamoudi, Najlaa M; El Derwi, Douaa A; Farsi, Najat M; Attar, Moaz H; Andijani, Basil M

2016-04-14

Early recognition of dental fear is essential for the effective delivery of dental care. This study aimed to test the reliability and validity of the Arabic version of the Children's Fear Survey Schedule-Dental Subscale (CFSS-DS). A school-based sample of 1546 children was randomly recruited. The Arabic version of the CFSS-DS was completed by children during class time. The scale was tested for internal consistency and test-retest reliability. To test criterion validity, children's behavior was assessed using the Frankl scale during dental examination, and results were compared with children's CFSS-DS scores. To test the scale's construct validity, scores on "fear of going to the dentist soon" were correlated with CFSS-DS scores. Factor analysis was also used. The Arabic version of the CFSS-DS showed high reliability regarding both test-retest reliability (intraclass correlation = 0.83, p < 0.001) and internal consistency (Cronbach's α = 0.88). It showed good criterion validity: children with negative behavior had significantly higher fear scores (t = 13.67, p < 0.001). It also showed moderate construct validity (Spearman's rho correlation, r = 0.53, p < 0.001). Factor analysis identified the following factors: "fear of invasive dental procedures," "fear of less invasive dental procedures" and "fear of strangers." The Arabic version of the CFSS-DS is a reliable and valid measure of dental fear in Arabic-speaking children. Pediatric dentists and researchers may use this validated version of the CFSS-DS to measure dental fear in Arabic-speaking children.
Utility of NIST Whole-Genome Reference Materials for the Technical Validation of a Multigene Next-Generation Sequencing Test.

PubMed

Shum, Bennett O V; Henner, Ilya; Belluoccio, Daniele; Hinchcliffe, Marcus J

2017-07-01

The sensitivity and specificity of next-generation sequencing laboratory developed tests (LDTs) are typically determined by an analyte-specific approach. Analyte-specific validations use disease-specific controls to assess an LDT's ability to detect known pathogenic variants. Alternatively, a methods-based approach can be used for LDT technical validations. Methods-focused validations do not use disease-specific controls but use benchmark reference DNA that contains known variants (benign, variants of unknown significance, and pathogenic) to assess variant calling accuracy of a next-generation sequencing workflow. Recently, four whole-genome reference materials (RMs) from the National Institute of Standards and Technology (NIST) were released to standardize methods-based validations of next-generation sequencing panels across laboratories. We provide a practical method for using NIST RMs to validate multigene panels. We analyzed the utility of RMs in validating a novel newborn screening test that targets 70 genes, called NEO1. Despite the NIST RM variant truth set originating from multiple sequencing platforms, replicates, and library types, we discovered a 5.2% false-negative variant detection rate in the RM truth set genes that were assessed in our validation. We developed a strategy using complementary non-RM controls to demonstrate 99.6% sensitivity of the NEO1 test in detecting variants. Our findings have implications for laboratories or proficiency testing organizations using whole-genome NIST RMs for testing. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
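Sensitivity and the false-negative rate quoted in this abstract are complementary quantities derived from counts of detected and missed known variants. A minimal sketch with hypothetical counts (the study's actual variant counts are not given in the abstract):

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of known variants that the test detects: TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no known positives to evaluate")
    return true_positives / total


def false_negative_rate(true_positives, false_negatives):
    """Fraction of known variants that the test misses: 1 - sensitivity."""
    return 1.0 - sensitivity(true_positives, false_negatives)


# Hypothetical counts: 996 truth-set variants called, 4 missed.
print(sensitivity(996, 4), round(false_negative_rate(996, 4), 3))
```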
Clinical Functional Capacity Testing in Patients With Facioscapulohumeral Muscular Dystrophy: Construct Validity and Interrater Reliability of Antigravity Tests.

PubMed

Rijken, Noortje H; van Engelen, Baziel G; Weerdesteyn, Vivian; Geurts, Alexander C

2015-12-01

To evaluate the construct validity and interrater reliability of 4 simple antigravity tests in a small group of patients with facioscapulohumeral muscular dystrophy (FSHD). Case-control study. University medical center. Patients with various severity levels of FSHD (n=9) and healthy control subjects (n=10) were included (N=19). Not applicable. A 4-point ordinal scale was designed to grade performance on the following 4 antigravity tests: sit to stance, stance to sit, step up, and step down. In addition, the 6-minute walk test, 10-m walking test, Berg Balance Scale, and timed Up and Go test were administered as conventional tests. Construct validity was determined by linear regression analysis using the Clinical Severity Score (CSS) as the dependent variable. Interrater agreement was tested using a κ analysis. Patients with FSHD performed worse on all 4 antigravity tests compared with the controls. Stronger correlations were found within than between test categories (antigravity vs conventional). The antigravity tests revealed the highest explained variance with regard to the CSS (R²=.86, P=.014). Interrater agreement was generally good. The results of this exploratory study support the construct validity and interrater reliability of the proposed antigravity tests for the assessment of functional capacity in patients with FSHD taking into account the use of compensatory strategies. Future research should further validate these results in a larger sample of patients with FSHD. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Investigation of the Lollipop Test as a Pre-Kindergarten Screening Instrument.

ERIC Educational Resources Information Center

Chew, Alex L.; Morris, John D.

1987-01-01

The validity of the Lollipop Test: A Diagnostic Screening Test of School Readiness was examined for 129 pre-kindergarten subjects using the Developmental Indicator for the Assessment of Learning as the criterion. Concurrent validity was demonstrated across the test batteries. The Lollipop Test appears to be an attractive alternative…
Cultural Adaptation of the Portuguese Version of the “Sniffin’ Sticks” Smell Test: Reliability, Validity, and Normative Data

PubMed Central

Ribeiro, João Carlos; Simões, João; Silva, Filipe; Silva, Eduardo D.; Hummel, Cornelia; Hummel, Thomas; Paiva, António

2016-01-01

The cross-cultural adaptation and validation of the Sniffin' Sticks test for the Portuguese population is described. Over 270 people participated in four experiments. In Experiment 1, 67 participants rated the familiarity of presented odors and seven descriptors of the original test were adapted to a Portuguese context. In Experiment 2, the Portuguese version of the Sniffin' Sticks test was administered to 203 healthy participants. Older age, male gender and active smoking status were confirmed as confounding factors. The third experiment showed the validity of the Portuguese version of the Sniffin' Sticks test in discriminating healthy controls from patients with olfactory dysfunction. In Experiment 4, the test-retest reliability for both the composite score (r71 = 0.86) and the identification test (r71 = 0.62) was established (p<0.001). Normative data for the Portuguese version of the Sniffin' Sticks test are provided, showing good validity and reliability and effectively distinguishing patients from healthy controls with high sensitivity and specificity. The identification test of the Portuguese version of the Sniffin' Sticks test is a clinically suitable screening tool in routine outpatient Portuguese settings. PMID:26863023
Perspectives on Validation of High-Throughput Assays Supporting 21st Century Toxicity Testing

PubMed Central

Judson, Richard; Kavlock, Robert; Martin, Matt; Reif, David; Houck, Keith; Knudsen, Thomas; Richard, Ann; Tice, Raymond R.; Whelan, Maurice; Xia, Menghang; Huang, Ruili; Austin, Christopher; Daston, George; Hartung, Thomas; Fowle, John R.; Wooge, William; Tong, Weida; Dix, David

2014-01-01

Summary In vitro, high-throughput screening (HTS) assays are seeing increasing use in toxicity testing. HTS assays can simultaneously test many chemicals, but have seen limited use in the regulatory arena, in part because of the need to undergo rigorous, time-consuming formal validation. Here we discuss streamlining the validation process, specifically for prioritization applications in which HTS assays are used to identify a high-concern subset of a collection of chemicals. The high-concern chemicals could then be tested sooner rather than later in standard guideline bioassays. The streamlined validation process would continue to ensure the reliability and relevance of assays for this application. We discuss the following practical guidelines: (1) follow current validation practice to the extent possible and practical; (2) make increased use of reference compounds to better demonstrate assay reliability and relevance; (3) deemphasize the need for cross-laboratory testing; and (4) implement a web-based, transparent and expedited peer review process. PMID:23338806
The Anomalous Sentences Repetition Test: Replication and Validation Study.

ERIC Educational Resources Information Center

Weeks, David J.

1986-01-01

Presents a brief clinical test, derived from earlier neuropsychological instruments, with evidence for its reliability, interscorer agreement, and validity. The latter is based upon correlations with both CAT scan measures of cortical atrophy and ventricular enlargement, as well as correlations with seven other previously validated cognitive…
40 CFR 761.389 - Testing parameter requirements.

Code of Federal Regulations, 2012 CFR

2012-07-01

... variable testing parameters described in this section which may be used in the validation study. The conditions demonstrated in the validation study for these variables shall become the required conditions for.... During the validation study, use the same ratio of contaminated surface area to soak solvent volume as...
40 CFR 761.389 - Testing parameter requirements.

Code of Federal Regulations, 2014 CFR

2014-07-01

... variable testing parameters described in this section which may be used in the validation study. The conditions demonstrated in the validation study for these variables shall become the required conditions for.... During the validation study, use the same ratio of contaminated surface area to soak solvent volume as...
40 CFR 761.389 - Testing parameter requirements.

Code of Federal Regulations, 2013 CFR

2013-07-01

... variable testing parameters described in this section which may be used in the validation study. The conditions demonstrated in the validation study for these variables shall become the required conditions for.... During the validation study, use the same ratio of contaminated surface area to soak solvent volume as...
Assessing Discriminative Performance at External Validation of Clinical Prediction Models

PubMed Central

Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W.

2016-01-01

Introduction External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting. Methods We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated them in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. Results The permutation test indicated that the validation and development set were homogenous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. Conclusion The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population. 
To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients. PMID:26881753
Assessing Discriminative Performance at External Validation of Clinical Prediction Models.

PubMed

Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W

2016-01-01

External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting. We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated them in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. The permutation test indicated that the validation and development set were homogenous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population. To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients.
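The c-statistic discussed in this record measures discrimination: the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. A minimal pairwise sketch follows; the function name and toy predictions are illustrative, and the permutation test and benchmark values evaluated in the paper are not reproduced here.

```python
def c_statistic(predicted, outcomes):
    """Concordance (c) statistic for binary outcomes coded 1 (event) and 0.

    Compares every event/non-event pair; a pair is concordant when the
    event case has the higher predicted risk, and ties count 0.5.
    """
    events = [p for p, y in zip(predicted, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted risks and observed outcomes for four patients.
print(c_statistic([0.9, 0.2, 0.3, 0.1], [1, 1, 0, 0]))
```

Because the statistic depends on how spread out the predicted risks are, the same model can show a lower c-statistic in a more homogeneous validation population, which is the case-mix effect the abstract warns about.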
Assessing working memory in children with ADHD: Minor administration and scoring changes may improve digit span backward's construct validity.

PubMed

Wells, Erica L; Kofler, Michael J; Soto, Elia F; Schaefer, Hillary S; Sarver, Dustin E

2018-01-01

Pediatric ADHD is associated with impairments in working memory, but these deficits often go undetected when using clinic-based tests such as digit span backward. The current study pilot-tested minor administration/scoring modifications to improve digit span backward's construct and predictive validities in a well-characterized sample of children with ADHD. WISC-IV digit span was modified to administer all trials (i.e., ignore discontinue rule) and count digits rather than trials correct. Traditional and modified scores were compared to a battery of criterion working memory (construct validity) and academic achievement tests (predictive validity) for 34 children with ADHD ages 8-13 (M=10.41; 11 girls). Traditional digit span backward scores failed to predict working memory or KTEA-2 achievement (all ns). Alternate administration/scoring of digit span backward significantly improved its associations with working memory reordering (r=.58), working memory dual-processing (r=.53), working memory updating (r=.28), and KTEA-2 achievement (r=.49). Consistent with prior work, these findings urge caution when interpreting digit span performance. Minor test modifications may address test validity concerns, and should be considered in future test revisions. Digit span backward becomes a valid measure of working memory at exactly the point that testing is traditionally discontinued. Copyright © 2017 Elsevier Ltd. All rights reserved.
The development and psychometric testing of a Disaster Response Self-Efficacy Scale among undergraduate nursing students.

PubMed

Li, Hong-Yan; Bi, Rui-Xue; Zhong, Qing-Ling

2017-12-01

Disaster nurse education has received increasing attention in China. Knowing the disaster response abilities of undergraduate nursing students is beneficial for promoting teaching and learning. However, there are few valid and reliable tools that measure the abilities of disaster response in undergraduate nursing students. The aim was to develop a self-report scale of self-efficacy in disaster response for Chinese undergraduate nursing students and test its psychometric properties. Nursing students (N=318) from two medical colleges were chosen by purposive sampling. The Disaster Response Self-Efficacy Scale (DRSES) was developed and psychometrically tested. Reliability and content validity were studied. Construct validity was tested by exploratory and confirmatory factor analysis. Reliability was tested by internal consistency and test-retest reliability. The DRSES consisted of 3 factors and 19 items with a 5-point rating. The content validity was 0.91, Cronbach's alpha coefficient was 0.912, and the intraclass correlation coefficient for test-retest reliability was 0.953. The construct validity was good (χ²/df=2.440, RMSEA=0.068, NFI=0.907, CFI=0.942, IFI=0.430, p<0.001). The newly developed DRSES has demonstrated good reliability and validity. It could therefore be used as an assessment tool to evaluate self-efficacy in disaster response for Chinese undergraduate nursing students. Copyright © 2017. Published by Elsevier Ltd.
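Several records in this listing report Cronbach's alpha as the internal consistency measure. As a minimal sketch, alpha can be computed from a respondents-by-items score matrix as k/(k-1) times one minus the ratio of summed item variances to the variance of the total score. The function name and the response data below are hypothetical and are not taken from any of the studies listed here.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*item_scores)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 5-point ratings from five respondents on three items.
responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
print(round(cronbach_alpha(responses), 3))
```

When every item ranks respondents identically with equal spread, the formula returns exactly 1; weaker agreement among items pulls the value down.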
Comprehensive validation scheme for in situ fiber optics dissolution method for pharmaceutical drug product testing.

PubMed

Mirza, Tahseen; Liu, Qian Julie; Vivilecchia, Richard; Joshi, Yatindra

2009-03-01

There has been a growing interest during the past decade in the use of fiber optics dissolution testing. Use of this novel technology is mainly confined to research and development laboratories. It has not yet emerged as a tool for end product release testing despite its ability to generate in situ results and efficiency improvement. One potential reason may be the lack of clear validation guidelines that can be applied for the assessment of suitability of fiber optics. This article describes a comprehensive validation scheme and development of a reliable, robust, reproducible and cost-effective dissolution test using fiber optics technology. The test was successfully applied for characterizing the dissolution behavior of a 40-mg immediate-release tablet dosage form that is under development at Novartis Pharmaceuticals, East Hanover, New Jersey. The method was validated for the following parameters: linearity, precision, accuracy, specificity, and robustness. In particular, robustness was evaluated in terms of probe sampling depth and probe orientation. The in situ fiber optic method was found to be comparable to the existing manual sampling dissolution method. Finally, the fiber optic dissolution test was successfully performed by different operators on different days, to further enhance the validity of the method. The results demonstrate that the fiber optics technology can be successfully validated for end product dissolution/release testing. (c) 2008 Wiley-Liss, Inc. and the American Pharmacists Association
Correlation Results for a Mass Loaded Vehicle Panel Test Article Finite Element Models and Modal Survey Tests

NASA Technical Reports Server (NTRS)

Maasha, Rumaasha; Towner, Robert L.

2012-01-01

High-fidelity Finite Element Models (FEMs) were developed to support a recent test program at Marshall Space Flight Center (MSFC). The FEMs correspond to test articles used for a series of acoustic tests. Modal survey tests were used to validate the FEMs for five acoustic tests (a bare panel and four different mass-loaded panel configurations). An additional modal survey test was performed on the empty test fixture (orthogrid panel mounting fixture, between the reverb and anechoic chambers). Modal survey tests were used to test-validate the dynamic characteristics of FEMs used for acoustic test excitation. Modal survey testing and subsequent model correlation have validated the natural frequencies and mode shapes of the FEMs. The modal survey test results provide a basis for the analysis models used for acoustic loading response test and analysis comparisons.
Objectivity, Reliability, and Validity of the Bent-Knee Push-Up for College-Age Women

ERIC Educational Resources Information Center

Wood, Heather M.; Baumgartner, Ted A.

2004-01-01

The revised push-up test has been found to have good validity but it produces many zero scores for women. Maybe there should be an alternative to the revised push-up test for college-age women. The purpose of this study was to determine the objectivity, reliability, and validity for the bent-knee push-up test (executed on hands and knees) for…
Math Placement Validation Study: A Summary of the Criterion-Related Validity Evidence and Multiple Measures Data for the San Diego Community College District.

ERIC Educational Resources Information Center

Armstrong, William B.

In Fall 1994, the San Diego Community College District (SDCCD), in California, conducted a study to determine the validity of the Mathematics Diagnostic Testing Project (MDTP) placement test. The MDTP provides tests at four levels (i.e., algebra readiness, elementary algebra, intermediate algebra, and pre-calculus) and is used in the District for…
Criterion-Related Validity of the Distance- and Time-Based Walk/Run Field Tests for Estimating Cardiorespiratory Fitness: A Systematic Review and Meta-Analysis.

PubMed

Mayorga-Vega, Daniel; Bocanegra-Parrilla, Raúl; Ornelas, Martha; Viciana, Jesús

2016-01-01

The main purpose of the present meta-analysis was to examine the criterion-related validity of the distance- and time-based walk/run tests for estimating cardiorespiratory fitness among apparently healthy children and adults. Relevant studies were searched from seven electronic bibliographic databases up to August 2015 and through other sources. The Hunter-Schmidt's psychometric meta-analysis approach was conducted to estimate the population criterion-related validity of the following walk/run tests: 5,000 m, 3 miles, 2 miles, 3,000 m, 1.5 miles, 1 mile, 1,000 m, ½ mile, 600 m, 600 yd, ¼ mile, 15 min, 12 min, 9 min, and 6 min. From the 123 included studies, a total of 200 correlation values were analyzed. The overall results showed that the criterion-related validity of the walk/run tests for estimating maximum oxygen uptake ranged from low to moderate (rp = 0.42-0.79), with the 1.5 mile (rp = 0.79, 0.73-0.85) and 12 min walk/run tests (rp = 0.78, 0.72-0.83) having the higher criterion-related validity for distance- and time-based field tests, respectively. The present meta-analysis also showed that sex, age and maximum oxygen uptake level do not seem to affect the criterion-related validity of the walk/run tests. When the evaluation of an individual's maximum oxygen uptake attained during a laboratory test is not feasible, the 1.5 mile and 12 min walk/run tests represent useful alternatives for estimating cardiorespiratory fitness. As in the assessment with any physical fitness field test, evaluators must be aware that the performance score of the walk/run field tests is simply an estimation and not a direct measure of cardiorespiratory fitness.
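The pooling step behind a psychometric meta-analysis of correlations can be sketched as follows. This is a bare-bones version only: a sample-size-weighted mean correlation, the weighted observed variance, and the variance expected from sampling error alone. It omits the artifact corrections (for example for unreliability) that a full Hunter-Schmidt analysis may apply, and the study values are hypothetical rather than drawn from the 123 included studies.

```python
def bare_bones_meta(studies):
    """studies: iterable of (sample_size, correlation) pairs."""
    studies = list(studies)
    total_n = sum(n for n, _ in studies)
    # Sample-size-weighted mean correlation.
    mean_r = sum(n * r for n, r in studies) / total_n
    # Sample-size-weighted observed variance of the correlations.
    observed_var = sum(n * (r - mean_r) ** 2 for n, r in studies) / total_n
    # Variance expected from sampling error, based on the average sample size.
    mean_n = total_n / len(studies)
    sampling_var = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    # Variance left after removing sampling error (floored at zero).
    residual_var = max(observed_var - sampling_var, 0.0)
    return mean_r, observed_var, sampling_var, residual_var


# Hypothetical studies: (sample size, correlation with measured VO2max).
example = [(100, 0.5), (300, 0.7)]
mean_r, observed_var, sampling_var, residual_var = bare_bones_meta(example)
print(mean_r, observed_var, sampling_var, residual_var)
```

Weighting by sample size means larger studies pull the pooled estimate toward their own correlation, which is why the mean here sits closer to 0.7 than to 0.5.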

The Trunk Impairment Scale - modified to ordinal scales in the Norwegian version.

PubMed

Gjelsvik, Bente; Breivik, Kyrre; Verheyden, Geert; Smedal, Tori; Hofstad, Håkon; Strand, Liv Inger

2012-01-01

To translate the Trunk Impairment Scale (TIS), a measure of trunk control in patients after stroke, into Norwegian (TIS-NV), and to explore its construct validity, internal consistency, intertester and test-retest reliability. TIS was translated according to international guidelines. The validity study was performed on data from 201 patients with acute stroke. Fifty patients with stroke and acquired brain injury were recruited to examine intertester and test-retest reliability. Construct validity was analyzed with exploratory and confirmatory factor analysis and item response theory, internal consistency with Cronbach's alpha test, and intertester and test-retest reliability with kappa and intraclass correlation coefficient tests. The back-translated version of TIS-NV was validated by the original developer. The subscale Static sitting balance was removed. By combining items from the subscales Dynamic sitting balance and Coordination, six ordinal superitems (testlets) were constructed. The TIS-NV was renamed the modified TIS-NV (TIS-modNV). After modifications the TIS-modNV fitted well to a locally dependent unidimensional item response theory model. It demonstrated good construct validity, excellent internal consistency, and high intertester and test-retest reliability for the total score. This study supports that the TIS-modNV is a valid and reliable scale for use in clinical practice and research.
Validation of the Arabic Version of the Internet Gaming Disorder-20 Test.

PubMed

Hawi, Nazir S; Samaha, Maya

2017-04-01

In recent years, researchers have been trying to shed light on gaming addiction and its association with different psychiatric disorders and psychological determinants. The latest edition of the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) included in its Section 3 Internet Gaming Disorder (IGD) as a condition for further empirical study and proposed nine criteria for the diagnosis of IGD. The 20-item Internet Gaming Disorder (IGD-20) Test was developed as a valid and reliable tool to assess gaming addiction based on the nine criteria set by the DSM-5. The aim of this study is to validate an Arabic version of the IGD-20 Test. The Arabic version of IGD-20 will not only help in identifying Arabic-speaking pathological gamers but also stimulate cross-cultural studies that could contribute to an area in need of more research for insight and treatment. After a process of translation and back-translation and with the participation of a sizable sample of Arabic-speaking adolescents, the present study conducted a psychometric validation of the IGD-20 Test. Our confirmatory factor analysis showed the validity of the Arabic version of the IGD-20 Test. The one-factor model of the Arabic IGD-20 Test had very good psychometric properties, and it fitted the sample data extremely well. In addition, correlation analysis between the IGD-20 Test and the daily duration of gameplay on weekdays and weekends revealed significant positive relationships that warranted a criterion-related validation. Thus, the Arabic version of the IGD-20 Test is a valid and reliable measure of IGD among Arabic-speaking populations.
Does IQ Really Predict Job Performance?

PubMed Central

Richardson, Ken; Norgate, Sarah H.

2015-01-01

IQ has played a prominent part in developmental and adult psychology for decades. In the absence of a clear theoretical model of internal cognitive functions, however, construct validity for IQ tests has always been difficult to establish. Test validity, therefore, has always been indirect, by correlating individual differences in test scores with what are assumed to be other criteria of intelligence. Job performance has, for several reasons, been one such criterion. Correlations of around 0.5 have been regularly cited as evidence of test validity, and as justification for the use of the tests in developmental studies, in educational and occupational selection and in research programs on sources of individual differences. Here, those correlations are examined together with the quality of the original data and the many corrections needed to arrive at them. It is concluded that considerable caution needs to be exercised in citing such correlations for test validation purposes. PMID:26405429
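The abstract above does not say which corrections were applied. One standard correction of this general kind is the classical adjustment for measurement unreliability (disattenuation), which divides the observed correlation by the square root of the product of the two reliabilities. The sketch below shows only that formula; the function name and the numbers are hypothetical and are not taken from the studies reviewed.

```python
from math import sqrt


def disattenuate(observed_r, reliability_x, reliability_y):
    """Classical correction for attenuation due to measurement error."""
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return observed_r / sqrt(reliability_x * reliability_y)


# Hypothetical values: a modest observed correlation and an unreliable criterion.
observed = 0.24
corrected = disattenuate(observed, reliability_x=1.0, reliability_y=0.64)
print(corrected)  # about 0.30: the corrected value exceeds the observed one
```

Because the corrected value depends directly on the assumed reliabilities, a low reliability estimate inflates the result, which is one reason such corrections call for caution.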
The Role of Testing in Affirmative Action.

ERIC Educational Resources Information Center

Manning, Winton H.

Graphs and charts pertaining to testing in affirmative action are presented. Data concern the following: the predictive validity of College Board admissions tests using freshman grade point average as the criterion; validity coefficients of undergraduate grade point average (UGPA) alone, Law School Admission Test (LSAT) scores, and undergraduate…
Tests examining skill outcomes in sport: a systematic review of measurement properties and feasibility.

PubMed

Robertson, Samuel J; Burnett, Angus F; Cochrane, Jodie

2014-04-01

A high level of participant skill is influential in determining the outcome of many sports. Thus, tests assessing skill outcomes in sport are commonly used by coaches and researchers to estimate an athlete's ability level, to evaluate the effectiveness of interventions or for the purpose of talent identification. The objective of this systematic review was to examine the methodological quality, measurement properties and feasibility characteristics of sporting skill outcome tests reported in the peer-reviewed literature. A search of both SPORTDiscus and MEDLINE databases was undertaken. Studies that examined tests of sporting skill outcomes were reviewed. Only studies that investigated measurement properties of the test (reliability or validity) were included. A total of 22 studies met the inclusion/exclusion criteria. A customised checklist of assessment criteria, based on previous research, was utilised for the purpose of this review. A range of sports were the subject of the 22 studies included in this review, with considerations relating to methodological quality being generally well addressed by authors. A range of methods and statistical procedures were used by researchers to determine the measurement properties of their skill outcome tests. The majority (95%) of the reviewed studies investigated test-retest reliability, and where relevant, inter and intra-rater reliability was also determined. Content validity was examined in 68% of the studies, with most tests investigating multiple skill domains relevant to the sport. Only 18% of studies assessed all three reviewed forms of validity (content, construct and criterion), with just 14% investigating the predictive validity of the test. Test responsiveness was reported in only 9% of studies, whilst feasibility received varying levels of attention. In organised sport, further tests may exist which have not been investigated in this review. 
This could be due to such tests firstly not being published in the peer-review literature and secondly, not having their measurement properties (i.e., reliability or validity) examined formally. Of the 22 studies included in this review, items relating to test methodological quality were, on the whole, well addressed. Test-retest reliability was determined in all but one of the reviewed studies, whilst most studies investigated at least two aspects of validity (i.e., content, construct or criterion-related validity). Few studies examined predictive validity or responsiveness. While feasibility was addressed in over half of the studies, practicality and test limitations were rarely addressed. Consideration of study quality, measurement properties and feasibility components assessed in this review can assist future researchers when developing or modifying tests of sporting skill outcomes.
Validation of the German version of the Ford Insomnia Response to Stress Test.

PubMed

Dieck, Arne; Helbig, Susanne; Drake, Christopher L; Backhaus, Jutta

2018-06-01

The purpose of this study was to assess the psychometric properties of a German version of the Ford Insomnia Response to Stress Test with groups with and without sleep problems. Three studies were analysed. Data set 1 was based on an initial screening for a sleep training program (n = 393), data set 2 was based on a study to test the test-retest reliability of the Ford Insomnia Response to Stress Test (n = 284) and data set 3 was based on a study to examine the influence of competitive sport on sleep (n = 37). Data sets 1 and 2 were used to test internal consistency, factor structure, convergent validity, discriminant validity and test-retest reliability of the Ford Insomnia Response to Stress Test. Content validity was tested using data set 3. Cronbach's alpha of the Ford Insomnia Response to Stress Test was good (α = 0.80) and test-retest reliability was satisfactory (r = 0.72). Overall, the one-factor model showed the best fit. Furthermore, significant positive correlations between the Ford Insomnia Response to Stress Test and impaired sleep quality, depression and stress reactivity were in line with the expectations regarding the convergent validity. Subjects with sleep problems had significantly higher scores in the Ford Insomnia Response to Stress Test than subjects without sleep problems (P < 0.01). Competitive athletes with higher scores in the Ford Insomnia Response to Stress Test had significantly lower sleep quality (P = 0.01), demonstrating that vulnerability for stress-induced sleep disturbances accompanies poorer sleep quality in stressful episodes. The findings show that the German version of the Ford Insomnia Response to Stress Test is a reliable and valid questionnaire to assess the vulnerability to stress-induced sleep disturbances. © 2017 European Sleep Research Society.
The ad-libitum alcohol 'taste test': secondary analyses of potential confounds and construct validity.

PubMed

Jones, Andrew; Button, Emily; Rose, Abigail K; Robinson, Eric; Christiansen, Paul; Di Lemma, Lisa; Field, Matt

2016-03-01

Motivation to drink alcohol can be measured in the laboratory using an ad-libitum 'taste test', in which participants rate the taste of alcoholic drinks whilst their intake is covertly monitored. Little is known about the construct validity of this paradigm. The objective of this study was to investigate variables that may compromise the validity of this paradigm and its construct validity. We re-analysed data from 12 studies from our laboratory that incorporated an ad-libitum taste test. We considered time of day and participants' awareness of the purpose of the taste test as potential confounding variables. We examined whether gender, typical alcohol consumption, subjective craving, scores on the Alcohol Use Disorders Identification Test and perceived pleasantness of the drinks predicted ad-libitum consumption (construct validity). We included 762 participants (462 female). Participant awareness and time of day were not related to ad-libitum alcohol consumption. Males drank significantly more alcohol than females (p < 0.001), and individual differences in typical alcohol consumption (p = 0.04), craving (p < 0.001) and perceived pleasantness of the drinks (p = 0.04) were all significant predictors of ad-libitum consumption. We found little evidence that time of day or participant awareness influenced alcohol consumption. The construct validity of the taste test was supported by relationships between ad-libitum consumption and typical alcohol consumption, craving and pleasantness ratings of the drinks. The ad-libitum taste test is a valid method for the assessment of alcohol intake in the laboratory.
Performance validation of the ANSER control laws for the F-18 HARV

NASA Technical Reports Server (NTRS)

Messina, Michael D.

1995-01-01

The ANSER control laws were implemented in Ada by NASA Dryden for flight test on the High Alpha Research Vehicle (HARV). The Ada implementation was tested in the hardware-in-the-loop (HIL) simulation, and results were compared to those obtained with the NASA Langley batch Fortran implementation of the control laws which are considered the 'truth model.' This report documents the performance validation test results between these implementations. This report contains the ANSER performance validation test plan, HIL versus batch time-history comparisons, simulation scripts used to generate checkcases, and detailed analysis of discrepancies discovered during testing.
Development and validation of a new questionnaire assessing quality of life in adults with hypopituitarism: Adult Hypopituitarism Questionnaire (AHQ).

PubMed

Ishii, Hitoshi; Shimatsu, Akira; Okimura, Yasuhiko; Tanaka, Toshiaki; Hizuka, Naomi; Kaji, Hidesuke; Hanew, Kunihiko; Oki, Yutaka; Yamashiro, Sayuri; Takano, Koji; Chihara, Kazuo

2012-01-01

To develop and validate the Adult Hypopituitarism Questionnaire (AHQ) as a disease-specific, self-administered questionnaire for evaluation of quality of life (QOL) in adult patients with hypopituitarism. We developed and validated this new questionnaire, using a standardized procedure which included item development, pilot-testing and psychometric validation. Of the patients who participated in psychometric validation, those whose clinical conditions were judged to be stable were asked to answer the survey questionnaire twice, in order to assess test-retest reliability. Content validity of the initial questionnaire was evaluated via two pilot tests. After these tests, we made minor revisions and finalized the initial version of the questionnaire. The questionnaire was constructed with two domains, one psycho-social and the other physical. For psychometric assessment, analyses were performed on the responses of 192 adult patients with various types of hypopituitarism. The intraclass correlations of the respective domains were 0.91 and 0.95, and the Cronbach's alpha coefficients were 0.96 and 0.95, indicating adequate test-retest reliability and internal consistency for each domain. For known-group validity, patients with hypopituitarism due to hypothalamic disorder showed significantly lower scores in 11 out of 13 sub-domains compared to those who had hypopituitarism due to pituitary disorder. Regarding construct validity, the domain structure was found to be almost the same as that initially hypothesized. Exploratory factor analysis (n = 228) demonstrated that each domain consisted of six and seven sub-domains. The AHQ showed good reliability and validity for evaluating QOL in adult patients with hypopituitarism.
Development and Validity Testing of an Arthritis Self-Management Assessment Tool.

PubMed

Oh, HyunSoo; Han, SunYoung; Kim, SooHyun; Seo, WhaSook

Because of the chronic, progressive nature of arthritis and the substantial effects it has on quality of life, patients may benefit from self-management. However, no valid, reliable self-management assessment tool has been devised for patients with arthritis. This study was conducted to develop a comprehensive self-management assessment tool for patients with arthritis, that is, the Arthritis Self-Management Assessment Tool (ASMAT). To develop a list of qualified items corresponding to the conceptual definitions and attributes of arthritis self-management, a measurement model was established on the basis of theoretical and empirical foundations. Content validity testing was conducted to evaluate whether listed items were suitable for assessing arthritis self-management. Construct validity and reliability of the ASMAT were tested. Construct validity was examined using confirmatory factor analysis and nomological validity. The 32-item ASMAT was developed with a sample composed of patients in a clinic in South Korea. Content validity testing validated the 32 items, which comprised medical (10 items), behavioral (13 items), and psychoemotional (9 items) management subscales. Construct validity testing of the ASMAT showed that the 32 items properly corresponded with conceptual constructs of arthritis self-management, and were suitable for assessing self-management ability in patients with arthritis. Reliability was also well supported. The ASMAT devised in the present study may aid the evaluation of patient self-management ability and the effectiveness of self-management interventions. The authors believe the developed tool may also aid the identification of problems associated with the adoption of self-management practice, and thus improve symptom management, independence, and quality of life of patients with arthritis.
The development and psychometric testing of East Asian Acculturation Scale among Asian immigrant women in Taiwan.

PubMed

Kuo, Shu-Fen; Chang, Wen-Yin; Chang, Lu-I; Chou, Yu-Hua; Chen, Ching-Min

2013-01-01

This is a report of the development and psychometric testing of the East Asian Acculturation Measure-Chinese version (EAAM-C) scale. An instrument validation design with a cross-sectional survey was used. The process was carried out in two phases. In Phase 1, Barry's East Asian Acculturation Measure was translated and back-translated to evaluate its content validity, face validity, and feasibility. In Phase 2, the 16-item EAAM-C was pilot-tested among 485 female immigrants for test-retest reliability, internal consistency, theoretically-supported construct validity and concurrent validity. The pilot work and the survey results indicated the tool possessed adequate content and face validity. Cronbach's alpha was 0.72 for the EAAM-C and 0.76-0.79 for its subscales, and the test-retest correlation (at 3 weeks) was 0.75. After dropping one item, four theoretically-supported factors which explained 61.82% of the variance were extracted using exploratory factor analysis: assimilation, integration, separation, and marginalization. Based on the underlying four-factor theoretical structure of the EAAM, the EAAM-C was further examined with confirmatory factor analysis. The analysis revealed that the four-factor model was an acceptable fit for the data, which supported its construct validity. These factors were inter-correlated, and showed statistically significant correlation with the Chinese Health Questionnaire, indicating adequate concurrent validity. The scale shows acceptable validity and consistency, and suggests that immigrant acculturation is a complex construct. This quick evaluation instrument can be applied to assess clients' acculturation and in further developing certain interventions to improve their health.
Performance Tested Method multiple laboratory validation study of ELISA-based assays for the detection of peanuts in food.

PubMed

Park, Douglas L; Coates, Scott; Brewer, Vickery A; Garber, Eric A E; Abouzied, Mohamed; Johnson, Kurt; Ritter, Bruce; McKenzie, Deborah

2005-01-01

Performance Tested Method multiple laboratory validations for the detection of peanut protein in 4 different food matrixes were conducted under the auspices of the AOAC Research Institute. In this blind study, 3 commercially available ELISA test kits were validated: Neogen Veratox for Peanut, R-Biopharm RIDASCREEN FAST Peanut, and Tepnel BioKits for Peanut Assay. The food matrixes used were breakfast cereal, cookies, ice cream, and milk chocolate spiked at 0 and 5 ppm peanut. Analyses of the samples were conducted by laboratories representing industry and international and U.S. governmental agencies. All 3 commercial test kits successfully identified spiked and peanut-free samples. The validation study required 60 analyses on test samples at the target level of 5 µg peanut/g food and 60 analyses at a peanut-free level, which was designed to ensure that the lower 95% confidence limit for the sensitivity and specificity would not be <90%. The probability that a test sample contains an allergen given a prevalence rate of 5% and a positive test result using a single test kit analysis with 95% sensitivity and 95% specificity, which was demonstrated for these test kits, would be 50%. When 2 test kits are run simultaneously on all samples, the probability becomes 95%. It is therefore recommended that all field samples be analyzed with at least 2 of the validated kits.
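The 50% and 95% figures in the abstract above follow from Bayes' rule, and a short sketch makes the arithmetic explicit. It assumes that results from different kits are conditionally independent given the true status of the sample and that "positive" means every kit run on the sample is positive; the abstract does not state either assumption, and the function name is illustrative.

```python
def positive_predictive_value(prevalence, sensitivity, specificity, n_kits=1):
    """P(allergen present | all n_kits independent kits read positive)."""
    # Probability that a contaminated sample is positive on every kit.
    true_positive = prevalence * sensitivity ** n_kits
    # Probability that a clean sample is falsely positive on every kit.
    false_positive = (1 - prevalence) * (1 - specificity) ** n_kits
    return true_positive / (true_positive + false_positive)


# Values stated in the abstract: 5% prevalence, 95% sensitivity and specificity.
print(positive_predictive_value(0.05, 0.95, 0.95, n_kits=1))  # approximately 0.50
print(positive_predictive_value(0.05, 0.95, 0.95, n_kits=2))  # approximately 0.95
```

With one kit, true positives (0.05 × 0.95) and false positives (0.95 × 0.05) are equally likely, giving 50%. Requiring two positives cuts the false-positive term by a further factor of 0.05 while the true-positive term falls only by 0.95, which raises the probability to 95%.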
Tests for the Assessment of Sport-Specific Performance in Olympic Combat Sports: A Systematic Review With Practical Recommendations.

PubMed

Chaabene, Helmi; Negra, Yassine; Bouguezzi, Raja; Capranica, Laura; Franchini, Emerson; Prieske, Olaf; Hbacha, Hamdi; Granacher, Urs

2018-01-01

The regular monitoring of physical fitness and sport-specific performance is important in elite sports to increase the likelihood of success in competition. This study aimed to systematically review and to critically appraise the methodological quality, validation data, and feasibility of the sport-specific performance assessment in Olympic combat sports like amateur boxing, fencing, judo, karate, taekwondo, and wrestling. A systematic search was conducted in the electronic databases PubMed, Google-Scholar, and Science-Direct up to October 2017. Studies in combat sports were included that reported validation data (e.g., reliability, validity, sensitivity) of sport-specific tests. Overall, 39 studies were eligible for inclusion in this review. The majority of studies (74%) contained sample sizes <30 subjects. Nearly 1/3 of the reviewed studies lacked a sufficient description (e.g., anthropometrics, age, expertise level) of the included participants. Seventy-two percent of studies did not sufficiently report inclusion/exclusion criteria of their participants. In 62% of the included studies, the description and/or inclusion of a familiarization session(s) was either incomplete or nonexistent. Sixty percent of studies did not report any details about the stability of testing conditions. Approximately half of the studies examined reliability measures of the included sport-specific tests (intraclass correlation coefficient [ICC] = 0.43-1.00). Content validity was addressed in all included studies, criterion validity (only the concurrent aspect of it) in approximately half of the studies with correlation coefficients ranging from r = -0.41 to 0.90. Construct validity was reported in 31% of the included studies and predictive validity in only one. Test sensitivity was addressed in 13% of the included studies. The majority of studies (64%) ignored and/or provided incomplete information on test feasibility and methodological limitations of the sport-specific test. In 28% of the included studies, insufficient information or a complete lack of information was provided in the respective field of the test application. Several methodological gaps exist in studies that used sport-specific performance tests in Olympic combat sports. Additional research should adopt more rigorous validation procedures in the application and description of sport-specific performance tests in Olympic combat sports.
The Queensland high risk foot form (QHRFF) – is it a reliable and valid clinical research tool for foot disease?

PubMed Central

2014-01-01

Background: Foot disease complications, such as foot ulcers and infection, contribute to considerable morbidity and mortality. These complications are typically precipitated by “high-risk factors”, such as peripheral neuropathy and peripheral arterial disease. High-risk factors are more prevalent in specific “at risk” populations such as diabetes, kidney disease and cardiovascular disease. To the best of the authors’ knowledge a tool capturing multiple high-risk factors and foot disease complications in multiple at risk populations has yet to be tested. This study aimed to develop and test the validity and reliability of a Queensland High Risk Foot Form (QHRFF) tool. Methods: The study was conducted in two phases. Phase one developed a QHRFF using an existing diabetes foot disease tool, literature searches, stakeholder groups and expert panel. Phase two tested the QHRFF for validity and reliability. Four clinicians, representing different levels of expertise, were recruited to test validity and reliability. Three cohorts of patients were recruited; one tested criterion measure reliability (n = 32), another tested criterion validity and inter-rater reliability (n = 43), and another tested intra-rater reliability (n = 19). Validity was determined using sensitivity, specificity and positive predictive values (PPV). Reliability was determined using Kappa, weighted Kappa and intra-class correlation (ICC) statistics. Results: A QHRFF tool containing 46 items across seven domains was developed. Criterion measure reliability of at least moderate categories of agreement (Kappa > 0.4; ICC > 0.75) was seen in 91% (29 of 32) tested items. Criterion validity of at least moderate categories (PPV > 0.7) was seen in 83% (60 of 72) tested items. Inter- and intra-rater reliability of at least moderate categories (Kappa > 0.4; ICC > 0.75) was seen in 88% (84 of 96) and 87% (20 of 23) tested items respectively. Conclusions: The QHRFF had acceptable validity and reliability across the majority of items; particularly items identifying relevant co-morbidities, high-risk factors and foot disease complications. Recommendations have been made to improve or remove identified weaker items for future QHRFF versions. Overall, the QHRFF possesses suitable practicality, validity and reliability to assess and capture relevant foot disease items across multiple at risk populations. PMID:24468080
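As an illustration of the statistics named in this abstract, the following minimal Python sketch computes sensitivity, specificity, PPV and unweighted Cohen's kappa from 2x2 counts. The function names and the example counts are hypothetical and are not taken from the QHRFF study; the sketch assumes every denominator is non-zero.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and PPV from a 2x2 table (tool vs. criterion)."""
    sensitivity = tp / (tp + fn)   # true positives among criterion positives
    specificity = tn / (tn + fp)   # true negatives among criterion negatives
    ppv = tp / (tp + fp)           # true positives among tool positives
    return sensitivity, specificity, ppv


def cohen_kappa(both_yes, a_only, b_only, both_no):
    """Unweighted Cohen's kappa for two raters scoring a yes/no item."""
    n = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / n
    a_yes, b_yes = both_yes + a_only, both_yes + b_only
    a_no, b_no = n - a_yes, n - b_yes
    expected = (a_yes * b_yes + a_no * b_no) / n ** 2  # chance agreement
    return (observed - expected) / (1 - expected)


# Hypothetical counts, for illustration only.
sens, spec, ppv = diagnostic_metrics(tp=30, fp=5, fn=10, tn=55)
kappa = cohen_kappa(both_yes=20, a_only=5, b_only=10, both_no=15)
print(round(sens, 2), round(spec, 2), round(ppv, 2), round(kappa, 2))
```

Under the thresholds quoted in the abstract, an item would count as at least moderate when kappa exceeds 0.4 or PPV exceeds 0.7.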
Parameterization of Model Validating Sets for Uncertainty Bound Optimizations. Revised

NASA Technical Reports Server (NTRS)

Lim, K. B.; Giesy, D. P.

2000-01-01

Given measurement data, a nominal model and a linear fractional transformation uncertainty structure with an allowance on unknown but bounded exogenous disturbances, easily computable tests for the existence of a model validating uncertainty set are given. Under mild conditions, these tests are necessary and sufficient for the case of complex, nonrepeated, block-diagonal structure. For the more general case which includes repeated and/or real scalar uncertainties, the tests are only necessary but become sufficient if a collinearity condition is also satisfied. With the satisfaction of these tests, it is shown that a parameterization of all model validating sets of plant models is possible. The new parameterization is used as a basis for a systematic way to construct or perform uncertainty tradeoff with model validating uncertainty sets which have specific linear fractional transformation structure for use in robust control design and analysis. An illustrative example which includes a comparison of candidate model validating sets is given.
Adaptation and validation of the Tower of London test of planning and problem solving in people with intellectual disabilities.

PubMed

Masson, J D; Dagnan, D; Evans, J

2010-05-01

There is a need for validated, standardised tools for the assessment of executive functions in adults with intellectual disabilities (ID). This study examines the validity of a test of planning and problem solving (Tower of London) with adults with ID. Participants completed an adapted version of the Tower of London (ToL) while day-centre staff completed adaptive function (Adaptive Behaviour Scale - Residential and Community: Second Edition, modified version) and dysexecutive function (DEX-Independent Rater) questionnaires for each participant. Correlation analyses of test and questionnaire variables were undertaken. The adapted ToL has a robust structure and shows significant associations with independent living skills, challenging behaviour and behaviours related to dysexecutive function. The adapted ToL is a valid test for use with people with ID. However, there is also a need to develop other ecologically valid tools based on everyday planning tasks undertaken by people with ID.
Author Response to Sabour (2018), "Comment on Hall et al. (2017), 'How to Choose Between Measures of Tinnitus Loudness for Clinical Research? A Report on the Reliability and Validity of an Investigator-Administered Test and a Patient-Reported Measure Using Baseline Data Collected in a Phase IIa Drug Trial'".

PubMed

Hall, Deborah A; Mehta, Rajnikant L; Fackrell, Kathryn

2018-03-08

The authors respond to a letter to the editor (Sabour, 2018) concerning the interpretation of validity in the context of evaluating treatment-related change in tinnitus loudness over time. The authors refer to several landmark methodological publications and an international standard concerning the validity of patient-reported outcome measurement instruments. The tinnitus loudness rating performed better against our reported acceptability criteria for (face and convergent) validity than did the tinnitus loudness matching test. It is important to distinguish between tests that evaluate the validity of measuring treatment-related change over time and tests that quantify the accuracy of diagnosing tinnitus as a case and non-case.
Directed Design of Experiments for Validating Probability of Detection Capability of a Testing System

NASA Technical Reports Server (NTRS)

Generazio, Edward R. (Inventor)

2012-01-01

A method of validating a probability of detection (POD) testing system using directed design of experiments (DOE) includes recording an input data set of observed hit and miss or analog data for sample components as a function of size of a flaw in the components. The method also includes processing the input data set to generate an output data set having an optimal class width, assigning a case number to the output data set, and generating validation instructions based on the assigned case number. An apparatus includes a host machine for receiving the input data set from the testing system and an algorithm for executing DOE to validate the test system. The algorithm applies DOE to the input data set to determine a data set having an optimal class width, assigns a case number to that data set, and generates validation instructions based on the case number.
Practical Aspects of Designing and Conducting Validation Studies Involving Multi-study Trials.

PubMed

Coecke, Sandra; Bernasconi, Camilla; Bowe, Gerard; Bostroem, Ann-Charlotte; Burton, Julien; Cole, Thomas; Fortaner, Salvador; Gouliarmou, Varvara; Gray, Andrew; Griesinger, Claudius; Louhimies, Susanna; Gyves, Emilio Mendoza-de; Joossens, Elisabeth; Prinz, Maurits-Jan; Milcamps, Anne; Parissis, Nicholaos; Wilk-Zasadna, Iwona; Barroso, João; Desprez, Bertrand; Langezaal, Ingrid; Liska, Roman; Morath, Siegfried; Reina, Vittorio; Zorzoli, Chiara; Zuang, Valérie

This chapter focuses on practical aspects of conducting prospective in vitro validation studies, and in particular, by laboratories that are members of the European Union Network of Laboratories for the Validation of Alternative Methods (EU-NETVAL) that is coordinated by the EU Reference Laboratory for Alternatives to Animal Testing (EURL ECVAM). Prospective validation studies involving EU-NETVAL, comprising a multi-study trial involving several laboratories or "test facilities", typically consist of two main steps: (1) the design of the validation study by EURL ECVAM and (2) the execution of the multi-study trial by a number of qualified laboratories within EU-NETVAL, coordinated and supported by EURL ECVAM. The approach adopted in the conduct of these validation studies adheres to the principles described in the OECD Guidance Document on the Validation and International Acceptance of new or updated test methods for Hazard Assessment No. 34 (OECD 2005). The context and scope of conducting prospective in vitro validation studies is dealt with in Chap. 4. Here we focus mainly on the processes followed to carry out a prospective validation of in vitro methods involving different laboratories with the ultimate aim of generating a dataset that can support a decision in relation to the possible development of an international test guideline (e.g. by the OECD) or the establishment of performance standards.

A comprehensive review of the psychometric properties of the Drug Abuse Screening Test.

PubMed

Yudko, Errol; Lozhkina, Olga; Fouts, Adriana

2007-03-01

This article reviews the reliability and the validity of the (10-, 20-, and 28-item) Drug Abuse Screening Test (DAST). The reliability and the validity of the adolescent version of the DAST are also reviewed. An extensive literature review was conducted using the Medline and PsycINFO databases from the years 1982 to 2005. All articles that addressed the reliability and the validity of the DAST were examined. Publications in which the DAST was used as a screening tool but had no data on its psychometric properties were not included. Descriptive information about each version of the test, as well as discussion of the empirical literature that has explored measures of the reliability and the validity of the DAST, has been included. The DAST tended to have moderate to high levels of test-retest, interitem, and item-total reliabilities. The DAST also tended to have moderate to high levels of validity, sensitivity, and specificity. In general, all versions of the DAST yield satisfactory measures of reliability and validity for use as clinical or research tools. Furthermore, these tests are easy to administer and have been used in a variety of populations.
Creating a flipbook as a medium of instruction based on the research on activity test of kencur extract

NASA Astrophysics Data System (ADS)

Monika, Icha; Yeni, Laili Fitri; Ariyati, Eka

2016-02-01

This research aimed to reveal the validity of the flipbook as a medium of learning for the sub-material of environmental pollution in the tenth grade based on the results of the activity test of kencur (Kaempferia galanga) extract to control the growth of the Fusarium oxysporum fungus. The research consisted of two stages. First, testing the validity of the medium of flipbook through validation by seven assessors and analyzed based on the total average score of all aspects. Second, testing the activity of the kencur extract against the growth of Fusarium oxysporum by using the experimental method with 10 treatments and 3 repetitions which were analyzed using one-way analysis of variance (ANOVA) test. The making of the flipbook medium was done through the stages of analysis for the potential and problems, data collection, design, validation, and revision. The validation analysis on the flipbook received an average score of 3.7 and was valid to a certain extent, so it could be used in the teaching and learning process especially in the sub-material of environmental pollution in the tenth grade of the senior high school.
Knowledge Activation, Integration, and Validation during Narrative Text Comprehension

ERIC Educational Resources Information Center

Cook, Anne E.; O'Brien, Edward J.

2014-01-01

Previous text comprehension studies using the contradiction paradigm primarily tested assumptions of the activation mechanism involved in reading. However, the nature of the contradiction in such studies relied on validation of information in readers' general world knowledge. We directly tested this validation process by varying the strength of…
Concurrent Validity of the TONI-3

ERIC Educational Resources Information Center

Banks, Sandra H.; Franzen, Michael D.

2010-01-01

The literature pertaining to intelligence assessment reveals an ongoing discussion about the areas of intelligence captured by nonverbal tests. To date, few studies have investigated the criterion validity of the Test of Nonverbal Intelligence, Third Edition (TONI-3). The present study investigates the concurrent validity of the TONI-3 in a sample…
Inter-Rater Reliability and Validity of the Australian Football League’s Kicking and Handball Tests

PubMed Central

Cripps, Ashley J.; Hopper, Luke S.; Joyce, Christopher

2015-01-01

Talent identification tests used at the Australian Football League’s National Draft Combine assess the capacities of athletes to compete at a professional level. Tests created for the National Draft Combine are also commonly used for talent identification and athlete development in development pathways. The skills tests created by the Australian Football League required players to either handball (striking the ball with the hand) or kick to a series of 6 randomly generated targets. Assessors subjectively rate each skill execution, giving a 0-5 score for each disposal. This study aimed to investigate the inter-rater reliability and validity of the skills tests at an adolescent sub-elite level. Male Australian footballers were recruited from sub-elite adolescent teams (n = 121, age = 15.7 ± 0.3 years, height = 1.77 ± 0.07 m, mass = 69.17 ± 8.08 kg). The coaches (n = 7) of each team were also recruited. Inter-rater reliability was assessed using intraclass correlation coefficients (ICC) and Limits of Agreement statistics. Both the kicking (ICC = 0.96, p < .01) and handball tests (ICC = 0.89, p < .01) demonstrated strong reliability and acceptable levels of absolute agreement. Content validity was determined by examining the test scores’ sensitivity to laterality and distance. Concurrent validity was assessed by comparing coaches’ perceptions of skill to actual test outcomes. Multivariate analysis of variance (MANOVA) examined the main effect of laterality, with scores on the dominant hand (p = .04) and foot (p < .01) significantly higher compared to the non-dominant side. Follow-up univariate analysis reported significant differences at every distance in the kicking test. A poor correlation was found between coaches’ perceptions of skill and testing outcomes. The results of this study show that both skill tests demonstrate acceptable inter-rater reliability. Partial content validity was confirmed for the kicking test; however, further research is required to confirm validity of the handball test. Key points: The skill tests created by the AFL demonstrated acceptable levels of relative and absolute inter-rater reliability. Both of the AFL’s skills tests are able to differentiate between athletes’ dominant and non-dominant limbs. However, only the kicking test could consistently differentiate between score outcomes over a range of Australian Football specific disposal distances. Both tests demonstrated poor concurrent validity, with no correlation found between coaches’ perceptions of technical skills and actual skill outcomes measured. PMID:26336356
Developing the Persian version of the homophone meaning generation test

PubMed Central

Ebrahimipour, Mona; Motamed, Mohammad Reza; Ashayeri, Hassan; Modarresi, Yahya; Kamali, Mohammad

2016-01-01

Background: Finding the right word is a necessity in communication, and its evaluation has always been a challenging clinical issue, suggesting the need for valid and reliable measurements. The Homophone Meaning Generation Test (HMGT) can measure the ability to switch between verbal concepts, which is required in word retrieval. The purpose of this study was to adapt and validate the Persian version of the HMGT. Methods: The first phase involved the adaptation of the HMGT to the Persian language. The second phase concerned the psychometric testing. The word-finding performance was assessed in 90 Persian-speaking healthy individuals (20-50 year old; 45 males and 45 females) through three naming tasks: Semantic Fluency, Phonemic Fluency, and Homophone Meaning Generation Test. The participants had no history of neurological or psychiatric diseases, alcohol abuse, severe depression, or history of speech, language, or learning problems. Results: The internal consistency coefficient was larger than 0.8 for all the items with a total Cronbach’s alpha of 0.80. Interrater and intrarater reliability were also excellent. The validity of all items was above 0.77, and the content validity index (0.99) was appropriate. The Persian HMGT had strong convergent validity with semantic and phonemic switching and adequate divergent validity with semantic and phonemic clustering. Conclusion: The Persian version of the Homophone Meaning Generation Test is an appropriate, valid, and reliable test to evaluate the ability to switch between verbal concepts in the assessment of word-finding performance. PMID:27390705
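For readers unfamiliar with the internal-consistency statistic reported here, the following minimal Python sketch shows the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the tiny score matrix are hypothetical illustrations and are unrelated to the HMGT data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for both items and totals.
    """
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: 3 respondents x 2 items.
example = [[1, 2], [2, 4], [3, 6]]
print(round(cronbach_alpha(example), 3))
```

The sketch needs at least two items and at least two respondents, and it assumes the total scores are not all identical, because the total variance appears in a denominator.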
Optimal test selection for prediction uncertainty reduction

DOE PAGES

Mullins, Joshua; Mahadevan, Sankaran; Urbina, Angel

2016-12-02

Economic factors and experimental limitations often lead to sparse and/or imprecise data used for the calibration and validation of computational models. This paper addresses resource allocation for calibration and validation experiments, in order to maximize their effectiveness within given resource constraints. When observation data are used for model calibration, the quality of the inferred parameter descriptions is directly affected by the quality and quantity of the data. This paper characterizes parameter uncertainty within a probabilistic framework, which enables the uncertainty to be systematically reduced with additional data. The validation assessment is also uncertain in the presence of sparse and imprecise data; therefore, this paper proposes an approach for quantifying the resulting validation uncertainty. Since calibration and validation uncertainty affect the prediction of interest, the proposed framework explores the decision of cost versus importance of data in terms of the impact on the prediction uncertainty. Often, calibration and validation tests may be performed for different input scenarios, and this paper shows how the calibration and validation results from different conditions may be integrated into the prediction. Then, a constrained discrete optimization formulation that selects the number of tests of each type (calibration or validation at given input conditions) is proposed. Furthermore, the proposed test selection methodology is demonstrated on a microelectromechanical system (MEMS) example.
Validity and test-retest reliability in assessing current body size with figure drawings in Chinese adolescents.

PubMed

Lo, Wing-Sze; Ho, Sai-Yin; Wong, Bonny Yee-Man; Mak, Kwok-Kei; Lam, Tai-Hing

2011-06-01

The reliability and validity of Stunkard's Figure Rating Scale (FRS) as a measure of current body size (CBS) was established in Western adolescent girls but not in non-Western populations. We examined the validity and test-retest reliability of Stunkard's FRS in assessing CBS among Chinese adolescents. In a school-based survey in Hong Kong, 5666 adolescents (boys: 45.1%; mean age 14.7 years) provided data on self-reported height and weight, CBS, perceived weight status, and health-related quality of life using the Medical Outcomes Study Short-Form version 2 (SF-12v2). Height and weight were also objectively measured. Spearman's correlation was used to assess construct validity, concurrent validity and test-retest reliability. Convergent and discriminant validity were good: CBS correlated strongly with weight and self-reported/measured BMI, but only weakly with SF-12v2. CBS correlated strongly with perceived weight status, showing concurrent validity. Spearman's correlation (r) for CBS was 0.78 for girls and 0.72 for boys, indicating good test-retest reliability. Validity and reliability results did not differ significantly between senior and junior grade adolescents. Our findings support the use of Stunkard's FRS to measure body size among Chinese adolescents.
Validation of the Spanish Addiction Severity Index Multimedia Version (S-ASI-MV).

PubMed

Butler, Stephen F; Redondo, José Pedro; Fernandez, Kathrine C; Villapiano, Albert

2009-01-01

This study aimed to develop and test the reliability and validity of a Spanish adaptation of the ASI-MV, a computer administered version of the Addiction Severity Index, called the S-ASI-MV. Participants were 185 native Spanish-speaking adult clients from substance abuse treatment facilities serving Spanish-speaking clients in Florida, New Mexico, California, and Puerto Rico. Participants were administered the S-ASI-MV as well as Spanish versions of the general health subscale of the SF-36, the work and family unit subscales of the Social Adjustment Scale Self-Report, the Michigan Alcohol Screening Test, the alcohol and drug subscales of the Personality Assessment Inventory, and the Hopkins Symptom Checklist-90. Three-to-five-day test-retest reliability was examined along with criterion validity, convergent/discriminant validity, and factorial validity. Measurement invariance between the English and Spanish versions of the ASI-MV was also examined. The S-ASI-MV demonstrated good test-retest reliability (ICCs for composite scores between .59 and .93), criterion validity (rs for composite scores between .66 and .87), and convergent/discriminant validity. Factorial validity and measurement invariance were demonstrated. These results compared favorably with those reported for the original interviewer version of the ASI and the English version of the ASI-MV.
[Turkish validity and reliability study of fear of pain questionnaire-III].

PubMed

Ünver, Seher; Turan, Fatma Nesrin

2018-01-01

This study aimed to develop a Turkish version of the Fear of Pain Questionnaire-III developed by McNeil and Rainwater (1998) and examine its validity and reliability indicators. The study was conducted with 459 university students studying in the nursing department. The Turkish translation of the scale was conducted by language experts and the original scale owner. Expert opinions were taken for language validity, and Lawshe's content validity ratio formula was used to calculate the content validity. Exploratory factor analysis was used to assess the construct validity. The factors were rotated using the Varimax rotation (orthogonal) method. For reliability indicators of the questionnaire, the internal consistency coefficient and test-retest reliability were utilized. Exploratory factor analysis using the three-factor model (explaining 50.5% of the total variance) revealed that the item factor loadings were above the limit value of 0.30, which indicated that the questionnaire had good construct validity. The Cronbach's alpha value for the total questionnaire was 0.938, and the test-retest value was 0.846 for the total scale. The Turkish version of the Fear of Pain Questionnaire-III had sufficiently high reliability and validity to be used as a tool in evaluating the fear of pain among the young Turkish population.
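Lawshe's content validity ratio, mentioned in this abstract, is conventionally defined as CVR = (n_e - N/2) / (N/2), where N is the number of experts and n_e is the number who rate an item as essential. The minimal Python sketch below illustrates that formula. The function name and the panel size are hypothetical, since the abstract does not report how many experts were consulted.

```python
def lawshe_cvr(n_essential, n_experts):
    """Lawshe's content validity ratio for a single item.

    Ranges from -1 (no expert rates the item essential) to +1 (all do);
    0 means exactly half of the panel rated it essential.
    """
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical panel: 9 of 10 experts rate the item as essential.
print(lawshe_cvr(9, 10))
```

An item's CVR is usually compared against a critical value that depends on panel size; those critical values are not reproduced here.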
Reliability and validity of the test of incremental respiratory endurance measures of inspiratory muscle performance in COPD.

PubMed

Formiga, Magno F; Roach, Kathryn E; Vital, Isabel; Urdaneta, Gisel; Balestrini, Kira; Calderon-Candelario, Rafael A; Campos, Michael A; Cahalin, Lawrence P

2018-01-01

The Test of Incremental Respiratory Endurance (TIRE) provides a comprehensive assessment of inspiratory muscle performance by measuring maximal inspiratory pressure (MIP) over time. The integration of MIP over inspiratory duration (ID) provides the sustained maximal inspiratory pressure (SMIP). Evidence on the reliability and validity of these measurements in COPD is not currently available. Therefore, we assessed the reliability, responsiveness and construct validity of the TIRE measures of inspiratory muscle performance in subjects with COPD. Test-retest reliability, known-groups and convergent validity assessments were implemented simultaneously in 81 male subjects with mild to very severe COPD. TIRE measures were obtained using the portable PrO2 device, following standard guidelines. All TIRE measures were found to be highly reliable, with SMIP demonstrating the strongest test-retest reliability with a nearly perfect intraclass correlation coefficient (ICC) of 0.99, while MIP and ID clustered closely together behind SMIP with ICC values of about 0.97. Our findings also demonstrated known-groups validity of all TIRE measures, with SMIP and ID yielding larger effect sizes when compared to MIP in distinguishing between subjects of different COPD status. Finally, our analyses confirmed convergent validity for both SMIP and ID, but not MIP. The TIRE measures of MIP, SMIP and ID have excellent test-retest reliability and demonstrated known-groups validity in subjects with COPD. SMIP and ID also demonstrated evidence of moderate convergent validity and appear to be more stable measures in this patient population than the traditional MIP.
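The abstract describes SMIP as the integration of inspiratory pressure over inspiratory duration. The minimal Python sketch below shows one way such an integral could be approximated from sampled data using the trapezoidal rule. The function name, the units and the sample values are assumptions made for illustration; they do not describe how the PrO2 device computes its output.

```python
def sustained_pressure_integral(times_s, pressures):
    """Approximate the area under a pressure-time curve (trapezoidal rule).

    times_s   : sample times in seconds, strictly increasing (assumed)
    pressures : inspiratory pressure at each sample, e.g. in cmH2O (assumed)
    """
    if len(times_s) != len(pressures):
        raise ValueError("times and pressures must have the same length")
    area = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        area += 0.5 * (pressures[i] + pressures[i - 1]) * dt
    return area


# Hypothetical samples: pressure rises to 10, holds for 1 s, then falls to 0.
print(sustained_pressure_integral([0, 1, 2, 3], [0, 10, 10, 0]))
```

In this sketch the inspiratory duration (ID) is simply the last sample time minus the first.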
Validation of the Sport Competition Anxiety Test.

ERIC Educational Resources Information Center

Cheatham, T.; Rosentswieg, J.

1982-01-01

Fifteen female varsity softball coaches were administered the Sport Competition Anxiety Test prior to competition. Their heart rates, continuously monitored by telemetry, did not relate significantly to the anxiety test data. The test does not appear to be a valid measure of trait anxiety for women softball coaches. (Author/PN)
The Predictive Validity of the Metropolitan Readiness Tests, 1976 Edition.

ERIC Educational Resources Information Center

Nagle, Richard J.

1979-01-01

A sample of 176 first-grade children was tested on the Metropolitan Readiness Tests, 1976 Edition (MRT), during the initial month of school and was retested eight months later on the Stanford Achievement Test. Results demonstrated substantial validity of the MRT for predicting first-grade achievement. (Author/CTM)
75 FR 25867 - National Toxicology Program (NTP) Interagency Center for the Evaluation of Alternative...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-05-10

... validation studies. NICEATM and ICCVAM work collaboratively to evaluate new and improved test methods... for nomination of test methods for validation studies, and guidelines for submission of test methods... for human and veterinary vaccine post-licensing potency and safety testing. Plenary and breakout...
Shifting the Focus of Validity for Test Use

ERIC Educational Resources Information Center

Moss, Pamela A.

2016-01-01

The conventional focus of validity in educational measurement has been on intended interpretations and uses of test scores. Empirical studies of test use by teachers, administrators and policy-makers show that actual interpretations and uses of test scores in context are invariably shaped by local users' questions, which frequently require…
Test-retest reliability, smallest real difference and concurrent validity of six different balance tests on young people with mild to moderate intellectual disability.

PubMed

Blomqvist, Sven; Wester, Anita; Sundelin, Gunnevi; Rehn, Börje

2012-12-01

Some studies have reported that people with intellectual disability may have reduced balance ability compared with the population in general. However, none of these studies involved adolescents, and the reliability and validity of balance tests in this population are not known. The purpose of this study was to examine the reliability of six different balance tests and to investigate their concurrent validity. The design was a test-retest reliability assessment. All subjects were recruited from a special school for people with intellectual disability in Bollnäs, Sweden. The participants were eighty-nine adolescents (35 females and 54 males) with mild to moderate intellectual disability and a mean age of 18 years (range 16 to 20 years). All subjects followed the same test protocol on two occasions within an 11-day period. The outcome measures were balance test performances. Intraclass correlation coefficients greater than 0.80 were achieved for four of the balance tests: Extended Timed Up and Go Test, Modified Functional Reach Test, One-leg Stance Test and Force Platform Test. The smallest real differences ranged from 12% to 40%; less than 20% is considered to be low. Concurrent validity among these balance tests varied between no and low correlation. The results indicate that these tests could be used to evaluate changes in balance ability over time in people with mild to moderate intellectual disability. The low concurrent validity illustrates the importance of knowing more about the influence of various sensory subsystems that are significant for balance among adolescents with intellectual disability.
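The smallest real difference (SRD) reported in this abstract is often derived from the standard error of measurement (SEM). The minimal Python sketch below uses one common convention, SEM = SD * sqrt(1 - ICC) and SRD = 1.96 * sqrt(2) * SEM, expressed as a percentage of the mean. This convention is an assumption: the abstract does not state which formula the authors applied, and the function names and input values are hypothetical.

```python
from math import sqrt


def smallest_real_difference(sd, icc):
    """SRD at the 95% level from a between-subject SD and an ICC (assumed formula)."""
    sem = sd * sqrt(1 - icc)      # standard error of measurement
    return 1.96 * sqrt(2) * sem   # sqrt(2) accounts for two measurement occasions


def srd_percent(sd, icc, mean):
    """SRD expressed as a percentage of the mean test score."""
    return 100 * smallest_real_difference(sd, icc) / mean


# Hypothetical values: SD = 2.0, ICC = 0.84, mean score = 10.0.
print(round(srd_percent(2.0, 0.84, 10.0), 1))
```

With the abstract's rule of thumb, a percentage below 20% would be considered low, meaning that relatively small changes in a person's score could be read as real change.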
Does Test Preparation Work? Implications for Score Validity

ERIC Educational Resources Information Center

Xie, Qin

2013-01-01

This article reports an empirical study that examined the pattern of test preparation for College English Test Band 4 (CET4) and the differential effects of test preparation practices on its scores, thereby drawing implications for CET4 score validity. Data collection involved 1,003 test takers of CET4. A pretest was administered at the beginning…
Validation of laboratory-scale recycling test method of paper PSA label products

Treesearch

Carl Houtman; Karen Scallon; Richard Oldack

2008-01-01

Starting with test methods and a specification developed by the U.S. Postal Service (USPS) Environmentally Benign Pressure Sensitive Adhesive Postage Stamp Program, a laboratory-scale test method and a specification were developed and validated for pressure-sensitive adhesive labels. By comparing results from this new test method and pilot-scale tests, which have been...
Assessment of Preschoolers' Gross Motor Proficiency: Revisiting Bruininks-Oseretsky Test of Motor Proficiency

ERIC Educational Resources Information Center

Lam, Hazel Mei Yung

2011-01-01

Literature reveals that there are very few validated motor proficiency tests for young children. According to Gallahue and Ozmun, the Bruininks-Oseretsky Test of Motor Proficiency is a valid test. However, manipulative skills, which are classified as gross motor skills by most motor development specialists, are only tested in the Upper Limb…

An Evaluation of Test Speededness in an Assessment for Third-Grade Gifted Students

ERIC Educational Resources Information Center

Hailey, Emily; Callahan, Carolyn M.; Azano, Amy; Moon, Tonya R.

2012-01-01

Reliability and validity are integral concepts in assessment design. Test speededness, the influence of time constraints on test taker performance, is often an overlooked threat to reliability and validity, especially in classroom-based testing. The purpose of this study is to evaluate the degree of test speededness of classroom-based assessments…
Predictive validity of the Biomedical Admissions Test: an evaluation and case study.

PubMed

McManus, I C; Ferguson, Eamonn; Wakeford, Richard; Powis, David; James, David

2011-01-01

There has been an increase in the use of pre-admission selection tests for medicine. Such tests need to show good psychometric properties. Here, we use a paper by Emery and Bell [2009. The predictive validity of the Biomedical Admissions Test for pre-clinical examination performance. Med Educ 43:557-564] as a case study to evaluate and comment on the reporting of psychometric data in the field of medical student selection (and the comments apply to many papers in the field). We highlight pitfalls when reliability data are not presented, how simple zero-order associations can lead to inaccurate conclusions about the predictive validity of a test, and how biases need to be explored and reported. We show with BMAT that it is the knowledge part of the test which does all the predictive work. We show that without evidence of incremental validity it is difficult to assess the value of any selection tests for medicine.
Clinical inquiries. What test is the best for diagnosing infectious mononucleosis?

PubMed

Bell, Amy Trelease; Fortune, Barbara; Sheeler, Robert

2006-09-01

Tests for antibodies to Epstein-Barr viral capsid antigen or Epstein-Barr nuclear antigen are the most sensitive, are highly specific, and are also the most expensive for diagnosing infectious mononucleosis (strength of recommendation [SOR]: C, based on validating cohort study). Heterophile antibody tests have similar specificity and are cheaper, but are less sensitive in children or in adults during the early days of the illness (SOR: C, based on validating cohort study). The polymerase chain reaction assay for Epstein-Barr virus DNA is more sensitive than the heterophile antibody test in children, is highly specific, but is also expensive (SOR: C, based on validating cohort study). The percentages of atypical lymphocytes and total lymphocytes on a complete blood count provide another specific and moderately sensitive, yet inexpensive, test (SOR: C, based on validating cohort study).
Validity of Integrity Tests for Predicting Drug and Alcohol Abuse

DTIC Science & Technology

1993-08-31

Winkler and Sheridan (1989) found that employees who entered employee assistance programs for treating drug addiction were more likely to be absent... (Final report; Contract No. N00014-92-J...) This research used psychometric meta-analysis (Hunter & Schmidt, 1990b) to examine the validity of integrity tests for predicting drug and
Reliability and validity of clinical tests to assess the anatomical integrity of the cervical spine in adults with neck pain and its associated disorders: Part 1-A systematic review from the Cervical Assessment and Diagnosis Research Evaluation (CADRE) Collaboration.

PubMed

Lemeunier, Nadège; da Silva-Oolup, S; Chow, N; Southerst, D; Carroll, L; Wong, J J; Shearer, H; Mastragostino, P; Cox, J; Côté, E; Murnaghan, K; Sutton, D; Côté, P

2017-09-01

To determine the reliability and validity of clinical tests to assess the anatomical integrity of the cervical spine in adults with neck pain and its associated disorders. We updated the systematic review of the 2000-2010 Bone and Joint Decade Task Force on Neck Pain and its Associated Disorders. We also searched the literature to identify studies on the reliability and validity of Doppler velocimetry for the evaluation of cervical arteries. Two independent reviewers screened and critically appraised studies. We conducted a best evidence synthesis of low risk of bias studies and ranked the phases of investigations using the classification proposed by Sackett and Haynes. We screened 9022 articles and critically appraised 8 studies; all 8 studies had low risk of bias (three reliability and five validity Phase II-III studies). Preliminary evidence suggests that the extension-rotation test may be reliable and has adequate validity to rule out pain arising from facet joints. The evidence suggests variable reliability and preliminary validity for the evaluation of cervical radiculopathy including neurological examination (manual motor testing, dermatomal sensory testing, deep tendon reflexes, and pathological reflex testing), Spurling's and the upper limb neurodynamic tests. No evidence was found for Doppler velocimetry. Little evidence exists to support the use of clinical tests to evaluate the anatomical integrity of the cervical spine in adults with neck pain and its associated disorders. We found preliminary evidence to support the use of the extension-rotation test, neurological examination, Spurling's and the upper limb neurodynamic tests.
Intratester Reliability and Construct Validity of a Hip Abductor Eccentric Strength Test.

PubMed

Brindle, Richard A; Ebaugh, David; Milner, Clare E

2018-06-06

Side-lying hip abductor strength tests are commonly used to evaluate muscle strength. In a "break" test, the tester applies sufficient force to lower the limb to the table while the patient resists. The peak force is postulated to occur while the leg is lowering, thus representing the participant's eccentric muscle strength. However, it is unclear whether peak force occurs before or after the leg begins to lower. To determine intrarater reliability and construct validity of a hip abductor eccentric strength test. Intrarater reliability and construct validity study. Twenty healthy adults (26 [6] y; 1.66 [0.06] m; 62.2 [8.0] kg) made 2 visits to the laboratory at least 1 week apart. During the hip abductor eccentric strength test, a handheld dynamometer recorded peak force and time to peak force, and limb position was recorded via a motion capture system. Intrarater reliability was determined using intraclass correlation, SEM, and minimal detectable difference. Construct validity was assessed by determining if peak force occurred after the start of the lowering phase using a 1-sample t test. The hip abductor eccentric strength test had substantial intrarater reliability (intraclass correlation (3,3) = .88; 95% confidence interval, .65-.95), SEM of 0.9 %BWh, and a minimal detectable difference of 2.5 %BWh. Construct validity was established as peak force occurred 2.1 (0.6) seconds (range: 0.7-3.7 s) after the start of the lowering phase of the test (P ≤ .001). The hip abductor eccentric strength test is a valid and reliable measure of eccentric muscle strength. This test may be used clinically to assess changes in eccentric muscle strength over time.
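The reliability figures in the abstract above can be related to one another with a short worked example. This is a minimal sketch, assuming the commonly used formulas SEM = SD × sqrt(1 − ICC) and MDD95 = 1.96 × sqrt(2) × SEM; the abstract does not state which formulas the authors used, and the function names are hypothetical.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from a between-subject SD and an ICC.

    Assumed formula: SEM = SD * sqrt(1 - ICC).
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdd95(sem: float) -> float:
    """Minimal detectable difference at the 95% confidence level.

    Assumed formula: MDD95 = 1.96 * sqrt(2) * SEM.
    """
    return 1.96 * math.sqrt(2.0) * sem


# SEM as reported in the abstract (0.9 %BWh); the MDD is derived from it.
print(round(mdd95(0.9), 1))  # 2.5
```

Under these assumptions, the reported SEM of 0.9 %BWh reproduces the reported minimal detectable difference of 2.5 %BWh.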
Validation of Helicopter Gear Condition Indicators Using Seeded Fault Tests

NASA Technical Reports Server (NTRS)

Dempsey, Paula; Brandon, E. Bruce

2013-01-01

A "seeded fault test" in support of a rotorcraft condition-based maintenance (CBM) program is an experiment in which a component is tested with a known fault while health monitoring data is collected. These tests are performed at operating conditions comparable to operating conditions the component would be exposed to while installed on the aircraft. Performance of seeded fault tests is one method used to provide evidence that a Health Usage Monitoring System (HUMS) can replace current maintenance practices required for aircraft airworthiness. Actual in-service experience of the HUMS detecting a component fault is another validation method. This paper will discuss a hybrid validation approach that combines in-service data with seeded fault tests. For this approach, existing in-service HUMS flight data from a naturally occurring component fault will be used to define a component seeded fault test. An example, using spiral bevel gears as the targeted component, will be presented. Since the U.S. Army has begun to develop standards for using seeded fault tests for HUMS validation, the hybrid approach will be mapped to the steps defined within their Aeronautical Design Standard Handbook for CBM. This paper will step through their defined processes, and identify additional steps that may be required when using component test rig fault tests to demonstrate helicopter condition indicator (CI) performance. The discussion within this paper will provide the reader with a better appreciation for the challenges faced when defining a seeded fault test for HUMS validation.
Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcomes measures.

PubMed

Cappelleri, Joseph C; Jason Lundy, J; Hays, Ron D

2014-05-01

The US Food and Drug Administration's guidance for industry document on patient-reported outcomes (PRO) defines content validity as "the extent to which the instrument measures the concept of interest" (FDA, 2009, p. 12). According to Strauss and Smith (2009), construct validity "is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity" (p. 7). Hence, both qualitative and quantitative information are essential in evaluating the validity of measures. We review classical test theory and item response theory (IRT) approaches to evaluating PRO measures, including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized "difficulty" (severity) order of items is represented by observed responses. If a researcher has few qualitative data and wants to get preliminary information about the content validity of the instrument, then descriptive assessments using classical test theory should be the first step. As the sample size grows during subsequent stages of instrument development, confidence in the numerical estimates from Rasch and other IRT models (as well as those of classical test theory) would also grow. Classical test theory and IRT can be useful in providing a quantitative assessment of items and scales during the content-validity phase of PRO-measure development. Depending on the particular type of measure and the specific circumstances, the classical test theory and/or the IRT should be considered to help maximize the content validity of PRO measures. Copyright © 2014 Elsevier HS Journals, Inc. All rights reserved.
Validation of the German version of the Nurse-Work Instability Scale: baseline survey findings of a prospective study of a cohort of geriatric care workers

PubMed Central

2013-01-01

Background: A prospective study of a cohort of nursing staff from nursing homes was undertaken to validate the Nurse-Work Instability Scale (Nurse-WIS). Baseline investigation data was used to test reliability, construct validity and criterion validity. Method: A survey of nursing staff from nursing homes was conducted using a questionnaire containing the Nurse-WIS along with other survey instruments (including SF-12, WAI, SPE). The self-reported number of days’ sick leave taken and whether a pension for reduced work capacity was drawn were recorded. The reliability of the scale was checked by item difficulty (P), item discrimination (rjt) and by internal consistency according to Cronbach’s alpha coefficient. The hypotheses for checking construct validity were tested on the basis of correlations. Pearson’s chi-square was used to test concurrent criterion validity; discriminant validity was tested by means of binary logistic regression. Results: 396 persons answered the questionnaire (21.3% response rate). More than 80% were female and mostly worked full-time in a rotating shift pattern. Following the test for item discrimination, two items were removed from the Nurse-WIS test. According to Cronbach’s alpha (0.927), the scale provides a high degree of measuring accuracy. All hypotheses and assumptions used to test validity were confirmed: As the Nurse-WIS risk increases, health-related quality of life, work ability and job satisfaction decline. Depressive symptoms and a poor subjective prognosis of earning capacity are also more frequent. Musculoskeletal disorders and impairments of psychological well-being are more frequent. Age also influences the Nurse-WIS result. While 12.0% of those below the age of 35 had an increased risk, the figure for those aged over 55 was 50%. Conclusion: This study is the first validation study of the Nurse-WIS to date. The Nurse-WIS shows good reliability, good validity and a good level of measuring accuracy. It appears to be suitable for recording prevention and rehabilitation needs among health care workers. If, in the follow-up, the Nurse-WIS likewise proves to be a reliable screening instrument with good predictive validity, it could ensure that suitable action is taken at an early stage, thereby helping to counteract early retirement and the anticipated shortage of health care workers. PMID:24330532
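The internal-consistency check named in the abstract above can be illustrated with a minimal sketch of Cronbach's alpha in its standard form, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The function name and the example scores are hypothetical and are not taken from the Nurse-WIS study.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for a scale.

    `items` holds one list per item; each inner list contains that item's
    scores for the same respondents, in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    if len({len(scores) for scores in items}) != 1:
        raise ValueError("every item needs a score for every respondent")
    # Total scale score per respondent.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical two-item scale answered by four respondents.
example = [[1, 2, 3, 4], [2, 1, 4, 3]]
print(round(cronbach_alpha(example), 2))  # 0.75
```

Sample variances (n − 1 denominator) are used for both the items and the totals; using population variances throughout would give the same ratio.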
Development of an Agility Test for Badminton Players and Assessment of Its Validity and Test-Retest Reliability.

PubMed

Loureiro, Luiz de França Bahia; de Freitas, Paulo Barbosa

2016-04-01

Badminton requires open and fast actions toward the shuttlecock, but there is no specific agility test for badminton players with specific movements. To develop an agility test that simultaneously assesses perception and motor capacity and examine the test's concurrent and construct validity and its test-retest reliability. The Badcamp agility test consists of running as fast as possible to 6 targets placed on the corners and middle points of a rectangular area (5.6 × 4.2 m) from the start position located in the center of it, following visual stimuli presented in a luminous panel. The authors recruited 43 badminton players (17-32 y old) to evaluate concurrent (with shuttle-run agility test--SRAT) and construct validity and test-retest reliability. Results revealed that Badcamp presents concurrent and construct validity, as its performance is strongly related to SRAT (ρ = 0.83, P < .001), with performance of experts being better than nonexpert players (P < .01). In addition, Badcamp is reliable, as no difference (P = .07) and a high intraclass correlation (ICC = .93) were found in the performance of the players on 2 different occasions. The findings indicate that Badcamp is an effective, valid, and reliable tool to measure agility, allowing coaches and athletic trainers to evaluate players' athletic condition and training effectiveness and possibly detect talented individuals in this sport.
Could situational judgement tests be used for selection into dental foundation training?

PubMed

Patterson, F; Ashworth, V; Mehra, S; Falcon, H

2012-07-13

To pilot and evaluate a machine-markable situational judgement test (SJT) designed to select candidates into UK dental foundation training. Single centre pilot study. UK postgraduate deanery in 2010. Seventy-four candidates attending interview for dental foundation training in Oxford and Wessex Deaneries volunteered to complete the situational judgement test. The situational judgement test was developed to assess relevant professional attributes for dentistry (for example, empathy and integrity) in a machine-markable format. Test content was developed by subject matter experts working with experienced psychometricians. Evaluation of psychometric properties of the pilot situational judgement test (for example, reliability, validity and fairness). Scores in the dental foundation training selection process (short-listing and interviews) were used to examine criterion-related validity. Candidates completed an evaluation questionnaire to examine candidate reactions and face validity of the new test. Forty-six candidates were female and 28 male; mean age was 23.5 years (range 22-32). Situational judgement test scores were normally distributed and the test showed good internal reliability when corrected for test length (α = 0.74). Situational judgement test scores positively correlated with the management, leadership and professionalism interview (N = 50; r = 0.43, p < 0.01) but not with the clinical skills interview, providing initial evidence of criterion-related validity as the situational judgement test is designed to test non-cognitive professional attributes beyond clinical knowledge. Most candidates perceived the situational judgement test as relevant to dentistry, appropriate for their training level, and fair. This initial pilot study suggests that a situational judgement test is an appropriate and innovative method to measure professional attributes (e.g., empathy and integrity) for selection into foundation training. Further research will explore the long-term predictive validity of the situational judgement test once candidates have entered training.
Psychometrics of the Home Safety Self-Assessment Tool (HSSAT) to prevent falls in community-dwelling older adults.

PubMed

Tomita, Machiko R; Saharan, Sumandeep; Rajendran, Sheela; Nochajski, Susan M; Schweitzer, Jo A

2014-01-01

OBJECTIVE. To identify psychometric properties of the Home Safety Self-Assessment Tool (HSSAT) to prevent falls in community-dwelling older adults. METHOD. We tested content validity, test-retest reliability, interrater reliability, construct validity, convergent and discriminant validity, and responsiveness to change. RESULTS. The content validity index was .98, the intraclass correlation coefficient for test-retest reliability was .97, and the interrater reliability was .89. The difference on identified risk factors between the use and nonuse of the HSSAT was significant (p = .005). Convergent validity with the Centers for Disease Control and Prevention Home Safety Checklist was high (r = .65), and discriminant validity with fear of falling was very low (r = .10). The responsiveness to change was moderate (standardized response mean = 0.57). CONCLUSION. The HSSAT is a reliable and valid instrument to identify fall risks in a home environment, and the HSSAT booklet is effective as educational material leading to improvement in home safety. Copyright © 2014 by the American Occupational Therapy Association, Inc.
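The responsiveness statistic quoted in the abstract above, the standardized response mean (SRM), is conventionally the mean change score divided by the standard deviation of the change scores. The sketch below assumes that definition; the function name and the scores are hypothetical and do not come from the HSSAT study.

```python
from statistics import mean, stdev


def standardized_response_mean(before: list[float], after: list[float]) -> float:
    """SRM = mean(after - before) / SD(after - before), paired by participant."""
    if len(before) != len(after):
        raise ValueError("before and after must be paired measurements")
    changes = [a - b for b, a in zip(before, after)]
    return mean(changes) / stdev(changes)


# Hypothetical scores for three participants at baseline and follow-up.
print(standardized_response_mean([0, 0, 0], [1, 2, 3]))  # 2.0
```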
Internet cognitive testing of large samples needed in genetic research.

PubMed

Haworth, Claire M A; Harlaar, Nicole; Kovas, Yulia; Davis, Oliver S P; Oliver, Bonamy R; Hayiou-Thomas, Marianna E; Frances, Jane; Busfield, Patricia; McMillan, Andrew; Dale, Philip S; Plomin, Robert

2007-08-01

Quantitative and molecular genetic research requires large samples to provide adequate statistical power, but it is expensive to test large samples in person, especially when the participants are widely distributed geographically. Increasing access to inexpensive and fast Internet connections makes it possible to test large samples efficiently and economically online. Reliability and validity of Internet testing for cognitive ability have not been previously reported; these issues are especially pertinent for testing children. We developed Internet versions of reading, language, mathematics and general cognitive ability tests and investigated their reliability and validity for 10- and 12-year-old children. We tested online more than 2500 pairs of 10-year-old twins and compared their scores to similar internet-based measures administered online to a subsample of the children when they were 12 years old (> 759 pairs). Within 3 months of the online testing at 12 years, we administered standard paper and pencil versions of the reading and mathematics tests in person to 30 children (15 pairs of twins). Scores on Internet-based measures at 10 and 12 years correlated .63 on average across the two years, suggesting substantial stability and high reliability. Correlations of about .80 between Internet measures and in-person testing suggest excellent validity. In addition, the comparison of the internet-based measures to ratings from teachers based on criteria from the UK National Curriculum suggests good concurrent validity for these tests. We conclude that Internet testing can be reliable and valid for collecting cognitive test data on large samples even for children as young as 10 years.
The Validity of Value-Added Estimates from Low-Stakes Testing Contexts: The Impact of Change in Test-Taking Motivation and Test Consequences

ERIC Educational Resources Information Center

Finney, Sara J.; Sundre, Donna L.; Swain, Matthew S.; Williams, Laura M.

2016-01-01

Accountability mandates often prompt assessment of student learning gains (e.g., value-added estimates) via achievement tests. The validity of these estimates has been questioned when performance on tests is low stakes for students. To assess the effects of motivation on value-added estimates, we assigned students to one of three test consequence…
Inventory of Motive of Preference for Conventional Paper-and-Pencil Tests: A Study of Validity and Reliability

ERIC Educational Resources Information Center

Eser, Mehmet Taha; Dogan, Nuri

2017-01-01

Purpose: The objective of this study is to develop the Inventory of Motive of Preference for Conventional Paper-And-Pencil Tests and to evaluate students' motives for preferring written tests, short-answer tests, true/false tests or multiple-choice tests. This will add a measurement tool to the literature with valid and reliable results to help…
Brazilian Center for the Validation of Alternative Methods (BraCVAM) and the process of validation in Brazil.

PubMed

Presgrave, Octavio; Moura, Wlamir; Caldeira, Cristiane; Pereira, Elisabete; Bôas, Maria H Villas; Eskes, Chantra

2016-03-01

The need for the creation of a Brazilian centre for the validation of alternative methods was recognised in 2008, and members of academia, industry and existing international validation centres immediately engaged with the idea. In 2012, co-operation between the Oswaldo Cruz Foundation (FIOCRUZ) and the Brazilian Health Surveillance Agency (ANVISA) instigated the establishment of the Brazilian Center for the Validation of Alternative Methods (BraCVAM), which was officially launched in 2013. The Brazilian validation process follows OECD Guidance Document No. 34, where BraCVAM functions as the focal point to identify and/or receive requests from parties interested in submitting tests for validation. BraCVAM then informs the Brazilian National Network on Alternative Methods (RENaMA) of promising assays, which helps with prioritisation and contributes to the validation studies of selected assays. A Validation Management Group supervises the validation study, and the results obtained are peer-reviewed by an ad hoc Scientific Review Committee, organised under the auspices of BraCVAM. Based on the peer-review outcome, BraCVAM will prepare recommendations on the validated test method, which will be sent to the National Council for the Control of Animal Experimentation (CONCEA). CONCEA is in charge of the regulatory adoption of all validated test methods in Brazil, following an open public consultation. 2016 FRAME.
Experimental investigation of an RNA sequence space

NASA Technical Reports Server (NTRS)

Lee, Youn-Hyung; Dsouza, Lisa; Fox, George E.

1993-01-01

Modern rRNAs are the historic consequence of an ongoing evolutionary exploration of a sequence space. These extant sequences belong to a special subset of the sequence space that is comprised only of those primary sequences that can validly perform the biological function(s) required of the particular RNA. If it were possible to readily identify all such valid sequences, stochastic predictions could be made about the relative likelihood of various evolutionary pathways available to an RNA. Herein an experimental system which can assess whether a particular sequence is likely to have validity as a eubacterial 5S rRNA is described. A total of ten naturally occurring, and hence known to be valid, sequences and two point mutants of unknown validity were used to test the usefulness of the approach. Nine of the ten valid sequences tested positive whereas both mutants tested as clearly defective. The tenth valid sequence gave results that would be interpreted as reflecting a borderline status were the answer not known. These results demonstrate that it is possible to experimentally determine which sequences in local regions of the sequence space are potentially valid 5S rRNAs.
Validity, Reliability, and the Questionable Role of Psychometrics in Plastic Surgery

PubMed Central

2014-01-01

Summary: This report examines the meaning of validity and reliability and the role of psychometrics in plastic surgery. Study titles increasingly include the word “valid” to support the authors’ claims. Studies by other investigators may be labeled “not validated.” Validity simply refers to the ability of a device to measure what it intends to measure. Validity is not an intrinsic test property. It is a relative term most credibly assigned by the independent user. Similarly, the word “reliable” is subject to interpretation. In psychometrics, its meaning is synonymous with “reproducible.” The definitions of valid and reliable are analogous to accuracy and precision. Reliability (both the reliability of the data and the consistency of measurements) is a prerequisite for validity. Outcome measures in plastic surgery are intended to be surveys, not tests. The role of psychometric modeling in plastic surgery is unclear, and this discipline introduces difficult jargon that can discourage investigators. Standard statistical tests suffice. The unambiguous term “reproducible” is preferred when discussing data consistency. Study design and methodology are essential considerations when assessing a study’s validity. PMID:25289354
Changing abilities vs. changing tasks: Examining validity degradation with test scores and college performance criteria both assessed longitudinally.

PubMed

Dahlke, Jeffrey A; Kostal, Jack W; Sackett, Paul R; Kuncel, Nathan R

2018-05-03

We explore potential explanations for validity degradation using a unique predictive validation data set containing up to four consecutive years of high school students' cognitive test scores and four complete years of those students' college grades. This data set permits analyses that disentangle the effects of predictor-score age and timing of criterion measurements on validity degradation. We investigate the extent to which validity degradation is explained by criterion dynamism versus the limited shelf-life of ability scores. We also explore whether validity degradation is attributable to fluctuations in criterion variability over time and/or GPA contamination from individual differences in course-taking patterns. Analyses of multiyear predictor data suggest that changes to the determinants of performance over time have much stronger effects on validity degradation than does the shelf-life of cognitive test scores. The age of predictor scores had only a modest relationship with criterion-related validity when the criterion measurement occasion was held constant. Practical implications and recommendations for future research are discussed. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Psychometric properties of the Bulgarian translation of noise sensitivity scale short form (NSS-SF): implementation in the field of noise control.

PubMed

Dzhambov, Angel M; Dimitrova, Donka D

2014-01-01

The Noise Sensitivity Scale Short Form (NSS-SF), developed in English as a more practical form of the classical Weinstein NSS, has not to date been validated in other cultures, and its validity and reliability have not yet been confirmed. This study aimed to validate NSS-SF in Bulgarian and to demonstrate its applicability. The study comprised test-retest (n = 115) and a field-testing (n = 71) of the newly validated scale. Its construct validity was examined with confirmatory factor analysis, and very good model-fit was observed. Temporal stability was assessed in a test-retest (r = 0.990), convergent validity was examined with single-item susceptibility to the noise scale (r = 0.906) and discriminant validity was confirmed with single-item noise annoyance scale (r = 0.718). The lowest observed McDonald's omega across the studies was 0.923. The cross-cultural validation of NSS-SF was successful but it proved to be somewhat problematic with respect to its annoyance-based items.

Are we really measuring what we say we're measuring? Using video techniques to supplement traditional construct validation procedures.

PubMed

Podsakoff, Nathan P; Podsakoff, Philip M; Mackenzie, Scott B; Klinger, Ryan L

2013-01-01

Several researchers have persuasively argued that the most important evidence to consider when assessing construct validity is whether variations in the construct of interest cause corresponding variations in the measures of the focal construct. Unfortunately, the literature provides little practical guidance on how researchers can go about testing this. Therefore, the purpose of this article is to describe how researchers can use video techniques to test whether their scales measure what they purport to measure. First, we discuss how researchers can develop valid manipulations of the focal construct that they hope to measure. Next, we explain how to design a study to use this manipulation to test the validity of the scale. Finally, comparing and contrasting traditional and contemporary perspectives on validation, we discuss the advantages and limitations of video-based validation procedures. PsycINFO Database Record (c) 2013 APA, all rights reserved.
The Epidemiology of Modern Test Score Use: Anticipating Aggregation, Adjustment, and Equating

ERIC Educational Resources Information Center

Ho, Andrew

2013-01-01

In his thoughtful focus article, Haertel (this issue) pushes testing experts to broaden the scope of their validation efforts and to invite scholars from other disciplines to join them. He credits existing validation frameworks for helping the measurement community to identify incomplete or nonexistent validity arguments. However, he notes his…
Development and Validation of the Musical Ear Training Assessment (META)

ERIC Educational Resources Information Center

Wolf, Anna; Kopiez, Reinhard

2018-01-01

In the following study, we have developed an assessment instrument for the practice-dependent skill of analytical hearing following a strict test theoretical validation, resulting in the Musical Ear Training Assessment (META). By means of three pilot studies, a developmental study, and a validation study, we verified a one-dimensional test model…
The Michigan Alcoholism Screening Test (MAST): A Statistical Validation Analysis

ERIC Educational Resources Information Center

Laux, John M.; Newman, Isadore; Brown, Russ

2004-01-01

This study extends the Michigan Alcoholism Screening Test (MAST; M. L. Selzer, 1971) literature base by examining 4 issues related to the validity of the MAST scores. Specifically, the authors examine the validity of the MAST scores in light of the presence of impression management, participant demographic variables, and item endorsement…
Ultrasonic inspection of a glued laminated timber fabricated with defects

Treesearch

Robert Emerson; David Pollock; David McLean; Kenneth Fridley; Robert Ross; Roy Pellerin

2001-01-01

The Federal Highway Administration (FHWA) set up a validation test to compare the effectiveness of various nondestructive inspection techniques for detecting artificial defects in glulam members. The validation test consisted of a glulam beam fabricated with artificial defects known to FHWA personnel but not originally known to the scientists performing the validation...
Following Phaedrus: Alternate Choices in Surmounting the Reliability/Validity Dilemma

ERIC Educational Resources Information Center

Slomp, David H.; Fuite, Jim

2004-01-01

Specialists in the field of large-scale, high-stakes writing assessment have, over the last forty years, alternately discussed the issue of maximizing either reliability or validity in test design. Factors complicating the debate--such as Messick's (1989) expanded definition of validity, and the ethical implications of testing--are explored. An…
Domestic violence on children: development and validation of an instrument to evaluate knowledge of health professionals

PubMed Central

Oliveira, Lanuza Borges; Soares, Fernanda Amaral; Silveira, Marise Fagundes; de Pinho, Lucinéia; Caldeira, Antônio Prates; Leite, Maísa Tavares de Souza

2016-01-01

ABSTRACT Objective: to develop and validate an instrument to evaluate the knowledge of health professionals about domestic violence on children. Method: this was a study conducted with 194 physicians, nurses and dentists. A literature review was performed for preparation of the items and identification of the dimensions. Apparent and content validation was performed using analysis of three experts and 27 professors of the pediatric health discipline. For construct validation, Cronbach's alpha was used, and the Kappa test was applied to verify reproducibility. The criterion validation was conducted using the Student's t-test. Results: the final instrument included 56 items; the Cronbach alpha was 0.734, the Kappa test showed a correlation greater than 0.6 for most items, and the Student t-test showed a statistically significant value to the level of 5% for the two selected variables: years of education and using the Family Health Strategy. Conclusion: the instrument is valid and can be used as a promising tool to develop or direct actions in public health and evaluate knowledge about domestic violence on children. PMID:27556878
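The abstract above reports a Kappa test for reproducibility without saying how it was computed. As a minimal sketch, assuming Cohen's kappa on two administrations of a single item, the statistic compares observed agreement with the agreement expected by chance. The function name and the four hypothetical answers below are illustrative and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Cohen's kappa for two equally long lists of categorical ratings."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(first)
    # Proportion of cases where both ratings agree.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement from the marginal category frequencies.
    counts_first = Counter(first)
    counts_second = Counter(second)
    expected = sum(counts_first[c] * counts_second[c] for c in counts_first) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical answers to one item on two occasions (1 = correct, 0 = incorrect).
first_occasion = [1, 1, 0, 0]
second_occasion = [1, 0, 0, 0]
print(cohen_kappa(first_occasion, second_occasion))  # 0.5
```

In this toy case the observed agreement is 0.75 and the chance agreement is 0.5, so kappa is 0.5; the abstract's threshold of 0.6 would be applied item by item in the same way.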
SDG and qualitative trend based model multiple scale validation

NASA Astrophysics Data System (ADS)

Gao, Dong; Xu, Xin; Yin, Jianjin; Zhang, Hongyu; Zhang, Beike

2017-09-01

Verification, Validation and Accreditation (VV&A) is a key technology of simulation and modelling. Traditional model validation methods have weak completeness, are carried out at a single scale, and depend on human experience. A multiple-scale validation method based on the SDG (Signed Directed Graph) and qualitative trends is proposed. First, the SDG model is built and qualitative trends are added to the model. Complete testing scenarios are then produced by positive inference. The multiple-scale validation is carried out by comparing the testing scenarios with the outputs of the simulation model at different scales. Finally, the effectiveness of the method is demonstrated by validating a reactor model.
Prevalence of Invalid Performance on Baseline Testing for Sport-Related Concussion by Age and Validity Indicator.

PubMed

Abeare, Christopher A; Messa, Isabelle; Zuccato, Brandon G; Merker, Bradley; Erdodi, Laszlo

2018-03-12

Estimated base rates of invalid performance on baseline testing (base rates of failure) for the management of sport-related concussion range from 6.1% to 40.0%, depending on the validity indicator used. The instability of this key measure represents a challenge in the clinical interpretation of test results that could undermine the utility of baseline testing. To determine the prevalence of invalid performance on baseline testing and to assess whether the prevalence varies as a function of age and validity indicator. This retrospective, cross-sectional study included data collected between January 1, 2012, and December 31, 2016, from a clinical referral center in the Midwestern United States. Participants included 7897 consecutively tested, equivalently proportioned male and female athletes aged 10 to 21 years, who completed baseline neurocognitive testing for the purpose of concussion management. Baseline assessment was conducted with the Immediate Postconcussion Assessment and Cognitive Testing (ImPACT), a computerized neurocognitive test designed for assessment of concussion. Base rates of failure on published ImPACT validity indicators were compared within and across age groups. Hypotheses were developed after data collection but prior to analyses. Of the 7897 study participants, 4086 (51.7%) were male, mean (SD) age was 14.71 (1.78) years, 7820 (99.0%) were primarily English speaking, and the mean (SD) educational level was 8.79 (1.68) years. The base rate of failure ranged from 6.4% to 47.6% across individual indicators. Most of the sample (55.7%) failed at least 1 of 4 validity indicators. The base rate of failure varied considerably across age groups (117 of 140 [83.6%] for those aged 10 years to 14 of 48 [29.2%] for those aged 21 years), representing a risk ratio of 2.86 (95% CI, 2.60-3.16; P < .001). 
The results for base rate of failure were surprisingly high overall and varied widely depending on the specific validity indicator and the age of the examinee. The strong age association, with 3 of 4 participants aged 10 to 12 years failing validity indicators, suggests that the clinical interpretation and utility of baseline testing in this age group is questionable. These findings underscore the need for close scrutiny of performance validity indicators on baseline testing across age groups.
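The age comparison above can be checked from the counts given in the abstract. The sketch below computes only the crude risk ratio from the two reported proportions. It does not attempt to reproduce the published confidence interval, because the abstract does not state how that interval was estimated. The function name is illustrative.

```python
def risk_ratio(events_a, total_a, events_b, total_b):
    """Crude risk ratio: risk in group A divided by risk in group B."""
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    return risk_a / risk_b


# Counts reported in the abstract: 117 of 140 ten-year-olds and
# 14 of 48 twenty-one-year-olds failed at least one validity indicator.
rr = risk_ratio(117, 140, 14, 48)
print(f"risk at age 10: {117 / 140:.1%}")  # 83.6%
print(f"risk at age 21: {14 / 48:.1%}")    # 29.2%
print(f"crude risk ratio: {rr:.2f}")       # 2.87
```

The crude ratio comes out at roughly 2.87, close to the reported 2.86. The small gap presumably reflects rounding or the authors' estimation method, neither of which the abstract specifies.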
Validating a UAV artificial intelligence control system using an autonomous test case generator

NASA Astrophysics Data System (ADS)

Straub, Jeremy; Huber, Justin

2013-05-01

The validation of safety-critical applications, such as autonomous UAV operations in an environment which may include human actors, is an ill-posed problem. To build confidence in the autonomous control technology, numerous scenarios must be considered. This paper expands upon previous work, related to autonomous testing of robotic control algorithms in a two-dimensional plane, to evaluate the suitability of similar techniques for validating artificial intelligence control in three dimensions, where a minimum level of airspeed must be maintained. The results of human-conducted testing are compared to this automated testing, in terms of error detection, speed and testing cost.
The CPT Reading Comprehension Test: A Validity Study.

ERIC Educational Resources Information Center

Napoli, Anthony R.; Raymond, Lanette A.; Coffey, Cheryl A.; Bosco, Diane M.

1998-01-01

Describes a study done at Suffolk County Community College (New York) that assessed the validity of the College Board's Computerized Placement Test in Reading Comprehension (CPT-R) by comparing test results of 1,154 freshmen with the results of the Degree of Power Reading Test. Results confirmed the CPT-R's reliability in identifying basic…
Test-Retest Reliability and Predictive Validity of the Implicit Association Test in Children

ERIC Educational Resources Information Center

Rae, James R.; Olson, Kristina R.

2018-01-01

The Implicit Association Test (IAT) is increasingly used in developmental research despite minimal evidence of whether children's IAT scores are reliable across time or predictive of behavior. When test-retest reliability and predictive validity have been assessed, the results have been mixed, and because these studies have differed on many…
The Validity and Clinical Uses of the Pepper Visual Skills for Reading Test.

ERIC Educational Resources Information Center

Watson, G.; And Others

1990-01-01

The Pepper Visual Skills for Reading Test was assessed as a measure of reading ability with meaningful text in 38 adults with macular degeneration; scores were compared with assessment made using the Gray Oral Reading Test, a previously standardized assessment. The test's validity was confirmed. (Author/JDD)
Interactional Competence: Challenges for Validity.

ERIC Educational Resources Information Center

Young, Richard F.

One of the ways in which language testing interfaces with applied linguistics is in the definition and validation of the constructs that underlie language tests. When language testers and score users interpret scores on a test, they do so by implicit and explicit reference to the construct on which the test is based. Equally, when applied to new…
A Longitudinal Study of the Predictive Validity of a Kindergarten Screening Battery.

ERIC Educational Resources Information Center

Kilgallon, Mary K.; Mueller, Richard J.

Test validity was studied in nine subtests of a kindergarten screening battery used to predict reading comprehension for children up to five years after entering kindergarten. The independent variables were kindergarteners' scores on the: (1) Otis-Lennon Mental Ability Test; (2) Bender Visual Motor Gestalt Test; (3) Detroit Tests of Learning…
Readability Level of Standardized Test Items and Student Performance: The Forgotten Validity Variable

ERIC Educational Resources Information Center

Hewitt, Margaret A.; Homan, Susan P.

2004-01-01

Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Results of…
Cross-Validation of the Computerized Adaptive Screening Test (CAST).

ERIC Educational Resources Information Center

Pliske, Rebecca M.; And Others

The Computerized Adaptive Screening Test (CAST) was developed to provide an estimate at recruiting stations of prospects' Armed Forces Qualification Test (AFQT) scores. The CAST was designed to replace the paper-and-pencil Enlistment Screening Test (EST). The initial validation study of CAST indicated that CAST predicts AFQT at least as accurately…
Development and Validation of a New Questionnaire Assessing Quality of Life in Adults with Hypopituitarism: Adult Hypopituitarism Questionnaire (AHQ)

PubMed Central

Ishii, Hitoshi; Shimatsu, Akira; Okimura, Yasuhiko; Tanaka, Toshiaki; Hizuka, Naomi; Kaji, Hidesuke; Hanew, Kunihiko; Oki, Yutaka; Yamashiro, Sayuri; Takano, Koji; Chihara, Kazuo

2012-01-01

Objective To develop and validate the Adult Hypopituitarism Questionnaire (AHQ) as a disease-specific, self-administered questionnaire for evaluation of quality of life (QOL) in adult patients with hypopituitarism. Methods We developed and validated this new questionnaire, using a standardized procedure which included item development, pilot-testing and psychometric validation. Of the patients who participated in psychometric validation, those whose clinical conditions were judged to be stable were asked to answer the survey questionnaire twice, in order to assess test-retest reliability. Results Content validity of the initial questionnaire was evaluated via two pilot tests. After these tests, we made minor revisions and finalized the initial version of the questionnaire. The questionnaire was constructed with two domains, one psycho-social and the other physical. For psychometric assessment, analyses were performed on the responses of 192 adult patients with various types of hypopituitarism. The intraclass correlations of the respective domains were 0.91 and 0.95, and the Cronbach’s alpha coefficients were 0.96 and 0.95, indicating adequate test-retest reliability and internal consistency for each domain. For known-group validity, patients with hypopituitarism due to hypothalamic disorder showed significantly lower scores in 11 out of 13 sub-domains compared to those who had hypopituitarism due to pituitary disorder. Regarding construct validity, the domain structure was found to be almost the same as that initially hypothesized. Exploratory factor analysis (n = 228) demonstrated that each domain consisted of six and seven sub-domains. Conclusion The AHQ showed good reliability and validity for evaluating QOL in adult patients with hypopituitarism. PMID:22984490
Test-retest reliability and construct validity of the ENERGY-parent questionnaire on parenting practices, energy balance-related behaviours and their potential behavioural determinants: the ENERGY-project.

PubMed

Singh, Amika S; Chinapaw, Mai J M; Uijtdewilligen, Léonie; Vik, Froydis N; van Lippevelde, Wendy; Fernández-Alvira, Juan M; Stomfai, Sarolta; Manios, Yannis; van der Sluijs, Maria; Terwee, Caroline; Brug, Johannes

2012-08-13

Insight into parental energy balance-related behaviours, their determinants and parenting practices is important to inform childhood obesity prevention. Therefore, reliable and valid tools to measure these variables in large-scale population research are needed. The objective of the current study was to examine the test-retest reliability and construct validity of the parent questionnaire used in the ENERGY-project, assessing parental energy balance-related behaviours, their determinants, and parenting practices among parents of 10-12 year old children. We collected data among parents (n = 316 in the test-retest reliability study; n = 109 in the construct validity study) of 10-12 year-old children in six European countries, i.e. Belgium, Greece, Hungary, the Netherlands, Norway, and Spain. Test-retest reliability was assessed using the intra-class correlation coefficient (ICC) and percentage agreement comparing scores from two measurements, administered one week apart. To assess construct validity, the agreement between questionnaire responses and a subsequent interview was assessed using ICC and percentage agreement. All but one item showed good to excellent test-retest reliability as indicated by ICCs > .60 or percentage agreement ≥ 75%. Construct validity appeared to be good to excellent for 92 out of 121 items, as indicated by ICCs > .60 or percentage agreement ≥ 75%. Of the other 29 items, construct validity was moderate for 24 and poor for 5 items. The reliability and construct validity of the items of the ENERGY-parent questionnaire on multiple energy balance-related behaviours, their potential determinants, and parenting practices appear to be good. Based on the results of the validity study, we strongly recommend adapting parts of the ENERGY-parent questionnaire if used in future research.
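The item-level rule quoted in this abstract (ICC > .60 or percentage agreement ≥ 75%) can be written down directly. This is only a sketch of the decision rule and of percentage agreement. The ICC is taken as a precomputed input, since the abstract does not say which ICC model was used, and the function names and example answers are hypothetical.

```python
def percentage_agreement(test, retest):
    """Percentage of respondents giving the same answer on both occasions."""
    if len(test) != len(retest) or not test:
        raise ValueError("need two non-empty answer lists of equal length")
    matches = sum(a == b for a, b in zip(test, retest))
    return 100.0 * matches / len(test)


def meets_threshold(icc, agreement_pct):
    """Rule quoted in the abstract: ICC > .60 or percentage agreement >= 75%."""
    return icc > 0.60 or agreement_pct >= 75.0


# Hypothetical answers of four parents to one item, one week apart.
agreement = percentage_agreement([1, 2, 3, 4], [1, 2, 3, 5])
print(agreement)                         # 75.0
print(meets_threshold(0.50, agreement))  # True, via the agreement criterion
```

Because the two criteria are joined by "or", an item with a low ICC can still pass on agreement alone, which is worth keeping in mind when reading the item counts above.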
Translation and validation of a Nepalese version of the Psychosocial Impact of Dental Aesthetic Questionnaire (PIDAQ).

PubMed

Singh, Varun Pratap; Singh, Rajkumar

2014-03-01

The aim of this study was to develop a reliable and valid Nepali version of the Psychosocial Impact of Dental Aesthetic Questionnaire (PIDAQ). Cross-sectional descriptive validation study. B.P. Koirala Institute of Health Sciences, Dharan, Nepal. A rigorous translation process including conceptual and semantic evaluation, translation, back translation and pre-testing was carried out. Two hundred and fifty-two undergraduates, including equal numbers of males and females with an age ranging from 18 to 29 years (mean age: 22·33±2·114 years), participated in this study. Reliability was assessed by Cronbach's alpha coefficient and the coefficient of correlation was used to assess correlation between items and test-retest reliability. The construct validity was tested by factorial analysis. Convergent construct validity was tested by comparison of PIDAQ scores with the aesthetic component of the index of orthodontic treatment needs (IOTN-AC) and perception of occlusion scale (POS), respectively. Discriminant construct validity was assessed by differences in score for those who demand treatment and those who did not. The response rate was 100%. One hundred and twenty-three individuals had a demand for orthodontic treatment. The Nepali PIDAQ had excellent reliability with Cronbach's alpha of 0·945, corrected item correlation between 0·525 and 0·790 and overall test-retest reliability of 0·978. The construct validity was good with formation of a new sub-domain 'Dental self-consciousness'. The scale had good correlation with IOTN-AC and POS fulfilling convergent construct validity. The discriminant construct validity was proved by significant differences in scores for subjects with demand and without demand for treatment. To conclude, Nepali version of PIDAQ has good psychometric properties and can be used effectively in this population group for further research.

Uncertainty Analysis of OC5-DeepCwind Floating Semisubmersible Offshore Wind Test Campaign

DOE Office of Scientific and Technical Information (OSTI.GOV)

Robertson, Amy N

This paper examines how to assess the uncertainty levels for test measurements of the Offshore Code Comparison, Continued, with Correlation (OC5)-DeepCwind floating offshore wind system, examined within the OC5 project. The goal of the OC5 project was to validate the accuracy of ultimate and fatigue load estimates from a numerical model of the floating semisubmersible using data measured during scaled tank testing of the system under wind and wave loading. The examination of uncertainty was done after the test, and it was found that the limited amount of data available did not allow for an acceptable uncertainty assessment. Therefore, this paper instead qualitatively examines the sources of uncertainty associated with this test to start a discussion of how to assess uncertainty for these types of experiments and to summarize what should be done during future testing to acquire the information needed for a proper uncertainty assessment. Foremost, future validation campaigns should initiate numerical modeling before testing to guide the test campaign, which should include a rigorous assessment of uncertainty, and perform validation during testing to ensure that the tests address all of the validation needs.
Uncertainty Analysis of OC5-DeepCwind Floating Semisubmersible Offshore Wind Test Campaign: Preprint

DOE Office of Scientific and Technical Information (OSTI.GOV)

Robertson, Amy N

This paper examines how to assess the uncertainty levels for test measurements of the Offshore Code Comparison, Continued, with Correlation (OC5)-DeepCwind floating offshore wind system, examined within the OC5 project. The goal of the OC5 project was to validate the accuracy of ultimate and fatigue load estimates from a numerical model of the floating semisubmersible using data measured during scaled tank testing of the system under wind and wave loading. The examination of uncertainty was done after the test, and it was found that the limited amount of data available did not allow for an acceptable uncertainty assessment. Therefore, this paper instead qualitatively examines the sources of uncertainty associated with this test to start a discussion of how to assess uncertainty for these types of experiments and to summarize what should be done during future testing to acquire the information needed for a proper uncertainty assessment. Foremost, future validation campaigns should initiate numerical modeling before testing to guide the test campaign, which should include a rigorous assessment of uncertainty, and perform validation during testing to ensure that the tests address all of the validation needs.
Reliability and Validity of the Korean Version of the Internet Addiction Test among College Students

PubMed Central

Lee, Kounseok; Lee, Hye-Kyung; Gyeong, Hyunsu; Yu, Byeongkwan; Song, Yul-Mai

2013-01-01

We developed a Korean translation of the Internet Addiction Test (KIAT), a widely used self-report measure of internet addiction, and tested its reliability and validity in a sample of college students. Two hundred seventy-nine college students at a national university completed the KIAT. Internal consistency and two-week test-retest reliability were calculated from the data, and principal component factor analysis was conducted. Participants also completed the Internet Addiction Diagnostic Questionnaire (IADQ), the Korea Internet addiction scale (K-scale), and the Patient Health Questionnaire-9 for the criterion validity. Cronbach's alpha of the whole scale was 0.91, and test-retest reliability was also good (r = 0.73). The IADQ, the K-scale, and depressive symptoms were significantly correlated with the KIAT scores, demonstrating concurrent and convergent validity. The factor analysis extracted four factors (Excessive use, Dependence, Withdrawal, and Avoidance of reality) that accounted for 59% of total variance. The KIAT has outstanding internal consistency and high test-retest reliability. Also, the factor structure and validity data show that the KIAT is comparable to the original version. Thus, the KIAT is a psychometrically sound tool for assessing internet addiction in the Korean-speaking population. PMID:23678270
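Internal consistency in this and several neighbouring records is summarised by Cronbach's alpha. A minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), is given below using only the Python standard library. The response matrix is hypothetical and has nothing to do with the KIAT data, and the function name is illustrative.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: four respondents, three items.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
print(round(cronbach_alpha(responses), 3))  # 0.944
```

Values near 1 indicate that the items vary together. The reported 0.91 for the KIAT would be obtained in the same way from the full item-by-respondent matrix.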
Psychometric evaluations of the efficacy expectations and Outcome Expectations for Exercise Scales in African American women.

PubMed

Murrock, Carolyn J; Gary, Faye

2014-01-01

This secondary analysis tested the reliability and validity of the Self-Efficacy for Exercise (SEE) and the Outcome Expectations for Exercise (OEE) scales in 126 community dwelling, middle aged African American women. Social Cognitive Theory postulates that self-efficacy is behavior, age, gender, and culture specific. Therefore, it is important to determine if self-efficacy scales developed and tested in older Caucasian female adults are reliable and valid in middle aged, minority women. Cronbach's alpha and construct validity using hypothesis testing and confirmatory factor analysis supported the reliability and validity of the SEE and OEE scales in community dwelling, middle aged African American women.
A standardised protocol for the validation of banking methodologies for arterial allografts.

PubMed

Lomas, R J; Dodd, P D F; Rooney, P; Pegg, D E; Hogg, P A; Eagle, M E; Bennett, K E; Clarkson, A; Kearney, J N

2013-09-01

The objective of this study was to design and test a protocol for the validation of banking methodologies for arterial allografts. A series of in vitro biomechanical and biological assessments were derived, and applied to paired fresh and banked femoral arteries. The ultimate tensile stress and strain, suture pullout stress and strain, expansion/rupture under hydrostatic pressure, histological structure and biocompatibility properties of disinfected and cryopreserved femoral arteries were compared to those of fresh controls. No significant differences were detected in any of the test criteria. This validation protocol provides an effective means of testing and validating banking protocols for arterial allografts.
C-TOC (Cognitive Testing on Computer): investigating the usability and validity of a novel self-administered cognitive assessment tool in aging and early dementia.

PubMed

Jacova, Claudia; McGrenere, Joanna; Lee, Hyunsoo S; Wang, William W; Le Huray, Sarah; Corenblith, Emily F; Brehmer, Matthew; Tang, Charlotte; Hayden, Sherri; Beattie, B Lynn; Hsiung, Ging-Yuek R

2015-01-01

Cognitive Testing on Computer (C-TOC) is a novel computer-based test battery developed to improve both usability and validity in the computerized assessment of cognitive function in older adults. C-TOC's usability was evaluated concurrently with its iterative development to version 4 in subjects with and without cognitive impairment, and health professional advisors representing different ethnocultural groups. C-TOC version 4 was then validated against neuropsychological tests (NPTs), and by comparing performance scores of subjects with normal cognition, Cognitive Impairment Not Dementia (CIND) and Alzheimer disease. C-TOC's language tests were validated in subjects with aphasic disorders. The most important usability issue that emerged from consultations with 27 older adults and with 8 cultural advisors was the test-takers' understanding of the task, particularly executive function tasks. User interface features did not pose significant problems. C-TOC version 4 tests correlated with comparator NPT (r=0.4 to 0.7). C-TOC test scores were normal (n=16)>CIND (n=16)>Alzheimer disease (n=6). All normal/CIND NPT performance differences were detected on C-TOC. Low computer knowledge adversely affected test performance, particularly in CIND. C-TOC detected impairments in aphasic disorders (n=11). In general, C-TOC had good validity in detecting cognitive impairment. Ensuring test-takers' understanding of the tasks, and considering their computer knowledge appear important steps towards C-TOC's implementation.
Psychometric properties of Persian version of the Sustained Auditory Attention Capacity Test in children with attention deficit-hyperactivity disorder.

PubMed

Soltanparast, Sanaz; Jafari, Zahra; Sameni, Seyed Jalal; Salehi, Masoud

2014-01-01

The purpose of the present study was to evaluate the psychometric properties (validity and reliability) of the Persian version of the Sustained Auditory Attention Capacity Test in children with attention deficit hyperactivity disorder. The Persian version of the Sustained Auditory Attention Capacity Test was constructed to assess sustained auditory attention using the method provided by Feniman and colleagues (2007). In this test, comments were provided to assess the child's attentional deficit by determining inattention and impulsiveness error, the total scores of the sustained auditory attention capacity test and attention span reduction index. In the present study, to determine validity and reliability, 46 normal children and 41 children with Attention Deficit Hyperactivity Disorder (ADHD), all right-handed, aged between 7 and 11, and of both genders, were evaluated with both the Rey Auditory Verbal Learning test and the Persian version of the Sustained Auditory Attention Capacity Test (SAACT). In determining convergent validity, a negative significant correlation was found between the three parts of the Rey Auditory Verbal Learning test (first, fifth, and immediate recall) and all indicators of the SAACT except attention span reduction. By comparing the test scores between the normal and ADHD groups, discriminant validity analysis showed significant differences in all indicators of the test except for attention span reduction (p< 0.001). The Persian version of the Sustained Auditory Attention Capacity test has good validity and reliability that match those of other reliable tests, and it can be used to identify children with attention deficits and those suspected of having Attention Deficit Hyperactivity Disorder.
The validation of science virtual test to assess 7th grade students’ critical thinking on matter and heat topic (SVT-MH)

NASA Astrophysics Data System (ADS)

Sya’bandari, Y.; Firman, H.; Rusyati, L.

2018-05-01

This descriptive study profiled the validation of the SVT-MH, a test measuring students’ critical thinking on the matter and heat topic in junior high school. The subjects were 7th grade junior high school students (13 years old), with science teacher and expert validators. The instruments used to obtain the data were an expert judgment rubric (content, media, education) and a readability test rubric. The SVT-MH was validated in four steps: analysis of core competence and basic competence based on Curriculum 2013, expert judgment (content, media, education), readability test, and trial test (limited and larger trial test). The validation resulted in 30 items that represent 8 elements and 21 sub-elements to measure students’ critical thinking based on Inch in the matter and heat topic. The Cronbach's alpha (α) is 0.642, which means that the instrument is sufficient to measure students’ critical thinking on the matter and heat topic.
Developing and testing instruments for improving cooperation and patient's participation in mental health care.

PubMed

Latvala, E; Saranto, K; Pekkala, E

2004-10-01

The main purpose of the project was to develop computerized instruments that could be used by nurses and patients to assess their cooperation and mutual contributions to care. This paper presents a part of the project: the reliability and validity testing phase of a process of instrument development. To test the validity and reliability of the instruments, data were collected with questionnaires from nurses (n = 146) and patients (n = 286). The validity evaluated as construct validity and the reliability evaluated as internal consistency of the instruments were quite good. Construct validity was tested by factor analysis, and internal consistency was tested by Cronbach's alpha coefficient, which varied from 0.69 to 0.79. The instruments, which consisted of a software application that can be operated in a www environment, were meant to be used as tools in the psychiatric nursing context for assessing the cooperation between the nurses and patients and the patient's participation in his/her care. Furthermore, the computer programme can be used as a tool for developing and assessing the patient orientation in nursing.
Development and validation of climate change system thinking instrument (CCSTI) for measuring system thinking on climate change content

NASA Astrophysics Data System (ADS)

Meilinda; Rustaman, N. Y.; Firman, H.; Tjasyono, B.

2018-05-01

The Climate Change System Thinking Instrument (CCSTI) was developed to measure system thinking ability on the concept of climate change. The CCSTI was developed in four phases, including instrument draft development, validation and evaluation including a readable material test, expert validation, and a field test. The result of the field test was analyzed by looking at the readability score in Cronbach’s alpha test. The draft instrument was tested on 80 randomly selected college students majoring in Biology Education, Physics Education, and Chemistry Education. The Content Validation Index score was 0.86, which means that the CCSTI is categorized as very appropriate with respect to the question indicators, and Cronbach’s alpha was about 0.605, which is categorized as undesirable to minimally acceptable. Of the 45 system thinking questions, 37 are valid, spread across four indicators of system thinking: system thinking phase I (pre-requirement), system thinking phase II (basic), system thinking phase III (intermediate), and system thinking phase IV (coherent expert).
Developing self-concept instrument for pre-service mathematics teachers

NASA Astrophysics Data System (ADS)

Afgani, M. W.; Suryadi, D.; Dahlan, J. A.

2018-01-01

This study aimed to develop a self-concept instrument for undergraduate students of mathematics education in Palembang, Indonesia. The study was development research on a non-test instrument in questionnaire form. The validity of the instrument was tested through a construct validity test using the Pearson product-moment correlation and factor analysis, while the reliability test used Cronbach’s alpha. The instrument was tested on 65 undergraduate students of mathematics education at one of the universities in Palembang, Indonesia. The instrument consisted of 43 items covering 7 aspects of self-concept: individual concern, social identity, individual personality, view of the future, the influence of others who become role models, the influence of the environment inside or outside the classroom, and view of mathematics. The validity test showed one invalid item, because its Pearson’s r of 0.107 was less than the critical value (0.244; α = 0.05). The item belonged to the social identity aspect. After the invalid item was removed, the construct validity test with factor analysis generated only one factor. The Kaiser-Meyer-Olkin (KMO) coefficient was 0.846 and the reliability coefficient was 0.91. From that result, we concluded that the self-concept instrument for undergraduate students of mathematics education in Palembang, Indonesia was valid and reliable with 42 items.
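The critical value of 0.244 quoted above is consistent with the usual conversion from a t critical value to a critical Pearson r, r = t / sqrt(t^2 + df). The sketch below assumes a two-tailed test at α = 0.05 with df = n - 2 = 63, which the abstract does not state explicitly. The t value of 1.998 is taken from a standard t table rather than computed, and the function name is illustrative.

```python
import math


def critical_r(t_critical, n):
    """Critical Pearson r for sample size n, given the matching t critical value."""
    df = n - 2
    return t_critical / math.sqrt(t_critical ** 2 + df)


# Assumption: two-tailed t critical value for df = 63 at alpha = 0.05 (table value).
T_CRITICAL_DF63 = 1.998

r_crit = critical_r(T_CRITICAL_DF63, 65)
print(round(r_crit, 3))  # 0.244
print(0.107 < r_crit)    # True: the item's correlation falls below the cut-off
```

Under these assumptions the cut-off matches the reported 0.244, and the item with r = 0.107 falls below it, in line with its removal from the instrument.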
Comprehension of Written Grammar Test: Reliability and Known-Groups Validity Study With Hearing and Deaf and Hard-of-Hearing Students.

PubMed

Cannon, Joanna E; Hubley, Anita M; Millhoff, Courtney; Mazlouman, Shahla

2016-01-01

The aim of the current study was to gather validation evidence for the Comprehension of Written Grammar (CWG; Easterbrooks, 2010) receptive test of 26 grammatical structures of English print for use with children who are deaf and hard of hearing (DHH). Reliability and validity data were collected for 98 participants (49 DHH and 49 hearing) in Grades 2-6. The objectives were to: (a) examine 4-week test-retest reliability data; and (b) provide evidence of known-groups validity by examining expected differences between the groups on the CWG vocabulary pretest and main test, as well as selected structures. Results indicated excellent test-retest reliability estimates for CWG test scores. DHH participants performed statistically significantly lower on the CWG vocabulary pretest and main test than the hearing participants. Significantly lower performance by DHH participants on most expected grammatical structures (e.g., basic sentence patterns, auxiliary "be" singular/plural forms, tense, comparatives, and complementation) also provided known groups evidence. Overall, the findings of this study showed strong evidence of the reliability of scores and known group-based validity of inferences made from the CWG. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
WEC-SIM Validation Testing Plan FY14 Q4.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ruehl, Kelley Michelle

2016-02-01

The WEC-Sim project is currently on track, having met both the SNL and NREL FY14 Milestones, as shown in Table 1 and Table 2. This is also reflected in the Gantt chart uploaded to the WEC-Sim SharePoint site in the FY14 Q4 Deliverables folder. The work completed in FY14 includes code verification through code-to-code comparison (FY14 Q1 and Q2), preliminary code validation through comparison to experimental data (FY14 Q2 and Q3), presentation and publication of the WEC-Sim project at OMAE 2014 [1], [2], [3] and GMREC/METS 2014 [4] (FY14 Q3), WEC-Sim code development and public open-source release (FY14 Q3), and development of a preliminary WEC-Sim validation test plan (FY14 Q4). This report presents the preliminary Validation Testing Plan developed in FY14 Q4. The validation test effort started in FY14 Q4 and will go on through FY15. Thus far the team has developed a device selection method, selected a device, placed a contract with the testing facility, established several collaborations including industry contacts, and has working ideas on the testing details such as scaling, device design, and test conditions.
Development and validation of an energy-balance knowledge test for fourth- and fifth-grade students.

PubMed

Chen, Senlin; Zhu, Xihe; Kang, Minsoo

2017-05-01

A valid test measuring children's energy-balance (EB) knowledge is lacking in research. This study developed and validated the energy-balance knowledge test (EBKT) for fourth and fifth grade students. The original EBKT contained 25 items but was reduced to 23 items based on pilot result and intensive expert panel discussion. De-identified data were collected from 468 fourth and fifth grade students enrolled in four schools to examine the psychometric properties of the EBKT items. The Rasch model analysis was conducted using the Winstep 3.65.0 software. Differential item functioning (DIF) analysis flagged 1 item (item #4) functioning differently between boys and girls, which was deleted. The final 22-item EBKT showed desirable model-data fit indices. The items had large variability ranging from -3.58 logit (item #10, the easiest) to 1.70 logit (item #3, the hardest). The average person ability on the test was 0.28 logit (SD = .78). Additional analyses supported known-group difference validity of the EBKT scores in capturing gender- and grade-based ability differences. The test was overall valid but could be further improved by expanding test items to discern various ability levels. For lack of a better test, researchers and practitioners may use the EBKT to assess fourth- and fifth-grade students' EB knowledge.
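To make the logit values in this abstract concrete, the sketch below evaluates the dichotomous Rasch model, P(correct) = 1 / (1 + exp(-(theta - b))), for a student of average ability (0.28 logit) on the easiest (-3.58) and hardest (1.70) items. It assumes the EBKT items were scored right/wrong, which the abstract does not state explicitly; the function name is illustrative.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


MEAN_ABILITY = 0.28   # average person ability reported in the abstract
EASIEST_ITEM = -3.58  # item #10
HARDEST_ITEM = 1.70   # item #3

p_easy = rasch_probability(MEAN_ABILITY, EASIEST_ITEM)
p_hard = rasch_probability(MEAN_ABILITY, HARDEST_ITEM)
print(f"easiest item: {p_easy:.2f}, hardest item: {p_hard:.2f}")
```

Under this model an average student would be expected to answer the easiest item correctly almost always (about 0.98) and the hardest item only about one time in five (about 0.19), which illustrates the wide spread of item difficulty the authors describe.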
Development and validation of the Child Oral Health Impact Profile - Preschool version.

PubMed

Ruff, R R; Sischo, L; Chinn, C H; Broder, H L

2017-09-01

The Child Oral Health Impact Profile (COHIP) is a validated instrument created to measure the oral health-related quality of life of school-aged children. The purpose of this study was to develop and validate a preschool version of the COHIP (COHIP-PS) for children aged 2-5. The COHIP-PS was developed and validated using a multi-stage process consisting of item selection, face validity testing, item impact testing, reliability and validity testing, and factor analysis. A cross-sectional convenience sample of caregivers having children 2-5 years old from four groups completed item clarity and impact forms. Groups were recruited from pediatric health clinics or preschools/daycare centers, speech clinics, dental clinics, or cleft/craniofacial centers. Participants had a variety of oral health-related conditions, including caries, congenital orofacial anomalies, and speech/language deficiencies such as articulation and language disorders. The COHIP-PS was found to have acceptable internal validity (α = 0.71) and high test-retest reliability (0.87), though internal validity was below the accepted threshold for the community sample. While discriminant validity results indicated significant differences across study groups, the overall magnitude of differences was modest. Results from confirmatory factor analyses support the use of a four-factor model consisting of 11 items across oral health, functional well-being, social-emotional well-being, and self-image domains. Quality of life is an integral factor in understanding and assessing children's well-being. The COHIP-PS is a validated oral health-related quality of life measure for preschool children with cleft or other oral conditions. Copyright© 2017 Dennis Barber Ltd.
Pretest information for a test to validate plume simulation procedures (FA-17)

NASA Technical Reports Server (NTRS)

Hair, L. M.

1978-01-01

The results of an effort to plan a final verification wind tunnel test to validate the recommended correlation parameters and application techniques were presented. The test planning effort was complete except for test site finalization and the associated coordination. Two suitable test sites were identified. Desired test conditions were shown. Subsequent sections of this report present the selected model and test site, instrumentation of this model, planned test operations, and some concluding remarks.
Validation of Blockage Interference Corrections in the National Transonic Facility

NASA Technical Reports Server (NTRS)

Walker, Eric L.

2007-01-01

A validation test has recently been constructed for wall interference methods as applied to the National Transonic Facility (NTF). The goal of this study was to begin to address the uncertainty of wall-induced-blockage interference corrections, which will make it possible to address the overall quality of data generated by the facility. The validation test itself is not specific to any particular modeling. For this present effort, the Transonic Wall Interference Correction System (TWICS) as implemented at the NTF is the mathematical model being tested. TWICS uses linear, potential boundary conditions that must first be calibrated. These boundary conditions include three different classical, linear, homogeneous forms that have been historically used to approximate the physical behavior of longitudinally slotted test section walls. Results of the application of the calibrated wall boundary conditions are discussed in the context of the validation test.
Contemporary Test Validity in Theory and Practice: A Primer for Discipline-Based Education Researchers.

PubMed

Reeves, Todd D; Marbach-Ad, Gili

2016-01-01

Most discipline-based education researchers (DBERs) were formally trained in the methods of scientific disciplines such as biology, chemistry, and physics, rather than social science disciplines such as psychology and education. As a result, DBERs may have never taken specific courses in the social science research methodology--either quantitative or qualitative--on which their scholarship often relies so heavily. One particular aspect of (quantitative) social science research that differs markedly from disciplines such as biology and chemistry is the instrumentation used to quantify phenomena. In response, this Research Methods essay offers a contemporary social science perspective on test validity and the validation process. The instructional piece explores the concepts of test validity, the validation process, validity evidence, and key threats to validity. The essay also includes an in-depth example of a validity argument and validation approach for a test of student argument analysis. In addition to DBERs, this essay should benefit practitioners (e.g., lab directors, faculty members) in the development, evaluation, and/or selection of instruments for their work assessing students or evaluating pedagogical innovations. © 2016 T. D. Reeves and G. Marbach-Ad. CBE—Life Sciences Education © 2016 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).
Design and validation of a self-administered test to assess bullying (bull-M) in high school Mexicans: a pilot study.

PubMed

Ramos-Jimenez, Arnulfo; Wall-Medrano, Abraham; Villar, Oscar Esparza-Del; Hernández-Torres, Rosa P

2013-04-11

Bullying (Bull) is a public health problem worldwide, and Mexico is not exempt. However, its epidemiology and early detection in our country is limited, in part, by the lack of validated tests to ensure the respondents' anonymity. The aim of this study was to validate a self-administered test (Bull-M) for assessing Bull among high-school Mexicans. Experts and school teachers from highly violent areas of Ciudad Juarez (Chihuahua, México), reported common Bull behaviors. Then, a 10-item test was developed based on twelve of these behaviors; the students' and peers' participation in Bull acts and in some somatic consequences in Bull victims with a 5-point Likert frequency scale. Validation criteria were: content (CV, judges); reliability [Cronbach's alpha (CA), test-retest (Spearman correlation, rs)]; construct [principal component (PCA), confirmatory factor (CFA), goodness-of-fit (GF) analysis]; and convergent (Bull-M vs. Bull-S test) validity. Bull-M showed good reliability (CA = 0.75, rs = 0.91; p < 0.001). Two factors were identified (PCA) and confirmed (CFA): "bullying me (victim)" and "bullying others (aggressor)". GF indices were: Root mean square error of approximation (0.031), GF index (0.97), and normalized fit index (0.92). Bull-M was as good as Bull-S for measuring Bull prevalence. Bull-M has a good reliability and convergent validity and a bi-modal factor structure for detecting Bull victims and aggressors; however, its external validity and sensitivity should be analyzed on a wider and different population.
The dialysis orders objective structured clinical examination (OSCE): a formative assessment for nephrology fellows.

PubMed

Prince, Lisa K; Campbell, Ruth C; Gao, Sam W; Kendrick, Jessica; Lebrun, Christopher J; Little, Dustin J; Mahoney, David L; Maursetter, Laura A; Nee, Robert; Saddler, Mark; Watson, Maura A; Yuan, Christina M

2018-04-01

Few quantitative nephrology-specific simulations assess fellow competency. We describe the development and initial validation of a formative objective structured clinical examination (OSCE) assessing fellow competence in ordering acute dialysis. The three test scenarios were acute continuous renal replacement therapy, chronic dialysis initiation in moderate uremia and acute dialysis in end-stage renal disease-associated hyperkalemia. The test committee included five academic nephrologists and four clinically practicing nephrologists outside of academia. There were 49 test items (58 points). A passing score was 46/58 points. No item had median relevance less than 'important'. The content validity index was 0.91. Ninety-five percent of positive-point items were easy-medium difficulty. Preliminary validation was by 10 board-certified volunteers, not test committee members, a median of 3.5 years from graduation. The mean score was 49 [95% confidence interval (CI) 46-51], κ = 0.68 (95% CI 0.59-0.77), Cronbach's α = 0.84. We subsequently administered the test to 25 fellows. The mean score was 44 (95% CI 43-45); 36% passed the test. Fellows scored significantly less than validators (P < 0.001). Of evidence-based questions, 72% were answered correctly by validators and 54% by fellows (P = 0.018). Fellows and validators scored least well on the acute hyperkalemia question. In self-assessing proficiency, 71% of fellows surveyed agreed or strongly agreed that the OSCE was useful. The OSCE may be used to formatively assess fellow proficiency in three common areas of acute dialysis practice. Further validation studies are in progress.

Extended version of the "Sniffin' Sticks" identification test: test-retest reliability and validity.

PubMed

Sorokowska, A; Albrecht, E; Haehner, A; Hummel, T

2015-03-30

The extended, 32-item version of the Sniffin' Sticks identification test was developed in order to create a precise tool enabling repeated, longitudinal testing of individual olfactory subfunctions. Odors of the previous test version had to be changed for technical reasons, and the odor identification test needed re-investigation in terms of reliability, validity, and normative values. In our study we investigated olfactory abilities of a group of 100 patients with olfactory dysfunction and 100 controls. We reconfirmed the high test-retest reliability of the extended version of the Sniffin' Sticks identification test and high correlations between the new and the original part of this tool. In addition, we confirmed the validity of the test as it discriminated clearly between controls and patients with olfactory loss. The additional set of 16 odor identification sticks can be either included in the current olfactory test, thus creating a more detailed diagnosis tool, or it can be used separately, enabling to follow olfactory function over time. Additionally, the normative values presented in our paper might provide useful guidelines for interpretation of the extended identification test results. The revised version of the Sniffin' Sticks 32-item odor identification test is a reliable and valid tool for the assessment of olfactory function. Copyright © 2015 Elsevier B.V. All rights reserved.
Liver fibrosis diagnosis by blood test and elastography in chronic hepatitis C: agreement or combination?

PubMed

Calès, P; Boursier, J; Lebigot, J; de Ledinghen, V; Aubé, C; Hubert, I; Oberti, F

2017-04-01

In chronic hepatitis C, the European Association for the Study of the Liver and the Asociacion Latinoamericana para el Estudio del Higado recommend performing transient elastography plus a blood test to diagnose significant fibrosis; test concordance confirms the diagnosis. The aim was to validate this rule and improve it by combining a blood test, FibroMeter (virus second generation, Echosens, Paris, France) and transient elastography (constitutive tests) into a single combined test, as suggested by the American Association for the Study of Liver Diseases and the Infectious Diseases Society of America. A total of 1199 patients were included in an exploratory set (HCV, n = 679) or in two validation sets (HCV ± HIV, HBV, n = 520). Accuracy was mainly evaluated by correct diagnosis rate for severe fibrosis (pathological Metavir F ≥ 3, primary outcome) by classical test scores or a fibrosis classification, reflecting Metavir staging, as a function of test concordance. Score accuracy: there were no significant differences between the blood test (75.7%), elastography (79.1%) and the combined test (79.4%) (P = 0.066); the score accuracy of each test was significantly (P < 0.001) decreased in discordant vs. concordant tests. Classification accuracy: combined test accuracy (91.7%) was significantly (P < 0.001) increased vs. the blood test (84.1%) and elastography (88.2%); accuracy of each constitutive test was significantly (P < 0.001) decreased in discordant vs. concordant tests but not with combined test: 89.0 vs. 92.7% (P = 0.118). Multivariate analysis for accuracy showed an interaction between concordance and fibrosis level: in the 1% of patients with full classification discordance and severe fibrosis, non-invasive tests were unreliable. The advantage of combined test classification was confirmed in the validation sets. The concordance recommendation is validated. A combined test, expressed in classification instead of score, improves this rule and validates the recommendation of a combined test, avoiding 99% of biopsies, and offering precise staging. © 2017 John Wiley & Sons Ltd.
Improving the Validity and Reliability of a Health Promotion Survey for Physical Therapists

PubMed Central

Stephens, Jaca L.; Lowman, John D.; Graham, Cecilia L.; Morris, David M.; Kohler, Connie L.; Waugh, Jonathan B.

2013-01-01

Purpose: Physical therapists (PTs) have a unique opportunity to intervene in the area of health promotion. However, no instrument has been validated to measure PTs’ views on health promotion in physical therapy practice. The purpose of this study was to evaluate the content validity and test-retest reliability of a health promotion survey designed for PTs. Methods: An expert panel of PTs assessed the content validity of “The Role of Health Promotion in Physical Therapy Survey” and provided suggestions for revision. Item content validity was assessed using the content validity ratio (CVR) as well as the modified kappa statistic. Therapists then participated in the test-retest reliability assessment of the revised health promotion survey, which was assessed using a weighted kappa statistic. Results: Based on feedback from the expert panelists, significant revisions were made to the original survey. The expert panel reached at least a majority consensus agreement for all items in the revised survey and the survey-CVR improved from 0.44 to 0.66. Only one item on the revised survey had substantial test-retest agreement, with 55% of the items having moderate agreement and 43% poor agreement. Conclusions: All items on the revised health promotion survey demonstrated at least fair validity, but few items had reasonable test-retest reliability. Further modifications should be made to strengthen the validity and improve the reliability of this survey. PMID:23754935
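For readers unfamiliar with the content validity ratio mentioned in this abstract, the sketch below uses Lawshe's formula, CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating an item essential and N is the panel size. It assumes the survey-level CVR is the mean of the item CVRs, which the abstract does not confirm; the panel size and ratings are hypothetical.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR for one item: ranges from -1 (no one) to +1 (everyone rates it essential)."""
    half = n_panelists / 2
    return (n_essential - half) / half


def survey_cvr(essential_counts, n_panelists):
    """Mean CVR across items (assumed here to be how a survey-level CVR is formed)."""
    ratios = [content_validity_ratio(n, n_panelists) for n in essential_counts]
    return sum(ratios) / len(ratios)


# Hypothetical panel of 10 experts rating four items.
counts = [10, 8, 8, 6]
print(content_validity_ratio(8, 10))
print(round(survey_cvr(counts, 10), 2))
```

A CVR of 0 means exactly half the panel rated the item essential, so values such as the 0.44 and 0.66 reported above indicate agreement well above a bare majority.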
40 CFR 610.24 - Validity of test data.

Code of Federal Regulations, 2011 CFR

2011-07-01

... 40 Protection of Environment 30 2011-07-01 2011-07-01 false Validity of test data. 610.24 Section 610.24 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) ENERGY POLICY FUEL ECONOMY RETROFIT DEVICES Test Procedures and Evaluation Criteria Evaluation Criteria for the Preliminary...
40 CFR 610.24 - Validity of test data.

Code of Federal Regulations, 2012 CFR

2012-07-01

... 40 Protection of Environment 31 2012-07-01 2012-07-01 false Validity of test data. 610.24 Section 610.24 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) ENERGY POLICY FUEL ECONOMY RETROFIT DEVICES Test Procedures and Evaluation Criteria Evaluation Criteria for the Preliminary...
40 CFR 610.24 - Validity of test data.

Code of Federal Regulations, 2014 CFR

2014-07-01

... 40 Protection of Environment 30 2014-07-01 2014-07-01 false Validity of test data. 610.24 Section 610.24 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) ENERGY POLICY FUEL ECONOMY RETROFIT DEVICES Test Procedures and Evaluation Criteria Evaluation Criteria for the Preliminary...
40 CFR 610.24 - Validity of test data.

Code of Federal Regulations, 2013 CFR

2013-07-01

... 40 Protection of Environment 31 2013-07-01 2013-07-01 false Validity of test data. 610.24 Section 610.24 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) ENERGY POLICY FUEL ECONOMY RETROFIT DEVICES Test Procedures and Evaluation Criteria Evaluation Criteria for the Preliminary...
40 CFR 610.24 - Validity of test data.

Code of Federal Regulations, 2010 CFR

2010-07-01

... 40 Protection of Environment 29 2010-07-01 2010-07-01 false Validity of test data. 610.24 Section 610.24 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) ENERGY POLICY FUEL ECONOMY RETROFIT DEVICES Test Procedures and Evaluation Criteria Evaluation Criteria for the Preliminary...
Validity of FAA-approved color vision tests for class II and class III aeromedical screening.

DOT National Transportation Integrated Search

1993-09-01

All clinical color vision tests currently used in the medical examination of pilots were studied regarding validity for prediction of performance on practical tests of ability to discriminate the aviation signal colors, red, green, and white given un...
Phase 1 Validation Testing and Simulation for the WEC-Sim Open Source Code

NASA Astrophysics Data System (ADS)

Ruehl, K.; Michelen, C.; Gunawan, B.; Bosma, B.; Simmons, A.; Lomonaco, P.

2015-12-01

WEC-Sim is an open source code to model wave energy converter performance in operational waves, developed by Sandia and NREL and funded by the US DOE. The code is a time-domain modeling tool developed in MATLAB/SIMULINK using the multibody dynamics solver SimMechanics, and solves the WEC's governing equations of motion using the Cummins time-domain impulse response formulation in 6 degrees of freedom. The WEC-Sim code has undergone verification through code-to-code comparisons; however, validation of the code has been limited to publicly available experimental data sets. While these data sets provide preliminary code validation, the experimental tests were not explicitly designed for code validation, and as a result are limited in their ability to validate the full functionality of the WEC-Sim code. Therefore, dedicated physical model tests for WEC-Sim validation have been performed. This presentation provides an overview of the WEC-Sim validation experimental wave tank tests performed at the Oregon State University's Directional Wave Basin at Hinsdale Wave Research Laboratory. Phase 1 of experimental testing was focused on device characterization and completed in Fall 2015. Phase 2 is focused on WEC performance and scheduled for Winter 2015/2016. These experimental tests were designed explicitly to validate the performance of WEC-Sim code, and its new feature additions. Upon completion, the WEC-Sim validation data set will be made publicly available to the wave energy community. For the physical model test, a controllable model of a floating wave energy converter has been designed and constructed. The instrumentation includes state-of-the-art devices to measure pressure fields, motions in 6 DOF, multi-axial load cells, torque transducers, position transducers, and encoders. The model also incorporates a fully programmable Power-Take-Off system which can be used to generate or absorb wave energy. Numerical simulations of the experiments using WEC-Sim will be presented. These simulations highlight the code features included in the latest release of WEC-Sim (v1.2), including: wave directionality, nonlinear hydrostatics and hydrodynamics, user-defined wave elevation time-series, state space radiation, and WEC-Sim compatibility with BEMIO (open source AQWA/WAMIT/NEMOH coefficient parser).
Reliability and validity of pendulum test measures of spasticity obtained with the Polhemus tracking system from patients with chronic stroke

PubMed Central

Bohannon, Richard W; Harrison, Steven; Kinsella-Shaw, Jeffrey

2009-01-01

Background: Spasticity is a common impairment accompanying stroke. Spasticity of the quadriceps femoris muscle can be quantified using the pendulum test. The measurement properties of pendular kinematics captured using a magnetic tracking system have not been studied among patients who have experienced a stroke. Therefore, this study describes the test-retest reliability and known groups and convergent validity of the pendulum test measures obtained with the Polhemus tracking system. Methods: Eight patients with chronic stroke underwent pendulum tests with their affected and unaffected lower limbs, with and without the addition of a 2.2 kg cuff weight at the ankle, using the Polhemus magnetic tracking system. Also measured bilaterally were knee resting angles, Ashworth scores (grades 0–4) of quadriceps femoris muscles, patellar tendon (knee jerk) reflexes (grades 0–4), and isometric knee extension force. Results: Three measures obtained from pendular traces of the affected side were reliable (intraclass correlation coefficient ≥ .844). Known groups validity was confirmed by demonstration of a significant difference in the measurements between sides. Convergent validity was supported by correlations ≥ .57 between pendulum test measures and other measures reflective of spasticity. Conclusion: Pendulum test measures obtained with the Polhemus tracking system from the affected side of patients with stroke have good test-retest reliability and both known groups and convergent validity. PMID:19642989
Reliability and validity of pendulum test measures of spasticity obtained with the Polhemus tracking system from patients with chronic stroke.

PubMed

Bohannon, Richard W; Harrison, Steven; Kinsella-Shaw, Jeffrey

2009-07-30

Spasticity is a common impairment accompanying stroke. Spasticity of the quadriceps femoris muscle can be quantified using the pendulum test. The measurement properties of pendular kinematics captured using a magnetic tracking system have not been studied among patients who have experienced a stroke. Therefore, this study describes the test-retest reliability and known groups and convergent validity of the pendulum test measures obtained with the Polhemus tracking system. Eight patients with chronic stroke underwent pendulum tests with their affected and unaffected lower limbs, with and without the addition of a 2.2 kg cuff weight at the ankle, using the Polhemus magnetic tracking system. Also measured bilaterally were knee resting angles, Ashworth scores (grades 0-4) of quadriceps femoris muscles, patellar tendon (knee jerk) reflexes (grades 0-4), and isometric knee extension force. Three measures obtained from pendular traces of the affected side were reliable (intraclass correlation coefficient ≥ .844). Known groups validity was confirmed by demonstration of a significant difference in the measurements between sides. Convergent validity was supported by correlations ≥ .57 between pendulum test measures and other measures reflective of spasticity. Pendulum test measures obtained with the Polhemus tracking system from the affected side of patients with stroke have good test-retest reliability and both known groups and convergent validity.
NIH Toolbox Cognition Battery (NIHTB-CB): list sorting test to measure working memory.

PubMed

Tulsky, David S; Carlozzi, Noelle; Chiaravalloti, Nancy D; Beaumont, Jennifer L; Kisala, Pamela A; Mungas, Dan; Conway, Kevin; Gershon, Richard

2014-07-01

The List Sorting Working Memory Test was designed to assess working memory (WM) as part of the NIH Toolbox Cognition Battery. List Sorting is a sequencing task requiring children and adults to sort and sequence stimuli that are presented visually and auditorily. Validation data are presented for 268 participants ages 20 to 85 years. A subset of participants (N=89) was retested 7 to 21 days later. As expected, the List Sorting Test had moderately high correlations with other measures of working memory and executive functioning (convergent validity) but a low correlation with a test of receptive vocabulary (discriminant validity). Furthermore, List Sorting demonstrates expected changes over the age span and has excellent test-retest reliability. Collectively, these results provide initial support for the construct validity of the List Sorting Working Memory Measure as a measure of working memory. However, the relationship between the List Sorting Test and general executive function has yet to be determined.
Preliminary Validation of a New Measure of Negative Response Bias: The Temporal Memory Sequence Test.

PubMed

Hegedish, Omer; Kivilis, Naama; Hoofien, Dan

2015-01-01

The Temporal Memory Sequence Test (TMST) is a new measure of negative response bias (NRB) that was developed to enrich the forced-choice paradigm. The TMST does not resemble the common structure of forced-choice tests and is presented as a temporal recall memory test. The validation sample consisted of 81 participants: 21 healthy control participants, 20 coached simulators, and 40 patients with acquired brain injury (ABI). The TMST had high reliability and significantly high positive correlations with the Test of Memory Malingering and Word Memory Test effort scales. Moreover, the TMST effort scales exhibited high negative correlations with the Glasgow Coma Scale, thus validating the previously reported association between probable malingering and mild traumatic brain injury. A suggested cutoff score yielded acceptable classification rates in the ABI group as well as in the simulator and control groups. The TMST appears to be a promising measure of NRB detection, with respectable rates of reliability and construct and criterion validity.
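The classification rates mentioned in this abstract are typically summarized as sensitivity and specificity at a chosen cutoff. The sketch below shows that calculation on made-up data; the scores, group labels, cutoff value, and the convention that lower scores indicate invalid performance are all hypothetical and are not TMST values.

```python
def classification_rates(scores, is_invalid, cutoff):
    """Sensitivity and specificity when scores below the cutoff are flagged as invalid.

    scores: test scores; is_invalid: True for known invalid performers (e.g. simulators).
    """
    pairs = list(zip(scores, is_invalid))
    tp = sum(1 for s, bad in pairs if bad and s < cutoff)       # invalid, flagged
    fn = sum(1 for s, bad in pairs if bad and s >= cutoff)      # invalid, missed
    tn = sum(1 for s, bad in pairs if not bad and s >= cutoff)  # valid, passed
    fp = sum(1 for s, bad in pairs if not bad and s < cutoff)   # valid, flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores: first four are simulators, last four are controls.
demo_scores = [30, 35, 42, 46, 48, 49, 50, 43]
demo_labels = [True, True, True, True, False, False, False, False]

sensitivity, specificity = classification_rates(demo_scores, demo_labels, cutoff=45)
print(sensitivity, specificity)
```

Raising the cutoff flags more simulators but also more genuine performers, which is why validity-test studies usually report the cutoff together with both rates.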
NIH Toolbox Cognition Battery (NIHTB-CB): The List Sorting Test to Measure Working Memory

PubMed Central

Tulsky, David S.; Carlozzi, Noelle; Chiaravalloti, Nancy D.; Beaumont, Jennifer L.; Kisala, Pamela A.; Mungas, Dan; Conway, Kevin; Gershon, Richard

2015-01-01

The List Sorting Working Memory Test was designed to assess working memory (WM) as part of the NIH Toolbox Cognition Battery. List Sorting is a sequencing task requiring children and adults to sort and sequence stimuli that are presented visually and auditorily. Validation data are presented for 268 participants ages 20 to 85 years. A subset of participants (N=89) was retested 7 to 21 days later. As expected, the List Sorting Test had moderately high correlations with other measures of working memory and executive functioning (convergent validity) but a low correlation with a test of receptive vocabulary (discriminant validity). Furthermore, List Sorting demonstrates expected changes over the age span and has excellent test-retest reliability. Collectively, these results provide initial support for the construct validity of the List Sorting Working Memory Measure as a measure of working memory. However, the relation between the List Sorting Test and general executive function has yet to be determined. PMID:24959983
Evaluating the dynamic response of in-flight thrust calculation techniques during throttle transients

NASA Technical Reports Server (NTRS)

Ray, Ronald J.

1994-01-01

New flight test maneuvers and analysis techniques for evaluating the dynamic response of in-flight thrust models during throttle transients have been developed and validated. The approach is based on the aircraft and engine performance relationship between thrust and drag. Two flight test maneuvers, a throttle step and a throttle frequency sweep, were developed and used in the study. Graphical analysis techniques, including a frequency domain analysis method, were also developed and evaluated. They provide quantitative and qualitative results. Four thrust calculation methods were used to demonstrate and validate the test technique. Flight test applications on two high-performance aircraft confirmed the test methods as valid and accurate. These maneuvers and analysis techniques were easy to implement and use. Flight test results indicate the analysis techniques can identify the combined effects of model error and instrumentation response limitations on the calculated thrust value. The methods developed in this report provide an accurate approach for evaluating, validating, or comparing thrust calculation methods for dynamic flight applications.
Differential Predictive Validity of a Preschool Battery Across Race and Sex.

ERIC Educational Resources Information Center

Reynolds, Cecil R.

Determination of the fairness of preschool tests for use with children of varying cultural backgrounds is the major objective of this study. The predictive validity of a battery of preschool tests, chosen to represent the core areas of preschool assessment, across race and sex, was evaluated. Validity of the battery was examined over a 12-month…
The Role of Psychometric Modeling in Test Validation: An Application of Multidimensional Item Response Theory

ERIC Educational Resources Information Center

Schilling, Stephen G.

2007-01-01

In this paper the author examines the role of item response theory (IRT), particularly multidimensional item response theory (MIRT) in test validation from a validity argument perspective. The author provides justification for several structural assumptions and interpretations, taking care to describe the role he believes they should play in any…
Validity of the Mayer-Salovey-Caruso Emotional Intelligence Test: Youth Version-Research Edition

ERIC Educational Resources Information Center

Peters, Christine; Kranzler, John H.; Rossen, Eric

2009-01-01

This study examines the criterion-related validity evidence of scores on the Mayer-Salovey-Caruso Emotional Intelligence Test: Youth Version-Research Version. The authors also investigate the relationship between scores on the MSCEIT-YV and chronological age. Results provide initial support for the construct validity of the MSCEIT-YV but also…
Developing a Validity Argument through Abductive Reasoning with an Empirical Demonstration of the Latent Class Analysis

ERIC Educational Resources Information Center

Wu, Amery D.; Stone, Jake E.; Liu, Yan

2016-01-01

This article proposes and demonstrates a methodology for test score validation through abductive reasoning. It describes how abductive reasoning can be utilized in support of the claims made about test score validity. This methodology is demonstrated with a real data example of the Canadian English Language Proficiency Index Program…

Development and Construct Validation of a Situational Judgment Test of Strategic Knowledge of Classroom Management in Elementary Schools

ERIC Educational Resources Information Center

Gold, Bernadette; Holodynski, Manfred

2015-01-01

The current study describes the development and construct validation of a situational judgment test for assessing the strategic knowledge of classroom management in elementary schools. Classroom scenarios and accompanying courses of action were constructed, of which 17 experts confirmed the content validity. A pilot study and a cross-validation…
Exploring the Reliability and Validity of the Social-Moral Awareness Test

ERIC Educational Resources Information Center

Livesey, Alexandra; Dodd, Karen; Pote, Helen; Marlow, Elizabeth

2012-01-01

Background: The aim of the study was to explore the validity of the social-moral awareness test (SMAT), a measure designed for assessing socio-moral rule knowledge and reasoning in people with learning disabilities. Comparisons between Theory of Mind and socio-moral reasoning allowed the exploration of construct validity of the tool. Factor…
Content Validation of the Comprehension of Written Grammar Assessment for Deaf and Hard of Hearing Students

ERIC Educational Resources Information Center

Cannon, Joanna E.; Hubley, Anita M.

2014-01-01

Content validation is a crucial, but often neglected, component of good test development. In the present study, content validity evidence was collected to determine the degree to which elements (e.g., grammatical structures, items, picture responses, administration, and scoring instructions) of the Comprehension of Written Grammar (CWG) test are…
Exploring Validity of Computer-Based Test Scores with Examinees' Response Behaviors and Response Times

ERIC Educational Resources Information Center

Sahin, Füsun

2017-01-01

Examining the testing processes, as well as the scores, is needed for a complete understanding of validity and fairness of computer-based assessments. Examinees' rapid-guessing and insufficient familiarity with computers have been found to be major issues that weaken the validity arguments of scores. This study has three goals: (a) improving…
Testing for purchasing power parity in 21 African countries using several unit root tests

NASA Astrophysics Data System (ADS)

Choji, Niri Martha; Sek, Siok Kun

2017-04-01

Purchasing power parity is used as a basis for international income and expenditure comparison through the exchange rate theory. However, empirical studies show disagreement on the validity of PPP. In this paper, we test the validity of PPP using a panel data approach. We apply seven different panel unit root tests to test the validity of the purchasing power parity (PPP) hypothesis based on quarterly data on the real effective exchange rate for 21 African countries over the period 1971:Q1-2012:Q4. All seven tests rejected the hypothesis of stationarity, meaning that absolute PPP does not hold in those African countries. This result confirmed the claim from previous studies that standard panel unit root tests fail to support the PPP hypothesis.
Methodology for testing and validating knowledge bases

NASA Technical Reports Server (NTRS)

Krishnamurthy, C.; Padalkar, S.; Sztipanovits, J.; Purves, B. R.

1987-01-01

A test and validation toolset developed for artificial intelligence programs is described. The basic premises of this method are: (1) knowledge bases have a strongly declarative character and represent mostly structural information about different domains, (2) the conditions for integrity, consistency, and correctness can be transformed into structural properties of knowledge bases, and (3) structural information and structural properties can be uniformly represented by graphs and checked by graph algorithms. The interactive test and validation environment has been implemented on a SUN workstation.
A simple randomisation procedure for validating discriminant analysis: a methodological note.

PubMed

Wastell, D G

1987-04-01

Because the goal of discriminant analysis (DA) is to optimise classification, it designedly exaggerates between-group differences. This bias complicates validation of DA. Jack-knifing has been used for validation but is inappropriate when stepwise selection (SWDA) is employed. A simple randomisation test is presented which is shown to give correct decisions for SWDA. The general superiority of randomisation tests over orthodox significance tests is discussed. Current work on non-parametric methods of estimating the error rates of prediction rules is briefly reviewed.
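The randomisation idea summarised above can be illustrated with a minimal sketch. It is not the procedure from the paper: a nearest-centroid rule stands in for discriminant analysis, and the function names, the 999 permutations, and the fixed seed are all assumptions made for illustration. The key point is that the whole classification procedure is rerun on each relabelled data set; for stepwise DA, that would presumably include the variable selection step.

```python
import random
from statistics import mean


def nearest_centroid_accuracy(features, labels):
    """Resubstitution accuracy of a nearest-centroid rule (a stand-in for DA)."""
    groups = sorted(set(labels))
    dims = range(len(features[0]))
    # One centroid per group: the mean of each feature over that group's cases.
    centroids = {
        g: [mean(x[d] for x, y in zip(features, labels) if y == g) for d in dims]
        for g in groups
    }

    def predict(x):
        # Assign the case to the group with the closest centroid.
        return min(
            groups,
            key=lambda g: sum((x[d] - centroids[g][d]) ** 2 for d in dims),
        )

    hits = sum(predict(x) == y for x, y in zip(features, labels))
    return hits / len(labels)


def randomisation_p_value(features, labels, n_perm=999, seed=0):
    """Compare the observed accuracy with accuracies under shuffled labels."""
    rng = random.Random(seed)
    observed = nearest_centroid_accuracy(features, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between features and groups
        if nearest_centroid_accuracy(features, shuffled) >= observed:
            at_least_as_good += 1
    # Count the observed labelling itself as one of the possible arrangements.
    return observed, (at_least_as_good + 1) / (n_perm + 1)


# Hypothetical one-feature data with two well-separated groups.
demo_features = [[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2], [5.3]]
demo_labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
accuracy, p_value = randomisation_p_value(demo_features, demo_labels)
print(f"observed accuracy = {accuracy:.2f}, randomisation p = {p_value:.3f}")
```

Because the optimistic bias of the classifier is present in every shuffled run as well, the comparison stays fair even though the resubstitution accuracy itself is inflated.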
Tests for the Assessment of Sport-Specific Performance in Olympic Combat Sports: A Systematic Review With Practical Recommendations

PubMed Central

Chaabene, Helmi; Negra, Yassine; Bouguezzi, Raja; Capranica, Laura; Franchini, Emerson; Prieske, Olaf; Hbacha, Hamdi; Granacher, Urs

2018-01-01

The regular monitoring of physical fitness and sport-specific performance is important in elite sports to increase the likelihood of success in competition. This study aimed to systematically review and to critically appraise the methodological quality, validation data, and feasibility of the sport-specific performance assessment in Olympic combat sports like amateur boxing, fencing, judo, karate, taekwondo, and wrestling. A systematic search was conducted in the electronic databases PubMed, Google-Scholar, and Science-Direct up to October 2017. Studies in combat sports were included that reported validation data (e.g., reliability, validity, sensitivity) of sport-specific tests. Overall, 39 studies were eligible for inclusion in this review. The majority of studies (74%) contained sample sizes <30 subjects. Nearly one-third of the reviewed studies lacked a sufficient description (e.g., anthropometrics, age, expertise level) of the included participants. Seventy-two percent of studies did not sufficiently report inclusion/exclusion criteria of their participants. In 62% of the included studies, the description and/or inclusion of a familiarization session(s) was either incomplete or nonexistent. Sixty percent of studies did not report any details about the stability of testing conditions. Approximately half of the studies examined reliability measures of the included sport-specific tests (intraclass correlation coefficient [ICC] = 0.43–1.00). Content validity was addressed in all included studies, criterion validity (only the concurrent aspect of it) in approximately half of the studies with correlation coefficients ranging from r = −0.41 to 0.90. Construct validity was reported in 31% of the included studies and predictive validity in only one. Test sensitivity was addressed in 13% of the included studies. The majority of studies (64%) ignored and/or provided incomplete information on test feasibility and methodological limitations of the sport-specific test. In 28% of the included studies, insufficient information or a complete lack of information was provided in the respective field of the test application. Several methodological gaps exist in studies that used sport-specific performance tests in Olympic combat sports. Additional research should adopt more rigorous validation procedures in the application and description of sport-specific performance tests in Olympic combat sports. PMID:29692739
The Use of Variants of the Trail Making Test in Serial Assessment: A Construct Validity Study

ERIC Educational Resources Information Center

Atkinson, Thomas M.; Ryan, Jeanne P.

2008-01-01

The construct validity of three variants of the Trail Making Test was investigated using 162 undergraduate psychology students. During a 3-week period, the Trail Making Test of the Delis-Kaplan Executive Function System, Comprehensive Trail Making Test, and Connections Task were administered in six possible orders. Using confirmatory factor…
Construction of Economics Achievement Test for Assessment of Students

ERIC Educational Resources Information Center

Osadebe, P. U.

2014-01-01

The study was carried out to construct a valid and reliable test in Economics for secondary school students. Two research questions were drawn to guide the establishment of validity and reliability for the Economics Achievement Test (EAT). It is a multiple choice objective test of five options with 100 items. A sample of 1000 students was randomly…
Measuring Explicit and Implicit Knowledge: A Psychometric Study in SLA

ERIC Educational Resources Information Center

Ebadi, Mandana Rohollahzadeh; Abedalaziz, Nabeel; Saad, Mohd Rashid Mohd

2015-01-01

Lack of valid means of measuring explicit and implicit knowledge in second language acquisition is a concern in investigations of explicit and implicit learning. This paper endeavors to validate the use of four tests (i.e., Untimed Judgment Grammatical Test, UJGT; Test of Metalinguistic Knowledge, TMK; Elicited Oral Imitation Test, EOIT;…
78 FR 27240 - Agency Information Collection Activities; Submission to OMB for Review and Approval; Public...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-05-09

... study, a cognitive pre-test will be conducted to refine and test the face validity and internal validity... methodologies and procedures. The pre-test will include cognitive interviews to ensure that the questions are being understood as were intended. Interviews conducted in the pre-test and the national study are...
Validity of the Optometry Admission Test in Predicting Performance in Schools and Colleges of Optometry.

ERIC Educational Resources Information Center

Kramer, Gene A.; Johnston, JoElle

1997-01-01

A study examined the relationship between Optometry Admission Test scores and pre-optometry or undergraduate grade point average (GPA) with first and second year performance in optometry schools. The test's predictive validity was limited but significant, and comparable to those reported for other admission tests. In addition, the scores…
Investigating Administered Essay and Multiple-Choice Tests in the English Department of Islamic Azad University, Hamedan Branch

ERIC Educational Resources Information Center

Karimi, Lotfollah; Mehrdad, Ali Gholami

2012-01-01

This study has attempted to investigate the administered written tests in the language department of Islamic Azad University of Hamedan, Iran from validity, practicality and reliability points of view. To this end two steps were taken. First, examining 112 tests, we found that the face validity of 50 tests had been threatened, 9 tests lacked…
The Importance of Symptom Validity Testing in Adolescents and Young Adults Undergoing Assessments for Learning or Attention Difficulties

ERIC Educational Resources Information Center

Harrison, Allyson G.; Green, Paul; Flaro, Lloyd

2012-01-01

It is almost self-evident that test results will be unreliable and misleading if those undergoing assessments do not make a full effort on testing. Nevertheless, objective tests of effort have not typically been used with young adults to determine whether test results are valid or not. Because of the potential economic and/or recreational benefits…
Computer Literacy and the Construct Validity of a High-Stakes Computer-Based Writing Assessment

ERIC Educational Resources Information Center

Jin, Yan; Yan, Ming

2017-01-01

One major threat to validity in high-stakes testing is construct-irrelevant variance. In this study we explored whether the transition from a paper-and-pencil to a computer-based test mode in a high-stakes test in China, the College English Test, has brought about variance irrelevant to the construct being assessed in this test. Analyses of the…
An entropy-based nonparametric test for the validation of surrogate endpoints.

PubMed

Miao, Xiaopeng; Wang, Yong-Cheng; Gangopadhyay, Ashis

2012-06-30

We present a nonparametric test to validate surrogate endpoints based on measure of divergence and random permutation. This test is a proposal to directly verify the Prentice statistical definition of surrogacy. The test does not impose distributional assumptions on the endpoints, and it is robust to model misspecification. Our simulation study shows that the proposed nonparametric test outperforms the practical test of the Prentice criterion in terms of both robustness of size and power. We also evaluate the performance of three leading methods that attempt to quantify the effect of surrogate endpoints. The proposed method is applied to validate magnetic resonance imaging lesions as the surrogate endpoint for clinical relapses in a multiple sclerosis trial. Copyright © 2012 John Wiley & Sons, Ltd.
Use of the color trails test as an embedded measure of performance validity.

PubMed

Henry, George K; Algina, James

2013-01-01

One hundred personal injury litigants and disability claimants referred for a forensic neuropsychological evaluation were administered both portions of the Color Trails Test (CTT) as part of a more comprehensive battery of standardized tests. Subjects who failed two or more free-standing tests of cognitive performance validity formed the Failed Performance Validity (FPV) group, while subjects who passed all free-standing performance validity measures were assigned to the Passed Performance Validity (PPV) group. A cutscore of ≥45 seconds to complete Color Trails 1 (CT1) was associated with a classification accuracy of 78%, good sensitivity (66%) and high specificity (90%), while a cutscore of ≥84 seconds to complete Color Trails 2 (CT2) was associated with a classification accuracy of 82%, good sensitivity (74%) and high specificity (90%). A CT1 cutscore of ≥58 seconds, and a CT2 cutscore ≥100 seconds was associated with 100% positive predictive power at base rates from 20 to 50%.
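The relationship between sensitivity, specificity, base rate, and positive predictive power reported above follows from Bayes' rule, and a short sketch makes the arithmetic explicit. The function names are illustrative, and the group sizes are an assumption: the abstract does not state them, but the reported accuracies (78% and 82%) are what equal-sized FPV and PPV groups would give, since 0.5 × 0.66 + 0.5 × 0.90 = 0.78 and 0.5 × 0.74 + 0.5 × 0.90 = 0.82.

```python
def positive_predictive_power(sensitivity, specificity, base_rate):
    """Probability that a case flagged by the cutscore is truly invalid."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


def overall_accuracy(sensitivity, specificity, base_rate):
    """Share of all cases classified correctly at a given base rate."""
    return sensitivity * base_rate + specificity * (1.0 - base_rate)


# CT1 cutscore of >=45 s: sensitivity 66%, specificity 90% (from the abstract).
# The base rates below span the 20-50% range mentioned in the abstract.
for base_rate in (0.2, 0.3, 0.4, 0.5):
    ppp = positive_predictive_power(0.66, 0.90, base_rate)
    print(f"base rate {base_rate:.0%}: positive predictive power = {ppp:.2f}")
```

Positive predictive power reaches 100% at any nonzero base rate only when specificity is 100%, that is, when no one in the passing group scores beyond the cutscore, which is consistent with the stricter cutscores (≥58 and ≥100 seconds) described at the end of the abstract.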
Real-Time Sensor Validation, Signal Reconstruction, and Feature Detection for an RLV Propulsion Testbed

NASA Technical Reports Server (NTRS)

Jankovsky, Amy L.; Fulton, Christopher E.; Binder, Michael P.; Maul, William A., III; Meyer, Claudia M.

1998-01-01

A real-time system for validating sensor health has been developed in support of the reusable launch vehicle program. This system was designed for use in a propulsion testbed as part of an overall effort to improve the safety, diagnostic capability, and cost of operation of the testbed. The sensor validation system was designed and developed at the NASA Lewis Research Center and integrated into a propulsion checkout and control system as part of an industry-NASA partnership, led by Rockwell International for the Marshall Space Flight Center. The system includes modules for sensor validation, signal reconstruction, and feature detection and was designed to maximize portability to other applications. Review of test data from initial integration testing verified real-time operation and showed the system to perform correctly on both hard and soft sensor failure test cases. This paper discusses the design of the sensor validation and supporting modules developed at LeRC and reviews results obtained from initial test cases.
The Validity and Reliability Test of the Indonesian Version of Gastroesophageal Reflux Disease Quality of Life (GERD-QOL) Questionnaire.

PubMed

Siahaan, Laura A; Syam, Ari F; Simadibrata, Marcellus; Setiati, Siti

2017-01-01

To obtain a valid and reliable GERD-QOL questionnaire for Indonesian application. At the initial stage, the GERD-QOL questionnaire was first translated into Indonesian and the translated questionnaire was subsequently translated back into the original language (back-to-back translation). The results were evaluated by the research team and an Indonesian version of the GERD-QOL questionnaire was developed. Ninety-one patients who had been clinically diagnosed with GERD based on the Montreal criteria were interviewed using the Indonesian version of the GERD-QOL questionnaire and the SF-36 questionnaire. Validity was evaluated using construct validity and external validity methods, and reliability was tested using internal consistency and test-retest methods. The Indonesian version of the GERD-QOL questionnaire had good internal consistency reliability with a Cronbach's alpha of 0.687-0.842 and good test-retest reliability with an intra-class correlation coefficient of 0.756-0.936 (p<0.05). The questionnaire was also demonstrated to have good validity, with a high correlation to each question of the SF-36 (p<0.05). The Indonesian version of the GERD-QOL questionnaire has been proven valid and reliable to evaluate the quality of life of GERD patients.
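Internal consistency of the kind reported above is usually summarised with Cronbach's alpha, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below applies that standard formula to hypothetical response data; the function name and the sample scores are assumptions and have nothing to do with the study's actual data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items on the scale
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from five respondents to a four-item subscale.
sample = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
print(f"Cronbach's alpha = {cronbach_alpha(sample):.3f}")
```

Alpha approaches 1 when items rise and fall together across respondents and drops toward 0 when they vary independently, so a range such as 0.687-0.842 across subscales indicates acceptable to good consistency by common rules of thumb.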

Validation of a pregnancy planning measure for Arabic-speaking women.

PubMed

Almaghaslah, Eman; Rochat, Roger; Farhat, Ghada

2017-01-01

The prevalence of unplanned pregnancy in Saudi Arabia has not been thoroughly investigated. To conduct a psychometric evaluation study of the Arabic version of the London Measure of Unplanned Pregnancy (LMUP). To evaluate the psychometric properties of the LMUP, we conducted a self-administered online survey among 796 ever-married Saudi women aged 20-49 years, and a re-test survey among 24 women. The psychometric properties evaluated included content validity measured by content validity index (CVI), structural validity assessed by exploratory factor analysis (EFA), substantive validity assessed by hypothesis testing, contextual stability for the test-retest assessed by weighted Kappa, and internal consistency assessed by Cronbach's alpha. The psychometric analysis of the Arabic version of LMUP exhibited valid and reliable properties. The CVIs for individual items and at the scale level were >0.7. EFA confirmed a unidimensional extraction of the scale item. Hypothesis testing confirmed expected associations. The tool was stable with weighted kappa = 0.78 and Cronbach's alpha = 0.88. In this study, the validity and reliability of the Arabic version of the LMUP were confirmed according to well-known psychometric criteria. This LMUP version can be used in research studies among Arabic-speaking women to measure unplanned pregnancy and investigate correlates and outcomes related to unplanned pregnancy.
Solar Tower Experiments for Radiometric Calibration and Validation of Infrared Imaging Assets and Analysis Tools for Entry Aero-Heating Measurements

NASA Technical Reports Server (NTRS)

Splinter, Scott C.; Daryabeigi, Kamran; Horvath, Thomas J.; Mercer, David C.; Ghanbari, Cheryl M.; Ross, Martin N.; Tietjen, Alan; Schwartz, Richard J.

2008-01-01

The NASA Engineering and Safety Center sponsored Hypersonic Thermodynamic Infrared Measurements assessment team has a task to perform radiometric calibration and validation of land-based and airborne infrared imaging assets and tools for remote thermographic imaging. The IR assets and tools will be used for thermographic imaging of the Space Shuttle Orbiter during entry aero-heating to provide flight boundary layer transition thermography data that could be utilized for calibration and validation of empirical and theoretical aero-heating tools. A series of tests at the Sandia National Laboratories National Solar Thermal Test Facility were designed for this task where reflected solar radiation from a field of heliostats was used to heat a 4 foot by 4 foot test panel consisting of LI 900 ceramic tiles located on top of the 200 foot tall Solar Tower. The test panel provided an Orbiter-like entry temperature for the purposes of radiometric calibration and validation. The Solar Tower provided an ideal test bed for this series of radiometric calibration and validation tests because it had the potential to rapidly heat the large test panel to spatially uniform and non-uniform elevated temperatures. Also, the unsheltered-open-air environment of the Solar Tower was conducive to obtaining unobstructed radiometric data by land-based and airborne IR imaging assets. Various thermocouples installed on the test panel and an infrared imager located in close proximity to the test panel were used to obtain surface temperature measurements for evaluation and calibration of the radiometric data from the infrared imaging assets. The overall test environment, test article, test approach, and typical test results are discussed.
Development and validation of Dutch version of Lasater Clinical Judgment Rubric in hospital practice: An instrument design study.

PubMed

Vreugdenhil, Jettie; Spek, Bea

2018-03-01

Clinical reasoning in patient care is a skill that cannot be observed directly. So far, no reliable, valid instrument exists for the assessment of nursing students' clinical reasoning skills in hospital practice. Lasater's clinical judgment rubric (LCJR), based on Tanner's model "Thinking like a nurse" has been tested, mainly in academic simulation settings. The aim is to develop a Dutch version of the LCJR (D-LCJR) and to test its psychometric properties when used in a hospital traineeship context. A mixed-model approach was used to develop and to validate the instrument. Ten dedicated educational units in a university hospital. A well-mixed group of 52 nursing students, nurse coaches and nurse educators. A Delphi panel developed the D-LCJR. Students' clinical reasoning skills were assessed "live" by nurse coaches, nurse educators and students who rated themselves. The psychometric properties tested during the assessment process are reliability, reproducibility, content validity and construct validity by testing two hypotheses: 1) a positive correlation between assessed and self-reported sum scores (convergent validity) and 2) a linear relation between experience and sum score (clinical validity). The obtained D-LCJR was found to be internally consistent, Cronbach's alpha 0.93. The rubric is also reproducible with intraclass correlations between 0.69 and 0.78. Experts judged it to be content valid. Both hypotheses were supported by significant results, providing evidence for construct validity. The translated and modified LCJR is a promising tool for the evaluation of nursing students' development in clinical reasoning in hospital traineeships, by students, nurse coaches and nurse educators. More evidence on construct validity is necessary, in particular for students at the end of their hospital traineeship. Based on our research, the D-LCJR applied in hospital traineeships is a usable and reliable tool. Copyright © 2017 Elsevier Ltd. All rights reserved.
Establishing the reliability and concurrent validity of physical performance tests using virtual reality equipment for community-dwelling healthy elders.

PubMed

Griswold, David; Rockwell, Kyle; Killa, Carri; Maurer, Michael; Landgraff, Nancy; Learman, Ken

2015-01-01

The aim of this study was to determine the reliability and concurrent validity of commonly used physical performance tests using the OmniVR Virtual Rehabilitation System for healthy community-dwelling elders. Participants (N = 40) were recruited by the authors and were screened for eligibility. The initial method of measurement was randomized to either virtual reality (VR) or clinically based measures (CM). Physical performance tests included the five times sit to stand, Timed Up and Go (TUG), Forward Functional Reach (FFR) and 30-s stand test. A random number generator determined the testing order. The test-retest reliability for the VR and CM was determined. Furthermore, concurrent validity was determined using a Pearson product moment correlation (Pearson r). The VR demonstrated excellent reliability for 5 × STS intraclass correlation coefficient (ICC) = 0.931(3,1), FFR ICC = 0.846(3,1) and the TUG ICC = 0.944(3,1). The concurrent validity data for the VR and CM (ICC 3, k) were moderate for FFR ICC = 0.682, excellent for 5 × STS ICC = 0.889 and excellent for the TUG ICC = 0.878. The concurrent validity of the 30-s stand test was good ICC = 0.735(3,1). This study supports the use of VR equipment for measuring physical performance tests in the clinic for healthy community-dwelling elders. Virtual reality equipment is not only used to treat balance impairments but it is also used to measure and determine physical impairments through the use of physical performance tests. Virtual reality equipment is a reliable and valid tool for collecting physical performance data for the 5 × STS, FFR, TUG and 30-s stand test for healthy community-dwelling elders.
Reinforced Carbon-Carbon Subcomponent Flat Plate Impact Testing for Space Shuttle Orbiter Return to Flight

NASA Technical Reports Server (NTRS)

Melis, Matthew E.; Brand, Jeremy H.; Pereira, J. Michael; Revilock, Duane M.

2007-01-01

Following the tragedy of the Space Shuttle Columbia on February 1, 2003, a major effort commenced to develop a better understanding of debris impacts and their effect on the Space Shuttle subsystems. An initiative to develop and validate physics-based computer models to predict damage from such impacts was a fundamental component of this effort. To develop the models it was necessary to physically characterize Reinforced Carbon-Carbon (RCC) and various debris materials which could potentially shed on ascent and impact the Orbiter RCC leading edges. The validated models enabled the launch system community to use the impact analysis software LS DYNA to predict damage by potential and actual impact events on the Orbiter leading edge and nose cap thermal protection systems. Validation of the material models was done through a three-level approach: fundamental tests to obtain independent static and dynamic material model properties of materials of interest, sub-component impact tests to provide highly controlled impact test data for the correlation and validation of the models, and full-scale impact tests to establish the final level of confidence for the analysis methodology. This paper discusses the second level subcomponent test program in detail and its application to the LS DYNA model validation process. The level two testing consisted of over one hundred impact tests in the NASA Glenn Research Center Ballistic Impact Lab on 6 by 6 in. and 6 by 12 in. flat plates of RCC and evaluated three types of debris projectiles: BX 265 External Tank foam, ice, and PDL 1034 External Tank foam. These impact tests helped determine the level of damage generated in the RCC flat plates by each projectile. The information obtained from this testing validated the LS DYNA damage prediction models and provided a certain level of confidence to begin performing analysis for full-size RCC test articles for returning NASA to flight with STS 114 and beyond.
Criterion-Related Validity of the Distance- and Time-Based Walk/Run Field Tests for Estimating Cardiorespiratory Fitness: A Systematic Review and Meta-Analysis

PubMed Central

Mayorga-Vega, Daniel; Bocanegra-Parrilla, Raúl; Ornelas, Martha; Viciana, Jesús

2016-01-01

Objectives: The main purpose of the present meta-analysis was to examine the criterion-related validity of the distance- and time-based walk/run tests for estimating cardiorespiratory fitness among apparently healthy children and adults. Materials and Methods: Relevant studies were searched from seven electronic bibliographic databases up to August 2015 and through other sources. The Hunter-Schmidt’s psychometric meta-analysis approach was conducted to estimate the population criterion-related validity of the following walk/run tests: 5,000 m, 3 miles, 2 miles, 3,000 m, 1.5 miles, 1 mile, 1,000 m, ½ mile, 600 m, 600 yd, ¼ mile, 15 min, 12 min, 9 min, and 6 min. Results: From the 123 included studies, a total of 200 correlation values were analyzed. The overall results showed that the criterion-related validity of the walk/run tests for estimating maximum oxygen uptake ranged from low to moderate (rp = 0.42–0.79), with the 1.5 mile (rp = 0.79, 0.73–0.85) and 12 min walk/run tests (rp = 0.78, 0.72–0.83) having the higher criterion-related validity for distance- and time-based field tests, respectively. The present meta-analysis also showed that sex, age and maximum oxygen uptake level do not seem to affect the criterion-related validity of the walk/run tests. Conclusions: When the evaluation of an individual’s maximum oxygen uptake attained during a laboratory test is not feasible, the 1.5 mile and 12 min walk/run tests represent useful alternatives for estimating cardiorespiratory fitness. As in the assessment with any physical fitness field test, evaluators must be aware that the performance score of the walk/run field tests is simply an estimation and not a direct measure of cardiorespiratory fitness. PMID:26987118
Reliability and validity of an audio signal modified shuttle walk test.

PubMed

Singla, Rupak; Rai, Richa; Faye, Abhishek Anil; Jain, Anil Kumar; Chowdhury, Ranadip; Bandyopadhyay, Debdutta

2017-01-01

The audio signal in the conventionally accepted protocol of the shuttle walk test (SWT) is not well understood by patients, and modification of the audio signal may improve the performance of the test. The aim of this study is to study the validity and reliability of an audio signal modified SWT, called the Singla-Richa modified SWT (SWTSR), in healthy normal adults. In the SWTSR, the audio signal was modified with the addition of reverse counting to it. A total of 54 healthy normal adults underwent the conventional SWT (CSWT) once and the SWTSR twice on the same day. The validity was assessed by comparing outcomes of the SWTSR to outcomes of the CSWT using the Pearson correlation coefficient and Bland-Altman plot. Test-retest reliability of the SWTSR was assessed using the intraclass correlation coefficient (ICC). The acceptability of the modified test in comparison to the conventional test was assessed using a Likert scale. The distance walked (mean ± standard deviation) in the CSWT and SWTSR was 853.33 ± 217.33 m and 857.22 ± 219.56 m, respectively (Pearson correlation coefficient = 0.98; P < 0.001), indicating the SWTSR to be a valid test. The SWTSR was found to be a reliable test with an ICC of 0.98 (95% confidence interval: 0.97-0.99). The acceptability of the SWTSR was significantly higher than that of the CSWT. The SWTSR, with an audio signal modified by reverse counting, is a reliable and valid test when compared with the CSWT in healthy normal adults. It is better understood by subjects than the CSWT.
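A Bland-Altman comparison of the kind mentioned above reduces to the mean difference between the two methods (the bias) and the 95% limits of agreement, bias ± 1.96 × SD of the differences. The sketch below shows that calculation only; the function name and the walking distances are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def bland_altman_limits(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(differences)
    # 95% limits of agreement, assuming roughly normal differences.
    half_width = 1.96 * stdev(differences)
    return bias, bias - half_width, bias + half_width


# Hypothetical distances (m) walked by six adults under each protocol.
modified = [860, 640, 1020, 910, 750, 980]
conventional = [850, 650, 1010, 900, 760, 970]
bias, lower, upper = bland_altman_limits(modified, conventional)
print(f"bias = {bias:.1f} m, limits of agreement = {lower:.1f} to {upper:.1f} m")
```

A high Pearson correlation alone shows only that the two protocols rank subjects similarly; the bias and limits of agreement show how far individual results can differ between them, which is why the two analyses are commonly reported together.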
Validity and Reliability of Baseline Testing in a Standardized Environment.

PubMed

Higgins, Kathryn L; Caze, Todd; Maerlender, Arthur

2017-08-11

The Immediate Postconcussion Assessment and Cognitive Testing (ImPACT) is a computerized neuropsychological test battery commonly used to determine cognitive recovery from concussion based on comparing post-injury scores to baseline scores. This model is based on the premise that ImPACT baseline test scores are a valid and reliable measure of optimal cognitive function at baseline. Growing evidence suggests that this premise may not be accurate, and a large contributor to invalid and unreliable baseline test scores may be the protocol and environment in which baseline tests are administered. This study examined the effects of a standardized environment and administration protocol on the reliability and performance validity of athletes' baseline test scores on ImPACT by comparing scores obtained in two different group-testing settings. Three hundred sixty-one Division 1 cohort-matched collegiate athletes' baseline data were assessed using a variety of indicators of potential performance invalidity; internal reliability was also examined. Thirty-one to thirty-nine percent of the baseline cases had at least one indicator of low performance validity, but there were no significant differences in validity indicators based on the environment in which the testing was conducted. Internal consistency reliability scores were in the acceptable to good range, with no significant differences between administration conditions. These results suggest that athletes may be reliably performing at levels lower than their best effort would produce. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
The Category Cued Recall test in very mild Alzheimer's disease: discriminative validity and correlation with semantic memory functions.

PubMed

Vogel, A; Mortensen, E L; Gade, A; Waldemar, G

2007-01-01

Episodic memory tests that measure cued recall may be particularly effective in the diagnosis of early Alzheimer's disease (AD) because they examine both episodic and semantic memory functions. The Category Cued Recall (CCR) test provides superordinate semantic cues at encoding and retrieval, and high discriminative validity has been claimed for this test. The aim of this study was to investigate the discriminative validity for this test when compared with the 10-word memory list from Alzheimer's Disease Assessment Scale (ADAS-cog) that measures free recall. The clinical diagnosis of AD was taken as the standard. It was also investigated whether the two episodic memory tests correlated with measures of semantic memory. The tests were administered to 35 patients with very mild AD (Mini Mental State Examination score >22) and 28 control subjects. Both tests had high sensitivity (>88%) with high specificity (>89%). One out of the five semantic memory tests was significantly correlated to performances on CCR, whereas delayed recall on the ADAS-cog memory test was significantly correlated to two semantic tests. In conclusion, the discriminative validity of the CCR test and the ADAS-cog memory test was equivalent in very mild AD. This may be because CCR did not tap more semantic processes, which are impaired in the earliest phases of AD, than a test of free recall.
Testing Reading Comprehension of Theoretical Discourse with Cloze.

ERIC Educational Resources Information Center

Greene, Benjamin B., Jr.

2001-01-01

Presents evidence from a large sample of reading test scores for the validity of cloze-based assessments of reading comprehension for the discourse typically encountered in introductory college economics textbooks. Notes that results provide strong evidence that appropriately designed cloze tests permit valid assessments of reading comprehension…
Test Use and Abuse

ERIC Educational Resources Information Center

Tienken, Christopher H.

2015-01-01

The ubiquitous use of standardized test results to make varied judgments about educators, students, and schools within the public school system raises concerns of validity. If the test results have not been validated for making multiple determinations, then the decisions made about educators, students, schools, and school districts based on the…
Reliability and Validity of the Standing Heel-Rise Test

ERIC Educational Resources Information Center

Yocum, Allison; McCoy, Sarah Westcott; Bjornson, Kristie F.; Mullens, Pamela; Burton, Gay Naganuma

2010-01-01

A standardized protocol for a pediatric heel-rise test was developed and reliability and validity are reported. Fifty-seven children developing typically (CDT) and 34 children with plantar flexion weakness performed three tests: unilateral heel rise, vertical jump, and force measurement using handheld dynamometry. Intraclass correlation…
Test Design Considerations for Students with Significant Cognitive Disabilities

ERIC Educational Resources Information Center

Anderson, Daniel; Farley, Dan; Tindal, Gerald

2015-01-01

Students with significant cognitive disabilities present an assessment dilemma that centers on access and validity in large-scale testing programs. Typically, access is improved by eliminating construct-irrelevant barriers, while validity is improved, in part, through test standardization. In this article, one state's alternate assessment data…
10 CFR 26.5 - Definitions.

Code of Federal Regulations, 2010 CFR

2010-01-01

... means a group of validity screening tests that were made from the same starting material. ... past 24 hours. Adulterated specimen means a urine specimen that has been altered, as evidenced by test... whole specimen. Analytical run means the process of testing a group of urine specimens for validity or...
10 CFR 26.5 - Definitions.

Code of Federal Regulations, 2013 CFR

2013-01-01

... means a group of validity screening tests that were made from the same starting material. ... past 24 hours. Adulterated specimen means a urine specimen that has been altered, as evidenced by test... whole specimen. Analytical run means the process of testing a group of urine specimens for validity or...
10 CFR 26.5 - Definitions.

Code of Federal Regulations, 2014 CFR

2014-01-01

... means a group of validity screening tests that were made from the same starting material. ... past 24 hours. Adulterated specimen means a urine specimen that has been altered, as evidenced by test... whole specimen. Analytical run means the process of testing a group of urine specimens for validity or...
10 CFR 26.5 - Definitions.

Code of Federal Regulations, 2012 CFR

2012-01-01

... means a group of validity screening tests that were made from the same starting material. ... past 24 hours. Adulterated specimen means a urine specimen that has been altered, as evidenced by test... whole specimen. Analytical run means the process of testing a group of urine specimens for validity or...
10 CFR 26.5 - Definitions.

Code of Federal Regulations, 2011 CFR

2011-01-01

... means a group of validity screening tests that were made from the same starting material. ... past 24 hours. Adulterated specimen means a urine specimen that has been altered, as evidenced by test... whole specimen. Analytical run means the process of testing a group of urine specimens for validity or...
Psychological Testing and Psychological Assessment: A Review of Evidence and Issues.

ERIC Educational Resources Information Center

Meyer, Gregory J.; Finn, Stephen E.; Eyde, Lorraine D.; Kay, Gary G.; Moreland, Kevin L.; Dies, Robert R.; Eisman, Elena J.; Kubiszyn, Tom W.; Reed, Geoffrey M.

2001-01-01

Summarizes issues associated with psychological assessment, concluding that: psychological test validity is strong and is comparable to medical test validity; distinct assessment methods provide unique sources of information; and clinicians who rely solely on interviews are prone to incomplete understandings. Suggests that multimethod assessment…
Reliability and criterion-related validity testing (construct) of the Endotracheal Suction Assessment Tool (ESAT©).

PubMed

Davies, Kylie; Bulsara, Max K; Ramelet, Anne-Sylvie; Monterosso, Leanne

2018-05-01

To establish criterion-related construct validity and test-retest reliability for the Endotracheal Suction Assessment Tool© (ESAT©). Endotracheal tube suction performed in children can significantly affect clinical stability. Previously identified clinical indicators for endotracheal tube suction were used as criteria when designing the ESAT©. Content validity was reported previously. The final stages of psychometric testing are presented. Observational testing was used to measure construct validity and determine whether the ESAT© could guide "inexperienced" paediatric intensive care nurses' decision-making regarding endotracheal tube suction. Test-retest reliability of the ESAT© was performed at two time points. The researchers and paediatric intensive care nurse "experts" developed 10 hypothetical clinical scenarios with predetermined endotracheal tube suction outcomes. "Experienced" (n = 12) and "inexperienced" (n = 14) paediatric intensive care nurses were presented with the scenarios and the ESAT© guiding decision-making about whether to perform endotracheal tube suction for each scenario. Outcomes were compared with those predetermined by the "experts" (n = 9). Test-retest reliability of the ESAT© was measured at two consecutive time points (4 weeks apart) with "experienced" and "inexperienced" paediatric intensive care nurses using the same scenarios and tool to guide decision-making. No differences were observed between endotracheal tube suction decisions made by "experts" (n = 9), "inexperienced" (n = 14) and "experienced" (n = 12) nurses confirming the tool's construct validity. No differences were observed between groups for endotracheal tube suction decisions at T1 and T2. Criterion-related construct validity and test-retest reliability of the ESAT© were demonstrated. Further testing is recommended to confirm reliability in the clinical setting with the "inexperienced" nurse to guide decision-making related to endotracheal tube suction. 
The ESAT© is the first validated tool to systematically guide endotracheal nursing practice for the "inexperienced" nurse. © 2018 John Wiley & Sons Ltd.

Isokinetic knee strength qualities as predictors of jumping performance in high-level volleyball athletes: multiple regression approach.

PubMed

Sattler, Tine; Sekulic, Damir; Spasic, Miodrag; Osmankac, Nedzad; Vicente João, Paulo; Dervisevic, Edvin; Hadzic, Vedran

2016-01-01

Previous investigations noted the potential importance of isokinetic strength in rapid muscular performances, such as jumping. This study aimed to identify the influence of isokinetic knee strength on specific jumping performance in volleyball. The secondary aim of the study was to evaluate the reliability and validity of two volleyball-specific jumping tests. The sample comprised 67 female (21.96±3.79 years; 68.26±8.52 kg; 174.43±6.85 cm) and 99 male (23.62±5.27 years; 84.83±10.37 kg; 189.01±7.21 cm) high-level volleyball players who competed in the 1st and 2nd National Division. Subjects were randomly divided into validation (N = 55 and 33 for males and females, respectively) and cross-validation subsamples (N = 54 and 34 for males and females, respectively). The set of predictors included isokinetic tests to evaluate the eccentric and concentric strength capacities of the knee extensors and flexors for the dominant and non-dominant leg. The main outcome measure for the isokinetic testing was peak torque (PT), which was later normalized for body mass and expressed as PT/kg. Block-jump and spike-jump performances were measured over three trials and used as criteria. Forward stepwise multiple regressions were calculated for the validation subsamples and then cross-validated. Cross-validation included correlations and t-test differences between observed and predicted scores, and Bland-Altman plots. Jumping tests were found to be reliable (spike-jump: ICC of 0.79 and 0.86; block-jump: ICC of 0.86 and 0.90; for males and females, respectively), and their validity was confirmed by significant t-test differences between 1st and 2nd division players. Isokinetic variables were found to be significant predictors of jumping performance in females, but not in males. In females, the isokinetic knee measures were shown to be stronger and more valid predictors of the block-jump (42% and 64% of the explained variance for the validation and cross-validation subsamples, respectively) than of the spike-jump (39% and 34% of the explained variance for the validation and cross-validation subsamples, respectively). Differences between the prediction models calculated for males and females are mostly explained by gender-specific biomechanics of jumping. The study defined the importance of isokinetic knee strength in volleyball jumping performance in female athletes. Further studies should evaluate the association between isokinetic ankle strength and volleyball-specific jumping performances. The results reinforce the need for cross-validation of prediction models in sport and exercise sciences.
Testing for the validity of purchasing power parity theory both in the long-run and the short-run for ASEAN-5

NASA Astrophysics Data System (ADS)

Choji, Niri Martha; Sek, Siok Kun

2017-11-01

Purchasing power parity (PPP) theory holds that the exchange rate between two nations should equal the ratio of their aggregate price levels. For more than a decade, there has been substantial interest in empirically testing the validity of PPP. This paper performs a series of tests to determine whether PPP holds for the ASEAN-5 nations over the period 2000-2016, using monthly data. For this purpose, we conducted four different stationarity tests, two cointegration tests (Pedroni and Westerlund), and a VAR model. The stationarity (unit root) tests reveal that the variables are not stationary in levels but are stationary in first differences. The cointegration tests did not reject the H0 of no cointegration, implying the absence of a long-run association among the variables, and the results of the VAR model did not reveal a strong short-run relationship. Based on the data, we therefore conclude that PPP is not valid in either the long run or the short run for ASEAN-5 during 2000-2016.
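The PPP relation in the first sentence of this abstract can be made concrete with the log real exchange rate. This is a sketch under assumptions that are not in the abstract: the exchange rate is quoted as domestic currency per unit of foreign currency, the function name is hypothetical, and the numbers are made up. It does not reproduce the paper's unit root, cointegration, or VAR tests.

```python
import math


def log_real_exchange_rate(nominal_rate, domestic_price, foreign_price):
    """q = s + p* - p in logs.

    nominal_rate is assumed to be domestic currency per unit of foreign
    currency. Under absolute PPP, nominal_rate == domestic_price /
    foreign_price, so q is zero.
    """
    return (math.log(nominal_rate)
            + math.log(foreign_price)
            - math.log(domestic_price))


# Hypothetical values: the domestic price level is twice the foreign one.
print(log_real_exchange_rate(2.0, 200.0, 100.0))  # PPP holds, q is about 0
print(log_real_exchange_rate(2.5, 200.0, 100.0))  # q > 0, a deviation from PPP
```

One common way to test PPP empirically is to check whether a series of q values is stationary, which is where unit root tests of the kind mentioned in the abstract come in.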
EUCLID/NISP GRISM qualification model AIT/AIV campaign: optical, mechanical, thermal and vibration tests

NASA Astrophysics Data System (ADS)

Caillat, A.; Costille, A.; Pascal, S.; Rossin, C.; Vives, S.; Foulon, B.; Sanchez, P.

2017-09-01

Dark matter and dark energy mysteries will be explored by the Euclid ESA M-class space mission which will be launched in 2020. Millions of galaxies will be surveyed through visible imagery and NIR imagery and spectroscopy in order to map in three dimensions the Universe at different evolution stages over the past 10 billion years. The massive NIR spectroscopic survey will be done efficiently by the NISP instrument thanks to the use of grisms (for "Grating pRISMs") developed under the responsibility of the LAM. In this paper, we present the verification philosophy applied to test and validate each grism before the delivery to the project. The test sequence covers a large set of verifications: optical tests to validate efficiency and WFE of the component, mechanical tests to validate the robustness to vibration, thermal tests to validate its behavior in cryogenic environment and a complete metrology of the assembled component. We show the test results obtained on the first grism Engineering and Qualification Model (EQM) which will be delivered to the NISP project in fall 2016.
A Psychometric Study of the Bayley Scales of Infant and Toddler Development in Persian Language Children.

PubMed

Azari, Nadia; Soleimani, Farin; Vameghi, Roshanak; Sajedi, Firoozeh; Shahshahani, Soheila; Karimi, Hossein; Kraskian, Adis; Shahrokhi, Amin; Teymouri, Robab; Gharib, Masoud

2017-01-01

The Bayley Scales of Infant and Toddler Development is a well-known diagnostic developmental assessment tool for children aged 1-42 months. Our aim was to investigate the validity and reliability of this scale in Persian-speaking children. The method was descriptive-analytic. Translation, back-translation, and cultural adaptation were carried out. Content and face validity of the translated scale were determined from experts' opinions. Overall, 403 children aged 1 to 42 months were recruited from health centers in Tehran during 2013-2014 for developmental assessment in the cognitive, communicative (receptive and expressive), and motor (fine and gross) domains. Reliability of the scale was calculated by three methods: internal consistency using Cronbach's alpha coefficient, test-retest, and inter-rater. Construct validity was assessed using factor analysis and comparison of mean scores. Cultural and linguistic changes were made to items in all domains, especially the communication subscale. Content and face validity of the test were approved by experts' opinions. Cronbach's alpha coefficient was above 0.74 in all domains. Pearson correlation coefficients in the various domains were ≥ 0.982 for the test-retest method and ≥ 0.993 for the inter-rater method. Construct validity of the test was supported by factor analysis. Moreover, the mean scores of the different age groups were compared, and statistically significant differences were observed between them, which confirms the validity of the test. The Bayley Scales of Infant and Toddler Development is a valid and reliable tool for child developmental assessment in Persian-language children.
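Cronbach's alpha, the internal-consistency coefficient reported here, can be computed from a respondents-by-items score table. The sketch below uses the standard formula with sample variances; the function name and the scores are hypothetical and unrelated to the study data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    items = list(zip(*scores))            # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical scores: four children, three items each.
example = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
print(cronbach_alpha(example))
```

Alpha approaches 1 when items rise and fall together across respondents, and drops when item scores are unrelated.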
[Selection of risk and diagnosis in diabetic polyneuropathy. Validation of method of new systems].

PubMed

Jurado, Jerónimo; Caula, Jacinto; Pou i Torelló, Josep Maria

2006-06-30

In a previous study we developed a specific algorithm, the polyneuropathy selection method (PSM) with 4 parameters (age, HDL-C, HbA1c, and retinopathy), to select patients at risk of diabetic polyneuropathy (DPN). We also developed a simplified method for DPN diagnosis: outpatient polyneuropathy diagnosis (OPD), with 4 variables (symptoms and 3 objective tests). To confirm the validity of conventional tests for DPN diagnosis; to validate the discriminatory power of the PSM and the diagnostic value of OPD by evaluating their relationship to electrodiagnosis studies and objective clinical neurological assessment; and to evaluate the correlation of DPN and pro-inflammatory status. Cross-sectional, crossed association for PSM validation. Paired samples for OPD validation. Primary care in 3 counties. Random sample of 75 subjects from the type-2 diabetes census for PSM evaluation. Thirty DPN patients and 30 non-DPN patients (from 2 DM2 sub-groups in our earlier study) for OPD evaluation. The gold standard for DPN diagnosis will be studied by means of a clinical neurological study (symptoms, physical examination, and sensitivity tests) and electrodiagnosis studies (sensitivity and motor EMG). Risks of neuropathy, macroangiopathy and pro-inflammatory status (PCR, TNF soluble fraction and total TGF-beta1) will be studied in every subject. Electrodiagnosis studies should confirm the validity of conventional tests for DPN diagnosis. PSM and OPD will be valid methods for selecting patients at risk and diagnosing DPN. There will be a significant relationship between DPN and pro-inflammatory tests.
PLCO Ovarian Phase III Validation Study — EDRN Public Portal

Cancer.gov

Our preliminary data indicate that the performance of CA 125 as a screening test for ovarian cancer can be improved upon by additional biomarkers. With completion of one additional validation step, we will be ready to test the performance of a consensus marker panel in a phase III validation study. Given the original aims of the PLCO trial, we believe that the PLCO represents an ideal longitudinal cohort offering specimens for phase III validation of ovarian cancer biomarkers.
Methods to validate the accuracy of an indirect calorimeter in the in-vitro setting.

PubMed

Oshima, Taku; Ragusa, Marco; Graf, Séverine; Dupertuis, Yves Marc; Heidegger, Claudia-Paula; Pichard, Claude

2017-12-01

The international ICALIC initiative aims at developing a new indirect calorimeter according to the needs of the clinicians and researchers in the field of clinical nutrition and metabolism. The project initially focuses on validating the calorimeter for use in mechanically ventilated acutely ill adult patients. However, standard methods to validate the accuracy of calorimeters have not yet been established. This paper describes the procedures for the in-vitro tests to validate the accuracy of the new indirect calorimeter, and defines the ranges for the parameters to be evaluated in each test to optimize the validation for clinical and research calorimetry measurements. Two in-vitro tests have been defined to validate the accuracy of the gas analyzers and the overall function of the new calorimeter. 1) Gas composition analysis allows validating the accuracy of O2 and CO2 analyzers. Reference gas of known O2 (or CO2) concentration is diluted by pure nitrogen gas to achieve predefined O2 (or CO2) concentration, to be measured by the indirect calorimeter. O2 and CO2 concentrations to be tested were determined according to their expected ranges of concentrations during calorimetry measurements. 2) Gas exchange simulator analysis validates O2 consumption (VO2) and CO2 production (VCO2) measurements. CO2 gas injection into artificial breath gas provided by the mechanical ventilator simulates VCO2. Resulting dilution of O2 concentration in the expiratory air is analyzed by the calorimeter as VO2. CO2 gas of identical concentration to the fraction of inspired O2 (FiO2) is used to simulate identical VO2 and VCO2. Indirect calorimetry results from publications were analyzed to determine the VO2 and VCO2 values to be tested for the validation. O2 concentration in respiratory air is highest at inspiration, and can decrease to 15% during expiration. CO2 concentration can be as high as 5% in expired air. To validate analyzers for measurements of FiO2 up to 70%, ranges of O2 and CO2 concentrations to be tested were defined as 15-70% and 0.5-5.0%, respectively. The mean VO2 in 426 adult mechanically ventilated patients was 270 ml/min, with 2 standard deviation (SD) ranges of 150-391 ml/min. Thus, VO2 and VCO2 to be simulated for the validation were defined as 150, 250, and 400 ml/min. The procedures for the in-vitro tests of the new indirect calorimeter and the ranges for the parameters to be evaluated in each test have been defined to optimize the validation of accuracy for clinical and research indirect calorimetry measurements. The combined methods will be used to validate the accuracy of the new indirect calorimeter developed by the ICALIC initiative, and should become the standard method to validate the accuracy of any future indirect calorimeters. Copyright © 2017 European Society for Clinical Nutrition and Metabolism. Published by Elsevier Ltd. All rights reserved.
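The dilution step in the gas composition test can be illustrated with simple mixing arithmetic. This sketch assumes ideal mixing of two steady flows and is not the authors' procedure; the function names, flow units, and the 1.0 L/min reference flow are hypothetical, while the 70% and 15% O2 figures come from the tested range stated in the abstract.

```python
def diluted_fraction(ref_fraction, ref_flow, n2_flow):
    """Gas fraction after mixing a reference gas flow with pure nitrogen."""
    return ref_fraction * ref_flow / (ref_flow + n2_flow)


def n2_flow_for_target(ref_fraction, ref_flow, target_fraction):
    """Nitrogen flow needed to dilute the reference gas to target_fraction."""
    if not 0 < target_fraction <= ref_fraction:
        raise ValueError("target must be positive and not exceed the reference")
    return ref_flow * (ref_fraction / target_fraction - 1.0)


# Dilute a hypothetical 70% O2 reference gas at 1.0 L/min down to 15% O2.
n2 = n2_flow_for_target(0.70, 1.0, 0.15)
print(n2)
print(diluted_fraction(0.70, 1.0, n2))
```

Comparing the analyzer reading against the fraction computed from the known flows gives the accuracy check for each predefined concentration.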
An Adaptation of the Original Fresno Test to Measure Evidence-Based Practice Competence in Pediatric Bedside Nurses.

PubMed

Laibhen-Parkes, Natasha; Kimble, Laura P; Melnyk, Bernadette Mazurek; Sudia, Tanya; Codone, Susan

2018-06-01

Instruments used to assess evidence-based practice (EBP) competence in nurses have been subjective, unreliable, or invalid. The Fresno test was identified as the only instrument to measure all the steps of EBP with supportive reliability and validity data. However, the items and psychometric properties of the original Fresno test are only relevant to measure EBP with medical residents. Therefore, the purpose of this paper is to describe the development of the adapted Fresno test for pediatric nurses, and provide preliminary validity and reliability data for its use with Bachelor of Science in Nursing-prepared pediatric bedside nurses. General adaptations were made to the original instrument's case studies, item content, wording, and format to meet the needs of a pediatric nursing sample. The scoring rubric was also modified to complement changes made to the instrument. Content and face validity, and intrarater reliability of the adapted Fresno test were assessed during a mixed-methods pilot study conducted from October to December 2013 with 29 Bachelor of Science in Nursing-prepared pediatric nurses. Validity data provided evidence for good content and face validity. Intrarater reliability estimates were high. The adapted Fresno test presented here appears to be a valid and reliable assessment of EBP competence in Bachelor of Science in Nursing-prepared pediatric nurses. However, further testing of this instrument is warranted using a larger sample of pediatric nurses in diverse settings. This instrument can be a starting point for evaluating the impact of EBP competence on patient outcomes. © 2018 Sigma Theta Tau International.
Validity of a novel computerized screening test system for mild cognitive impairment.

PubMed

Park, Jin-Hyuck; Jung, Minye; Kim, Jongbae; Park, Hae Yean; Kim, Jung-Ran; Park, Ji-Hyuk

2018-06-20

Background: The mobile screening test system for screening mild cognitive impairment (mSTS-MCI) was developed for clinical use. However, the clinical usefulness of the mSTS-MCI in distinguishing elderly people with MCI from those who are cognitively healthy has yet to be validated. Moreover, the comparability between this system and traditional screening tests for MCI has not been evaluated. The purpose of this study was to examine the validity and reliability of the mSTS-MCI and confirm the cut-off scores to detect MCI. The data were collected from 107 healthy elderly people and 74 elderly people with MCI. Concurrent validity was examined using the Korean version of the Montreal Cognitive Assessment (MoCA-K) as a gold standard test, and test-retest reliability was investigated using 30 of the study participants at four-week intervals. The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were confirmed through Receiver Operating Characteristic (ROC) analysis, and the cut-off scores for elderly people with MCI were identified. Concurrent validity showed statistically significant correlations between the mSTS-MCI and MoCA-K, and test-retest reliability indicated high correlation. In terms of screening predictability, the mSTS-MCI had a higher NPV than the MoCA-K. The mSTS-MCI was identified as a system with a high degree of validity and reliability. In addition, the mSTS-MCI showed high screening predictability, indicating it can be used in the clinical field as a screening test system for mild cognitive impairment.
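The four screening measures named in this abstract all follow from a 2x2 table of test result against true status. In the sketch below the group sizes (74 with MCI, 107 healthy) match the abstract, but the split into true and false results is hypothetical, as is the function name.

```python
def screening_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV and NPV from 2x2 confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),  # MCI cases correctly flagged
        "specificity": tn / (tn + fp),  # healthy people correctly cleared
        "ppv": tp / (tp + fp),          # flagged people who truly have MCI
        "npv": tn / (tn + fn),          # cleared people who are truly healthy
    }


# Hypothetical counts at one cut-off: 74 MCI cases, 107 healthy controls.
for name, value in screening_metrics(tp=60, fn=14, tn=90, fp=17).items():
    print(f"{name}: {value:.3f}")
```

Repeating this calculation across candidate cut-off scores traces out the ROC curve from which a cut-off is chosen.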
Performance Validity Testing in Neuropsychology: Methods for Measurement Development and Maximizing Diagnostic Accuracy.

PubMed

Wodushek, Thomas R; Greher, Michael R

2017-05-01

In the first column in this 2-part series, Performance Validity Testing in Neuropsychology: Scientific Basis and Clinical Application-A Brief Review, the authors introduced performance validity tests (PVTs) and their function, provided a justification for why they are necessary, traced their ongoing endorsement by neuropsychological organizations, and described how they are used and interpreted by ever increasing numbers of clinical neuropsychologists. To enhance readers' understanding of these measures, this second column briefly describes common detection strategies used in PVTs as well as the typical methods used to validate new PVTs and determine cut scores for valid/invalid determinations. We provide a discussion of the latest research demonstrating how neuropsychologists can combine multiple PVTs in a single battery to improve sensitivity/specificity to invalid responding. Finally, we discuss future directions for the research and application of PVTs.
Use of the Ames Check Standard Model for the Validation of Wall Interference Corrections

NASA Technical Reports Server (NTRS)

Ulbrich, N.; Amaya, M.; Flach, R.

2018-01-01

The new check standard model of the NASA Ames 11-ft Transonic Wind Tunnel was chosen for a future validation of the facility's wall interference correction system. The chosen validation approach takes advantage of the fact that test conditions experienced by a large model in the slotted part of the tunnel's test section will change significantly if a subset of the slots is temporarily sealed. Therefore, the model's aerodynamic coefficients have to be recorded, corrected, and compared for two different test section configurations in order to perform the validation. Test section configurations with highly accurate Mach number and dynamic pressure calibrations were selected for the validation. First, the model is tested with all test section slots in open configuration while keeping the model's center of rotation on the tunnel centerline. In the next step, slots on the test section floor are sealed and the model is moved to a new center of rotation that is 33 inches below the tunnel centerline. Then, the original angle of attack sweeps are repeated. Afterwards, wall interference corrections are applied to both test data sets and response surface models of the resulting aerodynamic coefficients in interference-free flow are generated. Finally, the response surface models are used to predict the aerodynamic coefficients for a family of angles of attack while keeping dynamic pressure, Mach number, and Reynolds number constant. The validation is considered successful if the corrected aerodynamic coefficients obtained from the related response surface model pair show good agreement. Residual differences between the corrected coefficient sets will be analyzed as well because they are an indicator of the overall accuracy of the facility's wall interference correction process.
Psychometric instrumentation: reliability and validity of instruments used for clinical practice, evidence-based practice projects and research studies.

PubMed

Mayo, Ann M

2015-01-01

It is important for CNSs and other APNs to consider the reliability and validity of instruments chosen for clinical practice, evidence-based practice projects, or research studies. Psychometric testing uses specific research methods to evaluate the amount of error associated with any particular instrument. Reliability estimates explain more about how well the instrument is designed, whereas validity estimates explain more about scores that are produced by the instrument. An instrument may be architecturally sound overall (reliable), but the same instrument may not be valid. For example, if a specific group does not understand certain well-constructed items, then the instrument does not produce valid scores when used with that group. Many instrument developers may conduct reliability testing only once, yet continue validity testing in different populations over many years. All CNSs should be advocating for the use of reliable instruments that produce valid results. Clinical nurse specialists may find themselves in situations where reliability and validity estimates for some instruments that are being utilized are unknown. In such cases, CNSs should engage key stakeholders to sponsor nursing researchers to pursue this most important work.
Implementation and application of an interactive user-friendly validation software for RADIANCE

NASA Astrophysics Data System (ADS)

Sundaram, Anand; Boonn, William W.; Kim, Woojin; Cook, Tessa S.

2012-02-01

RADIANCE extracts CT dose parameters from dose sheets using optical character recognition and stores the data in a relational database. To facilitate validation of RADIANCE's performance, a simple user interface was initially implemented and about 300 records were evaluated. Here, we extend this interface to achieve a wider variety of functions and perform a larger-scale validation. The validator uses some data from the RADIANCE database to prepopulate quality-testing fields, such as correspondence between calculated and reported total dose-length product. The interface also displays relevant parameters from the DICOM headers. A total of 5,098 dose sheets were used to test the performance accuracy of RADIANCE in dose data extraction. Several search criteria were implemented. All records were searchable by accession number, study date, or dose parameters beyond chosen thresholds. Validated records were searchable according to additional criteria from validation inputs. An error rate of 0.303% was demonstrated in the validation. Dose monitoring is increasingly important and RADIANCE provides an open-source solution with a high level of accuracy. The RADIANCE validator has been updated to enable users to test the integrity of their installation and verify that their dose monitoring is accurate and effective.
Hyper-X Engine Design and Ground Test Program

NASA Technical Reports Server (NTRS)

Voland, R. T.; Rock, K. E.; Huebner, L. D.; Witte, D. W.; Fischer, K. E.; McClinton, C. R.

1998-01-01

The Hyper-X Program, NASA's focused hypersonic technology program jointly run by NASA Langley and Dryden, is designed to move hypersonic, air-breathing vehicle technology from the laboratory environment to the flight environment, the last stage preceding prototype development. The Hyper-X research vehicle will provide the first ever opportunity to obtain data on an airframe integrated supersonic combustion ramjet propulsion system in flight, providing the first flight validation of wind tunnel, numerical and analytical methods used for design of these vehicles. A substantial portion of the integrated vehicle/engine flowpath development, engine systems verification and validation and flight test risk reduction efforts are experimentally based, including vehicle aeropropulsive force and moment database generation for flight control law development, and integrated vehicle/engine performance validation. The Mach 7 engine flowpath development tests have been completed, and effort is now shifting to engine controls, systems and performance verification and validation tests, as well as additional flight test risk reduction tests. The engine wind tunnel tests required for these efforts range from tests of partial width engines in both small and large scramjet test facilities, to tests of the full flight engine on a vehicle simulator and tests of a complete flight vehicle in the Langley 8-Ft. High Temperature Tunnel. These tests will begin in the summer of 1998 and continue through 1999. The first flight test is planned for early 2000.
Derivation and Applicability of Asymptotic Results for Multiple Subtests Person-Fit Statistics

PubMed Central

Albers, Casper J.; Meijer, Rob R.; Tendeiro, Jorge N.

2016-01-01

In high-stakes testing, it is important to check the validity of individual test scores. Although a test may, in general, result in valid test scores for most test takers, for some test takers, test scores may not provide a good description of a test taker’s proficiency level. Person-fit statistics have been proposed to check the validity of individual test scores. In this study, the theoretical asymptotic sampling distribution of two person-fit statistics that can be used for tests that consist of multiple subtests is first discussed. Second, a simulation study was conducted to investigate the applicability of this asymptotic theory for tests of finite length, in which the correlation between subtests and the number of items in the subtests were varied. The authors showed that these distributions provide reasonable approximations, even for tests consisting of subtests of only 10 items each. These results have practical value because researchers do not have to rely on extensive simulation studies to simulate sampling distributions. PMID:29881053
A clinical test of stepping and change of direction to identify multiple falling older adults.

PubMed

Dite, Wayne; Temple, Viviene A

2002-11-01

To establish the reliability and validity of a new clinical test of dynamic standing balance, the Four Square Step Test (FSST), to evaluate its sensitivity, specificity, and predictive value in identifying subjects who fall, and to compare it with 3 established balance and mobility tests. A 3-group comparison performed by using 3 validated tests and 1 new test. A rehabilitation center and university medical school in Australia. Eighty-one community-dwelling adults over the age of 65 years. Subjects were age- and gender-matched to form 3 groups: multiple fallers, nonmultiple fallers, and healthy comparisons. Not applicable. Time to complete the FSST and Timed Up and Go test and the number of steps to complete the Step Test and Functional Reach Test distance. High reliability was found for interrater (n=30, intraclass correlation coefficient [ICC]=.99) and retest reliability (n=20, ICC=.98). Evidence for validity was found through correlation with other existing balance tests. Validity was supported, with the FSST showing significantly better performance scores (P<.01) for each of the healthier and less impaired groups. The FSST also revealed a sensitivity of 85%, a specificity of 88% to 100%, and a positive predictive value of 86%. As a clinical test, the FSST is reliable, valid, easy to score, quick to administer, requires little space, and needs no special equipment. It is unique in that it involves stepping over low objects (2.5cm) and movement in 4 directions. The FSST had higher combined sensitivity and specificity for identifying differences between groups in the selected sample population of older adults than the 3 tests with which it was compared. Copyright 2002 by the American Congress of Rehabilitation Medicine and the American Academy of Physical Medicine and Rehabilitation
Initial validation of a web-based self-administered neuropsychological test battery for older adults and seniors.

PubMed

Hansen, Tor Ivar; Haferstrom, Elise Christina D; Brunner, Jan F; Lehn, Hanne; Håberg, Asta Kristine

2015-01-01

Computerized neuropsychological tests are effective in assessing different cognitive domains, but are often limited by the need of proprietary hardware and technical staff. Web-based tests can be more accessible and flexible. We aimed to investigate validity, effects of computer familiarity, education, and age, and the feasibility of a new web-based self-administered neuropsychological test battery (Memoro) in older adults and seniors. A total of 62 (37 female) participants (mean age 60.7 years) completed the Memoro web-based neuropsychological test battery and a traditional battery composed of similar tests intended to measure the same cognitive constructs. Participants were assessed on computer familiarity and how they experienced the two batteries. To properly test the factor structure of Memoro, an additional factor analysis in 218 individuals from the HUNT population was performed. Comparing Memoro to traditional tests, we observed good concurrent validity (r = .49-.63). The performance on the traditional and Memoro test battery was consistent, but differences in raw scores were observed with higher scores on verbal memory and lower in spatial memory in Memoro. Factor analysis indicated two factors: verbal and spatial memory. There were no correlations between test performance and computer familiarity after adjustment for age or age and education. Subjects reported that they preferred web-based testing as it allowed them to set their own pace, and they did not feel scrutinized by an administrator. Memoro showed good concurrent validity compared to neuropsychological tests measuring similar cognitive constructs. Based on the current results, Memoro appears to be a tool that can be used to assess cognitive function in older and senior adults. Further work is necessary to ascertain its validity and reliability.
Initial validation of a web-based self-administered neuropsychological test battery for older adults and seniors

PubMed Central

Hansen, Tor Ivar; Haferstrom, Elise Christina D.; Brunner, Jan F.; Lehn, Hanne; Håberg, Asta Kristine

2015-01-01

Introduction: Computerized neuropsychological tests are effective in assessing different cognitive domains, but are often limited by the need of proprietary hardware and technical staff. Web-based tests can be more accessible and flexible. We aimed to investigate validity, effects of computer familiarity, education, and age, and the feasibility of a new web-based self-administered neuropsychological test battery (Memoro) in older adults and seniors. Method: A total of 62 (37 female) participants (mean age 60.7 years) completed the Memoro web-based neuropsychological test battery and a traditional battery composed of similar tests intended to measure the same cognitive constructs. Participants were assessed on computer familiarity and how they experienced the two batteries. To properly test the factor structure of Memoro, an additional factor analysis in 218 individuals from the HUNT population was performed. Results: Comparing Memoro to traditional tests, we observed good concurrent validity (r = .49–.63). The performance on the traditional and Memoro test battery was consistent, but differences in raw scores were observed with higher scores on verbal memory and lower in spatial memory in Memoro. Factor analysis indicated two factors: verbal and spatial memory. There were no correlations between test performance and computer familiarity after adjustment for age or age and education. Subjects reported that they preferred web-based testing as it allowed them to set their own pace, and they did not feel scrutinized by an administrator. Conclusions: Memoro showed good concurrent validity compared to neuropsychological tests measuring similar cognitive constructs. Based on the current results, Memoro appears to be a tool that can be used to assess cognitive function in older and senior adults. Further work is necessary to ascertain its validity and reliability. PMID:26009791
Validation of a Video-based Game-Understanding Test Procedure in Badminton.

ERIC Educational Resources Information Center

Blomqvist, Minna T.; Luhtanen, Pekka; Laakso, Lauri; Keskinen, Esko

2000-01-01

Reports the development and validation of video-based game-understanding tests in badminton for elementary and secondary students. The tests included different sequences that simulated actual game situations. Players had to solve tactical problems by selecting appropriate solutions and arguments for their decisions. Results suggest that the test…
40 CFR 1045.501 - How do I run a valid emission test?

Code of Federal Regulations, 2013 CFR

2013-07-01

... 40 Protection of Environment 34 2013-07-01 2013-07-01 false How do I run a valid emission test? 1045.501 Section 1045.501 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR POLLUTION CONTROLS CONTROL OF EMISSIONS FROM SPARK-IGNITION PROPULSION MARINE ENGINES AND VESSELS Test...

40 CFR 1045.501 - How do I run a valid emission test?

Code of Federal Regulations, 2014 CFR

2014-07-01

... 40 Protection of Environment 33 2014-07-01 2014-07-01 false How do I run a valid emission test? 1045.501 Section 1045.501 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR POLLUTION CONTROLS CONTROL OF EMISSIONS FROM SPARK-IGNITION PROPULSION MARINE ENGINES AND VESSELS Test...
40 CFR 1045.501 - How do I run a valid emission test?

Code of Federal Regulations, 2010 CFR

2010-07-01

... 40 Protection of Environment 32 2010-07-01 2010-07-01 false How do I run a valid emission test? 1045.501 Section 1045.501 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR POLLUTION CONTROLS CONTROL OF EMISSIONS FROM SPARK-IGNITION PROPULSION MARINE ENGINES AND VESSELS Test...
40 CFR 1045.501 - How do I run a valid emission test?

Code of Federal Regulations, 2011 CFR

2011-07-01

... 40 Protection of Environment 33 2011-07-01 2011-07-01 false How do I run a valid emission test? 1045.501 Section 1045.501 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR POLLUTION CONTROLS CONTROL OF EMISSIONS FROM SPARK-IGNITION PROPULSION MARINE ENGINES AND VESSELS Test...
40 CFR 1045.501 - How do I run a valid emission test?

Code of Federal Regulations, 2012 CFR

2012-07-01

... 40 Protection of Environment 34 2012-07-01 2012-07-01 false How do I run a valid emission test? 1045.501 Section 1045.501 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR POLLUTION CONTROLS CONTROL OF EMISSIONS FROM SPARK-IGNITION PROPULSION MARINE ENGINES AND VESSELS Test...
Validation Test Report for the 1/8 deg Global Navy Coastal Ocean Model Nowcast/Forecast System

DTIC Science & Technology

2007-01-24

Test Report for the 1/8° Global Navy Coastal Ocean Model Nowcast/Forecast System. Charlie N. Barron, A. Birol Kara, Robert C. Rhodes, Clark Rowley ... Validation Test Report for the 1/8° Global Navy Coastal...
Absolute and Relative Measures of Instructional Sensitivity

ERIC Educational Resources Information Center

Naumann, Alexander; Hartig, Johannes; Hochweber, Jan

2017-01-01

Valid inferences on teaching drawn from students' test scores require that tests are sensitive to the instruction students received in class. Accordingly, measures of the test items' instructional sensitivity provide empirical support for validity claims about inferences on instruction. In the present study, we first introduce the concepts of…
Development and Validation of a Test for Bulimia.

ERIC Educational Resources Information Center

Smith, Marcia C.; Thelen, Mark H.

1984-01-01

Developed the Bulimia Test (BULIT) based on responses of clinically identified females (N=18) and normal female college students (N=119) to preliminary test items. Results showed that the BULIT provided an objective, reliable, and valid measure by which to identify individuals with symptoms of bulimia. (Instrument is appended.) (LLL)
Perspectives on Validation of High-Throughput Assays Supporting 21st Century Toxicity Testing

EPA Science Inventory

In vitro high-throughput screening (HTS) assays are seeing increasing use in toxicity testing. HTS assays can simultaneously test many chemicals but have seen limited use in the regulatory arena, in part because of the need to undergo rigorous, time-consuming formal validation. ...
Validating a Spanish Developmental Spelling Test.

ERIC Educational Resources Information Center

Ferroli, Lou; Krajenta, Marilyn

The creation and validation of a Spanish version of an English developmental spelling test (DST) is described. An introductory section reviews related literature on the rationale for and construction of DSTs, spelling development in the early grades, and Spanish-English bilingual education. Differences between the English and Spanish test versions…
Testing for Factorial Invariance in the Context of Construct Validation

ERIC Educational Resources Information Center

Dimitrov, Dimiter M.

2010-01-01

This article describes the logic and procedures behind testing for factorial invariance across groups in the context of construct validation. The procedures include testing for configural, measurement, and structural invariance in the framework of multiple-group confirmatory factor analysis (CFA). The "forward" (sequential constraint imposition)…
Cross-Cultural Validation of TEMAS, a Minority Projective Test.

ERIC Educational Resources Information Center

Costantino, Giuseppe; And Others

The theoretical framework and cross-cultural validation of Tell-Me-A-Story (TEMAS), a projective test developed to measure personality development in ethnic minority children, is presented. The TEMAS test consists of 23 chromatic pictures which incorporate the following characteristics: (1) representation of antithetical concepts which the…
Comments on Implementing Validity Theory

ERIC Educational Resources Information Center

Gafni, Naomi

2016-01-01

Naomi Gafni, director of Research and Development, National Institute for Testing and Evaluation, Jerusalem, Israel, has devoted a substantial part of her career to the development of admissions tests and other educational tests and to the investigation of their validity. As such she is keenly aware of the complexities involved in this process.…
The Thinking-about-Derivative Test for Undergraduate Students: Development and Validation

ERIC Educational Resources Information Center

Aydin, Utkun; Ubuz, Behiye

2015-01-01

Two studies were conducted for the development and validation of a multidimensional test to assess undergraduate students' mathematical thinking about derivative. The first study involved two phases: question generation and refinement of the Thinking-about-Derivative Test (TDT). The second study included four phases as follows: test…
Validation and cross-cultural pilot testing of compliance with standard precautions scale: self-administered instrument for clinical nurses.

PubMed

Lam, Simon C

2014-05-01

To perform detailed psychometric testing of the compliance with standard precautions scale (CSPS) in measuring compliance with standard precautions of clinical nurses and to conduct cross-cultural pilot testing and assess the relevance of the CSPS on an international platform. A cross-sectional and correlational design with repeated measures. Nursing students from a local registered nurse training university, nurses from different hospitals in Hong Kong, and experts in an international conference. The psychometric properties of the CSPS were evaluated via internal consistency, 2-week and 3-month test-retest reliability, concurrent validation, and construct validation. The cross-cultural pilot testing and relevance check was examined by experts on infection control from various developed and developing regions. Among 453 participants, 193 were nursing students, 165 were enrolled nurses, and 95 were registered nurses. The results showed that the CSPS had satisfactory reliability (Cronbach α = 0.73; intraclass correlation coefficient, 0.79 for 2-week test-retest and 0.74 for 3-month test-retest) and validity (optimum correlation with criterion measure; r = 0.76, P < .001; satisfactory results on known-group method and hypothesis testing). A total of 19 experts from 16 countries assured that most of the CSPS findings were relevant and globally applicable. The CSPS demonstrated satisfactory results on the basis of the standard international criteria on psychometric testing, which ascertained the reliability and validity of this instrument in measuring the compliance of clinical nurses with standard precautions. The cross-cultural pilot testing further reinforced the instrument's relevance and applicability in most developed and developing regions.
Validation of biological activity testing procedure of recombinant human interleukin-7.

PubMed

Lutsenko, T N; Kovalenko, M V; Galkin, O Yu

2017-01-01

A validation procedure for a method of monitoring the biological activity of recombinant human interleukin-7 has been developed and conducted according to the requirements of national and international recommendations. The method is based on the ability of recombinant human interleukin-7 to induce proliferation of T lymphocytes. It has been shown that peripheral blood mononuclear cells (PBMCs) derived from blood, or cell lines, can be used to control the biological activity of recombinant human interleukin-7. The validation characteristics that should be determined depend on the method, the type of product or object tested or measured, and the biological test systems used in research. The validation procedure for the method of control of biological activity of recombinant human interleukin-7 in peripheral blood mononuclear cells showed satisfactory results on all parameters tested, such as specificity, accuracy, precision and linearity.
Hyper-X: Flight Validation of Hypersonic Airbreathing Technology

NASA Technical Reports Server (NTRS)

Rausch, Vincent L.; McClinton, Charles R.; Crawford, J. Larry

1997-01-01

This paper provides an overview of NASA's focused hypersonic technology program, i.e. the Hyper-X program. This program is designed to move hypersonic, air breathing vehicle technology from the laboratory environment to the flight environment, the last stage preceding prototype development. This paper presents some history leading to the flight test program, research objectives, approach, schedule and status. Substantial experimental data base and concept validation have been completed. The program is concentrating on Mach 7 vehicle development, verification and validation in preparation for wind tunnel testing in 1998 and flight testing in 1999. It is also concentrating on finalization of the Mach 5 and 10 vehicle designs. Detailed evaluation of the Mach 7 vehicle at the flight conditions is nearing completion, and will provide a data base for validation of design methods once flight test data are available.
Further examination of embedded performance validity indicators for the Conners' Continuous Performance Test and Brief Test of Attention in a large outpatient clinical sample.

PubMed

Sharland, Michael J; Waring, Stephen C; Johnson, Brian P; Taran, Allise M; Rusin, Travis A; Pattock, Andrew M; Palcher, Jeanette A

2018-01-01

Assessing test performance validity is a standard clinical practice and although studies have examined the utility of cognitive/memory measures, few have examined attention measures as indicators of performance validity beyond the Reliable Digit Span. The current study further investigates the classification probability of embedded Performance Validity Tests (PVTs) within the Brief Test of Attention (BTA) and the Conners' Continuous Performance Test (CPT-II), in a large clinical sample. This was a retrospective study of 615 patients consecutively referred for comprehensive outpatient neuropsychological evaluation. Non-credible performance was defined two ways: failure on one or more PVTs and failure on two or more PVTs. Classification probability of the BTA and CPT-II into non-credible groups was assessed. Sensitivity, specificity, positive predictive value, and negative predictive value were derived to identify clinically relevant cut-off scores. When using failure on two or more PVTs as the indicator for non-credible responding compared to failure on one or more PVTs, highest classification probability, or area under the curve (AUC), was achieved by the BTA (AUC = .87 vs. .79). CPT-II Omission, Commission, and Total Errors exhibited higher classification probability as well. Overall, these findings corroborate previous findings, extending them to a large clinical sample. BTA and CPT-II are useful embedded performance validity indicators within a clinical battery but should not be used in isolation without other performance validity indicators.
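The four indices named in this abstract (sensitivity, specificity, positive predictive value, negative predictive value) all follow from a 2x2 table that crosses the embedded indicator's decision at a given cut-off against the criterion, here failure on independent PVTs. A minimal Python sketch, assuming hypothetical counts; the class name, function name, and numbers are illustrative and are not taken from the study:

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # flagged by the indicator, non-credible by criterion
    fp: int  # flagged by the indicator, credible by criterion
    fn: int  # not flagged, non-credible by criterion
    tn: int  # not flagged, credible by criterion


def classification_stats(c: Confusion) -> dict:
    """Return sensitivity, specificity, PPV and NPV for a 2x2 table.

    Assumes every row and column total is non-zero.
    """
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts for one candidate cut-off score (not study data).
stats = classification_stats(Confusion(tp=40, fp=10, fn=60, tn=490))
```

Repeating the calculation for each candidate cut-off gives the trade-off table from which a clinically relevant cut-off can be chosen. PPV and NPV depend on the base rate of non-credible performance in the sample, so they do not transfer directly to settings with a different base rate.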
The Math Essential Skills Screener--Upper Elementary Version (MESS-U): Studies of Reliability and Validity

ERIC Educational Resources Information Center

Erford, Bradley T.; Biddison, Amanda R.

2006-01-01

The Math Essential Skills Screener--Upper Elementary Version (MESS-U) is part of a series of screening tests designed to help identify students ages 9-11 who are at risk for mathematics failure. Internal consistency, test-retest reliability, item analysis, decision efficiency, convergent validity and factorial validity of the MESS-U were studied…
Development and Validation of Criterion-Referenced Clinically Relevant Fitness Standards for Maintaining Physical Independence in Later Years

ERIC Educational Resources Information Center

Rikli, Roberta E.; Jones, C. Jessie

2013-01-01

Purpose: To develop and validate criterion-referenced fitness standards for older adults that predict the level of capacity needed for maintaining physical independence into later life. The proposed standards were developed for use with a previously validated test battery for older adults--the Senior Fitness Test (Rikli, R. E., & Jones, C. J.…
Validity of GRE General Test Scores and TOEFL Scores for Graduate Admission to a Technical University in Western Europe

ERIC Educational Resources Information Center

Zimmermann, Judith; von Davier, Alina A.; Buhmann, Joachim M.; Heinimann, Hans R.

2018-01-01

Graduate admission has become a critical process in tertiary education, whereby selecting valid admissions instruments is key. This study assessed the validity of Graduate Record Examination (GRE) General Test scores for admission to Master's programmes at a technical university in Europe. We investigated the indicative value of GRE scores for the…

Development and Validation of Scores from an Instrument Measuring Student Test-Taking Motivation

ERIC Educational Resources Information Center

Eklof, Hanna

2006-01-01

Using the expectancy-value model of achievement motivation as a basis, this study's purpose is to develop, apply, and validate scores from a self-report instrument measuring student test-taking motivation. Sampled evidence of construct validity for the present sample indicates that a number of the items in the instrument could be used as an…
Autism Spectrum Disorders and Self-Reports: Testing Validity and Reliability Using the NEO-PI-R

ERIC Educational Resources Information Center

Hesselmark, Eva; Eriksson, Jonna M.; Westerlund, Joakim; Bejerot, Susanne

2015-01-01

Although self-reported measures are frequently used to assess adults with autism spectrum disorders (ASD), the validity of self-reports is under-researched in ASD. The core symptoms of ASD may negatively affect the psychometric properties of self-reported measures. The aim of the present study was to test the validity and reliability of…
Ada (Tradename) Compiler Validation Summary Report. Harris Corporation. HARRIS Ada Compiler, Version 1.0. Harris H1200 and H800.

DTIC Science & Technology

This Validation Summary Report (VSR) summarizes the results and conclusions of validation testing performed on the HARRIS Ada Compiler, Version 1.0...at compile time, at link time, or during execution. On-site testing was performed 28 APR 1986 through 30 APR 1986 at Harris Corporation, Ft. Lauderdale
The Meaning of Validity in the New "Standards for Educational and Psychological Testing": Implications for Measurement Courses.

ERIC Educational Resources Information Center

Goodwin, Laura D.; Leech, Nancy L.

2003-01-01

The treatment of validity in the newest edition of "Standards for Educational and Psychological Testing" is quite different from coverage in earlier editions of the Standards and in most measurement textbooks. The view of validity in the 1999 Standards is discussed, and suggestions for instructors of measurement courses are offered. (Contains 56…
Effects of Coaching on the Validity of the SAT: A Simulation Study.

ERIC Educational Resources Information Center

Baydar, Nazli

The effects of student coaching in preparation for the College Board Scholastic Aptitude Test (SAT) on the predictive validity of this test for freshman year performance were studied using data on 1985 freshman year students from four colleges. After the validity of the SAT was estimated for each school, a given proportion of students was picked,…
Validation of an Instrument to Measure High School Students' Attitudes toward Fitness Testing

ERIC Educational Resources Information Center

Mercier, Kevin; Silverman, Stephen

2014-01-01

Purpose: The purpose of this investigation was to develop an instrument that has scores that are valid and reliable for measuring students' attitudes toward fitness testing. Method: The method involved the following steps: (a) an elicitation study, (b) item development, (c) a pilot study, and (d) a validation study. The pilot study included 427…
Accuracy and Feasibility of Video Analysis for Assessing Hamstring Flexibility and Validity of the Sit-and-Reach Test

ERIC Educational Resources Information Center

Mier, Constance M.

2011-01-01

The accuracy of video analysis of the passive straight-leg raise test (PSLR) and the validity of the sit-and-reach test (SR) were tested in 60 men and women. Computer software measured static hip-joint flexion accurately. High within-session reliability of the PSLR was demonstrated (R greater than 0.97). Test-retest (separate days) reliability for…
Validation of Recipes for Double-Blind Placebo-Controlled Challenges With Milk, Egg White, and Hazelnut.

PubMed

González-Mancebo, E; Alonso Díaz de Durana, M D; García Estringana, Y; Meléndez Baltanás, A; Rodriguez-Alvarez, M; de la Hoz Caballer, B; Del Prado, N; Fernández-Rivas, M

The double-blind, placebo-controlled food challenge (DBPCFC) is considered the definitive diagnostic test for food allergy. Nevertheless, validated recipes for masking the foods are scarce, have not been standardized, and differ between centers. Sensory evaluation techniques such as the triangle test are necessary to validate the recipes used for DBPCFC. We developed 3 recipes for use in DBPCFC with milk, egg white, and hazelnut and used the triangle test to validate them in a 2-phase study in which 197 volunteers participated. In each phase, participants tried 3 samples (2 active-1 placebo or 2 placebo-1 active) and had to identify the odd one. In phase 1, the 3 samples were given simultaneously, whereas in phase 2, the 3 samples of foods that failed validation in phase 1 were given sequentially. A visual analog scale (VAS) ranging from 1 to 10 was used to evaluate how much participants liked the recipes. In phase 1, the egg white recipe was validated (n=89 volunteers, 38.9% found the odd sample, P=.16). Milk and hazelnut recipes were validated in phase 2 (for both foods, n=30 participants, 36.7% found the odd sample, P=.36). Median VAS scores for the 3 recipes ranged from 6.6 to 9.7. We used sensory testing to validate milk, egg white, and hazelnut recipes for use in DBPCFC. The validated recipes are easy to prepare in a clinical setting, provide the equivalent of 1 serving dose, and were liked by most participants.
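In a triangle test, a participant who cannot tell active from placebo picks the odd sample by chance with probability 1/3, so a recipe counts as masked when the proportion of correct picks is not significantly above one third. The abstract does not state which test produced its P values, so the following Python sketch uses a one-sided exact binomial tail as an assumption and may not reproduce the reported figures; the function name is hypothetical:

```python
from math import comb


def triangle_test_p(correct: int, n: int, chance: float = 1 / 3) -> float:
    """One-sided exact binomial P(X >= correct) under pure guessing."""
    return sum(
        comb(n, k) * chance**k * (1 - chance) ** (n - k)
        for k in range(correct, n + 1)
    )


# 11 of 30 correct corresponds to the 36.7% reported for milk and hazelnut.
p_value = triangle_test_p(11, 30)
```

A large p-value means the data are compatible with guessing, which is the desired outcome when validating a masking recipe.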
Cross-cultural adaptation and validation of the Ankle Osteoarthritis Scale for use in French-speaking populations.

PubMed

Angers, Magalie; Svotelis, Amy; Balg, Frederic; Allard, Jean-Pascal

2016-04-01

The Ankle Osteoarthritis Scale (AOS) is a self-administered score specific for ankle osteoarthritis (OA) with excellent reliability and strong construct and criterion validity. Many recent randomized multicentre trials have used the AOS, and the involvement of the French-speaking population is limited by the absence of a French version. Our goal was to develop a French version and validate the psychometric properties to assure equivalence to the original English version. Translation was performed according to American Association of Orthopaedic Surgeons (AAOS) 2000 guidelines for cross-cultural adaptation. Similar to the validation process of the English AOS, we evaluated the psychometric properties of the French version (AOS-Fr): criterion validity (AOS-Fr v. Western Ontario and McMaster Universities Arthritis Index [WOMAC] and SF-36 scores), construct validity (AOS-Fr correlation to single heel-lift test), and reliability (AOS-Fr test-retest). Sixty healthy individuals tested a prefinal version of the AOS-Fr for comprehension, leading to modifications and a final version that was approved by C. Saltzman, author of the AOS. We then recruited patients with ankle OA for evaluation of the AOS-Fr psychometric properties. Twenty-eight patients with ankle OA participated in the evaluation. The AOS-Fr showed strong criterion validity (AOS:WOMAC r = 0.709 and AOS:SF-36 r = -0.654) and construct validity (r = 0.664) and proved to be reliable (test-retest intraclass correlation coefficient = 0.922). The AOS-Fr is a reliable and valid score equivalent to the English version in terms of psychometric properties, thus is available for use in multicentre trials.
[The appraisal of reliability and validity of subjective workload assessment technique and NASA-task load index].

PubMed

Xiao, Yuan-mei; Wang, Zhi-ming; Wang, Mian-zhen; Lan, Ya-jia

2005-06-01

To test the reliability and validity of two mental workload assessment scales, i.e. the subjective workload assessment technique (SWAT) and the NASA task load index (NASA-TLX). One thousand two hundred and sixty-eight mental workers were sampled from various occupations, such as scientific research, education, administration and medicine, with randomized cluster sampling. Re-test reliability, split-half reliability, Cronbach's alpha coefficient and the correlation coefficients between item scores and the total score were used to test reliability. The test of validity included structure validity. The re-test reliability coefficients of the two scales and their items ranged from 0.516 to 0.753 (P < 0.01), indicating that the two scales had good re-test reliability. The split-half reliability of SWAT was 0.645, its Cronbach's alpha coefficient was more than 0.80, and all the correlation coefficients between its item scores and total score were more than 0.70. For NASA-TLX, both the split-half reliability and Cronbach's alpha coefficient were more than 0.80, and the correlation coefficients between its item scores and total score were all more than 0.60 (P < 0.01) except for the performance item. Both scales had good internal consistency. The Pearson correlation coefficient between the two scales was 0.492 (P < 0.01), implying that the results of the two scales had good consistency. Factor analysis showed that the two scales had good structure validity. Both SWAT and NASA-TLX have good reliability and validity and may be used as valid tools to assess mental workload in China after being revised properly.
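Cronbach's alpha, one of the reliability coefficients reported above, compares the sum of the individual item variances with the variance of the total score: alpha = k/(k-1) * (1 - sum of item variances / total-score variance), where k is the number of items. A minimal Python sketch of that standard formula, using made-up ratings rather than study data; the function name and the data are illustrative assumptions:

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; scores[r][i] is respondent r's score on item i."""
    k = len(scores[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of each respondent's total score.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 5 respondents x 3 items (not study data).
ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(ratings)
```

Population variance is used for both the items and the total; using sample variance throughout gives the same alpha because the correction factor cancels in the ratio.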
The dialysis orders objective structured clinical examination (OSCE): a formative assessment for nephrology fellows

PubMed Central

Prince, Lisa K; Campbell, Ruth C; Gao, Sam W; Kendrick, Jessica; Lebrun, Christopher J; Little, Dustin J; Mahoney, David L; Maursetter, Laura A; Nee, Robert; Saddler, Mark; Watson, Maura A

2018-01-01

Background: Few quantitative nephrology-specific simulations assess fellow competency. We describe the development and initial validation of a formative objective structured clinical examination (OSCE) assessing fellow competence in ordering acute dialysis. Methods: The three test scenarios were acute continuous renal replacement therapy, chronic dialysis initiation in moderate uremia and acute dialysis in end-stage renal disease-associated hyperkalemia. The test committee included five academic nephrologists and four clinically practicing nephrologists outside of academia. There were 49 test items (58 points). A passing score was 46/58 points. No item had median relevance less than ‘important’. The content validity index was 0.91. Ninety-five percent of positive-point items were easy–medium difficulty. Preliminary validation was by 10 board-certified volunteers, not test committee members, a median of 3.5 years from graduation. The mean score was 49 [95% confidence interval (CI) 46–51], κ = 0.68 (95% CI 0.59–0.77), Cronbach’s α = 0.84. Results: We subsequently administered the test to 25 fellows. The mean score was 44 (95% CI 43–45); 36% passed the test. Fellows scored significantly less than validators (P < 0.001). Of evidence-based questions, 72% were answered correctly by validators and 54% by fellows (P = 0.018). Fellows and validators scored least well on the acute hyperkalemia question. In self-assessing proficiency, 71% of fellows surveyed agreed or strongly agreed that the OSCE was useful. Conclusions: The OSCE may be used to formatively assess fellow proficiency in three common areas of acute dialysis practice. Further validation studies are in progress. PMID:29644053
Design and validation of a self-administered test to assess bullying (bull-M) in high school Mexicans: a pilot study

PubMed Central

2013-01-01

Background: Bullying (Bull) is a public health problem worldwide, and Mexico is not exempt. However, its epidemiology and early detection in our country is limited, in part, by the lack of validated tests to ensure the respondents’ anonymity. The aim of this study was to validate a self-administered test (Bull-M) for assessing Bull among high-school Mexicans. Methods: Experts and school teachers from highly violent areas of Ciudad Juarez (Chihuahua, México) reported common Bull behaviors. Then, a 10-item test was developed based on twelve of these behaviors; the students’ and peers’ participation in Bull acts and in some somatic consequences in Bull victims with a 5-point Likert frequency scale. Validation criteria were: content (CV, judges); reliability [Cronbach’s alpha (CA), test-retest (Spearman correlation, rs)]; construct [principal component (PCA), confirmatory factor (CFA), goodness-of-fit (GF) analysis]; and convergent (Bull-M vs. Bull-S test) validity. Results: Bull-M showed good reliability (CA = 0.75, rs = 0.91; p < 0.001). Two factors were identified (PCA) and confirmed (CFA): “bullying me (victim)” and “bullying others (aggressor)”. GF indices were: root mean square error of approximation (0.031), GF index (0.97), and normalized fit index (0.92). Bull-M was as good as Bull-S for measuring Bull prevalence. Conclusions: Bull-M has a good reliability and convergent validity and a bi-modal factor structure for detecting Bull victims and aggressors; however, its external validity and sensitivity should be analyzed on a wider and different population. PMID:23577755
Measuring leprosy-related stigma - a pilot study to validate a toolkit of instruments.

PubMed

Rensen, Carin; Bandyopadhyay, Sudhakar; Gopal, Pala K; Van Brakel, Wim H

2011-01-01

Stigma negatively affects the quality of life of leprosy-affected people. Instruments are needed to assess levels of stigma and to monitor and evaluate stigma reduction interventions. We conducted a validation study of such instruments in Tamil Nadu and West Bengal, India. Four instruments were tested in a 'Community Based Rehabilitation' (CBR) setting: the Participation Scale, the Internalised Stigma of Mental Illness (ISMI) scale adapted for leprosy-affected persons, the Explanatory Model Interview Catalogue (EMIC) for leprosy-affected and non-affected persons, and the General Self-Efficacy (GSE) Scale. We evaluated the following components of validity: construct validity, internal consistency, test-retest reproducibility and reliability to distinguish between groups. Construct validity was tested by correlating instrument scores and by triangulating quantitative and qualitative findings. Reliability was evaluated by comparing levels of stigma among people affected by leprosy and community controls, and among affected people living in CBR project areas and those in non-CBR areas. For the Participation, ISMI and EMIC scores, significant differences were observed between those affected by leprosy and those not affected (p = 0.0001), and between affected persons in the CBR and Control group (p < 0.05). The internal consistency of the instruments measured with Cronbach's α ranged from 0.83 to 0.96 and was very good for all instruments. Test-retest reproducibility coefficients were 0.80 for the Participation score, 0.70 for the EMIC score, 0.62 for the ISMI score and 0.50 for the GSE score. The construct validity of all instruments was confirmed. The Participation and EMIC Scales met all validity criteria, but test-retest reproducibility of the ISMI and GSE Scales needs further evaluation with a shorter test-retest interval and longer training and additional adaptations for the latter.
The analysis of reliability and validity of the IT-MAIS, MAIS and MUSS.

PubMed

Zhong, Yan; Xu, Tianqiu; Dong, Ruijuan; Lyu, Jing; Liu, Bo; Chen, Xueqing

2017-05-01

The aim of this study was to investigate the reliability and validity of the Infant-Toddler Meaningful Auditory Integration Scale (IT-MAIS), Meaningful Auditory Integration Scale (MAIS), and Meaningful Use of Speech Scale (MUSS). IT-MAIS, MAIS and MUSS were each divided into 3 sub-dimensions. 300 children with cochlear implants (CI) were included in the investigation. To assess the test-retest reliability of these questionnaires, 30 children were selected randomly and evaluated twice at a two-week interval; there were no significant changes between test and retest. Furthermore, assessment by different evaluators was administered to 30 randomly selected users. Reliability test: Test-retest reliability of the three scales proved to be satisfactory. All domains had correlation coefficients that exceeded 0.750 (P < 0.01). The Cronbach's α values of the three scales and their three domains were greater than 0.700. Reliability between evaluators of the three scales was considered to be satisfactory. All domains had correlation coefficients that exceeded 0.750 (P < 0.01). Validity test: The evaluation of content validity by expert review showed the questionnaires had good content validity. The correlation coefficients between the overall scores of the three scales and their three domains were 0.699-0.978 (P < 0.01). There were correlations among the three sub-domains, but the strength of the correlations was relatively low, indicating a degree of construct validity. The IT-MAIS, MAIS and MUSS scales have good reliability and validity, and can be used to measure hearing and speech outcomes in children with cochlear implants. Copyright © 2017 Elsevier B.V. All rights reserved.
Translation and validation of the Dutch new Knee Society Scoring System ©.

PubMed

Van Der Straeten, Catherine; Witvrouw, Erik; Willems, Tine; Bellemans, Johan; Victor, Jan

2013-11-01

A new version of The Knee Society Knee Scoring System(©) (KSS) has recently been developed. Before this scale can be used in non-English-speaking populations, it has to be translated and validated for a particular population. We evaluated the construct and content validity, the test-retest reliability, and the internal consistency of the Dutch version of the New Knee Society KSS. A Dutch translation was performed using a forward-backward translation protocol. We tested the construct validity of the Dutch New KSS by comparing it with the Dutch versions of the WOMAC, Knee Injury and Osteoarthritis Outcome Score (KOOS), and SF-12 scores in 137 patients undergoing total knee arthroplasty (TKA). Content validity was assessed by comparing pre- and postoperative scores and by checking floor and ceiling effects. To evaluate test-retest reliability and consistency, 47 patients completed the questionnaire a second time with a mean of 8 days interval (range, 2-20 days) between tests. Construct validity was demonstrated because the Dutch New KSS correlated well with the Dutch WOMAC (r = -0.751; p < 0.001), Dutch KOOS (r = -0.723; p < 0.001), and Dutch SF-12 (r = 0.569; p < 0.001). There was a significant difference between pre- and postoperative scores (p < 0.001) in line with the other scores. Test-retest reliability proved excellent with an intraclass correlation coefficient between 0.73 and 0.92 depending on the domain tested. Consistency as indicated by Cronbach's alpha ranging from 0.84 to 0.96 was good to excellent. As demonstrated by the validation procedure, the Dutch New KSS is an excellent instrument to evaluate TKA outcome in Dutch-speaking patients.
Validity and Reliability of Field-Based Measures for Assessing Movement Skill Competency in Lifelong Physical Activities: A Systematic Review.

PubMed

Hulteen, Ryan M; Lander, Natalie J; Morgan, Philip J; Barnett, Lisa M; Robertson, Samuel J; Lubans, David R

2015-10-01

It has been suggested that young people should develop competence in a variety of 'lifelong physical activities' to ensure that they can be active across the lifespan. The primary aim of this systematic review is to report the methodological properties, validity, reliability, and test duration of field-based measures that assess movement skill competency in lifelong physical activities. A secondary aim was to clearly define those characteristics unique to lifelong physical activities. A search of four electronic databases (Scopus, SPORTDiscus, ProQuest, and PubMed) was conducted between June 2014 and April 2015 with no date restrictions. Studies addressing the validity and/or reliability of lifelong physical activity tests were reviewed. Included articles were required to assess lifelong physical activities using process-oriented measures, as well as report either one type of validity or reliability. Assessment criteria for methodological quality were adapted from a checklist used in a previous review of sport skill outcome assessments. Movement skill assessments for eight different lifelong physical activities (badminton, cycling, dance, golf, racquetball, resistance training, swimming, and tennis) in 17 studies were identified for inclusion. Methodological quality, validity, reliability, and test duration (time to assess a single participant), for each article were assessed. Moderate to excellent reliability results were found in 16 of 17 studies, with 71% reporting inter-rater reliability and 41% reporting intra-rater reliability. Only four studies in this review reported test-retest reliability. Ten studies reported validity results; content validity was cited in 41% of these studies. Construct validity was reported in 24% of studies, while criterion validity was only reported in 12% of studies. Numerous assessments for lifelong physical activities may exist, yet only assessments for eight lifelong physical activities were included in this review. 
Generalizability of results may be more applicable if more heterogeneous samples are used in future research. Moderate to excellent levels of inter- and intra-rater reliability were reported in the majority of studies. However, future work should look to establish test-retest reliability. Validity was less commonly reported than reliability, and further types of validity other than content validity need to be established in future research. Specifically, predictive validity of 'lifelong physical activity' movement skill competency is needed to support the assertion that such activities provide the foundation for a lifetime of activity.
Test Takers' Beliefs and Experiences of a High-Stakes Computer-Based English Listening and Speaking Test

ERIC Educational Resources Information Center

Zhan, Ying; Wan, Zhi Hong

2016-01-01

Test takers' beliefs or experiences have been overlooked in most validation studies in language education. Meanwhile, a mutual exclusion has been observed in the literature, with little or no dialogue between validation studies and studies concerning the uses and consequences of testing. To help fill these research gaps, a group of Senior III…
The Validity and Reliability of the Back Saver Sit-and-Reach Test in Middle School Girls and Boys.

ERIC Educational Resources Information Center

Patterson, Patricia; And Others

1996-01-01

This study examined the validity and reliability of the Back Saver Sit-and-Reach test for middle school students. Students completed the test during physical education class. Results indicated that the test was moderately related to hamstring flexibility, but its relationship to lower back flexibility was quite low for both sexes. (SM)
Aptitude Tests and Successful College Students: The Predictive Validity of the General Aptitude Test (GAT) in Saudi Arabia

ERIC Educational Resources Information Center

Alnahdi, Ghaleb Hamad

2015-01-01

Aptitude tests should predict student success at the university level. This study examined the predictive validity of the General Aptitude Test (GAT) in Saudi Arabia. Data for 27420 students enrolled at Prince Sattam bin Abdulaziz University were analyzed. Of these students, 17565 were male students, and 9855 were female students. Multiple…
Preliminary Report on a National Cross-Validation of the Computerized Adaptive Screening Test (CAST).

ERIC Educational Resources Information Center

Knapp, Deirdre J.; Pliske, Rebecca M.

A study was conducted to validate the Army's Computerized Adaptive Screening Test (CAST), using data from 2,240 applicants from 60 army recruiting stations across the nation. CAST is a computer-assisted adaptive test used to predict performance on the Armed Forces Qualification Test (AFQT). AFQT scores are computed by adding four subtest scores of…

Factorial validity and measurement equivalence of the Client Assessment of Treatment Scale for psychiatric inpatient care - a study in three European countries.

PubMed

Richardson, Michelle; Katsakou, Christina; Torres-González, Francisco; Onchev, George; Kallert, Thomas; Priebe, Stefan

2011-06-30

Patients' views of inpatient care need to be assessed for research and routine evaluation. For this a valid instrument is required. The Client Assessment of Treatment Scale (CAT) has been used in large scale international studies, but its psychometric properties have not been well established. The structural validity of the CAT was tested among involuntary inpatients with psychosis. Data from locations in three separate European countries (England, Spain and Bulgaria) were collected. The factorial validity was initially tested using single sample confirmatory factor analyses in each country. Subsequent multi-sample analyses were used to test for invariance of the factor loadings, and factor variances across the countries. Results provide good initial support for the factorial validity and invariance of the CAT scores. Future research is needed to cross-validate these findings and to generalise them to other countries, treatment settings, and patient populations. Copyright © 2011 Elsevier Ltd. All rights reserved.
Assessing reliability and validity measures in managed care studies.

PubMed

Montoya, Isaac D

2003-01-01

To review the reliability and validity literature and develop an understanding of these concepts as applied to managed care studies. Reliability is a test of how well an instrument measures the same input at varying times and under varying conditions. Validity is a test of how accurately an instrument measures what one believes is being measured. A review of reliability and validity instructional material was conducted. Studies of managed care practices and programs abound. However, many of these studies utilize measurement instruments that were developed for other purposes or for a population other than the one being sampled. In other cases, instruments have been developed without any testing of the instrument's performance. The lack of reliability and validity information may limit the value of these studies. This is particularly true when data are collected for one purpose and used for another. The usefulness of certain studies without reliability and validity measures is questionable, especially in cases where the literature contradicts itself.
Evidence-Based Toxicology.

PubMed

Hoffmann, Sebastian; Hartung, Thomas; Stephens, Martin

Evidence-based toxicology (EBT) was introduced independently by two groups in 2005, in the context of toxicological risk assessment and causation as well as based on parallels between the evaluation of test methods in toxicology and evidence-based assessment of diagnostic tests in medicine. The role model of evidence-based medicine (EBM) motivated both proposals and guided the evolution of EBT, whereas especially systematic reviews and evidence quality assessment attract considerable attention in toxicology. Regarding test assessment, in the search for solutions to various problems related to validation, such as the imperfectness of the reference standard or the challenge to comprehensively evaluate tests, the field of Diagnostic Test Assessment (DTA) was identified as a potential resource. DTA being an EBM discipline, test method assessment/validation therefore became one of the main drivers spurring the development of EBT. In the context of pathway-based toxicology, EBT approaches, given their objectivity, transparency and consistency, have been proposed to be used for carrying out a (retrospective) mechanistic validation. In summary, implementation of more evidence-based approaches may provide the tools necessary to adapt the assessment/validation of toxicological test methods and testing strategies to face the challenges of toxicology in the twenty-first century.
[Attempt for development of rapid word reading test for children--evaluation of reliability and validity].

PubMed

Hashimoto, Ryusaku; Kashiwagi, Mitsuru; Suzuki, Shuhei

2008-09-01

We developed a rapid word reading test for examining the phonological processing ability of Japanese children. We prepared two versions of the test, versions A and B. Each version has word and non-word tasks. Twenty-two healthy third-grade boys in primary schools participated in this validation study. For criterion-related validity, we performed the serial Hiragana reading test, the sentence reading test, Raven's coloured progressive matrices (RCPM), the Token test for children, the Kana word dictation test, the standardized comprehension test of abstract words (SCTAW), and the Trail Circle test. The reading times of the newly developed test correlated moderately or highly with those of the serial Hiragana reading test and the sentence reading test. However, the scores of the other tests (RCPM, Token test for children, Kana word dictation test, SCTAW, Trail Circle test) did not correlate with the reading time of the rapid word reading test. Test-retest reliabilities in the word tasks were more than moderate: 0.52 and 0.76 in versions A and B, while those in the non-word tasks were high: 0.91 and 0.88 in versions A and B. The correlation coefficient between versions A and B was 0.7 for the word tasks and 0.92 for the non-word tasks. This study showed that the rapid word reading test has substantial validity and reliability for testing the phonological processing ability of Japanese children. In addition, the non-word tasks were more suitable for selectively examining the speed of the grapheme to phoneme conversion process.
SEQUenCE: a service user-centred quality of care instrument for mental health services.

PubMed

Hester, Lorraine; O'Doherty, Lorna Jane; Schnittger, Rebecca; Skelly, Niamh; O'Donnell, Muireann; Butterly, Lisa; Browne, Robert; Frorath, Charlotte; Morgan, Craig; McLoughlin, Declan M; Fearon, Paul

2015-08-01

To develop a quality of care instrument that is grounded in the service user perspective and validate it in a mental health service. The instrument (SEQUenCE (SErvice user QUality of CarE)) was developed through analysis of focus group data and clinical practice guidelines, and refined through field-testing and psychometric analyses. All participants were attending an independent mental health service in Ireland. Participants had a diagnosis of bipolar affective disorder (BPAD) or a psychotic disorder. Twenty-nine service users participated in six focus group interviews. Seventy-one service users participated in field-testing: 10 judged the face validity of an initial 61-item instrument; 28 completed a revised 52-item instrument from which 12 items were removed following test-retest and convergent validity analyses; 33 completed the resulting 40-item instrument. Test-retest reliability, internal consistency and convergent validity of the instrument. The final instrument showed acceptable test-retest reliability at 5-7 days (r = 0.65; P < 0.001), good convergent validity with the Verona Service Satisfaction Scale (r = 0.84, P < 0.001) and good internal consistency (Cronbach's alpha = 0.87). SEQUenCE is a valid, reliable scale that is grounded in the service user perspective and suitable for routine use. It may serve as a useful tool in individual care planning, service evaluation and research. The instrument was developed and validated with service users with a diagnosis of either BPAD or a psychotic disorder; it does not yet have established external validity for other diagnostic groups. © The Author 2015. Published by Oxford University Press in association with the International Society for Quality in Health Care; all rights reserved.
Validating emotional attention regulation as a component of emotional intelligence: A Stroop approach to individual differences in tuning in to and out of nonverbal cues.

PubMed

Elfenbein, Hillary Anger; Jang, Daisung; Sharma, Sudeep; Sanchez-Burks, Jeffrey

2017-03-01

Emotional intelligence (EI) has captivated researchers and the public alike, but it has been challenging to establish its components as objective abilities. Self-report scales lack divergent validity from personality traits, and few ability tests have objectively correct answers. We adapt the Stroop task to introduce a new facet of EI called emotional attention regulation (EAR), which involves focusing emotion-related attention for the sake of information processing rather than for the sake of regulating one's own internal state. EAR includes 2 distinct components. First, tuning in to nonverbal cues involves identifying nonverbal cues while ignoring alternate content, that is, emotion recognition under conditions of distraction by competing stimuli. Second, tuning out of nonverbal cues involves ignoring nonverbal cues while identifying alternate content, that is, the ability to interrupt emotion recognition when needed to focus attention elsewhere. An auditory test of valence included positive and negative words spoken in positive and negative vocal tones. A visual test of approach-avoidance included green- and red-colored facial expressions depicting happiness and anger. The error rates for incongruent trials met the key criteria for establishing the validity of an EI test, in that the measure demonstrated test-retest reliability, convergent validity with other EI measures, divergent validity from factors such as general processing speed and mostly personality, and predictive validity in this case for well-being. By demonstrating that facets of EI can be validly theorized and empirically assessed, results also speak to the validity of EI more generally. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Testing and validating environmental models

USGS Publications Warehouse

Kirchner, J.W.; Hooper, R.P.; Kendall, C.; Neal, C.; Leavesley, G.

1996-01-01

Generally accepted standards for testing and validating ecosystem models would benefit both modellers and model users. Universally applicable test procedures are difficult to prescribe, given the diversity of modelling approaches and the many uses for models. However, the generally accepted scientific principles of documentation and disclosure provide a useful framework for devising general standards for model evaluation. Adequately documenting model tests requires explicit performance criteria, and explicit benchmarks against which model performance is compared. A model's validity, reliability, and accuracy can be most meaningfully judged by explicit comparison against the available alternatives. In contrast, current practice is often characterized by vague, subjective claims that model predictions show 'acceptable' agreement with data; such claims provide little basis for choosing among alternative models. Strict model tests (those that invalid models are unlikely to pass) are the only ones capable of convincing rational skeptics that a model is probably valid. However, 'false positive' rates as low as 10% can substantially erode the power of validation tests, making them insufficiently strict to convince rational skeptics. Validation tests are often undermined by excessive parameter calibration and overuse of ad hoc model features. Tests are often also divorced from the conditions under which a model will be used, particularly when it is designed to forecast beyond the range of historical experience. In such situations, data from laboratory and field manipulation experiments can provide particularly effective tests, because one can create experimental conditions quite different from historical data, and because experimental data can provide a more precisely defined 'target' for the model to hit. 
We present a simple demonstration showing that the two most common methods for comparing model predictions to environmental time series (plotting model time series against data time series, and plotting predicted versus observed values) have little diagnostic power. We propose that it may be more useful to statistically extract the relationships of primary interest from the time series, and test the model directly against them.
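One way to read the abstract's remark about 'false positive' rates is through Bayes' rule: the more often an invalid model would pass a test, the less a pass should move a skeptic. The sketch below is an illustrative reading only, not the paper's own demonstration; the function name, the skeptic's prior of 0.1, and the assumption that a valid model always passes are all hypothetical choices.

```python
def posterior_valid(prior_valid, pass_rate_valid, pass_rate_invalid):
    """Probability that a model is valid given that it passed the test (Bayes' rule).

    prior_valid:       skeptic's prior probability that the model is valid
    pass_rate_valid:   probability that a valid model passes the test
    pass_rate_invalid: probability that an invalid model passes ('false positive' rate)
    """
    passed_and_valid = prior_valid * pass_rate_valid
    passed_and_invalid = (1.0 - prior_valid) * pass_rate_invalid
    return passed_and_valid / (passed_and_valid + passed_and_invalid)


if __name__ == "__main__":
    # Hypothetical skeptic who initially gives the model a 10% chance of being valid
    for false_positive_rate in (0.10, 0.01):
        p = posterior_valid(0.10, 1.0, false_positive_rate)
        print(f"false positive rate {false_positive_rate:.2f} -> posterior {p:.3f}")
```

Under these assumed numbers, a test with a 10% false positive rate leaves the skeptic close to even odds after a pass, while a 1% rate leaves the posterior above 0.9, which matches the abstract's point that only strict tests are persuasive.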
POLYGON - A New Fundamental Movement Skills Test for 8 Year Old Children: Construction and Validation.

PubMed

Zuvela, Frane; Bozanic, Ana; Miletic, Durdica

2011-01-01

Inadequately adopted fundamental movement skills (FMS) in early childhood may have a negative impact on the motor performance in later life (Gallahue and Ozmun, 2005). The need for an efficient FMS testing in Physical Education was recognized. The aim of this paper was to construct and validate a new FMS test for 8 year old children. Ninety-five 8 year old children were used for the testing. A total of 24 new FMS tasks were constructed and only the best representatives of movement areas entered into the final test product - FMS-POLYGON. The ICC showed high values for all 24 tasks (0.83-0.97) and the factorial analysis revealed the best representatives of each movement area that entered the FMS-POLYGON: tossing and catching the volleyball against a wall, running across obstacles, carrying the medicine balls, and straight running. The ICC for the FMS-POLYGON showed a very high result (0.98) and, therefore, confirmed the test's intra-rater reliability. Concurrent validity was tested with the use of the "Test of Gross Motor Development" (TGMD-2). Correlation analysis between the newly constructed FMS-POLYGON and the TGMD-2 revealed the coefficient of -0.82 which indicates a high correlation. In conclusion, the new test for FMS assessment proved to be a reliable and valid instrument for 8 year old children. Application of this test in schools is justified and could play an important factor in physical education and sport practice.
Key points: All 21 newly constructed tasks demonstrated high intra-rater reliability (0.83-0.97) in FMS assessment. High reliability was also noted in the FMS-POLYGON test (0.98). A high correlation was found between the FMS-POLYGON and TGMD-2, which is a confirmation of the new test's concurrent validity. The research resolved the problem of long and detailed FMS assessment by adding a new dimension using a quick and effective norm-referenced approach while also covering all the most important movement areas. The new and validated test can be of great use primarily in school practice for physical education teachers and FMS experts.
Development of Modal Test Techniques for Validation of a Solar Sail Design

NASA Technical Reports Server (NTRS)

Gaspar, James L.; Mann, Troy; Behun, Vaughn; Wilkie, W. Keats; Pappa, Richard

2004-01-01

This paper focuses on the development of modal test techniques for validation of a solar sail gossamer space structure design. The major focus is on validating and comparing the capabilities of various excitation techniques for modal testing of solar sail components. One triangular quadrant of a solar sail membrane was tested in a 1 Torr vacuum environment using various excitation techniques, including magnetic excitation and surface-bonded piezoelectric patch actuators. Results from modal tests performed on the sail using piezoelectric patches at different positions are discussed. The excitation methods were evaluated for their applicability to in-vacuum ground testing and to the development of on-orbit flight test techniques. The solar sail membrane was tested in the horizontal configuration at various tension levels to assess the variation in frequency with tension in a vacuum environment. A segment of a solar sail mast prototype was also tested in ambient atmospheric conditions using various excitation techniques, and these methods are also assessed for their suitability for ground testing and on-orbit flight testing.
Cross-cultural adaptation and psychometric evaluation of oral health impact profile among school teacher community

PubMed Central

Vyas, Shaleen; Nagarajappa, Sandesh; Dasar, Pralhad L.; Mishra, Prashant

2018-01-01

AIM: To translate OHIP-14 into Hindi and test its psychometric properties among a school teacher community. METHODS: The OHIP-14 was translated to OHIP-14-H using the WHO-recommended translation protocol. During pre-testing, an expert panel assessed content validity of the questionnaire. Face validity was assessed on a sample of 10 individuals. The OHIP-14-H was administered to a random sample of 170 primary school teachers. Internal consistency and test-retest reliability were assessed using Cronbach's alpha and the intra-class correlation coefficient (ICC) respectively, with a 2-week interval. Predictive validity was tested by comparing OHIP-14-H scores with clinical parameters. Concurrent validity was assessed using self-reported oral health, and discriminant validity was ascertained through negative association with sociodemographic variables. RESULTS: The mean OHIP-14-H score was 9.57 (SD = 4.58). ICC and Cronbach's alpha for OHIP-14-H were 0.96 and 0.92 respectively. Concurrent validity using a binomial regression model indicated that good (OR = 0.56, 95% CI = 0.55 – 4.47) and moderate (OR = 0.25, 95% CI = 0.17 – 1.87) OHIP-14-H scores were negative but significant risk indicators of poor self-reported oral health (P < 0.009). Significant predictive validity was observed between OHIP-14-H scores and clinical parameters (P < 0.000). CONCLUSION: The translated and culturally adapted OHIP-14-H indicates good reliability and validity among primary school teachers. PMID:29417064
Test-retest reliability and cross validation of the functioning everyday with a wheelchair instrument.

PubMed

Mills, Tamara L; Holm, Margo B; Schmeler, Mark

2007-01-01

The purpose of this study was to establish the test-retest reliability and content validity of an outcomes tool designed to measure the effectiveness of seating-mobility interventions on the functional performance of individuals who use wheelchairs or scooters as their primary seating-mobility device. The instrument, Functioning Everyday With a Wheelchair (FEW), is a questionnaire designed to measure perceived user function related to wheelchair/scooter use. Using consumer-generated items, FEW Beta Version 1.0 was developed and test-retest reliability was established. Cross-validation of FEW Beta Version 1.0 was then carried out with five samples of seating-mobility users to establish content validity. Based on the content validity study, FEW Version 2.0 was developed and administered to seating-mobility consumers to examine its test-retest reliability. FEW Beta Version 1.0 yielded an intraclass correlation coefficient (ICC) Model (3,k) of .92, p < .001, and the content validity results revealed that FEW Beta Version 1.0 captured 55% of seating-mobility goals reported by consumers across five samples. FEW Version 2.0 yielded ICC(3,k) = .86, p < .001, and captured 98.5% of consumers' seating-mobility goals. The cross-validation study identified new categories of seating-mobility goals for inclusion in FEW Version 2.0, and the content validity of FEW Version 2.0 was confirmed. FEW Beta Version 1.0 and FEW Version 2.0 were highly stable in their measurement of participants' seating-mobility goals over a 1-week interval.
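The ICC(3,k) statistic reported in the record above can be illustrated with a minimal sketch. It assumes the usual two-way ANOVA decomposition (participants by administrations) and the Shrout and Fleiss form ICC(3,k) = (BMS - EMS) / BMS. The function name `icc_3k` and the sample scores are hypothetical and are not taken from the FEW study.

```python
def icc_3k(scores):
    """ICC(3,k) for a table with one row per participant and one
    column per administration (e.g. test and retest)."""
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with n >= 2 rows and k >= 2 columns")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for the two-way layout without replication.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)

    bms = ss_rows / (n - 1)                                     # between-participants mean square
    ems = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / bms


# Hypothetical test-retest scores for four participants.
example = [[1, 2], [2, 1], [3, 4], [4, 3]]
print(round(icc_3k(example), 2))  # 0.75
```

Because the column (administration) effect is removed before the residual is formed, a constant shift between test and retest does not lower this coefficient.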
Development and psychometric testing of the Protective Reasons Against Suicide Inventory for assessing older Chinese-speaking outpatients in primary care settings.

PubMed

Wang, Yi-Wen; Tsai, Yun-Fang; Lee, Shwu-Hua; Chen, Ying-Jen; Chen, Hsiu-Fang

2016-07-01

To develop and psychometrically test the Protective Reasons against Suicide Inventory among older Chinese-speaking outpatients. Tools currently exist to test reasons for living among individuals of all ages in western countries, but few are available to assess older adults' protective reasons against suicide in Asia. A cross-sectional survey to investigate protective reasons against suicide among older Chinese-speaking outpatients. The Protective Reasons against Suicide Inventory was developed based on individual interviews with 83 older outpatients in Taiwan, the literature and the authors' clinical experiences. The resulting Inventory was examined in 2013 for content validity, face validity, construct validity, criterion-related validity, internal consistency reliability and test-retest reliability. The Inventory had excellent content validity and face validity. Factor analysis yielded a seven-factor solution, accounting for 87·7% of the variance. Scores on the global Inventory and its subscales tended to be higher in outpatients diagnosed without suicidal ideation than in outpatients diagnosed with suicidal ideation, indicating good criterion validity. Inventory reliability and the intraclass correlation coefficient were satisfactory. The Protective Reasons against Suicide Inventory can be completed in 5 minutes and is perceived as easy to complete. Moreover, the Inventory yielded highly acceptable parameters for validity and reliability. The Protective Reasons against Suicide Inventory can be used to assess older Chinese-speaking outpatients for factors that protect them from attempting suicide. © 2016 John Wiley & Sons Ltd.
Revalidation of the NASA Ames 11-by 11-Foot Transonic Wind Tunnel with a Commercial Airplane Model

NASA Technical Reports Server (NTRS)

Kmak, Frank J.; Hudgins, M.; Hergert, D.; George, Michael W. (Technical Monitor)

2001-01-01

The 11-By 11-Foot Transonic leg of the Unitary Plan Wind Tunnel (UPWT) was modernized to improve tunnel performance, capability, productivity, and reliability. Wind tunnel tests to demonstrate the readiness of the tunnel for a return to production operations included an Integrated Systems Test (IST), calibration tests, and airplane validation tests. One of the two validation tests was a 0.037-scale Boeing 777 model that was previously tested in the 11-By 11-Foot tunnel in 1991. The objective of the validation tests was to compare pre-modernization and post-modernization results from the same airplane model in order to substantiate the operational readiness of the facility. Evaluations of within-test, test-to-test, and tunnel-to-tunnel data repeatability were made to study the effects of the tunnel modifications. Tunnel productivity was also evaluated to determine the readiness of the facility for production operations. The operation of the facility, including model installation, tunnel operations, and the performance of tunnel systems, was observed and facility deficiency findings generated. The data repeatability studies and tunnel-to-tunnel comparisons demonstrated outstanding data repeatability and a high overall level of data quality. Despite some operational and facility problems, the validation test was successful in demonstrating the readiness of the facility to perform production airplane wind tunnel tests.
Evaluating a technical university's placement test using the Rasch measurement model

NASA Astrophysics Data System (ADS)

Salleh, Tuan Salwani; Bakri, Norhayati; Zin, Zalhan Mohd

2016-10-01

This study discusses the process of validating a mathematics placement test at a technical university. The main objective is to produce a valid and reliable test to measure students' prerequisite knowledge to learn engineering technology mathematics. It is crucial to have a valid and reliable test, as the results will be used in a critical decision to assign students to different groups of Technical Mathematics 1. The placement test, which consists of 50 mathematics questions, was administered to 82 new diploma in engineering technology students at a technical university. This study employed the Rasch measurement model to analyze the data through the Winsteps software. The results revealed that ten test questions have difficulty below the ability of the less able students. Nevertheless, all ten questions satisfied infit and outfit standard values. Thus, all the questions can be reused in future placement tests at the technical university.
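For readers unfamiliar with the Rasch model and the infit and outfit statistics mentioned in the record above, the following is a minimal sketch under stated assumptions. It uses the standard dichotomous Rasch formula and the common mean-square definitions of infit and outfit. It does not reproduce Winsteps' estimation procedure: person abilities and the item difficulty are treated as already known, and the names `rasch_p` and `item_fit` and all numbers are hypothetical.

```python
from math import exp


def rasch_p(theta, b):
    """Probability of a correct response for ability theta and item difficulty b (logits)."""
    return 1.0 / (1.0 + exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one item.

    responses: 0/1 answers to the item, one per person
    thetas:    assumed-known person abilities, same order as responses
    b:         assumed-known item difficulty
    """
    ps = [rasch_p(t, b) for t in thetas]
    sq_resid = [(x - p) ** 2 for x, p in zip(responses, ps)]
    info = [p * (1 - p) for p in ps]  # model variance of each response

    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(s / v for s, v in zip(sq_resid, info)) / len(ps)
    # Infit: information-weighted version, less sensitive to off-target persons.
    infit = sum(sq_resid) / sum(info)
    return infit, outfit


# Hypothetical example: four persons of average ability on an item of average difficulty.
print(rasch_p(0.0, 0.0))                           # 0.5
print(item_fit([1, 0, 1, 0], [0.0] * 4, b=0.0))    # (1.0, 1.0)
```

Mean squares near 1.0 indicate responses about as noisy as the model expects; acceptable ranges vary by source and are not specified in the abstract.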
Development of the Military Women's Attitudes Toward Menstrual Suppression Scale: from construct definition to pilot testing.

PubMed

Trego, Lori L

2009-01-01

The Military Women's Attitudes Toward Menstrual Suppression scale (MWATMS) was created to measure attitudes toward menstrual suppression during deployment. The human health and social ecology theories were integrated to conceptualize an instrument that accounts for military-unique aspects of the environment on attitudes toward suppression. A three-step instrument development process was followed to develop the MWATMS. The instrument was pilot tested on a convenience sample of 206 military women with deployment experience. Reliability was tested with measures of internal consistency (alpha = .97); validity was tested with principal components analysis with varimax rotation. Four components accounted for 65% of variance: Benefits/Interest, Hygiene, Convenience, and Soldier/Stress. The pilot test of the MWATMS supported its reliability and validity. Further testing is warranted for validation of this instrument.
Survey Development to Assess College Students' Perceptions of the Campus Environment.

PubMed

Sowers, Morgan F; Colby, Sarah; Greene, Geoffrey W; Pickett, Mackenzie; Franzen-Castle, Lisa; Olfert, Melissa D; Shelnutt, Karla; Brown, Onikia; Horacek, Tanya M; Kidd, Tandalayo; Kattelmann, Kendra K; White, Adrienne A; Zhou, Wenjun; Riggsbee, Kristin; Yan, Wangcheng; Byrd-Bredbenner, Carol

2017-11-01

We developed and tested a College Environmental Perceptions Survey (CEPS) to assess college students' perceptions of the healthfulness of their campus. CEPS was developed in 3 stages: questionnaire development, validity testing, and reliability testing. Questionnaire development was based on an extensive literature review and input from an expert panel to establish content validity. Face validity was established with the target population using cognitive interviews with 100 college students. Concurrent-criterion validity was established with in-depth interviews (N = 30) of college students compared to surveys completed by the same 30 students. Surveys completed by college students from 8 universities (N = 1147) were used to test internal structure (factor analysis) and internal consistency (Cronbach's alpha). After development and testing, 15 items remained from the original 48 items. A 5-factor solution emerged: physical activity (4 items, α = .635), water (3 items, α = .773), vending (2 items, α = .680), healthy food (2 items, α = .631), and policy (2 items, α = .573). The mean total score for all universities was 62.71 (±11.16) on a 100-point scale. CEPS appears to be a valid and reliable tool for assessing college students' perceptions of their health-related campus environment.
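The subscale α values in the record above are Cronbach's alpha coefficients. A minimal sketch of the usual formula, α = k/(k-1) · (1 - Σ item variances / variance of total scores), is given here for orientation. The function name `cronbach_alpha` and the sample responses are hypothetical and are not CEPS data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: list of per-item score lists; each inner list holds one score
           per respondent, in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    if len({len(col) for col in items}) != 1:
        raise ValueError("all items need the same number of respondents")

    totals = [sum(scores) for scores in zip(*items)]    # total score per respondent
    item_var_sum = sum(variance(col) for col in items)  # sample variances (n - 1)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical two-item subscale answered by four respondents.
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))  # 0.75
```

With only two or three items per subscale, as in several of the CEPS factors, alpha tends to be lower than for longer scales with similar inter-item correlations.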
K(3)EDTA Vacuum Tubes Validation for Routine Hematological Testing.

PubMed

Lima-Oliveira, Gabriel; Lippi, Giuseppe; Salvagno, Gian Luca; Montagnana, Martina; Poli, Giovanni; Solero, Giovanni Pietro; Picheth, Geraldo; Guidi, Gian Cesare

2012-01-01

Background and Objective. Some in vitro diagnostic devices (e.g., blood collection vacuum tubes and syringes for blood analyses) are not validated before the laboratory quality managers decide to start using them or to change the brand. Frequently, the laboratory or hospital managers select the vacuum tubes for blood collection based on cost considerations or on relevance of a brand. The aim of this study was to validate two dry K(3)EDTA vacuum tubes of different brands for routine hematological testing. Methods. Blood specimens from 100 volunteers were collected in two different K(3)EDTA vacuum tubes by a single, expert phlebotomist. The routine hematological testing was done on the Advia 2120i hematology system. The significance of the differences between samples was assessed by paired Student's t-test after checking for normality. The level of statistical significance was set at P < 0.05. Results and Conclusions. The tubes of the different brands evaluated can represent a clinically relevant source of variation only for mean platelet volume (MPV) and platelet distribution width (PDW). Basically, our validation will permit the laboratory or hospital managers to select among the validated brands of vacuum tubes according to their technical or economic reasons for routine hematological tests.
K3EDTA Vacuum Tubes Validation for Routine Hematological Testing

PubMed Central

Lima-Oliveira, Gabriel; Lippi, Giuseppe; Salvagno, Gian Luca; Montagnana, Martina; Poli, Giovanni; Solero, Giovanni Pietro; Picheth, Geraldo; Guidi, Gian Cesare

2012-01-01

Background and Objective. Some in vitro diagnostic devices (e.g., blood collection vacuum tubes and syringes for blood analyses) are not validated before the laboratory quality managers decide to start using them or to change the brand. Frequently, the laboratory or hospital managers select the vacuum tubes for blood collection based on cost considerations or on relevance of a brand. The aim of this study was to validate two dry K3EDTA vacuum tubes of different brands for routine hematological testing. Methods. Blood specimens from 100 volunteers were collected in two different K3EDTA vacuum tubes by a single, expert phlebotomist. The routine hematological testing was done on the Advia 2120i hematology system. The significance of the differences between samples was assessed by paired Student's t-test after checking for normality. The level of statistical significance was set at P < 0.05. Results and Conclusions. The tubes of the different brands evaluated can represent a clinically relevant source of variation only for mean platelet volume (MPV) and platelet distribution width (PDW). Basically, our validation will permit the laboratory or hospital managers to select among the validated brands of vacuum tubes according to their technical or economic reasons for routine hematological tests. PMID:22888448
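The paired Student's t-test named in the Methods above compares two measurements taken on the same specimen donor. A minimal sketch of the test statistic follows, assuming one value per volunteer for each tube brand. The function name `paired_t` and the sample values are hypothetical and are not the study's data. The sketch stops at the t statistic and degrees of freedom; a p-value would come from the t distribution (for instance via `scipy.stats.ttest_rel`, if SciPy is available).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]  # within-pair differences
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 denominator);
    # it is zero, and the division fails, if every difference is identical.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical readings of one parameter from four volunteers, tube A vs tube B.
tube_a = [10, 12, 14, 16]
tube_b = [9, 10, 13, 13]
t, df = paired_t(tube_a, tube_b)
print(round(t, 3), df)  # 3.656 3
```

Statistical significance alone does not establish clinical relevance, which is why the abstract reports clinically relevant variation only for MPV and PDW.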
Concurrent validity of the Harris Infant Neuromotor Test and the Alberta Infant Motor Scale.

PubMed

Tse, Lillian; Mayson, Tanja A; Leo, Sara; Lee, Leanna L S; Harris, Susan R; Hayes, Virginia E; Backman, Catherine L; Cameron, Dianne; Tardif, Megan

2008-02-01

We examined concurrent validity of scores for two infant motor screening tools, the Harris Infant Neuromotor Test (HINT) and the Alberta Infant Motor Scale, in 121 Canadian infants. Relationships between the two tests for the overall sample were as follows: r = -.83 at 4 to 6.5 months (n = 121; p < .01) and r = -.85 at 10 to 12.5 months (n = 109; p < .01), suggesting that the HINT, the newer of the two measures, is valid in determining motor delays. Each test has advantages and disadvantages, and practitioners should determine which one best meets their infant assessment needs.
Cross-validation and hypothesis testing in neuroimaging: An irenic comment on the exchange between Friston and Lindquist et al.

PubMed

Reiss, Philip T

2015-08-01

The "ten ironic rules for statistical reviewers" presented by Friston (2012) prompted a rebuttal by Lindquist et al. (2013), which was followed by a rejoinder by Friston (2013). A key issue left unresolved in this discussion is the use of cross-validation to test the significance of predictive analyses. This note discusses the role that cross-validation-based and related hypothesis tests have come to play in modern data analyses, in neuroimaging and other fields. It is shown that such tests need not be suboptimal and can fill otherwise-unmet inferential needs. Copyright © 2015 Elsevier Inc. All rights reserved.
