Evaluating statistical validity of research reports: a guide for managers, planners, and researchers
Amanda L. Golbeck
1986-01-01
Inappropriate statistical methods, as well as appropriate methods used inappropriately, can lead to incorrect conclusions in any research report. Incorrect conclusions may also arise because the research problem is inherently difficult to quantify in a satisfactory way. Publication of a research report does not guarantee that appropriate statistical methods have been...
A statistical approach to selecting and confirming validation targets in -omics experiments
2012-01-01
Background Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets. Results Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result. Conclusions For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results. PMID:22738145
Austin, Peter C.; van Klaveren, David; Vergouwe, Yvonne; Nieboer, Daan; Lee, Douglas S.; Steyerberg, Ewout W.
2017-01-01
Objective Validation of clinical prediction models traditionally refers to the assessment of model performance in new patients. We studied different approaches to geographic and temporal validation in the setting of multicenter data from two time periods. Study Design and Setting We illustrated different analytic methods for validation using a sample of 14,857 patients hospitalized with heart failure at 90 hospitals in two distinct time periods. Bootstrap resampling was used to assess internal validity. Meta-analytic methods were used to assess geographic transportability. Each hospital was used once as a validation sample, with the remaining hospitals used for model derivation. Hospital-specific estimates of discrimination (c-statistic) and calibration (calibration intercepts and slopes) were pooled using random effects meta-analysis methods. I2 statistics and prediction interval width quantified geographic transportability. Temporal transportability was assessed using patients from the earlier period for model derivation and patients from the later period for model validation. Results Estimates of reproducibility, pooled hospital-specific performance, and temporal transportability were on average very similar, with c-statistics of 0.75. Between-hospital variation was moderate according to I2 statistics and prediction intervals for c-statistics. Conclusion This study illustrates how performance of prediction models can be assessed in settings with multicenter data at different time periods. PMID:27262237
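The random-effects pooling of hospital-specific c-statistics described above can be sketched with the DerSimonian-Laird estimator; the c-statistics and sampling variances below are invented for illustration and are not taken from the study:

```python
def dersimonian_laird(estimates, variances):
    """Random-effects pooling of per-hospital estimates (DerSimonian-Laird)."""
    k = len(estimates)
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)           # between-hospital variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, tau2

# hypothetical hospital-specific c-statistics and sampling variances
c_stats = [0.74, 0.76, 0.75, 0.73, 0.77]
pooled, tau2 = dersimonian_laird(c_stats, [0.0004] * 5)
```

With these homogeneous inputs Q falls below k-1, the between-hospital variance estimate is zero, and the pooled value reduces to the inverse-variance weighted mean; more variable hospital estimates would yield tau2 > 0 and hence wider prediction intervals for a new hospital.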
TSP Symposium 2012 Proceedings
2012-11-01
Excerpts: 7.3 Analysis and Results; 7.4 Threats to Validity and Limitations; 7.5 Conclusions; 7.6 Acknowledgments. Table 12: Overall Statistics of the Experiment. Table 13: Results of Pairwise ANOVA Analysis, Highlighting Statistically Significant Differences. "...we calculated the percentage of defects injected. The distribution statistics (mean, with lower and upper confidence intervals) are shown in Table 2."
Improving the Validity of Activity of Daily Living Dependency Risk Assessment
Clark, Daniel O.; Stump, Timothy E.; Tu, Wanzhu; Miller, Douglas K.
2015-01-01
Objectives Efforts to prevent activity of daily living (ADL) dependency may be improved through models that assess older adults’ dependency risk. We evaluated whether cognition and gait speed measures improve the predictive validity of interview-based models. Method Participants were 8,095 self-respondents in the 2006 Health and Retirement Survey who were aged 65 years or over and independent in five ADLs. Incident ADL dependency was determined from the 2008 interview. Models were developed using random 2/3rd cohorts and validated in the remaining 1/3rd. Results Compared to a c-statistic of 0.79 in the best interview model, the model including cognitive measures had c-statistics of 0.82 and 0.80 while the best fitting gait speed model had c-statistics of 0.83 and 0.79 in the development and validation cohorts, respectively. Conclusion Two relatively brief models, one that requires an in-person assessment and one that does not, had excellent validity for predicting incident ADL dependency but did not significantly improve the predictive validity of the best fitting interview-based models. PMID:24652867
Précis of statistical significance: rationale, validity, and utility.
Chow, S L
1998-04-01
The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. 
At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of data. Neither can any of them fulfill the nonstatistical functions expected of them by critics.
ERIC Educational Resources Information Center
Koziol, Natalie A.; Bovaird, James A.
2018-01-01
Evaluations of measurement invariance provide essential construct validity evidence--a prerequisite for seeking meaning in psychological and educational research and ensuring fair testing procedures in high-stakes settings. However, the quality of such evidence is partly dependent on the validity of the resulting statistical conclusions. Type I or…
PCA as a practical indicator of OPLS-DA model reliability.
Worley, Bradley; Powers, Robert
Principal Component Analysis (PCA) and Orthogonal Projections to Latent Structures Discriminant Analysis (OPLS-DA) are powerful statistical modeling tools that provide insights into separations between experimental groups based on high-dimensional spectral measurements from NMR, MS or other analytical instrumentation. However, when used without validation, these tools may lead investigators to statistically unreliable conclusions. This danger is especially real for Partial Least Squares (PLS) and OPLS, which aggressively force separations between experimental groups. As a result, OPLS-DA is often used as an alternative method when PCA fails to expose group separation, but this practice is highly dangerous. Without rigorous validation, OPLS-DA can easily yield statistically unreliable group separation. A Monte Carlo analysis of PCA group separations and OPLS-DA cross-validation metrics was performed on NMR datasets with statistically significant separations in scores-space. A linearly increasing amount of Gaussian noise was added to each data matrix followed by the construction and validation of PCA and OPLS-DA models. With increasing added noise, the PCA scores-space distance between groups rapidly decreased and the OPLS-DA cross-validation statistics simultaneously deteriorated. A decrease in correlation between the estimated loadings (added noise) and the true (original) loadings was also observed. While the validity of the OPLS-DA model diminished with increasing added noise, the group separation in scores-space remained basically unaffected. Supported by the results of Monte Carlo analyses of PCA group separations and OPLS-DA cross-validation metrics, we provide practical guidelines and cross-validatory recommendations for reliable inference from PCA and OPLS-DA models.
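A minimal sketch of the noise-addition experiment described above, assuming NumPy is available; the group sizes, dimensionality, effect size, and noise levels are invented, and PCA is reduced to a bare SVD of the centred data:

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_scores(X, n_components=2):
    Xc = X - X.mean(axis=0)                      # mean-centre columns
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # scores on the top PCs

def mean_separation(noise_sd, n=500, p=50, reps=10):
    """Average scores-space distance between the two group centroids."""
    dists = []
    for _ in range(reps):
        A = rng.normal(0.0, 1.0, (n, p))
        B = rng.normal(0.0, 1.0, (n, p))
        B[:, :10] += 3.0                         # true group effect
        X = np.vstack([A, B]) + rng.normal(0.0, noise_sd, (2 * n, p))
        T = pca_scores(X)
        dists.append(np.linalg.norm(T[:n].mean(axis=0) - T[n:].mean(axis=0)))
    return float(np.mean(dists))

d_low = mean_separation(0.5)    # little added noise: clear separation
d_high = mean_separation(30.0)  # heavy added noise: separation shrinks
```

As in the paper's Monte Carlo analysis, increasing the added Gaussian noise drives the scores-space distance between group centroids down, because the top principal components are captured by noise directions rather than the true group effect.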
Assessing Discriminative Performance at External Validation of Clinical Prediction Models
Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W.
2016-01-01
Introduction External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting. Methods We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated it in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. Results The permutation test indicated that the validation and development set were homogeneous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. Conclusion The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population. 
To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients. PMID:26881753
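The c-statistic these validation studies rely on is the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. A stdlib sketch with made-up predictions:

```python
def c_statistic(y_true, y_score):
    """Concordance probability: P(score for an event > score for a
    non-event), ties counted as one half."""
    events = [s for s, y in zip(y_score, y_true) if y == 1]
    nonevents = [s for s, y in zip(y_score, y_true) if y == 0]
    pairs = [(e > n) + 0.5 * (e == n) for e in events for n in nonevents]
    return sum(pairs) / len(pairs)

# hypothetical predicted risks: three patients with the outcome, three without
c = c_statistic([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.5, 0.3, 0.2])
# → 8/9 ≈ 0.889: one of the nine event/non-event pairs is discordant
```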
2013-01-01
Background Relative validity (RV), a ratio of ANOVA F-statistics, is often used to compare the validity of patient-reported outcome (PRO) measures. We used the bootstrap to establish the statistical significance of the RV and to identify key factors affecting its significance. Methods Based on responses from 453 chronic kidney disease (CKD) patients to 16 CKD-specific and generic PRO measures, RVs were computed to determine how well each measure discriminated across clinically-defined groups of patients compared to the most discriminating (reference) measure. Statistical significance of RV was quantified by the 95% bootstrap confidence interval. Simulations examined the effects of sample size, denominator F-statistic, correlation between comparator and reference measures, and number of bootstrap replicates. Results The statistical significance of the RV increased as the magnitude of denominator F-statistic increased or as the correlation between comparator and reference measures increased. A denominator F-statistic of 57 conveyed sufficient power (80%) to detect an RV of 0.6 for two measures correlated at r = 0.7. Larger denominator F-statistics or higher correlations provided greater power. Larger sample size with a fixed denominator F-statistic or more bootstrap replicates (beyond 500) had minimal impact. Conclusions The bootstrap is valuable for establishing the statistical significance of RV estimates. A reasonably large denominator F-statistic (F > 57) is required for adequate power when using the RV to compare the validity of measures with small or moderate correlations (r < 0.7). Substantially greater power can be achieved when comparing measures of a very high correlation (r > 0.9). PMID:23721463
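The bootstrap procedure for the relative validity (RV) ratio can be sketched as follows. The data are synthetic, the two measures are generated independently rather than as correlated responses from the same patients (a simplification relative to the study), and the function names are my own:

```python
import random
from statistics import fmean

random.seed(42)

def f_statistic(groups):
    """One-way ANOVA F-statistic across clinically defined groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = fmean([x for g in groups for x in g])
    ssb = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ssw = sum(sum((x - fmean(g)) ** 2 for x in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

# hypothetical scores of a reference and a comparator PRO measure
# in three severity groups (reference discriminates more strongly)
ref = [[random.gauss(m, 0.5) for _ in range(40)] for m in (0.0, 1.0, 2.0)]
comp = [[random.gauss(m, 0.5) for _ in range(40)] for m in (0.0, 0.5, 1.0)]

rv_hat = f_statistic(comp) / f_statistic(ref)    # relative validity

# percentile bootstrap: resample patients within each group
rvs = []
for _ in range(500):
    rb = [random.choices(g, k=len(g)) for g in ref]
    cb = [random.choices(g, k=len(g)) for g in comp]
    rvs.append(f_statistic(cb) / f_statistic(rb))
rvs.sort()
ci = (rvs[12], rvs[487])                          # ≈ 95% percentile interval
```

An RV confidence interval lying entirely below 1 indicates that the comparator discriminates significantly less well than the reference measure.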
Stanisavljevic, Dejana; Trajkovic, Goran; Marinkovic, Jelena; Bukumiric, Zoran; Cirkovic, Andja; Milic, Natasa
2014-01-01
Background Medical statistics has become important and relevant for future doctors, enabling them to practice evidence-based medicine. Recent studies report that students’ attitudes towards statistics play an important role in their statistics achievements. The aim of the study was to test the psychometric properties of the Serbian version of the Survey of Attitudes Towards Statistics (SATS) in order to acquire a valid instrument to measure attitudes inside the Serbian educational context. Methods The validation study was performed on a cohort of 417 medical students who were enrolled in an obligatory introductory statistics course. The SATS adaptation was based on an internationally accepted methodology for translation and cultural adaptation. Psychometric properties of the Serbian version of the SATS were analyzed through the examination of factorial structure and internal consistency. Results Most medical students held positive attitudes towards statistics. The average total SATS score was above neutral (4.3±0.8), and varied from 1.9 to 6.2. Confirmatory factor analysis validated the six-factor structure of the questionnaire (Affect, Cognitive Competence, Value, Difficulty, Interest and Effort). Values for fit indices TLI (0.940) and CFI (0.961) were above the cut-off of ≥0.90. The RMSEA value of 0.064 (0.051–0.078) was below the suggested value of ≤0.08. Cronbach’s alpha of the entire scale was 0.90, indicating scale reliability. In a multivariate regression model, self-rating of ability in mathematics and current grade point average were significantly associated with the total SATS score after adjusting for age and gender. Conclusion The present study provided evidence for the appropriate metric properties of the Serbian version of the SATS. Confirmatory factor analysis validated the six-factor structure of the scale. The SATS may be a reliable and valid instrument for identifying medical students’ attitudes towards statistics in the Serbian educational context. 
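Cronbach's alpha, reported above as 0.90 for the full scale, is computed from the item variances and the variance of respondents' total scores; a toy illustration with invented responses, not the SATS data:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha from per-item score lists (same respondent order)."""
    k = len(items)
    item_var = sum(pvariance(it) for it in items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# toy responses from five students to three Likert-type items
alpha = cronbach_alpha([[1, 2, 3, 4, 5],
                        [1, 2, 3, 4, 5],
                        [2, 2, 3, 4, 4]])  # ≈ 0.97: items move together
```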
PMID:25405489
Parsons, Nick R; Price, Charlotte L; Hiskens, Richard; Achten, Juul; Costa, Matthew L
2012-04-25
The application of statistics in reported research in trauma and orthopaedic surgery has become ever more important and complex. Despite the extensive use of statistical analysis, it is still a subject which is often not conceptually well understood, resulting in clear methodological flaws and inadequate reporting in many papers. A detailed statistical survey sampled 100 representative orthopaedic papers using a validated questionnaire that assessed the quality of the trial design and statistical analysis methods. The survey found evidence of failings in study design, statistical methodology and presentation of the results. Overall, in 17% (95% confidence interval; 10-26%) of the studies investigated the conclusions were not clearly justified by the results, in 39% (30-49%) of studies a different analysis should have been undertaken and in 17% (10-26%) a different analysis could have made a difference to the overall conclusions. It is only by an improved dialogue between statistician, clinician, reviewer and journal editor that the failings in design methodology and analysis highlighted by this survey can be addressed.
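Confidence intervals for proportions like those quoted above (e.g. 17 of 100 papers, 95% CI 10-26%) can be approximately reproduced with a Wilson score interval; the authors' exact interval method is not stated, so this is only a plausible reconstruction:

```python
import math

def wilson_ci(k, n, z=1.96):
    """Wilson score 95% confidence interval for a binomial proportion."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half

lo, hi = wilson_ci(17, 100)   # ≈ (0.109, 0.255), i.e. roughly 11-26%
```

The Wilson interval is preferred over the simple Wald interval for survey proportions of this size because it never escapes [0, 1] and has better coverage near the tails.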
Enhancement and Validation of an Arab Surname Database
Schwartz, Kendra; Beebani, Ganj; Sedki, Mai; Tahhan, Mamon; Ruterbusch, Julie J.
2015-01-01
Objectives Arab Americans constitute a large, heterogeneous, and quickly growing subpopulation in the United States. Health statistics for this group are difficult to find because US governmental offices do not recognize Arab as separate from white. The development and validation of an Arab- and Chaldean-American name database will enhance research efforts in this population subgroup. Methods A previously validated name database was supplemented with newly identified names gathered primarily from vital statistic records and then evaluated using a multistep process. This process included 1) review by 4 Arabic- and Chaldean-speaking reviewers, 2) ethnicity assessment by social media searches, and 3) self-report of ancestry obtained from a telephone survey. Results Our Arab- and Chaldean-American name algorithm has a positive predictive value of 91% and a negative predictive value of 100%. Conclusions This enhanced name database and algorithm can be used to identify Arab Americans in health statistics data, such as cancer and hospital registries, where they are often coded as white, to determine the extent of health disparities in this population. PMID:24625771
Kratochwill, Thomas R; Levin, Joel R
2014-04-01
In this commentary, we add to the spirit of the articles appearing in the special series devoted to meta- and statistical analysis of single-case intervention-design data. Following a brief discussion of historical factors leading to our initial involvement in statistical analysis of such data, we discuss: (a) the value added by including statistical-analysis recommendations in the What Works Clearinghouse Standards for single-case intervention designs; (b) the importance of visual analysis in single-case intervention research, along with the distinctive role that could be played by single-case effect-size measures; and (c) the elevated internal validity and statistical-conclusion validity afforded by the incorporation of various forms of randomization into basic single-case design structures. For the future, we envision more widespread application of quantitative analyses, as critical adjuncts to visual analysis, in both primary single-case intervention research studies and literature reviews in the behavioral, educational, and health sciences. Copyright © 2014 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.
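One way randomization strengthens statistical-conclusion validity in single-case designs is by licensing a permutation test of the phase difference; a stdlib sketch on invented baseline (A) and intervention (B) phase data:

```python
import random
from statistics import fmean

def permutation_test(a, b, n_perm=5000, seed=1):
    """Two-sided Monte Carlo permutation test for a difference in means."""
    random.seed(seed)
    observed = abs(fmean(b) - fmean(a))
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(fmean(pb) - fmean(pa)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)     # add-one Monte Carlo p-value

baseline = [3, 4, 2, 3, 4, 3]            # hypothetical A-phase observations
intervention = [6, 7, 6, 8, 7, 6]        # hypothetical B-phase observations
p = permutation_test(baseline, intervention)
```

Because the phases here are completely separated, only the most extreme relabelings reach the observed difference, so the p-value is far below .05; such a quantitative result complements, rather than replaces, visual analysis.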
Validation of a Survey Questionnaire on Organ Donation: An Arabic World Scenario
Agarwal, Tulika Mehta; Al-Thani, Hassan; Al Maslamani, Yousuf
2018-01-01
Objective To validate a questionnaire for measuring factors influencing organ donation and transplant. Methods The constructed questionnaire was based on the theory of planned behavior by Ajzen Icek and had 45 questions including general inquiry and demographic information. Four experts on the topic, Arabic culture, and the Arabic and English languages established content validity through review. It was quantified by the content validity index (CVI). Construct validity was established by principal component analysis (PCA), whereas internal consistency was checked by Cronbach's Alpha and the intraclass correlation coefficient (ICC). Statistical analysis was performed with the SPSS 22.0 statistical package. Results Content validity in the form of S-CVI/Average and S-CVI/UA was 0.95 and 0.82, respectively, suggesting adequate content relevance of the questionnaire. Factor analysis indicated that the construct validity for each domain (knowledge, attitudes, beliefs, and intention) was 65%, 71%, 77%, and 70%, respectively. Cronbach's Alpha and ICC coefficients were 0.90, 0.67, 0.75, and 0.74 and 0.82, 0.58, 0.61, and 0.74, respectively, for the domains. Conclusion The final questionnaire consists of 39 items across the knowledge, attitudes, beliefs, and intention domains and is a valid and reliable tool for organ donation and transplant surveys. PMID:29593894
Vahedi, Shahram; Farrokhi, Farahman
2011-01-01
Objective The aim of this study is to explore the confirmatory factor analysis results of the Persian adaptation of the Statistics Anxiety Measure (SAM), proposed by Earp. Method The validity and reliability assessments of the scale were performed on 298 college students chosen randomly from Tabriz University in Iran. Confirmatory factor analysis (CFA) was carried out to determine the factor structure of the Persian adaptation of the SAM. Results As expected, the second-order model provided a better fit to the data than the three alternative models. Conclusions Hence, the Persian adaptation of the SAM provides an equally valid measure for use among college students. The study both expands and adds support to the existing body of math anxiety literature. PMID:22952530
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-01-01
Aims A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R2), using R2 as the primary metric of assay agreement. However, the use of R2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. Methods We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Results Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. Conclusions The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. PMID:28747393
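The two complementary checks described, Bland-Altman limits of agreement for constant error and a regression slope for proportional error, can be sketched as follows; the assay values are fabricated and Deming regression is omitted for brevity:

```python
from statistics import mean, stdev

def bland_altman(x, y):
    """Bias and 95% limits of agreement between two quantitative assays."""
    diffs = [b - a for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

def slope(x, y):
    """OLS slope of y on x; a value != 1 signals proportional error."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

reference = [10.0, 20.0, 30.0, 40.0, 50.0]       # validated assay values
constant_err = [v + 2.0 for v in reference]      # new assay: shifted by +2
proportional_err = [1.1 * v for v in reference]  # new assay: 10% proportional error

bias, loa = bland_altman(reference, constant_err)  # bias → 2.0
k = slope(reference, proportional_err)             # slope → 1.1
```

R2 alone would be 1.0 for both fabricated comparisons, which is exactly why the abstract argues that the coefficient of determination by itself cannot quantify constant or proportional error.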
Classical Statistics and Statistical Learning in Imaging Neuroscience
Bzdok, Danilo
2017-01-01
Brain-imaging research has predominantly generated insight by means of classical statistics, including regression-type analyses and null-hypothesis testing using t-test and ANOVA. Throughout recent years, statistical learning methods enjoy increasing popularity especially for applications in rich and complex data, including cross-validated out-of-sample prediction using pattern classification and sparsity-inducing regression. This concept paper discusses the implications of inferential justifications and algorithmic methodologies in common data analysis scenarios in neuroimaging. It is retraced how classical statistics and statistical learning originated from different historical contexts, build on different theoretical foundations, make different assumptions, and evaluate different outcome metrics to permit differently nuanced conclusions. The present considerations should help reduce current confusion between model-driven classical hypothesis testing and data-driven learning algorithms for investigating the brain with imaging techniques. PMID:29056896
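Cross-validated out-of-sample prediction, the hallmark of the statistical-learning culture discussed here, can be illustrated with a leave-one-out loop around a nearest-class-mean rule; the one-dimensional "activation" values below are invented:

```python
from statistics import fmean

def loo_accuracy(xs, ys):
    """Leave-one-out cross-validated accuracy of a nearest-class-mean rule."""
    correct = 0
    for i in range(len(xs)):
        tx = xs[:i] + xs[i + 1:]                 # training features
        ty = ys[:i] + ys[i + 1:]                 # training labels
        m0 = fmean([x for x, y in zip(tx, ty) if y == 0])
        m1 = fmean([x for x, y in zip(tx, ty) if y == 1])
        pred = 0 if abs(xs[i] - m0) <= abs(xs[i] - m1) else 1
        correct += pred == ys[i]
    return correct / len(xs)

# toy 1-D activation values for two well-separated groups
acc = loo_accuracy([1.0, 1.2, 0.8, 1.1, 3.0, 3.2, 2.8, 3.1],
                   [0, 0, 0, 0, 1, 1, 1, 1])     # → 1.0
```

The held-out accuracy is the outcome metric here, in contrast to the p-values of classical null-hypothesis testing; the two frameworks therefore license differently nuanced conclusions, as the paper argues.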
Zaki, Rafdzah; Bulgiba, Awang; Ismail, Roshidi; Ismail, Noor Azina
2012-01-01
Background Accurate values are a must in medicine. An important parameter in determining the quality of a medical instrument is agreement with a gold standard. Various statistical methods have been used to test for agreement. Some of these methods have been shown to be inappropriate. This can result in misleading conclusions about the validity of an instrument. The Bland-Altman method is the most popular method judging by the many citations of the article proposing this method. However, the number of citations does not necessarily mean that this method has been applied in agreement research. No previous study has been conducted to look into this. This is the first systematic review to identify statistical methods used to test for agreement of medical instruments. The proportion of various statistical methods found in this review will also reflect the proportion of medical instruments that have been validated using those particular methods in current clinical practice. Methodology/Findings Five electronic databases were searched between 2007 and 2009 to look for agreement studies. A total of 3,260 titles were initially identified. Only 412 titles were potentially related, and finally 210 fitted the inclusion criteria. The Bland-Altman method is the most popular method with 178 (85%) studies having used this method, followed by the correlation coefficient (27%) and means comparison (18%). Some of the inappropriate methods highlighted by Altman and Bland since the 1980s are still in use. Conclusions This study finds that the Bland-Altman method is the most popular method used in agreement research. There are still inappropriate applications of statistical methods in some studies. It is important for a clinician or medical researcher to be aware of this issue because misleading conclusions from inappropriate analyses will jeopardize the quality of the evidence, which in turn will influence quality of care given to patients in the future. PMID:22662248
An observational examination of the literature in diagnostic anatomic pathology.
Foucar, Elliott; Wick, Mark R
2005-05-01
Original research published in the medical literature confronts the reader with three very basic and closely linked questions--are the authors' conclusions true in the contextual setting in which the work was performed (internally valid); if so, are the conclusions also applicable in other practice settings (externally valid); and, if the conclusions of the study are bona fide, do they represent an important contribution to medical practice or are they true-but-insignificant? Most publications attempt to convince readers that the researchers' conclusions are both internally valid and important, and occasionally papers also directly address external validity. Developing standardized methods to facilitate the prospective determination of research importance would be useful to both journals and their readers, but has proven difficult. In contrast, the evidence-based medicine (EBM) movement has had more success with understanding and codifying factors thought to promote research validity. Of the many variables that can influence research validity, research design is the one that has received the most attention. The present paper reviews the contributions of EBM to understanding research validity, looking for areas where EBM's body of knowledge is applicable to the anatomic pathology (AP) literature. As part of this project, the authors performed a pilot observational analysis of a representative sample of the current pertinent literature on diagnostic tissue pathology. The results of that review showed that most of the latter publications employ one of the four categories of "observational" research design that have been delineated by the EBM movement, and that the most common of these observational designs is a "cross-sectional" comparison. Pathologists do not presently use the "experimental" research designs so admired by advocates of EBM. Slightly > 50% of AP observational studies employed statistical evaluations to support their final conclusions. 
Comparison of the current AP literature with a selected group of papers published in 1977 shows a discernible change over that period that has affected not just technological procedures, but also research design and use of statistics. Although we feel that advocates of EBM deserve credit for bringing attention to the close link between research design and research validity, much of the EBM effort has centered on refining "experimental" methodology, and the complexities of observational research have often been treated in an inappropriately dismissive manner. For advocates of EBM, an observational study is what you are relegated to as a second choice when you are unable to do an experimental study. The latter viewpoint may be true for evaluating new chemotherapeutic agents, but is unacceptable to pathologists, whose research advances are currently completely dependent on well-conducted observational research. Rather than succumb to randomization envy and accept EBM's assertion that observational research is second best, the challenge to AP is to develop and adhere to standards for observational research that will allow our patients to benefit from the full potential of this time tested approach to developing valid insights into disease.
Improving the Validity and Reliability of a Health Promotion Survey for Physical Therapists
Stephens, Jaca L.; Lowman, John D.; Graham, Cecilia L.; Morris, David M.; Kohler, Connie L.; Waugh, Jonathan B.
2013-01-01
Purpose Physical therapists (PTs) have a unique opportunity to intervene in the area of health promotion. However, no instrument has been validated to measure PTs’ views on health promotion in physical therapy practice. The purpose of this study was to evaluate the content validity and test-retest reliability of a health promotion survey designed for PTs. Methods An expert panel of PTs assessed the content validity of “The Role of Health Promotion in Physical Therapy Survey” and provided suggestions for revision. Item content validity was assessed using the content validity ratio (CVR) as well as the modified kappa statistic. Therapists then participated in the test-retest reliability assessment of the revised health promotion survey, which was assessed using a weighted kappa statistic. Results Based on feedback from the expert panelists, significant revisions were made to the original survey. The expert panel reached at least a majority consensus agreement for all items in the revised survey and the survey-CVR improved from 0.44 to 0.66. Only one item on the revised survey had substantial test-retest agreement, with 55% of the items having moderate agreement and 43% poor agreement. Conclusions All items on the revised health promotion survey demonstrated at least fair validity, but few items had reasonable test-retest reliability. Further modifications should be made to strengthen the validity and improve the reliability of this survey. PMID:23754935
Single-Item Measurement of Suicidal Behaviors: Validity and Consequences of Misclassification
Millner, Alexander J.; Lee, Michael D.; Nock, Matthew K.
2015-01-01
Suicide is a leading cause of death worldwide. Although research has made strides in better defining suicidal behaviors, there has been less focus on accurate measurement. Currently, the widespread use of self-report, single-item questions to assess suicide ideation, plans and attempts may contribute to measurement problems and misclassification. We examined the validity of single-item measurement and the potential for statistical errors. Over 1,500 participants completed an online survey containing single-item questions regarding a history of suicidal behaviors, followed by questions with more precise language, multiple response options and narrative responses to examine the validity of single-item questions. We also conducted simulations to test whether common statistical tests are robust against the degree of misclassification produced by the use of single-items. We found that 11.3% of participants that endorsed a single-item suicide attempt measure engaged in behavior that would not meet the standard definition of a suicide attempt. Similarly, 8.8% of those who endorsed a single-item measure of suicide ideation endorsed thoughts that would not meet standard definitions of suicide ideation. Statistical simulations revealed that this level of misclassification substantially decreases statistical power and increases the likelihood of false conclusions from statistical tests. Providing a wider range of response options for each item reduced the misclassification rate by approximately half. Overall, the use of single-item, self-report questions to assess the presence of suicidal behaviors leads to misclassification, increasing the likelihood of statistical decision errors. Improving the measurement of suicidal behaviors is critical to increase understanding and prevention of suicide. PMID:26496707
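The power loss from misclassification described above can be reproduced in miniature. This is a hedged Monte Carlo sketch, not the authors' actual simulation design: it assumes a two-group comparison, a fixed |t| > 1.96 significance rule, and hypothetical effect sizes, with misclassified "cases" drawn from the control distribution:

```python
import random
import statistics

def two_sample_t(a, b):
    """Welch-style t statistic from sample means and variances."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / (va / len(a) + vb / len(b)) ** 0.5

def power(effect=0.5, n=50, flip=0.0, sims=400, seed=7):
    """Fraction of simulated studies reaching |t| > 1.96 when a
    fraction `flip` of cases are actually misclassified controls."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        cases = [rng.gauss(effect, 1.0) for _ in range(n)]
        controls = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # misclassification: some "cases" never had the behavior
        cases = [rng.gauss(0.0, 1.0) if rng.random() < flip else x
                 for x in cases]
        if abs(two_sample_t(cases, controls)) > 1.96:
            hits += 1
    return hits / sims

p_clean = power(flip=0.0)
p_noisy = power(flip=0.11)  # roughly the misclassification rate reported
```

Raising `flip` dilutes the group difference, so power falls, consistent with the abstract's conclusion.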
Pattern statistics on Markov chains and sensitivity to parameter estimation
Nuel, Grégory
2006-01-01
Background: In order to compute pattern statistics in computational biology a Markov model is commonly used to take into account the sequence composition. Usually its parameters must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what are the consequences of this variability on pattern studies (finding the most over-represented words in a genome, the most significant common words to a set of sequences,...). Results: In the particular case where pattern statistics (overlap counting only) are computed through binomial approximations, we use the delta method to give an explicit expression for σ, the standard deviation of a pattern statistic. This result is validated using simulations, and a simple pattern study is also considered. Conclusion: We establish that the use of high-order Markov models could easily lead to major mistakes due to the high sensitivity of pattern statistics to parameter estimation. PMID:17044916
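A much simplified stand-in for the pattern statistics discussed above, assuming an order-0 (i.i.d. letter) model rather than the higher-order Markov models the paper analyzes. Note how the z-score depends directly on the estimated base frequencies; that dependence is exactly the sensitivity the paper quantifies:

```python
import math

def word_zscore(seq, word, base_freqs):
    """Binomial z-score for over-representation of `word` in `seq`
    under an order-0 (i.i.d. letter) composition model."""
    n_pos = len(seq) - len(word) + 1
    p = math.prod(base_freqs[c] for c in word)  # P(word at a fixed position)
    observed = sum(seq[i:i + len(word)] == word for i in range(n_pos))
    expected = n_pos * p
    sd = math.sqrt(n_pos * p * (1 - p))
    return (observed - expected) / sd

freqs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
z = word_zscore("ACGTACGTACGTTTTT", "ACGT", freqs)
```

Small errors in `freqs` propagate multiplicatively into `p`, so longer words (and higher-order models, with many more parameters) are correspondingly more sensitive.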
2014-01-01
In omic research, such as genome wide association studies, researchers seek to repeat their results in other datasets to reduce false positive findings and thus provide evidence for the existence of true associations. Unfortunately, this standard validation approach cannot completely eliminate false positive conclusions, and it can also mask many true associations that might otherwise advance our understanding of pathology. These issues raise the question: How can we increase the amount of knowledge gained from high throughput genetic data? To address this challenge, we present an approach that complements standard statistical validation methods by drawing attention to both potential false negative and false positive conclusions, as well as providing broad information for directing future research. The Diverse Convergent Evidence approach (DiCE) we propose integrates information from multiple sources (omics, informatics, and laboratory experiments) to estimate the strength of the available corroborating evidence supporting a given association. This process is designed to yield an evidence metric that has utility when etiologic heterogeneity, variable risk factor frequencies, and a variety of observational data imperfections might lead to false conclusions. We provide proof of principle examples in which DiCE identified strong evidence for associations that have established biological importance, when standard validation methods alone did not provide support. If used as an adjunct to standard validation methods this approach can leverage multiple distinct data types to improve genetic risk factor discovery/validation, promote effective science communication, and guide future research directions. PMID:25071867
Benjamin, Sara E; Neelon, Brian; Ball, Sarah C; Bangdiwala, Shrikant I; Ammerman, Alice S; Ward, Dianne S
2007-01-01
Background Few assessment instruments have examined the nutrition and physical activity environments in child care, and none are self-administered. Given the emerging focus on child care settings as a target for intervention, a valid and reliable measure of the nutrition and physical activity environment is needed. Methods To measure inter-rater reliability, 59 child care center directors and 109 staff completed the self-assessment concurrently, but independently. Three weeks later, a repeat self-assessment was completed by a sub-sample of 38 directors to assess test-retest reliability. To assess criterion validity, a researcher-administered environmental assessment was conducted at 69 centers and was compared to a self-assessment completed by the director. A weighted kappa test statistic and percent agreement were calculated to assess agreement for each question on the self-assessment. Results For inter-rater reliability, kappa statistics ranged from 0.20 to 1.00 across all questions. Test-retest reliability of the self-assessment yielded kappa statistics that ranged from 0.07 to 1.00. The inter-quartile kappa statistic ranges for inter-rater and test-retest reliability were 0.45 to 0.63 and 0.27 to 0.45, respectively. When percent agreement was calculated, questions ranged from 52.6% to 100% for inter-rater reliability and 34.3% to 100% for test-retest reliability. Kappa statistics for validity ranged from -0.01 to 0.79, with an inter-quartile range of 0.08 to 0.34. Percent agreement for validity ranged from 12.9% to 93.7%. Conclusion This study provides estimates of criterion validity, inter-rater reliability and test-retest reliability for an environmental nutrition and physical activity self-assessment instrument for child care. Results indicate that the self-assessment is a stable and reasonably accurate instrument for use with child care interventions. 
We therefore recommend the Nutrition and Physical Activity Self-Assessment for Child Care (NAP SACC) instrument to researchers and practitioners interested in conducting healthy weight intervention in child care. However, a more robust, less subjective measure would be more appropriate for researchers seeking an outcome measure to assess intervention impact. PMID:17615078
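The weighted kappa used in the reliability assessments above can be computed directly from paired ratings. A self-contained sketch with linear weights and hypothetical ordinal responses (a real analysis would typically use an established implementation, and the formula assumes the raters are not both constant):

```python
from collections import Counter

def linear_weighted_kappa(r1, r2, categories):
    """Linearly weighted Cohen's kappa for two ratings of the same
    items on an ordered categorical scale."""
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    obs = Counter(zip(r1, r2))
    m1, m2 = Counter(r1), Counter(r2)
    # observed weighted disagreement
    num = sum(abs(idx[a] - idx[b]) * count / n for (a, b), count in obs.items())
    # chance-expected weighted disagreement from the marginals
    den = sum(abs(idx[a] - idx[b]) * (m1[a] / n) * (m2[b] / n)
              for a in categories for b in categories)
    return 1 - num / den

time1 = ["never", "sometimes", "often", "often"]   # hypothetical test ratings
time2 = ["never", "often", "often", "often"]       # hypothetical retest ratings
kappa = linear_weighted_kappa(time1, time2, ["never", "sometimes", "often"])
```

Linear weights penalize a one-category disagreement less than a two-category one, which is why weighted kappa suits ordinal survey items.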
Ganasegeran, Kurubaran; Selvaraj, Kamaraj; Rashid, Abdul
2017-01-01
Background The six item Confusion, Hubbub and Order Scale (CHAOS-6) has been validated as a reliable tool to measure levels of household disorder. We aimed to investigate the goodness of fit and reliability of a new Malay version of the CHAOS-6. Methods The original English version of the CHAOS-6 underwent forward-backward translation into the Malay language. The finalised Malay version was administered to 105 myocardial infarction survivors in a Malaysian cardiac health facility. We performed confirmatory factor analyses (CFAs) using structural equation modelling. A path diagram and fit statistics were yielded to determine the Malay version’s validity. Composite reliability was tested to determine the scale’s reliability. Results All 105 myocardial infarction survivors participated in the study. The CFA yielded a six-item, one-factor model with excellent fit statistics. Composite reliability for the single factor CHAOS-6 was 0.65, confirming that the scale is reliable for Malay speakers. Conclusion The Malay version of the CHAOS-6 was reliable and showed the best fit statistics for our study sample. We thus offer a simple, brief, validated, reliable and novel instrument to measure chaos, the Skala Kecelaruan, Keriuhan & Tertib Terubahsuai (CHAOS-6), for the Malaysian population. PMID:28951688
Dowd, Kieran P.; Harrington, Deirdre M.; Donnelly, Alan E.
2012-01-01
Background The activPAL has been identified as an accurate and reliable measure of sedentary behaviour. However, only limited information is available on the accuracy of the activPAL activity count function as a measure of physical activity, while no unit calibration of the activPAL has been completed to date. This study aimed to investigate the criterion validity of the activPAL, examine the concurrent validity of the activPAL, and perform and validate a value calibration of the activPAL in an adolescent female population. The performance of the activPAL in estimating posture was also compared with sedentary thresholds used with the ActiGraph accelerometer. Methodology Thirty adolescent females (15 developmental; 15 cross-validation) aged 15–18 years performed 5 activities while wearing the activPAL, ActiGraph GT3X, and the Cosmed K4B2. A random coefficient statistics model examined the relationship between metabolic equivalent (MET) values and activPAL counts. Receiver operating characteristic analysis was used to determine activity thresholds and for cross-validation. The random coefficient statistics model showed a concordance correlation coefficient of 0.93 (standard error of the estimate = 1.13). An optimal moderate threshold of 2997 was determined using mixed regression, while an optimal vigorous threshold of 8229 was determined using receiver operating characteristic analysis. The activPAL count function demonstrated very high concurrent validity (r = 0.96, p<0.01) with the ActiGraph count function. Levels of agreement for sitting, standing, and stepping between direct observation and the activPAL and ActiGraph were 100%, 98.1%, 99.2% and 100%, 0%, 100%, respectively. Conclusions These findings suggest that the activPAL is a valid, objective measurement tool that can be used for both the measurement of physical activity and sedentary behaviours in an adolescent female population. PMID:23094069
Vanniyasingam, Thuva; Daly, Caitlin; Jin, Xuejing; Zhang, Yuan; Foster, Gary; Cunningham, Charles; Thabane, Lehana
2018-06-01
This study reviews simulation studies of discrete choice experiments (DCEs) to determine (i) how survey design features affect statistical efficiency, and (ii) to appraise their reporting quality. Statistical efficiency was measured using relative design (D-) efficiency, D-optimality, or D-error. For this systematic survey, we searched Journal Storage (JSTOR), Science Direct, PubMed, and OVID, which included a search within EMBASE. Searches were conducted up to 2016 for simulation studies investigating the impact of DCE design features on statistical efficiency. Studies were screened and data were extracted independently and in duplicate. Results for each included study were summarized by design characteristic. Previously developed criteria for reporting quality of simulation studies were also adapted and applied to each included study. Of 371 potentially relevant studies, 9 were found to be eligible, with several varying in study objectives. Statistical efficiency improved when increasing the number of choice tasks or alternatives; decreasing the number of attributes and attribute levels; using an unrestricted continuous "manipulator" attribute; using model-based approaches with covariates incorporating response behaviour; using sampling approaches that incorporate previous knowledge of response behaviour; incorporating heterogeneity in a model-based design; correctly specifying Bayesian priors; minimizing parameter prior variances; and using an appropriate method to create the DCE design for the research question. The simulation studies performed well in terms of reporting quality. Improvement is needed with regard to clearly specifying study objectives, number of failures, random number generators, starting seeds, and the software used. These results identify the best approaches to structure a DCE. An investigator can manipulate design characteristics to help reduce response burden and increase statistical efficiency. 
Since studies varied in their objectives, conclusions were made on several design characteristics; however, the validity of each conclusion was limited. Further research should be conducted to explore all conclusions in various design settings and scenarios. Additional reviews to explore other statistical efficiency outcomes and databases can also be performed to enhance the conclusions identified from this review.
Validation of Statistical Sampling Algorithms in Visual Sample Plan (VSP): Summary Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nuffer, Lisa L; Sego, Landon H.; Wilson, John E.
2009-02-18
The U.S. Department of Homeland Security, Office of Technology Development (OTD) contracted with a set of U.S. Department of Energy national laboratories, including the Pacific Northwest National Laboratory (PNNL), to write a Remediation Guidance for Major Airports After a Chemical Attack. The report identifies key activities and issues that should be considered by a typical major airport following an incident involving release of a toxic chemical agent. Four experimental tasks were identified that would require further research in order to supplement the Remediation Guidance. One of the tasks, Task 4, OTD Chemical Remediation Statistical Sampling Design Validation, dealt with statistical sampling algorithm validation. This report documents the results of the sampling design validation conducted for Task 4. In 2005, the Government Accountability Office (GAO) performed a review of the past U.S. responses to Anthrax terrorist cases. Part of the motivation for this PNNL report was a major GAO finding that there was a lack of validated sampling strategies in the U.S. response to Anthrax cases. The report (GAO 2005) recommended that probability-based methods be used for sampling design in order to address confidence in the results, particularly when all sample results showed no remaining contamination. The GAO also expressed a desire that the methods be validated, which is the main purpose of this PNNL report. The objective of this study was to validate probability-based statistical sampling designs and the algorithms pertinent to within-building sampling that allow the user to prescribe or evaluate confidence levels of conclusions based on data collected as guided by the statistical sampling designs. Specifically, the designs found in the Visual Sample Plan (VSP) software were evaluated. VSP was used to calculate the number of samples and the sample location for a variety of sampling plans applied to an actual release site. 
Most of the sampling designs validated are probability based, meaning samples are located randomly (or on a randomly placed grid) so no bias enters into the placement of samples, and the number of samples is calculated such that IF the amount and spatial extent of contamination exceeds levels of concern, at least one of the samples would be taken from a contaminated area, at least X% of the time. Hence, "validation" of the statistical sampling algorithms is defined herein to mean ensuring that the "X%" (confidence) is actually met.
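The "at least one sample lands in a contaminated area, at least X% of the time" criterion corresponds to a standard hotspot sample-size calculation. A sketch of the simple-random-sampling case only, not VSP's actual algorithms (which also cover gridded and compliance designs): if a fraction p of the area is contaminated, P(at least one hit) = 1 − (1 − p)^n.

```python
import math

def samples_needed(hot_fraction, confidence):
    """Smallest n such that, under simple random sampling, the
    probability that at least one sample falls in the contaminated
    fraction of the area is >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - hot_fraction))

# e.g. 2% of the floor area contaminated, 95% confidence required
n = samples_needed(0.02, 0.95)
```

Validating such a design then amounts to checking, by simulation or analytically, that the achieved hit probability really meets the nominal confidence.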
Parametric vs. non-parametric statistics of low resolution electromagnetic tomography (LORETA).
Thatcher, R W; North, D; Biver, C
2005-01-01
This study compared the relative statistical sensitivity of non-parametric and parametric statistics of 3-dimensional current sources as estimated by the EEG inverse solution Low Resolution Electromagnetic Tomography (LORETA). One would expect approximately 5% false positives (classification of a normal as abnormal) at the P < .025 level of probability (two tailed test) and approximately 1% false positives at the P < .005 level. EEG digital samples (2-second intervals sampled at 128 Hz, 1 to 2 minutes eyes closed) from 43 normal adult subjects were imported into the Key Institute's LORETA program. We then used the Key Institute's cross-spectrum and the Key Institute's LORETA output files (*.lor) as the 2,394 gray matter pixel representation of 3-dimensional currents at different frequencies. The mean and standard deviation *.lor files were computed for each of the 2,394 gray matter pixels for each of the 43 subjects. Tests of Gaussianity and different transforms were computed in order to best approximate a normal distribution for each frequency and gray matter pixel. The relative sensitivity of parametric vs. non-parametric statistics was compared using a "leave-one-out" cross validation method in which individual normal subjects were withdrawn and then statistically classified as being either normal or abnormal based on the remaining subjects. Log10 transforms approximated Gaussian distribution in the range of 95% to 99% accuracy. Parametric Z score tests at P < .05 cross-validation demonstrated an average misclassification rate of approximately 4.25%, and range over the 2,394 gray matter pixels was 27.66% to 0.11%. At P < .01 parametric Z score cross-validation false positives were 0.26% and ranged from 6.65% to 0% false positives. The non-parametric Key Institute's t-max statistic at P < .05 had an average misclassification error rate of 7.64% and ranged from 43.37% to 0.04% false positives. 
The non-parametric t-max at P < .01 had an average misclassification rate of 6.67% and ranged from 41.34% to 0% false positives of the 2,394 gray matter pixels for any cross-validated normal subject. In conclusion, adequate approximation to Gaussian distribution and high cross-validation accuracy can be achieved by the Key Institute's LORETA programs by using a log10 transform and parametric statistics, and parametric normative comparisons had lower false positive rates than the non-parametric tests.
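The leave-one-out z-score classification described above can be sketched for a single "pixel". The threshold and data below are illustrative only; the actual analysis first applies a log10 transform per pixel and frequency, and repeats this over all 2,394 gray matter pixels:

```python
import statistics

def loo_zscore_flags(values, z_crit=1.96):
    """Leave-one-out z-score classification: each subject's value is
    z-scored against the mean/SD of the remaining subjects, and
    flagged 'abnormal' when |z| exceeds z_crit."""
    flags = []
    for i, v in enumerate(values):
        rest = values[:i] + values[i + 1:]
        m = statistics.mean(rest)
        s = statistics.stdev(rest)
        flags.append(abs(v - m) / s > z_crit)
    return flags

# Five typical subjects plus one clear outlier (hypothetical values)
flags = loo_zscore_flags([1.0, 1.1, 0.9, 1.05, 0.95, 5.0])
```

Counting flagged subjects across pixels gives the false-positive rates the study compares between parametric and non-parametric tests.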
Sabel, Michael S.; Rice, John D.; Griffith, Kent A.; Lowe, Lori; Wong, Sandra L.; Chang, Alfred E.; Johnson, Timothy M.; Taylor, Jeremy M.G.
2013-01-01
Introduction To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid SLN biopsy (SLNB). Several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests and support vector machines. We sought to validate recently published models meant to predict sentinel node status. Methods We queried our comprehensive, prospectively collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon 4 published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false negative rate (FNR). Results Logistic regression performed comparably with our data when considering NPV (89.4% vs. 93.6%); however, the model’s specificity was not high enough to significantly reduce the rate of biopsies (SLN reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and biopsy-reduction rates that were lower (87.7% vs. 94.1% and 29.8% vs. 14.3%, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software. Conclusions Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset, or could not be validated. Differences in selection criteria and histopathologic interpretation likely resulted in underperformance. Development of statistical predictive models must be created in a clinically applicable manner to allow for both validation and ultimately clinical utility. PMID:21822550
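The metrics compared above follow directly from confusion-matrix counts. A sketch with hypothetical counts; note that FNR definitions vary across papers, and here it is taken as FN/(FN+TP), i.e. the fraction of node-positive patients the model would have spared a biopsy:

```python
def sln_model_metrics(tn, fp, fn, tp):
    """Negative predictive value (NPV), rate of negative predictions
    (RNP), and false negative rate (FNR, here FN/(FN+TP)) from
    confusion-matrix counts for a 'skip the biopsy' prediction."""
    total = tn + fp + fn + tp
    negatives = tn + fn          # patients the model predicts node-negative
    npv = tn / negatives
    rnp = negatives / total      # fraction of biopsies potentially avoided
    fnr = fn / (fn + tp)
    return npv, rnp, fnr

# Hypothetical cohort of 1,000 patients
npv, rnp, fnr = sln_model_metrics(tn=850, fp=50, fn=10, tp=90)
```

The trade-off in the abstract is visible here: raising RNP (more avoided biopsies) tends to pull NPV down and FNR up.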
Harrison, Jay M; Breeze, Matthew L; Harrigan, George G
2011-08-01
Statistical comparisons of compositional data generated on genetically modified (GM) crops and their near-isogenic conventional (non-GM) counterparts typically rely on classical significance testing. This manuscript presents an introduction to Bayesian methods for compositional analysis along with recommendations for model validation. The approach is illustrated using protein and fat data from two herbicide tolerant GM soybeans (MON87708 and MON87708×MON89788) and a conventional comparator grown in the US in 2008 and 2009. Guidelines recommended by the US Food and Drug Administration (FDA) in conducting Bayesian analyses of clinical studies on medical devices were followed. This study is the first Bayesian approach to GM and non-GM compositional comparisons. The evaluation presented here supports a conclusion that a Bayesian approach to analyzing compositional data can provide meaningful and interpretable results. We further describe the importance of method validation and approaches to model checking if Bayesian approaches to compositional data analysis are to be considered viable by scientists involved in GM research and regulation. Copyright © 2011 Elsevier Inc. All rights reserved.
An Empirical Taxonomy of Hospital Governing Board Roles
Lee, Shoou-Yih D; Alexander, Jeffrey A; Wang, Virginia; Margolin, Frances S; Combes, John R
2008-01-01
Objective To develop a taxonomy of governing board roles in U.S. hospitals. Data Sources 2005 AHA Hospital Governance Survey, 2004 AHA Annual Survey of Hospitals, and Area Resource File. Study Design A governing board taxonomy was developed using cluster analysis. Results were validated and reviewed by industry experts. Differences in hospital and environmental characteristics across clusters were examined. Data Extraction Methods One thousand three hundred thirty-four hospitals with complete information on the study variables were included in the analysis. Principal Findings Five distinct clusters of hospital governing boards were identified. Statistical tests showed that the five clusters had high internal reliability and high internal validity. Statistically significant differences in hospital and environmental conditions were found among clusters. Conclusions The developed taxonomy provides policy makers, health care executives, and researchers a useful way to describe and understand hospital governing board roles. The taxonomy may also facilitate valid and systematic assessment of governance performance. Further, the taxonomy could be used as a framework for governing boards themselves to identify areas for improvement and direction for change. PMID:18355260
van Walraven, Carl; Jackson, Timothy D; Daneman, Nick
2016-04-01
OBJECTIVE Surgical site infections (SSIs) are common hospital-acquired infections. Tracking SSIs is important to monitor their incidence, and this process requires primary data collection. In this study, we derived and validated a method using health administrative data to predict the probability that a person who had surgery would develop an SSI within 30 days. METHODS All patients enrolled in the National Surgical Quality Improvement Program (NSQIP) from 2 sites were linked to population-based administrative datasets in Ontario, Canada. We derived a multivariate model, stratified by surgical specialty, to determine the independent association of SSI status with patient and hospitalization covariates as well as physician claim codes. This SSI risk model was validated in 2 cohorts. RESULTS The derivation cohort included 5,359 patients with a 30-day SSI incidence of 6.0% (n=118). The SSI risk model predicted the probability that a person had an SSI based on 7 covariates: index hospitalization diagnostic score; physician claims score; emergency visit diagnostic score; operation duration; surgical service; and potential SSI codes. More than 90% of patients had predicted SSI risks lower than 10%. In the derivation group, model discrimination and calibration was excellent (C statistic, 0.912; Hosmer-Lemeshow [H-L] statistic, P=.47). In the 2 validation groups, performance decreased slightly (C statistics, 0.853 and 0.812; H-L statistics, 26.4 [P=.0009] and 8.0 [P=.42]), but low-risk patients were accurately identified. CONCLUSION Health administrative data can effectively identify postoperative patients with a very low risk of surgical site infection within 30 days of their procedure. Records of higher-risk patients can be reviewed to confirm SSI status.
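The C statistic reported for the SSI model is the concordance probability: the chance that a randomly chosen patient who developed an SSI received a higher predicted risk than one who did not. A minimal sketch with hypothetical risk scores (an O(n²) pairwise count; real analyses use rank-based implementations):

```python
def c_statistic(scores, outcomes):
    """Concordance (c) statistic for binary outcomes: fraction of
    event/non-event pairs ranked correctly, with ties counted 0.5."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted SSI risks and observed outcomes
c = c_statistic([0.20, 0.05, 0.15, 0.02, 0.30], [1, 0, 0, 0, 1])
```

A value of 0.5 means no discrimination and 1.0 perfect discrimination; the 0.912 and 0.853/0.812 values above sit at the "excellent" end of that scale.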
Valid statistical inference methods for a case-control study with missing data.
Tian, Guo-Liang; Zhang, Chi; Jiang, Xuejun
2018-04-01
The main objective of this paper is to derive the valid sampling distribution of the observed counts in a case-control study with missing data under the assumption of missing at random by employing the conditional sampling method and the mechanism augmentation method. The proposed sampling distribution, called the case-control sampling distribution, can be used to calculate the standard errors of the maximum likelihood estimates of parameters via the Fisher information matrix and to generate independent samples for constructing small-sample bootstrap confidence intervals. Theoretical comparisons of the new case-control sampling distribution with two existing sampling distributions exhibit a large difference. Simulations are conducted to investigate the influence of the three different sampling distributions on statistical inferences. One finding is that the conclusion by the Wald test for testing independence under the two existing sampling distributions could be completely different (even contradictory) from the Wald test for testing the equality of the success probabilities in control/case groups under the proposed distribution. A real cervical cancer data set is used to illustrate the proposed statistical methods.
The construction and assessment of a statistical model for the prediction of protein assay data.
Pittman, J; Sacks, J; Young, S Stanley
2002-01-01
The focus of this work is the development of a statistical model for a bioinformatics database whose distinctive structure makes model assessment an interesting and challenging problem. The key components of the statistical methodology, including a fast approximation to the singular value decomposition and the use of adaptive spline modeling and tree-based methods, are described, and preliminary results are presented. These results are shown to compare favorably to selected results achieved using comparative methods. An attempt to determine the predictive ability of the model through the use of cross-validation experiments is discussed. In conclusion, a synopsis of the results of these experiments and their implications for the analysis of bioinformatic databases in general is presented.
The Serbian version of the Juvenile Arthritis Multidimensional Assessment Report (JAMAR).
Susic, Gordana; Vojinovic, Jelena; Vijatov-Djuric, Gordana; Stevanovic, Dejan; Lazarevic, Dragana; Djurovic, Nada; Novakovic, Dusica; Consolaro, Alessandro; Bovis, Francesca; Ruperto, Nicolino
2018-04-01
The Juvenile Arthritis Multidimensional Assessment Report (JAMAR) is a new parent/patient-reported outcome measure that enables a thorough assessment of the disease status in children with juvenile idiopathic arthritis (JIA). We report the results of the cross-cultural adaptation and validation of the parent and patient versions of the JAMAR in the Serbian language. The reading comprehension of the questionnaire was tested in 10 JIA parents and patients. Each participating centre was asked to collect demographic, clinical data and the JAMAR in 100 consecutive JIA patients or all consecutive patients seen in a 6-month period and to administer the JAMAR to 100 healthy children and their parents. The statistical validation phase explored descriptive statistics and the psychometric issues of the JAMAR: the three Likert assumptions, floor/ceiling effects, internal consistency, Cronbach's alpha, interscale correlations, test-retest reliability, and construct validity (convergent and discriminant validity). A total of 248 JIA patients (5.2% systemic, 44.3% oligoarticular, 23.8% RF-negative polyarthritis, 26.7% other categories) and 100 healthy children were enrolled in three centres. The JAMAR components discriminated healthy subjects from JIA patients. All JAMAR components revealed good psychometric performances. In conclusion, the Serbian version of the JAMAR is a valid tool for the assessment of children with JIA and is suitable for use both in routine clinical practice and clinical research.
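Cronbach's alpha, used in the JAMAR validations above as the internal-consistency measure, can be computed from item and total-score variances. A sketch with hypothetical item scores, using sample variances throughout:

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for internal consistency. `items` is a list of
    per-item score lists, each ordered by the same respondents."""
    k = len(items)
    item_var_sum = sum(statistics.variance(scores) for scores in items)
    totals = [sum(resp) for resp in zip(*items)]  # total score per respondent
    total_var = statistics.variance(totals)
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Two hypothetical, perfectly correlated items across four respondents
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 4, 6, 8]])
```

Alpha rises toward 1 as items covary more strongly; values around 0.7-0.9 are conventionally read as "good" internal consistency for scales like the JAMAR components.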
The German version of the Juvenile Arthritis Multidimensional Assessment Report (JAMAR).
Holzinger, Dirk; Foell, Dirk; Horneff, Gerd; Foeldvari, Ivan; Tzaribachev, Nikolay; Tzaribachev, Catrin; Minden, Kirsten; Kallinich, Tilmann; Ganser, Gerd; Clara, Lucia; Haas, Johannes-Peter; Hügle, Boris; Huppertz, Hans-Iko; Weller, Frank; Consolaro, Alessandro; Bovis, Francesca; Ruperto, Nicolino
2018-04-01
The Juvenile Arthritis Multidimensional Assessment Report (JAMAR) is a new parent/patient-reported outcome measure that enables a thorough assessment of the disease status in children with juvenile idiopathic arthritis (JIA). We report the results of the cross-cultural adaptation and validation of the parent and patient versions of the JAMAR in the German language. The reading comprehension of the questionnaire was tested in 10 JIA parents and patients. The participating centres were asked to collect demographic and clinical data along the JAMAR questionnaire in 100 consecutive JIA patients or all consecutive patients seen in a 6-month period and to administer the JAMAR to 100 healthy children and their parents. The statistical validation phase explored descriptive statistics and the psychometric issues of the JAMAR: the three Likert assumptions, floor/ceiling effects, internal consistency, Cronbach's alpha, interscale correlations, test-retest reliability, and construct validity (convergent and discriminant validity). A total of 319 JIA patients (2.8% systemic, 36.7% oligoarticular, 23.5% RF negative polyarthritis, and 37% other categories) and 100 healthy children were enrolled in eight centres. The JAMAR components discriminated well between healthy subjects and JIA patients. All JAMAR components revealed good psychometric performances. In conclusion, the German version of the JAMAR is a valid tool for the assessment of children with JIA and is suitable for use both in routine clinical practice and in clinical research.
Bredbenner, Todd L.; Eliason, Travis D.; Francis, W. Loren; McFarland, John M.; Merkle, Andrew C.; Nicolella, Daniel P.
2014-01-01
Cervical spinal injuries are a significant concern in all trauma injuries. Recent military conflicts have demonstrated the substantial risk of spinal injury for the modern warfighter. Finite element models used to investigate injury mechanisms often fail to examine the effects of variation in geometry or material properties on mechanical behavior. The goals of this study were to model geometric variation for a set of cervical spines, to extend this model to a parametric finite element model, and, as a first step, to validate the parametric model against experimental data for low-loading conditions. Individual finite element models were created using cervical spine (C3–T1) computed tomography data for five male cadavers. Statistical shape modeling (SSM) was used to generate a parametric finite element model incorporating variability of spine geometry, and soft-tissue material property variation was also included. The probabilistic loading response of the parametric model was determined under flexion-extension, axial rotation, and lateral bending and validated by comparison to experimental data. Based on qualitative and quantitative comparison of the experimental loading response and model simulations, we suggest that the model performs adequately under relatively low-level loading conditions in multiple loading directions. In conclusion, SSM methods coupled with finite element analyses within a probabilistic framework, along with the ability to statistically validate the overall model performance, provide innovative and important steps toward describing the differences in vertebral morphology, spinal curvature, and variation in material properties. We suggest that these methods, with additional investigation and validation under injurious loading conditions, will lead to understanding and mitigating the risks of injury in the spine and other musculoskeletal structures. PMID:25506051
Statistical Validation of Image Segmentation Quality Based on a Spatial Overlap Index
Zou, Kelly H.; Warfield, Simon K.; Bharatha, Aditya; Tempany, Clare M.C.; Kaus, Michael R.; Haker, Steven J.; Wells, William M.; Jolesz, Ferenc A.; Kikinis, Ron
2005-01-01
Rationale and Objectives To examine a statistical validation method based on the spatial overlap between two sets of segmentations of the same anatomy. Materials and Methods The Dice similarity coefficient (DSC) was used as a statistical validation metric to evaluate the performance of both the reproducibility of manual segmentations and the spatial overlap accuracy of automated probabilistic fractional segmentation of MR images, illustrated on two clinical examples. Example 1: 10 consecutive cases of prostate brachytherapy patients underwent both preoperative 1.5T and intraoperative 0.5T MR imaging. For each case, 5 repeated manual segmentations of the prostate peripheral zone were performed separately on preoperative and on intraoperative images. Example 2: A semi-automated probabilistic fractional segmentation algorithm was applied to MR imaging of 9 cases with 3 types of brain tumors. DSC values were computed, and their logit-transformed means were compared using analysis of variance (ANOVA). Results Example 1: The mean DSCs of 0.883 (range, 0.876–0.893) with 1.5T preoperative MRI and 0.838 (range, 0.819–0.852) with 0.5T intraoperative MRI (P < .001) were within and at the margin of the range of good reproducibility, respectively. Example 2: Wide ranges of DSC were observed in brain tumor segmentations: meningiomas (0.519–0.893), astrocytomas (0.487–0.972), and other mixed gliomas (0.490–0.899). Conclusion The DSC value is a simple and useful summary measure of spatial overlap, which can be applied to studies of reproducibility and accuracy in image segmentation. We observed generally satisfactory but variable validation results in two clinical applications. This metric may be adapted for similar validation tasks. PMID:14974593
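The DSC used above has a simple closed form, DSC = 2|A∩B| / (|A| + |B|). A minimal sketch in Python, with segmentations represented as sets of voxel coordinates (the two small masks are invented for illustration, not data from the study):

```python
from math import log

def dice(a, b):
    """Dice similarity coefficient between two voxel sets: 2|A∩B| / (|A|+|B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty segmentations agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))

def logit(p):
    """Logit transform applied to DSC values before a means comparison."""
    return log(p / (1 - p))

# Two hypothetical manual segmentations sharing 3 of their 4 voxels each
seg1 = {(0, 0), (0, 1), (1, 0), (1, 1)}
seg2 = {(0, 1), (1, 0), (1, 1), (2, 1)}
d = dice(seg1, seg2)  # 2*3 / (4+4) = 0.75
```

DSC ranges from 0 (no overlap) to 1 (identical masks); the logit transform stretches the scale near those bounds before averaging and ANOVA.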
Muller, David C; Johansson, Mattias; Brennan, Paul
2017-03-10
Purpose Several lung cancer risk prediction models have been developed, but none to date have assessed the predictive ability of lung function in a population-based cohort. We sought to develop and internally validate a model incorporating lung function using data from the UK Biobank prospective cohort study. Methods This analysis included 502,321 participants without a previous diagnosis of lung cancer, predominantly between 40 and 70 years of age. We used flexible parametric survival models to estimate the 2-year probability of lung cancer, accounting for the competing risk of death. Models included predictors previously shown to be associated with lung cancer risk, including sex, variables related to smoking history and nicotine addiction, medical history, family history of lung cancer, and lung function (forced expiratory volume in 1 second [FEV1]). Results During accumulated follow-up of 1,469,518 person-years, there were 738 lung cancer diagnoses. A model incorporating all predictors had excellent discrimination (concordance (c)-statistic [95% CI] = 0.85 [0.82 to 0.87]). Internal validation suggested that the model will discriminate well when applied to new data (optimism-corrected c-statistic = 0.84). The full model, including FEV1, also had modestly better discrimination than one based solely on questionnaire variables (c-statistic = 0.84 [0.82 to 0.86]; optimism-corrected c-statistic = 0.83; P for FEV1 = 3.4 × 10^-13). The full model had better discrimination than standard lung cancer screening eligibility criteria (c-statistic = 0.66 [0.64 to 0.69]). Conclusion A risk prediction model that includes lung function has strong predictive ability, which could improve eligibility criteria for lung cancer screening programs.
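The c-statistic reported above measures how often a model ranks an eventual case above a non-case. A minimal sketch of the underlying computation, using invented toy risk scores rather than the UK Biobank data:

```python
def c_statistic(scores, events):
    """Probability that a randomly chosen case receives a higher
    predicted risk than a randomly chosen non-case (ties count 0.5)."""
    cases = [s for s, e in zip(scores, events) if e]
    controls = [s for s, e in zip(scores, events) if not e]
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

# 0.5 = chance-level discrimination, 1.0 = perfect ranking
c = c_statistic([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])  # 3 of 4 pairs concordant
```

In survival settings the comparison is restricted to pairs whose ordering is observable, but the ranking principle is the same.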
2014-01-01
Background Thresholds for statistical significance are insufficiently demonstrated by 95% confidence intervals or P-values when assessing results from randomised clinical trials. First, a P-value only shows the probability of getting a result assuming that the null hypothesis is true, and does not reflect the probability of getting a result assuming an alternative hypothesis to the null hypothesis is true. Second, a confidence interval or a P-value showing significance may be caused by multiplicity. Third, statistical significance does not necessarily result in clinical significance. Therefore, assessment of intervention effects in randomised clinical trials deserves more rigour in order to become more valid. Methods Several methodologies for assessing the statistical and clinical significance of intervention effects in randomised clinical trials were considered. Balancing simplicity and comprehensiveness, a simple five-step procedure was developed. Results For a more valid assessment of results from a randomised clinical trial we propose the following five steps: (1) report the confidence intervals and the exact P-values; (2) report the Bayes factor for the primary outcome, i.e. the ratio of the probability that a given trial result is compatible with a ‘null’ effect (corresponding to the P-value) to the probability that the trial result is compatible with the intervention effect hypothesised in the sample size calculation; (3) adjust the confidence intervals and the statistical significance threshold if the trial is stopped early or if interim analyses have been conducted; (4) adjust the confidence intervals and the P-values for multiplicity due to the number of outcome comparisons; and (5) assess the clinical significance of the trial results. Conclusions If the proposed five-step procedure is followed, this may increase the validity of assessments of intervention effects in randomised clinical trials. PMID:24588900
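Step 2's Bayes factor compares how well the observed effect fits the null versus the effect assumed in the sample size calculation. Under a normal approximation it reduces to a ratio of two Gaussian densities; a hedged sketch (the effect estimate, standard error, and design effect below are illustrative numbers, not from any trial):

```python
from math import exp, sqrt, pi

def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    return exp(-((x - mu) ** 2) / (2 * sd ** 2)) / (sd * sqrt(2 * pi))

def bayes_factor(estimate, se, design_effect):
    """Likelihood of the observed estimate under the null (zero effect)
    divided by its likelihood under the effect hypothesised in the
    sample size calculation; values well below 1 favour the alternative."""
    return normal_pdf(estimate, 0.0, se) / normal_pdf(estimate, design_effect, se)

# Observed effect 2 SEs from zero, landing exactly on the design effect
bf = bayes_factor(estimate=2.0, se=1.0, design_effect=2.0)  # exp(-2) ~ 0.135
```

A small Bayes factor here says the data sit much closer to the hypothesised effect than to zero, complementing the P-value rather than replacing it.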
Translation and Validation of the Knee Society Score - KSS for Brazilian Portuguese
Silva, Adriana Lucia Pastore e; Demange, Marco Kawamura; Gobbi, Riccardo Gomes; da Silva, Tânia Fernanda Cardoso; Pécora, José Ricardo; Croci, Alberto Tesconi
2012-01-01
Objective To translate, culturally adapt and validate the "Knee Society Score" (KSS) for the Portuguese language and determine its measurement properties, reproducibility and validity. Methods We analyzed 70 patients of both sexes, aged between 55 and 85 years, in a cross-sectional clinical trial, with diagnosis of primary osteoarthritis, undergoing total knee arthroplasty surgery. We assessed the patients with the English version of the KSS questionnaire and, after 30 minutes, with the Portuguese version of the KSS questionnaire, administered by a different evaluator. All the patients were assessed preoperatively, and again at three and six months postoperatively. Results There was no statistical difference, using Cronbach's alpha index and the Bland-Altman graphical analysis, for the knee score during the preoperative period (p = 1), at three months (p = 0.991) or at six months postoperatively (p = 0.985). There was no statistical difference for the knee function score for all three periods (p = 1.0). Conclusion The Brazilian version of the Knee Society Score is easy to apply, as well as being a valid and reliable instrument for measuring the knee score and function of Brazilian patients undergoing TKA. Level of Evidence: Level I - Diagnostic Studies - Investigating a Diagnostic Test - Testing of previously developed diagnostic criteria on consecutive patients (with universally applied 'gold' reference standard). PMID:24453576
Luo, Wen; Medrek, Sarah; Misra, Jatin; Nohynek, Gerhard J
2007-02-01
The objective of this study was to construct and validate a quantitative structure-activity relationship (QSAR) model for skin absorption. Such models are valuable tools for screening and prioritization in safety and efficacy evaluation, and risk assessment of drugs and chemicals. A database of 340 chemicals with percutaneous absorption data was assembled. Two models were derived from the training set consisting of 306 chemicals (90/10 random split). In addition to the experimental K(ow) values, over 300 2D and 3D atomic and molecular descriptors were analyzed using MDL's QsarIS computer program. Subsequently, the models were validated using both internal (leave-one-out) and external (test set) validation procedures. Using stepwise regression analysis, three molecular descriptors were determined to have significant statistical correlation with K(p) (R2 = 0.8225): logK(ow), X0 (quantification of both molecular size and the degree of skeletal branching), and SsssCH (count of aromatic carbon groups). In conclusion, two models to estimate skin absorption were developed. When compared to other skin absorption QSAR models in the literature, our model incorporated more chemicals and explored a large number of descriptors. Additionally, our models are reasonably predictive and have met both internal and external statistical validation.
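The internal (leave-one-out) validation mentioned above refits the model with each compound held out in turn and scores the prediction for the held-out point. A minimal sketch for a one-descriptor linear model; the data are invented, not the 340-chemical database, and the single-predictor form is a simplification of the three-descriptor model:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def loo_press(xs, ys):
    """Sum of squared leave-one-out prediction errors (PRESS):
    each point is predicted from a model fitted without it."""
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    return press

# A perfectly linear descriptor/response pair gives PRESS = 0
press = loo_press([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

A low PRESS relative to the total sum of squares indicates the fitted relationship is not an artifact of any single training compound.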
Zaki, Rafdzah; Bulgiba, Awang; Ismail, Roshidi; Ismail, Noor Azina
2012-01-01
Accurate values are a must in medicine. An important parameter in determining the quality of a medical instrument is agreement with a gold standard. Various statistical methods have been used to test for agreement. Some of these methods have been shown to be inappropriate. This can result in misleading conclusions about the validity of an instrument. The Bland-Altman method is the most popular method judging by the many citations of the article proposing this method. However, the number of citations does not necessarily mean that this method has been applied in agreement research. No previous study has been conducted to look into this. This is the first systematic review to identify statistical methods used to test for agreement of medical instruments. The proportion of various statistical methods found in this review will also reflect the proportion of medical instruments that have been validated using those particular methods in current clinical practice. Five electronic databases were searched between 2007 and 2009 to look for agreement studies. A total of 3,260 titles were initially identified. Only 412 titles were potentially related, and finally 210 fitted the inclusion criteria. The Bland-Altman method is the most popular method with 178 (85%) studies having used this method, followed by the correlation coefficient (27%) and means comparison (18%). Some of the inappropriate methods highlighted by Altman and Bland since the 1980s are still in use. This study finds that the Bland-Altman method is the most popular method used in agreement research. There are still inappropriate applications of statistical methods in some studies. It is important for a clinician or medical researcher to be aware of this issue because misleading conclusions from inappropriate analyses will jeopardize the quality of the evidence, which in turn will influence quality of care given to patients in the future.
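The Bland-Altman method that dominates the review above summarises agreement as the mean difference (bias) between two instruments plus 95% limits of agreement. A minimal sketch; the paired readings are invented for illustration:

```python
from statistics import mean, stdev

def bland_altman(x, y):
    """Bias (mean difference) and 95% limits of agreement
    between paired measurements from two instruments."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical paired readings: test instrument vs gold standard
bias, lo, hi = bland_altman([100.0, 102.0, 98.0, 101.0],
                            [99.0, 103.0, 97.0, 100.0])
```

Unlike a correlation coefficient, which only measures linear association, the limits of agreement show how far the two instruments can be expected to disagree on an individual measurement.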
NASA Astrophysics Data System (ADS)
Bakker, Arthur; Ben-Zvi, Dani; Makar, Katie
2017-12-01
To understand how statistical and other types of reasoning are coordinated with actions to reduce uncertainty, we conducted a case study in vocational education that involved statistical hypothesis testing. We analyzed an intern's research project in a hospital laboratory in which reducing uncertainties was crucial to make a valid statistical inference. In his project, the intern, Sam, investigated whether patients' blood could be sent through pneumatic post without influencing the measurement of particular blood components. We asked, in the process of making a statistical inference, how are reasons and actions coordinated to reduce uncertainty? For the analysis, we used the semantic theory of inferentialism, specifically, the concept of webs of reasons and actions—complexes of interconnected reasons for facts and actions; these reasons include premises and conclusions, inferential relations, implications, motives for action, and utility of tools for specific purposes in a particular context. Analysis of interviews with Sam, his supervisor and teacher as well as video data of Sam in the classroom showed that many of Sam's actions aimed to reduce variability, rule out errors, and thus reduce uncertainties so as to arrive at a valid inference. Interestingly, the decisive factor was not the outcome of a t test but of the reference change value, a clinical chemical measure of analytic and biological variability. With insights from this case study, we expect that students can be better supported in connecting statistics with context and in dealing with uncertainty.
Miao, Hui; Hartman, Mikael; Bhoo-Pathy, Nirmala; Lee, Soo-Chin; Taib, Nur Aishah; Tan, Ern-Yu; Chan, Patrick; Moons, Karel G. M.; Wong, Hoong-Seam; Goh, Jeremy; Rahim, Siti Mastura; Yip, Cheng-Har; Verkooijen, Helena M.
2014-01-01
Background In Asia, up to 25% of breast cancer patients present with distant metastases at diagnosis. Given the heterogeneous survival probabilities of de novo metastatic breast cancer, individual outcome prediction is challenging. The aim of the study is to identify existing prognostic models for patients with de novo metastatic breast cancer and validate them in Asia. Materials and Methods We performed a systematic review to identify prediction models for metastatic breast cancer. Models were validated in 642 women with de novo metastatic breast cancer registered between 2000 and 2010 in the Singapore Malaysia Hospital Based Breast Cancer Registry. Survival curves for low, intermediate and high-risk groups according to each prognostic score were compared by log-rank test and discrimination of the models was assessed by concordance statistic (C-statistic). Results We identified 16 prediction models, seven of which were for patients with brain metastases only. Performance status, estrogen receptor status, metastatic site(s) and disease-free interval were the most common predictors. We were able to validate nine prediction models. The capacity of the models to discriminate between poor and good survivors varied from poor to fair with C-statistics ranging from 0.50 (95% CI, 0.48–0.53) to 0.63 (95% CI, 0.60–0.66). Conclusion The discriminatory performance of existing prediction models for de novo metastatic breast cancer in Asia is modest. Development of an Asian-specific prediction model is needed to improve prognostication and guide decision making. PMID:24695692
Statistical Validation for Clinical Measures: Repeatability and Agreement of Kinect™-Based Software
Tello, Emanuel; Rodrigo, Alejandro; Valentinuzzi, Max E.
2018-01-01
Background The rehabilitation process is a fundamental stage in the recovery of people's capabilities. However, the evaluation of the process is performed by physiatrists and medical doctors, mostly based on their observations, that is, a subjective appreciation of the patient's evolution. This paper proposes a platform for tracking the movement of an individual's upper limb using Kinect sensor(s), to be applied to the patient during the rehabilitation process. The main contribution is the development of quantifying software and the statistical validation of its performance, repeatability, and clinical use in the rehabilitation process. Methods The software determines joint angles and upper limb trajectories for the construction of a specific rehabilitation protocol and quantifies the treatment evolution. In turn, the information is presented via a graphical interface that allows the recording, storage, and reporting of the patient's data. For clinical purposes, the software information is statistically validated with three different methodologies, comparing the measures with a goniometer in terms of agreement and repeatability. Results The agreement of joint angles measured with the proposed software and the goniometer is evaluated with Bland-Altman plots; all measurements fell well within the limits of agreement, meaning interchangeability of both techniques. Additionally, the results of Bland-Altman analysis of repeatability show 95% confidence. Finally, the physiotherapists' qualitative assessment shows encouraging results for clinical use. Conclusion The main conclusion is that the software is capable of offering a clinical history of the patient and is useful for quantification of rehabilitation success. The simplicity, low cost, and visualization possibilities enhance the use of the Kinect-based software for rehabilitation and other applications, and the experts' opinion endorses the choice of our approach for clinical practice.
Comparison of the new measurement technique with established goniometric methods determines that the proposed software agrees sufficiently to be used interchangeably. PMID:29750166
Bryant, Fred B
2016-12-01
This paper introduces a special section of the current issue of the Journal of Evaluation in Clinical Practice that includes a set of 6 empirical articles showcasing a versatile, new machine-learning statistical method, known as optimal data (or discriminant) analysis (ODA), specifically designed to produce statistical models that maximize predictive accuracy. As this set of papers clearly illustrates, ODA offers numerous important advantages over traditional statistical methods-advantages that enhance the validity and reproducibility of statistical conclusions in empirical research. This issue of the journal also includes a review of a recently published book that provides a comprehensive introduction to the logic, theory, and application of ODA in empirical research. It is argued that researchers have much to gain by using ODA to analyze their data. © 2016 John Wiley & Sons, Ltd.
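ODA's defining feature, as noted above, is choosing the decision rule that maximizes classification accuracy directly rather than optimizing a likelihood. For a single continuous attribute this reduces to an exhaustive cutpoint search; the toy sketch below illustrates that idea only and is not the authors' ODA software:

```python
def optimal_cutpoint(values, labels):
    """Exhaustive search for the threshold on one attribute that
    maximizes overall classification accuracy (univariate ODA idea)."""
    n = len(values)
    levels = sorted(set(values))
    best_cut, best_acc = None, 0.0
    # Candidate cuts lie midway between adjacent observed values
    for low, high in zip(levels, levels[1:]):
        cut = (low + high) / 2
        acc = sum((v > cut) == bool(y) for v, y in zip(values, labels)) / n
        acc = max(acc, 1.0 - acc)  # the rule may point in either direction
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut, best_acc

# Invented attribute values and class labels: cut 2.5 separates them perfectly
cut, acc = optimal_cutpoint([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
```

Because the criterion is accuracy itself, the resulting model's predictive accuracy on the training data is exact by construction, which is the property the ODA literature emphasizes.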
Selecting the most appropriate inferential statistical test for your quantitative research study.
Bettany-Saltikov, Josette; Whittaker, Victoria Jane
2014-06-01
To discuss the issues and processes relating to the selection of the most appropriate statistical test. A review of the basic research concepts together with a number of clinical scenarios is used to illustrate this. Quantitative nursing research generally features the use of empirical data which necessitates the selection of both descriptive and statistical tests. Different types of research questions can be answered by different types of research designs, which in turn need to be matched to a specific statistical test(s). Discursive paper. This paper discusses the issues relating to the selection of the most appropriate statistical test and makes some recommendations as to how these might be dealt with. When conducting empirical quantitative studies, a number of key issues need to be considered. Considerations for selecting the most appropriate statistical tests are discussed and flow charts provided to facilitate this process. When nursing clinicians and researchers conduct quantitative research studies, it is crucial that the most appropriate statistical test is selected to enable valid conclusions to be made. © 2013 John Wiley & Sons Ltd.
Incorrect likelihood methods were used to infer scaling laws of marine predator search behaviour.
Edwards, Andrew M; Freeman, Mervyn P; Breed, Greg A; Jonsen, Ian D
2012-01-01
Ecologists are collecting extensive data concerning movements of animals in marine ecosystems. Such data need to be analysed with valid statistical methods to yield meaningful conclusions. We demonstrate methodological issues in two recent studies that reached similar conclusions concerning movements of marine animals (Nature 451:1098; Science 332:1551). The first study analysed vertical movement data to conclude that diverse marine predators (Atlantic cod, basking sharks, bigeye tuna, leatherback turtles and Magellanic penguins) exhibited "Lévy-walk-like behaviour", close to a hypothesised optimal foraging strategy. By reproducing the original results for the bigeye tuna data, we show that the likelihood of tested models was calculated from residuals of regression fits (an incorrect method), rather than from the likelihood equations of the actual probability distributions being tested. This resulted in erroneous Akaike Information Criteria, and the testing of models that do not correspond to valid probability distributions. We demonstrate how this led to overwhelming support for a model that has no biological justification and that is statistically spurious because its probability density function goes negative. Re-analysis of the bigeye tuna data, using standard likelihood methods, overturns the original result and conclusion for that data set. The second study observed Lévy walk movement patterns by mussels. We demonstrate several issues concerning the likelihood calculations (including the aforementioned residuals issue). Re-analysis of the data rejects the original Lévy walk conclusion. We consequently question the claimed existence of scaling laws of the search behaviour of marine predators and mussels, since such conclusions were reached using incorrect methods. 
We discourage the suggested potential use of "Lévy-like walks" when modelling consequences of fishing and climate change, and caution that any resulting advice to managers of marine ecosystems would be problematic. For reproducibility and future work we provide R source code for all calculations.
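The core error identified above is computing "likelihoods" from regression residuals instead of from the probability distribution actually being tested. For a correctly specified model, the log-likelihood and AIC follow directly from the fitted density. The sketch below uses an exponential model as a simple stand-in (the paper's candidate models were power laws and related distributions), with invented data:

```python
from math import log

def exponential_aic(data):
    """AIC from the true exponential log-likelihood, with the rate
    fitted by maximum likelihood (rate = 1 / sample mean).
    log f(x) = log(rate) - rate * x, summed over the data."""
    n = len(data)
    rate = n / sum(data)
    loglik = n * log(rate) - rate * sum(data)
    k = 1  # one estimated parameter
    return 2 * k - 2 * loglik

aic = exponential_aic([1.0, 2.0, 3.0])
```

Comparing AIC values computed this way, for each candidate distribution, is the standard likelihood-based model selection that the re-analysis applied; residual-based surrogates can rank models that are not even valid probability distributions.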
Sakunpak, Apirak; Suksaeree, Jirapornchai; Monton, Chaowalit; Pathompak, Pathamaporn; Kraisintu, Krisana
2014-01-01
Objective To develop and validate an image analysis method for quantitative analysis of γ-oryzanol in cold pressed rice bran oil. Methods TLC-densitometric and TLC-image analysis methods were developed, validated, and used for quantitative analysis of γ-oryzanol in cold pressed rice bran oil. The results obtained by these two different quantification methods were compared by paired t-test. Results Both assays provided good linearity, accuracy, reproducibility and selectivity for determination of γ-oryzanol. Conclusions The TLC-densitometric and TLC-image analysis methods provided a similar reproducibility, accuracy and selectivity for the quantitative determination of γ-oryzanol in cold pressed rice bran oil. A statistical comparison of the quantitative determinations of γ-oryzanol in samples did not show any statistically significant difference between TLC-densitometric and TLC-image analysis methods. As both methods were found to be equal, they therefore can be used for the determination of γ-oryzanol in cold pressed rice bran oil. PMID:25182282
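The paired t-test used above compares the two assays on the same samples by testing whether the mean within-sample difference is zero. A minimal sketch of the statistic; the paired measurements are invented, not the rice bran oil data:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic: mean within-pair difference
    divided by its standard error."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Hypothetical readings of the same samples by two methods
t = paired_t([10.0, 12.0, 14.0, 16.0], [9.0, 12.0, 12.0, 15.0])
```

The statistic is then compared against a t distribution with n - 1 degrees of freedom; a non-significant result, as in the study above, supports using the two methods interchangeably.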
Challenges of Big Data Analysis
Fan, Jianqing; Han, Fang; Liu, Han
2014-01-01
Big Data bring new opportunities to modern society and challenges to data scientists. On the one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require a new computational and statistical paradigm. This article gives an overview of the salient features of Big Data and how these features drive a paradigm change in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions. PMID:25419469
Spouge, J L
1992-01-01
Reports on retroviral primate trials rarely publish any statistical analysis. Present statistical methodology lacks appropriate tests for these trials and effectively discourages quantitative assessment. This paper describes the theory behind VACMAN, a user-friendly computer program that calculates statistics for in vitro and in vivo infectivity data. VACMAN's analysis applies to many retroviral trials using i.v. challenges and is valid whenever the viral dose-response curve has a particular shape. Statistics from actual i.v. retroviral trials illustrate some unappreciated principles of effective animal use: dilutions other than 1:10 can improve titration accuracy; infecting titration animals at the lowest doses possible can lower challenge doses; and finally, challenging test animals in small trials with more virus than controls safeguards against false successes, "reuses" animals, and strengthens experimental conclusions. The theory presented also explains the important concept of viral saturation, a phenomenon that may cause in vitro and in vivo titrations to agree for some retroviral strains and disagree for others. PMID:1323844
Oliveira, Lanuza Borges; Soares, Fernanda Amaral; Silveira, Marise Fagundes; de Pinho, Lucinéia; Caldeira, Antônio Prates; Leite, Maísa Tavares de Souza
2016-01-01
Objective: to develop and validate an instrument to evaluate the knowledge of health professionals about domestic violence against children. Method: this was a study conducted with 194 physicians, nurses and dentists. A literature review was performed for preparation of the items and identification of the dimensions. Face and content validity were assessed through analysis by three experts and 27 professors of the pediatric health discipline. For construct validation, Cronbach's alpha was used, and the Kappa test was applied to verify reproducibility. The criterion validation was conducted using Student's t-test. Results: the final instrument included 56 items; Cronbach's alpha was 0.734, the Kappa test showed a correlation greater than 0.6 for most items, and Student's t-test showed a statistically significant value at the 5% level for the two selected variables: years of education and working in the Family Health Strategy. Conclusion: the instrument is valid and can be used as a promising tool to develop or direct actions in public health and evaluate knowledge about domestic violence against children. PMID:27556878
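The Kappa test used above for reproducibility quantifies agreement between two sets of categorical ratings beyond what chance alone would produce. A minimal sketch of Cohen's kappa; the ratings are invented for illustration:

```python
def cohen_kappa(r1, r2):
    """Cohen's kappa between two raters' categorical ratings:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(r1)
    categories = set(r1) | set(r2)
    p_observed = sum(a == b for a, b in zip(r1, r2)) / n
    p_chance = sum((r1.count(c) / n) * (r2.count(c) / n) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Identical ratings give kappa = 1; chance-level agreement gives kappa = 0
k_perfect = cohen_kappa([1, 1, 0, 0], [1, 1, 0, 0])
k_chance = cohen_kappa([1, 0, 1, 0], [1, 1, 0, 0])
```

Values above roughly 0.6, the threshold the study applied per item, are conventionally read as substantial agreement.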
Roumelioti, Maria; Leotsinidis, Michalis
2009-01-01
Background The use of food frequency questionnaires (FFQs) has become increasingly important in epidemiologic studies. During the past few decades, a wide variety of nutritional studies have used the semiquantitative FFQ as a tool for assessing and evaluating dietary intake. One of the main concerns in a dietary analysis is the validity of the collected dietary data. Methods This paper discusses several methodological and statistical issues related to the validation of a semiquantitative FFQ. This questionnaire was used to assess the nutritional habits of schoolchildren in western Greece. For validation purposes, we selected 200 schoolchildren and contacted their respective parents. We evaluated the relative validity of 400 FFQs (200 children's FFQs and 200 parents' FFQs). Results The correlations between the children's and the parents' questionnaire responses showed that the questionnaire we designed was appropriate for fulfilling the purposes of our study and in ranking subjects according to food group intake. Conclusion Our study shows that the semiquantitative FFQ provides a reasonably reliable measure of dietary intake and corroborates the relative validity of our questionnaire. PMID:19196469
Vieira, Gisele de Lacerda Chaves; Pagano, Adriana Silvino; Reis, Ilka Afonso; Rodrigues, Júlia Santos Nunes; Torres, Heloísa de Carvalho
2018-01-01
Objective: to perform the translation, adaptation and validation of the Diabetes Attitudes Scale - third version instrument into Brazilian Portuguese. Methods: methodological study carried out in six stages: initial translation, synthesis of the initial translation, back-translation, evaluation of the translated version by a Committee of Judges (27 linguists and 29 health professionals), pre-test and validation. The pre-test and validation (test-retest) steps included 22 and 120 health professionals, respectively. The Content Validity Index and the analyses of internal consistency and reproducibility were performed using the R statistical program. Results: in the content validation, the instrument presented good acceptance among the Judges, with a mean Content Validity Index of 0.94. The scale presented acceptable internal consistency (Cronbach's alpha = 0.60), while the correlation of the total score between the test and retest moments was considered high (polychoric correlation coefficient = 0.86). The intraclass correlation coefficient for the total score was 0.65. Conclusion: the Brazilian version of the instrument (Escala de Atitudes dos Profissionais em relação ao Diabetes Mellitus) was considered valid and reliable for application by health professionals in Brazil. PMID:29319739
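Cronbach's alpha, the internal-consistency measure reported in this and the preceding validation study, compares the sum of per-item variances with the variance of the total score. A minimal sketch; the item scores are invented:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of respondent scores per questionnaire item.
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]
    return k / (k - 1) * (1 - sum(pvariance(i) for i in items) / pvariance(totals))

# Two perfectly correlated items: every respondent's items move together
alpha = cronbach_alpha([[1, 2, 3], [1, 2, 3]])
```

When items move together, the total-score variance dwarfs the per-item variances and alpha approaches 1; independent items pull alpha toward 0.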
The French version of the Juvenile Arthritis Multidimensional Assessment Report (JAMAR).
Quartier, Pierre; Hofer, Michael; Wouters, Carine; Truong, Thi Thanh Thao; Duong, Ngoc-Phoi; Agbo-Kpati, Kokou-Placide; Uettwiller, Florence; Melki, Isabelle; Mouy, Richard; Bader-Meunier, Brigitte; Consolaro, Alessandro; Bovis, Francesca; Ruperto, Nicolino
2018-04-01
The Juvenile Arthritis Multidimensional Assessment Report (JAMAR) is a new parent/patient-reported outcome measure that enables a thorough assessment of the disease status in children with juvenile idiopathic arthritis (JIA). We report the results of the cross-cultural adaptation and validation of the parent and patient versions of the JAMAR in the French language. The reading comprehension of the questionnaire was tested in 10 JIA parents and patients. Each participating centre was asked to collect demographic and clinical data and the JAMAR in 100 consecutive JIA patients or all consecutive patients seen in a 6-month period, and to administer the JAMAR to 100 healthy children and their parents. The statistical validation phase explored descriptive statistics and the psychometric issues of the JAMAR: the three Likert assumptions, floor/ceiling effects, internal consistency, Cronbach's alpha, interscale correlations and construct validity (convergent and discriminant validity). A total of 100 JIA patients (23% systemic, 45% oligoarticular, 20% RF-negative polyarthritis, 12% other categories) and 122 healthy children were enrolled at the paediatric rheumatology centre of the Necker Children's Hospital in Paris. Notably, none of the enrolled JIA patients was affected by psoriatic arthritis. The JAMAR components discriminated well between healthy subjects and JIA patients. All JAMAR components revealed good psychometric performance. In conclusion, the French version of the JAMAR is a valid tool for the assessment of children with JIA and is suitable for use both in routine clinical practice and clinical research.
The Italian version of the Juvenile Arthritis Multidimensional Assessment Report (JAMAR).
Consolaro, Alessandro; Bovis, Francesca; Pistorio, Angela; Cimaz, Rolando; De Benedetti, Fabrizio; Miniaci, Angela; Corona, Fabrizia; Gerloni, Valeria; Martino, Silvana; Pastore, Serena; Barone, Patrizia; Pieropan, Sara; Cortis, Elisabetta; Podda, Rosa Anna; Gallizzi, Romina; Civino, Adele; Torre, Francesco La; Rigante, Donato; Consolini, Rita; Maggio, Maria Cristina; Magni-Manzoni, Silvia; Perfetti, Francesca; Filocamo, Giovanni; Toppino, Claudia; Licciardi, Francesco; Garrone, Marco; Scala, Silvia; Patrone, Elisa; Tonelli, Monica; Tani, Daniela; Ravelli, Angelo; Martini, Alberto; Ruperto, Nicolino
2018-04-01
The Juvenile Arthritis Multidimensional Assessment Report (JAMAR) is a new parent/patient reported outcome measure that enables a thorough assessment of the disease status in children with juvenile idiopathic arthritis (JIA). We report the results of the cross-cultural adaptation and validation of the parent and patient versions of the JAMAR in the Italian language. The reading comprehension of the questionnaire was tested in 10 JIA parents and patients. Each participating centre was asked to collect demographic and clinical data and the JAMAR in 100 consecutive JIA patients or all consecutive patients seen in a 6-month period, and to administer the JAMAR to 100 healthy children and their parents. The statistical validation phase explored descriptive statistics and the psychometric issues of the JAMAR: the 3 Likert assumptions, floor/ceiling effects, internal consistency, Cronbach's alpha, interscale correlations, test-retest reliability, and construct validity (convergent and discriminant validity). A total of 1296 JIA patients (7.2% systemic, 59.5% oligoarticular, 21.4% RF negative polyarthritis, 11.9% other categories) and 100 healthy children were enrolled in 18 centres. The JAMAR components discriminated well between healthy subjects and JIA patients, except for the Health Related Quality of Life (HRQoL) Psychosocial Health (PsH) subscales. All JAMAR components revealed good psychometric performances. In conclusion, the Italian version of the JAMAR is a valid tool for the assessment of children with JIA and is suitable for use both in routine clinical practice and clinical research.
The Paraguayan Spanish version of the Juvenile Arthritis Multidimensional Assessment Report (JAMAR).
Morel Ayala, Zoilo; Burgos-Vargas, Ruben; Consolaro, Alessandro; Bovis, Francesca; Ruperto, Nicolino
2018-04-01
The Juvenile Arthritis Multidimensional Assessment Report (JAMAR) is a new parent/patient reported outcome measure that enables a thorough assessment of the disease status in children with juvenile idiopathic arthritis (JIA). We report the results of the cross-cultural adaptation and validation of the parent and patient versions of the JAMAR in the Paraguayan Spanish language. The reading comprehension of the questionnaire was tested in 10 JIA parents and patients. Each participating centre was asked to collect demographic and clinical data and the JAMAR in 100 consecutive JIA patients or all consecutive patients seen in a 6-month period, and to administer the JAMAR to 100 healthy children and their parents. The statistical validation phase explored descriptive statistics and the psychometric issues of the JAMAR: the 3 Likert assumptions, floor/ceiling effects, internal consistency, Cronbach's alpha, interscale correlations, and construct validity (convergent and discriminant validity). A total of 51 JIA patients (2% systemic, 27.4% oligoarticular, 37.2% RF negative polyarthritis, 33.4% other categories) and 100 healthy children were enrolled. The JAMAR components discriminated well between healthy subjects and JIA patients. Notably, there was no significant difference between healthy subjects and their affected peers in the school-related problem variable. All JAMAR components revealed good psychometric performances. In conclusion, the Paraguayan Spanish version of the JAMAR is a valid tool for the assessment of children with JIA and is suitable for use both in routine clinical practice and clinical research.
Hanskamp-Sebregts, Mirelle; Zegers, Marieke; Vincent, Charles; van Gurp, Petra J; de Vet, Henrica C W; Wollersheim, Hub
2016-01-01
Objectives Record review is the most widely used method to quantify patient safety. We systematically reviewed the reliability and validity of adverse event detection with record review. Design A systematic review of the literature. Methods We searched PubMed, EMBASE, CINAHL, PsycINFO and the Cochrane Library from their inception through February 2015. We included all studies that aimed to describe the reliability and/or validity of record review. Two reviewers conducted data extraction. We pooled kappa values (κ) and analysed the differences in subgroups according to number of reviewers, reviewer experience and training level, adjusted for the prevalence of adverse events. Results In 25 studies, the psychometric data of the Global Trigger Tool (GTT) and the Harvard Medical Practice Study (HMPS) were reported, and 24 studies were included for statistical pooling. The inter-rater reliability of the GTT and HMPS showed a pooled κ of 0.65 and 0.55, respectively. The inter-rater agreement was statistically significantly higher when the group of reviewers within a study consisted of a maximum of five reviewers. We found no studies reporting on the validity of the GTT and HMPS. Conclusions The reliability of record review is moderate to substantial and improved when a small group of reviewers carried out record review. The validity of the record review method has never been evaluated, while clinical data registries, autopsy or direct observations of patient care are potential reference methods that can be used to test concurrent validity. PMID:27550650
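The κ statistic pooled in the review above measures chance-corrected agreement between two reviewers. A minimal sketch of Cohen's kappa for two raters (illustrative function and data, not the review's own code):

```python
def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two reviewers.

    rater1, rater2: parallel lists of categorical judgements
    (e.g. 'AE' / 'no AE') on the same set of records.
    """
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    cats = set(rater1) | set(rater2)
    expected = sum(
        (rater1.count(c) / n) * (rater2.count(c) / n) for c in cats
    )
    return (observed - expected) / (1 - expected)
```

By a common rule of thumb, the pooled κ of 0.55 to 0.65 reported above corresponds to "moderate" to "substantial" agreement.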
Papadopoulou, Soultana L; Exarchakos, Georgios; Christodoulou, Dimitrios; Theodorou, Stavroula; Beris, Alexandre; Ploumis, Avraam
2017-01-01
Introduction The Ohkuma questionnaire is a validated screening tool originally used to detect dysphagia among patients hospitalized in Japanese nursing facilities. Objective The purpose of this study is to evaluate the reliability and validity of the adapted Greek version of the Ohkuma questionnaire. Methods Following the steps for cross-cultural adaptation, we delivered the validated Ohkuma questionnaire to 70 patients (53 men, 17 women) who were either suffering from dysphagia or not. All of them completed the questionnaire a second time within a month. For all of them, we performed a bedside and VFSS study of dysphagia and asked participants to undergo a second VFSS screening, with the exception of nine individuals. Statistical analysis included measurement of internal consistency with Cronbach's α coefficient, reliability with Cohen's kappa and Pearson's correlation coefficient, and construct validity with categorical components and a one-way ANOVA test. Results According to Cronbach's α coefficient (0.976) for total score, there was high internal consistency for the Ohkuma Dysphagia questionnaire. Test-retest reliability (Cohen's kappa) ranged from 0.586 to 1.00, exhibiting acceptable stability. We also estimated the Pearson's correlation coefficient for the test-retest total score, which reached high levels (0.952; p < 0.001). The one-way ANOVA test at the two measurement times showed statistically significant correlations in both measurements (p = 0.02 and p = 0.016). Conclusion The adapted Greek version of the questionnaire is valid and reliable and can be used for the screening of dysphagia in Greek-speaking patients.
Lindberg, Ann-Sofie; Oksa, Juha; Antti, Henrik; Malm, Christer
2015-01-01
Physical capacity has previously been deemed important for firefighters' physical work capacity, and aerobic fitness, muscular strength, and muscular endurance are the most frequently investigated parameters of importance. Traditionally, bivariate and multivariate linear regression statistics have been used to study relationships between physical capacities and work capacities among firefighters. An alternative way to handle datasets consisting of numerous correlated variables is to use multivariate projection analyses, such as Orthogonal Projection to Latent Structures. The first aim of the present study was to evaluate the prediction and predictive power of field and laboratory tests, respectively, on firefighters' physical work capacity on selected work tasks, and to study whether valid predictions could be achieved without anthropometric data. The second aim was to externally validate selected models. The third aim was to validate selected models on firefighters and on civilians. A total of 38 (26 men and 12 women) + 90 (38 men and 52 women) subjects were included in the models and the external validation, respectively. The best prediction (R2) and predictive power (Q2) of Stairs, Pulling, Demolition, Terrain, and Rescue work capacities included field tests (R2 = 0.73 to 0.84, Q2 = 0.68 to 0.82). The best external validation was for Stairs work capacity (R2 = 0.80) and the worst for Demolition work capacity (R2 = 0.40). In conclusion, field and laboratory tests could equally well predict physical work capacities for firefighting work tasks, and models excluding anthropometric data were valid. The predictive power was satisfactory for all included work tasks except Demolition.
Choudhry, Shahid A.; Li, Jing; Davis, Darcy; Erdmann, Cole; Sikka, Rishi; Sutariya, Bharat
2013-01-01
Introduction: Preventing the occurrence of hospital readmissions is needed to improve quality of care and foster population health across the care continuum. Hospitals are being held accountable for improving transitions of care to avert unnecessary readmissions. Advocate Health Care in Chicago and Cerner (ACC) collaborated to develop all-cause, 30-day hospital readmission risk prediction models to identify patients that need interventional resources. Ideally, prediction models should encompass several qualities: they should have high predictive ability; use reliable and clinically relevant data; use vigorous performance metrics to assess the models; be validated in populations where they are applied; and be scalable in heterogeneous populations. However, a systematic review of prediction models for hospital readmission risk determined that most performed poorly (average C-statistic of 0.66) and efforts to improve their performance are needed for widespread usage. Methods: The ACC team incorporated electronic health record data, utilized a mixed-method approach to evaluate risk factors, and externally validated their prediction models for generalizability. Inclusion and exclusion criteria were applied on the patient cohort and then split for derivation and internal validation. Stepwise logistic regression was performed to develop two predictive models: one for admission and one for discharge. The prediction models were assessed for discrimination ability, calibration, overall performance, and then externally validated. Results: The ACC Admission and Discharge Models demonstrated modest discrimination ability during derivation, internal and external validation post-recalibration (C-statistic of 0.76 and 0.78, respectively), and reasonable model fit during external validation for utility in heterogeneous populations. Conclusions: The ACC Admission and Discharge Models embody the design qualities of ideal prediction models. 
The ACC plans to continue its partnership to further improve and develop valuable clinical models. PMID:24224068
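The C-statistic cited in the readmission models above is the probability that the model scores a randomly chosen readmitted patient above a randomly chosen non-readmitted one (ties count half). A minimal sketch, illustrative rather than the ACC implementation:

```python
def c_statistic(scores, labels):
    """C-statistic (area under the ROC curve) for binary outcomes.

    scores: model risk scores; labels: 1 = event (e.g. readmitted),
    0 = no event. Ties between a positive and a negative count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

A value of 0.5 is chance-level discrimination; the 0.76 and 0.78 reported above indicate modest-to-good discrimination, clearly above the 0.66 average the systematic review found.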
Singal, Amit G.; Mukherjee, Ashin; Elmunzer, B. Joseph; Higgins, Peter DR; Lok, Anna S.; Zhu, Ji; Marrero, Jorge A; Waljee, Akbar K
2015-01-01
Background Predictive models for hepatocellular carcinoma (HCC) have been limited by modest accuracy and lack of validation. Machine learning algorithms offer a novel methodology, which may improve HCC risk prognostication among patients with cirrhosis. Our study's aim was to develop and compare predictive models for HCC development among cirrhotic patients, using conventional regression analysis and machine learning algorithms. Methods We enrolled 442 patients with Child A or B cirrhosis at the University of Michigan between January 2004 and September 2006 (UM cohort) and prospectively followed them until HCC development, liver transplantation, death, or study termination. Regression analysis and machine learning algorithms were used to construct predictive models for HCC development, which were tested on an independent validation cohort from the Hepatitis C Antiviral Long-term Treatment against Cirrhosis (HALT-C) Trial. Both models were also compared to the previously published HALT-C model. Discrimination was assessed using receiver operating characteristic curve analysis and diagnostic accuracy was assessed with net reclassification improvement and integrated discrimination improvement statistics. Results After a median follow-up of 3.5 years, 41 patients developed HCC. The UM regression model had a c-statistic of 0.61 (95%CI 0.56-0.67), whereas the machine learning algorithm had a c-statistic of 0.64 (95%CI 0.60–0.69) in the validation cohort. The machine learning algorithm had significantly better diagnostic accuracy as assessed by net reclassification improvement (p<0.001) and integrated discrimination improvement (p=0.04). The HALT-C model had a c-statistic of 0.60 (95%CI 0.50-0.70) in the validation cohort and was outperformed by the machine learning algorithm (p=0.047). Conclusion Machine learning algorithms improve the accuracy of risk stratifying patients with cirrhosis and can be used to accurately identify patients at high-risk for developing HCC. 
PMID:24169273
Experimental design, power and sample size for animal reproduction experiments.
Chapman, Phillip L; Seidel, George E
2008-01-01
The present paper concerns statistical issues in the design of animal reproduction experiments, with emphasis on the problems of sample size determination and power calculations. We include examples and non-technical discussions aimed at helping researchers avoid serious errors that may invalidate or seriously impair the validity of conclusions from experiments. Screen shots from interactive power calculation programs and basic SAS power calculation programs are presented to aid in understanding statistical power and computing power in some common experimental situations. Practical issues that are common to most statistical design problems are briefly discussed. These include one-sided hypothesis tests, power level criteria, equality of within-group variances, transformations of response variables to achieve variance equality, optimal specification of treatment group sizes, 'post hoc' power analysis and arguments for the increased use of confidence intervals in place of hypothesis tests.
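As an example of the kind of calculation the paper above walks through, here is a normal-approximation sample-size formula for a two-sided, two-sample comparison of means. This is a sketch; exact t-based software such as the SAS programs mentioned above yields slightly larger group sizes:

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a true mean difference
    delta, with common SD sigma, in a two-sided test.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_power) * sigma / delta)^2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)
```

For a medium standardized effect (delta/sigma = 0.5) at 80% power and alpha = 0.05, this gives 63 per group; the exact t-test answer is 64, showing how close the approximation runs.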
Rostami, Reza; Sadeghi, Vahid; Zarei, Jamileh; Haddadi, Parvaneh; Mohazzab-Torabi, Saman; Salamati, Payman
2013-01-01
Objective The aim of this study was to compare the Persian versions of the Wechsler Intelligence Scale for Children - Fourth Edition (WISC-IV) and Cognitive Assessment System (CAS) tests, to determine the correlation between their scales and to evaluate the probable concurrent validity of these tests in patients with learning disorders. Methods One hundred sixty-two children with learning disorder who presented at Atieh Comprehensive Psychiatry Center were selected in a consecutive non-randomized order. All of the patients were assessed based on the WISC-IV and CAS questionnaires. The Pearson correlation coefficient was used to analyze the correlation between the data and to assess the concurrent validity of the two tests. Linear regression was used for statistical modeling. The maximum type I error rate was set at 5%. Findings There was a strong correlation between the total score of the WISC-IV test and the total score of the CAS test in the patients (r=0.75, P<0.001). The correlations among the other scales were mostly high and all of them were statistically significant (P<0.001). A linear regression model was obtained (α = 0.51, β = 0.81 and P<0.001). Conclusion There is an acceptable correlation between the WISC-IV scales and the CAS test in children with learning disorders. A concurrent validity is established between the two tests and their scales. PMID:23724180
Spatial distribution of the gamma-ray bursts at very high redshift
NASA Astrophysics Data System (ADS)
Mészáros, Attila
2018-05-01
The author and his collaborators showed as early as 1995-96, purely from analyses of the observations, that gamma-ray bursts (GRBs) can occur out to redshift 20. Since that time, several other statistical studies of the spatial distribution of GRBs have been carried out. Remarkable conclusions concerning the star-formation rate and the validity of the cosmological principle have been obtained about the regions of the cosmic dawn. This contribution surveys these efforts.
Yu, Ping; Pan, Yuesong; Wang, Yongjun; Wang, Xianwei; Liu, Liping; Ji, Ruijun; Meng, Xia; Jing, Jing; Tong, Xu; Guo, Li; Wang, Yilong
2016-01-01
Background and Purpose A case-mix adjustment model has been developed and externally validated, demonstrating promise. However, the model has not been thoroughly tested among populations in China. In our study, we evaluated the performance of the model in Chinese patients with acute stroke. Methods The case-mix adjustment model A includes items on age, presence of atrial fibrillation on admission, National Institutes of Health Stroke Severity Scale (NIHSS) score on admission, and stroke type. Model B is similar to Model A but includes only the consciousness component of the NIHSS score. Both model A and B were evaluated to predict 30-day mortality rates in 13,948 patients with acute stroke from the China National Stroke Registry. The discrimination of the models was quantified by c-statistic. Calibration was assessed using Pearson’s correlation coefficient. Results The c-statistic of model A in our external validation cohort was 0.80 (95% confidence interval, 0.79–0.82), and the c-statistic of model B was 0.82 (95% confidence interval, 0.81–0.84). Excellent calibration was reported in the two models with Pearson’s correlation coefficient (0.892 for model A, p<0.001; 0.927 for model B, p = 0.008). Conclusions The case-mix adjustment model could be used to effectively predict 30-day mortality rates in Chinese patients with acute stroke. PMID:27846282
Gibson, Eli; Fenster, Aaron; Ward, Aaron D
2013-10-01
Novel imaging modalities are pushing the boundaries of what is possible in medical imaging, but their signal properties are not always well understood. The evaluation of these novel imaging modalities is critical to achieving their research and clinical potential. Image registration of novel modalities to accepted reference standard modalities is an important part of characterizing the modalities and elucidating the effect of underlying focal disease on the imaging signal. The strengths of the conclusions drawn from these analyses are limited by statistical power. Based on the observation that in this context, statistical power depends in part on uncertainty arising from registration error, we derive a power calculation formula relating registration error, number of subjects, and the minimum detectable difference between normal and pathologic regions on imaging, for an imaging validation study design that accommodates signal correlations within image regions. Monte Carlo simulations were used to evaluate the derived models and test the strength of their assumptions, showing that the model yielded predictions of the power, the number of subjects, and the minimum detectable difference of simulated experiments accurate to within a maximum error of 1% when the assumptions of the derivation were met, and characterizing the sensitivities of the model to violations of the assumptions. The use of these formulae is illustrated through a calculation of the number of subjects required for a case study, modeled closely after a prostate cancer imaging validation study currently taking place at our institution. The power calculation formulae address three central questions in the design of imaging validation studies: (1) What is the maximum acceptable registration error? (2) How many subjects are needed? (3) What is the minimum detectable difference between normal and pathologic image regions?
Kulesz, Paulina A.; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M.; Francis, David J.
2015-01-01
Objective Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Method Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product moment correlation was compared with four robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator. Results All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Conclusions Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PMID:25495830
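Of the robust alternatives compared above, the Winsorized correlation is the simplest to sketch: clamp each variable's tails at fixed quantiles before computing Pearson's r, so extreme observations lose leverage. Illustrative code, not the authors' implementation:

```python
def winsorize(xs, prop=0.2):
    """Clamp the lowest and highest prop-fraction of values."""
    s = sorted(xs)
    g = int(prop * len(xs))
    lo, hi = s[g], s[len(xs) - g - 1]
    return [min(max(x, lo), hi) for x in xs]

def winsorized_corr(x, y, prop=0.2):
    """Pearson's r computed on Winsorized copies of x and y."""
    xw, yw = winsorize(x, prop), winsorize(y, prop)
    mx, my = sum(xw) / len(xw), sum(yw) / len(yw)
    num = sum((a - mx) * (b - my) for a, b in zip(xw, yw))
    den = (sum((a - mx) ** 2 for a in xw)
           * sum((b - my) ** 2 for b in yw)) ** 0.5
    return num / den
```

On clean, exactly linear data the Winsorized correlation agrees with Pearson's r; the two diverge only when outliers are present, which is the situation the study above was probing.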
Chang, Wen-Dien; Chang, Wan-Yi; Lee, Chia-Lun; Feng, Chi-Yen
2013-10-01
[Purpose] Balance is an integral part of human ability. The smart balance master system (SBM) is a balance test instrument with good reliability and validity, but it is expensive. Therefore, we modified a Wii Fit balance board, which is a convenient balance assessment tool, and analyzed its reliability and validity. [Subjects and Methods] We recruited 20 healthy young adults and 20 elderly people, and administered 3 balance tests. The correlation coefficient and intraclass correlation of both instruments were analyzed. [Results] There were no statistically significant differences in the 3 tests between the Wii Fit balance board and the SBM. The Wii Fit balance board had a good intraclass correlation (0.86-0.99) for the elderly people and positive correlations (r = 0.58-0.86) with the SBM. [Conclusions] The Wii Fit balance board is a balance assessment tool with good reliability and high validity for elderly people, and we recommend it as an alternative tool for assessing balance ability.
2014-01-01
Background The UK Clinical Aptitude Test (UKCAT) was designed to address issues identified with traditional methods of selection. This study aims to examine the predictive validity of the UKCAT and compare this to traditional selection methods in the senior years of medical school. This was a follow-up study of two cohorts of students from two medical schools who had previously taken part in a study examining the predictive validity of the UKCAT in first year. Methods The sample consisted of 4th and 5th Year students who commenced their studies at the University of Aberdeen or University of Dundee medical schools in 2007. Data collected were: demographics (gender and age group), UKCAT scores; Universities and Colleges Admissions Service (UCAS) form scores; admission interview scores; Year 4 and 5 degree examination scores. Pearson’s correlations were used to examine the relationships between admissions variables, examination scores, gender and age group, and to select variables for multiple linear regression analysis to predict examination scores. Results Ninety-nine and 89 students at Aberdeen medical school from Years 4 and 5 respectively, and 51 Year 4 students in Dundee, were included in the analysis. Neither UCAS form nor interview scores were statistically significant predictors of examination performance. Conversely, the UKCAT yielded statistically significant validity coefficients between .24 and .36 in four of five assessments investigated. Multiple regression analysis showed the UKCAT made a statistically significant unique contribution to variance in examination performance in the senior years. Conclusions Results suggest the UKCAT appears to predict performance better in the later years of medical school compared to earlier years and provides modest supportive evidence for the UKCAT’s role in student selection within these institutions. 
Further research is needed to assess the predictive validity of the UKCAT against professional and behavioural outcomes as the cohort commences working life. PMID:24762134
DOE Office of Scientific and Technical Information (OSTI.GOV)
Martin, Spencer; Rodrigues, George, E-mail: george.rodrigues@lhsc.on.ca; Department of Epidemiology/Biostatistics, University of Western Ontario, London
2013-01-01
Purpose: To perform a rigorous technological assessment and statistical validation of a software technology for anatomic delineations of the prostate on MRI datasets. Methods and Materials: A 3-phase validation strategy was used. Phase I consisted of anatomic atlas building using 100 prostate cancer MRI data sets to provide training data sets for the segmentation algorithms. In phase II, 2 experts contoured 15 new MRI prostate cancer cases using 3 approaches (manual, N points, and region of interest). In phase III, 5 new physicians with variable MRI prostate contouring experience segmented the same 15 phase II datasets using 3 approaches: manual, N points with no editing, and full autosegmentation with user editing allowed. Statistical analyses for time and accuracy (using the Dice similarity coefficient) endpoints used traditional descriptive statistics, analysis of variance, analysis of covariance, and pooled Student t test. Results: In phase I, the average (SD) total and per slice contouring times for the 2 physicians were 228 (75), 17 (3.5), 209 (65), and 15 (3.9) seconds, respectively. In phase II, statistically significant differences in physician contouring time were observed based on physician, type of contouring, and case sequence. The N points strategy resulted in superior segmentation accuracy when initial autosegmented contours were compared with final contours. In phase III, statistically significant differences in contouring time were again observed based on physician, type of contouring, and case sequence. The average relative time savings for N points and autosegmentation were 49% and 27%, respectively, compared with manual contouring. The N points and autosegmentation strategies resulted in average Dice values of 0.89 and 0.88, respectively. Pre- and postedited autosegmented contours demonstrated a higher average Dice similarity coefficient of 0.94. Conclusion: The software provided robust contours with minimal editing required.
Observed time savings were seen for all physicians irrespective of experience level and baseline manual contouring speed.
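The Dice similarity coefficient used as the accuracy endpoint above compares two contours as sets of labeled voxels. A minimal sketch (illustrative, not the study's software):

```python
def dice(a, b):
    """Dice similarity coefficient between two contours,
    each given as a set of voxel indices: 2|A ∩ B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b))
```

A value of 1.0 is perfect overlap; the 0.88 to 0.94 values reported above indicate close agreement between autosegmented and physician-finalized contours.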
Griest, Susan; Zaugg, Tara L.; Thielman, Emily; Kaelin, Christine; Galvez, Gino; Carlson, Kathleen F.
2015-01-01
Purpose Individuals complaining of tinnitus often attribute hearing problems to the tinnitus. In such cases some (or all) of their reported “tinnitus distress” may in fact be caused by trouble communicating due to hearing problems. We developed the Tinnitus and Hearing Survey (THS) as a tool to rapidly differentiate hearing problems from tinnitus problems. Method For 2 of our research studies, we administered the THS twice (mean of 16.5 days between tests) to 67 participants who did not receive intervention. These data allow for measures of statistical validation of the THS. Results Reliability of the THS was good to excellent regarding internal consistency (α = .86–.94), test–retest reliability (r = .76–.83), and convergent validity between the Tinnitus Handicap Inventory (Newman, Jacobson, & Spitzer, 1996; Newman, Sandridge, & Jacobson, 1998) and the A (Tinnitus) subscale of the THS (r = .78). Factor analysis confirmed that the 2 subscales, A (Tinnitus) and B (Hearing), have strong internal structure, explaining 71.7% of the total variance, and low correlation with each other (r = .46), resulting in a small amount of shared variance (21%). Conclusion These results provide evidence that the THS is statistically validated and reliable for use in assisting patients and clinicians in quickly (and collaboratively) determining whether intervention for tinnitus is appropriate. PMID:25551458
Lambert, Carole; Gagnon, Robert; Nguyen, David; Charlin, Bernard
2009-01-01
Background The Script Concordance test (SCT) is a reliable and valid tool to evaluate clinical reasoning in complex situations where experts' opinions may be divided. Scores reflect the degree of concordance between the performance of examinees and that of a reference panel of experienced physicians. The purpose of this study is to demonstrate the SCT's usefulness in radiation oncology. Methods A 90-item radiation oncology SCT was administered to 155 participants. Three levels of experience were tested: medical students (n = 70), radiation oncology residents (n = 38) and radiation oncologists (n = 47). Statistical tests were performed to assess reliability and to document validity. Results After item optimization, the test comprised 30 cases and 70 questions. Cronbach alpha was 0.90. Mean scores were 51.62 (± 8.19) for students, 71.20 (± 9.45) for residents and 76.67 (± 6.14) for radiation oncologists. The difference between the three groups was statistically significant when compared by the Kruskal-Wallis test (p < 0.001). Conclusion The SCT is reliable and useful to discriminate among participants according to their level of experience in radiation oncology. It appears to be a useful tool to document the progression of reasoning during residency training. PMID:19203358
Onwujekwe, Obinna; Fox-Rushby, Julia; Hanson, Kara
2008-01-01
This study examines whether making question formats better fit the cultural context of markets would improve the construct validity of estimates of willingness to pay (WTP). WTP for insecticide-treated mosquito nets was elicited using the bidding game, binary with follow-up (BWFU), and a novel structured haggling technique (SH) that mimicked price taking in market places in the study area. The results show that different question formats generated different distributions of WTP. Following a comparison of alternative models for each question format, construct validity was compared using the most consistently appropriate model across question formats for the positive WTP values, in this case, ordinary least squares. Three criteria (the number of statistically significant explanatory variables that had the anticipated sign, the value of the adjusted R2, and the proportion that were statistically significant with the anticipated sign) used to assess the relative performance of each question format indicated that SH performed best and BWFU worst. However, differences in the levels of income, education, and percentage of household heads responding to the different question formats across the samples complicate this conclusion. Hence, the results suggest that the SH technique is worthy of further investigation and use.
Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes.
Harrington, Peter de Boves
2018-01-02
Validation of multivariate models is of current importance for a wide range of chemical applications. Although important, it is often neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to the single validation set of data. In addition, no statistics are available regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes efficient use of the data because each object is used once for validation. It was reviewed a decade earlier, but primarily for the optimization of chemometric models; this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported, and powerful matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.
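A bootstrapped Latin-partition scheme can be sketched as repeated random splits in which every object serves for validation exactly once per bootstrap; averaging a figure of merit across bootstraps then yields the confidence intervals described above. Illustrative code; the review's own procedure may differ in detail:

```python
import random

def latin_partitions(n_objects, n_parts, n_boots, seed=0):
    """Generate bootstrapped Latin partitions.

    Each bootstrap shuffles the object indices and splits them into
    n_parts disjoint validation folds, so every object is validated
    exactly once per bootstrap; repeating with fresh shuffles gives
    a distribution of figures of merit rather than a single value.
    """
    rng = random.Random(seed)
    boots = []
    for _ in range(n_boots):
        idx = list(range(n_objects))
        rng.shuffle(idx)
        boots.append([idx[k::n_parts] for k in range(n_parts)])
    return boots
```

Each fold plays the role of one external validation set, but because the folds rotate over bootstraps, no single arbitrary split dominates the reported figure of merit.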
Fatehi, Zahra; Baradaran, Hamid Reza; Asadpour, Mohamad; Rezaeian, Mohsen
2017-01-01
Background: Individuals' listening styles differ based on their characters, professions, and situations. This study aimed to assess the validity and reliability of the Listening Styles Profile-Revised (LSP-R) in Iranian students. Methods: After translation into Persian, the LSP-R was administered to a sample of 240 medical and nursing Persian-speaking students in Iran. Statistical analysis was performed to test the reliability and validity of the LSP-R. Results: The study revealed high internal consistency and good test-retest reliability for the Persian version of the questionnaire. The Cronbach's alpha coefficient was 0.72 and the intra-class correlation coefficient was 0.87. The means for the content validity index and the content validity ratio (CVR) were 0.90 and 0.83, respectively. Exploratory factor analysis (EFA) yielded a four-factor solution that accounted for 60.8% of the observed variance. The majority of medical students (73%) as well as the majority of nursing students (70%) stated that their listening styles were task-oriented. Conclusion: In general, the study findings suggest that the Persian version of the LSP-R is a valid and reliable instrument for assessing listening styles profile in the studied sample.
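As a side note on the reliability figures reported in abstracts such as this one, Cronbach's α can be computed directly from an item-score matrix. A minimal Python sketch with NumPy, using made-up Likert responses (not LSP-R data; the function name is ours):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item sample variance
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Made-up Likert responses: 5 respondents x 4 items (NOT LSP-R data)
scores = [[3, 4, 3, 4],
          [5, 5, 4, 5],
          [2, 2, 3, 2],
          [4, 4, 4, 5],
          [3, 3, 2, 3]]
alpha = cronbach_alpha(scores)
```

With k items, α scales the complement of the ratio of summed item variances to the variance of the total score; values around the 0.72 reported above are conventionally read as acceptable internal consistency.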
Design, development, testing and validation of a Photonics Virtual Laboratory for the study of LEDs
NASA Astrophysics Data System (ADS)
Naranjo, Francisco L.; Martínez, Guadalupe; Pérez, Ángel L.; Pardo, Pedro J.
2014-07-01
This work presents the design, development, testing and validation of a Photonics Virtual Laboratory, highlighting the study of LEDs. The study was conducted from a conceptual, experimental and didactic standpoint, using e-learning and m-learning platforms. Specifically, teaching tools have been developed that help ensure our students achieve meaningful learning. The scientific aspect, such as the study of LEDs, has been brought together with techniques for the generation and transfer of knowledge through the selection, hierarchization and structuring of information using concept maps. To validate the didactic materials developed, procedures with various assessment tools were used for the collection and processing of data, applied in the context of an experimental design. Additionally, a statistical analysis was performed to determine the validity of the materials developed. The assessment was designed to validate the contributions of the new materials over the traditional method of teaching and to quantify the learning achieved by students, in order to draw conclusions that serve as a reference for application in teaching and learning processes, and to comprehensively validate the work carried out.
Hippisley-Cox, Julia; Coupland, Carol; Brindle, Peter
2014-01-01
Objectives To validate the performance of a set of risk prediction algorithms developed using the QResearch database, in an independent sample from general practices contributing to the Clinical Practice Research Datalink (CPRD). Setting Prospective open cohort study using practices contributing to the CPRD database and practices contributing to the QResearch database. Participants The CPRD validation cohort consisted of 3.3 million patients, aged 25–99 years, registered at 357 general practices between 1 Jan 1998 and 31 July 2012. The validation statistics for QResearch were obtained from the original published papers, which used a one-third sample of practices separate from those used to derive the score. A cohort from QResearch was used to compare incidence rates and baseline characteristics and consisted of 6.8 million patients from 753 practices registered between 1 Jan 1998 and 31 July 2013. Outcome measures Incident events relating to seven different risk prediction scores: QRISK2 (cardiovascular disease); QStroke (ischaemic stroke); QDiabetes (type 2 diabetes); QFracture (osteoporotic fracture and hip fracture); QKidney (moderate and severe kidney failure); QThrombosis (venous thromboembolism); QBleed (intracranial bleed and upper gastrointestinal haemorrhage). Measures of discrimination and calibration were calculated. Results Overall, the baseline characteristics of the CPRD and QResearch cohorts were similar, though QResearch had higher recording levels for ethnicity and family history. The validation statistics for each of the risk prediction scores were very similar in the CPRD cohort compared with the published results from QResearch validation cohorts. For example, in women, the QDiabetes algorithm explained 50% of the variation within CPRD compared with 51% on QResearch, and the area under the receiver operating characteristic curve was 0.85 on both databases. The scores were well calibrated in CPRD. 
Conclusions Each of the algorithms performed practically as well in the external independent CPRD validation cohorts as they had in the original published QResearch validation cohorts. PMID:25168040
Riedl, Janet; Esslinger, Susanne; Fauhl-Hassek, Carsten
2015-07-23
Food fingerprinting approaches are expected to become a very potent tool in authentication processes aiming at a comprehensive characterization of complex food matrices. By non-targeted spectrometric or spectroscopic chemical analysis with a subsequent (multivariate) statistical evaluation of acquired data, food matrices can be investigated in terms of their geographical origin, species variety or possible adulterations. Although many successful research projects have already demonstrated the feasibility of non-targeted fingerprinting approaches, their uptake and implementation into routine analysis and food surveillance is still limited. In many proof-of-principle studies, the prediction ability of only one data set was explored, measured within a limited period of time using one instrument within one laboratory. Thorough validation strategies that guarantee reliability of the respective data basis and that allow conclusions about whether the respective approaches are fit for purpose have not yet been proposed. Within this review, critical steps of the fingerprinting workflow were explored to develop a generic scheme for multivariate model validation. As a result, a proposed scheme for "good practice" shall guide users through validation and reporting of non-targeted fingerprinting results. Furthermore, food fingerprinting studies were selected by a systematic search approach and reviewed with regard to (a) transparency of data processing and (b) validity of study results. Subsequently, the studies were inspected for measures of statistical model validation, analytical method validation and quality assurance measures. In this context, issues and recommendations were found that might be considered as an actual starting point for developing validation standards of non-targeted metabolomics approaches for food authentication in the future. 
Hence, this review intends to contribute to the harmonization and standardization of food fingerprinting, both required as a prior condition for the authentication of food in routine analysis and official control. Copyright © 2015 Elsevier B.V. All rights reserved.
Investigation of Super Learner Methodology on HIV-1 Small Sample: Application on Jaguar Trial Data.
Houssaïni, Allal; Assoumou, Lambert; Marcelin, Anne Geneviève; Molina, Jean Michel; Calvez, Vincent; Flandre, Philippe
2012-01-01
Background. Many statistical models have been tested to predict phenotypic or virological response from genotypic data. A statistical framework called Super Learner has been introduced either to compare different methods/learners (discrete Super Learner) or to combine them in a Super Learner prediction method. Methods. The Jaguar trial is used to apply the Super Learner framework. The Jaguar study is an "add-on" trial comparing the efficacy of adding didanosine to an on-going failing regimen. Our aim was also to investigate the impact of using different cross-validation strategies and different loss functions. Four different partitions into training and validation sets were tested with two loss functions. Six statistical methods were compared. We assessed performance by evaluating R(2) values and accuracy by calculating the rates of patients correctly classified. Results. Our results indicated that the more recent Super Learner methodology of building a new predictor based on a weighted combination of different methods/learners provided good performance. A simple linear model provided similar results to those of this new predictor. A slight discrepancy arose between the two loss functions investigated, and a slight difference also arose between results based on cross-validated risks and results from the full dataset. The Super Learner methodology and the linear model correctly classified around 80% of patients. The difference between the lowest and highest rates is around 10 percent. The number of mutations retained by the different learners also varies from 1 to 41. Conclusions. The more recent Super Learner methodology combining the prediction of many learners provided good performance on our small dataset.
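The discrete Super Learner mentioned above simply selects the candidate learner with the smallest cross-validated risk. A minimal sketch of that selection step on simulated data; the two toy learners, the MSE loss, and all names below are illustrative assumptions, not the trial's actual learner library:

```python
import numpy as np

def cv_risk(fit_fn, X, y, k=5):
    """Mean squared error estimated by k-fold cross-validation."""
    n = len(y)
    folds = np.array_split(np.arange(n), k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(np.arange(n), fold)
        model = fit_fn(X[train], y[train])
        errs.append(np.mean((y[fold] - model(X[fold])) ** 2))
    return float(np.mean(errs))

def mean_learner(X, y):
    """Trivial learner: always predict the training mean."""
    m = y.mean()
    return lambda Xnew: np.full(len(Xnew), m)

def ols_learner(X, y):
    """Ordinary least squares with an intercept."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return lambda Xnew: np.column_stack([np.ones(len(Xnew)), Xnew]) @ beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.3, size=100)

risks = {name: cv_risk(fn, X, y)
         for name, fn in [("mean", mean_learner), ("ols", ols_learner)]}
best = min(risks, key=risks.get)  # discrete Super Learner: lowest CV risk wins
```

The full Super Learner goes one step further and builds a weighted combination of all candidates rather than picking a single winner.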
A systematic review of the quality of homeopathic clinical trials
Jonas, Wayne B; Anderson, Rachel L; Crawford, Cindy C; Lyons, John S
2001-01-01
Background While a number of reviews of homeopathic clinical trials have been done, all have used methods dependent on allopathic diagnostic classifications foreign to homeopathic practice. In addition, no review has used established and validated quality criteria allowing direct comparison of the allopathic and homeopathic literature. Methods In a systematic review, we compared the quality of clinical-trial research in homeopathy to a sample of research on conventional therapies using a validated and system-neutral approach. All clinical trials on homeopathic treatments with parallel treatment groups published between 1945–1995 in English were selected. All were evaluated with an established set of 33 validity criteria previously validated on a broad range of health interventions across differing medical systems. Criteria covered statistical conclusion, internal, construct and external validity. Reliability of criteria application is greater than 0.95. Results 59 studies met the inclusion criteria. Of these, 79% were from peer-reviewed journals, 29% used a placebo control, 51% used random assignment, and 86% failed to consider potentially confounding variables. The main validity problems were in measurement, where 96% did not report the proportion of subjects screened, and 64% did not report attrition rate. 17% of subjects dropped out in studies where this was reported. There was practically no replication of or overlap in the conditions studied, and most studies were relatively small and done at a single site. Compared to research on conventional therapies, the overall quality of studies in homeopathy was worse and only slightly improved in more recent years. Conclusions Clinical homeopathic research is clearly in its infancy with most studies using poor sampling and measurement techniques, few subjects, single sites and no replication. Many of these problems are correctable even within a "holistic" paradigm given sufficient research expertise, support and methods. 
PMID:11801202
Matulis, Simone; Loos, Laura; Langguth, Nadine; Schreiber, Franziska; Gutermann, Jana; Gawrilow, Caterina; Steil, Regina
2015-01-01
Background The Trauma Symptom Checklist for Children (TSC-C) is the most widely used self-report scale to assess trauma-related symptoms in children and adolescents on six clinical scales. The purpose of the present study was to develop a German version of the TSC-C and to investigate its psychometric properties, such as factor structure, reliability, and validity, in a sample of German adolescents. Method A normative sample of N=583 and a clinical sample of N=41 adolescents with a history of physical or sexual abuse aged between 13 and 21 years participated in the study. Results The Confirmatory Factor Analysis on the six-factor model (anger, anxiety, depression, dissociation, posttraumatic stress, and sexual concerns with the subdimensions preoccupation and distress) revealed acceptable to good fit statistics in the normative sample. One item had to be excluded from the German version of the TSC-C because the factor loading was too low. All clinical scales presented acceptable to good reliability, with Cronbach's α's ranging from .80 to .86 in the normative sample and from .72 to .87 in the clinical sample. Concurrent validity was also demonstrated by the high correlations between the TSC-C scales and instruments measuring similar psychopathology. TSC-C scores reliably differentiated between adolescents with trauma history and those without trauma history, indicating discriminative validity. Conclusions In conclusion, the German version of the TSC-C is a reliable and valid instrument for assessing trauma-related symptoms on six different scales in adolescents aged between 13 and 21 years. PMID:26498182
Spriestersbach, Albert; Röhrig, Bernd; du Prel, Jean-Baptist; Gerhold-Ay, Aslihan; Blettner, Maria
2009-09-01
Descriptive statistics are an essential part of biometric analysis and a prerequisite for the understanding of further statistical evaluations, including the drawing of inferences. When data are well presented, it is usually obvious whether the author has collected and evaluated them correctly and in keeping with accepted practice in the field. Statistical variables in medicine may be of either the metric (continuous, quantitative) or categorical (nominal, ordinal) type. Easily understandable examples are given. Basic techniques for the statistical description of collected data are presented and illustrated with examples. The goal of a scientific study must always be clearly defined. The definition of the target value or clinical endpoint determines the level of measurement of the variables in question. Nearly all variables, whatever their level of measurement, can be usefully presented graphically and numerically. The level of measurement determines what types of diagrams and statistical values are appropriate. There are also different ways of presenting combinations of two independent variables graphically and numerically. The description of collected data is indispensable. If the data are of good quality, valid and important conclusions can already be drawn when they are properly described. Furthermore, data description provides a basis for inferential statistics.
Using statistical text classification to identify health information technology incidents
Chai, Kevin E K; Anthony, Stephen; Coiera, Enrico; Magrabi, Farah
2013-01-01
Objective To examine the feasibility of using statistical text classification to automatically identify health information technology (HIT) incidents in the US Food and Drug Administration (FDA) Manufacturer and User Facility Device Experience (MAUDE) database. Design We used a subset of 570 272 incidents including 1534 HIT incidents reported to MAUDE between 1 January 2008 and 1 July 2010. Text classifiers using regularized logistic regression were evaluated with both ‘balanced’ (50% HIT) and ‘stratified’ (0.297% HIT) datasets for training, validation, and testing. Dataset preparation, feature extraction, feature selection, cross-validation, classification, performance evaluation, and error analysis were performed iteratively to further improve the classifiers. Feature-selection techniques such as removing short words and stop words, stemming, lemmatization, and principal component analysis were examined. Measurements κ statistic, F1 score, precision and recall. Results Classification performance was similar on both the stratified (0.954 F1 score) and balanced (0.995 F1 score) datasets. Stemming was the most effective technique, reducing the feature set size to 79% while maintaining comparable performance. Training with balanced datasets improved recall (0.989) but reduced precision (0.165). Conclusions Statistical text classification appears to be a feasible method for identifying HIT reports within large databases of incidents. Automated identification should enable more HIT problems to be detected, analyzed, and addressed in a timely manner. Semi-supervised learning may be necessary when applying machine learning to big data analysis of patient safety incidents and requires further investigation. PMID:23666777
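The pattern reported above (recall 0.989 but precision 0.165 after balanced training) follows directly from the metric definitions when the positive class is rare (~0.3% of reports here). A sketch with invented counts, not the paper's data:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and
    false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented counts: 300 true HIT incidents among 100,000 reports; a classifier
# that recovers 297 of them but also flags 1,500 non-HIT reports.
p, r, f1 = precision_recall_f1(tp=297, fp=1500, fn=3)
```

Even a modest false-positive rate over the ~99,700 non-HIT reports swamps the 300 true positives, which is why precision collapses while recall stays high.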
Validating Smoking Data From the Veteran’s Affairs Health Factors Dataset, an Electronic Data Source
Brandt, Cynthia A.; Skanderson, Melissa; Justice, Amy C.; Shahrir, Shahida; Butt, Adeel A.; Brown, Sheldon T.; Freiberg, Matthew S.; Gibert, Cynthia L.; Goetz, Matthew Bidwell; Kim, Joon Woo; Pisani, Margaret A.; Rimland, David; Rodriguez-Barradas, Maria C.; Sico, Jason J.; Tindle, Hilary A.; Crothers, Kristina
2011-01-01
Introduction: We assessed smoking data from the Veterans Health Administration (VHA) electronic medical record (EMR) Health Factors dataset. Methods: To assess the validity of the EMR Health Factors smoking data, we first created an algorithm to convert text entries into a 3-category smoking variable (never, former, and current). We compared this EMR smoking variable to 2 different sources of patient self-reported smoking survey data: (a) 6,816 HIV-infected and -uninfected participants in the 8-site Veterans Aging Cohort Study (VACS-8) and (b) a subset of 13,689 participants from the national VACS Virtual Cohort (VACS-VC), who also completed the 1999 Large Health Study (LHS) survey. Sensitivity, specificity, and kappa statistics were used to evaluate agreement of EMR Health Factors smoking data with self-report smoking data. Results: For the EMR Health Factors and VACS-8 comparison of current, former, and never smoking categories, the kappa statistic was .66. For EMR Health Factors and VACS-VC/LHS comparison of smoking, the kappa statistic was .61. Conclusions: Based on kappa statistics, agreement between the EMR Health Factors and survey sources is substantial. Identification of current smokers nationally within the VHA can be used in future studies to track smoking status over time, to evaluate smoking interventions, and to adjust for smoking status in research. Our methodology may provide insights for other organizations seeking to use EMR data for accurate determination of smoking status. PMID:21911825
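Cohen's kappa, used above to compare EMR and survey smoking categories, can be computed from a k×k agreement table. A sketch with a hypothetical three-category table (never/former/current); the counts are invented, not VACS data:

```python
import numpy as np

def cohens_kappa(table):
    """Cohen's kappa from a k x k agreement table
    (rows: rater 1 categories, columns: rater 2 categories)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_obs = np.trace(table) / n  # observed agreement on the diagonal
    # chance agreement from the row/column marginals
    p_exp = (table.sum(axis=0) * table.sum(axis=1)).sum() / n**2
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical never/former/current table (EMR rows vs. survey columns)
table = [[400,  30,  10],
         [ 40, 250,  30],
         [ 10,  40, 190]]
kappa = cohens_kappa(table)
```

Values of roughly 0.61–0.80 are conventionally read as substantial agreement, which is how the .61 and .66 reported above are interpreted.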
Sadegh Moghadam, Leila; Foroughan, Mahshid; Mohammadi Shahboulaghi, Farahnaz; Ahmadi, Fazlollah; Sajjadi, Moosa; Farhadi, Akram
2016-01-01
Background Perceptions of aging refer to individuals’ understanding of aging within their sociocultural context. Proper measurement of this concept in various societies requires accurate tools. Objective The present study was conducted with the aim of translating and validating the Brief Aging Perceptions Questionnaire (B-APQ) and assessing its psychometric features in Iranian older adults. Method In this study, the Persian version of B-APQ was validated for 400 older adults. This questionnaire was translated into Persian according to the Wild et al’s model. The Persian version was validated for content, face, and construct validity (the latter using confirmatory factor analysis), and then its internal consistency and test–retest reliability were measured. Data were analyzed using the statistical software programs SPSS 18 and EQS-6.1. Results The confirmatory factor analysis confirmed construct validity and five subscales of B-APQ. Test–retest reliability with a 3-week interval produced r=0.94. Cronbach’s alpha was found to be 0.75 for the whole questionnaire, and from 0.53 to 0.77 for the five factors. Conclusion The Persian version of B-APQ showed favorable validity and reliability, and thus it can be used for measuring different dimensions of perceptions of aging in Iranian older adults. PMID:27194907
2011-01-01
Background Although many biological databases are applying semantic web technologies, meaningful biological hypothesis testing cannot be easily achieved. Database-driven high throughput genomic hypothesis testing requires both the capability of obtaining semantically relevant experimental data and that of performing relevant statistical testing on the retrieved data. Tissue Microarray (TMA) data are semantically rich and contain many biologically important hypotheses waiting for high throughput conclusions. Methods An application-specific ontology was developed for managing TMA and DNA microarray databases by semantic web technologies. Data were represented as Resource Description Framework (RDF) according to the framework of the ontology. Applications for hypothesis testing (Xperanto-RDF) for TMA data were designed and implemented by (1) formulating the syntactic and semantic structures of the hypotheses derived from TMA experiments, (2) formulating SPARQLs to reflect the semantic structures of the hypotheses, and (3) performing statistical tests with the result sets returned by the SPARQLs. Results When a user designs a hypothesis in Xperanto-RDF and submits it, the hypothesis can be tested against TMA experimental data stored in Xperanto-RDF. When we evaluated four previously validated hypotheses as an illustration, all the hypotheses were supported by Xperanto-RDF. Conclusions We demonstrated the utility of high throughput biological hypothesis testing. We believe that preliminary investigation before performing highly controlled experiments can benefit from this approach. PMID:21342584
Willis, Brian H; Riley, Richard D
2017-09-20
An important question for clinicians appraising a meta-analysis is: are the findings likely to be valid in their own practice? Does the reported effect accurately represent the effect that would occur in their own clinical population? To this end we advance the concept of statistical validity, where the parameter being estimated equals the corresponding parameter for a new independent study. Using a simple ('leave-one-out') cross-validation technique, we demonstrate how we may test meta-analysis estimates for statistical validity using a new validation statistic, Vn, and derive its distribution. We compare this with the usual approach of investigating heterogeneity in meta-analyses and demonstrate the link between statistical validity and homogeneity. Using a simulation study, the properties of Vn and the Q statistic are compared for univariate random effects meta-analysis and a tailored meta-regression model, where information from the setting (included as model covariates) is used to calibrate the summary estimate to the setting of application. Their properties are found to be similar when there are 50 studies or more, but for fewer studies Vn has greater power but a higher type 1 error rate than Q. The power and type 1 error rate of Vn are also shown to depend on the within-study variance, between-study variance, study sample size, and the number of studies in the meta-analysis. Finally, we apply Vn to two published meta-analyses and conclude that it usefully augments standard methods when deciding upon the likely validity of summary meta-analysis estimates in clinical practice. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
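The leave-one-out step underlying the approach above recomputes the pooled estimate with each study withheld, so the withheld study can be compared with a prediction made without it. A simplified sketch under a fixed-effect model with invented study data; this is only the leave-one-out mechanics, not the authors' Vn statistic or its distribution:

```python
import numpy as np

def loo_pooled(effects, variances):
    """Fixed-effect (inverse-variance) pooled estimate with each study
    left out in turn. Entry i is the pooled estimate of all studies
    except study i."""
    effects = np.asarray(effects, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    total_w = weights.sum()
    total_wy = (weights * effects).sum()
    return np.array([(total_wy - w * y) / (total_w - w)
                     for w, y in zip(weights, effects)])

effects = [0.30, 0.25, 0.40, 0.35, 0.28]          # hypothetical effect sizes
variances = [0.010, 0.020, 0.015, 0.010, 0.020]   # hypothetical within-study variances
loo = loo_pooled(effects, variances)
```

Large gaps between each left-out effect and its leave-one-out pooled estimate, relative to the sampling variability, are the kind of signal a validation statistic of this type formalizes.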
Development and Validation of a Job Exposure Matrix for Physical Risk Factors in Low Back Pain
Solovieva, Svetlana; Pehkonen, Irmeli; Kausto, Johanna; Miranda, Helena; Shiri, Rahman; Kauppinen, Timo; Heliövaara, Markku; Burdorf, Alex; Husgafvel-Pursiainen, Kirsti; Viikari-Juntura, Eira
2012-01-01
Objectives The aim was to construct and validate a gender-specific job exposure matrix (JEM) for physical exposures to be used in epidemiological studies of low back pain (LBP). Materials and Methods We utilized two large Finnish population surveys, one to construct the JEM and another to test matrix validity. The exposure axis of the matrix included exposures relevant to LBP (heavy physical work, heavy lifting, awkward trunk posture and whole body vibration) and exposures that increase the biomechanical load on the low back (arm elevation) or those that in combination with other known risk factors could be related to LBP (kneeling or squatting). Job titles with similar work tasks and exposures were grouped. Exposure information was based on face-to-face interviews. Validity of the matrix was explored by comparing the JEM (group-based) binary measures with individual-based measures. The predictive validity of the matrix against LBP was evaluated by comparing the associations of the group-based (JEM) exposures with those of individual-based exposures. Results The matrix includes 348 job titles, representing 81% of all Finnish job titles in the early 2000s. The specificity of the constructed matrix was good, especially in women. Validity, measured with the kappa statistic, ranged from good to poor and was fair for most exposures. In men, all group-based (JEM) exposures were statistically significantly associated with one-month prevalence of LBP. In women, four out of six group-based exposures showed an association with LBP. Conclusions The gender-specific JEM for physical exposures showed relatively high specificity without compromising sensitivity. The matrix can therefore be considered as a valid instrument for exposure assessment in large-scale epidemiological studies, when more precise but more labour-intensive methods are not feasible. 
Although the matrix was based on Finnish data we foresee that it could be applicable, with some modifications, in other countries with a similar level of technology. PMID:23152793
Jin, X F; Wang, J; Li, Y J; Liu, J F; Ni, D F
2016-09-20
Objective: To cross-culturally translate the questionnaire of olfactory disorders (QOD) into a simplified Chinese version, and evaluate its reliability and validity in clinical practice. Method: The simplified Chinese version of the QOD was evaluated for test-retest reliability, split-half reliability, and internal consistency. It was then evaluated for validity, including content validity, criterion-related validity, and responsiveness. Criterion-related validity was assessed using the medical outcome study's 36-item short form health survey (SF-36) and the World Health Organization quality of life-brief (WHOQOL-BREF) for comparison. Result: A total of 239 patients with olfactory dysfunction were enrolled and tested, of whom 195 completed all three surveys (QOD, SF-36, WHOQOL-BREF). The test-retest reliabilities of the QOD-parosmia statements (QOD-P), QOD-quality of life (QOD-QoL), and QOD-visual analogue scale (QOD-VAS) sections were 0.799 (P<0.01), 0.781 (P<0.01), and 0.488 (P<0.01), respectively, and the Cronbach's α reliability coefficients were 0.477, 0.812, and 0.889, respectively. The split-half reliability of the QOD-QoL was 0.89. There was no correlation between the QOD-P section and the SF-36, but there were statistically significant correlations between the QOD-QoL and QOD-VAS sections and the SF-36. There was no correlation between the QOD-P section and the WHOQOL-BREF, but there were statistically significant correlations between the QOD-QoL and QOD-VAS sections and the WHOQOL-BREF in most sections. Conclusion: The simplified Chinese version of the QOD was shown to be a reliable and valid questionnaire for evaluating patients with olfactory dysfunction living in mainland China. The QOD-P section needs further modification to properly adapt to patients with a Chinese cultural and knowledge background. Copyright© by the Editorial Department of Journal of Clinical Otorhinolaryngology Head and Neck Surgery.
Chang, Wen-Dien; Chang, Wan-Yi; Lee, Chia-Lun; Feng, Chi-Yen
2013-01-01
[Purpose] Balance is an integral part of human ability. The smart balance master system (SBM) is a balance test instrument with good reliability and validity, but it is expensive. Therefore, we modified a Wii Fit balance board, which is a convenient balance assessment tool, and analyzed its reliability and validity. [Subjects and Methods] We recruited 20 healthy young adults and 20 elderly people, and administered 3 balance tests. The correlation coefficient and intraclass correlation of both instruments were analyzed. [Results] There were no statistically significant differences in the 3 tests between the Wii Fit balance board and the SBM. The Wii Fit balance board had a good intraclass correlation (0.86–0.99) for the elderly people and positive correlations (r = 0.58–0.86) with the SBM. [Conclusions] The Wii Fit balance board is a balance assessment tool with good reliability and high validity for elderly people, and we recommend it as an alternative tool for assessing balance ability. PMID:24259769
Often Asked but Rarely Answered: Can Asians Meet DSM-5/ICD-10 Autism Spectrum Disorder Criteria?
Kim, So Hyun; Koh, Yun-Joo; Lim, Eun-Chung; Kim, Soo-Jeong; Leventhal, Bennett L.
2016-01-01
Abstract Objectives: To evaluate whether Asian (Korean children) populations can be validly diagnosed with autism spectrum disorder (ASD) using Western-based diagnostic instruments and criteria based on Diagnostic and Statistical Manual on Mental Disorders, 5th edition (DSM-5). Methods: Participants included an epidemiologically ascertained 7–14-year-old (N = 292) South Korean cohort from a larger prevalence study (N = 55,266). Main outcomes were based on Western-based diagnostic methods for Korean children using gold standard instruments, Autism Diagnostic Interview-Revised, and Autism Diagnostic Observation Schedule. Factor analysis and ANOVAs were performed to examine factor structure of autism symptoms and identify phenotypic differences between Korean children with ASD and non-ASD diagnoses. Results: Using Western-based diagnostic methods, Korean children with ASD were successfully identified with moderate-to-high diagnostic validity (sensitivities/specificities ranging 64%–93%), strong internal consistency, and convergent/concurrent validity. The patterns of autism phenotypes in a Korean population were similar to those observed in a Western population with two symptom domains (social communication and restricted and repetitive behavior factors). Statistically significant differences in the use of socially acceptable communicative behaviors (e.g., direct gaze, range of facial expressions) emerged between ASD versus non-ASD cases (mostly p < 0.001), ensuring that these can be a similarly valid part of the ASD phenotype in both Asian and Western populations. Conclusions: Despite myths, biases, and stereotypes about Asian social behavior, Asians (at least Korean children) typically use elements of reciprocal social interactions similar to those in the West. Therefore, standardized diagnostic methods widely used for ASD in Western culture can be validly used as part of the assessment process and research with Koreans and, possibly, other Asians. PMID:27315155
Pusceddu, Sara; Barretta, Francesco; Trama, Annalisa; Botta, Laura; Milione, Massimo; Buzzoni, Roberto; De Braud, Filippo; Mazzaferro, Vincenzo; Pastorino, Ugo; Seregni, Ettore; Mariani, Luigi; Gatta, Gemma; Di Bartolomeo, Maria; Femia, Daniela; Prinzi, Natalie; Coppa, Jorgelina; Panzuto, Francesco; Antonuzzo, Lorenzo; Bajetta, Emilio; Brizzi, Maria Pia; Campana, Davide; Catena, Laura; Comber, Harry; Dwane, Fiona; Fazio, Nicola; Faggiano, Antongiulio; Giuffrida, Dario; Henau, Kris; Ibrahim, Toni; Marconcini, Riccardo; Massironi, Sara; Žakelj, Maja Primic; Spada, Francesca; Tafuto, Salvatore; Van Eycken, Elizabeth; Van der Zwan, Jan Maaten; Žagar, Tina; Giacomelli, Luca; Miceli, Rosalba; Aroldi, Francesca; Bongiovanni, Alberto; Berardi, Rossana; Brighi, Nicole; Cingarlini, Sara; Cauchi, Carolina; Cavalcoli, Federica; Carnaghi, Carlo; Corti, Francesca; Duro, Marilina; Davì, Maria Vittoria; De Divitiis, Chiara; Ermacora, Paola; La Salvia, Anna; Luppi, Gabriele; Lo Russo, Giuseppe; Nichetti, Federico; Raimondi, Alessandra; Perfetti, Vittorio; Razzore, Paola; Rinzivillo, Maria; Siesling, Sabine; Torchio, Martina; Van Dijk, Boukje; Visser, Otto; Vernieri, Claudio
2018-01-01
No validated prognostic tool is available for predicting overall survival (OS) of patients with well-differentiated neuroendocrine tumors (WDNETs). This study, conducted in three independent cohorts of patients from five different European countries, aimed to develop and validate a classification prognostic score for OS in patients with stage IV WDNETs. We retrospectively collected data on 1387 patients: (i) patients treated at the Istituto Nazionale Tumori (Milan, Italy; n = 515); (ii) European cohort of rare NET patients included in the European RARECAREnet database (n = 457); (iii) Italian multicentric cohort of pancreatic NET (pNETs) patients treated at 24 Italian institutions (n = 415). The score was developed using data from patients included in cohort (i) (training set); external validation was performed by applying the score to the data of the two independent cohorts (ii) and (iii) evaluating both calibration and discriminative ability (Harrell C statistic). We used data on age, primary tumor site, metastasis (synchronous vs metachronous), Ki-67, functional status and primary surgery to build the score, which was developed for classifying patients into three groups with differential 10-year OS: (I) favorable risk group: 10-year OS ≥70%; (II) intermediate risk group: 30% ≤ 10-year OS < 70%; (III) poor risk group: 10-year OS <30%. The Harrell C statistic was 0.661 in the training set, and 0.626 and 0.601 in the RARECAREnet and Italian multicentric validation sets, respectively. In conclusion, based on the analysis of three ‘field-practice’ cohorts collected in different settings, we defined and validated a prognostic score to classify patients into three groups with different long-term prognoses. PMID:29559553
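The Harrell C statistic reported above measures how well a score's predicted risks order patients' actual survival times. A minimal pure-Python sketch of the concordance index (the function name and toy data are illustrative, not from the paper):

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair of patients is comparable when the one with the shorter
    observed time actually had the event; the pair is concordant when
    that patient also has the higher predicted risk. Tied risks count
    as half-concordant; tied times are skipped (a common simplification).
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            # order the pair so `a` has the shorter observed time
            a, b = (i, j) if times[i] < times[j] else (j, i)
            if times[a] == times[b] or not events[a]:
                continue  # tied times, or the earlier subject was censored
            comparable += 1
            if risk_scores[a] > risk_scores[b]:
                concordant += 1.0
            elif risk_scores[a] == risk_scores[b]:
                concordant += 0.5
    return concordant / comparable

# a score whose ranking perfectly reverses survival order is fully concordant
print(harrell_c([2, 5, 7, 9], [1, 1, 0, 1], [0.9, 0.6, 0.5, 0.2]))  # → 1.0
```

A C of 0.5 would mean the score orders survival no better than chance, which puts the 0.60–0.66 values reported above in context.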
Measurement issues in research on social support and health.
Dean, K; Holst, E; Kreiner, S; Schoenborn, C; Wilson, R
1994-01-01
STUDY OBJECTIVE--The aims were: (1) to identify methodological problems that may explain the inconsistencies and contradictions in the research evidence on social support and health, and (2) to validate a frequently used measure of social support in order to determine whether or not it could be used in multivariate analyses of population data in research on social support and health. DESIGN AND METHODS--Secondary analysis of data collected in a cross sectional survey of a multistage cluster sample of the population of the United States, designed to study relationships in behavioural, social support and health variables. Statistical models based on item response theory and graph theory were used to validate the measure of social support to be used in subsequent analyses. PARTICIPANTS--Data on 1755 men and women aged 20 to 64 years were available for the scale validation. RESULTS--Massive evidence of item bias was found for all items of a group membership subscale. The most serious problems were found in relationship to an item measuring membership in work related groups. Using that item in the social network scale in multivariate analyses would distort findings on the statistical effects of education, employment status, and household income. Evidence of item bias was also found for a sociability subscale. When marital status was included to create what is called an intimate contacts subscale, the confounding grew worse. CONCLUSIONS--The composite measure of social network is not valid and would seriously distort the findings of analyses attempting to study relationships between the index and other variables. The findings show that valid measurement is a methodological issue that must be addressed in scientific research on population health. PMID:8189179
Entanglement entropy in Fermi gases and Anderson's orthogonality catastrophe.
Ossipov, A
2014-09-26
We study the ground-state entanglement entropy of a finite subsystem of size L of an infinite system of noninteracting fermions scattered by a potential of finite range a. We derive a general relation between the scattering matrix and the overlap matrix and use it to prove that for a one-dimensional symmetric potential the von Neumann entropy, the Rényi entropies, and the full counting statistics are robust against potential scattering, provided that L/a≫1. The results of numerical calculations support the validity of this conclusion for a generic potential.
Errors of logic and scholarship concerning dissociative identity disorder.
Ross, Colin A
2009-01-01
The author reviewed a two-part critique of dissociative identity disorder published in the Canadian Journal of Psychiatry. The two papers contain errors of logic and scholarship. Contrary to the conclusions in the critique, dissociative identity disorder has established diagnostic reliability and concurrent validity, the trauma histories of affected individuals can be corroborated, and the existing prospective treatment outcome literature demonstrates improvement in individuals receiving psychotherapy for the disorder. The available evidence supports the inclusion of dissociative identity disorder in future editions of the Diagnostic and Statistical Manual of Mental Disorders.
K3EDTA Vacuum Tubes Validation for Routine Hematological Testing
Lima-Oliveira, Gabriel; Lippi, Giuseppe; Salvagno, Gian Luca; Montagnana, Martina; Poli, Giovanni; Solero, Giovanni Pietro; Picheth, Geraldo; Guidi, Gian Cesare
2012-01-01
Background and Objective. Some in vitro diagnostic devices (e.g., blood collection vacuum tubes and syringes for blood analyses) are not validated before quality laboratory managers decide to start using them or to change the brand. Frequently, laboratory or hospital managers select the vacuum tubes for blood collection based on cost considerations or brand recognition. The aim of this study was to validate two dry K3EDTA vacuum tubes of different brands for routine hematological testing. Methods. Blood specimens from 100 volunteers were collected in two different K3EDTA vacuum tubes by a single, expert phlebotomist. Routine hematological testing was done on the Advia 2120i hematology system. The significance of the differences between samples was assessed by paired Student's t-test after checking for normality. The level of statistical significance was set at P < 0.05. Results and Conclusions. The different brands of tubes evaluated can represent a clinically relevant source of variation only for mean platelet volume (MPV) and platelet distribution width (PDW). This validation will permit laboratory or hospital managers to select validated vacuum tube brands according to their own technical or economic requirements for routine hematological tests. PMID:22888448
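The tube-comparison study above rests on a paired Student's t-test: the same volunteers are measured in both tube brands, so the test operates on within-subject differences. A minimal sketch of the paired t statistic (pure Python; the sample values are invented for illustration):

```python
from math import sqrt

def paired_t(x, y):
    """Paired Student's t statistic for two measurement series taken on
    the same subjects (e.g. one analyte measured in two tube brands).
    The statistic is compared with the t distribution on n - 1 degrees
    of freedom (roughly |t| > 1.98 for n = 100 at P < 0.05, two-tailed).
    """
    n = len(x)
    d = [a - b for a, b in zip(x, y)]          # within-subject differences
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)
    return mean_d / sqrt(var_d / n)

print(paired_t([10, 12, 14, 16], [9, 11, 13, 17]))  # → 1.0
```

Pairing removes between-subject variability, which is why a single phlebotomist drawing both tubes from each volunteer gives the comparison its power.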
Comparative analysis of positive and negative attitudes toward statistics
NASA Astrophysics Data System (ADS)
Ghulami, Hassan Rahnaward; Ab Hamid, Mohd Rashid; Zakaria, Roslinazairimah
2015-02-01
Many statistics lecturers and statistics education researchers are interested in their students' attitudes toward statistics during the statistics course. A positive attitude toward statistics is vital because it encourages students to take an interest in the course and to master its core content. Students with negative attitudes toward statistics, by contrast, may feel depressed, especially in group assignments, are at risk of failure, are often highly emotional, and struggle to move forward. Therefore, this study investigates students' attitudes toward learning statistics. Six latent constructs were used to measure these attitudes: affect, cognitive competence, value, difficulty, interest, and effort. The questionnaire was adopted and adapted from the reliable and validated Survey of Attitudes towards Statistics (SATS) instrument. The study was conducted among undergraduate engineering students at Universiti Malaysia Pahang (UMP). The respondents were students from different faculties taking the applied statistics course. The analysis found the questionnaire acceptable, and the proposed relationships among the constructs were investigated. Students showed full effort to master the statistics course, found the course enjoyable, were confident in their intellectual capacity, and held more positive than negative attitudes toward learning statistics. In conclusion, positive attitudes were mostly exhibited in the affect, cognitive competence, value, interest, and effort constructs, while negative attitudes were mostly associated with the difficulty construct.
ERIC Educational Resources Information Center
Nolan, Meaghan M.; Beran, Tanya; Hecker, Kent G.
2012-01-01
Students with positive attitudes toward statistics are likely to show strong academic performance in statistics courses. Multiple surveys measuring students' attitudes toward statistics exist; however, a comparison of the validity and reliability of interpretations based on their scores is needed. A systematic review of relevant electronic…
An experimental validation of a statistical-based damage detection approach.
DOT National Transportation Integrated Search
2011-01-01
In this work, a previously developed, statistical-based damage-detection approach was validated for its ability to autonomously detect damage in bridges. The damage-detection approach uses statistical differences in the actual and predicted beha...
El Khattabi, Laïla Allach; Rouillac-Le Sciellour, Christelle; Le Tessier, Dominique; Luscan, Armelle; Coustier, Audrey; Porcher, Raphael; Bhouri, Rakia; Nectoux, Juliette; Sérazin, Valérie; Quibel, Thibaut; Mandelbrot, Laurent; Tsatsaris, Vassilis
2016-01-01
Objective NIPT for fetal aneuploidy by digital PCR has been hampered by the large number of PCR reactions needed to meet statistical requirements, preventing clinical application. Here, we designed an octoplex droplet digital PCR (ddPCR) assay which allows increasing the number of available targets and thus overcomes statistical obstacles. Method After technical optimization of the multiplex PCR on mixtures of trisomic and euploid DNA, we performed a validation study on samples of plasma DNA from 213 pregnant women. Molecular counting of circulating cell-free DNA was performed using a mix of hydrolysis probes targeting chromosome 21 and a reference chromosome. Results The results of our validation experiments showed that ddPCR detected trisomy 21 even when the sample’s trisomic DNA content is as low as 5%. In a validation study of plasma samples from 213 pregnant women, ddPCR discriminated clearly between the trisomy 21 and the euploidy groups. Conclusion Our results demonstrate that digital PCR can meet the requirements for non-invasive prenatal testing of trisomy 21. This approach is technically simple, relatively cheap, easy to implement in a diagnostic setting and compatible with ethical concerns regarding access to nucleotide sequence information. These advantages make it a potential technique of choice for population-wide screening for trisomy 21 in pregnant women. PMID:27167625
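The molecular-counting logic behind such a ddPCR readout can be sketched as a simple binomial test: under euploidy, molecules from chromosome 21 and the reference chromosome should occur in roughly equal numbers, and trisomy shifts that proportion upward. The z-score below is a simplified illustration (normal approximation; the counts are invented, and real assays must also account for fetal fraction and technical variation):

```python
from math import sqrt

def trisomy_z(n21, nref):
    """Z-score for an excess of chr21 molecules over the 1:1 expectation,
    using a normal approximation to the binomial. A large positive z
    suggests over-representation of chromosome 21 in the counted
    cell-free DNA molecules (toy model of digital-PCR NIPT counting)."""
    n = n21 + nref
    p0 = 0.5  # euploid expectation: half the counted molecules from chr21
    return (n21 - n * p0) / sqrt(n * p0 * (1 - p0))

# 5% trisomic DNA shifts 10,000 counted molecules by roughly this much
print(trisomy_z(5500, 4500))  # → 10.0
```

The sketch makes clear why large molecule counts matter: the detectable shift scales with the trisomic DNA fraction, while the noise shrinks only as the square root of the total count, which is the statistical obstacle multiplexing was designed to overcome.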
ERIC Educational Resources Information Center
O'Bryant, Monique J.
2017-01-01
The aim of this study was to validate an instrument that can be used by instructors or social scientist who are interested in evaluating statistics anxiety. The psychometric properties of the English version of the Statistical Anxiety Scale (SAS) was examined through a confirmatory factor analysis of scores from a sample of 323 undergraduate…
External validation of a Cox prognostic model: principles and methods
2013-01-01
Background A prognostic model should not enter clinical practice unless it has been demonstrated that it performs a useful role. External validation denotes evaluation of model performance in a sample independent of that used to develop the model. Unlike for logistic regression models, external validation of Cox models is sparsely treated in the literature. Successful validation of a model means achieving satisfactory discrimination and calibration (prediction accuracy) in the validation sample. Validating Cox models is not straightforward because event probabilities are estimated relative to an unspecified baseline function. Methods We describe statistical approaches to external validation of a published Cox model according to the level of published information, specifically (1) the prognostic index only, (2) the prognostic index together with Kaplan-Meier curves for risk groups, and (3) the first two plus the baseline survival curve (the estimated survival function at the mean prognostic index across the sample). The most challenging task, requiring level 3 information, is assessing calibration, for which we suggest a method of approximating the baseline survival function. Results We apply the methods to two comparable datasets in primary breast cancer, treating one as derivation and the other as validation sample. Results are presented for discrimination and calibration. We demonstrate plots of survival probabilities that can assist model evaluation. Conclusions Our validation methods are applicable to a wide range of prognostic studies and provide researchers with a toolkit for external validation of a published Cox model. PMID:23496923
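Level 2 of the scheme above compares Kaplan-Meier curves for risk groups in the validation sample. A minimal pure-Python Kaplan-Meier estimator (a sketch for illustration, not the authors' implementation):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates: returns (time, S(t)) at each
    distinct event time. `events` holds 1 for an event, 0 for censoring."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:   # gather ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            s *= 1.0 - deaths / n_at_risk          # KM product-limit step
            curve.append((t, s))
        n_at_risk -= removed                        # drop deaths and censorings
    return curve

print(kaplan_meier([1, 2, 3], [1, 1, 1]))
```

Plotting such curves per prognostic-index group in the validation sample, next to the published group curves, is exactly the kind of visual discrimination check the methods above formalize.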
Greaves, Paul; Clear, Andrew; Coutinho, Rita; Wilson, Andrew; Matthews, Janet; Owen, Andrew; Shanyinde, Milensu; Lister, T. Andrew; Calaminici, Maria; Gribben, John G.
2013-01-01
Purpose The immune microenvironment is key to the pathophysiology of classical Hodgkin lymphoma (CHL). Twenty percent of patients experience failure of their initial treatment, and others receive excessively toxic treatment. Prognostic scores and biomarkers have yet to influence outcomes significantly. Previous biomarker studies have been limited by the extent of tissue analyzed, statistical inconsistencies, and failure to validate findings. We aimed to overcome these limitations by validating recently identified microenvironment biomarkers (CD68, FOXP3, and CD20) in a new patient cohort with a greater extent of tissue and by using rigorous statistical methodology. Patients and Methods Diagnostic tissue from 122 patients with CHL was microarrayed and stained, and positive cells were counted across 10 to 20 high-powered fields per patient by using an automated system. Two statistical analyses were performed: a categorical analysis with test/validation set-defined cut points and Kaplan-Meier estimated outcome measures of 5-year overall survival (OS), disease-specific survival (DSS), and freedom from first-line treatment failure (FFTF) and an independent multivariate analysis of absolute uncategorized counts. Results Increased CD20 expression confers superior OS. Increased FOXP3 expression confers superior OS, and increased CD68 confers inferior FFTF and OS. FOXP3 varies independently of CD68 expression and retains significance when analyzed as a continuous variable in multivariate analysis. A simple score combining FOXP3 and CD68 discriminates three groups: FFTF 93%, 62%, and 47% (P < .001), DSS 93%, 82%, and 63% (P = .03), and OS 93%, 82%, and 59% (P = .002). Conclusion We have independently validated CD68, FOXP3, and CD20 as prognostic biomarkers in CHL, and we demonstrate, to the best of our knowledge for the first time, that combining FOXP3 and CD68 may further improve prognostic stratification. PMID:23045593
Martin, Kevin D; Amendola, Annunziato; Phisitkul, Phinit
2016-01-01
Abstract Purpose Orthopedic education continues to move towards an evidence-based curriculum in order to comply with new residency accreditation mandates. There are currently three high-fidelity arthroscopic virtual reality (VR) simulators available, each with multiple instructional modules and simulated arthroscopic procedures. The aim of the current study was to assess the face validity, defined as the degree to which a procedure appears effective in terms of its stated aims, of the three available VR simulators. Methods Thirty subjects were recruited from a single orthopedic residency training program. Each subject completed one training session on each of the three leading VR arthroscopic simulators (ARTHRO Mentor-Symbionix, ArthroS-Virtamed, and ArthroSim-Toltech). Each arthroscopic session involved simulator-specific modules. After the training sessions, subjects completed a previously validated simulator questionnaire for face validity. Results The median external appearance ratings for the ARTHRO Mentor (9.3, range 6.7-10.0; p=0.0036) and ArthroS (9.3, range 7.3-10.0; p=0.0003) were statistically higher than for ArthroSim (6.7, range 3.3-9.7). There was no statistical difference in intraarticular appearance, instrument appearance, or user friendliness between the three groups. Most simulators reached an appropriate proportion of sufficient scores for each category (≥70%), except for ARTHRO Mentor (intraarticular appearance, 50%; instrument appearance, 61.1%) and ArthroSim (external appearance, 50%; user friendliness, 68.8%). Conclusion These results demonstrate that ArthroS has the highest overall face validity of the three current arthroscopic VR simulators. However, only the external appearance of ArthroS reached statistical significance when compared with the other simulators. Additionally, each simulator had satisfactory intraarticular quality. This study furthers the understanding of VR simulation and the features necessary for accurate arthroscopic representation, and provides objective data for educators selecting equipment to best facilitate residency training. PMID:27528830
CARVALHO, Suzana Papile Maciel; BRITO, Liz Magalhães; de PAIVA, Luiz Airton Saavedra; BICUDO, Lucilene Arilho Ribeiro; CROSATO, Edgard Michel; de OLIVEIRA, Rogério Nogueira
2013-01-01
Validation studies of physical anthropology methods in different population groups are extremely important, especially in cases in which population variations may cause problems in the identification of a native individual by the application of norms developed for different communities. Objective This study aimed to estimate the gender of skeletons by application of the method of Oliveira et al. (1995), previously used in a population sample from Northeast Brazil. Material and Methods The accuracy of this method was assessed for a population from Southeast Brazil and validated by statistical tests. The method used two mandibular measurements, namely the bigonial distance and the mandibular ramus height. The sample was composed of 66 skulls and the method was applied by two examiners. The results were statistically analyzed by the paired t test, logistic discriminant analysis and logistic regression. Results The results demonstrated that the application of the method of Oliveira et al. (1995) in this population achieved very different outcomes between genders, with 100% accuracy for females and only 11% for males, which may be explained by ethnic differences. However, statistical adjustment of the measurement data for the population analyzed allowed accuracies of 76.47% for males and 78.13% for females, with the creation of a new discriminant formula. Conclusion It was concluded that methods involving physical anthropology present a high rate of accuracy for human identification, easy application, low cost and simplicity; however, the methodologies must be validated for different populations due to differences in ethnic patterns, which are directly related to phenotypic aspects. In this specific case, the method of Oliveira et al. (1995) presented good accuracy and may be used for gender estimation in two geographic regions of Brazil, namely Northeast and Southeast; however, for other regions of the country (North, Central West and South), prior methodological adjustment is recommended, as demonstrated in this study. PMID:24037076
Content-based VLE designs improve learning efficiency in constructivist statistics education.
Wessa, Patrick; De Rycker, Antoon; Holliday, Ian Edward
2011-01-01
We introduced a series of computer-supported workshops in our undergraduate statistics courses, in the hope that it would help students to gain a deeper understanding of statistical concepts. This raised questions about the appropriate design of the Virtual Learning Environment (VLE) in which such an approach had to be implemented. Therefore, we investigated two competing software design models for VLEs. In the first system, all learning features were a function of the classical VLE. The second system was designed from the perspective that learning features should be a function of the course's core content (statistical analyses), which required us to develop a specific-purpose Statistical Learning Environment (SLE) based on Reproducible Computing and newly developed Peer Review (PR) technology. The main research question is whether the second VLE design improved learning efficiency as compared to the standard type of VLE design that is commonly used in education. As a secondary objective we provide empirical evidence about the usefulness of PR as a constructivist learning activity which supports non-rote learning. Finally, this paper illustrates that it is possible to introduce a constructivist learning approach in large student populations, based on adequately designed educational technology, without subsuming educational content to technological convenience. Both VLE systems were tested within a two-year quasi-experiment based on a Reliable Nonequivalent Group Design. This approach allowed us to draw valid conclusions about the treatment effect of the changed VLE design, even though the systems were implemented in successive years. The methodological aspects about the experiment's internal validity are explained extensively. The effect of the design change is shown to have substantially increased the efficiency of constructivist, computer-assisted learning activities for all cohorts of the student population under investigation. The findings demonstrate that a content-based design outperforms the traditional VLE-based design.
Survey of statistical techniques used in validation studies of air pollution prediction models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bornstein, R D; Anderson, S F
1979-03-01
Statistical techniques used by meteorologists to validate predictions made by air pollution models are surveyed. Techniques are divided into the following three groups: graphical, tabular, and summary statistics. Some of the practical problems associated with verification are also discussed. Characteristics desired in any validation program are listed and a suggested combination of techniques that possesses many of these characteristics is presented.
An Easy Tool to Predict Survival in Patients Receiving Radiation Therapy for Painful Bone Metastases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Westhoff, Paulien G., E-mail: p.g.westhoff@umcutrecht.nl; Graeff, Alexander de; Monninkhof, Evelyn M.
2014-11-15
Purpose: Patients with bone metastases have a widely varying survival. A reliable estimation of survival is needed for appropriate treatment strategies. Our goal was to assess the value of simple prognostic factors, namely, patient and tumor characteristics, Karnofsky performance status (KPS), and patient-reported scores of pain and quality of life, to predict survival in patients with painful bone metastases. Methods and Materials: In the Dutch Bone Metastasis Study, 1157 patients were treated with radiation therapy for painful bone metastases. At randomization, physicians determined the KPS; patients rated general health on a visual analogue scale (VAS-gh), valuation of life on a verbal rating scale (VRS-vl), and pain intensity. To assess the predictive value of the variables, we used multivariate Cox proportional hazard analyses and C-statistics for discriminative value. Of the final model, calibration was assessed. External validation was performed on a dataset of 934 patients who were treated with radiation therapy for vertebral metastases. Results: Patients had mainly breast (39%), prostate (23%), or lung cancer (25%). After a maximum of 142 weeks' follow-up, 74% of patients had died. The best predictive model included sex, primary tumor, visceral metastases, KPS, VAS-gh, and VRS-vl (C-statistic = 0.72, 95% CI = 0.70-0.74). A reduced model, with only KPS and primary tumor, showed comparable discriminative capacity (C-statistic = 0.71, 95% CI = 0.69-0.72). External validation showed a C-statistic of 0.72 (95% CI = 0.70-0.73). Calibration of the derivation and the validation dataset showed underestimation of survival. Conclusion: In predicting survival in patients with painful bone metastases, KPS combined with primary tumor was comparable to a more complex model. Considering the number of variables in complex models and the additional burden on patients, the simple model is preferred for daily use. In addition, a risk table for survival is provided.
Validation of a literature-based adherence score to Mediterranean diet: the MEDI-LITE score.
Sofi, Francesco; Dinu, Monica; Pagliai, Giuditta; Marcucci, Rossella; Casini, Alessandro
2017-09-01
Numerous studies have demonstrated a relationship between adherence to the Mediterranean diet and prevention of chronic degenerative diseases. The aim of this study was to validate a novel instrument to measure adherence to the Mediterranean diet based on the literature (the MEDI-LITE score). Two hundred and four clinically healthy subjects completed both the MEDI-LITE score and the validated MedDietScore (MDS). A significant positive correlation between the MEDI-LITE and MDS scores was found in the study population (R = .70; p < .0001). Furthermore, statistically significant positive correlations were found for all nine food groups. According to the receiver operating characteristic (ROC) curve analysis, the MEDI-LITE score showed significant discriminative capacity between adherents and non-adherents to the Mediterranean diet pattern (optimal cut-off point = 8.50; sensitivity = 96%; specificity = 38%). In conclusion, our findings show that the MEDI-LITE score correlates well with the MDS, both in the global score and in most of the items related to specific food categories.
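The ROC cut-off reported above trades sensitivity against specificity at a single threshold. A small sketch of how both are computed for a score cut-off (toy scores and labels, not the study data):

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity of a score at a given cut-off.
    labels: 1 = truly adherent, 0 = truly non-adherent; a subject is
    classified as adherent when score >= cutoff."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

print(sens_spec([9, 10, 3, 2], [1, 1, 0, 0], 8.5))  # → (1.0, 1.0)
```

Sweeping the cut-off and plotting sensitivity against 1 - specificity yields the ROC curve from which an optimal point such as 8.50 is chosen; the low specificity (38%) at high sensitivity (96%) above shows how asymmetric that trade-off can be.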
Assessment Methodology for Process Validation Lifecycle Stage 3A.
Sayeed-Desta, Naheed; Pazhayattil, Ajay Babu; Collins, Jordan; Chen, Shu; Ingram, Marzena; Spes, Jana
2017-07-01
The paper introduces evaluation methodologies and associated statistical approaches for process validation lifecycle Stage 3A. The assessment tools proposed can be applied to newly developed and launched small molecule as well as bio-pharma products, where substantial process and product knowledge has been gathered. The following elements may be included in Stage 3A: number of 3A batch determination; evaluation of critical material attributes, critical process parameters, critical quality attributes; in vivo in vitro correlation; estimation of inherent process variability (IPV) and PaCS index; process capability and quality dashboard (PCQd); and enhanced control strategy. US FDA guidance on Process Validation: General Principles and Practices, January 2011 encourages applying previous credible experience with suitably similar products and processes. A complete Stage 3A evaluation is a valuable resource for product development and future risk mitigation of similar products and processes. Elements of 3A assessment were developed to address industry and regulatory guidance requirements. The conclusions made provide sufficient information to make a scientific and risk-based decision on product robustness.
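Among the Stage 3A elements listed above, process capability has a standard quantitative form, the Cp/Cpk indices. A minimal sketch (the specification limits and data are invented for illustration; the paper's own IPV and PCQd metrics are not reproduced here):

```python
def process_capability(values, lsl, usl):
    """Process capability indices from sample data.
    Cp compares the specification width with the process spread;
    Cpk additionally penalizes off-center processes."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    cp = (usl - lsl) / (6 * sd)
    cpk = min(usl - mean, mean - lsl) / (3 * sd)
    return cp, cpk

# a perfectly centered process has Cpk equal to Cp
print(process_capability([9, 10, 11, 10], lsl=7, usl=13))
```

When the sample mean drifts toward either specification limit, Cpk drops below Cp, which is why capability dashboards typically track both.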
NASA Technical Reports Server (NTRS)
Klimas, Alex; Uritsky, Vadim; Donovan, Eric
2010-01-01
We provide indirect evidence for turbulent reconnection in Earth's midtail plasma sheet by reexamining the statistical properties of bright, nightside auroral emission events as observed by the UVI experiment on the Polar spacecraft and discussed previously by Uritsky et al. The events are divided into two groups: (1) those that map to |X_GSM| < 12 R_E in the magnetotail and do not show scale-free statistics and (2) those that map to |X_GSM| > 12 R_E and do show scale-free statistics. The |X_GSM| dependence is shown to most effectively organize the events into these two groups. Power law exponents obtained for group 2 are shown to validate the conclusions of Uritsky et al. concerning the existence of critical dynamics in the auroral emissions. It is suggested that the auroral dynamics is a reflection of a critical state in the magnetotail that is based on the dynamics of turbulent reconnection in the midtail plasma sheet.
Risk prediction score for death of traumatised and injured children
2014-01-01
Background Injury prediction scores facilitate the development of clinical management protocols to decrease mortality. However, most of the previously developed scores are limited in scope and are non-specific for use in children. We aimed to develop and validate a risk prediction model of death for injured and traumatised Thai children. Methods Our cross-sectional study included 43,516 injured children from 34 emergency services. A risk prediction model was derived using a logistic regression analysis that included 15 predictors. Model performance was assessed using the concordance statistic (C-statistic) and the observed per expected (O/E) ratio. Internal validation of the model was performed using a 200-repetition bootstrap analysis. Results Death occurred in 1.7% of the injured children (95% confidence interval [95% CI]: 1.57–1.82). Ten predictors (i.e., age, airway intervention, physical injury mechanism, three injured body regions, the Glasgow Coma Scale, and three vital signs) were significantly associated with death. The C-statistic and the O/E ratio were 0.938 (95% CI: 0.929–0.947) and 0.86 (95% CI: 0.70–1.02), respectively. The scoring scheme classified three risk stratifications with respective likelihood ratios of 1.26 (95% CI: 1.25–1.27), 2.45 (95% CI: 2.42–2.52), and 4.72 (95% CI: 4.57–4.88) for low, intermediate, and high risks of death. Internal validation showed good model performance (C-statistic = 0.938, 95% CI: 0.926–0.952) and a small calibration bias of 0.002 (95% CI: 0.0005–0.003). Conclusions We developed a simplified Thai pediatric injury death prediction score with satisfactory calibrated and discriminative performance in emergency room settings. PMID:24575982
Statistical Validation for Clinical Measures: Repeatability and Agreement of Kinect™-Based Software.
Lopez, Natalia; Perez, Elisa; Tello, Emanuel; Rodrigo, Alejandro; Valentinuzzi, Max E
2018-01-01
The rehabilitation process is a fundamental stage in the recovery of people's capabilities. However, evaluation of the process is performed by physiatrists and medical doctors mostly on the basis of their observations, that is, a subjective appreciation of the patient's evolution. This paper proposes a platform that tracks the movement of an individual's upper limb using Kinect sensor(s), to be used with patients during the rehabilitation process. The main contribution is the development of quantifying software and the statistical validation of its performance, repeatability, and clinical use in the rehabilitation process. The software determines joint angles and upper limb trajectories for the construction of a specific rehabilitation protocol and quantifies the treatment evolution. In turn, the information is presented via a graphical interface that allows the recording, storage, and reporting of the patient's data. For clinical purposes, the software's measurements are statistically validated with three different methodologies, comparing them with a goniometer in terms of agreement and repeatability. The agreement of joint angles measured with the proposed software and the goniometer is evaluated with Bland-Altman plots; all measurements fell well within the limits of agreement, implying interchangeability of the two techniques. Additionally, the Bland-Altman analysis of repeatability shows 95% confidence. Finally, the physiotherapists' qualitative assessment shows encouraging results for clinical use. The main conclusion is that the software is capable of offering a clinical history of the patient and is useful for quantifying rehabilitation success. Its simplicity, low cost, and visualization possibilities enhance the use of Kinect-based software for rehabilitation and other applications, and the experts' opinion endorses the choice of our approach for clinical practice.
Comparison of the new measurement technique with established goniometric methods determines that the proposed software agrees sufficiently to be used interchangeably.
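The Bland-Altman agreement analysis applied above can be sketched as follows; the paired angle readings are hypothetical, not the study's measurements.

```python
import statistics

def bland_altman_limits(a, b):
    """Bland-Altman analysis: bias (mean difference) and the 95% limits
    of agreement, bias +/- 1.96 * SD of the paired differences."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical paired elbow-angle readings (degrees)
software   = [90.2, 45.1, 120.4, 60.3, 150.8, 30.2]
goniometer = [89.5, 46.0, 119.8, 61.0, 150.0, 31.0]
bias, low, high = bland_altman_limits(software, goniometer)
```

If roughly 95% of the paired differences fall inside the limits of agreement, and those limits are clinically acceptable, the two instruments can be used interchangeably, which is the criterion the study applies.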
Dynamic TIMI Risk Score for STEMI
Amin, Sameer T.; Morrow, David A.; Braunwald, Eugene; Sloan, Sarah; Contant, Charles; Murphy, Sabina; Antman, Elliott M.
2013-01-01
Background Although there are multiple methods of risk stratification for ST‐elevation myocardial infarction (STEMI), this study presents a prospectively validated method for reclassification of patients based on in‐hospital events. A dynamic risk score provides an initial risk stratification and reassessment at discharge. Methods and Results The dynamic TIMI risk score for STEMI was derived in ExTRACT‐TIMI 25 and validated in TRITON‐TIMI 38. Baseline variables were from the original TIMI risk score for STEMI. New variables were major clinical events occurring during the index hospitalization. Each variable was tested individually in a univariate Cox proportional hazards regression. Variables with P<0.05 were incorporated into a full multivariable Cox model to assess the risk of death at 1 year. Each variable was assigned an integer value based on the odds ratio, and the final score was the sum of these values. The dynamic score included the development of in‐hospital MI, arrhythmia, major bleed, stroke, congestive heart failure, recurrent ischemia, and renal failure. The C‐statistic produced by the dynamic score in the derivation database was 0.76, with a net reclassification improvement (NRI) of 0.33 (P<0.0001) from the inclusion of dynamic events to the original TIMI risk score. In the validation database, the C‐statistic was 0.81, with a NRI of 0.35 (P=0.01). Conclusions This score is a prospectively derived, validated means of estimating 1‐year mortality of STEMI at hospital discharge and can serve as a clinically useful tool. By incorporating events during the index hospitalization, it can better define risk and help to guide treatment decisions. PMID:23525425
NASA Astrophysics Data System (ADS)
Oberlack, Martin; Rosteck, Andreas; Avsarkisov, Victor
2013-11-01
Textbook knowledge proclaims that Lie symmetries such as the Galilean transformation lie at the heart of fluid dynamics. These important properties also carry over to the statistical description of turbulence, i.e. to the Reynolds stress transport equations and their generalization, the multi-point correlation equations (MPCE). Interestingly, the MPCE admit a much larger set of symmetries, in fact infinite-dimensional, subsequently named statistical symmetries. Most importantly, these new symmetries have important consequences for our understanding of turbulent scaling laws. The symmetries form the essential foundation for constructing exact solutions to the infinite set of MPCE, which in turn are identified as classical and new turbulent scaling laws. Examples of various classical and new shear flow scaling laws, including higher-order moments, will be presented. Even new scaling laws have been forecast from these symmetries and in turn validated by DNS. Turbulence modellers have implicitly recognized at least one of the statistical symmetries, as it is the basis for the usual log law, which has been employed for calibrating essentially all engineering turbulence models. An obvious conclusion is to make turbulence models generally consistent with the new statistical symmetries.
Zamanzadeh, Vahid; Ghahramanian, Akram; Rassouli, Maryam; Abbaszadeh, Abbas; Alavi-Majd, Hamid; Nikanfar, Ali-Reza
2015-01-01
Introduction: The importance of content validity in instrument psychometrics, and its relevance to reliability, have made it an essential step in instrument development. This article gives an overview of the content validity process and explains the complexity of this process through an example. Methods: We carried out a methodological study to examine the content validity of a patient-centered communication instrument through a two-step process (development and judgment). The first step comprised domain determination, sampling (item generation), and instrument formation; in the second step, the content validity ratio, content validity index, and modified kappa statistic were computed. Suggestions of the expert panel and item impact scores were used to examine the instrument's face validity. Results: From a set of 188 items, the content validity process identified seven dimensions: trust building (eight items), informational support (seven items), emotional support (five items), problem solving (seven items), patient activation (10 items), intimacy/friendship (six items), and spirituality strengthening (14 items). The content validity study revealed that this instrument enjoys an appropriate level of content validity. The overall content validity index of the instrument using the universal agreement approach was low; however, the instrument can be advocated given the high number of content experts, which makes consensus difficult, and the high value of the S-CVI under the average approach, which was equal to 0.93. Conclusion: This article illustrates acceptable quantitative indices for the content validity of a new instrument and outlines them during the design and psychometric evaluation of a patient-centered communication measuring instrument. PMID:26161370
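The content validity ratio and the averaged scale-level content validity index (S-CVI/Ave) mentioned above follow simple closed-form definitions; this sketch uses Lawshe's CVR formula and hypothetical expert ratings.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR = (n_e - N/2) / (N/2), ranging from -1 to +1."""
    return (n_essential - n_experts / 2) / (n_experts / 2)

def scvi_average(item_cvis):
    """S-CVI/Ave: the mean of the item-level CVIs (I-CVI = proportion
    of experts rating the item relevant)."""
    return sum(item_cvis) / len(item_cvis)

# Hypothetical panel of 10 experts, 9 of whom rate an item "essential"
cvr = content_validity_ratio(9, 10)         # 0.8
scvi = scvi_average([1.0, 0.9, 0.9, 0.8])   # approximately 0.9
```

As in the abstract, an S-CVI/Ave near 0.9 can support an instrument even when the stricter universal-agreement S-CVI is low for large panels.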
A prospective study of the validity of self-reported use of specific types of dental services.
Gilbert, Gregg H; Rose, John S; Shelton, Brent J
2003-01-01
The purpose of this study was to quantify the validity of self-reported receipt of dental services in 10 categories, using information from dental charts as the "gold standard." The Florida Dental Care Study was a prospective cohort study of a diverse sample of adults. In-person interviews were conducted at baseline and at 24 and 48 months following baseline, with telephone interviews at six-month intervals in between. Participants reported new dental visits, reason(s) for the visit(s), and specific service(s) received. For the present study, self-reported data were compared with data from patients' dental charts. Percent concordance between self-report and dental charts ranged from 82% to 100%, while Kappa values ranged from 0.33 to 0.91. Bivariate multiple logistic regressions were performed for each of the service categories, with two outcomes: self-reported service receipt and service receipt determined from the dental chart. Parameter estimate intervals overlapped for each of the four hypothesized predictors of service receipt (age group, sex, "race" defined as non-Hispanic African American vs. non-Hispanic white, and annual household income < 20,000 US dollars vs. > or = 20,000 US dollars), although for five of the 10 service categories, there were differences in conclusions about statistical significance for certain predictors. The validity of self-reported use of dental services ranged from poor to excellent, depending upon the service type. Regression estimates using either the self-reported or chart-validated measure yielded similar results overall, but conclusions about key predictors of service use differed in some instances. Self-reported dental service use is valid for some, but not all, service types.
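The reported Kappa values measure agreement between self-report and chart data corrected for chance; a minimal sketch for binary service categories, with hypothetical data:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two binary ratings: chance-corrected agreement,
    (p_observed - p_expected) / (1 - p_expected)."""
    n = len(a)
    po = sum(1 for x, y in zip(a, b) if x == y) / n   # observed agreement
    pa = sum(a) / n
    pb = sum(b) / n
    pe = pa * pb + (1 - pa) * (1 - pb)                # agreement expected by chance
    return (po - pe) / (1 - pe)

# Hypothetical: 1 = service received, per self-report vs. dental chart
self_report = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
chart       = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
kappa = cohens_kappa(self_report, chart)  # approximately 0.8
```

Note that, as in the study, percent concordance can be high (here 90%) while kappa varies widely across service types, because kappa discounts agreement expected by chance.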
Statistical validation of normal tissue complication probability models.
Xu, Cheng-Jian; van der Schaaf, Arjen; Van't Veld, Aart A; Langendijk, Johannes A; Schilstra, Cornelis
2012-09-01
To investigate the applicability and value of double cross-validation and permutation tests as established statistical approaches in the validation of normal tissue complication probability (NTCP) models. A penalized regression method, LASSO (least absolute shrinkage and selection operator), was used to build NTCP models for xerostomia after radiation therapy treatment of head-and-neck cancer. Model assessment was based on the likelihood function and the area under the receiver operating characteristic curve. Repeated double cross-validation showed the uncertainty and instability of the NTCP models and indicated that the statistical significance of model performance can be obtained by permutation testing. Repeated double cross-validation and permutation tests are recommended to validate NTCP models before clinical use. Copyright © 2012 Elsevier Inc. All rights reserved.
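A permutation test of model performance, as recommended above, can be sketched like this: the outcome labels are shuffled to build the null distribution of the AUC. The data, the AUC implementation, and the permutation count are illustrative assumptions, not the authors' code.

```python
import random

def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_p_value(y_true, y_score, n_perm=2000, seed=1):
    """Shuffle labels to form the null AUC distribution; the p-value is the
    fraction of permuted AUCs at least as large as the observed AUC."""
    rng = random.Random(seed)
    observed = auc(y_true, y_score)
    labels = list(y_true)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if auc(labels, y_score) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical: 1 = xerostomia; scores = predicted complication probability
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
p_value = permutation_p_value(y, scores)
```

A small p-value indicates that the model's apparent discrimination is unlikely to arise by chance, which is the check the abstract recommends before clinical use.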
The effects of BleedArrest on hemorrhage control in a porcine model.
Gegel, Brian; Burgert, James; Loughren, Michael; Johnson, Don
2012-01-01
The purpose of this study was to examine the effectiveness of the hemostatic agent BleedArrest compared to control. This was a prospective, experimental design employing an established porcine model of uncontrolled hemorrhage. The minimum number of animals (n=10 per group) was used to obtain a statistically valid result. There were no statistically significant differences between the groups (P>.05) indicating that the groups were equivalent on the following parameters: activating clotting time, the subject weights, core body temperatures, amount of one minute hemorrhage, arterial blood pressures, and the amount and percentage of total blood volume. There were significant differences in the amount of hemorrhage (P=.033) between the BleedArrest (mean=72, SD±72 mL) and control (mean=317.30, SD±112.02 mL). BleedArrest is statistically and clinically superior at controlling hemorrhage compared to the standard pressure dressing control group. In conclusion, BleedArrest is an effective hemostatic agent for use in civilian and military trauma management.
[Health-related behavior in a sample of Brazilian college students: gender differences].
Colares, Viviane; Franca, Carolina da; Gonzalez, Emília
2009-03-01
This study investigated whether undergraduate students' health-risk behaviors differed according to gender. The sample consisted of 382 subjects, aged 20-29 years, from public universities in Pernambuco State, Brazil. Data were collected using the National College Health Risk Behavior Survey, previously validated in Portuguese. Descriptive and inferential statistical techniques were used. Associations were analyzed with the chi-square test or Fisher's exact test. Statistical significance was set at p < or = 0.05. In general, females engaged in the following risk behaviors less frequently than males: alcohol consumption (p = 0.005), smoking (p = 0.002), experimenting with marijuana (p = 0.002), consumption of inhalants (p < or = 0.001), steroid use (p = 0.003), carrying weapons (p = 0.001), and involvement in physical fights (p = 0.014). Meanwhile, female students displayed more concern about losing or maintaining weight, although they exercised less frequently than males. The findings thus showed statistically different health behaviors between genders. In conclusion, different approaches need to be used for the two genders.
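The chi-square test of association used in the survey analysis reduces, for a 2x2 table, to a closed-form statistic; the counts below are hypothetical, not the study's data.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1df(x):
    """Upper-tail p-value for chi-square with 1 df (a chi-square with
    1 df is the square of a standard normal)."""
    return math.erfc(math.sqrt(x / 2))

# Hypothetical counts: rows = male/female, columns = smoker/non-smoker
stat = chi_square_2x2(40, 60, 20, 80)
p = p_value_1df(stat)  # significant at the study's p <= 0.05 threshold
```

When expected cell counts are small, Fisher's exact test is used instead, as the abstract notes.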
Barrio, P A; Crespillo, M; Luque, J A; Aler, M; Baeza-Richer, C; Baldassarri, L; Carnevali, E; Coufalova, P; Flores, I; García, O; García, M A; González, R; Hernández, A; Inglés, V; Luque, G M; Mosquera-Miguel, A; Pedrosa, S; Pontes, M L; Porto, M J; Posada, Y; Ramella, M I; Ribeiro, T; Riego, E; Sala, A; Saragoni, V G; Serrano, A; Vannelli, S
2018-07-01
One of the main goals of the Spanish and Portuguese-Speaking Group of the International Society for Forensic Genetics (GHEP-ISFG) is to promote and contribute to the development and dissemination of scientific knowledge in the field of forensic genetics. To this end, GHEP-ISFG runs several working commissions that develop activities on scientific aspects of general interest. One of them, the Mixture Commission of GHEP-ISFG, has organized annually, since 2009, a collaborative exercise on the analysis and interpretation of autosomal short tandem repeat (STR) mixture profiles. To date, six exercises have been organized. In the present edition (GHEP-MIX06), with 25 participating laboratories, the main aim of the exercise was to assess mixture profile results reported for a proposed complex mock case. One of the conclusions obtained from this exercise is the increasing tendency of participating laboratories to validate DNA mixture profile analysis following international recommendations. However, the results have shown some differences among laboratories regarding both the editing and the interpretation of mixture profiles. Besides, although the last revision of ISO/IEC 17025:2017 gives indications of how results should be reported, not all laboratories strictly follow its recommendations. Regarding the statistical aspect, all laboratories that performed a statistical evaluation of the data employed the likelihood ratio (LR) as the parameter to evaluate statistical compatibility. However, the LR values obtained show a wide range of variation. This fact cannot be attributed to the software employed, since the vast majority of laboratories that performed LR calculations used the same software (LRmixStudio). Thus, the final allelic composition of the edited mixture profile and the parameters employed in the software could explain this data dispersion.
This highlights the need for each laboratory to define, through internal validations, its criteria for editing and interpreting mixtures, and to train continuously in software handling. Copyright © 2018 Elsevier B.V. All rights reserved.
Knowledge of the pelvic floor in nulliparous women
Neels, Hedwig; Wyndaele, Jean-Jacques; Tjalma, Wiebren A. A.; De Wachter, Stefan; Wyndaele, Michel; Vermandel, Alexandra
2016-01-01
[Purpose] Proper pelvic floor function is important to avoid serious dysfunctions including incontinence, prolapse, and sexual problems. The current study evaluated the knowledge of young nulliparous women about their pelvic floor and identified what additional information they wanted. [Subjects and Methods] In this cross-sectional survey, a validated 36-item questionnaire was distributed to 212 nulliparous women. The questionnaire addressed demography, pelvic floor muscles, pelvic floor dysfunction, and possible information sources. Descriptive statistics were generated for all variables. Stability and validity testing were performed using Kappa statistics and intraclass correlation coefficients to define agreement for each question. The study was approved by the Ethics Committee (B300201318334). [Results] Using a VAS scale (0 to 10), the women rated their knowledge about the pelvic floor at a mean of 2.4 (SD 2.01). A total of 93% of the women were insufficiently informed and requested more information; 25% had concerns about developing urinary incontinence, and 14% about fecal incontinence. Many of the women were unaware of what pelvic floor training meant. [Conclusion] There was a significant lack of knowledge about pelvic floor function among nulliparous women. The majority of nulliparous women expressed a need for education, which might offer a way to reduce dysfunction. PMID:27313364
Neman, R
1975-03-01
The Zigler and Seitz (1975) critique was carefully examined with respect to the conclusions of the Neman et al. (1975) study. Particular attention was given to the following questions: (a) did experimenter bias or commitment account for the results, (b) were unreliable and invalid psychometric instruments used, (c) were the statistical analyses insufficient or incorrect, (d) did the results reflect no more than the operation of chance, and (e) were the results biased by artifactually inflated profile scores? Experimenter bias and commitment were shown to be insufficient to account for the results; a further review of Buros (1972) showed that there was no need for apprehension about the testing instruments; the statistical analyses were shown to exceed prevailing standards for research reporting; the results were shown to reflect valid findings at the .05 probability level; and the Neman et al. (1975) results for the profile measure were equally significant using either "raw" neurological scores or "scaled" neurological age scores. Zigler, Seitz, and I agreed on the needs for (a) using multivariate analyses, where applicable, in studies having more than one dependent variable; (b) defining the population for which sensorimotor training procedures may be appropriately prescribed; and (c) validating the profile measure as a tool to assess neurological disorganization.
Tuğcu-Demiröz, Fatmanur; Gonzalez-Alvarez, Isabel; Gonzalez-Alvarez, Marta; Bermejo, Marival
2014-10-01
The aim of the present study was to develop a method for measuring water flux reabsorption in Doluisio's perfusion technique, based on the use of phenol red as a non-absorbable marker, and to validate it by comparison with the gravimetric procedure. The compounds selected for the study were metoprolol, atenolol, cimetidine and cefadroxil, in order to include low-, intermediate- and high-permeability drugs absorbed by passive diffusion and by carrier-mediated mechanisms. The intestinal permeabilities (Peff) of the drugs were obtained in male and female Wistar rats and calculated using both methods of water flux correction. The absorption rate coefficients of all the assayed compounds did not show statistically significant differences between male and female rats; consequently, all the individual values were combined to compare the reabsorption methods. The absorption rate coefficients and permeability values did not show statistically significant differences between the two strategies of concentration correction. The apparent zero-order water absorption coefficients were also similar in both correction procedures. In conclusion, the gravimetric and phenol red methods for water reabsorption correction are accurate and interchangeable for permeability estimation in the closed loop perfusion method. Copyright © 2014 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Idris, Khairiani; Yang, Kai-Lin
2017-01-01
This article reports the results of a mixed-methods approach to develop and validate an instrument to measure Indonesian pre-service teachers' conceptions of statistics. First, a phenomenographic study involving a sample of 44 participants uncovered six categories of conceptions of statistics. Second, an instrument of conceptions of statistics was…
[Psychometric properties and diagnostic value of 'lexical screening for aphasias'].
Pena-Chavez, R; Martinez-Jimenez, L; Lopez-Espinoza, M
2014-09-16
INTRODUCTION. Language assessment in persons with brain injury makes it possible to know whether they require language rehabilitation or not. Given the importance of a precise evaluation, assessment instruments must be valid and reliable, so as to avoid mistaken and subjective diagnoses. AIM. To validate 'lexical screening for aphasias' in a sample of 58 Chilean individuals. SUBJECTS AND METHODS. A screening-type language test, lasting 20 minutes and based on the lexical processing model devised by Patterson and Shewell (1987), was constructed. The sample was made up of two groups containing 29 aphasic subjects and 29 control subjects from different health centres in the regions of Biobio and Maule, Chile. Their ages ranged between 24 and 79 years, and they had between 0 and 17 years of schooling. Tests were carried out to determine discriminant validity, concurrent validity with the aphasia disorder assessment battery, reliability, sensitivity and specificity. RESULTS. The statistical analysis showed high discriminant validity (p < 0.001), acceptable mean concurrent validity with the aphasia disorder assessment battery (rs = 0.65), high mean reliability (alpha = 0.87), moderate mean sensitivity (69%) and high mean specificity (86%). CONCLUSION. 'Lexical screening for aphasias' is valid and reliable for assessing language in persons with aphasias; it is sensitive for detecting aphasic subjects and is specific for precluding language disorders in persons with normal language abilities.
Risk-based Methodology for Validation of Pharmaceutical Batch Processes.
Wiles, Frederick
2013-01-01
In January 2011, the U.S. Food and Drug Administration published new process validation guidance for pharmaceutical processes. The new guidance debunks the long-held industry notion that three consecutive validation batches or runs are all that are required to demonstrate that a process is operating in a validated state. Instead, the new guidance now emphasizes that the level of monitoring and testing performed during process performance qualification (PPQ) studies must be sufficient to demonstrate statistical confidence both within and between batches. In some cases, three qualification runs may not be enough. Nearly two years after the guidance was first published, little has been written defining a statistical methodology for determining the number of samples and qualification runs required to satisfy Stage 2 requirements of the new guidance. This article proposes using a combination of risk assessment, control charting, and capability statistics to define the monitoring and testing scheme required to show that a pharmaceutical batch process is operating in a validated state. In this methodology, an assessment of process risk is performed through application of a process failure mode, effects, and criticality analysis (PFMECA). The output of PFMECA is used to select appropriate levels of statistical confidence and coverage which, in turn, are used in capability calculations to determine when significant Stage 2 (PPQ) milestones have been met. The achievement of Stage 2 milestones signals the release of batches for commercial distribution and the reduction of monitoring and testing to commercial production levels. Individuals, moving range, and range/sigma charts are used in conjunction with capability statistics to demonstrate that the commercial process is operating in a state of statistical control. The new process validation guidance published by the U.S. 
Food and Drug Administration in January of 2011 indicates that the number of process validation batches or runs required to demonstrate that a pharmaceutical process is operating in a validated state should be based on sound statistical principles. The old rule of "three consecutive batches and you're done" is no longer sufficient. The guidance, however, does not provide any specific methodology for determining the number of runs required, and little has been published to augment this shortcoming. The paper titled "Risk-based Methodology for Validation of Pharmaceutical Batch Processes" describes a statistically sound methodology for determining when a statistically valid number of validation runs has been acquired based on risk assessment and calculation of process capability.
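A capability statistic of the kind the methodology relies on is the Cpk index, which relates process spread to the specification limits; the assay mean, standard deviation, and specification limits below are illustrative assumptions, not values from the article.

```python
def cpk(mean, sd, lsl, usl):
    """Process capability index: distance from the process mean to the
    nearer specification limit, in units of three standard deviations."""
    return min(usl - mean, mean - lsl) / (3 * sd)

# Hypothetical assay with specification limits of 95-105% of label claim
value = cpk(100.5, 1.2, 95.0, 105.0)  # = 4.5 / 3.6 = 1.25
```

A commonly used acceptance threshold is Cpk >= 1.33; in the risk-based scheme described above, the required confidence and coverage for such statistics would be set from the PFMECA output rather than fixed in advance.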
Moss, Travis J.; Lake, Douglas E.; Forrest Calland, J; Enfield, Kyle B; Delos, John B.; Fairchild, Karen D.; Randall Moorman, J.
2016-01-01
Objective Patients in intensive care units are susceptible to subacute, potentially catastrophic illnesses such as respiratory failure, sepsis, and hemorrhage that present as severe derangements of vital signs. More subtle physiologic signatures may be present before clinical deterioration, when treatment might be more effective. We performed multivariate statistical analyses of bedside physiologic monitoring data to identify such early, subclinical signatures of incipient life-threatening illness. Design We report a study of model development and validation of a retrospective observational cohort using resampling (TRIPOD Type 1b internal validation), and a study of model validation using separate data (Type 2b internal/external validation). Setting University of Virginia Health System (Charlottesville), a tertiary-care, academic medical center. Patients Critically ill patients consecutively admitted between January 2009 and June 2015 to either the neonatal, surgical/trauma/burn, or medical intensive care units with available physiologic monitoring data. Interventions None. Measurements and Main Results We analyzed 146 patient-years of vital sign and electrocardiography waveform time series from the bedside monitors of 9,232 ICU admissions. Calculations from 30-minute windows of the physiologic monitoring data were made every 15 minutes. Clinicians identified 1,206 episodes of respiratory failure leading to urgent, unplanned intubation, sepsis, or hemorrhage leading to multi-unit transfusions from systematic, individual chart reviews. Multivariate models to predict events up to 24 hours prior had internally-validated C-statistics of 0.61 to 0.88. In adults, physiologic signatures of respiratory failure and hemorrhage were distinct from each other but externally consistent across ICUs. Sepsis, on the other hand, demonstrated less distinct and inconsistent signatures. Physiologic signatures of all neonatal illnesses were similar. 
Conclusions Subacute, potentially catastrophic illnesses in 3 diverse ICU populations have physiologic signatures that are detectable in the hours preceding clinical detection and intervention. Detection of such signatures can draw attention to patients at highest risk, potentially enabling earlier intervention and better outcomes. PMID:27452809
Dimitrov, Borislav D; Motterlini, Nicola; Fahey, Tom
2015-01-01
Objective Estimating the calibration performance of clinical prediction rules (CPRs) in systematic reviews of validation studies is not possible when predicted values are neither published nor accessible, or when no individual participant data are available. Our aims were to describe a simplified approach for outcome prediction and calibration assessment and to evaluate its functionality and validity. Study design and methods Methodological study of systematic reviews of validation studies of CPRs: a) the ABCD2 rule for prediction of 7-day stroke; and b) the CRB-65 rule for prediction of 30-day mortality. Predicted outcomes in a sample validation study were computed by CPR distribution patterns ("derivation model"). As confirmation, a logistic regression model (with derivation study coefficients) was applied to CPR-based dummy variables in the validation study. Meta-analysis of validation studies provided pooled estimates of "predicted:observed" risk ratios (RRs), 95% confidence intervals (CIs), and indexes of heterogeneity (I2) on forest plots (fixed and random effects models), with and without adjustment of intercepts. The same approach was applied to the CRB-65 rule. Results Our simplified method, applied to the ABCD2 rule in three risk strata (low, 0–3; intermediate, 4–5; high, 6–7 points), indicated that the predictions are identical to those computed by a univariate, CPR-based logistic regression model. Discrimination was good (c-statistics = 0.61–0.82); however, calibration in some studies was low. In such cases of miscalibration, the under-prediction (RRs = 0.73–0.91, 95% CIs 0.41–1.48) could be corrected by intercept adjustment to account for incidence differences. An improvement of both heterogeneities and P-values (Hosmer-Lemeshow goodness-of-fit test) was observed. Better calibration and improved pooled RRs (0.90–1.06), with narrower 95% CIs (0.57–1.41), were achieved.
Conclusion Our results have an immediate clinical implication in situations where predicted outcomes in CPR validation studies are lacking or deficient: they describe how such predictions can be obtained by anyone using the derivation study alone, without any need for highly specialized knowledge or sophisticated statistics. PMID:25931829
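Pooling "predicted:observed" risk ratios across validation studies, as in the forest plots described above, typically uses inverse-variance weighting on the log scale; this fixed-effect sketch (with Cochran's Q and I-squared for heterogeneity) uses hypothetical study results, not those of the review.

```python
import math

def pooled_rr_fixed(rrs, cis):
    """Fixed-effect inverse-variance pooling of risk ratios on the log
    scale. The SE of each log RR is recovered from its 95% CI as
    (log(upper) - log(lower)) / (2 * 1.96)."""
    logs, weights = [], []
    for rr, (lo, hi) in zip(rrs, cis):
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
        logs.append(math.log(rr))
        weights.append(1.0 / se ** 2)
    pooled = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
    q = sum(w * (l - pooled) ** 2 for w, l in zip(weights, logs))  # Cochran's Q
    df = len(rrs) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0          # I^2, percent
    return math.exp(pooled), i2

# Hypothetical predicted:observed RRs from three validation studies
rr, i2 = pooled_rr_fixed([0.80, 0.95, 1.02],
                         [(0.60, 1.07), (0.80, 1.13), (0.85, 1.22)])
```

A pooled RR below 1 with a CI excluding 1 would indicate systematic under-prediction, the situation the intercept adjustment above is designed to correct.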
Basic biostatistics for post-graduate students
Dakhale, Ganesh N.; Hiware, Sachin K.; Shinde, Abhijit T.; Mahatme, Mohini S.
2012-01-01
Statistical methods are important to draw valid conclusions from the obtained data. This article provides background information on fundamental methods and techniques in biostatistics for the use of postgraduate students. The main focus is given to types of data, measures of central tendency and variation, and basic tests, which are useful for the analysis of different types of observations. A few topics, such as the normal distribution, calculation of sample size, level of significance, the null hypothesis, indices of variability, and different tests, are explained in detail with suitable examples. Using these guidelines, we are confident that postgraduate students will be able to classify the distribution of data and apply the proper test. Information is also given regarding various free software programs and websites useful for statistical calculations. Thus, postgraduate students will benefit in both ways, whether they opt for academics or for industry. PMID:23087501
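As an example of the sample-size calculations such a primer covers, the standard formula for estimating a single proportion with a given margin of error can be sketched as follows (the scenario is illustrative):

```python
import math

def sample_size_proportion(p, d, z=1.96):
    """Minimum n to estimate a proportion p within margin of error d at
    ~95% confidence: n = z^2 * p * (1 - p) / d^2, rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

# Worst-case p = 0.5 with a 5-percentage-point margin of error
n = sample_size_proportion(0.5, 0.05)  # 385
```

Using p = 0.5 maximizes p(1 - p), so it gives the most conservative (largest) sample size when the true proportion is unknown.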
Prabhakaran, Shyam; Jovin, Tudor G.; Tayal, Ashis H.; Hussain, Muhammad S.; Nguyen, Thanh N.; Sheth, Kevin N.; Terry, John B.; Nogueira, Raul G.; Horev, Anat; Gandhi, Dheeraj; Wisco, Dolora; Glenn, Brenda A.; Ludwig, Bryan; Clemmons, Paul F.; Cronin, Carolyn A.; Tian, Melissa; Liebeskind, David; Zaidat, Osama O.; Castonguay, Alicia C.; Martin, Coleman; Mueller-Kronast, Nils; English, Joey D.; Linfante, Italo; Malisch, Timothy W.; Gupta, Rishi
2014-01-01
Background There are multiple clinical and radiographic factors that influence outcomes after endovascular reperfusion therapy (ERT) in acute ischemic stroke (AIS). We sought to derive and validate an outcome prediction score for AIS patients undergoing ERT based on readily available pretreatment and posttreatment factors. Methods The derivation cohort included 511 patients with anterior circulation AIS treated with ERT at 10 centers between September 2009 and July 2011. The prospective validation cohort included 223 patients with anterior circulation AIS treated in the North American Solitaire Acute Stroke registry. Multivariable logistic regression identified predictors of good outcome (modified Rankin score ≤2 at 3 months) in the derivation cohort; model β coefficients were used to assign points and calculate a risk score. Discrimination was tested using C statistics with 95% confidence intervals (CIs) in the derivation and validation cohorts. Calibration was assessed using the Hosmer-Lemeshow test and plots of observed to expected outcomes. We assessed the net reclassification improvement for the derived score compared to the Totaled Health Risks in Vascular Events (THRIVE) score. Subgroup analysis in patients with pretreatment Alberta Stroke Program Early CT Score (ASPECTS) and posttreatment final infarct volume measurements was also performed to identify whether these radiographic predictors improved the model compared to simpler models. Results Good outcome was noted in 186 (36.4%) and 100 patients (44.8%) in the derivation and validation cohorts, respectively. 
Combining readily available pretreatment and posttreatment variables, we created a score (acronym: SNARL) based on the following parameters: symptomatic hemorrhage [2 points: none, hemorrhagic infarction (HI)1–2 or parenchymal hematoma (PH) type 1; 0 points: PH2], baseline National Institutes of Health Stroke Scale score (3 points: 0–10; 1 point: 11–20; 0 points: >20), age (2 points: <60 years; 1 point: 60–79 years; 0 points: >79 years), reperfusion (3 points: Thrombolysis In Cerebral Ischemia score 2b or 3) and location of clot (1 point: M2; 0 points: M1 or internal carotid artery). The SNARL score demonstrated good discrimination in the derivation (C statistic 0.79, 95% CI 0.75–0.83) and validation cohorts (C statistic 0.74, 95% CI 0.68–0.81) and was superior to the THRIVE score (derivation cohort: C statistic 0.65, 95% CI 0.60–0.70; validation cohort: C-statistic 0.59, 95% CI 0.52–0.67; p < 0.01 in both cohorts) but was inferior to a score that included age, ASPECTS, reperfusion status and final infarct volume (C statistic 0.86, 95% CI 0.82–0.91; p = 0.04). Compared with the THRIVE score, the SNARL score resulted in a net reclassification improvement of 34.8%. Conclusions Among AIS patients treated with ERT, pretreatment scores such as the THRIVE score provide only fair prognostic information. Inclusion of posttreatment variables such as reperfusion and symptomatic hemorrhage greatly influences outcome and results in improved outcome prediction. PMID:24942008
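The SNARL point assignments described above can be sketched as a small function. This is an illustrative reconstruction from the abstract only; the function name and the string encodings for hemorrhage grade, TICI score, and clot location are our own, not from the paper:

```python
def snarl_score(hemorrhage: str, nihss: int, age: int,
                reperfusion_tici: str, clot_location: str) -> int:
    """Sum the SNARL points for one patient (range 0-11)."""
    score = 0
    # Symptomatic hemorrhage: 2 points unless parenchymal hematoma type 2 (PH2).
    if hemorrhage in ("none", "HI1", "HI2", "PH1"):
        score += 2
    # Baseline NIHSS: 3 points for 0-10, 1 point for 11-20, 0 for >20.
    if nihss <= 10:
        score += 3
    elif nihss <= 20:
        score += 1
    # Age: 2 points if <60, 1 point for 60-79, 0 for >79.
    if age < 60:
        score += 2
    elif age <= 79:
        score += 1
    # Reperfusion: 3 points for TICI 2b or 3.
    if reperfusion_tici in ("2b", "3"):
        score += 3
    # Clot location: 1 point for M2; 0 for M1 or internal carotid artery.
    if clot_location == "M2":
        score += 1
    return score
```

A patient with no hemorrhage, NIHSS 8, age 55, TICI 3 reperfusion, and an M2 clot would score the maximum of 11.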
Structural parameters of young star clusters: fractal analysis
NASA Astrophysics Data System (ADS)
Hetem, A.
2017-07-01
A unified view of star formation in the Universe demands detailed and in-depth studies of young star clusters. This work builds on our previous study of fractal statistics estimated for a sample of young stellar clusters (Gregorio-Hetem et al. 2015, MNRAS 448, 2504). The structural properties can lead to significant conclusions about the early stages of cluster formation: 1) virial conditions can be used to distinguish warm collapse; 2) bound or unbound behaviour can lead to conclusions about expansion; and 3) fractal statistics are correlated with dynamical evolution and age. The error-bar estimation technique most used in the literature is to adopt inferential methods (such as bootstrap) to estimate deviation and variance, which are valid only for an artificially generated cluster. In this paper, we expanded the number of studied clusters in order to enhance the investigation of cluster properties and dynamical evolution. The structural parameters were compared with fractal statistics and reveal that the clusters' radial density profiles show a tendency for the mean separation of the stars to increase with the average surface density. The sample can be divided into two groups showing different dynamic behaviour, but they have the same dynamical evolution, since the entire sample was revealed to consist of expanding objects whose substructures do not seem to have been completely erased. These results are in agreement with simulations adopting low surface densities and supervirial conditions.
IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics
2016-01-01
Background We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. Objective To report the features of IBMWA and discuss how this software subjectively and objectively compares to other data mining programs. Methods The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Results Using a validated dataset, IBMWA delivered similar predictions compared with several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. 
Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence intervals, or a confusion matrix. Conclusions IBMWA is a new alternative for data analytics software that automates descriptive, predictive, and visual analytics. This program is very user-friendly but requires data preprocessing, statistical conceptual understanding, and domain expertise. PMID:27729304
Riley, Richard D.
2017-01-01
An important question for clinicians appraising a meta‐analysis is: are the findings likely to be valid in their own practice—does the reported effect accurately represent the effect that would occur in their own clinical population? To this end we advance the concept of statistical validity—where the parameter being estimated equals the corresponding parameter for a new independent study. Using a simple (‘leave‐one‐out’) cross‐validation technique, we demonstrate how we may test meta‐analysis estimates for statistical validity using a new validation statistic, Vn, and derive its distribution. We compare this with the usual approach of investigating heterogeneity in meta‐analyses and demonstrate the link between statistical validity and homogeneity. Using a simulation study, the properties of Vn and the Q statistic are compared for univariate random effects meta‐analysis and a tailored meta‐regression model, where information from the setting (included as model covariates) is used to calibrate the summary estimate to the setting of application. Their properties are found to be similar when there are 50 studies or more, but for fewer studies Vn has greater power but a higher type 1 error rate than Q. The power and type 1 error rate of Vn are also shown to depend on the within‐study variance, between‐study variance, study sample size, and the number of studies in the meta‐analysis. Finally, we apply Vn to two published meta‐analyses and conclude that it usefully augments standard methods when deciding upon the likely validity of summary meta‐analysis estimates in clinical practice. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28620945
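The leave-one-out idea behind Vn can be illustrated with a minimal sketch: pool all studies except one, then compare the pooled estimate against the held-out study. This shows only the cross-validation scaffolding, not the Vn statistic itself (whose formula and distribution are derived in the paper); the inverse-variance fixed-effect pooling used here is a simplifying assumption:

```python
def pooled_estimate(effects, variances):
    """Inverse-variance weighted (fixed-effect) pooled estimate."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

def leave_one_out(effects, variances):
    """For each study i, pool the remaining k-1 studies and return the
    pair (left-out effect, pooled estimate without it) for comparison."""
    pairs = []
    for i in range(len(effects)):
        rest_e = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        pairs.append((effects[i], pooled_estimate(rest_e, rest_v)))
    return pairs
```

A validation statistic would then summarise how far each left-out effect falls from the estimate built without it, relative to the uncertainty of both.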
DOT National Transportation Integrated Search
1979-03-01
There are several conditions that can influence the calculation of the statistical validity of a test battery such as that used to select Air Traffic Control Specialists. Two conditions of prime importance to statistical validity are recruitment pr...
Torres, Heloísa de Carvalho; Chaves, Fernanda Figueredo; da Silva, Daniel Dutra Romualdo; Bosco, Adriana Aparecida; Gabriel, Beatriz Diniz; Reis, Ilka Afonso; Rodrigues, Júlia Santos Nunes; Pagano, Adriana Silvina
2016-01-01
ABSTRACT Objective: to translate, adapt and validate the contents of the Diabetes Medical Management Plan for the Brazilian context. This protocol was developed by the American Diabetes Association and guides the procedure of educators for the care of children and adolescents with diabetes in schools. Method: this methodological study was conducted in four stages: initial translation, synthesis of initial translation, back translation and content validation by an expert committee, composed of 94 specialists (29 applied linguists and 65 health professionals), who evaluated the translated version through an online questionnaire. The concordance level of the judges was calculated based on the Content Validity Index. Data were exported into the R program for statistical analysis. Results: the evaluation of the instrument showed good concordance between the judges of the Health and Applied Linguistics areas, with mean content validity indices of 0.9 and 0.89, respectively, and slight variability of the index between groups (difference of less than 0.01). The items in the translated version evaluated as unsatisfactory by the judges were reformulated based on the considerations of the professionals of each group. Conclusion: a Brazilian version of the Diabetes Medical Management Plan was constructed, called the Plano de Manejo do Diabetes na Escola. PMID:27508911
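The Content Validity Index used to quantify judge concordance is, in its common form, the proportion of judges rating an item as relevant (e.g. 3 or 4 on a 4-point scale), averaged over items for a scale-level value. A minimal sketch assuming that convention (the paper does not spell out which CVI variant it used):

```python
def item_cvi(ratings, relevant=(3, 4)):
    """Item-level CVI: proportion of judges rating the item 3 or 4
    on a 4-point relevance scale."""
    return sum(r in relevant for r in ratings) / len(ratings)

def scale_cvi_ave(items_ratings):
    """Scale-level CVI, averaging method: mean of the item-level CVIs.

    items_ratings: one list of judge ratings per item.
    """
    cvis = [item_cvi(ratings) for ratings in items_ratings]
    return sum(cvis) / len(cvis)
```

With this convention, a mean CVI of 0.9 means that, on average, 90% of judges rated each item as relevant.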
History and development of the Schmidt-Hunter meta-analysis methods.
Schmidt, Frank L
2015-09-01
In this article, I provide answers to the questions posed by Will Shadish about the history and development of the Schmidt-Hunter methods of meta-analysis. In the 1970s, I headed a research program on personnel selection at the US Office of Personnel Management (OPM). After our research showed that validity studies have low statistical power, OPM felt a need for a better way to demonstrate test validity, especially in light of court cases challenging selection methods. In response, we created our method of meta-analysis (initially called validity generalization). Results showed that most of the variability of validity estimates from study to study was because of sampling error and other research artifacts such as variations in range restriction and measurement error. Corrections for these artifacts in our research and in replications by others showed that the predictive validity of most tests was high and generalizable. This conclusion challenged long-standing beliefs and so provoked resistance, which over time was overcome. The 1982 book that we published extending these methods to research areas beyond personnel selection was positively received and was followed by expanded books in 1990, 2004, and 2014. Today, these methods are being applied in a wide variety of areas. Copyright © 2015 John Wiley & Sons, Ltd.
Yalın Sapmaz, Şermin; Ergin, Dilek; Şen Celasin, Nesrin; Özek Erkuran, Handan; Karaarslan, Duygu; Öztekin, Siğnem; Uzel Tanrıverdi, Bengisu; Köroğlu, Ertuğrul; Aydemir, Ömer
2017-01-01
The goal of this study was to assess the validity and reliability of the Turkish version of the DSM-5 (Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition) Dissociative Symptoms Severity Scale-Child Form. The scale was prepared by translating and then back-translating the DSM-5 Dissociative Symptoms Severity Scale. The study groups included one group of 30 patients diagnosed with posttraumatic stress disorder who were treated in a child and adolescent psychiatry unit and another group of 83 healthy volunteers from middle and high schools in the community. For assessment, the Adolescent Dissociative Experiences Scale (ADES) was used in addition to the DSM-5 Dissociative Symptoms Severity Scale. Regarding the reliability of the DSM-5 Dissociative Symptoms Severity Scale, Cronbach's alpha was .824 and item-total score correlation coefficients were between .464 and .648. The test-retest correlation coefficient was calculated to be r = .784. In terms of construct validity, one factor accounted for 45.2% of the variance. Furthermore, in terms of concurrent validity, the scale showed a high correlation with the ADES. In conclusion, the Turkish version of the DSM-5 Dissociative Symptoms Severity Scale-Child Form is a valid and reliable tool for both clinical practice and research.
Validation of the Malay Version of the Inventory of Functional Status after Childbirth Questionnaire
Noor, Norhayati Mohd; Aziz, Aniza Abd.; Mostapa, Mohd Rosmizaki; Awang, Zainudin
2015-01-01
Objective. This study was designed to examine the psychometric properties of Malay version of the Inventory of Functional Status after Childbirth (IFSAC). Design. A cross-sectional study. Materials and Methods. A total of 108 postpartum mothers attending Obstetrics and Gynaecology Clinic, in a tertiary teaching hospital in Malaysia, were involved. Construct validity and internal consistency were performed after the translation, content validity, and face validity process. The data were analyzed using Analysis of Moment Structure version 18 and Statistical Packages for the Social Sciences version 20. Results. The final model consists of four constructs, namely, infant care, personal care, household activities, and social and community activities, with 18 items demonstrating acceptable factor loadings, domain to domain correlation, and best fit (Chi-squared/degree of freedom = 1.678; Tucker-Lewis index = 0.923; comparative fit index = 0.936; and root mean square error of approximation = 0.080). Composite reliability and average variance extracted of the domains ranged from 0.659 to 0.921 and from 0.499 to 0.628, respectively. Conclusion. The study suggested that the four-factor model with 18 items of the Malay version of IFSAC was acceptable to be used to measure functional status after childbirth because it is valid, reliable, and simple. PMID:25667932
FLiGS Score: A New Method of Outcome Assessment for Lip Carcinoma–Treated Patients
Grassi, Rita; Toia, Francesca; Di Rosa, Luigi; Cordova, Adriana
2015-01-01
Background: Lip cancer and its treatment have considerable functional and cosmetic effects with resultant nutritional and physical detriments. As we continue to investigate new treatment regimens, we are simultaneously required to assess postoperative outcomes to design interventions that lessen the adverse impact of this disease process. We wish to introduce the Functional Lip Glasgow Scale (FLiGS) score as a new method of outcome assessment to measure the effect of lip cancer and its treatment on patients’ daily functioning. Methods: Fifty patients affected by lip squamous cell carcinoma were recruited between 2009 and 2013. Patients were asked to complete the FLiGS questionnaire before surgery and 1 month, 6 months, and 1 year after surgery. The subscores were used to calculate a total FLiGS score of global oral disability. Statistical analysis was performed to test validity and reliability. Results: FLiGS scores improved significantly from preoperative to 12-month postoperative values (P = 0.000). Statistical evidence of validity was provided by the Spearman correlation coefficient (rs), which was >0.30 for all surveys with P < 0.001. FLiGS score reliability was shown through examination of internal consistency and test-retest reliability. Conclusions: The FLiGS score is a simple way of assessing functional impairment related to lip cancer before and after surgery; it is sensitive, valid, reliable, and clinically relevant: it provides useful information to orient the physician in postoperative management and in the rehabilitation program. PMID:26034652
Park, Joanne; Roberts, Mary Roduta; Esmail, Shaniff; Rayani, Fahreen; Norris, Colleen M; Gross, Douglas P
2018-06-01
Purpose To examine construct and concurrent validity of the Readiness for Return-To-Work (RRTW) Scale with injured workers participating in an outpatient occupational rehabilitation program. Methods Lost-time claimants (n = 389) with sub-acute or chronic musculoskeletal disorders completed the RRTW Scale on the first day of their occupational rehabilitation program. Statistical analysis included exploratory and confirmatory factor analyses of the readiness items, reliability analyses, and correlation with related scales and questionnaires. Results For claimants in the non-job attached/not working group (n = 165), three factors were found: (1) Contemplation, (2) Prepared for Action-Self-evaluative and (3) Prepared for Action-Behavioural. The precontemplation stage was not identified within this sample of injured workers. For claimants who were job attached/working in some capacity (n = 224), two factors were identified: (1) Uncertain Maintenance and (2) Proactive Maintenance. Expected relationships and statistically significant differences were found among the identified Return-To-Work (RTW) readiness factors and the related constructs of pain, physical and mental health and RTW expectations. Conclusion Construct and concurrent validity of the RRTW Scale were supported in this study. The results indicate that the construct of readiness for RTW can vary by disability duration and occupational category. Physical health appears to be a significant barrier to RTW readiness for the job attached/working group, while mental health significantly compromises RTW readiness in the non-job attached/not working group.
Deep Learning to Classify Radiology Free-Text Reports.
Chen, Matthew C; Ball, Robyn L; Yang, Lingyao; Moradzadeh, Nathaniel; Chapman, Brian E; Larson, David B; Langlotz, Curtis P; Amrhein, Timothy J; Lungren, Matthew P
2018-03-01
Purpose To evaluate the performance of a deep learning convolutional neural network (CNN) model compared with a traditional natural language processing (NLP) model in extracting pulmonary embolism (PE) findings from thoracic computed tomography (CT) reports from two institutions. Materials and Methods Contrast material-enhanced CT examinations of the chest performed between January 1, 1998, and January 1, 2016, were selected. Annotations by two human radiologists were made for three categories: the presence, chronicity, and location of PE. Classification of performance of a CNN model with an unsupervised learning algorithm for obtaining vector representations of words was compared with the open-source application PeFinder. Sensitivity, specificity, accuracy, and F1 scores for both the CNN model and PeFinder in the internal and external validation sets were determined. Results The CNN model demonstrated an accuracy of 99% and an area under the curve value of 0.97. For internal validation report data, the CNN model had a statistically significant larger F1 score (0.938) than did PeFinder (0.867) when classifying findings as either PE positive or PE negative, but no significant difference in sensitivity, specificity, or accuracy was found. For external validation report data, no statistical difference between the performance of the CNN model and PeFinder was found. Conclusion A deep learning CNN model can classify radiology free-text reports with accuracy equivalent to or beyond that of an existing traditional NLP model. © RSNA, 2017 Online supplemental material is available for this article.
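Sensitivity, specificity, accuracy, and F1 are all simple functions of confusion-matrix counts; a short reference sketch of the generic formulas (not code from the study):

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive the metrics reported in the study from confusion-matrix
    counts: true/false positives (tp, fp) and true/false negatives (tn, fn)."""
    sensitivity = tp / (tp + fn)              # recall: fraction of positives found
    specificity = tn / (tn + fp)              # fraction of negatives found
    precision = tp / (tp + fp)                # fraction of flagged cases correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "f1": f1}
```

Because F1 combines precision and recall, two classifiers can differ significantly on F1 (as the CNN model and PeFinder did internally) while showing no significant difference on sensitivity, specificity, or accuracy taken individually.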
Correcting for Optimistic Prediction in Small Data Sets
Smith, Gordon C. S.; Seaman, Shaun R.; Wood, Angela M.; Royston, Patrick; White, Ian R.
2014-01-01
The C statistic is a commonly reported measure of screening test performance. Optimistic estimation of the C statistic is a frequent problem because of overfitting of statistical models in small data sets, and methods exist to correct for this issue. However, many studies do not use such methods, and those that do correct for optimism use diverse methods, some of which are known to be biased. We used clinical data sets (United Kingdom Down syndrome screening data from Glasgow (1991–2003), Edinburgh (1999–2003), and Cambridge (1990–2006), as well as Scottish national pregnancy discharge data (2004–2007)) to evaluate different approaches to adjustment for optimism. We found that sample splitting, cross-validation without replication, and leave-1-out cross-validation produced optimism-adjusted estimates of the C statistic that were biased and/or associated with greater absolute error than other available methods. Cross-validation with replication, bootstrapping, and a new method (leave-pair-out cross-validation) all generated unbiased optimism-adjusted estimates of the C statistic and had similar absolute errors in the clinical data set. Larger simulation studies confirmed that all 3 methods performed similarly with 10 or more events per variable, or when the C statistic was 0.9 or greater. However, with lower events per variable or lower C statistics, bootstrapping tended to be optimistic but with lower absolute and mean squared errors than both methods of cross-validation. PMID:24966219
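Leave-pair-out cross-validation, the new method evaluated here, estimates the C statistic by holding out one event/non-event pair at a time: the model is refit without the pair and scored on whether the event receives the higher predicted risk. A minimal sketch; the `fit_predict` callback and data layout are our own illustrative interface, not the authors' code:

```python
from itertools import product

def leave_pair_out_c(X, y, fit_predict):
    """Leave-pair-out estimate of the C statistic.

    For every (case, control) pair, refit on the data with that pair
    removed, then check whether the case receives the higher predicted
    risk; ties count 0.5. `fit_predict(X_train, y_train)` must return a
    scoring function mapping one observation to a predicted risk.
    """
    cases = [i for i, yi in enumerate(y) if yi == 1]
    controls = [i for i, yi in enumerate(y) if yi == 0]
    concordant = 0.0
    for i, j in product(cases, controls):
        keep = [k for k in range(len(y)) if k not in (i, j)]
        model = fit_predict([X[k] for k in keep], [y[k] for k in keep])
        pi, pj = model(X[i]), model(X[j])
        concordant += 1.0 if pi > pj else 0.5 if pi == pj else 0.0
    return concordant / (len(cases) * len(controls))
```

Unlike leave-one-out, the held-out comparison always involves one event and one non-event, which is what makes the resulting concordance estimate unbiased for the C statistic.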
Assessing Discriminative Performance at External Validation of Clinical Prediction Models.
Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W
2016-01-01
External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting. We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated it in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. The permutation test indicated that the validation and development set were homogeneous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population. To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients.
Correcting evaluation bias of relational classifiers with network cross validation
Neville, Jennifer; Gallagher, Brian; Eliassi-Rad, Tina; ...
2011-01-04
Recently, a number of modeling techniques have been developed for data mining and machine learning in relational and network domains where the instances are not independent and identically distributed (i.i.d.). These methods specifically exploit the statistical dependencies among instances in order to improve classification accuracy. However, there has been little focus on how these same dependencies affect our ability to draw accurate conclusions about the performance of the models. More specifically, the complex link structure and attribute dependencies in relational data violate the assumptions of many conventional statistical tests and make it difficult to use these tests to assess the models in an unbiased manner. In this work, we examine the task of within-network classification and the question of whether two algorithms will learn models that will result in significantly different levels of performance. We show that the commonly used form of evaluation (paired t-test on overlapping network samples) can result in an unacceptable level of Type I error. Furthermore, we show that Type I error increases as (1) the correlation among instances increases and (2) the size of the evaluation set increases (i.e., the proportion of labeled nodes in the network decreases). Lastly, we propose a method for network cross-validation that, combined with paired t-tests, produces more acceptable levels of Type I error while still providing reasonable levels of statistical power (i.e., 1–Type II error).
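The paired t-test whose Type I error the paper studies is itself straightforward: it is applied to the per-sample performance differences between the two algorithms. A minimal sketch of the statistic only (the network-sampling procedure that causes the bias is not reproduced here):

```python
import math

def paired_t_statistic(diffs):
    """Paired t statistic for per-sample performance differences
    (e.g. accuracy of algorithm A minus algorithm B on each test sample).
    Compare against a t distribution with len(diffs) - 1 degrees of freedom."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

The test's validity rests on the differences being independent draws; overlapping network samples violate exactly that assumption, which is why the observed Type I error exceeds the nominal level.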
Yan, Yu-Xiang; Liu, You-Qin; Li, Man; Hu, Pei-Feng; Guo, Ai-Min; Yang, Xing-Hua; Qiu, Jing-Jun; Yang, Shan-Shan; Shen, Jian; Zhang, Li-Ping; Wang, Wei
2009-01-01
Background Suboptimal health status (SHS) is characterized by ambiguous health complaints, general weakness, and lack of vitality, and has become a new public health challenge in China. It is believed to be a subclinical, reversible stage of chronic disease. Studies of intervention and prognosis for SHS are expected to become increasingly important. Consequently, a reliable and valid instrument to assess SHS is essential. We developed and evaluated a questionnaire for measuring SHS in urban Chinese. Methods Focus group discussions and a literature review provided the basis for the development of the questionnaire. Questionnaire validity and reliability were evaluated in a small pilot study and in a larger cross-sectional study of 3000 individuals. Analyses included tests for reliability and internal consistency, exploratory and confirmatory factor analysis, and tests for discriminative ability and convergent validity. Results The final questionnaire included 25 items on SHS (SHSQ-25), and encompassed 5 subscales: fatigue, the cardiovascular system, the digestive tract, the immune system, and mental status. Overall, 2799 of 3000 participants completed the questionnaire (93.3%). Test-retest reliability coefficients of individual items ranged from 0.89 to 0.98. Item-subscale correlations ranged from 0.51 to 0.72, and Cronbach’s α was 0.70 or higher for all subscales. Factor analysis established 5 distinct domains, as conceptualized in our model. One-way ANOVA showed statistically significant differences in scale scores between 3 occupation groups; these included total scores and subscores (P < 0.01). The correlation between the SHS scores and experienced stress was statistically significant (r = 0.57, P < 0.001). Conclusions The SHSQ-25 is a reliable and valid instrument for measuring sub-health status in urban Chinese. PMID:19749497
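Cronbach's α, reported here per subscale, compares the sum of the item variances with the variance of the total score: α approaches 1 when items vary together. A minimal sketch using population variances (one common convention; the paper does not state which variance estimator it used):

```python
def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """Cronbach's alpha for a subscale.

    items: one list of responses per item, respondents in the same order,
    e.g. [[item1 scores...], [item2 scores...], ...].
    """
    k = len(items)
    totals = [sum(vals) for vals in zip(*items)]    # per-respondent totals
    sum_item_var = sum(variance(it) for it in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))
```

Two perfectly correlated items give α = 1; items that vary independently drive the ratio toward 1 and α toward 0, which is why 0.70 is a common adequacy threshold for subscales like those of the SHSQ-25.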
A Severe Sepsis Mortality Prediction Model and Score for Use with Administrative Data
Ford, Dee W.; Goodwin, Andrew J.; Simpson, Annie N.; Johnson, Emily; Nadig, Nandita; Simpson, Kit N.
2016-01-01
Objective Administrative data is used for research, quality improvement, and health policy in severe sepsis. However, there is not a sepsis-specific tool applicable to administrative data with which to adjust for illness severity. Our objective was to develop, internally validate, and externally validate a severe sepsis mortality prediction model and associated mortality prediction score. Design Retrospective cohort study using 2012 administrative data from five US states. Three cohorts of patients with severe sepsis were created: 1) ICD-9-CM codes for severe sepsis/septic shock, 2) ‘Martin’ approach, and 3) ‘Angus’ approach. The model was developed and internally validated in ICD-9-CM cohort and externally validated in other cohorts. Integer point values for each predictor variable were generated to create a sepsis severity score. Setting Acute care, non-federal hospitals in NY, MD, FL, MI, and WA Subjects Patients in one of three severe sepsis cohorts: 1) explicitly coded (n=108,448), 2) Martin cohort (n=139,094), and 3) Angus cohort (n=523,637) Interventions None Measurements and Main Results Maximum likelihood estimation logistic regression to develop a predictive model for in-hospital mortality. Model calibration and discrimination assessed via Hosmer-Lemeshow goodness-of-fit (GOF) and C-statistics respectively. Primary cohort subset into risk deciles and observed versus predicted mortality plotted. GOF demonstrated p>0.05 for each cohort demonstrating sound calibration. C-statistic ranged from low of 0.709 (sepsis severity score) to high of 0.838 (Angus cohort) suggesting good to excellent model discrimination. Comparison of observed versus expected mortality was robust although accuracy decreased in highest risk decile. Conclusions Our sepsis severity model and score is a tool that provides reliable risk adjustment for administrative data. PMID:26496452
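The Hosmer-Lemeshow goodness-of-fit assessment used above groups patients by predicted risk and compares observed with expected event counts in each group. A minimal sketch of the statistic; the near-equal-size grouping rule here is one common convention, not necessarily the authors' implementation:

```python
def hosmer_lemeshow(pred, obs, g=10):
    """Hosmer-Lemeshow chi-square statistic.

    pred: predicted event probabilities; obs: observed outcomes (0/1).
    Patients are sorted by predicted risk and split into g near-equal
    groups; the result is compared against a chi-square distribution
    with g - 2 degrees of freedom (large p-values indicate calibration).
    """
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    bounds = [round(i * len(order) / g) for i in range(g + 1)]
    stat = 0.0
    for a, b in zip(bounds, bounds[1:]):
        idx = order[a:b]
        if not idx:
            continue
        n = len(idx)
        observed = sum(obs[i] for i in idx)
        expected = sum(pred[i] for i in idx)
        if 0 < expected < n:  # skip degenerate groups with zero denominator
            stat += (observed - expected) ** 2 / (expected * (1 - expected / n))
    return stat
```

The risk-decile plots of observed versus predicted mortality described in the abstract come from the same grouping: one point per decile of predicted risk.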
Morgan, Patrick; Nissi, Mikko J; Hughes, John; Mortazavi, Shabnam; Ellerman, Jutta
2017-07-01
Objectives The purpose of this study was to validate T2* mapping as an objective, noninvasive method for the prediction of acetabular cartilage damage. Methods This is the second step in the validation of T2*. In a previous study, we established a quantitative predictive model for identifying and grading acetabular cartilage damage. In this study, the model was applied to a second cohort of 27 consecutive hips to validate the model. A clinical 3.0-T imaging protocol with T2* mapping was used. Acetabular regions of interest (ROI) were identified on magnetic resonance and graded using the previously established model. Each ROI was then graded in a blinded fashion by arthroscopy. Accurate surgical location of ROIs was facilitated with a 2-dimensional map projection of the acetabulum. A total of 459 ROIs were studied. Results When T2* mapping and arthroscopic assessment were compared, 82% of ROIs were within 1 Beck group (of a total 6 possible) and 32% of ROIs were classified identically. Disease prediction based on receiver operating characteristic curve analysis demonstrated a sensitivity of 0.713 and a specificity of 0.804. Model stability evaluation required no significant changes to the predictive model produced in the initial study. Conclusions These results validate that T2* mapping provides statistically comparable information regarding acetabular cartilage when compared to arthroscopy. In contrast to arthroscopy, T2* mapping is quantitative, noninvasive, and can be used in follow-up. Unlike research quantitative magnetic resonance protocols, T2* takes little time and does not require a contrast agent. This may facilitate its use in the clinical sphere.
Improving the governance of patient safety in emergency care: a systematic review of interventions
Hesselink, Gijs; Berben, Sivera; Beune, Thimpe
2016-01-01
Objectives To systematically review interventions that aim to improve the governance of patient safety within emergency care on effectiveness, reliability, validity and feasibility. Design A systematic review of the literature. Methods PubMed, EMBASE, Cumulative Index to Nursing and Allied Health Literature, the Cochrane Database of Systematic Reviews and PsychInfo were searched for studies published between January 1990 and July 2014. We included studies evaluating interventions relevant for higher management to oversee and manage patient safety, in prehospital emergency medical service (EMS) organisations and hospital-based emergency departments (EDs). Two reviewers independently selected candidate studies, extracted data and assessed study quality. Studies were categorised according to study quality, setting, sample, intervention characteristics and findings. Results Of the 18 included studies, 13 (72%) were non-experimental. Nine studies (50%) reported data on the reliability and/or validity of the intervention. Eight studies (44%) reported on the feasibility of the intervention. Only 4 studies (22%) reported statistically significant effects. The use of a simulation-based training programme and well-designed incident reporting systems led to a statistically significant improvement of safety knowledge and attitudes by ED staff and an increase of incident reports within EDs, respectively. Conclusions Characteristics of the interventions included in this review (eg, anonymous incident reporting and validation of incident reports by an independent party) could provide useful input for the design of an effective tool to govern patient safety in EMS organisations and EDs. However, executives cannot rely on a robust set of evidence-based and feasible tools to govern patient safety within their emergency care organisation and in the chain of emergency care. 
Established strategies from other high-risk sectors need to be evaluated in emergency care settings, using an experimental design with valid outcome measures to strengthen the evidence base. PMID:26826151
2013-01-01
Background The Scale to Assess Unawareness in Mental Disorder (SUMD) is widely used in clinical trials and epidemiological studies but more rarely in clinical practice because of its length (74 items). In clinical practice, it is necessary to provide shorter instruments. The aim of this study was to investigate the validity and reliability of the abbreviated version of the SUMD. Methods Design: We used data from four cross-sectional studies conducted in several psychiatric hospitals in France. Inclusion criteria: a diagnosis of schizophrenia based on DSM-IV criteria. Data collection: socio-demographic and clinical data (including duration of illness, Positive and Negative Syndrome Scale, and the Calgary Depression Scale); quality of life; SUMD. Statistical analysis: confirmatory factor analyses, item-dimension correlations, Cronbach’s alpha coefficients, Rasch statistics, relationships between the SUMD and other parameters. We tested two different scoring models and considered the response ‘not applicable’ as ‘0’ or as missing data. Results Five hundred and thirty-one patients participated in this study. The 3-factor structure of the SUMD (awareness of the disease, consequences and need for treatment; awareness of positive symptoms; and awareness of negative symptoms) was confirmed using LISREL confirmatory factor analysis for the two models. Internal item consistency and reliability were satisfactory for all dimensions. External validity testing revealed that dimension scores correlated significantly with all PANSS scores, especially with the G12 item (lack of judgement and awareness). Significant associations with age, disease duration, education level, and living arrangements showed good discriminant validity. Conclusion The abbreviated version of the SUMD appears to be a valid and reliable instrument for measuring insight in patients with schizophrenia and may be used by clinicians to accurately assess insight in clinical settings. PMID:24053640
Kievit, Rogier F; Hoes, Arno W; Bots, Michiel L; van Riet, Evelien ES; van Mourik, Yvonne; Bertens, Loes CM; Boonman-de Winter, Leandra JM; den Ruijter, Hester M; Rutten, Frans H
2018-01-01
Background Prevalence of undetected heart failure in older individuals is high in the community, with patients being at increased risk of morbidity and mortality due to the chronic and progressive nature of this complex syndrome. An essential, yet currently unavailable, strategy to pre-select candidates eligible for echocardiography to confirm or exclude heart failure would identify patients earlier, enable targeted interventions and prevent disease progression. The aim of this study was therefore to develop and validate such a model that can be implemented clinically. Methods and results Individual patient data from four primary care screening studies were analysed. From 1941 participants >60 years old, 462 were diagnosed with heart failure, according to criteria of the European Society of Cardiology heart failure guidelines. Prediction models were developed in each cohort followed by cross-validation, omitting each of the four cohorts in turn. The model consisted of five independent predictors: age, history of ischaemic heart disease, exercise-related shortness of breath, body mass index and a laterally displaced/broadened apex beat, with no significant interaction with sex. The c-statistic ranged from 0.70 (95% confidence interval (CI) 0.64–0.76) to 0.82 (95% CI 0.78–0.87) at cross-validation and the calibration was reasonable with Observed/Expected ratios ranging from 0.86 to 1.15. The clinical model improved with the addition of N-terminal pro B-type natriuretic peptide with the c-statistic increasing from 0.76 (95% CI 0.70–0.81) to 0.89 (95% CI 0.86–0.92) at cross-validation. Conclusion Easily obtainable patient characteristics can select older men and women from the community who are candidates for echocardiography to confirm or refute heart failure. PMID:29327942
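The cohort-cross-validated discrimination reported above is summarized by the c-statistic: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. As an illustration only (the scores and diagnoses below are invented, not the study's data), a minimal Python sketch:

```python
def c_statistic(scores, labels):
    """Concordance (c) statistic for a binary outcome: the probability
    that a randomly chosen case receives a higher predicted score than
    a randomly chosen non-case (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                concordant += 1.0
            elif c == d:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

# Hypothetical predicted risks and heart-failure diagnoses (1 = HF).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   0,   1,   0]
print(round(c_statistic(scores, labels), 3))
```

A value of 0.5 indicates no discrimination and 1.0 perfect discrimination, which is why the reported range of 0.70 to 0.89 reads as moderate to good.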
DOE Office of Scientific and Technical Information (OSTI.GOV)
Palma, David A., E-mail: david.palma@uwo.ca; Senan, Suresh; Oberije, Cary
Purpose: Concurrent chemoradiation therapy (CCRT) improves survival compared with sequential treatment for locally advanced non-small cell lung cancer, but it increases toxicity, particularly radiation esophagitis (RE). Validated predictors of RE for clinical use are lacking. We performed an individual-patient-data meta-analysis to determine factors predictive of clinically significant RE. Methods and Materials: After a systematic review of the literature, data were obtained on 1082 patients who underwent CCRT, including patients from Europe, North America, Asia, and Australia. Patients were randomly divided into training and validation sets (2/3 vs 1/3 of patients). Factors predictive of RE (grade ≥2 and grade ≥3) were assessed using logistic modeling, with the concordance statistic (c statistic) used to evaluate the performance of each model. Results: The median radiation therapy dose delivered was 65 Gy, and the median follow-up time was 2.1 years. Most patients (91%) received platinum-containing CCRT regimens. The development of RE was common, scored as grade 2 in 348 patients (32.2%), grade 3 in 185 (17.1%), and grade 4 in 10 (0.9%). There were no RE-related deaths. On univariable analysis using the training set, several baseline factors were statistically predictive of RE (P<.05), but only dosimetric factors had good discrimination scores (c > .60). On multivariable analysis, the esophageal volume receiving ≥60 Gy (V60) alone emerged as the best predictor of grade ≥2 and grade ≥3 RE, with good calibration and discrimination. Recursive partitioning identified 3 risk groups: low (V60 <0.07%), intermediate (V60 0.07% to 16.99%), and high (V60 ≥17%). With use of the validation set, the predictive model performed inferiorly for the grade ≥2 endpoint (c = .58) but performed well for the grade ≥3 endpoint (c = .66). Conclusions: Clinically significant RE is common, but life-threatening complications occur in <1% of patients. Although several factors are statistically predictive of RE, the V60 alone provides the best predictive ability. Efforts to reduce the V60 should be prioritized, with further research needed to identify and validate new predictive factors.
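The three risk groups found by recursive partitioning reduce to a simple threshold lookup on V60. The cut-points below are the ones reported in the abstract; the function itself is only an illustrative sketch, not the authors' software:

```python
def re_risk_group(v60_percent):
    """Assign the radiation-esophagitis risk group from the abstract,
    based on the esophageal volume receiving >= 60 Gy (V60, in %):
    low (< 0.07), intermediate (0.07 to 16.99), high (>= 17)."""
    if v60_percent < 0.07:
        return "low"
    elif v60_percent < 17.0:
        return "intermediate"
    else:
        return "high"

# Hypothetical V60 values for three patients.
for v60 in (0.05, 5.0, 17.0):
    print(v60, re_risk_group(v60))
```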
LaBudde, Robert A; Harnly, James M
2012-01-01
A qualitative botanical identification method (BIM) is an analytical procedure that returns a binary result (1 = Identified, 0 = Not Identified). A BIM may be used by a buyer, manufacturer, or regulator to determine whether a botanical material being tested is the same as the target (desired) material, or whether it contains excessive nontarget (undesirable) material. The report describes the development and validation of studies for a BIM based on the proportion of replicates identified, or probability of identification (POI), as the basic observed statistic. The statistical procedures proposed for data analysis follow closely those of the probability of detection, and harmonize the statistical concepts and parameters between quantitative and qualitative method validation. Use of POI statistics also harmonizes statistical concepts for botanical, microbiological, toxin, and other analyte identification methods that produce binary results. The POI statistical model provides a tool for graphical representation of response curves for qualitative methods, reporting of descriptive statistics, and application of performance requirements. Single collaborator and multicollaborative study examples are given.
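The probability of identification (POI) described here is the proportion of replicates identified, a binomial statistic. One standard way to attach an interval estimate to such a proportion is the Wilson score interval; the sketch below uses invented replicate counts and is not the report's own worked example:

```python
import math

def poi_with_wilson_ci(identified, n, z=1.96):
    """Probability of identification (POI) = proportion of replicates
    identified, with a Wilson score 95% confidence interval."""
    p = identified / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return p, centre - half, centre + half

# Hypothetical: 27 of 30 replicates of the target material identified.
p, lo, hi = poi_with_wilson_ci(27, 30)
print(round(p, 3), round(lo, 3), round(hi, 3))
```

Plotting such intervals against concentration or matrix gives the response curves for qualitative methods that the abstract mentions.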
Medina, N.; Fernández, G.; Cruz, T.; Jordán, N.; Trenche, M.
2015-01-01
Background School violence is a worldwide public health issue with negative effects on education. Official statistics and reports do not include daily occurrences of violent behavior that may precede severe incidents. Objectives This project aimed to engage school community members in the development, validation and implementation of an observation instrument to identify characteristics of school violence. Methods The role of members of each participating school community in all phases of the research is described. Results (or Lessons Learned) The input of community members contributed to enrich the process by providing insight into the problem studied and a more informed framework for interpreting results. Conclusions Taking into account distinctive features of each particular school made results meaningful to the school community and fostered a sense of empowerment of community members as they recognized their knowledge is essential to the solution of their problems. PMID:27346771
Dinno, Alexis
2014-12-01
In the recent Demography article titled "The Effect of Same-Sex Marriage Laws on Different-Sex Marriage: Evidence From the Netherlands," Trandafir attempted to answer the question: are rates of opposite-sex marriage affected by legal recognition of same-sex marriages? The results of his approach to statistical inference (looking for evidence of a difference in rates of opposite-sex marriage) provide an absence of evidence of such effects. However, the validity of his conclusion of no causal relationship between same-sex marriage laws and rates of opposite-sex marriage is threatened by the fact that Trandafir did not also look for equivalence in rates of opposite-sex marriage in order to provide evidence of an absence of such an effect. Equivalence tests in combination with difference tests are introduced and presented in this article as a more valid inferential approach to the substantive question Trandafir attempted to answer.
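The equivalence testing the author advocates is commonly implemented as two one-sided tests (TOST): equivalence is concluded only if the effect is significantly greater than the lower margin and significantly less than the upper margin. A minimal stdlib sketch; the effect size, standard error, and margin below are invented for illustration and do not come from the article:

```python
from statistics import NormalDist

def tost_equivalence(diff, se, margin, alpha=0.05):
    """Two one-sided tests (TOST): conclude equivalence when the
    observed difference is significantly above -margin AND
    significantly below +margin."""
    nd = NormalDist()
    z_lower = (diff + margin) / se   # tests H0: diff <= -margin
    z_upper = (diff - margin) / se   # tests H0: diff >= +margin
    p_lower = 1 - nd.cdf(z_lower)
    p_upper = nd.cdf(z_upper)
    return max(p_lower, p_upper) < alpha

# Hypothetical: observed change of 0.2 marriages per 1,000 (SE 0.5),
# with +/-2 per 1,000 taken as the equivalence margin.
print(tost_equivalence(0.2, 0.5, 2.0))
```

Note the asymmetry with a difference test: a non-significant difference test alone yields absence of evidence, whereas a passing TOST yields evidence of absence within the stated margin.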
Galileo Attitude Determination: Experiences with a Rotating Star Scanner
NASA Technical Reports Server (NTRS)
Merken, L.; Singh, G.
1991-01-01
The Galileo experience with a rotating star scanner is discussed in terms of problems encountered in flight, solutions implemented, and lessons learned. An overview of the Galileo project and the attitude and articulation control subsystem is given, and the star scanner hardware and relevant software algorithms are detailed. The star scanner is the sole source of inertial attitude reference for this spacecraft. Problem symptoms observed in flight are discussed in terms of effects on spacecraft performance and safety. Sources of these problems include contributions from flight software idiosyncrasies and inadequate validation of the ground procedures used to identify target stars for use by the autonomous on-board star identification algorithm. Problem fixes (some already implemented and some only proposed) are discussed. A general conclusion is drawn regarding the inherent difficulty of performing simulation tests to validate algorithms which are highly sensitive to external inputs of statistically 'rare' events.
Shortening the Xerostomia Inventory
Thomson, William Murray; van der Putten, Gert-Jan; de Baat, Cees; Ikebe, Kazunori; Matsuda, Ken-ichi; Enoki, Kaori; Hopcraft, Matthew; Ling, Guo Y
2011-01-01
Objectives To determine the validity and properties of the Summated Xerostomia Inventory-Dutch Version in samples from Australia, The Netherlands, Japan and New Zealand. Study design Six cross-sectional samples of older people from The Netherlands (N = 50), Australia (N = 637 and N = 245), Japan (N = 401) and New Zealand (N = 167 and N = 86). Data were analysed using the Summated Xerostomia Inventory-Dutch Version. Results Almost all data-sets revealed a single extracted factor which explained about half of the variance, with Cronbach’s alpha values of at least 0.70. When mean scale scores were plotted against a “gold standard” xerostomia question, statistically significant gradients were observed, with the highest score seen in those who always had dry mouth, and the lowest in those who never had it. Conclusion The Summated Xerostomia Inventory-Dutch Version is valid for measuring xerostomia symptoms in clinical and epidemiological research. PMID:21684773
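The internal-consistency figures quoted above (Cronbach's alpha of at least 0.70) follow from the standard alpha formula, which any item matrix can be run through. A stdlib sketch using invented responses to a hypothetical 3-item dry-mouth scale (not the study's data):

```python
def cronbach_alpha(items):
    """Cronbach's alpha from item-score columns (items: one inner list
    per item, respondents in the same order in every list)."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = sum(var(col) for col in items)
    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - item_vars / var(totals))

# Hypothetical responses of 5 people to a 3-item scale (scores 1-5).
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
print(round(cronbach_alpha(items), 2))
```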
Moskoei, Sara; Mohtashami, Jamileh; Ghalenoeei, Mahdie; Nasiri, Maliheh; Tafreshi, Mansoreh Zaghari
2017-01-01
Introduction Evaluation of clinical competency in nurses has a distinct importance in healthcare due to its significant impact on improving the quality of patient care and creation of opportunities for professional promotion. This is a psychometric study for development of the “Clinical Competency of Mental Health Nursing” (CCMHN) rating scale. Methods In this methodological research, conducted in 2015 in Tehran, Iran, the main items were developed after a literature review, and the validity and reliability of the tool were identified. The face, content (content validity ratio and content validity index) and construct validities were calculated. For face and content validity, experts’ comments were used. Exploratory factor analysis was used to determine the construct validity. The reliability of the scale was determined by internal consistency and inter-rater correlation. The collected data were analyzed with SPSS version 16, using descriptive statistical analysis. Results A scale with 45 items in two parts, covering Emotional/Moral and Specific Care competencies, was developed. The content validity ratio and content validity index were 0.88 and 0.97, respectively. Exploratory factor analysis indicated two factors: the first with an eigenvalue of 23.93 and the second with an eigenvalue of 2.58. Cronbach’s alpha coefficient for internal consistency was 0.98, and the ICC confirming inter-rater correlation was 0.98. Conclusion A scale with 45 items and two areas was developed with appropriate validity and reliability. This scale can be used to assess clinical competency in nursing students and mental health nurses. PMID:28607650
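The content validity ratio (CVR) reported above is conventionally computed with Lawshe's formula from expert "essential" ratings; summarizing a scale-level index as a simple mean of item ratios is one of several conventions, adopted here only for illustration. All counts below are invented:

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR = (n_e - N/2) / (N/2), where n_e experts rate the
    item 'essential' out of N experts in total."""
    half = n_experts / 2
    return (n_essential - half) / half

def scale_content_validity(item_cvrs):
    """Scale-level summary taken here as the mean of item CVRs
    (an illustrative simplification, not the study's exact method)."""
    return sum(item_cvrs) / len(item_cvrs)

# Hypothetical panel of 10 experts rating three items 'essential'.
cvrs = [content_validity_ratio(n, 10) for n in (10, 9, 9)]
print([round(c, 2) for c in cvrs], round(scale_content_validity(cvrs), 2))
```

CVR ranges from -1 (no expert rates the item essential) to +1 (all do), so values such as the reported 0.88 indicate strong panel agreement.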
Schiffman, Eric L.; Truelove, Edmond L.; Ohrbach, Richard; Anderson, Gary C.; John, Mike T.; List, Thomas; Look, John O.
2011-01-01
AIMS The purpose of the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD) Validation Project was to assess the diagnostic validity of this examination protocol. An overview is presented, including Axis I and II methodology and descriptive statistics for the study participant sample. This paper details the development of reliable methods to establish the reference standards for assessing criterion validity of the Axis I RDC/TMD diagnoses. Validity testing for the Axis II biobehavioral instruments was based on previously validated reference standards. METHODS The Axis I reference standards were based on the consensus of 2 criterion examiners independently performing a comprehensive history, clinical examination, and evaluation of imaging. Intersite reliability was assessed annually for criterion examiners and radiologists. Criterion exam reliability was also assessed within study sites. RESULTS Study participant demographics were comparable to those of participants in previous studies using the RDC/TMD. Diagnostic agreement of the criterion examiners with each other and with the consensus-based reference standards was excellent with all kappas ≥ 0.81, except for osteoarthrosis (moderate agreement, k = 0.53). Intrasite criterion exam agreement with reference standards was excellent (k ≥ 0.95). Intersite reliability of the radiologists for detecting computed tomography-disclosed osteoarthrosis and magnetic resonance imaging-disclosed disc displacement was good to excellent (k = 0.71 and 0.84, respectively). CONCLUSION The Validation Project study population was appropriate for assessing the reliability and validity of the RDC/TMD Axis I and II. The reference standards used to assess the validity of Axis I TMD were based on reliable and clinically credible methods. PMID:20213028
Transcultural adaptation and validation of the “Hip and Knee” questionnaire into Spanish
2014-01-01
Background The purpose of the present study is to translate and validate the “Hip and Knee Outcomes Questionnaire”, developed in English, into Spanish. The ‘Hip and Knee Outcomes Questionnaire’ is a questionnaire designed to evaluate the impact on quality of life of any problem related to the human musculoskeletal system. It was developed by 10 scientific associations. Methods The questionnaire underwent a validated translation/retro-translation process. Patients undergoing primary knee arthroplasty, before and six months postoperative, tested the final version in Spanish. Psychometric properties of feasibility, reliability, validity and sensitivity to change were assessed. Convergent validity with the SF-36 and WOMAC questionnaires was evaluated. Results 316 patients were included. Feasibility: a high number of missing items in questions 3, 4 and 5 was observed. The number of patients with a missing item was 171 (51.35%) at the preoperative visit and 139 (44.0%) at the postoperative visit. Internal validity: revision of coefficients in the item-rest correlation recommended removing question 6 at the preoperative visit (coefficient <0.20). Convergent validity: coefficients of correlation with the WOMAC and SF-36 scales confirm the questionnaire’s validity. Sensitivity to change: statistically significant differences were found between the mean scores of the first visit and the postoperative visit. Conclusion The proposed Spanish translation of the ‘Hip and Knee Outcomes Questionnaire’ is found to be reliable, valid and sensitive to changes produced in the clinical practice of patients undergoing primary knee arthroplasty. However, some changes to the completion instructions are recommended. Level of evidence: Level I. Prognostic study. PMID:24885248
2012-01-01
Background Oestrogen and progestogen have the potential to influence gastro-intestinal motility; both are key components of hormone replacement therapy (HRT). Results of observational studies in women taking HRT rely on self-reporting of gastro-oesophageal symptoms and the aetiology of gastro-oesophageal reflux disease (GORD) remains unclear. This study investigated the association between HRT and GORD in menopausal women using validated general practice records. Methods 51,182 menopausal women were identified using the UK General Practice Research Database between 1995–2004. Of these, 8,831 were matched with and without hormone use. Odds ratios (ORs) were calculated for GORD and proton-pump inhibitor (PPI) use in hormone and non-hormone users, adjusting for age, co-morbidities, and co-pharmacy. Results In unadjusted analysis, all forms of hormone use (oestrogen-only, tibolone, combined HRT and progestogen) were statistically significantly associated with GORD. In adjusted models, this association remained statistically significant for oestrogen-only treatment (OR 1.49; 1.18–1.89). Unadjusted analysis showed a statistically significant association between PPI use and oestrogen-only and combined HRT treatment. When adjusted for covariates, oestrogen-only treatment was significant (OR 1.34; 95% CI 1.03–1.74). Findings from the adjusted model demonstrated the greater use of PPI by progestogen users (OR 1.50; 1.01–2.22). Conclusions This first large cohort study of the association between GORD and HRT found a statistically significant association between oestrogen-only hormone and GORD and PPI use. This should be further investigated using prospective follow-up to validate the strength of association and describe its clinical significance. PMID:22642788
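Odds ratios like those reported here (e.g., OR 1.49; 1.18–1.89) come from 2x2 exposure-outcome tables, here unadjusted for covariates. A minimal sketch with a Wald confidence interval; the counts below are invented, not the study's data:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio for a 2x2 table (a = exposed cases, b = exposed
    controls, c = unexposed cases, d = unexposed controls) with a
    95% Wald confidence interval on the log scale."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Hypothetical counts: GORD among oestrogen users vs non-users.
or_, lo, hi = odds_ratio_ci(120, 880, 85, 915)
print(round(or_, 2), round(lo, 2), round(hi, 2))
```

The adjusted estimates in the abstract additionally condition on age, co-morbidities, and co-pharmacy via a regression model; the raw 2x2 calculation corresponds only to the unadjusted analysis.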
Kim, Yun Hak; Jeong, Dae Cheon; Pak, Kyoungjune; Goh, Tae Sik; Lee, Chi-Seung; Han, Myoung-Eun; Kim, Ji-Young; Liangwen, Liu; Kim, Chi Dae; Jang, Jeon Yeob; Cha, Wonjae; Oh, Sae-Ock
2017-01-01
Accurate prediction of prognosis is critical for therapeutic decisions regarding cancer patients. Many previously developed prognostic scoring systems have limitations in reflecting recent progress in the field of cancer biology such as microarray, next-generation sequencing, and signaling pathways. To develop a new prognostic scoring system for cancer patients, we used mRNA expression and clinical data in various independent breast cancer cohorts (n=1214) from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) and Gene Expression Omnibus (GEO). A new prognostic score that reflects gene network inherent in genomic big data was calculated using Network-Regularized high-dimensional Cox-regression (Net-score). We compared its discriminatory power with those of two previously used statistical methods: stepwise variable selection via univariate Cox regression (Uni-score) and Cox regression via Elastic net (Enet-score). The Net scoring system showed better discriminatory power in prediction of disease-specific survival (DSS) than other statistical methods (p=0 in METABRIC training cohort, p=0.000331, 4.58e-06 in two METABRIC validation cohorts) when accuracy was examined by log-rank test. Notably, comparison of C-index and AUC values in receiver operating characteristic analysis at 5 years showed fewer differences between training and validation cohorts with the Net scoring system than other statistical methods, suggesting minimal overfitting. The Net-based scoring system also successfully predicted prognosis in various independent GEO cohorts with high discriminatory power. In conclusion, the Net-based scoring system showed better discriminative power than previous statistical methods in prognostic prediction for breast cancer patients. This new system will mark a new era in prognosis prediction for cancer patients. PMID:29100405
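The C-index used above to compare scoring systems generalizes the c-statistic to censored survival data (Harrell's concordance): among usable pairs, the model should assign the higher risk score to the patient who fails earlier. An illustrative stdlib sketch on invented follow-up data, not the METABRIC or GEO cohorts:

```python
def harrell_c(times, events, scores):
    """Harrell's concordance index: among usable pairs (the earlier
    time is an observed event, flag 1), the fraction where the higher
    risk score goes with the shorter survival time (ties count half)."""
    num = den = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                den += 1
                if scores[i] > scores[j]:
                    num += 1
                elif scores[i] == scores[j]:
                    num += 0.5
    return num / den

# Hypothetical follow-up times (months), event flags, and risk scores.
times  = [5, 8, 12, 20, 30]
events = [1, 1, 0,  1,  0]
scores = [2.1, 1.7, 1.9, 0.8, 0.5]
print(round(harrell_c(times, events, scores), 3))
```

A small gap between training and validation C-index values, as the abstract reports for the Net score, is what suggests minimal overfitting.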
Managing heteroscedasticity in general linear models.
Rosopa, Patrick J; Schaffer, Meline M; Schroeder, Amber N
2013-09-01
Heteroscedasticity refers to a phenomenon where data violate a statistical assumption. This assumption is known as homoscedasticity. When the homoscedasticity assumption is violated, this can lead to increased Type I error rates or decreased statistical power. Because this can adversely affect substantive conclusions, the failure to detect and manage heteroscedasticity could have serious implications for theory, research, and practice. In addition, heteroscedasticity is not uncommon in the behavioral and social sciences. Thus, in the current article, we synthesize extant literature in applied psychology, econometrics, quantitative psychology, and statistics, and we offer recommendations for researchers and practitioners regarding available procedures for detecting heteroscedasticity and mitigating its effects. In addition to discussing the strengths and weaknesses of various procedures and comparing them in terms of existing simulation results, we describe a 3-step data-analytic process for detecting and managing heteroscedasticity: (a) fitting a model based on theory and saving residuals, (b) the analysis of residuals, and (c) statistical inferences (e.g., hypothesis tests and confidence intervals) involving parameter estimates. We also demonstrate this data-analytic process using an illustrative example. Overall, detecting violations of the homoscedasticity assumption and mitigating its biasing effects can strengthen the validity of inferences from behavioral and social science data.
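One common concrete instance of step (b), the analysis of residuals, is the Breusch-Pagan test; the article surveys several procedures, and this is just one of them. A stdlib sketch for a single predictor, using the Koenker n·R² form and invented heteroscedastic data:

```python
import math

def ols_slope_intercept(x, y):
    """Ordinary least squares fit y = a + b*x; returns (b, a)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    return b, my - b * mx

def breusch_pagan(x, y):
    """Breusch-Pagan test (Koenker form) for one predictor: regress
    squared residuals on x; LM = n * R^2 is chi-square with 1 df
    under homoscedasticity."""
    n = len(x)
    b, a = ols_slope_intercept(x, y)
    e2 = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    b2, a2 = ols_slope_intercept(x, e2)
    fitted = [a2 + b2 * xi for xi in x]
    me = sum(e2) / n
    ss_tot = sum((v - me) ** 2 for v in e2)
    ss_reg = sum((f - me) ** 2 for f in fitted)
    lm = n * ss_reg / ss_tot
    p = math.erfc(math.sqrt(lm / 2))  # chi-square(1) upper tail
    return lm, p

# Invented data whose spread grows with x (heteroscedastic by design).
x = list(range(1, 21))
y = [xi + ((-1) ** xi) * 0.3 * xi for xi in x]
lm, p = breusch_pagan(x, y)
print(p < 0.05)
```

A significant result would then motivate step (c) remedies such as heteroscedasticity-consistent standard errors rather than ordinary inference.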
On a logical basis for division of responsibilities in statistical practice
NASA Technical Reports Server (NTRS)
Deming, W. Edwards
1966-01-01
The purpose of this paper is to explain principles for division of responsibilities between the statistician and the people that he works with, and reasons why this division of responsibilities is important -- that is, how it improves the performance of both statistician and expert in subject-matter. The aim is to find and illustrate principles of practice by which statisticians may make effective use of their knowledge of theory. The specialist in statistical methods may find himself applying the same basic theory in a dozen different fields in a week, rotating through the same projects the next week. Or, he may work day after day primarily in a single substantive field. Either way, he requires rules of practice. A statement of statistical reliability should present any information that might help the reader to form his own opinion concerning the validity of conclusions likely to be drawn from the results. The aim of a statistical report is to protect the client from seeing merely what he would like to see; to protect him from losses that could come from misuse of results. A further aim is to forestall unwarranted claims of accuracy that the client's public might otherwise accept.
Wolf, Pedro S. A.; Figueredo, Aurelio J.; Jacobs, W. Jake
2013-01-01
The purpose of this paper is to examine the convergent and nomological validity of a GPS-based measure of daily activity, operationalized as Number of Places Visited (NPV). Relations among the GPS-based measure and two self-report measures of NPV, as well as relations among NPV and two factors made up of self-reported individual differences, were examined. The first factor was composed of variables related to an Active Lifestyle (AL) (e.g., positive affect, extraversion) and the second factor was composed of variables related to a Sedentary Lifestyle (SL) (e.g., depression, neuroticism). NPV was measured over 4 days, comprising two weekdays and two weekend days. A bivariate analysis established one level of convergent validity, and a Split-Plot GLM examined convergent validity, nomological validity, and alternative hypotheses related to constraints on activity throughout the week simultaneously. The first analysis revealed significant correlations among NPV measures (weekday, weekend, and the entire 4-day time period), supporting the convergent validity of the Diary-, Google Maps-, and GPS-NPV measures. Results from the second analysis, indicating non-significant mean differences in NPV regardless of method, also support this conclusion. We also found that AL is a statistically significant predictor of NPV no matter how NPV was measured. We did not find a statistically significant relation among NPV and SL. These results permit us to infer that the GPS-based NPV measure has convergent and nomological validity. PMID:23761772
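The convergent-validity evidence here rests on correlations between pairs of NPV measures, conventionally Pearson coefficients. A stdlib sketch with invented counts for six hypothetical participants (not the study's data):

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two measures."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical 4-day NPV counts: GPS-derived vs diary-reported.
gps   = [3, 5, 2, 7, 4, 6]
diary = [2, 5, 3, 6, 4, 5]
print(round(pearson_r(gps, diary), 2))
```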
Elders Health Empowerment Scale
2014-01-01
Introduction: Empowerment refers to patient skills that allow them to become primary decision-makers in control of daily self-management of health problems. As important as the concept is, particularly for elders with chronic diseases, few available instruments have been validated for use with Spanish-speaking people. Objective: To translate and adapt the Health Empowerment Scale (HES) for a Spanish-speaking older adults sample and perform its psychometric validation. Methods: The HES was adapted based on the Diabetes Empowerment Scale-Short Form. Where "diabetes" was mentioned in the original tool, it was replaced with "health" terms to cover all kinds of conditions that could affect health empowerment. Statistical and psychometric analyses were conducted on 648 urban-dwelling seniors. Results: The HES had an acceptable internal consistency with a Cronbach's α of 0.89. The convergent validity was supported by significant Pearson's coefficient correlations between the HES total and item scores and the General Self Efficacy Scale (r = 0.77), Swedish Rheumatic Disease Empowerment Scale (r = 0.69) and Making Decisions Empowerment Scale (r = 0.70). Construct validity was evaluated using item analysis, the split-half test and corrected item-to-total correlation coefficients, with good internal consistency (α > 0.8). The content validity was supported by Scale and Item Content Validity Indices of 0.98 and 1.0, respectively. Conclusions: The HES had acceptable face validity and reliability coefficients, which, added to its ease of administration and users' unbiased comprehension, could establish it as a suitable tool for evaluating elders' outpatient empowerment-based medical education programs. PMID:25767307
Validation of Milliflex® Quantum for Bioburden Testing of Pharmaceutical Products.
Gordon, Oliver; Goverde, Marcel; Staerk, Alexandra; Roesti, David
2017-01-01
This article reports the validation strategy used to demonstrate that the Milliflex® Quantum yielded non-inferior results to the traditional bioburden method. It was validated according to USP <1223>, European Pharmacopoeia 5.1.6, and Parenteral Drug Association Technical Report No. 33, and comprised the validation parameters robustness, ruggedness, repeatability, specificity, limit of detection and quantification, accuracy, precision, linearity, range, and equivalence in routine operation. For the validation, a combination of pharmacopeial ATCC strains and a broad selection of in-house isolates was used; the in-house isolates were used in a stressed state. Results were statistically evaluated against the pharmacopeial acceptance criterion of ≥70% recovery compared to the traditional method. Post-hoc test power calculations verified the appropriateness of the sample size used to detect such a difference. Furthermore, equivalence tests verified non-inferiority of the rapid method compared to the traditional method. In conclusion, the rapid bioburden test based on the Milliflex® Quantum was successfully validated as an alternative to the traditional bioburden test. LAY ABSTRACT: Pharmaceutical drug products must fulfill specified quality criteria regarding their microbial content in order to ensure patient safety. Drugs that are delivered into the body via injection, infusion, or implantation must be sterile (i.e., devoid of living microorganisms). Bioburden testing measures the levels of microbes present in the bulk solution of a drug before sterilization, and thus it provides important information for manufacturing a safe product. In general, bioburden testing has to be performed using the methods described in the pharmacopoeias (membrane filtration or plate count). These methods are well established and validated regarding their effectiveness; however, the incubation time required to visually identify microbial colonies is long.
Thus, alternative methods that detect microbial contamination faster will improve control over the manufacturing process and speed up product release. Before alternative methods may be used, they must undergo a side-by-side comparison with pharmacopeial methods. In this comparison, referred to as validation, it must be shown in a statistically verified manner that the effectiveness of the alternative method is at least equivalent to that of the pharmacopeial methods. Here we describe the successful validation of an alternative bioburden testing method based on fluorescent staining of growing microorganisms using the Milliflex® Quantum system by MilliporeSigma. © PDA, Inc. 2017.
On the validity of time-dependent AUC estimators.
Schmid, Matthias; Kestler, Hans A; Potapov, Sergej
2015-01-01
Recent developments in molecular biology have led to the massive discovery of new marker candidates for the prediction of patient survival. To evaluate the predictive value of these markers, statistical tools for measuring the performance of survival models are needed. We consider estimators of discrimination measures, which are a popular approach to evaluate survival predictions in biomarker studies. Estimators of discrimination measures are usually based on regularity assumptions such as the proportional hazards assumption. Based on two sets of molecular data and a simulation study, we show that violations of the regularity assumptions may lead to over-optimistic estimates of prediction accuracy and may therefore result in biased conclusions regarding the clinical utility of new biomarkers. In particular, we demonstrate that biased medical decision making is possible even if statistical checks indicate that all regularity assumptions are satisfied. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
McDonald, Craig M; Henricson, Erik K; Abresch, R Ted; Florence, Julaine; Eagle, Michelle; Gappmaier, Eduard; Glanzman, Allan M; Spiegel, Robert; Barth, Jay; Elfring, Gary; Reha, Allen; Peltz, Stuart W
2013-01-01
Introduction: An international clinical trial enrolled 174 ambulatory males ≥5 years old with nonsense mutation Duchenne muscular dystrophy (nmDMD). Pretreatment data provide insight into reliability, concurrent validity, and minimal clinically important differences (MCIDs) of the 6-minute walk test (6MWT) and other endpoints. Methods: Screening and baseline evaluations included the 6-minute walk distance (6MWD), timed function tests (TFTs), quantitative strength by myometry, the PedsQL, heart rate–determined energy expenditure index, and other exploratory endpoints. Results: The 6MWT proved feasible and reliable in a multicenter context. Concurrent validity with other endpoints was excellent. The MCID for 6MWD was 28.5 and 31.7 meters based on 2 statistical distribution methods. Conclusions: The ratio of MCID to baseline mean is lower for 6MWD than for other endpoints. The 6MWD is an optimal primary endpoint for Duchenne muscular dystrophy (DMD) clinical trials that are focused therapeutically on preservation of ambulation and slowing of disease progression. Muscle Nerve 48: 357–368, 2013 PMID:23674289
Exploring rationality in schizophrenia
Mortensen, Erik Lykke; Owen, Gareth; Nordgaard, Julie; Jansson, Lennart; Sæbye, Ditte; Flensborg-Madsen, Trine; Parnas, Josef
2015-01-01
Background Empirical studies of rationality (syllogisms) in patients with schizophrenia have obtained different results. One study found that patients reason more logically if the syllogism is presented through an unusual content. Aims To explore syllogism-based rationality in schizophrenia. Method Thirty-eight first-admitted patients with schizophrenia and 38 healthy controls solved 29 syllogisms that varied in presentation content (ordinary v. unusual) and validity (valid v. invalid). Statistical tests were made of unadjusted and adjusted group differences in models adjusting for intelligence and neuropsychological test performance. Results Controls outperformed patients on all syllogism types, but the difference between the two groups was only significant for valid syllogisms presented with unusual content. However, when adjusting for intelligence and neuropsychological test performance, all group differences became non-significant. Conclusions When taking intelligence and neuropsychological performance into account, patients with schizophrenia and controls perform similarly on syllogism tests of rationality. Declaration of interest None. Copyright and usage © The Royal College of Psychiatrists 2015. This is an open access article distributed under the terms of the Creative Commons Non-Commercial, No Derivatives (CC BY-NC-ND) licence. PMID:27703730
European Portuguese adaptation and validation of dilemmas used to assess moral decision-making.
Fernandes, Carina; Gonçalves, Ana Ribeiro; Pasion, Rita; Ferreira-Santos, Fernando; Paiva, Tiago Oliveira; Melo E Castro, Joana; Barbosa, Fernando; Martins, Isabel Pavão; Marques-Teixeira, João
2018-03-01
Objective To adapt and validate a widely used set of moral dilemmas to European Portuguese, which can be applied to assess decision-making. Moreover, the classical formulation of the dilemmas was compared with a more focused moral probe. Finally, a shorter version of the moral scenarios was tested. Methods The Portuguese version of the set of moral dilemmas was tested in 53 individuals from several regions of Portugal. In a second study, an alternative way of questioning on moral dilemmas was tested in 41 participants. Finally, the shorter version of the moral dilemmas was tested in 137 individuals. Results Results evidenced no significant differences between English and Portuguese versions. Also, asking whether actions are "morally acceptable" elicited less utilitarian responses than the original question, although without reaching statistical significance. Finally, all tested versions of moral dilemmas exhibited the same pattern of responses, suggesting that the fundamental elements to the moral decision-making were preserved. Conclusions We found evidence of cross-cultural validity for moral dilemmas. However, the moral focus might affect utilitarian/deontological judgments.
Development and validation of an algorithm for laser application in wound treatment
da Cunha, Diequison Rite; Salomé, Geraldo Magela; Massahud, Marcelo Renato; Mendes, Bruno; Ferreira, Lydia Masako
2017-01-01
ABSTRACT Objective: To develop and validate an algorithm for laser wound therapy. Method: Methodological study and literature review. For the development of the algorithm, a review of the past ten years was performed in the Health Sciences databases. The algorithm was evaluated by 24 participants: nurses, physiotherapists, and physicians. For data analysis, Cronbach's alpha coefficient and the chi-square test for independence were used. The significance level of the statistical tests was set at 5% (p < 0.05). Results: The professionals' responses regarding the ease of reading the algorithm indicated: 41.70%, great; 41.70%, good; 16.70%, regular. With regard to whether the algorithm was sufficient to support decisions related to wound evaluation and wound cleaning, 87.5% said yes to both questions. Regarding whether the algorithm contained enough information to support their decision on the choice of laser parameters, 91.7% of participants said yes. The questionnaire showed reliability by Cronbach's alpha coefficient (α = 0.962). Conclusion: The developed and validated algorithm showed reliability for evaluation, wound cleaning, and use of laser therapy in wounds. PMID:29211197
Tian, Guo-Liang; Li, Hui-Qiong
2017-08-01
Some existing confidence interval methods and hypothesis testing methods in the analysis of a contingency table with incomplete observations in both margins entirely depend on an underlying assumption that the sampling distribution of the observed counts is a product of independent multinomial/binomial distributions for complete and incomplete counts. However, it can be shown that this independence assumption is incorrect and can result in unreliable conclusions because of the underestimation of the uncertainty. Therefore, the first objective of this paper is to derive the valid joint sampling distribution of the observed counts in a contingency table with incomplete observations in both margins. The second objective is to provide a new framework for analyzing incomplete contingency tables based on the derived joint sampling distribution of the observed counts, by developing a Fisher scoring algorithm to calculate maximum likelihood estimates of parameters of interest, bootstrap confidence interval methods, and bootstrap hypothesis testing methods. We compare the differences between the valid sampling distribution and the sampling distribution under the independence assumption. Simulation studies showed that average/expected confidence-interval widths of parameters based on the sampling distribution under the independence assumption are shorter than those based on the new sampling distribution, yielding unrealistic results. A real data set is analyzed to illustrate the application of the new sampling distribution for incomplete contingency tables, and the analysis results again confirm the conclusions obtained from the simulation studies.
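The record above builds bootstrap confidence intervals for parameters of incomplete contingency tables. The generic percentile-bootstrap recipe behind such intervals can be sketched as follows; this toy version estimates a simple proportion on invented data, not the authors' table parameters:

```python
import random

def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for stat(data); seeded for reproducibility.
    Resamples the data with replacement n_boot times and takes the
    alpha/2 and 1-alpha/2 quantiles of the resampled statistics."""
    rng = random.Random(seed)
    reps = sorted(stat([rng.choice(data) for _ in data]) for _ in range(n_boot))
    return reps[int(n_boot * alpha / 2)], reps[int(n_boot * (1 - alpha / 2)) - 1]

# Invented data: 30 "successes" out of 100 observations.
data = [1] * 30 + [0] * 70
mean = lambda xs: sum(xs) / len(xs)
lo, hi = bootstrap_ci(data, mean)
print(f"observed {mean(data):.2f}, 95% bootstrap CI ({lo:.2f}, {hi:.2f})")
```

The same skeleton works for any statistic of the observed counts; only `stat` changes.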
ERIC Educational Resources Information Center
Hassad, Rossi A.
2009-01-01
This study examined the teaching practices of 227 college instructors of introductory statistics (from the health and behavioral sciences). Using primarily multidimensional scaling (MDS) techniques, a two-dimensional, 10-item teaching practice scale, TISS (Teaching of Introductory Statistics Scale), was developed and validated. The two dimensions…
Alassaad, Anna; Melhus, Håkan; Hammarlund-Udenaes, Margareta; Bertilsson, Maria; Gillespie, Ulrika; Sundström, Johan
2015-01-01
Objectives To construct and internally validate a risk score, the ‘80+ score’, for revisits to hospital and mortality for older patients, incorporating aspects of pharmacotherapy. Our secondary aim was to compare the discriminatory ability of the score with that of three validated tools for measuring inappropriate prescribing: Screening Tool of Older Person's Prescriptions (STOPP), Screening Tool to Alert doctors to Right Treatment (START) and Medication Appropriateness Index (MAI). Setting Two acute internal medicine wards at Uppsala University hospital. Patient data were used from a randomised controlled trial investigating the effects of a comprehensive clinical pharmacist intervention. Participants Data from 368 patients, aged 80 years and older, admitted to one of the study wards. Primary outcome measure Time to rehospitalisation or death during the year after discharge from hospital. Candidate variables were selected among a large number of clinical and drug-specific variables. After a selection process, a score for risk estimation was constructed. The 80+ score was internally validated, and the discriminatory ability of the score and of STOPP, START and MAI was assessed using C-statistics. Results Seven variables were selected. Impaired renal function, pulmonary disease, malignant disease, living in a nursing home, being prescribed an opioid or being prescribed a drug for peptic ulcer or gastroesophageal reflux disease were associated with an increased risk, while being prescribed an antidepressant drug (tricyclic antidepressants not included) was linked to a lower risk of the outcome. These variables made up the components of the 80+ score. The C-statistics were 0.71 (80+), 0.57 (STOPP), 0.54 (START) and 0.63 (MAI). Conclusions We developed and internally validated a score for prediction of risk of rehospitalisation and mortality in hospitalised older people. The score discriminated risk better than available tools for inappropriate prescribing. 
Pending external validation, this score can aid in clinical identification of high-risk patients and targeting of interventions. PMID:25694461
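The 80+ score's discriminatory ability above is summarized by C-statistics (0.71 vs 0.57, 0.54 and 0.63 for STOPP, START and MAI). For a binary outcome, the C-statistic is the probability that a randomly chosen case with the event receives a higher score than a randomly chosen case without it; a brute-force sketch on toy scores and outcomes (not study data):

```python
def c_statistic(scores, events):
    """C-statistic (concordance) for a binary outcome: the fraction of
    (event, non-event) pairs in which the event case has the higher score;
    ties count one half."""
    pairs = concordant = ties = 0
    for s1, e1 in zip(scores, events):
        for s0, e0 in zip(scores, events):
            if e1 == 1 and e0 == 0:
                pairs += 1
                if s1 > s0:
                    concordant += 1
                elif s1 == s0:
                    ties += 1
    return (concordant + 0.5 * ties) / pairs

# Toy risk scores; 1 = revisit/death within a year, 0 = neither.
print(c_statistic([0.9, 0.4, 0.5, 0.2], [1, 1, 0, 0]))  # → 0.75
```

A value of 0.5 means no discrimination and 1.0 perfect ranking, which is why 0.71 beats the 0.54–0.63 of the comparator tools.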
Dantas, Raquel Batista; Oliveira, Graziella Lage; Silveira, Andréa Maria
2017-01-01
ABSTRACT OBJECTIVE To adapt and evaluate the psychometric properties of the Vulnerability to Abuse Screening Scale to identify risk of domestic violence against older adults in Brazil. METHODS The instrument was adapted and validated in a sample of 151 older adults from a geriatric reference center in the municipality of Belo Horizonte, State of Minas Gerais, in 2014. We collected sociodemographic, clinical, and abuse-related information, and verified reliability by reproducibility in a sample of 55 older people, who underwent re-testing of the instrument seven days after the first application. Descriptive and comparative analyses were performed for all variables, with a significance level of 5%. Construct validity was analyzed by the principal components method with a tetrachoric correlation matrix, the reliability of the scale by the weighted Kappa (Kp) statistic, and the internal consistency by the Kuder-Richardson formula 20 estimator (KR-20). RESULTS The average age of the participants was 72.1 years (SD = 6.96; 95%CI 70.94–73.17), with a maximum of 92 years, and they were predominantly female (76.2%; 95%CI 69.82–83.03). When analyzing the relationship between the scores of the Vulnerability to Abuse Screening Scale, categorized by presence (score > 3) or absence (score < 3) of vulnerability to abuse, and clinical and health conditions, we found statistically significant differences for self-perception of health (p = 0.002), depressive symptoms (p < 0.001), and presence of rheumatism (p = 0.003). There were no statistically significant differences between sexes. The Vulnerability to Abuse Screening Scale showed acceptable validity in the cross-cultural adaptation process, demonstrating dimensionality coherent with the original proposal (four factors). In the internal consistency analysis, the instrument presented good results (KR-20 = 0.69), and reliability via reproducibility was considered excellent for the global scale (Kp = 0.92).
CONCLUSIONS The Vulnerability to Abuse Screening Scale proved to be a valid instrument with good psychometric capacity for screening domestic abuse against older adults in Brazil. PMID:28423137
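The adaptation above reports internal consistency with the Kuder-Richardson formula 20 (KR-20 = 0.69), the dichotomous-item analogue of Cronbach's α. A minimal sketch on invented 0/1 item responses (two items, four respondents):

```python
def kr20(items):
    """KR-20 for dichotomous items (one list of 0/1 responses per item,
    same respondents in each): k/(k-1) * (1 - sum(p*q) / total variance),
    using population variances throughout."""
    k = len(items)
    n = len(items[0])
    pq = sum((sum(it) / n) * (1 - sum(it) / n) for it in items)
    totals = [sum(it[i] for it in items) for i in range(n)]
    m = sum(totals) / n
    total_var = sum((t - m) ** 2 for t in totals) / n
    return k / (k - 1) * (1 - pq / total_var)

# Identical items give 1.0; statistically unrelated items give 0.0.
print(kr20([[1, 1, 0, 0], [1, 1, 0, 0]]))  # → 1.0
print(kr20([[1, 1, 0, 0], [1, 0, 1, 0]]))  # → 0.0
```

With graded (non-binary) items, the same formula with item variances in place of p*q reduces to Cronbach's α.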
Examining the dimensions and correlates of workplace stress among Australian veterinarians
2009-01-01
Background Although stress is known to be a common occupational health issue in the veterinary profession, few studies have investigated its broad domains or the internal validity of the survey instrument used for assessment. Methods We analysed data from over 500 veterinarians in Queensland, Australia, who were surveyed during 2006-07. Results The most common causes of stress were reported to be long hours worked per day, not having enough holidays per year, not having enough rest breaks per day, the attitude of customers, lack of recognition from the public and not having enough time per patient. Age, gender and practice type were statistically associated with various aspects of work-related stress. Strong correlations were found between having too many patients per day and not having enough time per patient; between not having enough holidays and long working hours; and also between not enough rest breaks per day and long working hours. Factor analysis revealed four dimensions of stress comprising a mixture of career, professional and practice-related items. The internal validity of our stress questionnaire was shown to be high during statistical analysis. Conclusion Overall, this study suggests that workplace stress is fairly common among Australian veterinarians and represents an issue that occupies several distinct areas within their professional life. PMID:19995450
Khan, Asaduzzaman; Chien, Chi-Wen; Bagraith, Karl S
2015-04-01
To investigate whether using a parametric statistic in comparing groups leads to different conclusions when using summative scores from rating scales compared with using their corresponding Rasch-based measures. A Monte Carlo simulation study was designed to examine between-group differences in the change scores derived from summative scores from rating scales, and those derived from their corresponding Rasch-based measures, using 1-way analysis of variance. The degree of inconsistency between the 2 scoring approaches (i.e. summative and Rasch-based) was examined, using varying sample sizes, scale difficulties and person ability conditions. This simulation study revealed scaling artefacts that could arise from using summative scores rather than Rasch-based measures for determining the changes between groups. The group differences in the change scores were statistically significant for summative scores under all test conditions and sample size scenarios. However, none of the group differences in the change scores were significant when using the corresponding Rasch-based measures. This study raises questions about the validity of the inference on group differences of summative score changes in parametric analyses. Moreover, it provides a rationale for the use of Rasch-based measures, which can allow valid parametric analyses of rating scale data.
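The simulation above compares group change scores with 1-way analysis of variance under the two scoring approaches. The F statistic behind that comparison is the ratio of between-group to within-group mean squares; a self-contained sketch on invented group data:

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square over
    within-group mean square, for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Invented change scores for two groups of respondents.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # → 13.5
```

The study's point is that this F (and its p-value) can differ depending on whether `groups` holds summative scores or Rasch-based measures of the same responses.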
Is there a genetic cause for cancer cachexia? – a clinical validation study in 1797 patients
Solheim, T S; Fayers, P M; Fladvad, T; Tan, B; Skorpen, F; Fearon, K; Baracos, V E; Klepstad, P; Strasser, F; Kaasa, S
2011-01-01
Background: Cachexia has a major impact on cancer patients' morbidity and mortality. Future development of cachexia treatment needs methods for early identification of patients at risk. The aim of the study was to validate nine single-nucleotide polymorphisms (SNPs) previously associated with cachexia, and to explore 182 other candidate SNPs with the potential to be involved in the pathophysiology. Method: A total of 1797 cancer patients, classified as having either severe cachexia, mild cachexia or no cachexia, were genotyped. Results: After allowing for multiple testing, there was no statistically significant association between any of the SNPs analysed and the cachexia groups. However, consistent with prior reports, two SNPs from the acylpeptide hydrolase (APEH) gene showed suggestive statistical significance (P=0.02; OR, 0.78). Conclusion: This study failed to detect any significant association between any of the SNPs analysed and cachexia, although two SNPs from the APEH gene had a trend towards significance. The APEH gene encodes the enzyme APEH, postulated to be important in the endpoint of the ubiquitin system and thus the breakdown of proteins into free amino acids. In cachexia, there is an extensive breakdown of muscle proteins and an increase in the production of acute phase proteins in the liver. PMID:21934689
Air Combat Training: Good Stick Index Validation. Final Report for Period 3 April 1978-1 April 1979.
ERIC Educational Resources Information Center
Moore, Samuel B.; And Others
A study was conducted to investigate and statistically validate a performance measuring system (the Good Stick Index) in the Tactical Air Command Combat Engagement Simulator I (TAC ACES I) Air Combat Maneuvering (ACM) training program. The study utilized a twelve-week sample of eighty-nine student pilots to statistically validate the Good Stick…
Content-Based VLE Designs Improve Learning Efficiency in Constructivist Statistics Education
Wessa, Patrick; De Rycker, Antoon; Holliday, Ian Edward
2011-01-01
Background We introduced a series of computer-supported workshops in our undergraduate statistics courses, in the hope that it would help students to gain a deeper understanding of statistical concepts. This raised questions about the appropriate design of the Virtual Learning Environment (VLE) in which such an approach had to be implemented. Therefore, we investigated two competing software design models for VLEs. In the first system, all learning features were a function of the classical VLE. The second system was designed from the perspective that learning features should be a function of the course's core content (statistical analyses), which required us to develop a specific-purpose Statistical Learning Environment (SLE) based on Reproducible Computing and newly developed Peer Review (PR) technology. Objectives The main research question is whether the second VLE design improved learning efficiency as compared to the standard type of VLE design that is commonly used in education. As a secondary objective we provide empirical evidence about the usefulness of PR as a constructivist learning activity which supports non-rote learning. Finally, this paper illustrates that it is possible to introduce a constructivist learning approach in large student populations, based on adequately designed educational technology, without subsuming educational content to technological convenience. Methods Both VLE systems were tested within a two-year quasi-experiment based on a Reliable Nonequivalent Group Design. This approach allowed us to draw valid conclusions about the treatment effect of the changed VLE design, even though the systems were implemented in successive years. The methodological aspects about the experiment's internal validity are explained extensively.
Results The effect of the design change is shown to have substantially increased the efficiency of constructivist, computer-assisted learning activities for all cohorts of the student population under investigation. The findings demonstrate that a content–based design outperforms the traditional VLE–based design. PMID:21998652
Patients and Medical Statistics
Woloshin, Steven; Schwartz, Lisa M; Welch, H Gilbert
2005-01-01
BACKGROUND People are increasingly presented with medical statistics. There are no existing measures to assess their level of interest or confidence in using medical statistics. OBJECTIVE To develop 2 new measures, the STAT-interest and STAT-confidence scales, and assess their reliability and validity. DESIGN Survey with retest after approximately 2 weeks. SUBJECTS Two hundred and twenty-four people were recruited from advertisements in local newspapers, an outpatient clinic waiting area, and a hospital open house. MEASURES We developed and revised 5 items on interest in medical statistics and 3 on confidence understanding statistics. RESULTS Study participants were mostly college graduates (52%); 25% had a high school education or less. The mean age was 53 (range 20 to 84) years. Most paid attention to medical statistics (6% paid no attention). The mean (SD) STAT-interest score was 68 (17) and ranged from 15 to 100. Confidence in using statistics was also high: the mean (SD) STAT-confidence score was 65 (19) and ranged from 11 to 100. STAT-interest and STAT-confidence scores were moderately correlated (r=.36, P<.001). Both scales demonstrated good test–retest repeatability (r=.60, .62, respectively), internal consistency reliability (Cronbach's α=0.70 and 0.78), and usability (individual item nonresponse ranged from 0% to 1.3%). Scale scores correlated only weakly with scores on a medical data interpretation test (r=.15 and .26, respectively). CONCLUSION The STAT-interest and STAT-confidence scales are usable and reliable. Interest and confidence were only weakly related to the ability to actually use data. PMID:16307623
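The STAT scales above are evaluated almost entirely through Pearson correlations: scale inter-correlation (r = .36), test-retest repeatability (r = .60, .62) and convergence with a data-interpretation test (r = .15, .26). The coefficient itself can be sketched in a few lines (invented scores, not study data):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Perfect linear agreement gives 1.0; perfect inverse agreement gives -1.0.
print(pearson_r([1, 2, 3], [2, 4, 6]))  # → 1.0
print(pearson_r([1, 2, 3], [3, 2, 1]))  # → -1.0
```

For test-retest use, `x` and `y` are the same respondents' scores at the two administrations.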
Karataş, Tuğba; Özen, Şükrü; Kutlutürkan, Sevinç
2017-01-01
Objective: The main aim of this study was to investigate the factor structure and psychometric properties of the Brief Illness Perception Questionnaire (BIPQ) in Turkish cancer patients. Methods: This methodological study involved 135 cancer patients. Statistical methods included confirmatory or exploratory factor analysis and Cronbach alpha coefficients for internal consistency. Results: The values of fit indices are within the acceptable range. The alpha coefficients for emotional illness representations, cognitive illness representations, and total scale are 0.83, 0.80, and 0.85, respectively. Conclusions: The results confirm the two-factor structure of the Turkish BIPQ and demonstrate its reliability and validity. PMID:28217734
Mitchell, Travis D.; Urli, Kristina E.; Breitenbach, Jacques; Yelverton, Chris
2007-01-01
Abstract Objective This study aimed to evaluate the validity of the sacral base pressure test in diagnosing sacroiliac joint dysfunction. It also determined the predictive powers of the test in determining which type of sacroiliac joint dysfunction was present. Methods This was a double-blind experimental study with 62 participants. The results from the sacral base pressure test were compared against a cluster of previously validated tests of sacroiliac joint dysfunction to determine its validity and predictive powers. The external rotation of the feet, occurring during the sacral base pressure test, was measured using a digital inclinometer. Results There was no statistically significant difference in the results of the sacral base pressure test between the types of sacroiliac joint dysfunction. In terms of the results of validity, the sacral base pressure test was useful in identifying positive values of sacroiliac joint dysfunction. It was fairly helpful in correctly diagnosing patients with negative test results; however, it had only a “slight” agreement with the diagnosis for κ interpretation. Conclusions In this study, the sacral base pressure test was not a valid test for determining the presence of sacroiliac joint dysfunction or the type of dysfunction present. Further research comparing the agreement of the sacral base pressure test or other sacroiliac joint dysfunction tests with a criterion standard of diagnosis is necessary. PMID:19674694
ERIC Educational Resources Information Center
Osler, James Edward, II
2015-01-01
This monograph provides an epistemological rationale for the Accumulative Manifold Validation Analysis (also referred to by the acronym "AMOVA") statistical methodology designed to test psychometric instruments. This form of inquiry is a form of mathematical optimization in the discipline of linear stochastic modelling. AMOVA is an in-depth…
Probability of Detection (POD) as a statistical model for the validation of qualitative methods.
Wehling, Paul; LaBudde, Robert A; Brunelle, Sharon L; Nelson, Maria T
2011-01-01
A statistical model is presented for use in validation of qualitative methods. This model, termed Probability of Detection (POD), harmonizes the statistical concepts and parameters between quantitative and qualitative method validation. POD characterizes method response with respect to concentration as a continuous variable. The POD model provides a tool for graphical representation of response curves for qualitative methods. In addition, the model allows comparisons between candidate and reference methods, and provides calculations of repeatability, reproducibility, and laboratory effects from collaborative study data. Single laboratory study and collaborative study examples are given.
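The POD model above characterizes a qualitative method's response as a detection probability that varies continuously with concentration. Before any curve is fitted, the usual starting point is the observed POD at each tested level with a binomial (Wilson score) interval; a sketch with invented detection counts, not data from the article:

```python
import math

def wilson_interval(k, n, z=1.96):
    """Wilson score 95% interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Invented results: concentration -> (detections, trials).
levels = {0.5: (6, 12), 1.0: (10, 12), 2.0: (12, 12)}
for conc, (k, n) in sorted(levels.items()):
    lo, hi = wilson_interval(k, n)
    print(f"conc={conc}: POD={k / n:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Plotting these points against concentration gives the graphical response curve the model is designed to support; candidate and reference methods can then be compared level by level.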
Shmulewitz, D.; Wall, M.M.; Aharonovich, E.; Spivak, B.; Weizman, A.; Frisch, A.; Grant, B. F.; Hasin, D.
2013-01-01
Background The fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) proposes aligning nicotine use disorder (NUD) criteria with those for other substances, by including the current DSM fourth edition (DSM-IV) nicotine dependence (ND) criteria, three abuse criteria (neglect roles, hazardous use, interpersonal problems) and craving. Although NUD criteria indicate one latent trait, evidence is lacking on: (1) validity of each criterion; (2) validity of the criteria as a set; (3) comparative validity between DSM-5 NUD and DSM-IV ND criterion sets; and (4) NUD prevalence. Method Nicotine criteria (DSM-IV ND, abuse and craving) and external validators (e.g. smoking soon after awakening, number of cigarettes per day) were assessed with a structured interview in 734 lifetime smokers from an Israeli household sample. Regression analysis evaluated the association between validators and each criterion. Receiver operating characteristic analysis assessed the association of the validators with the DSM-5 NUD set (number of criteria endorsed) and tested whether DSM-5 or DSM-IV provided the most discriminating criterion set. Changes in prevalence were examined. Results Each DSM-5 NUD criterion was significantly associated with the validators, with strength of associations similar across the criteria. As a set, DSM-5 criteria were significantly associated with the validators, were significantly more discriminating than DSM-IV ND criteria, and led to increased prevalence of binary NUD (two or more criteria) over ND. Conclusions All findings address previous concerns about the DSM-IV nicotine diagnosis and its criteria and support the proposed changes for DSM-5 NUD, which should result in improved diagnosis of nicotine disorders. PMID:23312475
Validation of a dye stain assay for vaginally inserted HEC-filled microbicide applicators
Katzen, Lauren L.; Fernández-Romero, José A.; Sarna, Avina; Murugavel, Kailapuri G.; Gawarecki, Daniel; Zydowsky, Thomas M.; Mensch, Barbara S.
2011-01-01
Background The reliability and validity of self-reports of vaginal microbicide use are questionable given the explicit understanding that participants are expected to comply with study protocols. Our objective was to optimize the Population Council's previously validated dye stain assay (DSA) and related procedures, and establish predictive values for the DSA's ability to identify vaginally inserted single-use, low-density polyethylene microbicide applicators filled with hydroxyethylcellulose gel. Methods Applicators, inserted by 252 female sex workers enrolled in a microbicide feasibility study in Southern India, served as positive controls for optimization and validation experiments. Prior to validation, optimal dye concentration and staining time were ascertained. Three validation experiments were conducted to determine sensitivity, specificity, negative predictive values and positive predictive values. Results The dye concentration of 0.05% (w/v) FD&C Blue No. 1 Granular Food Dye and staining time of five seconds were determined to be optimal and were used for the three validation experiments. There were a total of 1,848 possible applicator readings across validation experiments; 1,703 (92.2%) applicator readings were correct. On average, the DSA performed with 90.6% sensitivity, 93.9% specificity, and had a negative predictive value of 93.8% and a positive predictive value of 91.0%. No statistically significant differences between experiments were noted. Conclusions The DSA was optimized and successfully validated for use with single-use, low-density polyethylene applicators filled with hydroxyethylcellulose (HEC) gel. We recommend including the DSA in future microbicide trials involving vaginal gels in order to identify participants who have low adherence to dosing regimens. In doing so, we can develop strategies to improve adherence as well as investigate the association between product use and efficacy. PMID:21992983
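The DSA validation above reports 90.6% sensitivity, 93.9% specificity, a 93.8% negative predictive value and a 91.0% positive predictive value. All four derive from the same 2x2 confusion matrix of assay readings against true insertion status; a sketch with invented counts, not the study's 1,848 readings:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts
    (true/false positives and negatives)."""
    return {
        "sensitivity": tp / (tp + fn),  # detected among truly inserted
        "specificity": tn / (tn + fp),  # cleared among truly uninserted
        "ppv": tp / (tp + fp),          # truly inserted among positives
        "npv": tn / (tn + fn),          # truly uninserted among negatives
    }

# Invented counts for 200 applicator readings.
print(diagnostic_metrics(tp=90, fp=5, tn=95, fn=10))
```

Note that sensitivity and specificity are properties of the assay, while PPV and NPV also depend on how common true insertion is in the sample.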
Virtual Model Validation of Complex Multiscale Systems: Applications to Nonlinear Elastostatics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oden, John Tinsley; Prudencio, Ernest E.; Bauman, Paul T.
We propose a virtual statistical validation process as an aid to the design of experiments for the validation of phenomenological models of the behavior of material bodies, with focus on those cases in which knowledge of the fabrication process used to manufacture the body can provide information on the micro-molecular-scale properties underlying macroscale behavior. One example is given by models of elastomeric solids fabricated using polymerization processes. We describe a framework for model validation that involves Bayesian updates of parameters in statistical calibration and validation phases. The process enables the quantification of uncertainty in quantities of interest (QoIs) and the determination of model consistency using tools of statistical information theory. We assert that microscale information drawn from molecular models of the fabrication of the body provides a valuable source of prior information on parameters as well as a means for estimating model bias and designing virtual validation experiments to provide information gain over calibration posteriors.
The relationship between organizational trust and nurse administrators’ productivity in hospitals
Bahrami, Susan; Hasanpour, Marzieh; Rajaeepour, Saeed; Aghahosseni, Taghi; Hodhodineghad, Nilofar
2012-01-01
Context: Management of health care organizations based on employees' mutual trust increases improvement in functions and tasks. Aims: The present study investigated the relationship between organizational trust and nurse administrators' productivity in the health-education centers of Isfahan University of Medical Sciences. Settings and Design: This was a descriptive, correlational study. Materials and Methods: The population included all nurse administrators. In this research, 165 nurses were selected through a random sampling method. Data collection instruments were an organizational trust questionnaire based on Robbins's model and a productivity questionnaire based on Hersey and Blanchard's model. Validity of these questionnaires was determined through content validity, and their reliability was calculated through Cronbach's alpha. Statistical Analysis Used: Data were analyzed using SPSS version 18. Results: The indicators of organizational trust such as loyalty, competence, honesty, and stability were above average, while the explicitness indicator was at an average level. The components of productivity such as ability, job knowledge, environmental compatibility, performance feedback, and validity were above average, while the motivation factor was at an average level and organizational support was below average. There was a significant multiple correlation between organizational trust and productivity. Beta coefficients between organizational trust and productivity were significant, no autocorrelation existed, and the regression model was significant. Conclusions: Committed employees, timely performance of tasks, and a developed sense of responsibility among employees can enhance production and productivity in health care organizations. PMID:23922588
Smith, Ashlee L.; Sun, Mai; Bhargava, Rohit; Stewart, Nicolas A.; Flint, Melanie S.; Bigbee, William L.; Krivak, Thomas C.; Strange, Mary A.; Cooper, Kristine L.; Zorn, Kristin K.
2013-01-01
Objective: The biology of high grade serous ovarian carcinoma (HGSOC) is poorly understood. Little has been reported on intratumoral homogeneity or heterogeneity of primary HGSOC tumors and their metastases. We evaluated the global protein expression profiles of paired primary and metastatic HGSOC from formalin-fixed, paraffin-embedded (FFPE) tissue samples. Methods: After IRB approval, six patients with advanced HGSOC were identified with tumor in both ovaries at initial surgery. Laser capture microdissection (LCM) was used to extract tumor for protein digestion. Peptides were extracted and analyzed by reversed-phase liquid chromatography coupled to a linear ion trap mass spectrometer. Tandem mass spectra were searched against the UniProt human protein database. Differences in protein abundance between samples were assessed and analyzed by Ingenuity Pathway Analysis software. Immunohistochemistry (IHC) for select proteins from the original and an additional validation set of five patients was performed. Results: Unsupervised clustering of the abundance profiles placed the paired specimens adjacent to each other. IHC H-score analysis of the validation set revealed a strong correlation between paired samples for all proteins. For the similarly expressed proteins, the estimated correlation coefficients in two of three experimental samples and all validation samples were statistically significant (p < 0.05). The estimated correlation coefficients in the experimental sample proteins classified as differentially expressed were not statistically significant. Conclusion: A global proteomic screen of primary HGSOC tumors and their metastatic lesions identifies tumoral homogeneity and heterogeneity and provides preliminary insight into these protein profiles and the cellular pathways they constitute. PMID:28250404
Developing and Testing a Model to Predict Outcomes of Organizational Change
Gustafson, David H; Sainfort, François; Eichler, Mary; Adams, Laura; Bisognano, Maureen; Steudel, Harold
2003-01-01
Objective To test the effectiveness of a Bayesian model employing subjective probability estimates for predicting success and failure of health care improvement projects. Data Sources Experts' subjective assessment data for model development and independent retrospective data on 221 healthcare improvement projects in the United States, Canada, and the Netherlands collected between 1996 and 2000 for validation. Methods A panel of theoretical and practical experts and literature in organizational change were used to identify factors predicting the outcome of improvement efforts. A Bayesian model was developed to estimate probability of successful change using subjective estimates of likelihood ratios and prior odds elicited from the panel of experts. A subsequent retrospective empirical analysis of change efforts in 198 health care organizations was performed to validate the model. Logistic regression and ROC analysis were used to evaluate the model's performance using three alternative definitions of success. Data Collection For the model development, experts' subjective assessments were elicited using an integrative group process. For the validation study, a staff person intimately involved in each improvement project responded to a written survey asking questions about model factors and project outcomes. Results Logistic regression chi-square statistics and areas under the ROC curve demonstrated a high level of model performance in predicting success. Chi-square statistics were significant at the 0.001 level and areas under the ROC curve were greater than 0.84. Conclusions A subjective Bayesian model was effective in predicting the outcome of actual improvement projects. Additional prospective evaluations as well as testing the impact of this model as an intervention are warranted. PMID:12785571
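The subjective Bayesian scheme described above (prior odds updated by expert-elicited likelihood ratios for each observed factor) reduces to simple odds multiplication. A minimal sketch, with invented prior odds and likelihood ratios, not the panel's actual estimates:

```python
def posterior_probability(prior_odds, likelihood_ratios):
    """Multiply prior odds of project success by the likelihood ratio of each
    observed factor, then convert the posterior odds back to a probability."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)

# Hypothetical: even prior odds (1:1), three factors with LRs 2.0, 0.8 and 3.0
print(round(posterior_probability(1.0, [2.0, 0.8, 3.0]), 3))  # → 0.828
```

Factors with LR above 1 raise the predicted chance of success, factors below 1 lower it; this multiplicative form is what makes subjective elicitation from an expert panel tractable.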
Zhang, J H; Peng, R; Du, Y; Mou, Y; Li, N N; Cheng, L
2016-11-08
Objective: To evaluate the reliability and validity of the Parkinson's disease sleep scale-Chinese version (CPDSS) through a study of a large PD population in southwest China, and to explore the prevalence and characteristics of sleep disorders in Parkinson's disease (PD) patients from southwest China. Methods: A total of 544 PD patients and 220 control subjects were enrolled in our study. Demographic data, CPDSS, ESS, PDQ39, HAMD and H-Y stage were assessed in all subjects. Statistical description, Cronbach's alpha coefficient, intra-class correlation coefficient (ICC), Spearman rank correlation coefficient and Mann-Whitney U test were used for statistical analyses. Results: The Cronbach's alpha coefficient for the CPDSS was 0.79, the ICC of the total scale was 0.94, and the ICC of each item ranged from 0.73 to 0.97. The factor analysis yielded a five-factor solution, which explained 63.4% of the total variance. Total and individual item scores of the CPDSS in PD patients were lower than those in healthy controls. Sleep disorders affected 69.3% of PD patients, compared with a prevalence of only 29.6% in the control group. A negative correlation was found between the CPDSS and the ESS. Daytime sleepiness was the most common factor (35.9%) leading to sleep disorders. The sleep disorders of PD patients in southwest China were significantly related to the course of disease, the severity of disease, quality of life, depression, cognitive level and motor symptoms. Conclusion: The CPDSS has good feasibility, reliability and validity in the PD population of southwest China and is an effective tool for the assessment of sleep disorders in PD patients.
2010-01-01
Background Aneurysmal subarachnoid haemorrhage (aSAH) is a devastating event with a frequently disabling outcome. Our aim was to develop a prognostic model to predict an ordinal clinical outcome at two months in patients with aSAH. Methods We studied patients enrolled in the International Subarachnoid Aneurysm Trial (ISAT), a randomized multicentre trial to compare coiling and clipping in aSAH patients. Several models were explored to estimate a patient's outcome according to the modified Rankin Scale (mRS) at two months after aSAH. Our final model was validated internally with bootstrapping techniques. Results The study population comprised of 2,128 patients of whom 159 patients died within 2 months (8%). Multivariable proportional odds analysis identified World Federation of Neurosurgical Societies (WFNS) grade as the most important predictor, followed by age, sex, lumen size of the aneurysm, Fisher grade, vasospasm on angiography, and treatment modality. The model discriminated moderately between those with poor and good mRS scores (c statistic = 0.65), with minor optimism according to bootstrap re-sampling (optimism corrected c statistic = 0.64). Conclusion We presented a calibrated and internally validated ordinal prognostic model to predict two month mRS in aSAH patients who survived the early stage up till a treatment decision. Although generalizability of the model is limited due to the selected population in which it was developed, this model could eventually be used to support clinical decision making after external validation. Trial Registration International Standard Randomised Controlled Trial, Number ISRCTN49866681 PMID:20920243
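The "optimism corrected" c statistic quoted above comes from Harrell-style bootstrap validation: apparent performance minus the average gap between a bootstrap-fitted model's performance on its own bootstrap sample and on the original data. A generic sketch, where `fit` and `c_statistic` are placeholders for the actual model-fitting and concordance routines (here a proportional odds model in the study):

```python
import random

def optimism_corrected(data, fit, c_statistic, n_boot=200, seed=0):
    """Bootstrap optimism correction of a performance measure."""
    rng = random.Random(seed)
    apparent = c_statistic(fit(data), data)
    optimism = 0.0
    for _ in range(n_boot):
        boot = [rng.choice(data) for _ in data]  # resample with replacement
        model = fit(boot)
        # optimism: how much better the model looks on its own bootstrap sample
        optimism += c_statistic(model, boot) - c_statistic(model, data)
    return apparent - optimism / n_boot

# Degenerate check: a model whose performance is 0.65 everywhere has zero optimism.
print(optimism_corrected(list(range(50)), lambda d: None, lambda m, d: 0.65))  # → 0.65
```

In the trial's case the apparent c statistic of 0.65 shrank only to 0.64 after correction, indicating minor overfitting.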
Monacis, Lucia; de Palo, Valeria; Griffiths, Mark D.; Sinatra, Maria
2016-01-01
Background and aims The inclusion of Internet Gaming Disorder (IGD) in Section III of the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders has increased the interest of researchers in the development of new standardized psychometric tools for the assessment of such a disorder. To date, the nine-item Internet Gaming Disorder Scale – Short-Form (IGDS9-SF) has only been validated in English, Portuguese, and Slovenian languages. Therefore, the aim of this investigation was to examine the psychometric properties of the IGDS9-SF in an Italian-speaking sample. Methods A total of 757 participants were recruited to the present study. Confirmatory factor analysis and multi-group analyses were applied to assess the construct validity. Reliability analyses comprised the average variance extracted, the standard error of measurement, and the factor determinacy coefficient. Convergent and criterion validities were established through the associations with other related constructs. The receiver operating characteristic curve analysis was used to determine an empirical cut-off point. Results Findings confirmed the single-factor structure of the instrument, its measurement invariance at the configural level, and the convergent and criterion validities. Satisfactory levels of reliability and a cut-off point of 21 were obtained. Discussion and conclusions The present study provides validity evidence for the use of the Italian version of the IGDS9-SF and may foster research into gaming addiction in the Italian context. PMID:27876422
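One common way to derive an empirical cut-off from a ROC analysis, as done for the IGDS9-SF, is to pick the threshold maximising Youden's J (sensitivity + specificity - 1). A sketch with invented scores; the study's own selection criterion and data may differ:

```python
def youden_cutoff(scores_cases, scores_controls):
    """Return the cut-off maximising Youden's J, scored as 'positive if >= cut'."""
    best_j, best_cut = -1.0, None
    for cut in sorted(set(scores_cases + scores_controls)):
        sens = sum(s >= cut for s in scores_cases) / len(scores_cases)
        spec = sum(s < cut for s in scores_controls) / len(scores_controls)
        j = sens + spec - 1
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut, best_j

cases = [25, 30, 22, 28]     # hypothetical scores of disordered gamers
controls = [10, 12, 15, 18]  # hypothetical scores of other gamers
print(youden_cutoff(cases, controls))  # → (22, 1.0)
```

With perfectly separated groups, as in this toy data, J reaches its maximum of 1.0; real score distributions overlap, and the chosen cut-off trades sensitivity against specificity.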
Moro, Maria Francesca; Colom, Francesc; Floris, Francesca; Pintus, Elisa; Pintus, Mirra; Contini, Francesca; Carta, Mauro Giovanni
2012-01-01
Background: The Functioning Assessment Short Test (FAST) is a brief instrument designed to assess the main functioning problems experienced by psychiatric patients, specifically bipolar patients. It includes 24 items assessing impairment or disability in six domains of functioning: autonomy, occupational functioning, cognitive functioning, financial issues, interpersonal relationships, and leisure time. The aim of this study is to measure the validity and reliability of the Italian version of this instrument. Methods: Twenty-four patients with DSM-IV-TR bipolar disorder and 20 healthy controls were recruited and evaluated in three private clinics in Cagliari (Sardinia, Italy). The psychometric properties of the FAST (feasibility, internal consistency, concurrent validity, discriminant validity (patients vs controls and euthymic vs manic and depressed patients), and test-retest reliability) were analyzed. Results: The internal consistency obtained was very high, with a Cronbach's alpha of 0.955. A highly significant negative correlation with the GAF was obtained (r = -0.9; p < 0.001), pointing to a reasonable degree of concurrent validity. The FAST showed good test-retest reliability between two independent evaluations one week apart (mean K = 0.73). Total FAST scores were lower in controls than in bipolar patients, and lower in euthymic patients than in depressed or manic patients. Conclusion: The Italian version of the FAST showed psychometric properties similar to the original version with regard to internal consistency and discriminant validity, and showed good test-retest reliability as measured by the K statistic. PMID:22905035
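Cronbach's alpha, the internal-consistency statistic several of these validation studies report (0.955 for the 24 FAST items above), is a simple function of the item variances and the total-score variance. A minimal implementation with made-up item scores:

```python
def cronbach_alpha(items):
    """items: one list of scores per item, each of equal length (one score per respondent)."""
    k = len(items)        # number of items
    n = len(items[0])     # number of respondents

    def var(xs):          # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = sum(var(it) for it in items)
    totals = [sum(it[i] for it in items) for i in range(n)]  # per-respondent total score
    return (k / (k - 1)) * (1 - item_vars / var(totals))

# Two made-up items scored by three respondents:
print(round(cronbach_alpha([[1, 2, 3], [2, 2, 3]]), 3))  # → 0.857
```

Alpha approaches 1 as items covary strongly (respondents who score high on one item score high on the others), which is why it is read as evidence of internal consistency.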
2010-01-01
Background As scientific and medical professionals gain interest in stress and health-related quality of life (HRQL), the aim of our research is to validate a Spanish version of the German Bad Sobernheim Stress Questionnaire (BSSQ) (mit Korsett) for adolescents wearing braces. Methods The methodology followed the literature on cross-cultural adaptation, involving a translation and a back-translation; it included 35 adolescents, aged between 10 and 16, with adolescent idiopathic scoliosis (AIS) and wearing the same kind of brace (Rigo System Chêneau Brace). The materials used were a sociodemographic data questionnaire, the SRS-22, and the Spanish version, BSSQ(brace).es. The statistical analysis assessed the reliability (test-retest reliability and internal consistency) and the validity (convergent and construct validity) of the BSSQ(brace).es. Results The BSSQ(brace).es is reliable because of its satisfactory internal consistency (Cronbach's alpha coefficient of 0.809, p < 0.001) and temporal stability (test-retest method with a Pearson correlation coefficient of 0.902, p < 0.01). It demonstrated convergent validity with the SRS-22, with a Pearson correlation coefficient of 0.656 (p < 0.01). An exploratory principal components analysis found a latent structure based on two components, which explain 60.8% of the variance. Conclusions The BSSQ(brace).es is reliable and valid and can be used with Spanish adolescents to assess the stress level caused by the brace. PMID:20633253
A scoring system to predict breast cancer mortality at 5 and 10 years.
Paredes-Aracil, Esther; Palazón-Bru, Antonio; Folgado-de la Rosa, David Manuel; Ots-Gutiérrez, José Ramón; Compañ-Rosique, Antonio Fernando; Gil-Guillén, Vicente Francisco
2017-03-24
Although predictive models exist for mortality in breast cancer (BC) (generally all-cause mortality), they are not applicable to all patients, and their statistical methodology is not the most powerful for developing a predictive model. Consequently, we developed a predictive model specific to BC mortality at 5 and 10 years that resolves the above issues. This cohort study included 287 patients diagnosed with BC in a Spanish region in 2003-2016. The main outcome was time to BC death. Secondary variables: age, personal history of breast surgery, personal history of any cancer/BC, premenopause, postmenopause, grade, estrogen receptor, progesterone receptor, c-erbB2, TNM stage, multicentricity/multifocality, diagnosis and treatment. A points system was constructed to predict BC mortality at 5 and 10 years. The model was internally validated by bootstrapping. The points system was integrated into a mobile application for Android. Mean follow-up was 8.6 ± 3.5 years, and 55 patients died of BC. The points system included age, personal history of BC, grade, TNM stage and multicentricity. Validation was satisfactory in both discrimination and calibration. In conclusion, we constructed and internally validated a scoring system for predicting BC mortality at 5 and 10 years. External validation studies are needed before its use in other geographical areas.
Choi, Bongsam
2018-01-01
[Purpose] This study aimed to cross-culturally adapt and validate the Korean version of a physical activity measure (K-PAM) for community-dwelling elderly. [Subjects and Methods] One hundred and thirty-eight community-dwelling elderly people, 32 males and 106 females, participated in the study. All participants were asked to fill out a fifty-one-item questionnaire measuring perceived difficulty in activities of daily living (ADL) for the elderly. A one-parameter item response theory model (Rasch analysis) was applied to determine construct validity and to inspect item-level psychometric properties of the 51 ADL items of the K-PAM. [Results] Person separation reliability (analogous to Cronbach's alpha) for internal consistency ranged from 0.93 to 0.94. A total of 16 items misfit the Rasch model. After deletion of the misfit items, the 35 remaining ADL items of the K-PAM formed an empirically meaningful hierarchy from easy to hard. The item-person map analysis showed that item difficulty was well matched to elderly people of moderate and low ability, except for high ceilings. [Conclusion] The cross-culturally adapted K-PAM was shown to be sufficient for establishing construct validity and stable psychometric properties, as confirmed by person separation reliability and fit statistics.
Development and Validation of the Numeracy Understanding in Medicine Instrument Short Form
Schapira, Marilyn M.; Walker, Cindy M.; Miller, Tamara; Fletcher, Kathlyn A; Ganschow, Pamela G.; Jacobs, Elizabeth A; Imbert, Diana; O'Connell, Maria; Neuner, Joan M.
2014-01-01
Background Health numeracy can be defined as the ability to understand and use numeric information and quantitative concepts in the context of health. We previously reported the development of the Numeracy Understanding in Medicine Instrument (NUMi), a 20-item test developed using item response theory. We now report the development and validation of a short form of the NUMi. Methods Item statistics were used to identify a subset of 8 items representing a range of difficulty and content areas. Internal reliability was evaluated with Cronbach's alpha. Divergent and convergent validity were assessed by comparing scores on the S-NUMi with existing measures of education, print and numeric health literacy, mathematic achievement, cognitive reasoning, and the original NUMi. Results The 8-item scale had adequate reliability (Cronbach's alpha: 0.72) and was strongly correlated with the 20-item NUMi (0.92). S-NUMi scores were strongly correlated with the Lipkus numeracy test (0.62), the Wide Range Achievement Test-Mathematics (WRAT-M) (0.72), and the Wonderlic cognitive reasoning test (0.76). Moderate correlations were found with education level (0.58) and print literacy as measured by the TOFHLA (0.49). Conclusion The short Numeracy Understanding in Medicine Instrument is a reliable and valid measure of health numeracy, feasible for use in clinical and research settings. PMID:25315596
Sabour, Siamak
2018-03-08
The purpose of this letter, in response to Hall, Mehta, and Fackrell (2017), is to provide important knowledge about methodology and statistical issues in assessing the reliability and validity of an audiologist-administered tinnitus loudness matching test and a patient-reported tinnitus loudness rating. The author uses reference textbooks and published articles regarding scientific assessment of the validity and reliability of a clinical test to discuss the statistical test and the methodological approach in assessing validity and reliability in clinical research. Depending on the type of the variable (qualitative or quantitative), well-known statistical tests can be applied to assess reliability and validity. The qualitative variables of sensitivity, specificity, positive predictive value, negative predictive value, false positive and false negative rates, likelihood ratio positive and likelihood ratio negative, as well as odds ratio (i.e., ratio of true to false results), are the most appropriate estimates to evaluate validity of a test compared to a gold standard. In the case of quantitative variables, depending on distribution of the variable, Pearson r or Spearman rho can be applied. Diagnostic accuracy (validity) and diagnostic precision (reliability or agreement) are two completely different methodological issues. Depending on the type of the variable (qualitative or quantitative), well-known statistical tests can be applied to assess validity.
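The estimates the letter enumerates for qualitative variables can all be computed from a single 2x2 table of test results against a gold standard. A sketch with hypothetical counts:

```python
def diagnostic_estimates(tp, fp, tn, fn):
    """Validity estimates for a binary test against a gold standard."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "PPV": tp / (tp + fp),           # positive predictive value
        "NPV": tn / (tn + fn),           # negative predictive value
        "FPR": 1 - spec,                 # false positive rate
        "FNR": 1 - sens,                 # false negative rate
        "LR+": sens / (1 - spec),        # likelihood ratio positive
        "LR-": (1 - sens) / spec,        # likelihood ratio negative
        "OR": (tp * tn) / (fp * fn),     # odds ratio of true to false results
    }

d = diagnostic_estimates(tp=90, fp=20, tn=80, fn=10)
print(d["OR"])  # → 36.0
```

As the letter stresses, these accuracy estimates answer a different question from precision statistics (e.g. intra-class correlation or kappa), which quantify agreement between repeated measurements rather than correctness against a gold standard.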
The Practicality of Statistical Physics Handout Based on KKNI and the Constructivist Approach
NASA Astrophysics Data System (ADS)
Sari, S. Y.; Afrizon, R.
2018-04-01
Evaluation of statistical physics lectures shows that: 1) the performance of lecturers, the social climate, and students' competence and the soft skills needed at work are in the 'enough' category; 2) students have difficulty following statistical physics lectures because the material is abstract; 3) 40.72% of students need additional support in the form of repetition, practice questions and structured tasks; and 4) the depth of the statistical physics material needs to be improved gradually and in a structured way. This indicates that learning materials in accordance with the Indonesian National Qualification Framework, or Kerangka Kualifikasi Nasional Indonesia (KKNI), combined with an appropriate learning approach, are needed to help lecturers and students in lectures. The author has designed statistical physics handouts which meet the 'very valid' criterion (90.89%) according to expert judgment. In addition, the practicality of the handouts also needs to be considered, so that they are easy to use, interesting and efficient in lectures. The purpose of this research is to determine the practicality of a statistical physics handout based on the KKNI and a constructivist approach. This research is part of a research-and-development study using the 4-D model developed by Thiagarajan and has reached the development-testing phase of the Development stage. Data were collected using a questionnaire distributed to lecturers and students, and analyzed using descriptive techniques in the form of percentages. The analysis of the questionnaire shows that the statistical physics handout meets the 'very practical' criterion. The conclusion of this study is that statistical physics handouts based on the KKNI and a constructivist approach are practical for use in lectures.
Agarwal, S. S.; Sharma, Mohit; Nehra, Karan; Jayan, Balakrishna; Poonia, Anish; Bhattal, Hiteshwar
2016-01-01
ABSTRACT Introduction: This cross-sectional retrospective study was designed to assess the relationships among breastfeeding duration, nonnutritive sucking habits, convex facial profile, nonspaced dentition, and distoclusion in the deciduous dentition. Materials and methods: A sample of 415 children (228 males, 187 females) aged 4 to 6 years from a mixed Indian population was clinically examined by two orthodontists. Information about breastfeeding duration and nonnutritive sucking habits was obtained by a written questionnaire answered by the parents. Results: The chi-square test did not indicate any significant association among breastfeeding duration, convex facial profile, and distoclusion. A statistically significant association was observed between breastfeeding duration and nonspaced dentition, and also between breastfeeding duration and nonnutritive sucking habits. Nonnutritive sucking habits had a statistically significant association with distoclusion and convex facial profile (odds ratios 7.04 and 4.03, respectively). Nonnutritive sucking habits did not have a statistically significant association with nonspaced dentition. Conclusion: Children breastfed for < 6 months had an almost twofold increased probability of developing sucking habits and nonspaced dentition compared with children breastfed for > 6 months. It can also be hypothesized that nonnutritive sucking habits may act as a dominant variable in the relationship between breastfeeding duration and the occurrence of convex facial profile and distoclusion in the deciduous dentition. How to cite this article: Agarwal SS, Sharma M, Nehra K, Jayan B, Poonia A, Bhattal H. Validation of Association between Breastfeeding Duration, Facial Profile, Occlusion, and Spacing: A Cross-sectional Study. Int J Clin Pediatr Dent 2016;9(2):162-166. PMID:27365941
NASA Astrophysics Data System (ADS)
Amesbury, Matthew J.; Swindles, Graeme T.; Bobrov, Anatoly; Charman, Dan J.; Holden, Joseph; Lamentowicz, Mariusz; Mallon, Gunnar; Mazei, Yuri; Mitchell, Edward A. D.; Payne, Richard J.; Roland, Thomas P.; Turner, T. Edward; Warner, Barry G.
2016-11-01
In the decade since the first pan-European testate amoeba-based transfer function for peatland palaeohydrological reconstruction was published, a vast amount of additional data collection has been undertaken by the research community. Here, we expand the pan-European dataset from 128 to 1799 samples, spanning 35° of latitude and 55° of longitude. After the development of a new taxonomic scheme to permit compilation of data from a wide range of contributors and the removal of samples with high pH values, we developed ecological transfer functions using a range of model types and a dataset of ∼1300 samples. We rigorously tested the efficacy of these models using both statistical validation and independent test sets with associated instrumental data. Model performance measured by statistical indicators was comparable to other published models. Comparison to test sets showed that taxonomic resolution did not impair model performance and that the new pan-European model can therefore be used as an effective tool for palaeohydrological reconstruction. Our results question the efficacy of relying on statistical validation of transfer functions alone and support a multi-faceted approach to the assessment of new models. We substantiated recent advice that model outputs should be standardised and presented as residual values in order to focus interpretation on secure directional shifts, avoiding potentially inaccurate conclusions relating to specific water-table depths. The extent and diversity of the dataset highlighted that, at the taxonomic resolution applied, a majority of taxa had broad geographic distributions, though some morphotypes appeared to have restricted ranges.
Towards sound epistemological foundations of statistical methods for high-dimensional biology.
Mehta, Tapan; Tanik, Murat; Allison, David B
2004-09-01
A sound epistemological foundation for biological inquiry comes, in part, from application of valid statistical procedures. This tenet is widely appreciated by scientists studying the new realm of high-dimensional biology, or 'omic' research, which involves multiplicity at unprecedented scales. Many papers aimed at the high-dimensional biology community describe the development or application of statistical techniques. The validity of many of these is questionable, and a shared understanding about the epistemological foundations of the statistical methods themselves seems to be lacking. Here we offer a framework in which the epistemological foundation of proposed statistical methods can be evaluated.
49 CFR Appendix B to Part 222 - Alternative Safety Measures
Code of Federal Regulations, 2014 CFR
2014-10-01
... statistically valid baseline violation rate must be established through automated or systematic manual... enforcement, a program of public education and awareness directed at motor vehicle drivers, pedestrians and..., a statistically valid baseline violation rate must be established through automated or systematic...
49 CFR Appendix B to Part 222 - Alternative Safety Measures
Code of Federal Regulations, 2013 CFR
2013-10-01
... statistically valid baseline violation rate must be established through automated or systematic manual... enforcement, a program of public education and awareness directed at motor vehicle drivers, pedestrians and..., a statistically valid baseline violation rate must be established through automated or systematic...
2015-06-12
Threats to Validity and Biases ...draw conclusions and make recommendations for future research. Threats to Validity and Biases There are several issues that pose a threat to ...validity and bias to the research. Threats to validity affect the accuracy of the research and the soundness of the conclusions. Threats to external validity
NASA Astrophysics Data System (ADS)
Most, S.; Nowak, W.; Bijeljic, B.
2014-12-01
Transport processes in porous media are frequently simulated as particle movement. This process can be formulated as a stochastic process of particle position increments. At the pore scale, the geometry and micro-heterogeneities prohibit the commonly made assumption of independent and normally distributed increments to represent dispersion. Many recent particle methods seek to loosen this assumption. Recent experimental data suggest that we have not yet reached the end of the need to generalize, because particle increments show statistical dependency beyond linear correlation and over many time steps. The goal of this work is to better understand the validity regions of commonly made assumptions. We investigate after what transport distances we can observe: (1) a statistical dependence between increments that can be modelled as an order-k Markov process reducing to order 1; this would be the Markovian distance for the process, where the validity of yet-unexplored non-Gaussian-but-Markovian random walks would begin; (2) a bivariate statistical dependence that simplifies to a multi-Gaussian dependence based on simple linear correlation (validity of correlated PTRW); (3) complete absence of statistical dependence (validity of classical PTRW/CTRW). The approach is to derive a statistical model for pore-scale transport from a powerful experimental data set via copula analysis. The model is formulated as a non-Gaussian, mutually dependent Markov process of higher order, which allows us to investigate the validity ranges of simpler models.
Validation of asthma recording in electronic health records: a systematic review
Nissen, Francis; Quint, Jennifer K; Wilkinson, Samantha; Mullerova, Hana; Smeeth, Liam; Douglas, Ian J
2017-01-01
Objective To describe the methods used to validate asthma diagnoses in electronic health records and summarize the results of the validation studies. Background Electronic health records are increasingly being used for research on asthma to inform health services and health policy. Validation of the recording of asthma diagnoses in electronic health records is essential to use these databases for credible epidemiological asthma research. Methods We searched EMBASE and MEDLINE databases for studies that validated asthma diagnoses detected in electronic health records up to October 2016. Two reviewers independently assessed the full text against the predetermined inclusion criteria. Key data including author, year, data source, case definitions, reference standard, and validation statistics (including sensitivity, specificity, positive predictive value [PPV], and negative predictive value [NPV]) were summarized in two tables. Results Thirteen studies met the inclusion criteria. Most studies demonstrated a high validity using at least one case definition (PPV >80%). Ten studies used a manual validation as the reference standard; each had at least one case definition with a PPV of at least 63%, up to 100%. We also found two studies using a second independent database to validate asthma diagnoses. The PPVs of the best performing case definitions ranged from 46% to 58%. We found one study which used a questionnaire as the reference standard to validate a database case definition; the PPV of the case definition algorithm in this study was 89%. Conclusion Attaining high PPVs (>80%) is possible using each of the discussed validation methods. Identifying asthma cases in electronic health records is possible with high sensitivity, specificity or PPV, by combining multiple data sources, or by focusing on specific test measures. 
Studies testing a range of case definitions show wide variation in the validity of each definition, suggesting this may be important for obtaining asthma definitions with optimal validity. PMID:29238227
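The validation statistics summarized in reviews like the one above (sensitivity, specificity, PPV, NPV) all derive from a 2x2 table of the database case definition against the reference standard. A minimal sketch; the counts below are hypothetical, not taken from any of the reviewed studies:

```python
def diagnostic_stats(tp, fp, fn, tn):
    """Validation statistics from a 2x2 table of database diagnosis
    vs. reference standard (all counts here are hypothetical)."""
    sensitivity = tp / (tp + fn)   # true cases correctly flagged
    specificity = tn / (tn + fp)   # non-cases correctly excluded
    ppv = tp / (tp + fp)           # flagged records that are true cases
    npv = tn / (tn + fn)           # unflagged records that are true non-cases
    return sensitivity, specificity, ppv, npv

sens, spec, ppv, npv = diagnostic_stats(tp=90, fp=10, fn=20, tn=880)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} ppv={ppv:.2f} npv={npv:.2f}")
```

Note that PPV depends on disease prevalence in the database, which is why case definitions validated in one population may perform differently in another.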
AZARI, Nadia; SOLEIMANI, Farin; VAMEGHI, Roshanak; SAJEDI, Firoozeh; SHAHSHAHANI, Soheila; KARIMI, Hossein; KRASKIAN, Adis; SHAHROKHI, Amin; TEYMOURI, Robab; GHARIB, Masoud
2017-01-01
Objective Bayley Scales of Infant & Toddler Development is a well-known diagnostic developmental assessment tool for children aged 1–42 months. Our aim was to investigate the validity & reliability of this scale in Persian-speaking children. Materials & Methods The method was descriptive-analytic. Translation, back-translation and cultural adaptation were done. Content & face validity of the translated scale was determined by experts’ opinions. Overall, 403 children aged 1 to 42 months were recruited from health centers of Tehran during 2013-2014 for developmental assessment in cognitive, communicative (receptive & expressive) and motor (fine & gross) domains. Reliability of the scale was calculated through three methods: internal consistency using Cronbach’s alpha coefficient, test-retest and inter-rater methods. Construct validity was calculated using factor analysis and comparison of mean scores. Results Cultural and linguistic changes were made to items in all domains, especially on the communication subscale. Content and face validity of the test were approved by experts’ opinions. Cronbach’s alpha coefficient was above 0.74 in all domains. Pearson correlation coefficients in the various domains were ≥0.982 for the test-retest method and ≥0.993 for the inter-rater method. Construct validity of the test was approved by factor analysis. Moreover, the mean scores for the different age groups were compared, and statistically significant differences were observed between the mean scores of the different age groups, which confirms the validity of the test. Conclusion The Bayley Scales of Infant and Toddler Development is a valid and reliable tool for child developmental assessment in Persian-speaking children. PMID:28277556
Stanisavljevic, Dejana; Trajkovic, Goran; Marinkovic, Jelena; Bukumiric, Zoran; Cirkovic, Andja; Milic, Natasa
2014-01-01
Medical statistics has become important and relevant for future doctors, enabling them to practice evidence based medicine. Recent studies report that students' attitudes towards statistics play an important role in their statistics achievements. The aim of the study was to test the psychometric properties of the Serbian version of the Survey of Attitudes Towards Statistics (SATS) in order to acquire a valid instrument to measure attitudes inside the Serbian educational context. The validation study was performed on a cohort of 417 medical students who were enrolled in an obligatory introductory statistics course. The SATS adaptation was based on an internationally accepted methodology for translation and cultural adaptation. Psychometric properties of the Serbian version of the SATS were analyzed through the examination of factorial structure and internal consistency. Most medical students held positive attitudes towards statistics. The average total SATS score was above neutral (4.3±0.8), and varied from 1.9 to 6.2. Confirmatory factor analysis validated the six-factor structure of the questionnaire (Affect, Cognitive Competence, Value, Difficulty, Interest and Effort). Values for fit indices TLI (0.940) and CFI (0.961) were above the cut-off of ≥0.90. The RMSEA value of 0.064 (0.051-0.078) was below the suggested value of ≤0.08. Cronbach's alpha of the entire scale was 0.90, indicating scale reliability. In a multivariate regression model, self-rating of ability in mathematics and current grade point average were significantly associated with the total SATS score after adjusting for age and gender. The present study provided evidence for the appropriate metric properties of the Serbian version of the SATS. Confirmatory factor analysis validated the six-factor structure of the scale. The SATS may be a reliable and valid instrument for identifying medical students' attitudes towards statistics in the Serbian educational context.
Young, Tony; Dowsey, Michelle M.; Pandy, Marcus; Choong, Peter F.
2018-01-01
Background The medial stabilized total knee joint replacement (TKJR) construct is designed to closely replicate the kinematics of the knee. Little is known regarding comparison of clinical functional outcomes of patients utilising validated patient reported outcome measures (PROM) after medial stabilized TKJR and other construct designs. Purpose To perform a systematic review of the available literature related to the assessment of clinical functional outcomes following a TKJR employing a medial stabilized construct design. Methods The review was performed with a Preferred Reporting Items for Systematic Review and Meta-Analyses (PRISMA) algorithm. The literature search was performed using various combinations of keywords. The statistical analysis was completed using Review Manager (RevMan), Version 5.3. Results In the nineteen unique studies identified, there were 2,448 medial stabilized TKJRs implanted in 2,195 participants and 1,777 TKJRs with non-medial stabilized design constructs implanted in 1,734 subjects. The final mean Knee Society Score (KSS) value in the medial stabilized group was 89.92 compared to 90.76 in the non-medial stabilized group; the final KSS mean value difference between the two groups was statistically significant and favored the non-medial stabilized group (SMD 0.21; 95% CI: 0.01 to 0.41; p = 0.04). The mean difference in the final WOMAC values between the two groups was also statistically significant and favored the medial stabilized group (SMD: −0.27; 95% CI: −0.47 to −0.07; p = 0.009). Moderate to high values (I2) of heterogeneity were observed during the statistical comparison of these functional outcomes. Conclusion Based on the small number of studies with appropriate statistical analysis, we are unable to reach a clear conclusion on the clinical performance of the medial stabilized knee replacement construct. Level of Evidence Level II PMID:29696144
Measuring Microaggression and Organizational Climate Factors in Military Units
2011-04-01
i.e., items) to accurately assess what we intend for them to measure. To assess construct and convergent validity, the author assessed the statistical...sample indicated both convergent and construct validity of the microaggression scale. Table 5 presents these statistics. Measuring Microaggressions...models. As shown in Table 7, the measurement models had acceptable fit indices. That is, the Chi-square statistics were at their minimum; although the
Person mobility in the design and analysis of cluster-randomized cohort prevention trials.
Vuchinich, Sam; Flay, Brian R; Aber, Lawrence; Bickman, Leonard
2012-06-01
Person mobility is an inescapable fact of life for most cluster-randomized (e.g., schools, hospitals, clinics, cities, states) cohort prevention trials. Mobility rates are an important substantive consideration in estimating the effects of an intervention. In cluster-randomized trials, mobility rates are often correlated with ethnicity, poverty and other variables associated with disparity. This raises the possibility that estimated intervention effects may generalize to only the least mobile segments of a population and, thus, create a threat to external validity. Such mobility can also create threats to the internal validity of conclusions from randomized trials. Researchers must decide how to deal with persons who leave study clusters during a trial (dropouts), persons and clusters that do not comply with an assigned intervention, and persons who enter clusters during a trial (late entrants), in addition to the persons who remain for the duration of a trial (stayers). Statistical techniques alone cannot solve the key issues of internal and external validity raised by the phenomenon of person mobility. This commentary presents a systematic, Campbellian-type analysis of person mobility in cluster-randomized cohort prevention trials. It describes four approaches for dealing with dropouts, late entrants and stayers with respect to data collection, analysis and generalizability. The questions at issue are: 1) From whom should data be collected at each wave of data collection? 2) Which cases should be included in the analyses of an intervention effect? and 3) To what populations can trial results be generalized? The conclusions lead to recommendations for the design and analysis of future cluster-randomized cohort prevention trials.
CheS-Mapper 2.0 for visual validation of (Q)SAR models
2014-01-01
Background Sound statistical validation is important to evaluate and compare the overall performance of (Q)SAR models. However, classical validation does not support the user in better understanding the properties of the model or the underlying data. Even though, a number of visualization tools for analyzing (Q)SAR information in small molecule datasets exist, integrated visualization methods that allow the investigation of model validation results are still lacking. Results We propose visual validation, as an approach for the graphical inspection of (Q)SAR model validation results. The approach applies the 3D viewer CheS-Mapper, an open-source application for the exploration of small molecules in virtual 3D space. The present work describes the new functionalities in CheS-Mapper 2.0, that facilitate the analysis of (Q)SAR information and allows the visual validation of (Q)SAR models. The tool enables the comparison of model predictions to the actual activity in feature space. The approach is generic: It is model-independent and can handle physico-chemical and structural input features as well as quantitative and qualitative endpoints. Conclusions Visual validation with CheS-Mapper enables analyzing (Q)SAR information in the data and indicates how this information is employed by the (Q)SAR model. It reveals, if the endpoint is modeled too specific or too generic and highlights common properties of misclassified compounds. Moreover, the researcher can use CheS-Mapper to inspect how the (Q)SAR model predicts activity cliffs. The CheS-Mapper software is freely available at http://ches-mapper.org. Graphical abstract Comparing actual and predicted activity values with CheS-Mapper.
Statistical considerations on prognostic models for glioma
Molinaro, Annette M.; Wrensch, Margaret R.; Jenkins, Robert B.; Eckel-Passow, Jeanette E.
2016-01-01
Given the lack of beneficial treatments in glioma, there is a need for prognostic models for therapeutic decision making and life planning. Recently several studies defining subtypes of glioma have been published. Here, we review the statistical considerations of how to build and validate prognostic models, explain the models presented in the current glioma literature, and discuss advantages and disadvantages of each model. The 3 statistical considerations to establishing clinically useful prognostic models are: study design, model building, and validation. Careful study design helps to ensure that the model is unbiased and generalizable to the population of interest. During model building, a discovery cohort of patients can be used to choose variables, construct models, and estimate prediction performance via internal validation. Via external validation, an independent dataset can assess how well the model performs. It is imperative that published models properly detail the study design and methods for both model building and validation. This provides readers the information necessary to assess the bias in a study, compare other published models, and determine the model's clinical usefulness. As editors, reviewers, and readers of the relevant literature, we should be cognizant of the needed statistical considerations and insist on their use. PMID:26657835
45 CFR 153.350 - Risk adjustment data validation standards.
Code of Federal Regulations, 2012 CFR
2012-10-01
... implementation of any risk adjustment software and ensure proper validation of a statistically valid sample of... respect to implementation of risk adjustment software or as a result of data validation conducted pursuant... implementation of risk adjustment software or data validation. ...
2013-01-01
Background A Drug Influence Evaluation (DIE) is a formal assessment of an impaired driving suspect, performed by a trained law enforcement officer who uses circumstantial facts, questioning, searching, and a physical exam to form an unstandardized opinion as to whether a suspect’s driving was impaired by drugs. This paper first identifies the scientific studies commonly cited in American criminal trials as evidence of DIE accuracy, and second, uses the QUADAS tool to investigate whether the methodologies used by these studies allow them to correctly quantify the diagnostic accuracy of the DIEs currently administered by US law enforcement. Results Three studies were selected for analysis. For each study, the QUADAS tool identified biases that distorted reported accuracies. The studies were subject to spectrum bias, selection bias, misclassification bias, verification bias, differential verification bias, incorporation bias, and review bias. The studies quantified DIE performance with prevalence-dependent accuracy statistics that are internally but not externally valid. Conclusion The accuracies reported by these studies do not quantify the accuracy of the DIE process now used by US law enforcement. These studies do not validate current DIE practice. PMID:24188398
Statistically Valid Planting Trials
C. B. Briscoe
1961-01-01
More than 100 million tree seedlings are planted each year in Latin America, and at least ten times that many should be planted. Rational control and development of a program of such magnitude require establishing and interpreting carefully planned trial plantings which will yield statistically valid answers to real and important questions. Unfortunately, many...
“Plateau”-related summary statistics are uninformative for comparing working memory models
van den Berg, Ronald; Ma, Wei Ji
2014-01-01
Performance on visual working memory tasks decreases as more items need to be remembered. Over the past decade, a debate has unfolded between proponents of slot models and slotless models of this phenomenon. Zhang and Luck (2008) and Anderson, Vogel, and Awh (2011) noticed that as more items need to be remembered, “memory noise” seems to first increase and then reach a “stable plateau.” They argued that three summary statistics characterizing this plateau are consistent with slot models, but not with slotless models. Here, we assess the validity of their methods. We generated synthetic data both from a leading slot model and from a recent slotless model and quantified model evidence using log Bayes factors. We found that the summary statistics provided, at most, 0.15% of the expected model evidence in the raw data. In a model recovery analysis, a total of more than a million trials were required to achieve 99% correct recovery when models were compared on the basis of summary statistics, whereas fewer than 1,000 trials were sufficient when raw data were used. At realistic numbers of trials, plateau-related summary statistics are completely unreliable for model comparison. Applying the same analyses to subject data from Anderson et al. (2011), we found that the evidence in the summary statistics was, at most, 0.12% of the evidence in the raw data and far too weak to warrant any conclusions. These findings call into question claims about working memory that are based on summary statistics. PMID:24719235
Trial Sequential Analysis in systematic reviews with meta-analysis.
Wetterslev, Jørn; Jakobsen, Janus Christian; Gluud, Christian
2017-03-06
Most meta-analyses in systematic reviews, including Cochrane ones, do not have sufficient statistical power to detect or refute even large intervention effects. This is why a meta-analysis ought to be regarded as an interim analysis on its way towards a required information size. The results of the meta-analyses should relate the total number of randomised participants to the estimated required meta-analytic information size accounting for statistical diversity. When the number of participants and the corresponding number of trials in a meta-analysis are insufficient, the use of the traditional 95% confidence interval or the 5% statistical significance threshold will lead to too many false positive conclusions (type I errors) and too many false negative conclusions (type II errors). We developed a methodology for interpreting meta-analysis results, using generally accepted, valid evidence on how to adjust thresholds for significance in randomised clinical trials when the required sample size has not been reached. The Lan-DeMets trial sequential monitoring boundaries in Trial Sequential Analysis offer adjusted confidence intervals and restricted thresholds for statistical significance when the diversity-adjusted required information size and the corresponding number of required trials for the meta-analysis have not been reached. Trial Sequential Analysis provides a frequentistic approach to control both type I and type II errors. We define the required information size and the corresponding number of required trials in a meta-analysis and the diversity (D²) measure of heterogeneity. We explain the reasons for using Trial Sequential Analysis of meta-analysis when the actual information size fails to reach the required information size. We present examples drawn from traditional meta-analyses using unadjusted naïve 95% confidence intervals and 5% thresholds for statistical significance.
Spurious conclusions in systematic reviews with traditional meta-analyses can be reduced using Trial Sequential Analysis. Several empirical studies have demonstrated that the Trial Sequential Analysis provides better control of type I errors and of type II errors than the traditional naïve meta-analysis. Trial Sequential Analysis represents analysis of meta-analytic data, with transparent assumptions, and better control of type I and type II errors than the traditional meta-analysis using naïve unadjusted confidence intervals.
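The diversity-adjusted required information size can be illustrated with the standard sample-size formula for a dichotomous outcome, inflated by the D² heterogeneity estimate. This is a rough sketch of the idea only; the event rate, relative risk reduction and diversity values below are hypothetical, and Trial Sequential Analysis software performs more refined calculations:

```python
from statistics import NormalDist

def required_information_size(p_control, rrr, alpha=0.05, beta=0.10, diversity=0.0):
    """Diversity-adjusted required information size for a meta-analysis of a
    dichotomous outcome (two-sided alpha). All inputs here are illustrative."""
    p_exp = p_control * (1 - rrr)              # anticipated relative risk reduction
    p_bar = (p_control + p_exp) / 2            # average event proportion
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance quantile
    z_b = NormalDist().inv_cdf(1 - beta)       # power quantile
    n = 4 * (z_a + z_b) ** 2 * p_bar * (1 - p_bar) / (p_control - p_exp) ** 2
    return n / (1 - diversity)                 # inflate for heterogeneity (D^2)

# e.g. 20% control event rate, 25% RRR, D^2 = 25%
print(round(required_information_size(0.20, 0.25, diversity=0.25)))
```

With no heterogeneity this reduces to the sample size of a single adequately powered trial; a D² of 25% inflates the requirement by a third.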
Pageler, Natalie M; Grazier G'Sell, Max Jacob; Chandler, Warren; Mailes, Emily; Yang, Christine; Longhurst, Christopher A
2016-09-01
The objective of this project was to use statistical techniques to determine the completeness and accuracy of data migrated during an electronic health record conversion. Data validation during migration consists of mapped record testing and validation of a sample of the data for completeness and accuracy. We statistically determined a randomized sample size for each data type based on the desired confidence level and error limits. The only error identified in the post go-live period was a failure to migrate some clinical notes, which was unrelated to the validation process. No errors in the migrated data were found during the 12-month post-implementation period. Compared to the typical industry approach, we have demonstrated that a statistical approach to sampling size for data validation can ensure consistent confidence levels while maximizing efficiency of the validation process during a major electronic health record conversion.
2011-01-01
Background Psychometric properties include validity, reliability and sensitivity to change. Establishing the psychometric properties of an instrument which measures three-dimensional human posture is essential prior to applying it in clinical practice or research. Methods This paper reports the findings of a systematic literature review which aimed to 1) identify non-invasive three-dimensional (3D) human posture-measuring instruments; and 2) assess the quality of reporting of the methodological procedures undertaken to establish their psychometric properties, using a purpose-built critical appraisal tool. Results Seventeen instruments were identified, of which nine were supported by research into psychometric properties. Eleven and six papers, respectively, reported on validity and reliability testing. Rater qualification and reference standards were generally poorly addressed, and there was variable quality of reporting of rater blinding and statistical analysis. Conclusions There is a lack of current research to establish the psychometric properties of non-invasive 3D human posture-measuring instruments. PMID:21569486
Flinn, Sharon R.; Pease, William S.; Freimer, Miriam L.
2013-01-01
OBJECTIVE We investigated the psychometric properties of the Flinn Performance Screening Tool (FPST) for people referred with symptoms of carpal tunnel syndrome (CTS). METHOD An occupational therapist collected data from 46 participants who completed the Functional Status Scale (FSS) and FPST after the participants’ nerve conduction velocity study to test convergent and contrasted-group validity. RESULTS Seventy-four percent of the participants had abnormal nerve conduction studies. Cronbach’s α coefficients for subscale and total scores of the FPST ranged from .96 to .98. Intrarater reliability for six shared items of the FSS and the FPST was supported by high agreement (71%) and a fair κ statistic (.36). Strong to moderate positive relationships were found between the FSS and FPST scores. Functional status differed significantly among severe, mild, and negative CTS severity groups. CONCLUSION The FPST shows adequate psychometric properties as a client-centered screening tool for occupational performance of people referred for symptoms of CTS. PMID:22549598
Task Validation for the AN/TPQ-36 Radar System
1978-09-01
report presents the method and results of a study to validate personnel task descriptions for the new AN/TPQ-36 radar...AN/TPQ-36 RADAR SYSTEM CONTENTS: Introduction; Method; Results, Conclusions, and Recommendations; Task Validation; 26B MOS...method, results, conclusions, and recommendations of the validation study. The appendixes contain the following: 1. Appendix A contains
2013-01-01
Background The validity of survey-based health care utilization estimates in the older population has been poorly researched. Owing to data protection legislation and a great number of different health care insurance providers, the assessment of recall and non-response bias is challenging to impossible in many countries. The objective of our study was to compare estimates from a population-based study in older German adults with external secondary data. Methods We used data from the German KORA-Age study, which included 4,127 people aged 65–94 years. Self-report questions covered the utilization of long-term care services, inpatient services, outpatient services, and pharmaceuticals. We calculated age- and sex-standardized mean utilization rates in each domain and compared them with the corresponding estimates derived from official statistics and independent statutory health insurance data. Results The KORA-Age study underestimated the use of long-term care services (−52%), in-hospital days (−21%) and physician visits (−70%). In contrast, the assessment of drug consumption by postal self-report questionnaires yielded similar estimates to the analysis of insurance claims data (−9%). Conclusion Survey estimates based on self-report tend to underestimate true health care utilization in the older population. Direct validation studies are needed to disentangle the impact of recall and non-response bias. PMID:23286781
Al-Dubai, SAR; Ganasegeran, K; Barua, A; Rizal, AM; Rampal, KG
2014-01-01
Background: The 10-item version of the Perceived Stress Scale (PSS-10) is a widely used tool to measure stress. The Malay version of the PSS-10 has been validated among Malaysian medical students. However, studies have not been conducted to assess its validity in occupational settings. Aim: The aim of this study is to assess the psychometric properties of the Malay version of the PSS-10 in two occupational settings in Malaysia. Subjects and Methods: This study was conducted among 191 medical residents and 513 railway workers. An exploratory factor analysis was performed using the principal component method with varimax rotation. Correlation analyses, Kaiser-Meyer-Olkin, Bartlett's test of sphericity and Cronbach's alpha were obtained. Statistical analysis was carried out using Statistical Package for the Social Sciences version 16 (SPSS, Chicago, IL, USA) software. Results: Analysis yielded a two-factor structure of the Malay version of the PSS-10 in both occupational groups. The two factors accounted for 59.2% and 64.8% of the variance in the medical residents and the railway workers, respectively. Factor loadings were greater than 0.59 in both occupational groups. Cronbach's alpha coefficient was 0.70 for medical residents and 0.71 for railway workers. Conclusion: The Malay version of the PSS-10 had adequate psychometric properties and can be used to measure stress in occupational settings in Malaysia. PMID:25184074
Christodoulou, Georgia; Gennings, Chris; Hupf, Jonathan; Factor-Litvak, Pam; Murphy, Jennifer; Goetz, Raymond R.; Mitsumoto, Hiroshi
2017-01-01
Objective To establish a valid and reliable battery of measures to evaluate frontotemporal dementia (FTD) in patients with ALS over the phone. Methods Thirty-one subjects were administered either in-person or telephone-based screening followed by the opposite mode of testing two weeks later, using a modified version of the UCSF Cognitive Screening Battery. Results Equivalence testing was performed for in-person and telephone-based tests. The standard ALS Cognitive Behavioral Screen (ALS-CBS) showed statistical equivalence at the 5% significance level when compared to a revised phone-version of the ALS-CBS. In addition, the Controlled Oral Word Association Test (COWAT) and Center for Neurologic Study-Lability Scale (CNS-LS) were also found to be equivalent at the 5% and 10% significance level respectively. Similarly, the Mini-Mental State Examination (MMSE) and the well-established Telephone Interview for Cognitive Status (TICS) were also statistically equivalent. Equivalence could not be claimed for the ALS-Frontal Behavioral Inventory (ALS-FBI) caregiver interview and the Written Verbal Fluency Index (WVFI). Conclusions Our study suggests that telephone-based versions of the ALS-CBS, COWAT, and CNS-LS may offer clinicians valid tools to detect frontotemporal changes in the ALS population. Development of telephone-based cognitive testing for ALS could become an integral resource for population-based research in the future. PMID:27121545
Ivanova, Maria V.; Hallowell, Brooke
2013-01-01
Background There are a limited number of aphasia language tests in the majority of the world's commonly spoken languages. Furthermore, few aphasia tests in languages other than English have been standardized and normed, and few have supportive psychometric data pertaining to reliability and validity. The lack of standardized assessment tools across many of the world's languages poses serious challenges to clinical practice and research in aphasia. Aims The current review addresses this lack of assessment tools by providing conceptual and statistical guidance for the development of aphasia assessment tools and establishment of their psychometric properties. Main Contribution A list of aphasia tests in the 20 most widely spoken languages is included. The pitfalls of translating an existing test into a new language versus creating a new test are outlined. Factors to consider in determining test content are discussed. Further, a description of test items corresponding to different language functions is provided, with special emphasis on implementing important controls in test design. Next, a broad review of principal psychometric properties relevant to aphasia tests is presented, with specific statistical guidance for establishing psychometric properties of standardized assessment tools. Conclusions This article may be used to help guide future work on developing, standardizing and validating aphasia language tests. The considerations discussed are also applicable to the development of standardized tests of other cognitive functions. PMID:23976813
Clinical validation of robot simulation of toothbrushing - comparative plaque removal efficacy
2014-01-01
Background Clinical validation of laboratory toothbrushing tests has important advantages. Our aim was therefore to demonstrate the correlation of the tooth cleaning efficiency of a new robot brushing simulation technique with clinical plaque removal. Methods Clinical programme: 27 subjects received dental cleaning prior to a 3-day plaque regrowth interval. Plaque was stained, photographically documented and scored using a planimetrical index. Subjects brushed teeth 33–47 with three techniques (horizontal, rotating, vertical), each for 20 s buccally and 20 s orally, in 3 consecutive intervals. The force was calibrated and the brushing technique was video supported. Two different brushes were randomly assigned to the subjects. Robot programme: The clinical brushing programmes were transferred to a 6-axis robot. Artificial teeth 33–47 were covered with a plaque-simulating substrate. All brushing techniques were repeated 7 times, and results were scored according to clinical planimetry. All data underwent statistical analysis by t-test, U-test and multivariate analysis. Results The individual clinical cleaning patterns are well reproduced by the robot programmes. Differences in plaque removal are statistically significant for the two brushes, reproduced in both the clinical and the robot data. Multivariate analysis confirms the higher cleaning efficiency for anterior teeth and for buccal sites. Conclusions The robot toothbrushing simulation programme showed good correlation with clinically standardized toothbrushing. This new robot brushing simulation programme can be used for rapid, reproducible laboratory testing of tooth cleaning. PMID:24996973
Statistically Controlling for Confounding Constructs Is Harder than You Think
Westfall, Jacob; Yarkoni, Tal
2016-01-01
Social scientists often seek to demonstrate that a construct has incremental validity over and above other related constructs. However, these claims are typically supported by measurement-level models that fail to consider the effects of measurement (un)reliability. We use intuitive examples, Monte Carlo simulations, and a novel analytical framework to demonstrate that common strategies for establishing incremental construct validity using multiple regression analysis exhibit extremely high Type I error rates under parameter regimes common in many psychological domains. Counterintuitively, we find that error rates are highest—in some cases approaching 100%—when sample sizes are large and reliability is moderate. Our findings suggest that a potentially large proportion of incremental validity claims made in the literature are spurious. We present a web application (http://jakewestfall.org/ivy/) that readers can use to explore the statistical properties of these and other incremental validity arguments. We conclude by reviewing SEM-based statistical approaches that appropriately control the Type I error rate when attempting to establish incremental validity. PMID:27031707
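The mechanism is easy to reproduce: when an outcome depends only on a latent construct and the "control" variable is a noisy proxy of that construct, a second noisy proxy will routinely pass a significance test for incremental prediction. A Monte Carlo sketch under illustrative parameter values (a partial-correlation test with a normal critical value, not the authors' exact simulation design):

```python
import random
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

def claims_incremental_validity(y, z, x, crit=1.96):
    """Significance test of the partial correlation of y and z
    controlling for x (normal critical value for simplicity)."""
    ryz, ryx, rzx = pearson(y, z), pearson(y, x), pearson(z, x)
    r = (ryz - ryx * rzx) / sqrt((1 - ryx ** 2) * (1 - rzx ** 2))
    t = r * sqrt((len(y) - 3) / (1 - r ** 2))
    return abs(t) > crit

def false_positive_rate(n=200, reliability=0.6, n_sims=200, seed=1):
    """Fraction of simulations in which a second noisy indicator of
    the SAME construct is declared to add incremental validity."""
    rng = random.Random(seed)
    lam, err = sqrt(reliability), sqrt(1 - reliability)
    hits = 0
    for _ in range(n_sims):
        c = [rng.gauss(0, 1) for _ in range(n)]             # latent construct
        x = [lam * ci + err * rng.gauss(0, 1) for ci in c]  # observed proxy 1
        z = [lam * ci + err * rng.gauss(0, 1) for ci in c]  # observed proxy 2
        y = [ci + rng.gauss(0, 1) for ci in c]              # outcome driven only by c
        hits += claims_incremental_validity(y, z, x)
    return hits / n_sims

print(false_positive_rate())  # far above the nominal 5% rate
```

Larger samples make the spurious partial correlation easier to detect, which is why the error rate rises with n, exactly the counterintuitive pattern the paper reports.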
When fast logic meets slow belief: Evidence for a parallel-processing model of belief bias.
Trippas, Dries; Thompson, Valerie A; Handley, Simon J
2017-05-01
Two experiments pitted the default-interventionist account of belief bias against a parallel-processing model. According to the former, belief bias occurs because a fast, belief-based evaluation of the conclusion pre-empts a working-memory demanding logical analysis. In contrast, according to the latter both belief-based and logic-based responding occur in parallel. Participants were given deductive reasoning problems of variable complexity and instructed to decide whether the conclusion was valid on half the trials or to decide whether the conclusion was believable on the other half. When belief and logic conflict, the default-interventionist view predicts that it should take less time to respond on the basis of belief than logic, and that the believability of a conclusion should interfere with judgments of validity, but not the reverse. The parallel-processing view predicts that beliefs should interfere with logic judgments only if the processing required to evaluate the logical structure exceeds that required to evaluate the knowledge necessary to make a belief-based judgment, and vice versa otherwise. Consistent with this latter view, for the simplest reasoning problems (modus ponens), judgments of belief resulted in lower accuracy than judgments of validity, and believability interfered more with judgments of validity than the converse. For problems of moderate complexity (modus tollens and single-model syllogisms), the interference was symmetrical, in that validity interfered with belief judgments to the same degree that believability interfered with validity judgments. For the most complex (three-term multiple-model syllogisms), conclusion believability interfered more with judgments of validity than vice versa, in spite of the significant interference from conclusion validity on judgments of belief.
The expectancy-value muddle in the theory of planned behaviour - and some proposed solutions.
French, David P; Hankins, Matthew
2003-02-01
The authors of the Theories of Reasoned Action and Planned Behaviour recommended a method for statistically analysing the relationships between beliefs and the Attitude, Subjective Norm, and Perceived Behavioural Control constructs. This method has been used in the overwhelming majority of studies using these theories. However, there is a growing awareness that this method yields statistically uninterpretable results (Evans, 1991). Despite this, the use of this method is continuing, as is uninformed interpretation of this problematic research literature. This is probably due to the lack of a simple account of where the problem lies, and the large number of alternatives available. This paper therefore summarizes the problem as simply as possible, gives consideration to the conclusions that can be validly drawn from studies that contain this problem, and critically reviews the many alternatives that have been proposed to address this problem. Different techniques are identified as being suitable, according to the purpose of the specific research project.
Initial Steps toward Validating and Measuring the Quality of Computerized Provider Documentation
Hammond, Kenric W.; Efthimiadis, Efthimis N.; Weir, Charlene R.; Embi, Peter J.; Thielke, Stephen M.; Laundry, Ryan M.; Hedeen, Ashley
2010-01-01
Background: Concerns exist about the quality of electronic health care documentation. Prior studies have focused on physicians. This investigation studied document quality perceptions of practitioners (including physicians), nurses and administrative staff. Methods: An instrument developed from staff interviews and literature sources was administered to 110 practitioners, nurses and administrative staff. Short, long and original versions of records were rated. Results: Length transformation did not affect quality ratings. On several scales practitioners rated notes less favorably than administrators or nurses. The original source document was associated with the quality rating, as was tf·idf, a relevance statistic computed from document text. Tf·idf was strongly associated with practitioner quality ratings. Conclusion: Document quality estimates were not sensitive to modifying redundancy in documents. Some perceptions of quality differ by role. Intrinsic document properties are associated with staff judgments of document quality. For practitioners, the tf·idf statistic was strongly associated with the quality dimensions evaluated. PMID:21346983
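The tf·idf statistic weights a term by its frequency within a note, discounted by how many notes in the corpus contain it, so boilerplate terms common to every document contribute little. A sketch of the classic formulation on toy note text (the study's exact weighting variant is not specified here):

```python
from math import log
from collections import Counter

def tfidf(docs):
    """Compute tf*idf weights for each term in each document.

    tf  = raw count of the term in the document
    idf = log(N / df), df = number of documents containing the term
    """
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # count each term once per document
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({t: c * log(n / df[t]) for t, c in tf.items()})
    return weights

notes = ["patient stable vitals stable",
         "patient reports pain",
         "followup scheduled"]
w = tfidf(notes)
print(round(w[0]["stable"], 3))  # → 2.197: frequent in the note, rare in the corpus
```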
Improved Statistics for Determining the Patterson Symmetry from Unmerged Diffraction Intensities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sauter, Nicholas K.; Grosse-Kunstleve, Ralf W.; Adams, Paul D.
We examine procedures for detecting the point-group symmetry of macromolecular datasets and propose enhancements. To validate a point-group, it is sufficient to compare pairs of Bragg reflections that are related by each of the group's component symmetry operators. Correlation is commonly expressed in the form of a single statistical quantity (such as Rmerge) that incorporates information from all of the observed reflections. However, the usual practice of weighting all pairs of symmetry-related intensities equally can obscure the fact that the various symmetry operators of the point-group contribute differing fractions of the total set. In some cases where particular symmetry elements are significantly under-represented, statistics calculated globally over all observations do not permit conclusions about the point-group and Patterson symmetry. The problem can be avoided by repartitioning the data in a way that explicitly takes note of individual operators. The new analysis methods, incorporated into the program LABELIT (cci.lbl.gov/labelit), can be performed early enough during data acquisition, and are quick enough, that it is feasible to pause to optimize the data collection strategy.
Kim, Won Kuel; Seo, Kyung Mook; Kang, Si Hyun
2014-01-01
Objective To determine the reliability and validity of hand-held dynamometer (HHD) depending on its fixation in measuring isometric knee extensor strength by comparing the results with an isokinetic dynamometer. Methods Twenty-seven healthy female volunteers participated in this study. The subjects were tested in seated and supine position using three measurement methods: isometric knee extension by isokinetic dynamometer, non-fixed HHD, and fixed HHD. During the measurement, the knee joints of subjects were fixed at a 35° angle from the extended position. The fixed HHD measurement was conducted with the HHD fixed to distal tibia with a Velcro strap; non-fixed HHD was performed with a hand-held method without Velcro fixation. All the measurements were repeated three times and among them, the maximum values of peak torque were used for the analysis. Results The data from the fixed HHD method showed higher validity than the non-fixed method compared with the results of the isokinetic dynamometer. Pearson correlation coefficients (r) between fixed HHD and isokinetic dynamometer method were statistically significant (supine-right: r=0.806, p<0.05; seating-right: r=0.473, p<0.05; supine-left: r=0.524, p<0.05), whereas Pearson correlation coefficients between non-fixed dynamometer and isokinetic dynamometer methods were not statistically significant, except for the result of the supine position of the left leg (r=0.384, p<0.05). Both fixed and non-fixed HHD methods showed excellent inter-rater reliability. However, the fixed HHD method showed a higher reliability than the non-fixed HHD method by considering the intraclass correlation coefficient (fixed HHD, 0.952-0.984; non-fixed HHD, 0.940-0.963). Conclusion Fixation of HHD during measurement in the supine position increases the reliability and validity in measuring the quadriceps strength. PMID:24639931
Swanson, Brian T.; Riley, Sean P.; Cote, Mark P.; Leger, Robin R.; Moss, Isaac L.; Carlos, John
2016-01-01
Background To date, no research has examined the reliability or predictive validity of manual unloading tests of the lumbar spine to identify potential responders to lumbar mechanical traction. Purpose To determine: (1) the intra- and inter-rater reliability of a manual unloading test of the lumbar spine and (2) the criterion-referenced predictive validity of the manual unloading test. Methods Ten volunteers with low back pain (LBP) underwent a manual unloading test to establish reliability. In a separate procedure, 30 consecutive patients with LBP (age 50.86±11.51) were assessed for pain in their most provocative standing position (visual analog scale (VAS) 49.53±25.52 mm). Patients were assessed with a manual unloading test in their most provocative position, followed by a single application of intermittent mechanical traction. Post traction, pain in the provocative position was reassessed and utilized as the outcome criterion. Results The unloading test demonstrated substantial intra- and inter-rater reliability (K = 1.00, P = 0.002 and K = 0.737, P = 0.001, respectively). There were statistically significant within-group differences in pain response following traction for patients with a positive manual unloading test (P<0.001), while patients with a negative manual unloading test did not demonstrate a statistically significant change (P>0.05). There were significant between-group differences in the proportion of responders to traction based on manual unloading response (P = 0.031), and the manual unloading response demonstrated a moderate to strong relationship with traction response (Phi = 0.443, P = 0.015). Discussion and conclusion The manual unloading test appears to be a reliable test and has a moderate to strong correlation with pain relief exceeding the minimal clinically important difference (MCID) following traction, supporting the validity of this test. PMID:27559274
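The Phi coefficient reported above is the Pearson correlation applied to two dichotomous variables; for a 2×2 table of unloading-test result against traction response it reduces to a closed form. A sketch with illustrative counts (not the study's table):

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table:
                    responder   non-responder
    test positive       a             b
    test negative       c             d
    """
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Illustrative counts only
print(phi_coefficient(10, 4, 5, 11))  # positive test-response association
```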
Using Classroom Data to Teach Students about Data Cleaning and Testing Assumptions
Cummiskey, Kevin; Kuiper, Shonda; Sturdivant, Rodney
2012-01-01
This paper discusses the influence that decisions about data cleaning and violations of statistical assumptions can have on drawing valid conclusions to research studies. The datasets provided in this paper were collected as part of a National Science Foundation grant to design online games and associated labs for use in undergraduate and graduate statistics courses that can effectively illustrate issues not always addressed in traditional instruction. Students play the role of a researcher by selecting from a wide variety of independent variables to explain why some students complete games faster than others. Typical project data sets are “messy,” with many outliers (usually from some students taking much longer than others) and distributions that do not appear normal. Classroom testing of the games over several semesters has produced evidence of their efficacy in statistics education. The projects tend to be engaging for students and they make the impact of data cleaning and violations of model assumptions more relevant. We discuss the use of one of the games and associated guided lab in introducing students to issues prevalent in real data and the challenges involved in data cleaning and dangers when model assumptions are violated. PMID:23055992
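A common first pass at the kind of outliers these data sets contain (a few students taking far longer than the rest) is the boxplot rule: flag anything more than 1.5 interquartile ranges beyond the quartiles. A minimal sketch with illustrative completion times (quartiles via simple linear interpolation):

```python
def iqr_outliers(xs, k=1.5):
    """Flag values beyond k interquartile ranges of the quartiles,
    the usual boxplot screening rule for messy data."""
    s = sorted(xs)
    n = len(s)

    def quantile(q):  # linear-interpolation quantile
        pos = q * (n - 1)
        lo = int(pos)
        frac = pos - lo
        return s[lo] + frac * (s[min(lo + 1, n - 1)] - s[lo])

    q1, q3 = quantile(0.25), quantile(0.75)
    spread = k * (q3 - q1)
    return [x for x in xs if x < q1 - spread or x > q3 + spread]

times = [41, 38, 45, 40, 39, 43, 42, 44, 37, 240]  # seconds to finish a game
print(iqr_outliers(times))  # → [240]
```

Whether such a point is dropped, Winsorized, or modeled is exactly the data-cleaning decision the projects ask students to defend.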
Kulesz, Paulina A; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M; Francis, David J
2015-03-01
Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product-moment correlation was compared with 4 robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator. All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PsycINFO Database Record (c) 2015 APA, all rights reserved.
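Of the robust estimators compared, the Winsorized correlation is the simplest to state: clamp the extreme tails of each variable to the nearest retained value, then compute the ordinary Pearson r. A sketch with 20% Winsorizing on illustrative data containing one outlier:

```python
def winsorize(xs, prop=0.2):
    """Clamp the lowest and highest prop fraction of values to the
    nearest retained value (symmetric Winsorizing)."""
    n, s = len(xs), sorted(xs)
    g = int(prop * n)
    lo, hi = s[g], s[n - g - 1]
    return [min(max(x, lo), hi) for x in xs]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

def winsorized_correlation(xs, ys, prop=0.2):
    """Pearson r after Winsorizing each variable separately;
    far less sensitive to outlying observations."""
    return pearson(winsorize(xs, prop), winsorize(ys, prop))

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # one extreme observation
ys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(pearson(xs, ys), winsorized_correlation(xs, ys))
```

On these toy data the single outlier drags the raw Pearson r well below the Winsorized estimate; the paper's point is that its clinical samples showed no such divergence.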
Espinosa-Montero, Juan; Monterrubio-Flores, Eric A.; Sanchez-Estrada, Marcela; Buendia-Jimenez, Inmaculada; Lieberman, Harris R.; Allaert, François-Andre; Barquera, Simon
2016-01-01
Background Ingestion of water has been associated with general wellbeing. When water intake is insufficient, symptoms such as thirst, fatigue and impaired memory result. Currently there are no instruments to assess water consumption associated with wellbeing. The objective of our study was to develop and validate such an instrument in an urban, low-socioeconomic, adult Mexican population. Methods To construct the Water Ingestion-Related Wellbeing Instrument (WIRWI), a qualitative study was conducted in which wellbeing related to everyday practices and experiences of water consumption was investigated. To validate the WIRWI, a formal five-step procedure was used. Face and content validation were addressed; consistency was assessed by exploratory and confirmatory psychometric factor analyses; and repeatability, reproducibility and concurrent validity were assessed by conducting correlation tests with other measures of wellbeing, such as a quality-of-life instrument (the SF-36), and objective parameters, such as urine osmolality, 24-hour urine total volume and others. Results The final WIRWI is composed of 17 items assessing physical and mental dimensions. Items were selected based on their content and face validity. Exploratory and confirmatory factor analyses yielded Cronbach's alphas of 0.87 and 0.86, respectively. The final confirmatory factor analysis demonstrated that the model estimates were satisfactory for the constructs. Statistically significant correlations with the SF-36, total liquid consumption and plain water consumption were observed. Conclusion The resulting WIRWI is a reliable tool for assessing wellbeing associated with consumption of plain water in Mexican adults and could be useful for similar groups. PMID:27388902
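Cronbach's alpha, the consistency statistic reported for the WIRWI factors, compares the variance of the total score with the sum of the individual item variances. A minimal sketch on toy item scores (not the WIRWI data):

```python
def cronbach_alpha(items):
    """items: one list of scores per questionnaire item, all of equal
    length (one entry per respondent).

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
    """
    k, n = len(items), len(items[0])

    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))  # close to 0.75 for these toy items
```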
Pontes, Halley M.; Macur, Mirna; Griffiths, Mark D.
2016-01-01
Background and aims Since the inclusion of Internet Gaming Disorder (IGD) in the latest (fifth) edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) as a tentative disorder, a few psychometric screening instruments have been developed to assess IGD, including the 9-item Internet Gaming Disorder Scale – Short-Form (IGDS9-SF), a short, valid, and reliable instrument. Methods Given the lack of research on IGD in Slovenia, this study examined the psychometric properties of the IGDS9-SF and investigated the prevalence of IGD in a nationally representative sample of eighth graders from Slovenia (N = 1,071). Results The IGDS9-SF underwent rigorous psychometric scrutiny in terms of validity and reliability. Construct validity was investigated with confirmatory factor analysis to examine the factorial structure of the IGDS9-SF, and a unidimensional structure appeared to fit the data well. Concurrent and criterion validity were investigated by examining the association between IGD and relevant psychosocial and game-related measures, which supported these forms of validity. In terms of reliability, the Slovenian version of the IGDS9-SF obtained excellent results for internal consistency at different levels, and the test appears to be a valid and reliable instrument to assess IGD among Slovenian youth. Finally, the prevalence of IGD was found to be around 2.5% in the whole sample and 3.1% among gamers. Discussion and conclusion Taken together, these results illustrate the suitability of the IGDS9-SF and warrant further research on IGD in Slovenia. PMID:27363464
Validity and reliability of acoustic analysis of respiratory sounds in infants
Elphick, H; Lancaster, G; Solis, A; Majumdar, A; Gupta, R; Smyth, R
2004-01-01
Objective: To investigate the validity and reliability of computerised acoustic analysis in the detection of abnormal respiratory noises in infants. Methods: Blinded, prospective comparison of acoustic analysis with stethoscope examination. Validity and reliability of acoustic analysis were assessed by calculating the degree of observer agreement using the κ statistic with 95% confidence intervals (CI). Results: 102 infants under 18 months were recruited. Convergent validity for agreement between stethoscope examination and acoustic analysis was poor for wheeze (κ = 0.07 (95% CI, –0.13 to 0.26)) and rattles (κ = 0.11 (–0.05 to 0.27)) and fair for crackles (κ = 0.36 (0.18 to 0.54)). Both the stethoscope and acoustic analysis distinguished well between sounds (discriminant validity). Agreement between observers for the presence of wheeze was poor for both stethoscope examination and acoustic analysis. Agreement for rattles was moderate for the stethoscope but poor for acoustic analysis. Agreement for crackles was moderate using both techniques. Within-observer reliability for all sounds using acoustic analysis was moderate to good. Conclusions: The stethoscope is unreliable for assessing respiratory sounds in infants. This has important implications for its use as a diagnostic tool for lung disorders in infants, and confirms that it cannot be used as a gold standard. Because of the unreliability of the stethoscope, the validity of acoustic analysis could not be demonstrated, although it could discriminate between sounds well and showed good within-observer reliability. For acoustic analysis, targeted training and the development of computerised pattern recognition systems may improve reliability so that it can be used in clinical practice. PMID:15499065
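The κ statistic used here measures agreement between two observers beyond what chance alone would produce. A minimal sketch (confidence intervals omitted; the labels are illustrative):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed agreement - chance agreement)
    divided by (1 - chance agreement)."""
    n = len(rater_a)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_chance = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n)
        for c in set(rater_a) | set(rater_b)
    )
    return (p_obs - p_chance) / (1 - p_chance)

stethoscope = ["wheeze", "none", "wheeze", "none", "wheeze", "none"]
acoustic    = ["wheeze", "none", "none",   "none", "wheeze", "none"]
print(cohens_kappa(stethoscope, acoustic))  # moderate agreement
```

By convention, κ near 0 means chance-level agreement (as found for wheeze above) and κ near 1 means near-perfect agreement.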
Hales, M.; Biros, E.
2015-01-01
Background: Since 1982, the International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI) has been used to classify sensation after spinal cord injury (SCI) through pinprick and light touch scores. The absence of proprioception, pain, and temperature within this scale raises questions about its validity and accuracy. Objectives: To assess whether the sensory component of the ISNCSCI represents a reliable and valid measure of classification of SCI. Methods: A systematic review of studies examining the reliability and validity of the sensory component of the ISNCSCI published between 1982 and February 2013 was conducted. The electronic databases MEDLINE via Ovid, CINAHL, PEDro, and Scopus were searched for relevant articles. A secondary search of reference lists was also completed. Chosen articles were assessed according to the Oxford Centre for Evidence-Based Medicine hierarchy of evidence and critically appraised using the McMasters Critical Review Form. A statistical analysis was conducted to investigate the variability of the results given by reliability studies. Results: Twelve studies were identified: 9 reviewed reliability and 3 reviewed validity. All studies demonstrated low levels of evidence and moderate critical appraisal scores. The majority of the articles (~67%; 6/9) assessing reliability suggested that training was positively associated with better posttest results. The results of the 3 studies that assessed the validity of the ISNCSCI scale were conflicting. Conclusions: Due to the low to moderate quality of the current literature, the sensory component of the ISNCSCI requires further revision and investigation if it is to be a useful tool in clinical trials. PMID:26363591
A Model for Investigating Predictive Validity at Highly Selective Institutions.
ERIC Educational Resources Information Center
Gross, Alan L.; And Others
A statistical model for investigating predictive validity at highly selective institutions is described. When the selection ratio is small, one must typically deal with a data set containing relatively large amounts of missing data on both criterion and predictor variables. Standard statistical approaches are based on the strong assumption that…
Selection of Marine Corps Drill Instructors
1980-03-01
Key Construction and Cross-Validation Statistics for Drill Instructor School Performance Success Keys … Race, and School Attrition … Key Construction and Cross-Validation Statistics for Drill … constructed form, the Alternation Ranking of Series Drill Instructors. In this form, DIs in a Series are ranked from highest to lowest in terms of their
Kleinstern, Geffen; Camp, Nicola J; Goldin, Lynn R; Vachon, Celine M; Vajdic, Claire M; de Sanjose, Silvia; Weinberg, J Brice; Benavente, Yolanda; Casabonne, Delphine; Liebow, Mark; Nieters, Alexandra; Hjalgrim, Henrik; Melbye, Mads; Glimelius, Bengt; Adami, Hans-Olov; Boffetta, Paolo; Brennan, Paul; Maynadie, Marc; McKay, James; Cocco, Pier Luigi; Shanafelt, Tait D; Call, Timothy G; Norman, Aaron D; Hanson, Curtis; Robinson, Dennis; Chaffee, Kari G; Brooks-Wilson, Angela R; Monnereau, Alain; Clavel, Jacqueline; Glenn, Martha; Curtin, Karen; Conde, Lucia; Bracci, Paige M; Morton, Lindsay M; Cozen, Wendy; Severson, Richard K; Chanock, Stephen J; Spinelli, John J; Johnston, James B; Rothman, Nathaniel; Skibola, Christine F; Leis, Jose F; Kay, Neil E; Smedby, Karin E; Berndt, Sonja I; Cerhan, James R; Caporaso, Neil; Slager, Susan L
2018-06-07
Inherited loci have been found to be associated with risk of chronic lymphocytic leukemia (CLL). A combined polygenic risk score (PRS) of representative single nucleotide polymorphisms (SNPs) from these loci may improve risk prediction over individual SNPs. Herein, we evaluated the association of a PRS with CLL risk and its precursor, monoclonal B-cell lymphocytosis (MBL). We assessed its validity and discriminative ability in an independent sample and evaluated effect modification and confounding by family history (FH) of hematological cancers. For discovery, we pooled genotype data on 41 representative SNPs from 1499 CLL cases and 2459 controls from the InterLymph Consortium. For validation, we used data from 1267 controls from Mayo Clinic and 201 CLL cases, 95 MBL cases, and 144 controls with a FH of CLL from the Genetic Epidemiology of CLL Consortium. We used odds ratios (ORs) to estimate disease associations with the PRS and c-statistics to assess discriminatory accuracy. In InterLymph, the continuous PRS was strongly associated with CLL risk (OR, 2.49; P = 4.4 × 10⁻⁹⁴). We replicated these findings in the Genetic Epidemiology of CLL Consortium and Mayo controls (OR, 3.02; P = 7.8 × 10⁻³⁰) and observed high discrimination (c-statistic = 0.78). When jointly modeled with FH, the PRS retained its significance, along with FH status. Finally, we found a highly significant association of the continuous PRS with MBL risk (OR, 2.81; P = 9.8 × 10⁻¹⁶). In conclusion, our validated PRS was strongly associated with CLL risk, adding information beyond FH. The PRS provides a means of identifying those individuals at greater risk for CLL as well as those at increased risk of MBL, a condition that has potential clinical impact beyond CLL.
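The two quantities at the heart of this design are the PRS itself (risk-allele dosages weighted by per-SNP log odds ratios) and the c-statistic used to judge discrimination. A sketch with hypothetical SNP weights (the published 41-SNP weights are not reproduced here):

```python
from math import log

def polygenic_risk_score(dosages, odds_ratios):
    """Sum of risk-allele dosages (0, 1, or 2 copies per SNP)
    weighted by each SNP's log odds ratio."""
    return sum(d * log(o) for d, o in zip(dosages, odds_ratios))

def c_statistic(case_scores, control_scores):
    """Probability that a randomly chosen case outscores a randomly
    chosen control (ties count half); identical to the ROC AUC."""
    total = sum(
        1.0 if cs > ct else 0.5 if cs == ct else 0.0
        for cs in case_scores for ct in control_scores
    )
    return total / (len(case_scores) * len(control_scores))

# Hypothetical three-SNP panel with illustrative odds ratios
ors = [1.8, 1.4, 1.2]
cases    = [polygenic_risk_score(d, ors) for d in ([2, 1, 1], [1, 2, 2], [2, 2, 0])]
controls = [polygenic_risk_score(d, ors) for d in ([0, 1, 0], [1, 0, 1], [0, 0, 2])]
print(c_statistic(cases, controls))  # → 1.0 here: every case outscores every control
```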
Zedler, Barbara K; Saunders, William B; Joyce, Andrew R; Vick, Catherine C; Murrelle, E Lenn
2018-01-01
Abstract Objective To validate a risk index that estimates the likelihood of overdose or serious opioid-induced respiratory depression (OIRD) among medical users of prescription opioids. Subjects and Methods A case-control analysis of 18,365,497 patients with an opioid prescription from 2009 to 2013 in the IMS PharMetrics Plus commercially insured health plan claims database (CIP). An OIRD event occurred in 7,234 cases. Four controls were selected per case. Validity of the Risk Index for Overdose or Serious Opioid-induced Respiratory Depression (RIOSORD), developed previously using Veterans Health Administration (VHA) patient data, was assessed. Multivariable logistic regression was used within the CIP study population to develop a slightly refined RIOSORD. The composition and performance of the CIP-based RIOSORD was evaluated and compared with VHA-based RIOSORD. Results VHA-RIOSORD performed well in discriminating OIRD events in CIP (C-statistic = 0.85). Additionally, re-estimation of logistic model coefficients in CIP yielded a 0.90 C-statistic. The resulting comorbidity and pharmacotherapy variables most highly associated with OIRD and retained in the CIP-RIOSORD were largely concordant with VHA-RIOSORD. These variables included neuropsychiatric and cardiopulmonary disorders, impaired drug excretion, opioid characteristics, and concurrent psychoactive medications. The average predicted probability of OIRD ranged from 2% to 83%, with excellent agreement between predicted and observed incidence across risk classes. Conclusions RIOSORD had excellent predictive accuracy in a large population of US medical users of prescription opioids, similar to its performance in VHA. This practical risk index is designed to support clinical decision-making for safer opioid prescribing, and its clinical utility should be evaluated prospectively. PMID:28340046
Houssaini, Allal; Assoumou, Lambert; Miller, Veronica; Calvez, Vincent; Marcelin, Anne-Geneviève; Flandre, Philippe
2013-01-01
Background Several attempts have been made to determine HIV-1 resistance from genotype resistance testing. We compare scoring methods for building weighted genotyping scores and commonly used systems to determine whether the virus of a HIV-infected patient is resistant. Methods and Principal Findings Three statistical methods (linear discriminant analysis, support vector machine and logistic regression) are used to determine the weight of mutations involved in HIV resistance. We compared these weighted scores with known interpretation systems (ANRS, REGA and Stanford HIV-db) to classify patients as resistant or not. Our methodology is illustrated on the Forum for Collaborative HIV Research didanosine database (N = 1453). The database was divided into four samples according to the country of enrolment (France, USA/Canada, Italy and Spain/UK/Switzerland). The total sample and the four country-based samples allow external validation (one sample is used to estimate a score and the other samples are used to validate it). We used the observed precision to compare the performance of newly derived scores with other interpretation systems. Our results show that newly derived scores performed better than or similar to existing interpretation systems, even with external validation sets. No difference was found between the three methods investigated. Our analysis identified four new mutations associated with didanosine resistance: D123S, Q207K, H208Y and K223Q. Conclusions We explored the potential of three statistical methods to construct weighted scores for didanosine resistance. Our proposed scores performed at least as well as already existing interpretation systems and previously unrecognized didanosine-resistance associated mutations were identified. This approach could be used for building scores of genotypic resistance to other antiretroviral drugs. PMID:23555613
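A weighted genotypic score of this kind reduces, at prediction time, to summing per-mutation weights and comparing the total against a cutoff. A sketch with hypothetical weights and cutoff (the fitted coefficients from the paper's models are not reproduced here):

```python
# Hypothetical weights for illustration -- not the fitted model coefficients
WEIGHTS = {"D123S": 1.0, "Q207K": 0.5, "H208Y": 1.5, "K223Q": 1.0}

def resistance_call(observed_mutations, weights=WEIGHTS, cutoff=2.0):
    """Sum the weights of observed mutations; classify the virus as
    resistant when the total reaches the cutoff."""
    score = sum(weights.get(m, 0.0) for m in observed_mutations)
    return score, score >= cutoff

print(resistance_call(["D123S", "H208Y"]))  # (2.5, True)
print(resistance_call(["Q207K"]))           # (0.5, False)
```

Rule-based systems such as ANRS or Stanford HIV-db differ mainly in how these weights and cutoffs are chosen, by expert panel rather than by statistical fitting.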
Validation of suicide and self-harm records in the Clinical Practice Research Datalink
Thomas, Kyla H; Davies, Neil; Metcalfe, Chris; Windmeijer, Frank; Martin, Richard M; Gunnell, David
2013-01-01
Aims The UK Clinical Practice Research Datalink (CPRD) is increasingly being used to investigate suicide-related adverse drug reactions. No studies have comprehensively validated the recording of suicide and nonfatal self-harm in the CPRD. We validated general practitioners' recording of these outcomes using linked Office for National Statistics (ONS) mortality and Hospital Episode Statistics (HES) admission data. Methods We identified cases of suicide and self-harm recorded using appropriate Read codes in the CPRD between 1998 and 2010 in patients aged ≥15 years. Suicides were defined as patients with Read codes for suicide recorded within 95 days of their death. International Classification of Diseases codes were used to identify suicides/hospital admissions for self-harm in the linked ONS and HES data sets. We compared CPRD-derived cases/incidence of suicide and self-harm with those identified from linked ONS mortality and HES data, national suicide incidence rates and published self-harm incidence data. Results Only 26.1% (n = 590) of the ‘true’ (ONS-confirmed) suicides were identified using Read codes. Furthermore, only 55.5% of Read code-identified suicides were confirmed as suicide by the ONS data. Of the HES-identified cases of self-harm, 68.4% were identified in the CPRD using Read codes. The CPRD self-harm rates based on Read codes had similar age and sex distributions to rates observed in self-harm hospital registers, although rates were underestimated in all age groups. Conclusions The CPRD recording of suicide using Read codes is unreliable, with significant inaccuracy (over- and under-reporting). Future CPRD suicide studies should use linked ONS mortality data. The under-reporting of self-harm appears to be less marked. PMID:23216533
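The two percentages quoted are a sensitivity (the fraction of gold-standard cases the Read codes caught) and a positive predictive value (the fraction of Read-code cases the gold standard confirmed). A sketch on illustrative counts, not the study's tabulation:

```python
def sensitivity_and_ppv(n_gold_cases, n_flagged, n_overlap):
    """Agreement of a code-based case definition with a gold standard.

    n_gold_cases: cases confirmed by the gold standard (e.g. ONS)
    n_flagged:    cases identified by database codes (e.g. Read codes)
    n_overlap:    cases found by both
    """
    sensitivity = n_overlap / n_gold_cases  # share of true cases caught
    ppv = n_overlap / n_flagged             # share of flagged cases that are real
    return sensitivity, ppv

print(sensitivity_and_ppv(n_gold_cases=400, n_flagged=200, n_overlap=120))
# → (0.3, 0.6): low sensitivity and imperfect PPV can coexist
```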
National Variation in Costs and Mortality for Leukodystrophy Patients in U.S. Children’s Hospitals
Brimley, Cameron J; Lopez, Jonathan; van Haren, Keith; Wilkes, Jacob; Sheng, Xiaoming; Nelson, Clint; Korgenski, E. Kent; Srivastava, Rajendu; Bonkowsky, Joshua L.
2013-01-01
Background Inherited leukodystrophies are progressive, debilitating neurological disorders with few treatment options and high mortality rates. Our objective was to determine national variation in the costs for leukodystrophy patients, and to evaluate differences in their care. Methods We developed an algorithm to identify inherited leukodystrophy patients in de-identified data sets using a recursive tree model based on ICD-9 CM diagnosis and procedure charge codes. Validation of the algorithm was performed independently at two institutions, and with data from the Pediatric Health Information System (PHIS) of 43 U.S. children’s hospitals, for a seven year time period, 2004–2010. Results A recursive algorithm was developed and validated, based on six ICD-9 codes and one procedure code, that had a sensitivity up to 90% (range 61–90%) and a specificity up to 99% (range 53–99%) for identifying inherited leukodystrophy patients. Inherited leukodystrophy patients comprise 0.4% of admissions to children’s hospitals and 0.7% of costs. Over seven years these patients required $411 million of hospital care, or $131,000/patient. Hospital costs for leukodystrophy patients varied at different institutions, ranging from 2 to 15 times more than the average pediatric patient. There was a statistically significant correlation between higher volume and increased cost efficiency. Increased mortality rates had an inverse relationship with increased patient volume that was not statistically significant. Conclusions We developed and validated a code-based algorithm for identifying leukodystrophy patients in deidentified national datasets. Leukodystrophy patients account for $59 million of costs yearly at children’s hospitals. Our data highlight potential to reduce unwarranted variability and improve patient care. PMID:23953952
Molaeinezhad, Mitra; Roudsari, Robab Latifnejad; Yousefy, Alireza; Salehi, Mehrdad; Khoei, Effat Merghati
2014-01-01
Background: Vaginismus is considered one of the most common female psychosexual dysfunctions. Although the importance of using a multidisciplinary approach for assessment of vaginal penetration disorder is emphasized, the paucity of instruments for this purpose is clear. We designed a study to develop and investigate the psychometric properties of a multidimensional vaginal penetration disorder questionnaire (MVPDQ), to assist specialists in the clinical assessment of women with lifelong vaginismus (LLV). Materials and Methods: MVPDQ was developed using findings from a thematic qualitative study conducted with 20 unconsummated couples in an earlier study, followed by an extensive literature review. Then, in a cross-sectional study, a consecutive sample of 214 women, who were diagnosed with LLV based on Diagnostic and Statistical Manual of Mental Disorders (DSM)-IV-TR criteria, completed MVPDQ and additional questions regarding their demographic and sexual history. Validation measures and reliability were tested by exploratory factor analysis and Cronbach's alpha coefficient via Statistical Package for the Social Sciences (SPSS) version 16. Results: After conducting exploratory factor analysis, MVPDQ emerged with 72 items and 9 dimensions: Catastrophic cognitions and tightening, helplessness, marital adjustment, hypervigilance, avoidance, penetration motivation, sexual information, genital incompatibility, and optimism. Subscales of MVPDQ showed a significant reliability that varied between 0.70 and 0.87 and results of test–retest were satisfactory. Conclusion: The present study shows that MVPDQ is a valid and reliable self-report questionnaire for clinical assessment of women complaining of LLV. This instrument may assist specialists to make a clinical judgment and plan appropriately for clinical management. PMID:25097607
Deblauwe, Vincent; Kennel, Pol; Couteron, Pierre
2012-01-01
Background Independence between observations is a standard prerequisite of traditional statistical tests of association. This condition is, however, violated when autocorrelation is present within the data. In the case of variables that are regularly sampled in space (i.e. lattice data or images), such as those provided by remote-sensing or geographical databases, this problem is particularly acute. Because analytic derivation of the null probability distribution of the test statistic (e.g. Pearson's r) is not always possible when autocorrelation is present, we propose instead the use of a Monte Carlo simulation with surrogate data. Methodology/Principal Findings The null hypothesis that two observed mapped variables are the result of independent pattern generating processes is tested here by generating sets of random image data while preserving the autocorrelation function of the original images. Surrogates are generated by matching the dual-tree complex wavelet spectra (and hence the autocorrelation functions) of white noise images with the spectra of the original images. The generated images can then be used to build the probability distribution function of any statistic of association under the null hypothesis. We demonstrate the validity of a statistical test of association based on these surrogates with both actual and synthetic data and compare it with a corrected parametric test and three existing methods that generate surrogates (randomization, random rotations and shifts, and iterative amplitude adjusted Fourier transform). Type I error control was excellent, even with strong and long-range autocorrelation, which is not the case for alternative methods. Conclusions/Significance The wavelet-based surrogates are particularly appropriate in cases where autocorrelation appears at all scales or is direction-dependent (anisotropy). 
We explore the potential of the method for association tests involving a lattice of binary data and discuss its potential for validation of species distribution models. An implementation of the method in Java for the generation of wavelet-based surrogates is available online as supporting material. PMID:23144961
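The surrogate logic described above can be illustrated with a short sketch. This is a deliberately simplified stand-in: it builds the null distribution from plain permutation surrogates, which (as the paper itself shows) do not preserve autocorrelation; the authors' method instead generates surrogates whose dual-tree complex wavelet spectra match those of the original images.

```python
import random

def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def surrogate_test(x, y, n_surrogates=999, seed=1):
    """Monte Carlo association test: compare the observed |r| with a
    null distribution built from surrogate versions of y.  Permutation
    surrogates are used here only for illustration; they destroy any
    autocorrelation structure."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    exceed = 0
    ys = list(y)
    for _ in range(n_surrogates):
        rng.shuffle(ys)
        if abs(pearson_r(x, ys)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_surrogates + 1)   # Monte Carlo p-value

# Strongly associated toy series: the test should reject independence.
x = list(range(30))
y = [2 * v + (v % 5) for v in x]
p = surrogate_test(x, y)
```

The wavelet-based method fits the same template: only the surrogate generator changes, so the null distribution respects the spatial dependence of the original maps.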
White, Khendi T.; Moorthy, M.V.; Akinkuolie, Akintunde O.; Demler, Olga; Ridker, Paul M; Cook, Nancy R.; Mora, Samia
2015-01-01
Background Nonfasting triglycerides are similar to or superior to fasting triglycerides at predicting cardiovascular events. However, diagnostic cutpoints are based on fasting triglycerides. We examined the optimal cutpoint for increased nonfasting triglycerides. Methods Baseline nonfasting (<8 hours since last meal) samples were obtained from 6,391 participants in the Women’s Health Study, followed prospectively for up to 17 years. The optimal diagnostic threshold for nonfasting triglycerides, determined by logistic regression models using c-statistics and Youden index (sum of sensitivity and specificity minus one), was used to calculate hazard ratios for incident cardiovascular events. Performance was compared to thresholds recommended by the American Heart Association (AHA) and European guidelines. Results The optimal threshold was 175 mg/dL (1.98 mmol/L), corresponding to a c-statistic of 0.656 that was statistically better than the AHA cutpoint of 200 mg/dL (c-statistic of 0.628). For nonfasting triglycerides above and below 175 mg/dL, adjusting for age, hypertension, smoking, hormone use, and menopausal status, the hazard ratio for cardiovascular events was 1.88 (95% CI, 1.52–2.33, P<0.001), and for triglycerides measured at 0–4 and 4–8 hours since last meal, hazard ratios (95%CIs) were 2.05 (1.54– 2.74) and 1.68 (1.21–2.32), respectively. Performance of this optimal cutpoint was validated using ten-fold cross-validation and bootstrapping of multivariable models that included standard risk factors plus total and HDL cholesterol, diabetes, body-mass index, and C-reactive protein. Conclusions In this study of middle aged and older apparently healthy women, we identified a diagnostic threshold for nonfasting hypertriglyceridemia of 175 mg/dL (1.98 mmol/L), with the potential to more accurately identify cases than the currently recommended AHA cutpoint. PMID:26071491
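The Youden-index threshold search used above can be sketched directly: scan candidate cutpoints and keep the one maximizing sensitivity + specificity − 1. The data below are toy values; the study derived its threshold from logistic regression models on the cohort itself.

```python
def youden_optimal_cutpoint(scores, labels):
    """Return the threshold maximizing Youden's J = sens + spec - 1.
    labels: 1 = event, 0 = no event; a case is test-positive if score >= cut."""
    best_j, best_cut = -1.0, None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= cut and l == 1)
        fn = sum(1 for s, l in zip(scores, labels) if s < cut and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < cut and l == 0)
        fp = sum(1 for s, l in zip(scores, labels) if s >= cut and l == 0)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut, best_j

# Toy triglyceride values (mg/dL) with event labels:
cut, j = youden_optimal_cutpoint([100, 150, 175, 180, 200, 250], [0, 0, 0, 1, 1, 1])
```

With perfectly separable toy data the search returns the lowest score among the events; on real data J trades sensitivity against specificity, which is why the optimal cutpoint (175 mg/dL here) can differ from a guideline value such as 200 mg/dL.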
Adler, Lenard A.; Clemow, David B.; Williams, David W.; Durell, Todd M.
2014-01-01
Objective To evaluate the effect of atomoxetine treatment on executive functions in young adults with attention-deficit/hyperactivity disorder (ADHD). Methods In this Phase 4, multi-center, double-blind, placebo-controlled trial, young adults (18–30 years) with ADHD were randomized to receive atomoxetine (20–50 mg BID, N = 220) or placebo (N = 225) for 12 weeks. The Behavior Rating Inventory of Executive Function-Adult (BRIEF-A) consists of 75 self-report items within 9 nonoverlapping clinical scales measuring various aspects of executive functioning. Mean changes from baseline to 12-week endpoint on the BRIEF-A were analyzed using an ANCOVA model (terms: baseline score, treatment, and investigator). Results At baseline, there were no significant treatment group differences in the percentage of patients with BRIEF-A composite or index T-scores ≥60 (p>.5), with over 92% of patients having composite scores ≥60 (≥60 deemed clinically meaningful for these analyses). At endpoint, statistically significantly greater mean reductions were seen in the atomoxetine versus placebo group for the BRIEF-A Global Executive Composite (GEC), Behavioral Regulation Index (BRI), and Metacognitive Index (MI) scores, as well as the Inhibit, Self-Monitor, Working Memory, Plan/Organize and Task Monitor subscale scores (p<.05), with decreases in scores signifying improvements in executive functioning. Changes in the BRIEF-A Initiate (p = .051), Organization of Materials (p = .051), Shift (p = .090), and Emotional Control (p = .219) subscale scores were not statistically significant. In addition, the validity scales: Inconsistency (p = .644), Infrequency (p = .097), and Negativity (p = .456) were not statistically significant, showing scale validity. Conclusion Statistically significantly greater improvement in executive function was observed in young adults with ADHD in the atomoxetine versus placebo group as measured by changes in the BRIEF-A scales. 
Trial Registration ClinicalTrials.gov NCT00510276 PMID:25148243
Clark, Matthew T.; Calland, James Forrest; Enfield, Kyle B.; Voss, John D.; Lake, Douglas E.; Moorman, J. Randall
2017-01-01
Background Charted vital signs and laboratory results represent intermittent samples of a patient’s dynamic physiologic state and have been used to calculate early warning scores to identify patients at risk of clinical deterioration. We hypothesized that the addition of cardiorespiratory dynamics measured from continuous electrocardiography (ECG) monitoring to intermittently sampled data improves the predictive validity of models trained to detect clinical deterioration prior to intensive care unit (ICU) transfer or unanticipated death. Methods and findings We analyzed 63 patient-years of ECG data from 8,105 acute care patient admissions at a tertiary care academic medical center. We developed models to predict deterioration resulting in ICU transfer or unanticipated death within the next 24 hours using either vital signs, laboratory results, or cardiorespiratory dynamics from continuous ECG monitoring and also evaluated models using all available data sources. We calculated the predictive validity (C-statistic), the net reclassification improvement, and the probability of achieving the difference in likelihood ratio χ2 for the additional degrees of freedom. The primary outcome occurred 755 times in 586 admissions (7%). We analyzed 395 clinical deteriorations with continuous ECG data in the 24 hours prior to an event. Using only continuous ECG measures resulted in a C-statistic of 0.65, similar to models using only laboratory results and vital signs (0.63 and 0.69 respectively). Addition of continuous ECG measures to models using conventional measurements improved the C-statistic by 0.01 and 0.07; a model integrating all data sources had a C-statistic of 0.73 with categorical net reclassification improvement of 0.09 for a change of 1 decile in risk. The difference in likelihood ratio χ2 between integrated models with and without cardiorespiratory dynamics was 2158 (p value: <0.001). 
Conclusions Cardiorespiratory dynamics from continuous ECG monitoring detect clinical deterioration in acute care patients and improve performance of conventional models that use only laboratory results and vital signs. PMID:28771487
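The C-statistic reported throughout is a concordance probability: the chance that a randomly chosen deteriorating patient was assigned a higher risk than a randomly chosen stable one. A minimal all-pairs sketch (fine for small samples; it is O(n²), so real evaluations use rank-based equivalents):

```python
def c_statistic(risks, outcomes):
    """Concordance: P(risk of an event case > risk of a non-event case),
    with ties counted as 1/2.  outcomes: 1 = event, 0 = no event."""
    events = [r for r, o in zip(risks, outcomes) if o == 1]
    nonevents = [r for r, o in zip(risks, outcomes) if o == 0]
    pairs = concordant = 0.0
    for e in events:
        for n in nonevents:
            pairs += 1
            if e > n:
                concordant += 1
            elif e == n:
                concordant += 0.5
    return concordant / pairs
```

A value of 0.5 is chance-level discrimination and 1.0 is perfect ranking, which is the scale on which the 0.63–0.73 figures above should be read.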
Brunault, Paul; Ballon, Nicolas; Gaillard, Philippe; Réveillère, Christian; Courtois, Robert
2014-01-01
Objective: The concept of food addiction has recently been proposed by applying the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision, criteria for substance dependence to eating behaviour. Food addiction has received increased attention given that it may play a role in binge eating, eating disorders, and the recent increase in obesity prevalence. Currently, there is no psychometrically sound tool for assessing food addiction in French. Our study aimed to test the psychometric properties of a French version of the Yale Food Addiction Scale (YFAS) by establishing its factor structure and construct validity in a nonclinical population. Method: A total of 553 participants were assessed for food addiction (French version of the YFAS) and binge eating behaviour (Bulimic Investigatory Test Edinburgh and Binge Eating Scale). We tested the scale’s factor structure (factor analysis for dichotomous data based on tetrachoric correlation coefficients), internal consistency, and construct validity with measures of binge eating. Results: Our results supported a 1-factor structure, which accounted for 54.1% of the variance. This tool had adequate reliability and high construct validity with measures of binge eating in this population, both in its diagnosis and symptom count version. A 2-factor structure explained an additional 9.1% of the variance, and could differentiate between patients with high, compared with low, levels of insight regarding addiction symptoms. Conclusions: In our study, we validated a psychometrically sound French version of the YFAS, both in its symptom count and diagnostic version. Future studies should validate this tool in clinical samples. PMID:25007281
Komro, Kelli A; Livingston, Melvin D; Kominsky, Terrence K; Livingston, Bethany J; Garrett, Brady A; Molina, Mildred Maldonado; Boyd, Misty L
2015-01-01
Objective: American Indians (AIs) suffer from significant alcohol-related health disparities, and increased risk begins early. This study examined the reliability and validity of measures to be used in a preventive intervention trial. Reliability and validity across racial/ethnic subgroups are crucial to evaluate intervention effectiveness and promote culturally appropriate evidence-based practice. Method: To assess reliability and validity, we used three baseline surveys of high school students participating in a preventive intervention trial within the jurisdictional service area of the Cherokee Nation in northeastern Oklahoma. The 15-minute alcohol risk survey included 16 multi-item scales and one composite score measuring key proximal, primary, and moderating variables. Forty-four percent of the students indicated that they were AI (of whom 82% were Cherokee), including 23% who reported being AI only (n = 435) and 18% both AI and White (n = 352). Forty-seven percent reported being White only (n = 901). Results: Scales were adequately reliable for the full sample and across race/ethnicity defined by AI, AI/White, and White subgroups. Among the full sample, all scales had acceptable internal consistency, with minor variation across race/ethnicity. All scales had extensive to exemplary test–retest reliability and showed minimal variation across race/ethnicity. The eight proximal and two primary outcome scales were each significantly associated with the frequency of alcohol use during the past month in both the cross-sectional and the longitudinal models, providing support for both criterion validity and predictive validity. For most scales, interpretation of the strength of association and statistical significance did not differ between the racial/ethnic subgroups. 
Conclusions: The results support the reliability and validity of scales of a brief questionnaire measuring risk and protective factors for alcohol use among AI adolescents, primarily members of the Cherokee Nation. PMID:25486402
PIV Data Validation Software Package
NASA Technical Reports Server (NTRS)
Blackshire, James L.
1997-01-01
A PIV data validation and post-processing software package was developed to provide semi-automated data validation and data reduction capabilities for Particle Image Velocimetry data sets. The software provides three primary capabilities: (1) removal of spurious vector data; (2) filtering, smoothing, and interpolation of PIV data; and (3) calculation of out-of-plane vorticity, ensemble statistics, and turbulence statistics. The software runs on an IBM PC/AT host computer under either the Microsoft Windows 3.1 or Windows 95 operating system.
Validity assessment of self-reported medication use by comparing to pharmacy insurance claims
Fujita, Misuzu; Sato, Yasunori; Nagashima, Kengo; Takahashi, Sho; Hata, Akira
2015-01-01
Objectives In Japan, an annual health check-up and health promotion guidance programme was established in 2008 in accordance with the Act on Assurance of Medical Care for the Elderly. A self-reported questionnaire on medication use is a required item in this programme and has been used widely, but its validity has not been assessed. The aim of this study was to evaluate the validity of this questionnaire by comparing self-reported usage to pharmacy insurance claims. Setting This is a population-based validation study. Self-reported medication use for hypertension, diabetes and dyslipidaemia is the evaluated measurement. Data on pharmacy insurance claims are used as a reference standard. Participants Participants were 54 712 beneficiaries of the National Health Insurance of Chiba City. Primary and secondary outcome measures Sensitivity, specificity and κ statistics of the self-reported medication-use questionnaire for predicting actual prescriptions during 1 month (that of the check-up) and 3 months (that of the check-up and the previous 2 months) were calculated. Results Sensitivity and specificity scores of questionnaire data for predicting insurance claims covering 3 months were, respectively, 92.4% (95% CI 91.9 to 92.8) and 86.4% (95% CI 86.0 to 86.7) for hypertension, 82.6% (95% CI 81.1 to 84.0) and 98.5% (95% CI 98.4 to 98.6) for diabetes, and 86.2% (95% CI 85.5 to 86.8) and 91.0% (95% CI 90.8 to 91.3) for dyslipidaemia. Corresponding κ statistics were 70.9% (95% CI 70.1 to 71.7), 77.1% (95% CI 76.2 to 77.9) and 69.8% (95% CI 68.9 to 70.6). The specificity was significantly higher for questionnaire data covering 3 months compared with data covering 1 month for all 3 conditions. Conclusions Self-reported questionnaire data on medication use had sufficiently high validity for further analyses. Item responses showed close agreement with actual prescriptions, particularly those covering 3 months. PMID:26553839
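The κ statistics above correct raw agreement between the questionnaire and the claims for the agreement expected by chance. For a binary item the computation is short; the indicator vectors below are made up, not the study's data:

```python
def cohens_kappa(a, b):
    """Chance-corrected agreement between two binary indicators, e.g.
    self-reported medication use vs. a pharmacy claim (1 = yes, 0 = no)."""
    n = len(a)
    po = sum(1 for x, y in zip(a, b) if x == y) / n      # observed agreement
    pa, pb = sum(a) / n, sum(b) / n
    pe = pa * pb + (1 - pa) * (1 - pb)                   # agreement expected by chance
    return (po - pe) / (1 - pe)
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance, so the 69.8–77.1% values reported above indicate substantial agreement.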
Smith, Heidi A.B.; Gangopadhyay, Maalobeeka; Goben, Christina M.; Jacobowski, Natalie L.; Chestnut, Mary Hamilton; Savage, Shane; Rutherford, Michael T.; Denton, Danica; Thompson, Jennifer L.; Chandrasekhar, Rameela; Acton, Michelle; Newman, Jessica; Noori, Hannah P.; Terrell, Michelle K.; Williams, Stacey R.; Griffith, Katherine; Cooper, Timothy J.; Ely, E. Wesley; Fuchs, D. Catherine; Pandharipande, Pratik P.
2015-01-01
RATIONALE and OBJECTIVE Delirium assessments in critically ill infants and young children pose unique challenges due to evolution of cognitive and language skills. The objectives of this study were to determine the validity and reliability of a fundamentally objective and developmentally appropriate delirium assessment tool for critically ill infants and preschool-aged children, and to determine delirium prevalence. DESIGN and SETTING Prospective, observational cohort validation study of the PreSchool Confusion Assessment Method for the ICU (psCAM-ICU) in a tertiary medical center pediatric ICU. PATIENTS Participants aged 6 months to 5 years and admitted to the pediatric ICU regardless of admission diagnosis were enrolled. INTERVENTIONS, MEASUREMENTS and MAIN RESULTS An interdisciplinary team created the psCAM-ICU for pediatric delirium monitoring. To assess validity, patients were independently assessed for delirium daily by the research team using the psCAM-ICU and by a child psychiatrist using the Diagnostic and Statistical Manual of Mental Disorders criteria. Reliability was assessed using blinded, concurrent psCAM-ICU evaluations by research staff. A total of 530-paired delirium assessments were completed among 300 patients, with a median age of 20 months (IQR 11, 37) and 43% requiring mechanical ventilation. The psCAM-ICU demonstrated a specificity of 91% (95%CI 90, 93), sensitivity of 75% (72, 78), negative predictive value of 86% (84, 88), positive predictive value of 84% (81, 87), and a reliability kappa statistic of 0.79 (0.76, 0.83). Delirium prevalence was 44% using the psCAM-ICU and 47% by the reference-rater. The rates of delirium were 53% vs. 56% in patients < 2 years of age and 33% vs. 35% in patients ≥ 2 - 5 years of age using the psCAM-ICU and reference-rater respectively. The short-form psCAM-ICU maintained a high specificity (87%) and sensitivity (78%) in post-hoc analysis. 
CONCLUSIONS The psCAM-ICU is a highly valid and reliable delirium instrument for critically ill infants and preschool-aged children, in whom delirium is extremely prevalent. PMID:26565631
Anitharaj, Velmurugan; Stephen, Selvaraj; Pradeep, Jothimani; Pooja, Pratheesh; Preethi, Sridharan
2017-01-01
Background: In the recent past, scrub typhus (ST) has been reported from different parts of India, based on Weil-Felix/enzyme-linked immunosorbent assay (ELISA)/indirect immunofluorescence assay (IFA). Molecular tests are applied only by a few researchers. Aims: Evaluation of a new commercial real time polymerase chain reaction (PCR) kit for molecular diagnosis of ST by comparing it with the commonly used IgM ELISA is our aim. Settings and Design: ST has been reported all over India including Puducherry and surrounding Tamil Nadu and identified as endemic for ST. This study was designed to correlate antibody detection by IgM ELISA and Orientia tsutsugamushi DNA in real time PCR. Materials and Methods: ST IgM ELISA (InBios Inc., USA) was carried out for 170 consecutive patients who presented with the symptoms of acute ST during 11 months (November, 2015– September, 2016). All 77 of these patients with IgM ELISA positivity and 49 of 93 IgM ELISA negative patients were subjected to real time PCR (Geno-Sen's ST real time PCR, Himachal Pradesh, India). Statistical Analysis: Statistical analysis for clinical and laboratory results was performed using IBM SPSS Statistics 17 for Windows (SPSS Inc., Chicago, USA). Chi-square test with Yates correction (Fisher's test) was employed for a small number of samples. Results and Conclusion: Among 77 suspected cases of acute ST with IgM ELISA positivity and 49 IgM negative patients, 42 and 7 were positive, respectively, for O. tsutsugamushi 56-kDa type-specific gene in real time PCR kit. Until ST IFA, the gold standard diagnostic test, is properly validated in India, diagnosis of acute ST will depend on both ELISA and quantitative PCR. PMID:28878522
Montgomery, Eric; Gao, Chen; de Luca, Julie; Bower, Jessie; Attwood, Kristopher; Ylagan, Lourdes
2014-12-01
The Cellient® cell block system has become available as an alternative, partially automated method to create cell blocks in cytology. We sought to show a validation method for immunohistochemical (IHC) staining on the Cellient cell block system (CCB) in comparison with the formalin-fixed, paraffin-embedded traditional cell block (TCB). Immunohistochemical staining was performed using 31 antibodies on 38 patient samples for a total of 326 slides. Split samples were processed using both methods by following the Cellient® manufacturer's recommendations for the Cellient cell block (CCB) and the Histogel method for preparing the traditional cell block (TCB). Interpretation was performed by three pathologists and two cytotechnologists. Immunohistochemical stains were scored as 0/1+ (negative) and 2/3+ (positive). Inter-rater agreement for each antibody was evaluated for CCB and TCB, as was the intra-rater agreement between TCB and CCB for each observer. Interobserver staining concordance for the TCB was obtained with statistical significance (P < 0.05) in 24 of 31 antibodies. Interobserver staining concordance for the CCB was obtained with statistical significance in 27 of 31 antibodies. Intra-observer staining concordance between TCB and CCB was obtained with statistical significance in 24 of 31 antibodies tested. In conclusion, immunohistochemical stains on cytologic specimens processed by the Cellient system are reliable and concordant with stains performed on the same split samples processed via a formalin-fixed, paraffin-embedded (FFPE) block. The Cellient system is a welcome adjunct to cytology workflow by producing cell block material of sufficient quality to allow the use of routine IHC. © 2014 Wiley Periodicals, Inc.
Prediction of Ischemic Heart Disease and Stroke in Survivors of Childhood Cancer.
Chow, Eric J; Chen, Yan; Hudson, Melissa M; Feijen, Elizabeth A M; Kremer, Leontien C; Border, William L; Green, Daniel M; Meacham, Lillian R; Mulrooney, Daniel A; Ness, Kirsten K; Oeffinger, Kevin C; Ronckers, Cécile M; Sklar, Charles A; Stovall, Marilyn; van der Pal, Helena J; van Dijk, Irma W E M; van Leeuwen, Flora E; Weathers, Rita E; Robison, Leslie L; Armstrong, Gregory T; Yasui, Yutaka
2018-01-01
Purpose We aimed to predict individual risk of ischemic heart disease and stroke in 5-year survivors of childhood cancer. Patients and Methods Participants in the Childhood Cancer Survivor Study (CCSS; n = 13,060) were observed through age 50 years for the development of ischemic heart disease and stroke. Siblings (n = 4,023) established the baseline population risk. Piecewise exponential models with backward selection estimated the relationships between potential predictors and each outcome. The St Jude Lifetime Cohort Study (n = 1,842) and the Emma Children's Hospital cohort (n = 1,362) were used to validate the CCSS models. Results Ischemic heart disease and stroke occurred in 265 and 295 CCSS participants, respectively. Risk scores based on a standard prediction model that included sex, chemotherapy, and radiotherapy (cranial, neck, and chest) exposures achieved an area under the curve and concordance statistic of 0.70 and 0.70 for ischemic heart disease and 0.63 and 0.66 for stroke, respectively. Validation cohort area under the curve and concordance statistics ranged from 0.66 to 0.67 for ischemic heart disease and 0.68 to 0.72 for stroke. Risk scores were collapsed to form statistically distinct low-, moderate-, and high-risk groups. The cumulative incidences at age 50 years among CCSS low-risk groups were <5%, compared with approximately 20% for high-risk groups (P < .001); cumulative incidence was only 1% for siblings (P < .001 v low-risk survivors). Conclusion Information available to clinicians soon after completion of childhood cancer therapy can predict individual risk for subsequent ischemic heart disease and stroke with reasonable accuracy and discrimination through age 50 years. These models provide a framework on which to base future screening strategies and interventions.
Validity of Models for Predicting BRCA1 and BRCA2 Mutations
Parmigiani, Giovanni; Chen, Sining; Iversen, Edwin S.; Friebel, Tara M.; Finkelstein, Dianne M.; Anton-Culver, Hoda; Ziogas, Argyrios; Weber, Barbara L.; Eisen, Andrea; Malone, Kathleen E.; Daling, Janet R.; Hsu, Li; Ostrander, Elaine A.; Peterson, Leif E.; Schildkraut, Joellen M.; Isaacs, Claudine; Corio, Camille; Leondaridis, Leoni; Tomlinson, Gail; Amos, Christopher I.; Strong, Louise C.; Berry, Donald A.; Weitzel, Jeffrey N.; Sand, Sharon; Dutson, Debra; Kerber, Rich; Peshkin, Beth N.; Euhus, David M.
2008-01-01
Background Deleterious mutations of the BRCA1 and BRCA2 genes confer susceptibility to breast and ovarian cancer. At least 7 models for estimating the probabilities of having a mutation are used widely in clinical and scientific activities; however, the merits and limitations of these models are not fully understood. Objective To systematically quantify the accuracy of the following publicly available models to predict mutation carrier status: BRCAPRO, family history assessment tool, Finnish, Myriad, National Cancer Institute, University of Pennsylvania, and Yale University. Design Cross-sectional validation study, using model predictions and BRCA1 or BRCA2 mutation status of patients different from those used to develop the models. Setting Multicenter study across Cancer Genetics Network participating centers. Patients 3 population-based samples of participants in research studies and 8 samples from genetic counseling clinics. Measurements Discrimination between individuals testing positive for a mutation in BRCA1 or BRCA2 from those testing negative, as measured by the c-statistic, and sensitivity and specificity of model predictions. Results The 7 models differ in their predictions. The better-performing models have a c-statistic around 80%. BRCAPRO has the largest c-statistic overall and in all but 2 patient subgroups, although the margin over other models is narrow in many strata. Outside of high-risk populations, all models have high false-negative and false-positive rates across a range of probability thresholds used to refer for mutation testing. Limitation Three recently published models were not included. Conclusions All models identify women who probably carry a deleterious mutation of BRCA1 or BRCA2 with adequate discrimination to support individualized genetic counseling, although discrimination varies across models and populations. PMID:17909205
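Carrier-probability models such as BRCAPRO are, at their core, Bayesian updates of a population mutation prevalence by the likelihood of the observed family history. A schematic sketch of that update; the prior and likelihood ratio below are invented for illustration, and real models integrate over full pedigrees rather than a single summary ratio:

```python
def carrier_posterior(prior, likelihood_ratio):
    """Posterior probability of carrying a mutation, given
    likelihood_ratio = P(family history | carrier) / P(history | non-carrier).
    This is Bayes' rule on the odds scale."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Hypothetical numbers: 1% prevalence, history 20x more likely in carriers.
p = carrier_posterior(prior=0.01, likelihood_ratio=20.0)
```

A posterior of roughly 17% from a 1% prior illustrates how a suggestive family history can cross typical referral thresholds even when the base rate is low, which is exactly the discrimination the c-statistic above measures.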
On the analysis of very small samples of Gaussian repeated measurements: an alternative approach.
Westgate, Philip M; Burchett, Woodrow W
2017-03-15
The analysis of very small samples of Gaussian repeated measurements can be challenging. First, due to a very small number of independent subjects contributing outcomes over time, statistical power can be quite small. Second, nuisance covariance parameters must be appropriately accounted for in the analysis in order to maintain the nominal test size. However, available statistical strategies that ensure valid statistical inference may lack power, whereas more powerful methods may have the potential for inflated test sizes. Therefore, we explore an alternative approach to the analysis of very small samples of Gaussian repeated measurements, with the goal of maintaining valid inference while also improving statistical power relative to other valid methods. This approach uses generalized estimating equations with a bias-corrected empirical covariance matrix that accounts for all small-sample aspects of nuisance correlation parameter estimation in order to maintain valid inference. Furthermore, the approach utilizes correlation selection strategies with the goal of choosing the working structure that will result in the greatest power. In our study, we show that when accurate modeling of the nuisance correlation structure impacts the efficiency of regression parameter estimation, this method can improve power relative to existing methods that yield valid inference. Copyright © 2017 John Wiley & Sons, Ltd.
Reliability and Validity of Two Self-report Measures to Assess Sedentary Behavior in Older Adults
Gennuso, Keith P.; Matthews, Charles E.; Colbert, Lisa H.
2015-01-01
Background The purpose of this study was to examine the reliability and validity of two currently available physical activity surveys for assessing time spent in sedentary behavior (SB) in older adults. Methods Fifty-eight adults (≥65 years) completed the Yale Physical Activity Survey for Older Adults (YPAS) and Community Health Activities Model Program for Seniors (CHAMPS) before and after a 10-day period during which they wore an ActiGraph accelerometer (ACC). Intraclass correlation coefficients (ICC) examined test-retest reliability. Overall percent agreement and a kappa statistic examined YPAS validity. Lin’s concordance correlation, Pearson correlation, and Bland-Altman analysis examined CHAMPS validity. Results Both surveys had moderate test-retest reliability (ICC: YPAS=0.59 (P<0.001), CHAMPS=0.64 (P<0.001)) and significantly underestimated SB time. Agreement between YPAS and ACC was low (κ=−0.0003); however, there was a linear increase (P<0.01) in ACC-derived SB time across YPAS response categories. There was poor agreement between ACC-derived SB and CHAMPS (Lin’s r=0.005; 95% CI, −0.010 to 0.020), and no linear trend across CHAMPS quartiles (P=0.53). Conclusions Neither of the surveys should be used as the sole measure of SB in a study, though the YPAS has the ability to rank individuals, giving it some merit for use in correlational SB research. PMID:25110344
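The kappa statistic used above to compare YPAS categories against accelerometer-derived categories corrects raw percent agreement for the agreement expected by chance alone. A minimal sketch with made-up ratings (not the study's data):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two categorical
    ratings of the same subjects (e.g. survey category vs. ACC category)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal category frequencies.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)
```

A kappa near 0 (as reported here, κ=−0.0003) means agreement is essentially no better than chance, even if raw percent agreement looks respectable.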
A microRNA-based prediction model for lymph node metastasis in hepatocellular carcinoma.
Zhang, Li; Xiang, Zuo-Lin; Zeng, Zhao-Chong; Fan, Jia; Tang, Zhao-You; Zhao, Xiao-Mei
2016-01-19
We developed an efficient microRNA (miRNA) model that could predict the risk of lymph node metastasis (LNM) in hepatocellular carcinoma (HCC). We first evaluated a training cohort of 192 HCC patients after hepatectomy and found five LNM-associated predictive factors: vascular invasion, Barcelona Clinic Liver Cancer stage, miR-145, miR-31, and miR-92a. These five statistically independent factors were used to develop a predictive model. The predictive value of the miRNA-based model was confirmed in a validation cohort of 209 consecutive HCC patients. The prediction model scored LNM risk from 0 to 8, and a cutoff value of 4 was used to distinguish high-risk and low-risk groups. At 5 years in the validation cohort, the model's sensitivity and specificity were 69.6% and 80.2%, respectively, and the area under the curve (AUC) for the miRNA-based prognostic model was 0.860. The 5-year positive and negative predictive values of the model in the validation cohort were 30.3% and 95.5%, respectively. Cox regression analysis revealed that the LNM hazard ratio of the high-risk versus low-risk groups was 11.751 (95% CI, 5.110-27.021; P < 0.001) in the validation cohort. In conclusion, the miRNA-based model is reliable and accurate for the early prediction of LNM in patients with HCC.
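The sensitivity, specificity, and positive/negative predictive values reported for the cutoff of 4 all come from the same 2x2 cross-tabulation of predicted risk class against observed LNM status. A sketch with hypothetical scores and outcomes (illustrative only):

```python
def diagnostic_metrics(scores, outcomes, cutoff):
    """Classify score >= cutoff as 'high risk' and tabulate against the
    observed binary outcome (e.g. LNM within the follow-up period)."""
    tp = sum(s >= cutoff and y for s, y in zip(scores, outcomes))
    fp = sum(s >= cutoff and not y for s, y in zip(scores, outcomes))
    fn = sum(s < cutoff and y for s, y in zip(scores, outcomes))
    tn = sum(s < cutoff and not y for s, y in zip(scores, outcomes))
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all events
        "specificity": tn / (tn + fp),  # true negatives among all non-events
        "ppv": tp / (tp + fp),          # events among those flagged high risk
        "npv": tn / (tn + fn),          # non-events among those flagged low risk
    }
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on the event prevalence in the cohort, which is why the model above can combine a modest PPV (30.3%) with a high NPV (95.5%).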
de Sousa, Carla Suellen Pires; Castro, Régia Christina Moura Barbosa; Pinheiro, Ana Karina Bezerra; Moura, Escolástica Rejane Ferreira; Almeida, Paulo César; Aquino, Priscila de Souza
2018-01-01
ABSTRACT Objective: translate and adapt the Condom Self-Efficacy Scale to Portuguese in the Brazilian context. The scale originated in the United States and measures self-efficacy in condom use. Method: methodological study in two phases: translation, cross-cultural adaptation and verification of psychometric properties. The translation and adaptation process involved four translators, one mediator of the synthesis and five health professionals. The content validity was verified using the Content Validation Index, based on 22 experts’ judgments. Forty subjects participated in the pretest, who contributed to the understanding of the scale items. The scale was applied to 209 students between 13 and 26 years of age from a school affiliated with the state-owned educational network. The reliability was analyzed by means of Cronbach’s alpha. Results: the Portuguese version of the scale obtained a Cronbach’s alpha coefficient of 0.85 and the total mean score was 68.1 points. A statistically significant relation was found between the total scale and the variables not having children (p= 0.038), condom use (p= 0.008) and condom use with fixed partner (p=0.036). Conclusion: the Brazilian version of the Condom Self-Efficacy Scale is a valid and reliable tool to verify the self-efficacy in condom use among adolescents and young adults. PMID:29319748
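Cronbach's alpha, the reliability coefficient reported for the adapted scale (0.85), compares the sum of the individual item variances with the variance of the total score: when items move together, the total varies much more than the items do individually and alpha approaches 1. A minimal pure-Python sketch (illustrative data only, not the study's):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for internal consistency.
    items: one list per scale item, each holding one score per respondent."""
    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    k = len(items)
    totals = [sum(col) for col in zip(*items)]  # total score per respondent
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))
```

Two items that track each other perfectly give alpha = 1; items that vary independently drive alpha toward 0 (or below).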
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhong, Bin-Yan; He, Shi-Cheng; Zhu, Hai-Dong
Purpose: We aim to determine the predictors of new adjacent vertebral compression fractures (AVCFs) after percutaneous vertebroplasty (PVP) in patients with osteoporotic vertebral compression fractures (OVCFs) and to construct a risk prediction score to estimate 2-year new-AVCF risk by risk factor condition. Materials and Methods: Patients with OVCFs who underwent their first PVP between December 2006 and December 2013 at Hospital A (training cohort) and Hospital B (validation cohort) were included in this study. In the training cohort, we assessed the independent risk predictors and developed the probability of new adjacent OVCFs (PNAV) score system using Cox proportional hazard regression analysis. The accuracy of this system was then validated in both the training and validation cohorts by the concordance (c) statistic. Results: 421 patients (training cohort: n = 256; validation cohort: n = 165) were included in this study. In the training cohort, new AVCFs after the first PVP treatment occurred in 33 (12.9%) patients. The independent risk factors were intradiscal cement leakage and preexisting old vertebral compression fracture(s). The estimated 2-year absolute risk of new AVCFs ranged from less than 4% in patients with neither independent risk factor to more than 45% in individuals with both factors. Conclusions: The PNAV score is an objective and easy approach to predict the risk of new AVCFs.
Validation of the Chinese version of EORTC QLQ-BN20 for patients with brain cancer.
Zhang, K; Tian, J; He, Z; Sun, W; Pekbay, B; Lin, Y; Wu, D; Zhang, J; Chen, P; Guo, H; Wan, Y; Wang, M; Yang, S; Zheng, J; Zhang, L
2018-03-01
This is a single-centre study in mainland China aiming to evaluate the reliability, validity and responsiveness of the Chinese version of EORTC QLQ-BN20, designed by The European Organization for Research and Treatment of Cancer Quality of Life Group to evaluate the quality of life of patients with brain tumour, cancer or metastases. One hundred and eighty-eight patients with primary or secondary brain cancer from Hunan Provincial Tumor Hospital between September 2013 and June 2014 completed the Chinese EORTC QLQ-C30/BN20 questionnaires, developed by translation, back translation and cultural adaptation. Results were statistically analysed using SPSS 17.0. The internal consistency (Cronbach's α coefficient) was between .753 and .869, the correlation coefficients between each item and its own dimension were greater than .4, and every item correlated more strongly with its own dimension than with the others. Spearman correlation was used to analyse the correlation of each dimension between EORTC QLQ-BN20 and EORTC QLQ-C30; the results showed that some dimensions were moderately correlated, while the other dimensions were weakly correlated. In conclusion, the Chinese version of the EORTC QLQ-BN20 questionnaire had good relevance, reliability, convergent validity and discriminant validity. It provides a valuable tool for the assessment of health-related quality of life in clinical studies of Chinese patients with primary or secondary brain cancer. © 2018 John Wiley & Sons Ltd.
Validation of the Saskatoon Falls Prevention Consortium's Falls Screening and Referral Algorithm
Lawson, Sara Nicole; Zaluski, Neal; Petrie, Amanda; Arnold, Cathy; Basran, Jenny
2013-01-01
ABSTRACT Purpose: To investigate the concurrent validity of the Saskatoon Falls Prevention Consortium's Falls Screening and Referral Algorithm (FSRA). Method: A total of 29 older adults (mean age 77.7 [SD 4.0] y) residing in an independent-living seniors' complex who met inclusion criteria completed a demographic questionnaire and the components of the FSRA and Berg Balance Scale (BBS). The FSRA consists of the Elderly Fall Screening Test (EFST) and the Multi-factor Falls Questionnaire (MFQ); it is designed to categorize individuals into low, moderate, or high fall-risk categories to determine appropriate management pathways. A predictive model for probability of fall risk, based on previous research, was used to determine concurrent validity of the FSRA. Results: The FSRA placed 79% of participants into the low-risk category, whereas the predictive model found the probability of fall risk to range from 0.04 to 0.74, with a mean of 0.35 (SD 0.25). No statistically significant correlation was found between the FSRA and the predictive model for probability of fall risk (Spearman's ρ=0.35, p=0.06). Conclusion: The FSRA lacks concurrent validity relative to a previously established model of fall risk and appears to over-categorize individuals into the low-risk group. Further research on the FSRA as an adequate tool to screen community-dwelling older adults for fall risk is recommended. PMID:24381379
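Spearman's ρ, used here to relate FSRA risk categories to the model-predicted probabilities, is the Pearson correlation of the ranks; with no ties it reduces to the classic formula 1 - 6Σd²/(n(n²-1)), where d is the rank difference for each subject. A small sketch with hypothetical values:

```python
def spearman_rho(x, y):
    """Spearman's rank correlation via the rank-difference formula
    (no tie correction, for simplicity)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

Because it uses ranks only, Spearman's ρ is appropriate when one variable is an ordered category (low/moderate/high risk) rather than a continuous score.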
Manzoni, Gian Mauro; Rossi, Alessandro; Marazzi, Nicoletta; Agosti, Fiorenza; De Col, Alessandra; Pietrabissa, Giada; Castelnuovo, Gianluca; Molinari, Enrico; Sartorio, Allessandro
2018-01-01
Objective This study aimed to examine the feasibility, validity, and reliability of the Italian Pediatric Quality of Life Inventory Multidimensional Fatigue Scale (PedsQL™ MFS) for adult inpatients with severe obesity. Methods 200 inpatients (81% females) with severe obesity (BMI ≥ 35 kg/m2) completed the PedsQL MFS (General Fatigue, Sleep/Rest Fatigue and Cognitive Fatigue domains), the Fatigue Severity Scale, and the Center for Epidemiologic Studies Depression Scale immediately after admission to a 3-week residential body weight reduction program. A randomized subsample of 48 patients re-completed the PedsQL MFS after 3 days. Results Confirmatory factor analysis showed that a modified hierarchical model, with two items moved from the Sleep/Rest Fatigue domain to the General Fatigue domain and a second-order latent factor, best fitted the data. Internal consistency and test-retest reliabilities were acceptable to high in all scales, and small to high statistically significant correlations were found with all convergent measures, with the exception of BMI. Significant floor effects were found in two scales (Cognitive Fatigue and Sleep/Rest Fatigue). Conclusion The Italian modified PedsQL MFS for adults proved to be a valid and reliable tool for the assessment of fatigue in inpatients with severe obesity. Future studies should assess its discriminant validity as well as its responsiveness to weight reduction. PMID:29402854
Ibrahim, Edward F; Petrou, Charalambos; Galanos, Antonis
2015-01-01
Background The purpose of the present study was to validate the Functional Shoulder Score (FSS), a new patient-reported outcome score specifically designed to evaluate patients with rotator cuff disorders. Methods One hundred and nineteen patients were assessed using two shoulder scoring systems [the FSS and the Constant–Murley Score (CMS)] at 3 weeks pre- and 6 months post-arthroscopic rotator cuff surgery. The reliability, validity, responsiveness and interpretability of the FSS were evaluated. Results Reliability analysis (test–retest) showed an intraclass correlation coefficient value of 0.96 [95% confidence interval (CI) = 0.92 to 0.98]. Internal consistency analysis revealed a Cronbach's alpha coefficient of 0.93. The Pearson correlation coefficient FSS-CMS was 0.782 pre-operatively and 0.737 postoperatively (p < 0.0005). There was a statistically significant increase in FSS scores postoperatively, an effect size of 3.06 and standardized response mean of 2.80. The value for minimal detectable change was ±8.38 scale points (based on a 90% CI) and the minimal clinically important difference for improvement was 24.7 ± 5.4 points. Conclusions The FSS is a patient-reported outcome measure that can easily be incorporated into clinical practice, providing a quick, reliable, valid and practical measure for rotator cuff problems. The questionnaire is highly sensitive to clinical change. PMID:27582986
Assessment of Semi-Structured Clinical Interview for Mobile Phone Addiction Disorder
Alavi, Seyyed Salman; Jannatifard, Fereshteh; Mohammadi Kalhori, Soroush; Sepahbodi, Ghazal; BabaReisi, Mohammad; Sajedi, Sahar; Farshchi, Mojtaba; KhodaKarami, Rasul; Hatami Kasvaee, Vahid
2016-01-01
Objective: The Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR) classified mobile phone addiction disorder under “impulse control disorder not elsewhere classified”. This study surveyed the diagnostic criteria of DSM-IV-TR for the diagnosis of mobile phone addiction in correspondence with Iranian society and culture. Method: Two hundred fifty students of Tehran universities were entered into this descriptive-analytical and cross-sectional study. Quota sampling method was used. At first, a semi-structured clinical interview (based on DSM-IV-TR) was performed for all the cases, and another specialist reevaluated the interviews. Data were analyzed using content validity, inter-scorer reliability (Kappa coefficient) and test-retest via SPSS18 software. Results: The content validity of the semi-structured clinical interview matched the DSM-IV-TR criteria for behavioral addiction. Moreover, their content was appropriate, and two items, including “SMS pathological use” and “High monthly cost of using the mobile phone”, were added to promote its validity. Internal reliability (Kappa) and test-retest reliability were 0.55 and r = 0.4 (p < 0.01), respectively. Conclusion: The results of this study revealed that the semi-structured diagnostic criteria of DSM-IV-TR are valid and reliable for diagnosing mobile phone addiction, and this instrument is an effective tool to diagnose this disorder. PMID:27437008
The Validity and Reliability of an iPhone App for Measuring Running Mechanics.
Balsalobre-Fernández, Carlos; Agopyan, Hovannes; Morin, Jean-Benoit
2017-07-01
The purpose of this investigation was to analyze the validity of an iPhone application (Runmatic) for measuring running mechanics. To do this, 96 steps from 12 different runs at speeds ranging from 2.77-5.55 m·s⁻¹ were recorded simultaneously with Runmatic, as well as with an opto-electronic device installed on a motorized treadmill to measure the contact and aerial time of each step. Additionally, several running mechanics variables were calculated using the contact and aerial times measured, and previously validated equations. Several statistics were computed to test the validity and reliability of Runmatic in comparison with the opto-electronic device for the measurement of contact time, aerial time, vertical oscillation, leg stiffness, maximum relative force, and step frequency. The running mechanics values obtained with both the app and the opto-electronic device showed a high degree of correlation (r = .94-.99, p < .001). Moreover, there was very close agreement between instruments as revealed by the ICC (2,1) (ICC = 0.965-0.991). Finally, both Runmatic and the opto-electronic device showed almost identical reliability levels when measuring each set of 8 steps for every run recorded. In conclusion, Runmatic has been proven to be a highly reliable tool for measuring the running mechanics studied in this work.
Coaching leadership: leaders' and followers' perception assessment questionnaires in nursing
Cardoso, Maria Lúcia Alves Pereira; Ramos, Laís Helena; D'Innocenzo, Maria
2014-01-01
ABSTRACT Objective: To describe the development, content analysis, and reliability of two questionnaires to assess the perception of nurse leaders, nurse technicians, and licensed practical nurses regarding the practice of leadership and its relation to the dimensions of the coaching process. Methods: This was a methodological study with a quantitative and qualitative approach, aimed at the construction and validation of measuring instruments. The proposed instruments were based on the literature on leadership, coaching, and assessment of psychometric properties, and were subjected to content validation for clarity, relevance, and applicability, with the propositions validated through the consensus of judges using the Delphi technique in 2010. The final version of the questionnaires was administered to 279 nurses and 608 nurse technicians and licensed practical nurses at two university hospitals and two private hospitals. Results: The Cronbach's alpha value for all items of the self-perception instrument was very high (0.911). For the team members' perception instrument, across all determinants and each dimension of the coaching process, the overall Cronbach's alpha value (0.952) was also very high, pointing to a very strong consistency of the scale. Confirmatory analysis showed that the models were well adjusted. Conclusion: The statistical validation indicated that the questionnaires can be reused for other study samples, given the evidence of their reliability and applicability. PMID:24728249
Validating e-learning in continuing pharmacy education: user acceptance and knowledge change
2014-01-01
Background Continuing pharmacy education is becoming mandatory in most countries in order to keep the professional license valid. An increasing number of pharmacists are now using e-learning as part of their continuing education. Consequently, the increasing popularity of this method of education calls for standardization and validation practices. The conducted research explored validation aspects of e-learning in terms of knowledge increase and user acceptance. Methods Two e-courses were conducted as e-based continuing pharmacy education for graduated pharmacists. Knowledge increase and user acceptance were the two outcomes measured. The change in knowledge in the first e-course was measured by a pre- and post-test and the results analysed by the Wilcoxon signed-rank test. The acceptance of e-learning in the second e-course was investigated by a questionnaire and the results analysed using descriptive statistics. Results Results showed that knowledge increased significantly (p < 0.001) by 16 percentage points after participation in the first e-course. Among the participants who responded to the survey in the second course, 92% stated that e-courses were effective and 91% stated that they enjoyed the course. Conclusions The study shows that e-learning is a viable medium for conducting continuing pharmacy education; e-learning is effective in increasing knowledge and highly accepted by pharmacists from various working environments such as community and hospital pharmacies, faculties of pharmacy or wholesalers. PMID:24528547
Association of TNF, MBL, and VDR Polymorphisms with Leprosy Phenotypes
Sapkota, Bishwa R.; Macdonald, Murdo; Berrington, William R.; Misch, E. Ann; Ranjit, Chaman; Siddiqui, M. Ruby; Kaplan, Gilla; Hawn, Thomas R.
2010-01-01
Background Although genetic variants in tumor necrosis factor (TNF), mannose binding lectin (MBL), and the vitamin D receptor (VDR) have been associated with leprosy clinical outcomes, these findings have not been extensively validated. Methods We used a case-control study design with 933 patients in Nepal, which included 240 patients with type I reversal reaction (RR), and 124 patients with erythema nodosum leprosum (ENL) reactions. We compared genotype frequencies in 933 cases and 101 controls of 7 polymorphisms, including a promoter region variant in TNF (G−308A), three polymorphisms in MBL (C154T, G161A and G170A), and three variants in VDR (FokI, BsmI, and TaqI). Results We observed an association between TNF −308A and protection from leprosy with an odds ratio (OR) of 0.52 (95% confidence interval (CI) of 0.29 to 0.95, P = 0.016). MBL polymorphism G161A was associated with protection from lepromatous leprosy (OR (95% CI) = 0.33 (0.12–0.85), P = 0.010). VDR polymorphisms were not associated with leprosy phenotypes. Conclusion These results confirm previous findings of an association of TNF −308A with protection from leprosy and MBL polymorphisms with protection from lepromatous leprosy. The statistical significance was modest and will require further study for conclusive validation. PMID:20650301
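Odds ratios with 95% confidence intervals like those reported here are typically computed from a 2x2 genotype-by-phenotype table using the Woolf (log-normal) approximation, where the standard error of log(OR) is the square root of the sum of the reciprocal cell counts. A sketch with illustrative counts (not the study's data):

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """2x2 table: a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    Returns the odds ratio and its Woolf (log-normal) 95% CI."""
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log_or)
    hi = math.exp(math.log(or_) + z * se_log_or)
    return or_, lo, hi
```

When a CI excludes 1 (as with OR 0.52, CI 0.29 to 0.95 above), the association is nominally significant at the 5% level; an OR below 1 indicates a protective association.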
Murphy, Siobhan; Elklit, Ask; Dokkedahl, Sarah; Shevlin, Mark
2018-01-01
ABSTRACT With the publication of the International Statistical Classification of Diseases and Related Health Problems, 11th edition (ICD-11) due for release in 2018, a number of studies have assessed the factorial validity of the proposed post-traumatic stress disorder (PTSD) and complex (CPTSD) diagnostic criteria and whether the disorders are correlated but distinct constructs. As the specific nature of CPTSD symptoms has yet to be firmly established, this study aimed to examine the dimension of affect dysregulation as two separate constructs representing hyper-activation and hypo-activation. Seven alternative models were estimated within a confirmatory factor analytic framework using the International Trauma Questionnaire (ITQ). Data were analysed from a young adult sample from northern Uganda (n = 314), of which 51% were female and aged 18–25 years. Forty per cent of the participants were former child soldiers (n = 124) while the remainder were civilians (n = 190). The prevalence of CPTSD was 20.8% and PTSD was 13.1%. The results indicated that all models that estimated affective dysregulation as distinct but correlated constructs (i.e. hyper-activation and hypo-activation) provided satisfactory model fit, with statistical superiority for a seven-factor first-order correlated model. Furthermore, individuals who met the criteria for CPTSD reported higher levels of war experiences, symptoms of anxiety and depression, and somatic problems than those with PTSD only and no diagnosis. There was also a much larger proportion of former child soldiers that met the criteria for a CPTSD diagnosis. In conclusion, these results partly support the factorial validity of the ICD-11 proposals for PTSD and CPTSD in a non-Western culture exposed to mass violence. These findings highlight that more research is required across different cultural backgrounds before firm conclusions can be made regarding the factor structure of CPTSD using the ITQ. PMID:29707169
Wojtusiak, Janusz; Michalski, Ryszard S; Simanivanh, Thipkesone; Baranova, Ancha V
2009-12-01
Systematic reviews and meta-analysis of published clinical datasets are an important part of medical research. By combining results of multiple studies, meta-analysis is able to increase confidence in its conclusions, validate particular study results, and sometimes lead to new findings. Extensive theory has been built on how to aggregate results from multiple studies and arrive at statistically valid conclusions. Surprisingly, very little has been done to adopt advanced machine learning methods to support meta-analysis. In this paper we describe a novel machine learning methodology that is capable of inducing accurate and easy-to-understand attributional rules from aggregated data. Thus, the methodology can be used to support traditional meta-analysis in systematic reviews. Most machine learning applications give primary attention to predictive accuracy of the learned knowledge, and lesser attention to its understandability. Here we employed attributional rules, a special form of rules that are relatively easy to interpret for medical experts who are not necessarily trained in statistics and meta-analysis. The methodology has been implemented and initially tested on a set of publicly available clinical data describing patients with metabolic syndrome (MS). The objective of this application was to determine rules describing combinations of clinical parameters used for metabolic syndrome diagnosis, and to develop rules for predicting whether particular patients are likely to develop secondary complications of MS. The aggregated clinical data were retrieved from 20 separate hospital cohorts that included 12 groups of patients with present liver disease symptoms and 8 control groups of healthy subjects. A total of 152 attributes were used, although most were measured differently across studies. The twenty most common attributes were selected for the rule learning process. By applying the developed rule learning methodology we arrived at several different possible rulesets that can be used to predict three considered complications of MS, namely nonalcoholic fatty liver disease (NAFLD), simple steatosis (SS), and nonalcoholic steatohepatitis (NASH).
Folic acid supplements and colorectal cancer risk: meta-analysis of randomized controlled trials
NASA Astrophysics Data System (ADS)
Qin, Tingting; Du, Mulong; Du, Haina; Shu, Yongqian; Wang, Meilin; Zhu, Lingjun
2015-07-01
Numerous studies have investigated the effects of folic acid supplementation on colorectal cancer risk, but conflicting results were reported. We herein performed a meta-analysis based on relevant studies to reach a more definitive conclusion. The PubMed and Embase databases were searched for quality randomized controlled trials (RCTs) published before October 2014. Eight articles met the inclusion criteria and were subsequently analyzed. The results suggested that folic acid treatment was not associated with colorectal cancer risk in the total population (relative risk [RR] = 1.00, 95% confidence interval [CI] = 0.82-1.22, P = 0.974). Moreover, no statistical effect was identified in further subgroup analyses stratified by ethnicity, gender, body mass index (BMI) and potential confounding factors. No significant heterogeneity or publication bias was observed. In conclusion, our meta-analysis demonstrated that folic acid supplementation had no effect on colorectal cancer risk. However, this finding must be validated by further large studies.
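A pooled relative risk of the kind reported here (RR = 1.00, 95% CI 0.82-1.22) is commonly obtained by inverse-variance weighting of the per-study log relative risks, with each study's weight recovered from the width of its confidence interval. A minimal fixed-effect sketch with hypothetical study inputs (the actual meta-analysis may have used a different weighting scheme):

```python
import math

def pooled_rr(studies):
    """Fixed-effect inverse-variance pooling of relative risks.
    studies: list of (rr, ci_lo, ci_hi) tuples with 95% CIs; the SE of
    log(RR) is recovered from the CI width as (log(hi) - log(lo)) / (2*1.96)."""
    wsum = wlsum = 0.0
    for rr, lo, hi in studies:
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
        w = 1 / se ** 2                 # weight = inverse variance
        wsum += w
        wlsum += w * math.log(rr)
    log_rr = wlsum / wsum
    se_pooled = math.sqrt(1 / wsum)
    return (math.exp(log_rr),
            math.exp(log_rr - 1.96 * se_pooled),
            math.exp(log_rr + 1.96 * se_pooled))
```

Pooling narrows the confidence interval relative to any single study, which is why a meta-analysis of eight RCTs can support a firmer null conclusion than the individual trials.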
Agnotology: learning from mistakes
Benestad, R. E.; Hygen, H. O.; van Dorland, R.; Cook, J.; Nuccitelli, D.
2013-05-01
Replication is an important part of science, and by repeating past analyses, we show that a number of papers in the scientific literature contain severe methodological flaws that can easily be identified through simple tests and demonstrations. In many cases, shortcomings are related to a lack of robustness, leading to results that are not universally valid but rather an artifact of a particular experimental set-up. Some examples presented here have ignored data that do not fit the conclusions, and in several other cases, inappropriate statistical methods have been adopted or conclusions have been based on misconceived physics. These papers may serve as educational case studies for why certain analytical approaches sometimes are unsuitable in providing reliable answers. They also highlight the merit of replication. A lack of common replication has repercussions for the quality of the scientific literature, and may be a reason why some controversial questions remain unanswered even when ignorance could be reduced. Agnotology is the study of such ignorance. Free and open-source software is provided for demonstration purposes.
A scoring system for ascertainment of incident stroke: the Risk Index Score (RISc).
Kass-Hout, T A; Moyé, L A; Smith, M A; Morgenstern, L B
2006-01-01
The main objective of this study was to develop and validate a computer-based statistical algorithm that could be translated into a simple scoring system in order to ascertain incident stroke cases using hospital admission medical records data. The Risk Index Score (RISc) algorithm was developed using data collected prospectively by the Brain Attack Surveillance in Corpus Christi (BASIC) project, 2000. The validity of RISc was evaluated by estimating the concordance of scoring system stroke ascertainment to stroke ascertainment by physician and/or abstractor review of hospital admission records. RISc was developed on 1718 randomly selected patients (training set) and then statistically validated on an independent sample of 858 patients (validation set). A multivariable logistic model was used to develop RISc and subsequently evaluated by goodness-of-fit and receiver operating characteristic (ROC) analyses. The higher the value of RISc, the higher the patient's risk of potential stroke. The study showed RISc was well calibrated and discriminated those who had potential stroke from those that did not on initial screening. In this study we developed and validated a rapid, easy, efficient, and accurate method to ascertain incident stroke cases from routine hospital admission records for epidemiologic investigations. Validation of this scoring system was achieved statistically; however, clinical validation in a community hospital setting is warranted.
Hickey, Graeme L; Blackstone, Eugene H
2016-08-01
Clinical risk-prediction models serve an important role in healthcare. They are used for clinical decision-making and measuring the performance of healthcare providers. To establish confidence in a model, external model validation is imperative. When designing such an external model validation study, thought must be given to patient selection, risk factor and outcome definitions, missing data, and the transparent reporting of the analysis. In addition, there are a number of statistical methods available for external model validation. Execution of a rigorous external validation study rests in proper study design, application of suitable statistical methods, and transparent reporting. Copyright © 2016 The American Association for Thoracic Surgery. Published by Elsevier Inc. All rights reserved.
Quantifying falsifiability of scientific theories
NASA Astrophysics Data System (ADS)
Nemenman, Ilya
I argue that the notion of falsifiability, a key concept in defining a valid scientific theory, can be quantified using Bayesian Model Selection, which is a standard tool in modern statistics. This relates falsifiability to the quantitative version of the statistical Occam's razor, and allows transforming some long-running arguments about validity of scientific theories from philosophical discussions to rigorous mathematical calculations.
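The Bayesian Occam's razor invoked above can be shown with a toy calculation (not from the cited work): a rigid model with no free parameters (a fair coin) versus a flexible model (unknown bias with a uniform prior). Because the flexible model spreads its prior probability over many possible data sets, it is penalized when the data happen to fit the sharper model.

```python
# Sketch: Bayes factor between a sharp, falsifiable model (fair coin)
# and a flexible one (uniform prior on the bias), for k heads in n flips.
from math import comb

def marginal_fair(k, n):
    # P(k heads in n | p = 0.5)
    return comb(n, k) * 0.5 ** n

def marginal_uniform(k, n):
    # Beta-Binomial marginal: integral of C(n,k) p^k (1-p)^(n-k) dp
    # over p in [0,1] equals 1/(n+1) for every k.
    return 1.0 / (n + 1)

k, n = 5, 10  # data near 50% heads
bf = marginal_fair(k, n) / marginal_uniform(k, n)
print(bf)  # ~2.707: the sharper model is favored
```

When the observed frequency drifts away from 0.5, the Bayes factor reverses and the flexible model wins, which is exactly the sense in which the sharp model is falsifiable.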
Mass spectrometry-based protein identification with accurate statistical significance assignment.
Alves, Gelio; Yu, Yi-Kuo
2015-03-01
Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of scientific conclusions thus drawn. From the perspective of mass spectrometry-based proteomics, even though accurate statistics for peptide identification can now be achieved, accurate protein level statistics remain challenging. We have constructed a protein ID method that combines peptide evidences of a candidate protein based on a rigorous formula derived earlier; in this formula the database P-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein level E-value, eliminating the need of using empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Sorić formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with other methods tested. The source code, implemented in C++ on a linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berman, D.W.; Allen, B.C.; Van Landingham, C.B.
1998-12-31
The decision rules commonly employed to determine the need for cleanup are evaluated both to identify conditions under which they lead to erroneous conclusions and to quantify the rate that such errors occur. Their performance is also compared with that of other applicable decision rules. The authors based the evaluation of decision rules on simulations. Results are presented as power curves. These curves demonstrate that the degree of statistical control achieved is independent of the form of the null hypothesis. The loss of statistical control that occurs when a decision rule is applied to a data set that does not satisfy the rule's validity criteria is also clearly demonstrated. Some of the rules evaluated do not offer the formal statistical control that is an inherent design feature of other rules. Nevertheless, results indicate that such informal decision rules may provide superior overall control of error rates, when their application is restricted to data exhibiting particular characteristics. The results reported here are limited to decision rules applied to uncensored and lognormally distributed data. To optimize decision rules, it is necessary to evaluate their behavior when applied to data exhibiting a range of characteristics that bracket those common to field data. The performance of decision rules applied to data sets exhibiting a broader range of characteristics is reported in the second paper of this study.
STRengthening Analytical Thinking for Observational Studies: the STRATOS initiative
Sauerbrei, Willi; Abrahamowicz, Michal; Altman, Douglas G; le Cessie, Saskia; Carpenter, James
2014-01-01
The validity and practical utility of observational medical research depends critically on good study design, excellent data quality, appropriate statistical methods and accurate interpretation of results. Statistical methodology has seen substantial development in recent times. Unfortunately, many of these methodological developments are ignored in practice. Consequently, design and analysis of observational studies often exhibit serious weaknesses. The lack of guidance on vital practical issues discourages many applied researchers from using more sophisticated and possibly more appropriate methods when analyzing observational studies. Furthermore, many analyses are conducted by researchers with a relatively weak statistical background and limited experience in using statistical methodology and software. Consequently, even ‘standard’ analyses reported in the medical literature are often flawed, casting doubt on their results and conclusions. An efficient way to help researchers to keep up with recent methodological developments is to develop guidance documents that are spread to the research community at large. These observations led to the initiation of the strengthening analytical thinking for observational studies (STRATOS) initiative, a large collaboration of experts in many different areas of biostatistical research. The objective of STRATOS is to provide accessible and accurate guidance in the design and analysis of observational studies. The guidance is intended for applied statisticians and other data analysts with varying levels of statistical education, experience and interests. In this article, we introduce the STRATOS initiative and its main aims, present the need for guidance documents and outline the planned approach and progress so far. We encourage other biostatisticians to become involved. PMID:25074480
Statistical Approaches Used to Assess the Equity of Access to Food Outlets: A Systematic Review
Lamb, Karen E.; Thornton, Lukar E.; Cerin, Ester; Ball, Kylie
2015-01-01
Background Inequalities in eating behaviours are often linked to the types of food retailers accessible in neighbourhood environments. Numerous studies have aimed to identify if access to healthy and unhealthy food retailers is socioeconomically patterned across neighbourhoods, and thus a potential risk factor for dietary inequalities. Existing reviews have examined differences between methodologies, particularly focussing on neighbourhood and food outlet access measure definitions. However, no review has informatively discussed the suitability of the statistical methodologies employed; a key issue determining the validity of study findings. Our aim was to examine the suitability of statistical approaches adopted in these analyses. Methods Searches were conducted for articles published from 2000–2014. Eligible studies included objective measures of the neighbourhood food environment and neighbourhood-level socio-economic status, with a statistical analysis of the association between food outlet access and socio-economic status. Results Fifty-four papers were included. Outlet accessibility was typically defined as the distance to the nearest outlet from the neighbourhood centroid, or as the number of food outlets within a neighbourhood (or buffer). To assess if these measures were linked to neighbourhood disadvantage, common statistical methods included ANOVA, correlation, and Poisson or negative binomial regression. Although all studies involved spatial data, few considered spatial analysis techniques or spatial autocorrelation. Conclusions With advances in GIS software, sophisticated measures of neighbourhood outlet accessibility can be considered. However, approaches to statistical analysis often appear less sophisticated. Care should be taken to consider assumptions underlying the analysis and the possibility of spatially correlated residuals which could affect the results. PMID:29546115
Developing a Campaign Plan to Target Centers of Gravity Within Economic Systems
1995-05-01
[Thesis table-of-contents excerpt omitted.] This project furthers the original statistical effort and adds to this a campaign planning approach (including both systems and operational level
Katz, Ralph V.; Green, B. Lee; Kressin, Nancy R.; Claudio, Cristina; Wang, Min Qi; Russell, Stefanie L.
2007-01-01
OBJECTIVES: The purposes of this analysis were to compare the self-reported willingness of blacks, Puerto-Rican Hispanics and whites to participate as research subjects in biomedical studies, and to determine the reliability of the Tuskegee Legacy Project Questionnaire (TLP). METHODS: The TLP Questionnaire, initially used in a four-city study in 1999-2000, was administered in a follow-up study within a random-digit-dial telephone survey to a stratified random sample of adults in three different U.S. cities: Baltimore, MD; New York City; and San Juan, PR. The questionnaire, a 60-item instrument, contains two validated scales: the Likelihood of Participation (LOP) Scale and the Guinea Pig Fear Factor (GPFF) Scale. RESULTS: Adjusting for age, sex, education, income and city, the LOP Scale was not statistically significantly different for the racial/ethnic groups (ANCOVA, p=.87). The GPFF Scale was statistically significantly higher for blacks and Hispanics as compared to whites (adjusted ANCOVA, p<0.001). CONCLUSIONS: The findings from the current three-city study and our prior four-city study are remarkably similar and reinforce the conclusion that blacks and Hispanics self-report that, despite having a higher fear of participation, they are just as likely as whites to participate in biomedical research. PMID:17913117
Hippisley-Cox, Julia; Coupland, Carol
2015-01-01
Objective To derive and validate a set of clinical risk prediction algorithms to estimate the 10-year risk of 11 common cancers. Design Prospective open cohort study using routinely collected data from 753 QResearch general practices in England. We used 565 practices to develop the scores and 188 for validation. Subjects 4.96 million patients aged 25–84 years in the derivation cohort; 1.64 million in the validation cohort. Patients were free of the relevant cancer at baseline. Methods Cox proportional hazards models in the derivation cohort to derive 10-year risk algorithms. Risk factors considered included age, ethnicity, deprivation, body mass index, smoking, alcohol, previous cancer diagnoses, family history of cancer, relevant comorbidities and medication. Measures of calibration and discrimination in the validation cohort. Outcomes Incident cases of blood, breast, bowel, gastro-oesophageal, lung, oral, ovarian, pancreas, prostate, renal tract and uterine cancers. Cancers were recorded on any one of four linked data sources (general practitioner (GP), mortality, hospital or cancer records). Results We identified 228 241 incident cases during follow-up of the 11 types of cancer. Of these 25 444 were blood; 41 315 breast; 32 626 bowel, 12 808 gastro-oesophageal; 32 187 lung; 4811 oral; 6635 ovarian; 7119 pancreatic; 35 256 prostate; 23 091 renal tract; 6949 uterine cancers. The lung cancer algorithm had the best performance with an R2 of 64.2%; D statistic of 2.74; receiver operating characteristic curve statistic of 0.91 in women. The sensitivity for the top 10% of women at highest risk of lung cancer was 67%. Performance of the algorithms in men was very similar to that for women. Conclusions We have developed and validated prediction models to quantify absolute risk of 11 common cancers. They can be used to identify patients at high risk of cancers for prevention or further assessment. 
The algorithms could be integrated into clinical computer systems and used to identify high-risk patients. Web calculator: There is a simple web calculator to implement the Qcancer 10 year risk algorithm together with the open source software for download (available at http://qcancer.org/10yr/). PMID:25783428
ERIC Educational Resources Information Center
Acar, Tülin
2014-01-01
In literature, it has been observed that many enhanced criteria are limited by factor analysis techniques. Besides examinations of statistical structure and/or psychological structure, such validity studies as cross validation and classification-sequencing studies should be performed frequently. The purpose of this study is to examine cross…
45 CFR 153.350 - Risk adjustment data validation standards.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 45 Public Welfare 1 2013-10-01 2013-10-01 false Risk adjustment data validation standards. 153.350... validation standards. (a) General requirement. The State, or HHS on behalf of the State, must ensure proper implementation of any risk adjustment software and ensure proper validation of a statistically valid sample of...
45 CFR 153.350 - Risk adjustment data validation standards.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 45 Public Welfare 1 2014-10-01 2014-10-01 false Risk adjustment data validation standards. 153.350... validation standards. (a) General requirement. The State, or HHS on behalf of the State, must ensure proper implementation of any risk adjustment software and ensure proper validation of a statistically valid sample of...
Vetter, Thomas R
2017-11-01
Descriptive statistics are specific methods basically used to calculate, describe, and summarize collected research data in a logical, meaningful, and efficient way. Descriptive statistics are reported numerically in the manuscript text and/or in its tables, or graphically in its figures. This basic statistical tutorial discusses a series of fundamental concepts about descriptive statistics and their reporting. The mean, median, and mode are 3 measures of the center or central tendency of a set of data. In addition to a measure of its central tendency (mean, median, or mode), another important characteristic of a research data set is its variability or dispersion (ie, spread). In simplest terms, variability is how much the individual recorded scores or observed values differ from one another. The range, standard deviation, and interquartile range are 3 measures of variability or dispersion. The standard deviation is typically reported for a mean, and the interquartile range for a median. Testing for statistical significance, along with calculating the observed treatment effect (or the strength of the association between an exposure and an outcome), and generating a corresponding confidence interval are 3 tools commonly used by researchers (and their collaborating biostatistician or epidemiologist) to validly make inferences and more generalized conclusions from their collected data and descriptive statistics. A number of journals, including Anesthesia & Analgesia, strongly encourage or require the reporting of pertinent confidence intervals. A confidence interval can be calculated for virtually any variable or outcome measure in an experimental, quasi-experimental, or observational research study design. Generally speaking, in a clinical trial, the confidence interval is the range of values within which the true treatment effect in the population likely resides. 
In an observational study, the confidence interval is the range of values within which the true strength of the association between the exposure and the outcome (eg, the risk ratio or odds ratio) in the population likely resides. There are many possible ways to graphically display or illustrate different types of data. While there is often latitude as to the choice of format, ultimately, the simplest and most comprehensible format is preferred. Common examples include a histogram, bar chart, line chart or line graph, pie chart, scatterplot, and box-and-whisker plot. Valid and reliable descriptive statistics can answer basic yet important questions about a research data set, namely: "Who, What, Why, When, Where, How, How Much?"
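As a minimal sketch of the confidence-interval reporting described above: a 95% interval for a mean can be formed from the sample mean, the standard deviation, and the sample size, here using the normal approximation (z = 1.96). The data values are illustrative, not from any study in this collection.

```python
# Sketch: 95% confidence interval for a mean via the normal
# approximation. For small samples a t critical value would be
# slightly wider; z = 1.96 is used here for simplicity.
from math import sqrt
from statistics import mean, stdev

data = [4.1, 5.0, 4.7, 5.3, 4.9, 5.1, 4.6, 5.2]
m, s, n = mean(data), stdev(data), len(data)
half_width = 1.96 * s / sqrt(n)  # margin of error
print(round(m - half_width, 2), round(m + half_width, 2))
```

The interval is the range of values within which the true population mean likely resides, in the same sense described for treatment effects and risk ratios above.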
A multi-analyte serum test for the detection of non-small cell lung cancer
Farlow, E C; Vercillo, M S; Coon, J S; Basu, S; Kim, A W; Faber, L P; Warren, W H; Bonomi, P; Liptay, M J; Borgia, J A
2010-01-01
Background: In this study, we appraised a wide assortment of biomarkers previously shown to have diagnostic or prognostic value for non-small cell lung cancer (NSCLC) with the intent of establishing a multi-analyte serum test capable of identifying patients with lung cancer. Methods: Circulating levels of 47 biomarkers were evaluated against patient cohorts consisting of 90 NSCLC and 43 non-cancer controls using commercial immunoassays. Multivariate statistical methods were used on all biomarkers achieving statistical relevance to define an optimised panel of diagnostic biomarkers for NSCLC. The resulting biomarkers were fashioned into a classification algorithm and validated against serum from a second patient cohort. Results: A total of 14 analytes achieved statistical relevance upon evaluation. Multivariate statistical methods then identified a panel of six biomarkers (tumour necrosis factor-α, CYFRA 21-1, interleukin-1ra, matrix metalloproteinase-2, monocyte chemotactic protein-1 and sE-selectin) as being the most efficacious for diagnosing early stage NSCLC. When tested against a second patient cohort, the panel successfully classified 75 of 88 patients. Conclusions: Here, we report the development of a serum algorithm with high specificity for classifying patients with NSCLC against cohorts of various 'high-risk' individuals. A high rate of false positives was observed within the cohort in which patients had non-neoplastic lung nodules, possibly as a consequence of the inflammatory nature of these conditions. PMID:20859284
"Plateau"-related summary statistics are uninformative for comparing working memory models.
van den Berg, Ronald; Ma, Wei Ji
2014-10-01
Performance on visual working memory tasks decreases as more items need to be remembered. Over the past decade, a debate has unfolded between proponents of slot models and slotless models of this phenomenon (Ma, Husain, & Bays, Nature Neuroscience 17, 347-356, 2014). Zhang and Luck (Nature 453, (7192), 233-235, 2008) and Anderson, Vogel, and Awh (Attention, Perception, Psychophys 74, (5), 891-910, 2011) noticed that as more items need to be remembered, "memory noise" seems to first increase and then reach a "stable plateau." They argued that three summary statistics characterizing this plateau are consistent with slot models, but not with slotless models. Here, we assess the validity of their methods. We generated synthetic data both from a leading slot model and from a recent slotless model and quantified model evidence using log Bayes factors. We found that the summary statistics provided at most 0.15% of the expected model evidence in the raw data. In a model recovery analysis, a total of more than a million trials were required to achieve 99% correct recovery when models were compared on the basis of summary statistics, whereas fewer than 1,000 trials were sufficient when raw data were used. Therefore, at realistic numbers of trials, plateau-related summary statistics are highly unreliable for model comparison. Applying the same analyses to subject data from Anderson et al. (Attention, Perception, Psychophys 74, (5), 891-910, 2011), we found that the evidence in the summary statistics was at most 0.12% of the evidence in the raw data and far too weak to warrant any conclusions. The evidence in the raw data, in fact, strongly favored the slotless model. These findings call into question claims about working memory that are based on summary statistics.
Statistical methodology: II. Reliability and validity assessment in study design, Part B.
Karras, D J
1997-02-01
Validity measures the correspondence between a test and other purported measures of the same or similar qualities. When a reference standard exists, a criterion-based validity coefficient can be calculated. If no such standard is available, the concepts of content and construct validity may be used, but quantitative analysis may not be possible. The Pearson and Spearman tests of correlation are often used to assess the correspondence between tests, but do not account for measurement biases and may yield misleading results. Techniques that measure intertest differences may be more meaningful in validity assessment, and the kappa statistic is useful for analyzing categorical variables. Questionnaires often can be designed to allow quantitative assessment of reliability and validity, although this may be difficult. Inclusion of homogeneous questions is necessary to assess reliability. Analysis is enhanced by using Likert scales or similar techniques that yield ordinal data. Validity assessment of questionnaires requires careful definition of the scope of the test and comparison with previously validated tools.
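The kappa statistic mentioned above corrects observed agreement between two categorical ratings for the agreement expected by chance. A minimal sketch, using hypothetical ratings rather than any data from the article:

```python
# Sketch: Cohen's kappa for two raters on categorical data.
# kappa = (observed agreement - chance agreement) / (1 - chance agreement)

def cohen_kappa(r1, r2):
    n = len(r1)
    cats = set(r1) | set(r2)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    p_exp = sum((r1.count(c) / n) * (r2.count(c) / n) for c in cats)
    return (p_obs - p_exp) / (1 - p_exp)

rater1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
rater2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]
print(cohen_kappa(rater1, rater2))  # 0.5
```

Kappa of 0 means agreement no better than chance and 1 means perfect agreement; unlike raw percent agreement (0.75 here), it is not inflated when both raters favor the same category.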
Hubert, Ph; Nguyen-Huu, J-J; Boulanger, B; Chapuzet, E; Chiap, P; Cohen, N; Compagnon, P-A; Dewé, W; Feinberg, M; Lallier, M; Laurentie, M; Mercier, N; Muzard, G; Nivet, C; Valat, L
2004-11-15
This paper is the first part of a summary report of a new commission of the Société Française des Sciences et Techniques Pharmaceutiques (SFSTP). The main objective of this commission was the harmonization of approaches for the validation of quantitative analytical procedures. Indeed, the principle of the validation of these procedures is today widely spread in all the domains of activities where measurements are made. Nevertheless, this simple question of the acceptability or not of an analytical procedure for a given application remains incompletely determined in several cases, despite the various regulations relating to good practices (GLP, GMP, ...) and other documents of normative character (ISO, ICH, FDA, ...). There are many official documents describing the criteria of validation to be tested, but they do not propose any experimental protocol and limit themselves most often to general concepts. For those reasons, two previous SFSTP commissions elaborated validation guides to concretely help the industrial scientists in charge of drug development to apply those regulatory recommendations. While these two first guides widely contributed to the use and progress of analytical validations, they nevertheless present weaknesses regarding the conclusions of the performed statistical tests and the decisions to be made with respect to the acceptance limits defined by the use of an analytical procedure. The present paper proposes to revisit the bases of analytical validation to develop a harmonized approach, notably by distinguishing diagnosis rules from decision rules. The latter rule is based on the use of the accuracy profile, uses the notion of total error, and allows one to simplify the approach to validating an analytical procedure while checking the risk associated with its use. 
Thanks to this novel validation approach, it is possible to unambiguously demonstrate the fitness for purpose of a new method as stated in all regulatory documents.
Analysis of model development strategies: predicting ventral hernia recurrence.
Holihan, Julie L; Li, Linda T; Askenasy, Erik P; Greenberg, Jacob A; Keith, Jerrod N; Martindale, Robert G; Roth, J Scott; Liang, Mike K
2016-11-01
There have been many attempts to identify variables associated with ventral hernia recurrence; however, it is unclear which statistical modeling approach results in models with greatest internal and external validity. We aim to assess the predictive accuracy of models developed using five common variable selection strategies to determine variables associated with hernia recurrence. Two multicenter ventral hernia databases were used. Database 1 was randomly split into "development" and "internal validation" cohorts. Database 2 was designated "external validation". The dependent variable for model development was hernia recurrence. Five variable selection strategies were used: (1) "clinical"-variables considered clinically relevant, (2) "selective stepwise"-all variables with a P value <0.20 were assessed in a step-backward model, (3) "liberal stepwise"-all variables were included and step-backward regression was performed, (4) "restrictive internal resampling," and (5) "liberal internal resampling." Variables were included with P < 0.05 for the Restrictive model and P < 0.10 for the Liberal model. A time-to-event analysis using Cox regression was performed using these strategies. The predictive accuracy of the developed models was tested on the internal and external validation cohorts using Harrell's C-statistic where C > 0.70 was considered "reasonable". The recurrence rate was 32.9% (n = 173/526; median/range follow-up, 20/1-58 mo) for the development cohort, 36.0% (n = 95/264, median/range follow-up 20/1-61 mo) for the internal validation cohort, and 12.7% (n = 155/1224, median/range follow-up 9/1-50 mo) for the external validation cohort. Internal validation demonstrated reasonable predictive accuracy (C-statistics = 0.772, 0.760, 0.767, 0.757, 0.763), while on external validation, predictive accuracy dipped precipitously (C-statistic = 0.561, 0.557, 0.562, 0.553, 0.560). 
Predictive accuracy was equally adequate on internal validation among models; however, on external validation, all five models failed to demonstrate utility. Future studies should report multiple variable selection techniques and demonstrate predictive accuracy on external data sets for model validation. Copyright © 2016 Elsevier Inc. All rights reserved.
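Harrell's C-statistic, used above to judge the hernia-recurrence models, generalizes the ROC AUC to censored time-to-event data: among comparable patient pairs, it is the fraction in which the higher predicted risk corresponds to the earlier observed event. A minimal sketch on hypothetical data (a pair is comparable only when the earlier time is an observed event, since a censored patient may still fail later):

```python
# Sketch: Harrell's concordance index for survival data.
# Ignores tied event times for simplicity.

def harrell_c(times, events, risks):
    conc = ties = comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # i fails first and the failure is observed (not censored)
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    conc += 1
                elif risks[i] == risks[j]:
                    ties += 1
    return (conc + 0.5 * ties) / comparable

times  = [5, 8, 12, 3, 9]
events = [1, 0, 1, 1, 0]   # 1 = event observed, 0 = censored
risks  = [0.9, 0.4, 0.3, 0.8, 0.5]
print(harrell_c(times, events, risks))  # 6/7 ~ 0.857
```

A C of 0.5 is chance-level discrimination; the drop from ~0.76 internally to ~0.56 externally reported above is the sign of a model that does not generalize.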
77 FR 46096 - Statistical Process Controls for Blood Establishments; Public Workshop
Federal Register 2010, 2011, 2012, 2013, 2014
2012-08-02
...] Statistical Process Controls for Blood Establishments; Public Workshop AGENCY: Food and Drug Administration... workshop entitled: ``Statistical Process Controls for Blood Establishments.'' The purpose of this public workshop is to discuss the implementation of statistical process controls to validate and monitor...
Alvarez, Karina; Loehr, Laura; Folsom, Aaron R.; Newman, Anne B.; Weissfeld, Lisa A.; Wunderink, Richard G.; Kritchevsky, Stephen B.; Mukamal, Kenneth J.; London, Stephanie J.; Harris, Tamara B.; Bauer, Doug C.; Angus, Derek C.
2013-01-01
Background: Preventing pneumonia requires better understanding of incidence, mortality, and long-term clinical and biologic risk factors, particularly in younger individuals. Methods: This was a cohort study in three population-based cohorts of community-dwelling individuals. A derivation cohort (n = 16,260) was used to determine incidence and survival and develop a risk prediction model. The prediction model was validated in two cohorts (n = 8,495). The primary outcome was 10-year risk of pneumonia hospitalization. Results: The crude and age-adjusted incidences of pneumonia were 6.71 and 9.43 cases/1,000 person-years (10-year risk was 6.15%). The 30-day and 1-year mortality were 16.5% and 31.5%. Although age was the most important risk factor (range of crude incidence rates, 1.69-39.13 cases/1,000 person-years for each 5-year increment from 45-85 years), 38% of pneumonia cases occurred in adults < 65 years of age. The 30-day and 1-year mortality were 12.5% and 25.7% in those < 65 years of age. Although most comorbidities were associated with higher risk of pneumonia, reduced lung function was the most important risk factor (relative risk = 6.61 for severe reduction based on FEV1 by spirometry). A clinical risk prediction model based on age, smoking, and lung function predicted 10-year risk (area under curve [AUC] = 0.77 and Hosmer-Lemeshow [HL] C statistic = 0.12). Model discrimination and calibration were similar in the internal validation cohort (AUC = 0.77; HL C statistic, 0.65) but lower in the external validation cohort (AUC = 0.62; HL C statistic, 0.45). The model also calibrated well in blacks and younger adults. C-reactive protein and IL-6 were associated with higher pneumonia risk but did not improve model performance. Conclusions: Pneumonia hospitalization is common and associated with high mortality, even in younger healthy adults. Long-term risk of pneumonia can be predicted in community-dwelling adults with a simple clinical risk prediction model. 
PMID:23744106
Direct U-Pb dating of Cretaceous and Paleocene dinosaur bones, San Juan Basin, New Mexico: COMMENT
Koenig, Alan E.; Lucas, Spencer G.; Neymark, Leonid A.; Heckert, Andrew B.; Sullivan, Robert M.; Jasinski, Steven E.; Fowler, Denver W.
2012-01-01
Based on U-Pb dating of two dinosaur bones from the San Juan Basin of New Mexico (United States), Fassett et al. (2011) claim to provide the first successful direct dating of fossil bones and to establish the presence of Paleocene dinosaurs. Fassett et al. ignore previously published work that directly questions their stratigraphic interpretations (Lucas et al., 2009), and fail to provide sufficient descriptions of instrumental, geochronological, and statistical treatments of the data to allow evaluation of the potentially complex diagenetic and recrystallization history of bone. These shortcomings lead us to question the validity of the U-Pb dates published by Fassett et al. and their conclusions regarding the existence of Paleocene dinosaurs.
Comment on 'Semitransparency effects in the moving mirror model for Hawking radiation'
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elizalde, Emilio; Haro, Jaume
2010-06-15
Particle production by a semitransparent mirror accelerating on trajectories which simulate the Hawking effect was recently discussed in [3]. That author points out that some results in [1] are incorrect. We show here that, contrary to statements therein, the main results and conclusions of the last paper remain valid; only Eq. (41) there and some particular implication are not. The misunderstanding actually comes from comparing two very different parameter regions, and from the fact that, in our work, the word statistics was used in an unusual way related to the sign of the {beta}-Bogoliubov coefficient, and not with its ordinary meaning, connected with the number of particles emitted per mode.
Baxter, Suzanne Domel; Smith, Albert F.; Hardin, James W.; Nichols, Michele D.
2008-01-01
Objective Validation-study data are used to illustrate that conclusions about children’s reporting accuracy for energy and macronutrients over multiple interviews (ie, time) depend on the analytic approach for comparing reported and reference information—conventional, which disregards accuracy of reported items and amounts, or reporting-error-sensitive, which classifies reported items as matches (eaten) or intrusions (not eaten), and amounts as corresponding or overreported. Subjects and design Children were observed eating school meals on one day (n = 12), or two (n = 13) or three (n = 79) nonconsecutive days separated by ≥25 days, and interviewed in the morning after each observation day about intake the previous day. Reference (observed) and reported information were transformed to energy and macronutrients (protein, carbohydrate, fat), and compared. Main outcome measures For energy and each macronutrient: report rates (reported/reference), correspondence rates (genuine accuracy measures), inflation ratios (error measures). Statistical analyses Mixed-model analyses. Results Using the conventional approach for analyzing energy and macronutrients, report rates did not vary systematically over interviews (Ps > .61). Using the reporting-error-sensitive approach for analyzing energy and macronutrients, correspondence rates increased over interviews (Ps < .04), indicating that reporting accuracy improved over time; inflation ratios decreased, although not significantly, over interviews, also suggesting that reporting accuracy improved over time. Correspondence rates were lower than report rates, indicating that reporting accuracy was worse than implied by conventional measures. Conclusions When analyzed using the reporting-error-sensitive approach, children’s dietary reporting accuracy for energy and macronutrients improved over time, but the conventional approach masked improvements and overestimated accuracy. 
Applications The reporting-error-sensitive approach is recommended when analyzing data from validation studies of dietary reporting accuracy for energy and macronutrients. PMID:17383265
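The contrast between the conventional report rate and the reporting-error-sensitive measures described above can be sketched with hypothetical per-item energy values; the item names and numbers are illustrative only, not study data:

```python
# Hypothetical per-item energy (kcal) for one child's interview.
# 'reference' is observed intake. Reported items are either matches (eaten)
# or intrusions (not eaten); each reported amount splits into a corresponding
# part (up to the reference amount) and an overreported excess.
reference = {"milk": 120, "pizza": 300, "apple": 70}   # observed at the meal
reported  = {"milk": 150, "pizza": 250, "cookie": 90}  # recalled in interview

ref_total = sum(reference.values())

# Conventional approach: total reported / total reference, item accuracy ignored.
report_rate = sum(reported.values()) / ref_total

# Reporting-error-sensitive approach: credit only corresponding amounts.
corresponding = sum(min(reported[i], reference[i]) for i in reported if i in reference)
overreported = sum(reported.values()) - corresponding  # excess amounts + intrusions

correspondence_rate = corresponding / ref_total   # genuine accuracy measure
inflation_ratio = overreported / ref_total        # error measure
```

In this toy case the report rate is exactly 1.0, which looks like perfect reporting, while the correspondence rate (about 0.76) reveals that a quarter of the reported energy was overreported or intruded, which is precisely the masking effect the abstract describes.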
An Assessment Blueprint for EncStat: A Statistics Anxiety Intervention Program.
ERIC Educational Resources Information Center
Watson, Freda S.; Lang, Thomas R.; Kromrey, Jeffrey D.; Ferron, John M.; Hess, Melinda R.; Hogarty, Kristine Y.
EncStat (Encouraged about Statistics) is a multimedia program being developed to identify and assist students with statistics anxiety or negative attitudes about statistics. This study explored the validity of the assessment instruments included in EncStat with respect to their diagnostic value for statistics anxiety and negative attitudes about…
Villagómez-Ornelas, Paloma; Hernández-López, Pedro; Carrasco-Enríquez, Brenda; Barrios-Sánchez, Karina; Pérez-Escamilla, Rafael; Melgar-Quiñónez, Hugo
2014-01-01
This article validates the statistical consistency of two food security scales: the Mexican Food Security Scale (EMSA) and the Latin American and Caribbean Food Security Scale (ELCSA). Validity tests were conducted in order to verify that both scales were consistent instruments, conformed by independent, properly calibrated and adequately sorted items, arranged in a continuum of severity. The following tests were developed: sorting of items; Cronbach's alpha analysis; parallelism of prevalence curves; Rasch models; sensitivity analysis through mean differences' hypothesis test. The tests showed that both scales meet the required attributes and are robust statistical instruments for food security measurement. This is relevant given that the lack of access to food indicator, included in multidimensional poverty measurement in Mexico, is calculated with EMSA.
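One of the consistency tests named above, Cronbach's alpha, can be sketched as follows. This is a minimal generic illustration, not the EMSA/ELCSA analysis code:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of total scores
    return (k / (k - 1)) * (1 - sum_item_var / total_var)
```

Perfectly covarying items give alpha = 1; weakly related items pull the value down, which is why alpha is read as a measure of internal consistency.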
Sensor data validation and reconstruction. Phase 1: System architecture study
NASA Technical Reports Server (NTRS)
1991-01-01
The sensor validation and data reconstruction task reviewed relevant literature and selected applicable validation and reconstruction techniques for further study; analyzed the selected techniques and emphasized those which could be used for both validation and reconstruction; analyzed Space Shuttle Main Engine (SSME) hot fire test data to determine statistical and physical relationships between various parameters; developed statistical and empirical correlations between parameters to perform validation and reconstruction tasks, using a computer aided engineering (CAE) package; and conceptually designed an expert system based knowledge fusion tool, which allows the user to relate diverse types of information when validating sensor data. The host hardware for the system is intended to be a Sun SPARCstation, but could be any RISC workstation with a UNIX operating system and a windowing/graphics system such as Motif or Dataviews. The information fusion tool is intended to be developed using the NEXPERT Object expert system shell, and the C programming language.
2013-01-01
Background Perceived incongruity between the individual and the job in work-life areas such as workload, control, reward, fairness, community and values has implications for the dimensions of burnout syndrome. The "Areas of Work-life Scale" (AWS) is a practical instrument to measure employees' perceptions of their work environments. Aims To validate a Spanish translation of the AWS and to examine its relationship with the Maslach Burnout Inventory dimensions. Methods The study was conducted in three medium-sized hospitals and seven rural and urban Primary Care centres (N = 871) in Spain. The "Maslach Burnout Inventory General Survey" (MBI-GS) and the AWS were administered, and a complete psychometric analysis of reliability and validity was performed. Results The data supported good internal consistency (Cronbach's α between .71 and .85). Construct validity was confirmed by a six-factor model of the AWS as a good measure of work environments (χ2(352) = 806.21, p < 0.001; χ2/df = 2.29; CFI = 0.935; RMSEA = 0.039). Concurrent validity was analysed through the scale's relationship with other measures (dimensions opposing burnout, and the MBI); every correlation between dimensions and sub-dimensions was statistically significant. For predictive validity, a series of multiple regression analyses on the patterns resulting from the Confirmatory Factor Analysis (CFA) confirmed the relationship between the work-life areas and the burnout dimensions. Conclusions Leiter and Maslach's AWS has been an important instrument in exploring several work-life factors that contribute to burnout. The scale can now be used to assess the quality of work-life in order to design and assess the need for intervention programs in Spanish-speaking countries. PMID:23596987
A content validated questionnaire for assessment of self reported venous blood sampling practices
2012-01-01
Background Venous blood sampling is a common procedure in health care. It is strictly regulated by national and international guidelines. Deviations from guidelines due to human mistakes can cause patient harm. Validated questionnaires for health care personnel can be used to assess preventable "near misses"--i.e. potential errors and nonconformities during venous blood sampling practices that could transform into adverse events. However, no validated questionnaire that assesses nonconformities in venous blood sampling has previously been presented. The aim was to test a recently developed questionnaire on self reported venous blood sampling practices for validity and reliability. Findings We developed a questionnaire to assess deviations from best practices during venous blood sampling. The questionnaire contained questions about patient identification, test request management, test tube labeling, test tube handling, information search procedures and frequencies of error reporting. For content validity, the questionnaire was confirmed by experts on questionnaires and venous blood sampling. For reliability, test-retest statistics were used on the questionnaire answered twice. The final venous blood sampling questionnaire included 19 questions, out of which 9 had in total 34 underlying items. It was found to have content validity. The test-retest analysis demonstrated that the items were generally stable. In total, 82% of the items fulfilled the reliability acceptance criteria. Conclusions The questionnaire could be used for assessment of "near miss" practices that could jeopardize patient safety, and offers several benefits compared with assessing rare adverse events only. The higher frequencies of "near miss" practices allow for quantitative analysis of the effect of corrective interventions and for benchmarking preanalytical quality not only at the laboratory/hospital level but also at the health care unit/hospital ward level. PMID:22260505
A score to estimate the likelihood of detecting advanced colorectal neoplasia at colonoscopy
Kaminski, Michal F; Polkowski, Marcin; Kraszewska, Ewa; Rupinski, Maciej; Butruk, Eugeniusz; Regula, Jaroslaw
2014-01-01
Objective This study aimed to develop and validate a model to estimate the likelihood of detecting advanced colorectal neoplasia in Caucasian patients. Design We performed a cross-sectional analysis of database records for 40-year-old to 66-year-old patients who entered a national primary colonoscopy-based screening programme for colorectal cancer in 73 centres in Poland in the year 2007. We used multivariate logistic regression to investigate the associations between clinical variables and the presence of advanced neoplasia in a randomly selected test set, and confirmed the associations in a validation set. We used model coefficients to develop a risk score for detection of advanced colorectal neoplasia. Results Advanced colorectal neoplasia was detected in 2544 of the 35 918 included participants (7.1%). In the test set, a logistic-regression model showed that independent risk factors for advanced colorectal neoplasia were: age, sex, family history of colorectal cancer, cigarette smoking (p<0.001 for these four factors), and Body Mass Index (p=0.033). In the validation set, the model was well calibrated (ratio of expected to observed risk of advanced neoplasia: 1.00 (95% CI 0.95 to 1.06)) and had moderate discriminatory power (c-statistic 0.62). We developed a score that estimated the likelihood of detecting advanced neoplasia in the validation set, from 1.32% for patients scoring 0, to 19.12% for patients scoring 7–8. Conclusions The developed and internally validated score, consisting of simple clinical factors, successfully estimates the likelihood of detecting advanced colorectal neoplasia in asymptomatic Caucasian patients. Once externally validated, it may be useful for counselling or designing primary prevention studies. PMID:24385598
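The calibration measure quoted above, the ratio of expected to observed risk, is simple to compute from predicted risks and outcomes. A minimal sketch with made-up numbers (not the study's data):

```python
def expected_observed_ratio(y_true, y_prob):
    """Calibration-in-the-large: expected events (sum of predicted risks)
    divided by observed events; a value near 1.00 indicates good overall
    calibration, as reported for the validation set above."""
    return sum(y_prob) / sum(y_true)
```

For example, if a model predicts a 50% risk for each of four patients and two of them actually have advanced neoplasia, the ratio is exactly 1.0; systematic over-prediction pushes it above 1.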
2010-01-01
Background The purpose of this study was to reduce the number of items, create a scoring method and assess the psychometric properties of the Freedom from Glasses Value Scale (FGVS), which measures benefits of freedom from glasses perceived by cataract and presbyopic patients after multifocal intraocular lens (IOL) surgery. Methods The 21-item FGVS, developed simultaneously in French and Spanish, was administered by phone during an observational study to 152 French and 152 Spanish patients who had undergone cataract or presbyopia surgery at least 1 year before the study. Reduction of items and creation of the scoring method employed statistical methods (principal component analysis, multitrait analysis) and content analysis. Psychometric properties (validation of the structure, internal consistency reliability, and known-group validity) of the resulting version were assessed in the pooled population and per country. Results One item was deleted and 3 were kept but not aggregated in a dimension. The other 17 items were grouped into 2 dimensions ('global evaluation', 9 items; 'advantages', 8 items) and divided into 5 sub-dimensions, with higher scores indicating higher benefit of surgery. The structure was validated (good item convergent and discriminant validity). Internal consistency reliability was good for all dimensions and sub-dimensions (Cronbach's alphas above 0.70). The FGVS was able to discriminate between patients wearing glasses or not after surgery (higher scores for patients not wearing glasses). FGVS scores were significantly higher in Spain than France; however, the measure had similar psychometric performances in both countries. Conclusions The FGVS is a valid and reliable instrument measuring benefits of freedom from glasses perceived by cataract and presbyopic patients after multifocal IOL surgery. PMID:20497555
Assessment of protein set coherence using functional annotations
Chagoyen, Monica; Carazo, Jose M; Pascual-Montano, Alberto
2008-01-01
Background Analysis of large-scale experimental datasets frequently produces one or more sets of proteins that are subsequently mined for functional interpretation and validation. To this end, a number of computational methods have been devised that rely on the analysis of functional annotations. Although current methods provide valuable information (e.g. significantly enriched annotations, pairwise functional similarities), they do not specifically measure the degree of homogeneity of a protein set. Results In this work we present a method that scores the degree of functional homogeneity, or coherence, of a set of proteins on the basis of the global similarity of their functional annotations. The method uses statistical hypothesis testing to assess the significance of the set in the context of the functional space of a reference set. As such, it can be used as a first step in the validation of sets expected to be homogeneous prior to further functional interpretation. Conclusion We evaluate our method by analysing known biologically relevant sets as well as random ones. The known relevant sets comprise macromolecular complexes, cellular components and pathways described for Saccharomyces cerevisiae, which are mostly significantly coherent. Finally, we illustrate the usefulness of our approach for validating 'functional modules' obtained from computational analysis of protein-protein interaction networks. Matlab code and supplementary data are available at PMID:18937846
Laparoscopic Common Bile Duct Exploration Four-Task Training Model: Construct Validity
Otaño, Natalia; Rodríguez, Omaira; Sánchez, Renata; Benítez, Gustavo; Schweitzer, Michael
2012-01-01
Background: Training models in laparoscopic surgery allow the surgical team to practice procedures in a safe environment. We have proposed the use of a 4-task, low-cost inert model to practice critical steps of laparoscopic common bile duct exploration. Methods: The performance of 3 groups with different levels of expertise in laparoscopic surgery, novices (A), intermediates (B), and experts (C), was evaluated using a low-cost inert model in the following tasks: (1) intraoperative cholangiography catheter insertion, (2) transcystic exploration, (3) T-tube placement, and (4) choledochoscope management. Kruskal-Wallis and Mann-Whitney tests were used to identify differences among the groups. Results: A total of 14 individuals were evaluated: 5 novices (A), 5 intermediates (B), and 4 experts (C). The results involving intraoperative cholangiography catheter insertion were similar among the 3 groups. As for the other tasks, the expert group had better results than the other two groups, between which no significant differences occurred. The proposed model is able to discriminate among individuals with different levels of expertise, indicating that the abilities the model evaluates are relevant to the surgeon's performance in CBD exploration. Conclusions: Construct validity for tasks 2 and 3 was demonstrated. However, task 1 was not capable of distinguishing between groups, and task 4 was not statistically validated. PMID:22906323
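The Mann-Whitney test used above to compare expertise groups is built on the U statistic, which counts favourable pairings between the two groups. A toy illustration, not the study's analysis code:

```python
def mann_whitney_u(a, b):
    """U statistic for samples a and b: number of pairs (a_i, b_j)
    with a_i > b_j, counting ties as half a pair."""
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)
```

With hypothetical task scores such as experts [3, 4] versus novices [1, 2], every pair favours the experts, so U equals its maximum of 4; U near half the pair count indicates no separation between groups.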
Francis, Gregory
2016-01-01
In response to concerns about the validity of empirical findings in psychology, some scientists use replication studies as a way to validate good science and to identify poor science. Such efforts are resource intensive and are sometimes controversial (with accusations of researcher incompetence) when a replication fails to show a previous result. An alternative approach is to examine the statistical properties of the reported literature to identify some cases of poor science. This review discusses some details of this process for prominent findings about racial bias, where a set of studies seems "too good to be true." This kind of analysis is based on the original studies, so it avoids criticism from the original authors about the validity of replication studies. The analysis is also much easier to perform than a new empirical study. A variation of the analysis can also be used to explore whether it makes sense to run a replication study. As demonstrated here, there are situations where the existing data suggest that a direct replication of a set of studies is not worth the effort. Such a conclusion should motivate scientists to generate alternative experimental designs that better test theoretical ideas. PMID:27713708
Duque, P; Ibanez, J; Del Barco, A; Sepulcre, J; de Ramon, E; Fernandez-Fernandez, O
2012-03-01
INTRODUCTION. Current batteries for evaluating cognitive decline in patients with multiple sclerosis, such as the Brief Repeatable Battery of Neuropsychological Tests (BRB-N), are complex and time-consuming. AIM. To obtain normative values and validate a new battery. SUBJECTS AND METHODS. Four neuropsychological tests were included (episodic memory, the Symbol-Digit Modalities Test, a category fluency test, and the Paced Auditory Serial Addition Test). Normative values (overall and by age group) were derived by administering the battery to healthy subjects (the 5th percentile was taken as the limit of normal). External validity was explored by comparison with the BRB-N. The new battery was also administered to a subsample after 4 weeks to assess reproducibility. RESULTS. To provide normative data, 1036 healthy subjects were recruited. The mean completion time was 18.5 ± 5.2 minutes. For the 229 subjects who were administered both the new battery and the BRB-N, no statistically significant differences were found except for mean completion time (19 ± 4 vs 25 ± 5 minutes). In the reproducibility study, there were no significant differences except in the memory tests. CONCLUSION. Scores on the new battery and the BRB-N were strongly correlated, although the shorter completion time and ease of administration could make the new battery preferable in clinical practice.
Systematic review of methods for quantifying teamwork in the operating theatre
Marshall, D.; Sykes, M.; McCulloch, P.; Shalhoub, J.; Maruthappu, M.
2018-01-01
Background Teamwork in the operating theatre is becoming increasingly recognized as a major factor in clinical outcomes. Many tools have been developed to measure teamwork. Most fall into two categories: self‐assessment by theatre staff and assessment by observers. A critical and comparative analysis of the validity and reliability of these tools is lacking. Methods MEDLINE and Embase databases were searched following PRISMA guidelines. Content validity was assessed using measurements of inter‐rater agreement, predictive validity and multisite reliability, and interobserver reliability using statistical measures of inter‐rater agreement and reliability. Quantitative meta‐analysis was deemed unsuitable. Results Forty‐eight articles were selected for final inclusion; self‐assessment tools were used in 18 and observational tools in 28, and there were two qualitative studies. Self‐assessment of teamwork by profession varied with the profession of the assessor. The most robust self‐assessment tool was the Safety Attitudes Questionnaire (SAQ), although this failed to demonstrate multisite reliability. The most robust observational tool was the Non‐Technical Skills (NOTECHS) system, which demonstrated both test–retest reliability (P > 0·09) and interobserver reliability (Rwg = 0·96). Conclusion Self‐assessment of teamwork by the theatre team was influenced by professional differences. Observational tools, when used by trained observers, circumvented this.
NASA Technical Reports Server (NTRS)
Adler, R. F.; Gu, G.; Curtis, S.; Huffman, G. J.; Bolvin, D. T.; Nelkin, E. J.
2005-01-01
The Global Precipitation Climatology Project (GPCP) 25-year precipitation data set is used to evaluate the variability and extremes on global and regional scales. The year-to-year variability of precipitation is evaluated in relation to the overall lack of a significant global trend and to climate events such as ENSO and volcanic eruptions. The validity of conclusions and limitations of the data set are checked by comparison with independent data sets (e.g., TRMM). The GPCP data set necessarily has a heterogeneous time series of input data sources, so part of the assessment described above is to test the initial results for potential influence by major data boundaries in the record. Regional trends, or inter-decadal changes, are also analyzed to determine validity and correlation with other long-term data sets related to the hydrological cycle (e.g., clouds and ocean surface fluxes). Statistics of extremes (both wet and dry) are analyzed at the monthly time scale for the 25 years. A preliminary result of increasing frequency of extreme monthly values will be a focus to determine validity. Daily values for an eight-year period are also examined for variation in extremes and compared to the longer monthly-based study.
Mizuguchi, Satoshi; Sands, William A; Wassinger, Craig A; Lamont, Hugh S; Stone, Michael H
2015-06-01
Examining a countermovement jump (CMJ) force-time curve related to net impulse might be useful in monitoring athletes' performance. This study aimed to investigate the reliability of an alternative net impulse calculation and of net impulse characteristics (height, width, rate of force development, shape factor, and proportion), and to validate the alternative against the traditional calculation in the CMJ. Twelve participants performed the CMJ in two sessions (48 hours apart) for test-retest reliability. Twenty participants were involved in the validity assessment. Results indicated an intra-class correlation coefficient (ICC) of ≥ 0.89 and a coefficient of variation (CV) of ≤ 5.1% for all of the variables except for rate of force development (ICC = 0.78 and CV = 22.3%). The relationship between the criterion and alternative calculations was r = 1.00. While the difference between them was statistically significant (245.96 ± 63.83 vs. 247.14 ± 64.08 N s, p < 0.0001), the effect size was trivial and deemed practically minimal (d = 0.02). In conclusion, the variability of rate of force development will pose a greater challenge in detecting performance changes. Also, the alternative calculation can be used practically in place of the traditional calculation to identify net impulse characteristics and to monitor and study athletes' performance in greater depth.
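The net impulse discussed above is the integral of vertical ground reaction force minus body weight over the propulsive phase; by the impulse-momentum theorem it determines takeoff velocity and hence jump height. A minimal sketch with hypothetical sampling rate, mass, and forces (not the study's method):

```python
def net_impulse(force, dt, body_mass, g=9.81):
    """Net impulse (N*s): trapezoidal integral of (vertical GRF - body weight)
    over a uniformly sampled force-time curve."""
    net = [f - body_mass * g for f in force]
    return dt * sum((net[i] + net[i + 1]) / 2.0 for i in range(len(net) - 1))

def jump_height(impulse, body_mass, g=9.81):
    """Jump height (m) from takeoff velocity given by the impulse-momentum theorem."""
    v = impulse / body_mass
    return v ** 2 / (2 * g)
```

For instance, a constant 100 N of net force sustained for 0.3 s yields 30 N·s of net impulse, and 150 N·s applied to a 75 kg athlete implies a 2 m/s takeoff velocity and roughly a 0.20 m jump.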
Construct Validation of the Dietary Inflammatory Index among African Americans
Wirth, Michael D; Shivappa, Nitin; Davis, Lisa; Hurley, Thomas G.; Ortaglia, Andrew; Drayton, Ruby; Blair, Steven N.; Hébert, James R.
2017-01-01
Objectives Chronic inflammation is linked to many chronic conditions. One of the strongest modulators of chronic inflammation is diet. The Dietary Inflammatory Index (DII) measures dietary inflammatory potential and has been validated previously, but not among African Americans (AAs). Design Cross-sectional analysis using baseline data from the Healthy Eating and Active Living in the Spirit (HEALS) intervention study. Setting Baseline data collection occurred between 2009 and 2012 in or near Columbia, SC. Participants African-American churchgoers. Measurements Baseline data collection included C-reactive protein (CRP) and interleukin-6 from blood draws, anthropometric measures, and numerous questionnaires. The questionnaires included a food frequency questionnaire, which was used for DII calculation. The main analyses were performed using quantile regression. Results Subjects in the highest DII quartile (i.e., more pro-inflammatory) were younger, more likely to be married, and had less education and greater BMI. Individuals in DII quartile 4 had statistically significantly greater CRP at the 75th and 90th percentiles of CRP versus those in quartile 1 (i.e., more anti-inflammatory). Conclusion Construct validation provides support for using the DII in research among AA populations. Future research should explore avenues to promote more anti-inflammatory diets, with use of the DII, among AA populations to reduce risk of chronic disease. PMID:28448077
Validation of a Spanish version of the Revised Fibromyalgia Impact Questionnaire (FIQR)
2013-01-01
Background The Revised version of the Fibromyalgia Impact Questionnaire (FIQR) was published in 2009. The aim of this study was to prepare a Spanish version, and to assess its psychometric properties in a sample of patients with fibromyalgia. Methods The FIQR was translated into Spanish and administered, along with the FIQ, the Hospital Anxiety Depression Scale (HADS), the 36-Item Short-Form Health Survey (SF-36), and the Brief Pain Inventory (BPI), to 113 Spanish fibromyalgia patients. The administration of the Spanish FIQR was repeated a week later. Results The Spanish FIQR had high internal consistency (Cronbach's α was 0.91 and 0.95 at visits 1 and 2 respectively). The test-retest reliability was good for the FIQR total score and its function and symptoms domains (intraclass correlation coefficient (ICC) > 0.70), but modest for the overall impact domain (ICC = 0.51). Statistically significant correlations (p < 0.05) were also found between the FIQR and the FIQ scores, as well as between the FIQR scores and the remaining scales' scores. Conclusions The Spanish version of the FIQR has good internal consistency, and our findings support its validity for assessing fibromyalgia patients. It may be a valid instrument for use in both clinical and research settings. PMID:23915386
40 CFR 86.1341-90 - Test cycle validation criteria.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 40 Protection of Environment 19 2011-07-01 2011-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-90 Test cycle validation criteria. (a) To minimize the biasing effect of the time lag... brake horsepower-hour. (c) Regression line analysis to calculate validation statistics. (1) Linear...
40 CFR 86.1341-90 - Test cycle validation criteria.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 40 Protection of Environment 20 2013-07-01 2013-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-90 Test cycle validation criteria. (a) To minimize the biasing effect of the time lag... brake horsepower-hour. (c) Regression line analysis to calculate validation statistics. (1) Linear...
Teaching "Instant Experience" with Graphical Model Validation Techniques
ERIC Educational Resources Information Center
Ekstrøm, Claus Thorn
2014-01-01
Graphical model validation techniques for linear normal models are often used to check the assumptions underlying a statistical model. We describe an approach to provide "instant experience" in looking at a graphical model validation plot, so it becomes easier to validate if any of the underlying assumptions are violated.
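The graphical checks described above have a simple numeric counterpart: fit a line by least squares, then inspect whether the residuals behave as the linear normal model assumes. This sketch uses simulated data; plotting `residuals` against `fitted` with any plotting library would give the residual-vs-fitted diagnostic plot the abstract refers to:

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)  # simulated linear data

slope, intercept = np.polyfit(x, y, 1)  # ordinary least squares fit
fitted = slope * x + intercept
residuals = y - fitted

# Numeric analogues of what a residual-vs-fitted plot shows:
mean_resid = residuals.mean()                    # ~0 by construction of OLS
fit_corr = np.corrcoef(fitted, residuals)[0, 1]  # ~0 if the linear form is adequate
```

Curvature in the residual-vs-fitted pattern, or a funnel shape, would signal violated linearity or non-constant variance, which is what students learn to recognize "instantly" from such plots.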
40 CFR 86.1341-90 - Test cycle validation criteria.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 40 Protection of Environment 20 2012-07-01 2012-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-90 Test cycle validation criteria. (a) To minimize the biasing effect of the time lag... brake horsepower-hour. (c) Regression line analysis to calculate validation statistics. (1) Linear...
Dragunsky, Eugenia; Nomura, Tatsuji; Karpinski, Kazimir; Furesz, John; Wood, David J.; Pervikov, Yuri; Abe, Shinobu; Kurata, Takeshi; Vanloocke, Olivier; Karganova, Galina; Taffs, Rolf; Heath, Alan; Ivshina, Anna; Levenbook, Inessa
2003-01-01
OBJECTIVE: Extensive WHO collaborative studies were performed to evaluate the suitability of transgenic mice susceptible to poliovirus (TgPVR mice, strain 21, bred and provided by the Central Institute for Experimental Animals, Japan) as an alternative to monkeys in the neurovirulence test (NVT) of oral poliovirus vaccine (OPV). METHODS: Nine laboratories participated in the collaborative study on testing neurovirulence of 94 preparations of OPV and vaccine derivatives of all three serotypes in TgPVR21 mice. FINDINGS: Statistical analysis of the data demonstrated that the TgPVR21 mouse NVT was of comparable sensitivity and reproducibility to the conventional WHO NVT in simians. A statistical model for acceptance/rejection of OPV lots in the mouse test was developed, validated, and shown to be suitable for all three vaccine types. The assessment of the transgenic mouse NVT is based on clinical evaluation of paralysed mice. Unlike the monkey NVT, histological examination of central nervous system tissue of each mouse offered no advantage over careful and detailed clinical observation. CONCLUSIONS: Based on data from the collaborative studies the WHO Expert Committee for Biological Standardization approved the mouse NVT as an alternative to the monkey test for all three OPV types and defined a standard implementation process for laboratories that wish to use the test. This represents the first successful introduction of transgenic animals into control of biologicals. PMID:12764491
Validation of a new digital breast tomosynthesis medical display
NASA Astrophysics Data System (ADS)
Marchessoux, Cédric; Vivien, Nicolas; Kumcu, Asli; Kimpe, Tom
2011-03-01
The main objective of this study is to evaluate and validate the new Barco medical display MDMG-5221, which has been optimized for the Digital Breast Tomosynthesis (DBT) imaging modality, and to prove the benefit of the new DBT display in terms of image quality and clinical performance. The clinical performance is evaluated by the detection of micro-calcifications inserted in reconstructed Digital Breast Tomosynthesis slices. The slices are shown in dynamic cine loops at two frame rates. The statistical analysis chosen for this study is the Receiver Operating Characteristic Multiple-Reader, Multiple-Case methodology, in order to measure the clinical performance of the two displays. Four experienced radiologists are involved in this study. For this clinical study, 50 normal and 50 abnormal independent datasets were used. The result is that the new display outperforms the mammography display for a signal detection task using real DBT images viewed at 25 and 50 slices per second. In the case of 50 slices per second, the p-value was 0.0664; with a cut-off of alpha = 0.05, the null hypothesis cannot be rejected, but the trend is that the new display performs 6% better than the old display in terms of AUC. At 25 slices per second, the difference between the two displays is more pronounced: the new display outperforms the mammography display by 10% in terms of AUC, with statistical significance (p = 0.0415).
He, Fu-yuan; Deng, Kai-wen; Huang, Sheng; Liu, Wen-long; Shi, Ji-lian
2013-09-01
The paper aims to elucidate and establish a new mathematical model, the total quantum statistical moment standard similarity (TQSMSS), based on the original total quantum statistical moment model, and to illustrate its application to medical theoretical research. The model was established by combining the statistical moment principle with the properties of the normal distribution probability density function. It was then validated and illustrated using the pharmacokinetics of three ingredients in Buyanghuanwu decoction under three data-analytical methods, and using analysis of the chromatographic fingerprints of extracts obtained by dissolving the Buyanghuanwu-decoction extract in solvents of different solubility parameters. The established model consists of five main parameters: (1) the total quantum statistical moment similarity ST, the overlapped area of the two normal distribution probability density curves obtained by converting the two sets of TQSM parameters; (2) the total variability DT, a confidence limit of the standard normal cumulative probability equal to the absolute difference between the two normal cumulative probabilities integrated up to the intersection of their curves; (3) the total variable probability 1-ST, the standard normal distribution probability within the interval DT; (4) the power (1-beta)alpha; and (5) the stable confidence probability beta(1-alpha): the probabilities of correctly drawing positive and negative conclusions under confidence coefficient alpha.
With the model, we analyzed the TQSMSS similarities of the pharmacokinetics of three ingredients in Buyanghuanwu decoction under three data-analytical methods; these were in the range 0.3852-0.9875, illuminating the different pharmacokinetic behaviors. The TQSMSS similarities (ST) of the chromatographic fingerprints of the various solvent extracts of the Buyanghuanwu-decoction extract were in the range 0.6842-0.9992, showing that the different solvent extracts contained different constituents. The TQSMSS can characterize sample similarity; with it, the probability of correctly drawing positive and negative conclusions can be quantified by a test of power, whether or not the samples come from the same population, under confidence coefficient alpha. It thus permits analysis at both the macroscopic and microscopic levels, making the TQSMSS an important similarity-analysis method for medical theoretical research.
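The similarity ST described above is the overlapping area of two normal probability density curves. A minimal numeric sketch of that quantity (not the authors' implementation; the parameter values in the tests are illustrative):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def overlap_area(mu1, s1, mu2, s2, n=20000):
    """Overlapping area of two normal pdfs: numeric integration of their pointwise minimum."""
    lo = min(mu1 - 6 * s1, mu2 - 6 * s2)
    hi = max(mu1 + 6 * s1, mu2 + 6 * s2)
    h = (hi - lo) / n
    return h * sum(
        min(normal_pdf(lo + i * h, mu1, s1), normal_pdf(lo + i * h, mu2, s2))
        for i in range(n)
    )
```

Identical distributions give an overlap of 1, and distributions far apart give an overlap near 0, mirroring the 0-1 scale of the similarities reported above.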
Psychometrics of chronic liver disease questionnaire in Chinese chronic hepatitis B patients
Zhou, Kai-Na; Zhang, Min; Wu, Qian; Ji, Zhen-Hao; Zhang, Xiao-Mei; Zhuang, Gui-Hua
2013-01-01
AIM: To evaluate the psychometrics of the Chinese (mainland) chronic liver disease questionnaire (CLDQ) in patients with chronic hepatitis B (CHB). METHODS: A cross-sectional sample of 460 Chinese patients with CHB was selected from the Outpatient Department of the Eighth Hospital of Xi’an, including CHB (CHB without cirrhosis) (n = 323) and CHB-related cirrhosis (n = 137). The psychometric evaluation included reliability, validity and sensitivity. Internal consistency reliability was measured using Cronbach’s α. Convergent and discriminant validity was evaluated by item-scale correlation. Factorial validity was explored by principal component analysis with varimax rotation. Sensitivity was assessed using Cohen’s effect size (ES) and independent-sample t tests between the CHB and CHB-related cirrhosis groups, and between alanine aminotransferase (ALT) normal and abnormal groups after stratifying by disease (CHB and CHB-related cirrhosis). RESULTS: Internal consistency reliability of the CLDQ was 0.83 (range: 0.65-0.90). Most of the hypothesized item-scale correlations were 0.40 or over, and all such hypothesized correlations were higher than the alternative ones, indicating satisfactory convergent and discriminant validity. Six factors were extracted after varimax rotation from the 29 items of the CLDQ. Eligible Cohen’s ES with statistically significant t tests were found for the overall CLDQ and the abdominal, systemic and activity scales (CHB vs CHB-related cirrhosis), and for the overall CLDQ and abdominal scale in the stratification of patients with CHB (ALT normal vs abnormal). CONCLUSION: The CLDQ has acceptable reliability, validity and sensitivity in Chinese (mainland) patients with CHB. PMID:23801844
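Cronbach's α, the internal-consistency statistic used here and in several of the studies below, has a compact formula: α = k/(k-1) · (1 - Σ item variances / variance of total scores). A minimal sketch with made-up item scores (not the CLDQ data):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per item, all of equal length (one score per respondent)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    sum_item_var = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_var / pvariance(totals))
```

Perfectly correlated items yield α = 1; weaker inter-item agreement pulls α down.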
Stroke Impact Scale 3.0: Reliability and Validity Evaluation of the Korean Version
2017-01-01
Objective To establish the reliability and validity of the Korean version of the Stroke Impact Scale (K-SIS) 3.0. Methods A total of 70 post-stroke patients were enrolled. All subjects were evaluated for general characteristics, Mini-Mental State Examination (MMSE), the National Institutes of Health Stroke Scale (NIHSS), Modified Barthel Index, and Hospital Anxiety and Depression Scale (HADS). The SF-36 and K-SIS 3.0 assessed their health-related quality of life. Statistical analysis of the evaluations determined the reliability and validity of the K-SIS 3.0. Results A total of 70 patients (mean age, 54.97 years) participated in this study. Internal consistency of the K-SIS 3.0 (Cronbach's alpha) was good in all domains, with coefficients above the 0.70 threshold. Test-retest reliability was assessed by correlating (Spearman's rho) the same domain scores obtained on the first and second assessments; results were above 0.5, with the exception of social participation and mobility. Concurrent validity of the K-SIS 3.0 was assessed using the SF-36 and other scales with the same or similar domains. Each domain of the K-SIS 3.0 had a positive correlation with the corresponding domain of the SF-36 and the other scales (HADS, MMSE, and NIHSS). Conclusion The newly developed K-SIS 3.0 showed high internal consistency and test-retest reliability, together with high concurrent validity with the original and various other scales, for patients with stroke. The K-SIS 3.0 can therefore be used to assess stroke patients' health-related quality of life and treatment efficacy. PMID:28758075
Matulis, Simone; Loos, Laura; Langguth, Nadine; Schreiber, Franziska; Gutermann, Jana; Gawrilow, Caterina; Steil, Regina
2015-01-01
The Trauma Symptom Checklist for Children (TSC-C) is the most widely used self-report scale for assessing trauma-related symptoms in children and adolescents on six clinical scales. The purpose of the present study was to develop a German version of the TSC-C and to investigate its psychometric properties (factor structure, reliability, and validity) in a sample of German adolescents. A normative sample of N=583 adolescents and a clinical sample of N=41 adolescents with a history of physical or sexual abuse, aged between 13 and 21 years, participated in the study. A confirmatory factor analysis of the six-factor model (anger, anxiety, depression, dissociation, posttraumatic stress, and sexual concerns with the subdimensions preoccupation and distress) revealed acceptable to good fit statistics in the normative sample. One item had to be excluded from the German version of the TSC-C because its factor loading was too low. All clinical scales showed acceptable to good reliability, with Cronbach's α values ranging from .80 to .86 in the normative sample and from .72 to .87 in the clinical sample. Concurrent validity was demonstrated by high correlations between the TSC-C scales and instruments measuring similar psychopathology. TSC-C scores reliably differentiated between adolescents with and without a trauma history, indicating discriminative validity. In conclusion, the German version of the TSC-C is a reliable and valid instrument for assessing trauma-related symptoms on six scales in adolescents aged between 13 and 21 years.
Wang, Lin; Hui, Stanley Sai-chuen; Wong, Stephen Heung-sang
2014-01-01
Background The current study aimed to examine the validity of various published bioelectrical impedance analysis (BIA) equations for estimating fat-free mass (FFM) among Chinese children and adolescents, and to develop BIA equations appropriate for this population. Material/Methods A total of 255 healthy Chinese children and adolescents aged 9 to 19 years (127 males and 128 females) from Tianjin, China, underwent BIA measurement at 50 kHz between the hand and the foot. The criterion measure of FFM was obtained using dual-energy X-ray absorptiometry (DEXA). FFM estimated from 24 published BIA equations was cross-validated against the criterion measure from DEXA. Multiple linear regression was conducted to develop an alternative BIA equation for the studied population. Results FFM estimated from the 24 published BIA equations correlated highly with the directly measured FFM from DEXA. However, none of the 24 equations was statistically equivalent to the DEXA-measured FFM. Using multiple linear regression and cross-validation against the DEXA measurement, an alternative prediction equation was determined as follows: FFM (kg)=1.613+0.742×height (cm)2/impedance (Ω)+0.151×body weight (kg); R2=0.95; SEE=2.45 kg; CV=6.5; 93.7% of the residuals of all participants fell within the 95% limits of agreement. Conclusions BIA was highly correlated with FFM in Chinese children and adolescents. With the newly developed equation, BIA can provide a practical and valid measurement of body composition in Chinese children and adolescents. PMID:25398209
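The fitted prediction equation reported in this abstract can be applied directly. The example inputs below are illustrative values, not participant data:

```python
def ffm_kg(height_cm, impedance_ohm, weight_kg):
    """Fat-free mass estimate from the abstract's fitted equation:
    FFM (kg) = 1.613 + 0.742 * height(cm)^2 / impedance(ohm) + 0.151 * weight(kg)."""
    return 1.613 + 0.742 * height_cm ** 2 / impedance_ohm + 0.151 * weight_kg
```

For example, a height of 160 cm, an impedance of 500 Ω and a body weight of 50 kg give an estimated FFM of about 47.2 kg.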
Laleh, Leila; Koushki, Davood; Matin, Marzieh; Javidan, Abbas Norouzi; Yekaninejad, Mir Saeed
2015-01-01
Background: Patients with spinal cord injury (SCI) deal with various restrictive factors regarding their clothing, such as disability and difficulty with access to shopping centers. Objectives: We designed a questionnaire to assess attention to clothing and impact of its restrictive factors among Iranian patients with SCI (ACIRF-SCI). Methods: The ACIRF-SCI has 5 domains: functional, medical, attitude, aesthetic, and emotional. The first 3 domains reflect the impact of restrictive factors (factors that restrict attention to clothing), and the last 2 domains reflect attention to clothing and fashion. Functional restrictive factors include disability and dependence. Medical restrictive factors include existence of specific medical conditions that interfere with clothing choice. Construct validity was assessed by factorial analysis, and reliability was expressed by Cronbach’s alpha. Results: A total of 100 patients (75 men and 25 women) entered this study. Patients with a lower injury level had a higher total score (P < .0001), and similarly, patients with paraplegia had higher scores than those with tetraplegia (P < .0001), which illustrates an admissible discriminant validity. Postinjury duration was positively associated with total scores (r = 0.21, P = .04). Construct validity was 0.97, and Cronbach’s alpha was 0.61. Conclusion: Iranian patients with SCI who have greater ability and independence experience a lower impact of restrictive factors related to clothing. The ACIRF-SCI reveals that this assumption is statistically significant, which shows its admissible discriminant validity. The measured construct validity (0.97) and reliability (internal consistency expressed by alpha = 0.61) are acceptable. PMID:26363593
Preanalytical management: serum vacuum tubes validation for routine clinical chemistry
Lima-Oliveira, Gabriel; Lippi, Giuseppe; Salvagno, Gian Luca; Montagnana, Martina; Picheth, Geraldo; Guidi, Gian Cesare
2012-01-01
Introduction The validation process is essential in accredited clinical laboratories. The aim of this study was to validate five kinds of serum vacuum tubes for routine clinical chemistry laboratory testing. Materials and methods: Blood specimens from 100 volunteers were collected in five different serum vacuum tubes (Tube I: VACUETTE®, Tube II: LABOR IMPORT®, Tube III: S-Monovette®, Tube IV: SST® and Tube V: SST II®) by a single, expert phlebotomist. The routine clinical chemistry tests were analyzed on cobas® 6000
Why Does a Method That Fails Continue To Be Used: The Answer
Templeton, Alan R.
2009-01-01
It has been claimed that hundreds of researchers use nested clade phylogeographic analysis (NCPA) based on what the method promises rather than requiring objective validation of the method. The supposed failure of NCPA is based upon the argument that validating it by using positive controls ignored type I error, and that computer simulations have shown a high type I error rate. The first argument is factually incorrect: the previously published validation analysis fully accounted for both type I and type II errors. The simulations that indicate a 75% type I error rate have serious flaws and only evaluate outdated versions of NCPA. These outdated type I error rates fall precipitously when the 2003 version of single-locus NCPA is used or when the 2002 multi-locus version of NCPA is used. It is shown that the treewise type I errors in single-locus NCPA can be corrected to the desired nominal level by a simple statistical procedure, and that multilocus NCPA reconstructs a simulated scenario used to discredit NCPA with 100% accuracy. Hence, NCPA is not a failed method at all, but rather has been validated both by actual data and by simulated data in a manner that satisfies the published criteria given by its critics. The critics have come to different conclusions because they have focused on the pre-2002 versions of NCPA and have failed to take into account the extensive developments in NCPA since 2002. Hence, researchers can choose to use NCPA based upon objective critical validation that shows that NCPA delivers what it promises. PMID:19335340
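The abstract does not spell out the "simple statistical procedure" for holding the treewise (familywise) type I error at its nominal level. As a generic illustration only, and not necessarily the procedure Templeton used, a familywise error rate can be bounded by tightening the per-test threshold, as in the standard Bonferroni and Šidák adjustments:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that bounds the familywise error rate at alpha."""
    return alpha / n_tests

def sidak_threshold(alpha, n_tests):
    """Per-test threshold that is exact under independence: 1 - (1 - alpha)^(1/n)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)
```

The Šidák threshold is always at least as large as the Bonferroni one, so it is slightly less conservative when the tests are independent.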
Echevarria, C; Steer, J; Heslop-Marshall, K; Stenton, S C; Hughes, R; Wijesinghe, M; Harrison, R N; Steen, N; Simpson, A J; Gibson, G J; Bourke, S C
2017-01-01
Background One in three patients hospitalised due to acute exacerbation of COPD (AECOPD) is readmitted within 90 days. No tool has been developed specifically in this population to predict readmission or death. Clinicians are unable to identify patients at particular risk, yet resources to prevent readmission are allocated based on clinical judgement. Methods In participating hospitals, consecutive admissions of patients with AECOPD were identified by screening wards and reviewing coding records. A tool to predict 90-day readmission or death without readmission was developed in two hospitals (the derivation cohort) and validated in: (a) the same hospitals at a later timeframe (internal validation cohort) and (b) four further UK hospitals (external validation cohort). Performance was compared with ADO, BODEX, CODEX, DOSE and LACE scores. Results Of 2417 patients, 936 were readmitted or died within 90 days of discharge. The five independent variables in the final model were: Previous admissions, eMRCD score, Age, Right-sided heart failure and Left-sided heart failure (PEARL). The PEARL score was consistently discriminative and accurate with a c-statistic of 0.73, 0.68 and 0.70 in the derivation, internal validation and external validation cohorts. Higher PEARL scores were associated with a shorter time to readmission. Conclusions The PEARL score is a simple tool that can effectively stratify patients' risk of 90-day readmission or death, which could help guide readmission avoidance strategies within the clinical and research setting. It is superior to other scores that have been used in this population. Trial registration number UKCRN ID 14214. PMID:28235886
The Development of Statistics Textbook Supported with ICT and Portfolio-Based Assessment
NASA Astrophysics Data System (ADS)
Hendikawati, Putriaji; Yuni Arini, Florentina
2016-02-01
This was development research aimed at producing a model Statistics textbook supported with information and communication technology (ICT) and Portfolio-Based Assessment. The book was designed for college mathematics students, to improve their ability in mathematical connection and communication. There were three stages in this research: define, design, and develop. The textbook consists of 10 chapters, each containing an introduction, core material, examples, and exercises. The development phase began with an early design of the book (draft 1), which was then validated by experts. Revision of draft 1 produced draft 2, which underwent a limited readability test. Revision of draft 2 then produced draft 3, which was trialled on a small sample to produce a valid model textbook. The data were analysed with descriptive statistics. The analysis showed that the Statistics textbook model supported with ICT and Portfolio-Based Assessment is valid and fulfils the criteria of practicality.
Fuzzy-logic based strategy for validation of multiplex methods: example with qualitative GMO assays.
Bellocchi, Gianni; Bertholet, Vincent; Hamels, Sandrine; Moens, W; Remacle, José; Van den Eede, Guy
2010-02-01
This paper illustrates the advantages that a fuzzy-based aggregation method can bring to the validation of a multiplex method for GMO detection (DualChip GMO kit, Eppendorf). Guidelines for validation of chemical, biochemical, pharmaceutical and genetic methods have been developed, and ad hoc validation statistics are available and routinely used for in-house and inter-laboratory testing and decision-making. Fuzzy logic allows the information obtained from independent validation statistics to be summarised into one synthetic indicator of overall method performance. The microarray technology, introduced for simultaneous identification of multiple GMOs, poses specific validation issues (patterns of performance for a variety of GMOs at different concentrations). A fuzzy-based indicator for overall evaluation is illustrated in this paper and applied to validation data for different genetically modified elements, and conclusions are drawn about the analytical results. The fuzzy-logic based rules were shown to improve the interpretation of results and to facilitate the overall evaluation of the multiplex method.
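The aggregation idea can be sketched as follows: each validation statistic is mapped to a [0, 1] membership value, and the memberships are combined into one synthetic indicator. The linear membership function and weighted-average combination below are illustrative choices, not the rules used in the paper:

```python
def membership(value, worst, best):
    """Linear fuzzy membership: 0 at 'worst', 1 at 'best' (works in either direction)."""
    t = (value - worst) / (best - worst)
    return max(0.0, min(1.0, t))

def overall_indicator(memberships, weights=None):
    """Weighted average of individual memberships -> one synthetic performance score."""
    if weights is None:
        weights = [1.0] * len(memberships)
    return sum(m * w for m, w in zip(memberships, weights)) / sum(weights)
```

For instance, a sensitivity of 0.95 on a scale where 0.5 is worst and 1.0 is best maps to a membership of 0.9, and several such memberships collapse into one overall score.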
Supply Chain Collaboration: Information Sharing in a Tactical Operating Environment
2013-06-01
architecture, there are four tiers: Client (Web Application Clients), Presentation (Web-Server), Processing (Application-Server), Data (Database...organization in each period. These data will be collected for analysis. i) Analyses and Validation: We will perform statistical tests on these data, Pareto ...notes, outstanding deliveries, and inventory. i) Analyses and Validation: We will perform statistical tests on these data, Pareto analyses and confirmation
Research Education in Undergraduate Occupational Therapy Programs.
ERIC Educational Resources Information Center
Petersen, Paul; And Others
1992-01-01
Of 63 undergraduate occupational therapy programs surveyed, the 38 responses revealed some common areas covered: elementary descriptive statistics, validity, reliability, and measurement. Areas underrepresented include statistical analysis with or without computers, research design, and advanced statistics. (SK)
Hoseinzadeh, Hamidreza; Taghipour, Ali; Yousefi, Mahdi
2018-01-01
Background Development of a questionnaire based on the resources of Persian traditional medicine seems necessary. One of the problems faced by practitioners of traditional medicine is the divergence of opinions regarding the diagnosis of general temperament or the temperament of an organ. One of the reasons is the lack of valid assessment tools, which has led to difficulties in training students of traditional medicine and in treating patients; the differences in detection methods have in turn given rise to several treatment methods. Objective The present study aimed to develop a questionnaire and standard software for the diagnosis of gastrointestinal dystemperaments. Methods The present research is a tool-development study comprising 8 stages: developing the items, determining the statements based on the items, assessing the face validity, assessing the content validity, assessing the reliability, rating the items, developing software (GDS v.1.1) to calculate the total questionnaire score, and evaluating the concurrent validity using statistical tests including Cronbach’s alpha coefficient and Cohen’s kappa coefficient. Results Based on the results, 112 notes including 62 symptoms were extracted from the resources, and 58 items were obtained from in-person interview sessions with a panel of experts. A statement was selected for each item and, after merging a number of statements, a total of 49 statements were obtained. Based on the statement impact scores and the content validity assessment, a further 6 and 10 items, respectively, were removed from the list of statements. Standardized Cronbach’s alpha for this questionnaire was 0.795 and its concurrent validity was 0.8. Conclusion A quantitative tool was developed for the diagnosis and examination of gastrointestinal dystemperaments. The questionnaire is adequately reliable and valid for this purpose, and the software can be used for clinical diagnosis. PMID:29629060
Parvizi, Mohammad Mahdi; Amini, Mitra; Dehghani, Mohammad Reza; Jafari, Peyman; Parvizi, Zahra
2016-01-01
Purpose Evaluation is a central component of the design and implementation of educational activities and of the rapid growth of educational institution programs. Outpatient medical education and the clinical training environment are among the most important parts of the training of medical residents. This study aimed to determine the validity and reliability of the Persian version of the Ambulatory Care Learning Educational Environment Measure (ACLEEM) questionnaire, as an instrument for assessment of educational environments in residency medical clinics. Materials and methods This study was performed on 180 residents at Shiraz University of Medical Sciences, Shiraz, Iran, in 2014–2015. The questionnaire designers’ electronic permission (by email) and the residents’ verbal consent were obtained before distributing the questionnaires. The study data were gathered using the ACLEEM questionnaire developed by Arnoldo Riquelme in 2013. The data were analyzed using the SPSS statistical software, version 14, and MedCalc® software. Then, the construct validity, including convergent and discriminant validities, of the Persian version of the ACLEEM questionnaire was assessed. Its internal consistency was also checked by Cronbach’s alpha coefficient. Results Five team members who were experts in medical education were consulted to test the cultural adaptation, linguistic equivalency, and content validity of the Persian version of the questionnaire. Content validity indexes were >0.9 for all items. In the factor analysis of the instrument, the Kaiser–Meyer–Olkin index was 0.928 and Bartlett’s sphericity test yielded the following results: χ2 = 6,717.551, df = 1,225, and P ≤ 0.001. Besides, Cronbach’s alpha coefficient of the ACLEEM questionnaire was 0.964, and Cronbach’s alpha coefficients were >0.80 in all three domains of the questionnaire. Overall, the Persian version of the ACLEEM showed excellent convergent validity and acceptable discriminant validity, except for the clinical training domain. 
Conclusion According to the results, the Persian version of ACLEEM questionnaire was a valid and reliable instrument for Iranian residents to assess specialized clinics and residency ambulatory settings. PMID:27729824
Validation of the H-SAF precipitation product H03 over Greece using rain gauge data
NASA Astrophysics Data System (ADS)
Feidas, H.; Porcu, F.; Puca, S.; Rinollo, A.; Lagouvardos, C.; Kotroni, V.
2018-01-01
This paper presents an extensive validation of the combined infrared/microwave H-SAF (EUMETSAT Satellite Application Facility on Support to Operational Hydrology and Water Management) precipitation product H03 for a 1-year period, using gauge observations from a relatively dense network of 233 stations over Greece. First, the quality of the interpolated data used to validate the precipitation product is assessed and a quality index is constructed based on parameters such as the density of the station network and the orography. Then, a validation analysis is conducted based on comparisons of satellite (H03) with interpolated rain gauge data to produce continuous and multi-categorical statistics at monthly and annual timescales, taking into account the different geophysical characteristics of the terrain (land, coast, sea, elevation). Finally, the impact of the quality of the interpolated data on the validation statistics is examined in terms of different configurations of the interpolation model and of the rain gauge network characteristics used in the interpolation. The possibility of using a quality index of the interpolated data as a filter in the validation procedure is also investigated. The continuous validation statistics show yearly root mean squared error (RMSE) and mean absolute error (MAE) corresponding to 225% and 105% of the mean rain rate, respectively. The mean error (ME) indicates a slight overall tendency to underestimate the rain gauge rates, which becomes pronounced at high rain rates. In general, the H03 algorithm does not retrieve light (<1 mm/h) or convective-type (>10 mm/h) precipitation very well. The poor correlation between satellite and gauge data points to algorithm problems in co-locating precipitation patterns. Seasonal comparison shows that retrieval errors are lower in the cold months than in the summer months. 
The multi-categorical statistics indicate that the H03 algorithm discriminates rain from no-rain events efficiently, although a large number of rain events are missed. The most prominent features are the very high false alarm ratio (FAR) (more than 70%), the relatively low probability of detection (POD) (less than 40%), and the overestimation of the number of rainy pixels. Although the different geophysical features of the terrain (land, coast, sea, elevation) and the quality of the interpolated data have an effect on the validation statistics, the effect is generally not significant and is more evident in the categorical than in the continuous statistics.
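The categorical scores cited above come from a 2x2 rain/no-rain contingency table of hits, misses, false alarms, and correct negatives. A minimal sketch of the two headline metrics, with illustrative counts (not the study's table):

```python
def pod(hits, misses):
    """Probability of detection: fraction of observed rain events that were also estimated."""
    return hits / (hits + misses)

def far(hits, false_alarms):
    """False alarm ratio: fraction of estimated rain events that were not observed."""
    return false_alarms / (hits + false_alarms)
```

Illustrative counts such as hits=30, misses=70, false_alarms=90 reproduce the pattern reported above: POD = 0.3 (below 40%) and FAR = 0.75 (above 70%).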
Development of criteria for a diagnosis: lessons from the night eating syndrome
Stunkard, Albert J.; Allison, Kelly C.; Geliebter, Allan; Lundgren, Jennifer D.; Gluck, Marci E.; O’Reardon, John P.
2013-01-01
Criteria for inclusion of diagnoses of Axis I disorders in the forthcoming Diagnostic and Statistical Manual (DSM-V) of the American Psychiatric Association are being considered. The 5 criteria that were proposed by Blashfield et al as necessary for inclusion in DSM-IV are reviewed and are met by the night eating syndrome (NES). Seventy-seven publications in refereed journals in the last decade indicate growing recognition of NES. Two core diagnostic criteria have been established: evening hyperphagia (consumption of at least 25% of daily food intake after the evening meal) and/or the presence of nocturnal awakenings with ingestions. These criteria have been validated in studies that used self-reports, structured interviews, and symptom scales. Night eating syndrome can be distinguished from binge eating disorder and sleep-related eating disorder. Four additional features attest to the usefulness of the diagnosis of NES: (1) its prevalence, (2) its association with obesity, (3) its extensive comorbidity, and (4) its biological aspects. In conclusion, research on NES supports the validity of the diagnosis and its inclusion in DSM-V. PMID:19683608
Advanced information processing system: Fault injection study and results
NASA Technical Reports Server (NTRS)
Burkhardt, Laura F.; Masotto, Thomas K.; Lala, Jaynarayan H.
1992-01-01
The objective of the AIPS program is to achieve a validated fault tolerant distributed computer system. The goals of the AIPS fault injection study were: (1) to present the fault injection study components addressing the AIPS validation objective; (2) to obtain feedback for fault removal from the design implementation; (3) to obtain statistical data regarding fault detection, isolation, and reconfiguration responses; and (4) to obtain data regarding the effects of faults on system performance. The parameters are described that must be varied to create a comprehensive set of fault injection tests, the subset of test cases selected, the test case measurements, and the test case execution. Both pin level hardware faults using a hardware fault injector and software injected memory mutations were used to test the system. An overview is provided of the hardware fault injector and the associated software used to carry out the experiments. Detailed specifications are given of fault and test results for the I/O Network and the AIPS Fault Tolerant Processor, respectively. The results are summarized and conclusions are given.
NASA Astrophysics Data System (ADS)
Peikari, Mohammad; Martel, Anne L.
2016-03-01
Purpose: Automatic cell segmentation plays an important role in reliable diagnosis and prognosis. Most state-of-the-art cell detection and segmentation techniques focus on complicated methods for subtracting foreground cells from the background. In this study, we introduce a preprocessing method which leads to better detection and segmentation results than a well-known state-of-the-art work. Method: We transform the original red-green-blue (RGB) space into a new space defined by the top eigenvectors of the RGB space. Stretching is done by manipulating the contrast of each pixel value to equalize the color variances. The new pixel values are then inverse-transformed to the original RGB space, and this altered RGB image is used to segment cells. Result: Validation of our method against a well-known state-of-the-art technique revealed a statistically significant improvement on an identical validation set. We achieved a mean F1-score of 0.901. Conclusion: Preprocessing steps that decorrelate colorspaces may improve cell segmentation performance.
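The contrast manipulation described, equalizing variances after projecting onto the RGB eigenvectors, can be illustrated for a single channel: rescale the values so the channel has a chosen standard deviation while keeping its mean. This is a simplified single-channel sketch, not the authors' full eigenvector pipeline:

```python
from statistics import fmean, pstdev

def equalize_variance(channel, target_std):
    """Rescale a list of pixel values to a given standard deviation, preserving the mean."""
    mu = fmean(channel)
    sd = pstdev(channel)
    if sd == 0:
        return list(channel)  # constant channel: nothing to stretch
    return [(x - mu) / sd * target_std + mu for x in channel]
```

Applying this to each decorrelated component with a common target standard deviation equalizes the color variances before the inverse transform back to RGB.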
Assessment of Processes of Change for Weight Management in a UK Sample
Andrés, Ana; Saldaña, Carmina; Beeken, Rebecca J.
2015-01-01
Objective The present study aimed to validate the English version of the Processes of Change questionnaire in weight management (P-Weight). Methods Participants were 1,087 UK adults, including people enrolled in a behavioural weight management programme, university students and an opportunistic sample. The mean age of the sample was 34.80 (SD = 13.56) years, and 83% were women. BMI ranged from 18.51 to 55.36 (mean = 25.92, SD = 6.26) kg/m2. Participants completed both the stages and processes questionnaires in weight management (S-Weight and P-Weight), and subscales from the EDI-2 and EAT-40. A refined version of the P-Weight consisting of 32 items was obtained based on the item analysis. Results The internal structure of the scale fitted a four-factor model, and statistically significant correlations with external measures supported the convergent validity of the scale. Conclusion The adequate psychometric properties of the P-Weight English version suggest that it could be a useful tool to tailor weight management interventions. PMID:25765163
Mapping the Diagnosis Axis of an Interface Terminology to the NANDA International Taxonomy
Juvé Udina, Maria-Eulàlia; Gonzalez Samartino, Maribel; Matud Calvo, Cristina
2012-01-01
Background. Nursing terminologies are designed to support nursing practice but, as with any other clinical tool, they should be evaluated. Cross-mapping is a formal method for examining the validity of the existing controlled vocabularies. Objectives. The study aims to assess the inclusiveness and expressiveness of the nursing diagnosis axis of a newly implemented interface terminology by cross-mapping with the NANDA-I taxonomy. Design/Methods. The study applied a descriptive design, using a cross-sectional, bidirectional mapping strategy. The sample included 728 concepts from both vocabularies. Concept cross-mapping was carried out to identify one-to-one, negative, and hierarchical connections. The analysis was conducted using descriptive statistics. Results. Agreement of the raters' mapping achieved 97%. More than 60% of the nursing diagnosis concepts in the NANDA-I taxonomy were mapped to concepts in the diagnosis axis of the new interface terminology; 71.1% were reversely mapped. Conclusions. Main results for outcome measures suggest that the diagnosis axis of this interface terminology meets the validity criterion of cross-mapping when mapped from and to the NANDA-I taxonomy. PMID:22830046
Lotfy, Hayam M; Hegazy, Maha A; Rezk, Mamdouh R; Omran, Yasmin Rostom
2015-09-05
Smart spectrophotometric methods have been applied and validated for the simultaneous determination of a binary mixture of chloramphenicol (CPL) and prednisolone acetate (PA) without preliminary separation. Two novel methods have been developed: the first depends upon advanced absorbance subtraction (AAS), while the other relies on advanced amplitude modulation (AAM); these were applied alongside the well-established dual wavelength (DW), ratio difference (RD) and constant center coupled with spectrum subtraction (CC-SS) methods. Accuracy, precision and linearity ranges of these methods were determined. Moreover, selectivity was assessed by analyzing synthetic mixtures of both drugs. The proposed methods were successfully applied to the assay of the drugs in their pharmaceutical formulations. No interference was observed from common additives, and the validity of the methods was tested. The obtained results were statistically compared to those of official spectrophotometric methods, leading to the conclusion that there is no significant difference between the proposed and official methods with respect to accuracy and precision. Copyright © 2015 Elsevier B.V. All rights reserved.
Implementing statistical equating for MRCP(UK) Parts 1 and 2.
McManus, I C; Chis, Liliana; Fox, Ray; Waller, Derek; Tang, Peter
2014-09-26
The MRCP(UK) exam, in 2008 and 2010, changed the standard-setting of its Part 1 and Part 2 examinations from a hybrid Angoff/Hofstee method to statistical equating using Item Response Theory (IRT), with UK graduates as the reference group. The present paper considers the implementation of the change, the question of whether the pass rate increased amongst non-UK candidates, any possible role of Differential Item Functioning (DIF), and changes in examination predictive validity after the change. Data from the MRCP(UK) Part 1 exam (2003 to 2013) and Part 2 exam (2005 to 2013) were analysed. Inspection suggested that Part 1 pass rates were stable after the introduction of statistical equating, but showed greater annual variation, probably due to stronger candidates taking the examination earlier. Pass rates seemed to have increased in non-UK graduates after equating was introduced, but this was not associated with any changes in DIF after statistical equating. Statistical modelling of the pass rates for non-UK graduates found that pass rates, in both Part 1 and Part 2, were increasing year on year, with the changes probably beginning before the introduction of equating. The predictive validity of Part 1 for Part 2 was higher with statistical equating than with the previous hybrid Angoff/Hofstee method, confirming the utility of IRT-based statistical equating. Statistical equating was successfully introduced into the MRCP(UK) Part 1 and Part 2 written examinations, resulting in higher predictive validity than the previous Angoff/Hofstee standard-setting. Concerns about an artefactual increase in pass rates for non-UK candidates after equating were shown not to be well founded. Most likely the changes resulted from a genuine increase in candidate ability, albeit for reasons which remain unclear, coupled with a cognitive illusion giving the impression of a step-change immediately after equating began.
Statistical equating provides a robust standard-setting method with a better theoretical foundation than judgemental techniques such as Angoff; it is also more straightforward, requiring far less examiner time while producing a more valid result. The present study provides a detailed case study of introducing statistical equating and of the issues which may need to be considered with its introduction.
Barber, Julie A; Thompson, Simon G
1998-01-01
Objective To review critically the statistical methods used for health economic evaluations in randomised controlled trials where an estimate of cost is available for each patient in the study. Design Survey of published randomised trials including an economic evaluation with cost values suitable for statistical analysis; 45 such trials published in 1995 were identified from Medline. Main outcome measures The use of statistical methods for cost data was assessed in terms of the descriptive statistics reported, use of statistical inference, and whether the reported conclusions were justified. Results Although all 45 trials reviewed apparently had cost data for each patient, only 9 (20%) reported adequate measures of variability for these data and only 25 (56%) gave results of statistical tests or a measure of precision for the comparison of costs between the randomised groups. Only 16 (36%) of the articles gave conclusions which were justified on the basis of results presented in the paper. No paper reported sample size calculations for costs. Conclusions The analysis and interpretation of cost data from published trials reveal a lack of statistical awareness. Strong and potentially misleading conclusions about the relative costs of alternative therapies have often been reported in the absence of supporting statistical evidence. Improvements in the analysis and reporting of health economic assessments are urgently required. Health economic guidelines need to be revised to incorporate more detailed statistical advice. 
Key messages: Health economic evaluations required for important healthcare policy decisions are often carried out in randomised controlled trials. A review of such published economic evaluations assessed whether statistical methods for cost outcomes have been appropriately used and interpreted. Few publications presented adequate descriptive information for costs or performed appropriate statistical analyses. In at least two thirds of the papers, the main conclusions regarding costs were not justified. The analysis and reporting of health economic assessments within randomised controlled trials urgently need improving. PMID:9794854
Murphy, Thomas; Schwedock, Julie; Nguyen, Kham; Mills, Anna; Jones, David
2015-01-01
New recommendations for the validation of rapid microbiological methods have been included in the revised Technical Report 33 release from the PDA. The changes include a more comprehensive review of the statistical methods to be used to analyze data obtained during validation. This case study applies those statistical methods to accuracy, precision, ruggedness, and equivalence data obtained using a rapid microbiological methods system being evaluated for water bioburden testing. Results presented demonstrate that the statistical methods described in the PDA Technical Report 33 chapter can all be successfully applied to the rapid microbiological method data sets and gave the same interpretation for equivalence to the standard method. The rapid microbiological method was in general able to pass the requirements of PDA Technical Report 33, though the study shows that there can be occasional outlying results and that caution should be used when applying statistical methods to low average colony-forming unit values. Prior to use in a quality-controlled environment, any new method or technology has to be shown to work as designed by the manufacturer for the purpose required. For new rapid microbiological methods that detect and enumerate contaminating microorganisms, additional recommendations have been provided in the revised PDA Technical Report No. 33. The changes include a more comprehensive review of the statistical methods to be used to analyze data obtained during validation. This paper applies those statistical methods to analyze accuracy, precision, ruggedness, and equivalence data obtained using a rapid microbiological method system being validated for water bioburden testing. The case study demonstrates that the statistical methods described in the PDA Technical Report No. 33 chapter can be successfully applied to rapid microbiological method data sets and give the same comparability results for similarity or difference as the standard method. © PDA, Inc. 2015.
Saraf, Sanatan; Mathew, Thomas; Roy, Anindya
2015-01-01
For the statistical validation of surrogate endpoints, an alternative formulation is proposed for testing Prentice's fourth criterion, under a bivariate normal model. In such a setup, the criterion involves inference concerning an appropriate regression parameter, and the criterion holds if the regression parameter is zero. Testing such a null hypothesis has been criticized in the literature since it can only be used to reject a poor surrogate, and not to validate a good surrogate. In order to circumvent this, an equivalence hypothesis is formulated for the regression parameter, namely the hypothesis that the parameter is equivalent to zero. Such an equivalence hypothesis is formulated as an alternative hypothesis, so that the surrogate endpoint is statistically validated when the null hypothesis is rejected. Confidence intervals for the regression parameter and tests for the equivalence hypothesis are proposed using bootstrap methods and small sample asymptotics, and their performances are numerically evaluated and recommendations are made. The choice of the equivalence margin is a regulatory issue that needs to be addressed. The proposed equivalence testing formulation is also adopted for other parameters that have been proposed in the literature on surrogate endpoint validation, namely, the relative effect and proportion explained.
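The equivalence formulation described in this abstract can be illustrated with a simple two one-sided test (TOST-style) sketch: the slope is declared "equivalent to zero" only when both one-sided tests reject. The normal approximation, the margin value, and the function name are illustrative assumptions; the paper itself proposes bootstrap and small-sample asymptotic procedures for the same idea.

```python
# TOST-style equivalence sketch for a regression slope, assuming a normal
# approximation for the estimator (an illustration, not the paper's method).
from statistics import NormalDist

def equivalence_test_slope(beta_hat, se, delta, alpha=0.05):
    """Reject H0: |beta| >= delta (i.e., declare equivalence to zero)
    only if both one-sided z-tests reject at level alpha."""
    z = NormalDist().inv_cdf(1 - alpha)
    lower = (beta_hat + delta) / se    # tests beta > -delta
    upper = (beta_hat - delta) / se    # tests beta <  delta
    return lower > z and upper < -z

# A precisely estimated slope near zero is declared equivalent,
# while the same estimate with a large standard error is not:
print(equivalence_test_slope(0.01, 0.02, 0.10))   # True
print(equivalence_test_slope(0.01, 0.20, 0.10))   # False
```

Note how this matches the abstract's logic: an imprecise estimate can no longer "validate" a surrogate merely by failing to reject a point null.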
Validating the simulation of large-scale parallel applications using statistical characteristics
Zhang, Deli; Wilke, Jeremiah; Hendry, Gilbert; ...
2016-03-01
Simulation is a widely adopted method to analyze and predict the performance of large-scale parallel applications. Validating the hardware model is highly important for complex simulations with a large number of parameters. Common practice involves calculating the percent error between the projected and the real execution time of a benchmark program. However, in a high-dimensional parameter space, this coarse-grained approach often suffers from parameter insensitivity, which may not be known a priori. Moreover, the traditional approach cannot be applied to the validation of software models, such as application skeletons used in online simulations. In this work, we present a methodology and a toolset for validating both hardware and software models by quantitatively comparing fine-grained statistical characteristics obtained from execution traces. Although statistical information has been used in tasks like performance optimization, this is the first attempt to apply it to simulation validation. Lastly, our experimental results show that the proposed evaluation approach offers significant improvement in fidelity when compared to evaluation using total execution time, and the proposed metrics serve as reliable criteria that progress toward automating the simulation tuning process.
van der Heijden, G. J.; van der Windt, D. A.; de Winter, A. F.
1997-01-01
OBJECTIVE: To assess the effectiveness of physiotherapy for patients with soft tissue shoulder disorders. DESIGN: A systematic computerised literature search of Medline and Embase, supplemented with citation tracking, for relevant trials with random allocation published before 1996. SUBJECTS: Patients treated with physiotherapy for disorders of soft tissue of the shoulder. MAIN OUTCOME MEASURES: Success rates, mobility, pain, functional status. RESULTS: Six of the 20 assessed trials satisfied at least five of eight validity criteria. Assessment of methods was often hampered by insufficient information on various validity criteria, and trials were often flawed by lack of blinding, high proportions of withdrawals from treatment, and high proportions of missing values. Trial sizes were small: only six trials included intervention groups of more than 25 patients. Ultrasound therapy, evaluated in six trials, was not shown to be effective. Four other trials favoured physiotherapy (laser therapy or manipulation), but the validity of their methods was unsatisfactory. CONCLUSIONS: There is evidence that ultrasound therapy is ineffective in the treatment of soft tissue shoulder disorders. Due to small trial sizes and unsatisfactory methods, evidence for the effectiveness of other methods of physiotherapy is inconclusive. For all methods of treatment, trials were too heterogeneous with respect to included patients, index and reference treatments, and follow up to merit valid statistical pooling. Future studies should show whether physiotherapy is superior to treatment with drugs, steroid injections, or a wait and see policy. PMID:9233322
Validation and reliability of a Behcet’s Syndrome Activity Scale in Korea
Choi, Hyo Jin; Seo, Mi Ryoung; Ryu, Hee Jung; Baek, Han Joo
2016-01-01
Background/Aims: We prepared a cross-cultural adaptation of the Behcet’s Syndrome Activity Scale (BSAS) and evaluated its reliability and validity in Korea. Methods: Fifty patients with Behcet’s disease (BD) who attended the Rheumatology Clinic of Gachon University Gil Medical Center were included in this study. The first BSAS questionnaire was administered at each clinic visit, and the second questionnaire was completed at home within 24 hours of the visit. A Behcet’s Disease Current Activity Form (BDCAF) and a Behcet’s Disease Quality of Life (BDQOL) form were also given to patients. The test-retest reliability was analyzed by intraclass correlation coefficients (ICC). To assess the validity, the total BSAS score was compared with the BDCAF score, the patient/physician global assessment, and the BDQOL by Spearman rank correlation. Results: Twelve males and 38 females were enrolled. The mean age was 48.5 years and the mean disease duration was 6.7 years. Thirty-eight patients (76.0%) returned the questionnaire by mail. For the test-retest reliability, the two assessments were significantly correlated on all 10 items of the BSAS questionnaire (p < 0.05) and the total BSAS score (ICC, 0.925; p < 0.001). The total BSAS score was statistically correlated with the BDQOL, BDCAF, and patient/physician global assessment (p < 0.01). Conclusions: The Korean version of BSAS is a reliable and valid instrument to measure BD activity. PMID:26767871
Psychometric properties and clinical utility of the Scale for Suicidal Ideation (SSI) in adolescents
Holi, Matti M; Pelkonen, Mirjami; Karlsson, Linnea; Kiviruusu, Olli; Ruuttu, Titta; Heilä, Hannele; Tuisku, Virpi; Marttunen, Mauri
2005-01-01
Background Accurate assessment of suicidality is of major importance in both clinical and research settings. The Scale for Suicidal Ideation (SSI) is a well-established clinician-rating scale but its suitability to adolescents has not been studied. The aim of this study was to evaluate the reliability and validity, and to test an appropriate cutoff threshold for the SSI in a depressed adolescent outpatient population and controls. Methods 218 adolescent psychiatric outpatient clinic patients suffering from depressive disorders and 200 age- and sex-matched school-attending controls were evaluated by the SSI for presence and severity of suicidal ideation. Internal consistency, discriminative-, concurrent-, and construct validity as well as the screening properties of the SSI were evaluated. Results Cronbach's α for the whole SSI was 0.95. The SSI total score differentiated patients and controls, and increased statistically significantly in classes with increasing severity of suicidality derived from the suicidality items of the K-SADS-PL diagnostic interview. Varimax-rotated principal component analysis of the SSI items yielded three theoretically coherent factors suggesting construct validity. Area under the receiver operating characteristic (ROC) curve was 0.84 for the whole sample and 0.80 for the patient sample. The optimal cutoff threshold for the SSI total score was 3/4 yielding sensitivity of 75% and specificity of 88.9% in this population. Conclusions SSI appears to be a reliable and a valid measure of suicidal ideation for depressed adolescents. PMID:15691388
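As a minimal illustration of how an "optimal cutoff" such as the 3/4 threshold reported above can be derived, the sketch below maximizes Youden's J = sensitivity + specificity − 1 over candidate score thresholds. The toy scores are invented, and maximizing Youden's J is one common convention, not necessarily the rule used in the study.

```python
# Toy sketch: pick the score cutoff that maximizes Youden's J.
def best_cutoff(cases, controls):
    best_t, best_j = None, -1.0
    for t in sorted(set(cases + controls)):
        sens = sum(x >= t for x in cases) / len(cases)       # true-positive rate
        spec = sum(x < t for x in controls) / len(controls)  # true-negative rate
        j = sens + spec - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

cases = [4, 6, 9, 3, 12, 7]        # illustrative scores, condition present
controls = [0, 1, 0, 2, 3, 1, 0]   # illustrative scores, condition absent
print(best_cutoff(cases, controls))
```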
Unusual Childhood Waking as a Possible Precursor of the 1995 Kobe Earthquake
Ikeya, Motoji; Whitehead, Neil E.
2013-01-01
Simple Summary The paper investigates whether young children may waken before earthquakes through a cause other than foreshocks. It concludes there is statistical evidence for this, but the mechanism best supported is anxiety produced by Ultra Low Frequency (ULF) electromagnetic waves. Abstract Nearly 1,100 young students living in Japan at a range of distances up to 500 km from the 1995 Kobe M7 earthquake were interviewed. A statistically significant abnormal rate of early wakening before the earthquake was found, having exponential decrease with distance and a half value approaching 100 km, but decreasing much more slowly than from a point source such as an epicentre; instead originating from an extended area of more than 100 km in diameter. Because an improbably high amount of variance is explained, this effect is unlikely to be simply psychological and must reflect another mechanism, perhaps Ultra-Low Frequency (ULF) electromagnetic waves creating anxiety, but probably not 222Rn excess. Other work reviewed suggests these conclusions may be valid for animals in general, not just children, but would be very difficult to apply for practical earthquake prediction. PMID:26487316
2013-01-01
Background The present study aimed to develop an artificial neural network (ANN) based prediction model for cardiovascular autonomic (CA) dysfunction in the general population. Methods We analyzed a previous dataset based on a population sample consisted of 2,092 individuals aged 30–80 years. The prediction models were derived from an exploratory set using ANN analysis. Performances of these prediction models were evaluated in the validation set. Results Univariate analysis indicated that 14 risk factors showed statistically significant association with CA dysfunction (P < 0.05). The mean area under the receiver-operating curve was 0.762 (95% CI 0.732–0.793) for prediction model developed using ANN analysis. The mean sensitivity, specificity, positive and negative predictive values were similar in the prediction models was 0.751, 0.665, 0.330 and 0.924, respectively. All HL statistics were less than 15.0. Conclusion ANN is an effective tool for developing prediction models with high value for predicting CA dysfunction among the general population. PMID:23902963
A note on the misuses of the variance test in meteorological studies
NASA Astrophysics Data System (ADS)
Hazra, Arnab; Bhattacharya, Sourabh; Banik, Pabitra; Bhattacharya, Sabyasachi
2017-12-01
Stochastic modeling of rainfall data is an important area in meteorology. The gamma distribution is a widely used probability model for non-zero rainfall. Typically the choice of the distribution for such meteorological studies is based on two goodness-of-fit tests: Pearson's Chi-square test and the Kolmogorov-Smirnov test. Inspired by the index of dispersion introduced by Fisher (Statistical methods for research workers. Hafner Publishing Company Inc., New York, 1925), Mooley (Mon Weather Rev 101:160-176, 1973) proposed the variance test as a goodness-of-fit measure in this context, and a number of researchers have implemented it since then. We show that the asymptotic distribution of the test statistic for the variance test is generally not comparable to any central Chi-square distribution and hence the test is erroneous. We also describe a method for checking the validity of the asymptotic distribution for a class of distributions. We implement the erroneous test on some simulated as well as real datasets and demonstrate how it leads to some wrong conclusions.
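The paper's core point, that a statistic assumed to follow a chi-square reference may have a quite different null distribution, can be demonstrated by simulation. The sketch below uses the classic index-of-dispersion form (n − 1)s²/x̄, which is approximately chi-square(n − 1) under a Poisson model but not for gamma-distributed data; the gamma shape and scale values are illustrative, not taken from the paper.

```python
# Monte Carlo check of whether a test statistic really follows its assumed
# chi-square reference distribution.
import random
import statistics

def dispersion_stat(sample):
    # index-of-dispersion form: (n - 1) * s^2 / xbar
    return (len(sample) - 1) * statistics.variance(sample) / statistics.mean(sample)

random.seed(0)
n, reps = 50, 2000
stats = [dispersion_stat([random.gammavariate(2.0, 3.0) for _ in range(n)])
         for _ in range(reps)]

# Under a chi-square(n - 1) reference the mean would be about n - 1 = 49;
# for gamma data it is roughly (n - 1) * scale, i.e. far larger, so using
# a chi-square critical value would give the wrong answer.
print(statistics.mean(stats))
```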
Sequential Tests of Multiple Hypotheses Controlling Type I and II Familywise Error Rates
Bartroff, Jay; Song, Jinlin
2014-01-01
This paper addresses the following general scenario: A scientist wishes to perform a battery of experiments, each generating a sequential stream of data, to investigate some phenomenon. The scientist would like to control the overall error rate in order to draw statistically-valid conclusions from each experiment, while being as efficient as possible. The between-stream data may differ in distribution and dimension but also may be highly correlated, even duplicated exactly in some cases. Treating each experiment as a hypothesis test and adopting the familywise error rate (FWER) metric, we give a procedure that sequentially tests each hypothesis while controlling both the type I and II FWERs regardless of the between-stream correlation, and only requires arbitrary sequential test statistics that control the error rates for a given stream in isolation. The proposed procedure, which we call the sequential Holm procedure because of its inspiration from Holm’s (1979) seminal fixed-sample procedure, shows simultaneous savings in expected sample size and less conservative error control relative to fixed sample, sequential Bonferroni, and other recently proposed sequential procedures in a simulation study. PMID:25092948
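Holm's (1979) fixed-sample step-down procedure, the named inspiration for the sequential method above, is simple to sketch; the sequential extension itself requires stream-wise test statistics and is not reproduced here.

```python
# Sketch of Holm's fixed-sample step-down procedure for FWER control.
def holm_reject(p_values, alpha=0.05):
    """Per-hypothesis rejection decisions controlling the familywise
    error rate at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] <= alpha / (m - k):   # compare k-th smallest p to alpha/(m-k)
            reject[i] = True
        else:
            break                            # step down: stop at first failure
    return reject

print(holm_reject([0.001, 0.2, 0.012, 0.04]))   # [True, False, True, False]
```

Note that the middle comparison uses alpha/(m − k), so Holm is uniformly less conservative than Bonferroni while still controlling the type I FWER.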
Atmospheric convective velocities and the Fourier phase spectrum
NASA Technical Reports Server (NTRS)
Cliff, W. C.
1974-01-01
The relationship between convective velocity and the Fourier phase spectrum of the cross correlation is developed. By examining the convective velocity as a function of frequency, one may determine if Taylor's conversion from time statistics to space statistics is valid. It is felt that the high shear regions of the atmospheric boundary layer need to be explored to determine the validity of the use of Taylor's hypothesis for this region.
Ciani, Oriana; Davis, Sarah; Tappenden, Paul; Garside, Ruth; Stein, Ken; Cantrell, Anna; Saad, Everardo D; Buyse, Marc; Taylor, Rod S
2014-07-01
Licensing of, and coverage decisions on, new therapies should rely on evidence from patient-relevant endpoints such as overall survival (OS). Nevertheless, evidence from surrogate endpoints may also be useful, as it may not only expedite the regulatory approval of new therapies but also inform coverage decisions. It is, therefore, essential that candidate surrogate endpoints be properly validated. However, there is no consensus on statistical methods for such validation and on how the evidence thus derived should be applied by policy makers. We review current statistical approaches to surrogate-endpoint validation based on meta-analysis in various advanced-tumor settings. We assessed the suitability of two surrogates (progression-free survival [PFS] and time-to-progression [TTP]) using three current validation frameworks: Elston and Taylor's framework, the German Institute of Quality and Efficiency in Health Care's (IQWiG) framework and the Biomarker-Surrogacy Evaluation Schema (BSES3). A wide variety of statistical methods have been used to assess surrogacy. The strength of the association between the two surrogates and OS was generally low. The level of evidence (observation-level versus treatment-level) available varied considerably by cancer type and evaluation tool, and was not always consistent even within one specific cancer type. The treatment-level association between PFS or TTP and OS has not been investigated in all solid tumors. According to IQWiG's framework, only PFS achieved acceptable evidence of surrogacy in metastatic colorectal and ovarian cancer treated with cytotoxic agents. Our study emphasizes the challenges of surrogate-endpoint validation and the importance of building consensus on the development of evaluation frameworks.
Soo, Danielle H E; Pendharkar, Sayali A; Jivanji, Chirag J; Gillies, Nicola A; Windsor, John A; Petrov, Maxim S
2017-10-01
Approximately 40% of patients develop abnormal glucose metabolism after a single episode of acute pancreatitis. This study aimed to develop and validate a prediabetes self-assessment screening score for patients after acute pancreatitis. Data from non-overlapping training (n=82) and validation (n=80) cohorts were analysed. Univariate logistic and linear regression identified variables associated with prediabetes after acute pancreatitis. Multivariate logistic regression developed the score, ranging from 0 to 215. The area under the receiver-operating characteristic curve (AUROC), Hosmer-Lemeshow χ2 statistic, and calibration plots were used to assess model discrimination and calibration. The developed score was validated using data from the validation cohort. The score had an AUROC of 0.88 (95% CI, 0.80-0.97) and Hosmer-Lemeshow χ2 statistic of 5.75 (p=0.676). Patients with a score of ≥75 had a 94.1% probability of having prediabetes, and were 29 times more likely to have prediabetes than those with a score of <75. The AUROC in the validation cohort was 0.81 (95% CI, 0.70-0.92) and the Hosmer-Lemeshow χ2 statistic was 5.50 (p=0.599). Model calibration of the score showed good calibration in both cohorts. The developed and validated score, called PERSEUS, is the first instrument to identify individuals who are at high risk of developing abnormal glucose metabolism following an episode of acute pancreatitis. Copyright © 2017 Editrice Gastroenterologica Italiana S.r.l. Published by Elsevier Ltd. All rights reserved.
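A minimal sketch of the Hosmer-Lemeshow statistic used in studies like this to judge calibration: subjects are grouped into deciles of predicted risk, and observed event counts are compared with expected counts. The data below are simulated (well calibrated by construction), and g = 10 groups is the conventional choice; neither reflects the study's actual model.

```python
# Hosmer-Lemeshow chi-square on simulated, well-calibrated predictions.
import random

def hosmer_lemeshow(pred, outcome, g=10):
    pairs = sorted(zip(pred, outcome))          # order subjects by predicted risk
    size = len(pairs) / g
    chi2 = 0.0
    for k in range(g):
        grp = pairs[int(k * size):int((k + 1) * size)]
        n = len(grp)
        exp = sum(p for p, _ in grp)            # expected events in group
        obs = sum(y for _, y in grp)            # observed events in group
        if 0 < exp < n:
            chi2 += (obs - exp) ** 2 / (exp * (1 - exp / n))
    return chi2

random.seed(1)
pred = [random.random() for _ in range(2000)]
outcome = [1 if random.random() < p else 0 for p in pred]
# For well-calibrated predictions the statistic is typically close to its
# degrees of freedom, far below common rejection thresholds.
print(round(hosmer_lemeshow(pred, outcome), 2))
```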
The Stroke Riskometer™ App: Validation of a data collection tool and stroke risk predictor
Parmar, Priya; Krishnamurthi, Rita; Ikram, M Arfan; Hofman, Albert; Mirza, Saira S; Varakin, Yury; Kravchenko, Michael; Piradov, Michael; Thrift, Amanda G; Norrving, Bo; Wang, Wenzhi; Mandal, Dipes Kumar; Barker-Collo, Suzanne; Sahathevan, Ramesh; Davis, Stephen; Saposnik, Gustavo; Kivipelto, Miia; Sindi, Shireen; Bornstein, Natan M; Giroud, Maurice; Béjot, Yannick; Brainin, Michael; Poulton, Richie; Narayan, K M Venkat; Correia, Manuel; Freire, António; Kokubo, Yoshihiro; Wiebers, David; Mensah, George; BinDhim, Nasser F; Barber, P Alan; Pandian, Jeyaraj Durai; Hankey, Graeme J; Mehndiratta, Man Mohan; Azhagammal, Shobhana; Ibrahim, Norlinah Mohd; Abbott, Max; Rush, Elaine; Hume, Patria; Hussein, Tasleem; Bhattacharjee, Rohit; Purohit, Mitali; Feigin, Valery L
2015-01-01
Background The greatest potential to reduce the burden of stroke is by primary prevention of first-ever stroke, which constitutes three quarters of all stroke. In addition to population-wide prevention strategies (the ‘mass’ approach), the ‘high risk’ approach aims to identify individuals at risk of stroke and to modify their risk factors, and risk, accordingly. Current methods of assessing and modifying stroke risk are difficult to access and implement by the general population, amongst whom most future strokes will arise. To help reduce the burden of stroke on individuals and the population a new app, the Stroke Riskometer™, has been developed. We aim to explore the validity of the app for predicting the risk of stroke compared with current best methods. Methods 752 stroke outcomes from a sample of 9501 individuals across three countries (New Zealand, Russia and the Netherlands) were utilized to investigate the performance of a novel stroke risk prediction tool algorithm (Stroke Riskometer™) compared with two established stroke risk score prediction algorithms (Framingham Stroke Risk Score [FSRS] and QStroke). We calculated the receiver operating characteristic (ROC) curves and area under the ROC curve (AUROC) with 95% confidence intervals, Harrell's C-statistic and D-statistic as measures of discrimination, R2 statistics to indicate the level of variability accounted for by each prediction algorithm, the Hosmer-Lemeshow statistic for calibration, and the sensitivity and specificity of each algorithm. Results The Stroke Riskometer™ performed well against the FSRS five-year AUROC for both males [FSRS = 75·0% (95% CI 72·3%–77·6%), Stroke Riskometer™ = 74·0% (95% CI 71·3%–76·7%)] and females [FSRS = 70·3% (95% CI 67·9%–72·8%), Stroke Riskometer™ = 71·5% (95% CI 69·0%–73·9%)], and better than QStroke [males – 59·7% (95% CI 57·3%–62·0%) and comparable to females = 71·1% (95% CI 69·0%–73·1%)].
Discriminative ability of all algorithms was low (C-statistic ranging from 0·51–0·56, D-statistic ranging from 0·01–0·12). Hosmer-Lemeshow illustrated that all of the predicted risk scores were not well calibrated with the observed event data (P < 0·006). Conclusions The Stroke Riskometer™ is comparable in performance for stroke prediction with FSRS and QStroke. All three algorithms performed equally poorly in predicting stroke events. The Stroke Riskometer™ will be continually developed and validated to address the need to improve the current stroke risk scoring systems to more accurately predict stroke, particularly by identifying robust ethnic/race ethnicity group and country specific risk factors. PMID:25491651
Development of a Research Methods and Statistics Concept Inventory
ERIC Educational Resources Information Center
Veilleux, Jennifer C.; Chapman, Kate M.
2017-01-01
Research methods and statistics are core courses in the undergraduate psychology major. To assess learning outcomes, it would be useful to have a measure that assesses research methods and statistical literacy beyond course grades. In two studies, we developed and provided initial validation results for a research methods and statistical knowledge…
Validity of Diagnostic Codes for Acute Stroke in Administrative Databases: A Systematic Review
McCormick, Natalie; Bhole, Vidula; Lacaille, Diane; Avina-Zubieta, J. Antonio
2015-01-01
Objective To conduct a systematic review of studies reporting on the validity of International Classification of Diseases (ICD) codes for identifying stroke in administrative data. Methods MEDLINE and EMBASE were searched (inception to February 2015) for studies: (a) Using administrative data to identify stroke; or (b) Evaluating the validity of stroke codes in administrative data; and (c) Reporting validation statistics (sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), or Kappa scores) for stroke, or data sufficient for their calculation. Additional articles were located by hand search (up to February 2015) of original papers. Studies solely evaluating codes for transient ischaemic attack were excluded. Data were extracted by two independent reviewers; article quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies tool. Results Seventy-seven studies published from 1976–2015 were included. The sensitivity of ICD-9 430-438/ICD-10 I60-I69 for any cerebrovascular disease was ≥ 82% in most (≥50% of) studies, and specificity and NPV were both ≥ 95%. The PPV of these codes for any cerebrovascular disease was ≥ 81% in most studies, while the PPV specifically for acute stroke was ≤ 68%. In at least 50% of studies, PPVs were ≥ 93% for subarachnoid haemorrhage (ICD-9 430/ICD-10 I60), 89% for intracerebral haemorrhage (ICD-9 431/ICD-10 I61), and 82% for ischaemic stroke (ICD-9 434/ICD-10 I63 or ICD-9 434&436). For in-hospital deaths, sensitivity was 55%. For cerebrovascular disease or acute stroke as a cause-of-death on death certificates, sensitivity was ≤ 71% in most studies while PPV was ≥ 87%. Conclusions While most cases of prevalent cerebrovascular disease can be detected using 430-438/I60-I69 collectively, acute stroke must be defined using more specific codes. Most in-hospital deaths and death certificates with stroke as a cause-of-death correspond to true stroke deaths.
Linking vital statistics and hospitalization data may improve the ascertainment of fatal stroke. PMID:26292280
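The four validation statistics reported throughout this review are simple functions of a 2x2 table cross-classifying the ICD-code-based case definition against true stroke status. A minimal sketch in Python; the counts in the example are hypothetical, not drawn from the review:

```python
def validation_stats(tp, fp, fn, tn):
    """Validation statistics from a 2x2 table of code-positive/negative
    vs. true stroke status (tp, fp, fn, tn are the four cell counts)."""
    return {
        "sensitivity": tp / (tp + fn),  # true cases the codes detect
        "specificity": tn / (tn + fp),  # non-cases the codes exclude
        "ppv": tp / (tp + fp),          # code-positives that are true cases
        "npv": tn / (tn + fn),          # code-negatives that are true non-cases
    }

# Hypothetical table: 82 true strokes coded positive, 18 missed,
# 10 false positives, 890 true negatives.
stats = validation_stats(tp=82, fp=10, fn=18, tn=890)
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on the prevalence of stroke in the source population, which is one reason the review reports them separately.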
Frequency Response Function Based Damage Identification for Aerospace Structures
NASA Astrophysics Data System (ADS)
Oliver, Joseph Acton
Structural health monitoring technologies continue to be pursued for aerospace structures in the interests of increased safety and, when combined with health prognosis, efficiency in life-cycle management. The current dissertation develops and validates damage identification technology as a critical component for structural health monitoring of aerospace structures and, in particular, composite unmanned aerial vehicles. The primary innovation is a statistical least-squares damage identification algorithm based in concepts of parameter estimation and model update. The algorithm uses frequency response function based residual force vectors derived from distributed vibration measurements to update a structural finite element model through statistically weighted least-squares minimization producing location and quantification of the damage, estimation uncertainty, and an updated model. Advantages compared to other approaches include robust applicability to systems which are heavily damped, large, and noisy, with a relatively low number of distributed measurement points compared to the number of analytical degrees-of-freedom of an associated analytical structural model (e.g., modal finite element model). Motivation, research objectives, and a dissertation summary are discussed in Chapter 1 followed by a literature review in Chapter 2. Chapter 3 gives background theory and the damage identification algorithm derivation followed by a study of fundamental algorithm behavior on a two degree-of-freedom mass-spring system with generalized damping. Chapter 4 investigates the impact of noise then successfully proves the algorithm against competing methods using an analytical eight degree-of-freedom mass-spring system with non-proportional structural damping. 
Chapter 5 extends use of the algorithm to finite element models, including solutions for numerical issues, approaches for modeling damping approximately in reduced coordinates, and analytical validation using a composite sandwich plate model. Chapter 6 presents the final extension to experimental systems-including methods for initial baseline correlation and data reduction-and validates the algorithm on an experimental composite plate with impact damage. The final chapter deviates from development and validation of the primary algorithm to discuss development of an experimental scaled-wing test bed as part of a collaborative effort for developing structural health monitoring and prognosis technology. The dissertation concludes with an overview of technical conclusions and recommendations for future work.
Interrater reliability of the mind map assessment rubric in a cohort of medical students
D'Antoni, Anthony V; Zipp, Genevieve Pinto; Olson, Valerie G
2009-01-01
Background Learning strategies are thinking tools that students can use to actively acquire information. Examples of learning strategies include mnemonics, charts, and maps. One strategy that may help students master the tsunami of information presented in medical school is the mind map learning strategy. Currently, there is no valid and reliable rubric to grade mind maps and this may contribute to their underutilization in medicine. Because concept maps and mind maps engage learners similarly at a metacognitive level, a valid and reliable concept map assessment scoring system was adapted to form the mind map assessment rubric (MMAR). The MMAR can assess mind map depth based upon concept-links, cross-links, hierarchies, examples, pictures, and colors. The purpose of this study was to examine interrater reliability of the MMAR. Methods This exploratory study was conducted at a US medical school as part of a larger investigation on learning strategies. Sixty-six (N = 66) first-year medical students were given a 394-word text passage followed by a 30-minute presentation on mind mapping. After the presentation, subjects were again given the text passage and instructed to create mind maps based upon the passage. The mind maps were collected and independently scored using the MMAR by 3 examiners. Interrater reliability was measured using the intraclass correlation coefficient (ICC) statistic. Statistics were calculated using SPSS version 12.0 (Chicago, IL). Results Analysis of the mind maps revealed the following: concept-links ICC = .05 (95% CI, -.42 to .38), cross-links ICC = .58 (95% CI, .37 to .73), hierarchies ICC = .23 (95% CI, -.15 to .50), examples ICC = .53 (95% CI, .29 to .69), pictures ICC = .86 (95% CI, .79 to .91), colors ICC = .73 (95% CI, .59 to .82), and total score ICC = .86 (95% CI, .79 to .91). Conclusion The high ICC value for total mind map score indicates strong MMAR interrater reliability. 
Pictures and colors demonstrated moderate to strong interrater reliability. We conclude that the MMAR may be a valid and reliable tool to assess mind maps in medicine. However, further research on the validity and reliability of the MMAR is necessary. PMID:19400964
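The ICC values above can be computed from an ANOVA decomposition of the subjects-by-raters score matrix. A sketch of the two-way random-effects, single-rater form ICC(2,1); the abstract does not state which ICC variant the study used, so this choice is an assumption:

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random-effects, single-rater intraclass
    correlation. `ratings` is an (n subjects x k raters) array."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-subject means
    col_means = ratings.mean(axis=0)   # per-rater means
    # ANOVA mean squares for rows (subjects), columns (raters), residual
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = ratings - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```

Perfect agreement among raters yields an ICC of 1.0; rater disagreement that is large relative to the spread between subjects drives the ICC toward zero, which is how a subscore such as concept-links can end up near 0 while the total score remains high.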
Usman, Mohammad N.; Umar, Muhammad D.
2018-01-01
Background: Recent studies have revealed that pharmacists are interested in conducting research; however, lack of confidence is a major barrier. Objective: This study evaluated pharmacists’ self-perceived competence and confidence to plan and conduct health-related research. Method: This cross-sectional study was conducted during the 89th Annual National Conference of the Pharmaceutical Society of Nigeria in November 2016. An adapted questionnaire was validated and administered to 200 pharmacist delegates during the conference. Result: Overall, 127 questionnaires were included in the analysis. At least 80% of the pharmacists had previous health-related research experience. Pharmacists’ competence and confidence scores were lowest for research skills such as using software for statistical analysis, choosing and applying an appropriate inferential statistical test and method, and outlining a detailed statistical plan for data analysis. The highest competence and confidence scores were observed for conception of a research idea, literature search, and critical appraisal of the literature. Pharmacists with previous research experience had higher competence and confidence scores than those without (p<0.05). The only predictor of moderate-to-extreme self-competence and confidence was having published at least one journal article during the last 5 years. Conclusion: Nigerian pharmacists indicated interest in participating in health-related research. However, self-competence and confidence to plan and conduct research were low, particularly for skills related to statistical analysis. Training programs and the building of a Pharmacy Practice Research Network are recommended to enhance pharmacists’ research capacity. PMID:29619141
Burke, F J T; Ravaghi, V; Mackenzie, L; Priest, N; Falcon, H C
2017-04-21
Aim To assess the performance, and thereby the progress, of foundation dentists (FDs) when they carried out a number of simulated clinical exercises at the start and at the end of their FD year. Methods A standardised simulated clinical restorative dentistry training exercise was carried out by a group of 61 recently qualified dental graduates undertaking a 12-month foundation training programme in England, at both the start and end of the programme. Participants completed a Class II cavity preparation and amalgam restoration, a Class IV composite resin restoration and two preparations for a porcelain-metal full crown. The completed preparations and restorations were independently assessed by an experienced consultant in restorative dentistry, using a scoring system based on previously validated criteria. The data were subjected to statistical analysis. Results There was wide variation in individual performance. Overall, there was a small but not statistically significant improvement in performance by the end of the programme. A statistically significant improvement was observed for the amalgam preparation and restoration, and, overall, for one of the five geographical sub-groups in the study. Possible reasons for the variable performance and improvement are discussed. Conclusions There was variability in the performance of the FDs. The operative performance of FDs at the commencement and end of their FD year indicated an overall moderately improved performance over the year and a statistically significant improvement with regard to the amalgam restoration.
Ranking and validation of spallation models for isotopic production cross sections of heavy residua
NASA Astrophysics Data System (ADS)
Sharma, Sushil K.; Kamys, Bogusław; Goldenbaum, Frank; Filges, Detlef
2017-07-01
The production cross sections of isotopically identified residual nuclei of spallation reactions induced by 136Xe projectiles at 500AMeV on hydrogen target were analyzed in a two-step model. The first stage of the reaction was described by the INCL4.6 model of an intranuclear cascade of nucleon-nucleon and pion-nucleon collisions whereas the second stage was analyzed by means of four different models; ABLA07, GEM2, GEMINI++ and SMM. The quality of the data description was judged quantitatively using two statistical deviation factors; the H-factor and the M-factor. It was found that the present analysis leads to a different ranking of models as compared to that obtained from the qualitative inspection of the data reproduction. The disagreement was caused by sensitivity of the deviation factors to large statistical errors present in some of the data. A new deviation factor, the A factor, was proposed, that is not sensitive to the statistical errors of the cross sections. The quantitative ranking of models performed using the A-factor agreed well with the qualitative analysis of the data. It was concluded that using the deviation factors weighted by statistical errors may lead to erroneous conclusions in the case when the data cover a large range of values. The quality of data reproduction by the theoretical models is discussed. Some systematic deviations of the theoretical predictions from the experimental results are observed.
Parker, Scott L; Sivaganesan, Ahilan; Chotai, Silky; McGirt, Matthew J; Asher, Anthony L; Devin, Clinton J
2018-06-15
OBJECTIVE Hospital readmissions lead to a significant increase in the total cost of care in patients undergoing elective spine surgery. Understanding factors associated with an increased risk of postoperative readmission could facilitate a reduction in such occurrences. The aims of this study were to develop and validate a predictive model for 90-day hospital readmission following elective spine surgery. METHODS All patients undergoing elective spine surgery for degenerative disease were enrolled in a prospective longitudinal registry. All 90-day readmissions were prospectively recorded. For predictive modeling, all covariates were selected by choosing those variables that were significantly associated with readmission and by incorporating other relevant variables based on clinical intuition and the Akaike information criterion. Eighty percent of the sample was randomly selected for model development and 20% for model validation. Multiple logistic regression analysis was performed with Bayesian model averaging (BMA) to model the odds of 90-day readmission. Goodness of fit was assessed via the C-statistic, that is, the area under the receiver operating characteristic curve (AUC), using the training data set. Discrimination (predictive performance) was assessed using the C-statistic, as applied to the 20% validation data set. RESULTS A total of 2803 consecutive patients were enrolled in the registry, and their data were analyzed for this study. Of this cohort, 227 (8.1%) patients were readmitted to the hospital (for any cause) within 90 days postoperatively. Variables significantly associated with an increased risk of readmission were as follows (OR [95% CI]): lumbar surgery 1.8 [1.1-2.8], government-issued insurance 2.0 [1.4-3.0], hypertension 2.1 [1.4-3.3], prior myocardial infarction 2.2 [1.2-3.8], diabetes 2.5 [1.7-3.7], and coagulation disorder 3.1 [1.6-5.8]. 
These variables, in addition to others determined a priori to be clinically relevant, comprised 32 inputs in the predictive model constructed using BMA. The AUC value for the training data set was 0.77 for model development and 0.76 for model validation. CONCLUSIONS Identification of high-risk patients is feasible with the novel predictive model presented herein. Appropriate allocation of resources to reduce the postoperative incidence of readmission may reduce the readmission rate and the associated health care costs.
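The C-statistic used above for both goodness of fit and discrimination is the area under the ROC curve, which equals the probability that a randomly chosen readmitted patient is assigned a higher predicted risk than a randomly chosen non-readmitted one (ties counted as one half). A minimal sketch of the metric itself, not of the study's BMA model; the labels and scores in the test are illustrative:

```python
import numpy as np

def c_statistic(y_true, scores):
    """C-statistic (AUC) by pairwise comparison: fraction of
    (positive, negative) pairs where the positive scores higher,
    counting ties as 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # compare every positive score against every negative score
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

A value of 0.5 corresponds to a model no better than chance; the 0.76 reported on the held-out 20% indicates moderate discrimination for 90-day readmission.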
Tyrer, Jonathan; Fasching, Peter A.; Beckmann, Matthias W.; Ekici, Arif B.; Schulz-Wendtland, Rüdiger; Bojesen, Stig E.; Nordestgaard, Børge G.; Flyger, Henrik; Milne, Roger L.; Arias, José Ignacio; Menéndez, Primitiva; Benítez, Javier; Chang-Claude, Jenny; Hein, Rebecca; Wang-Gohrke, Shan; Nevanlinna, Heli; Heikkinen, Tuomas; Aittomäki, Kristiina; Blomqvist, Carl; Margolin, Sara; Mannermaa, Arto; Kosma, Veli-Matti; Kataja, Vesa; Beesley, Jonathan; Chen, Xiaoqing; Chenevix-Trench, Georgia; Couch, Fergus J.; Olson, Janet E.; Fredericksen, Zachary S.; Wang, Xianshu; Giles, Graham G.; Severi, Gianluca; Baglietto, Laura; Southey, Melissa C.; Devilee, Peter; Tollenaar, Rob A. E. M.; Seynaeve, Caroline; García-Closas, Montserrat; Lissowska, Jolanta; Sherman, Mark E.; Bolton, Kelly L.; Hall, Per; Czene, Kamila; Cox, Angela; Brock, Ian W.; Elliott, Graeme C.; Reed, Malcolm W. R.; Greenberg, David; Anton-Culver, Hoda; Ziogas, Argyrios; Humphreys, Manjeet; Easton, Douglas F.; Caporaso, Neil E.; Pharoah, Paul D. P.
2010-01-01
Background Traditional prognostic factors for survival and treatment response of patients with breast cancer do not fully account for observed survival variation. We used available genotype data from a previously conducted two-stage, breast cancer susceptibility genome-wide association study (ie, Studies of Epidemiology and Risk factors in Cancer Heredity [SEARCH]) to investigate associations between variation in germline DNA and overall survival. Methods We evaluated possible associations between overall survival after a breast cancer diagnosis and 10 621 germline single-nucleotide polymorphisms (SNPs) from up to 3761 patients with invasive breast cancer (including 647 deaths and 26 978 person-years at risk) that were genotyped previously in the SEARCH study with high-density oligonucleotide microarrays (ie, hypothesis-generating set). Associations with all-cause mortality were assessed for each SNP by use of Cox regression analysis, generating a per rare allele hazard ratio (HR). To validate putative associations, we used patient genotype information that had been obtained with 5′ nuclease assay or mass spectrometry and overall survival information for up to 14 096 patients with invasive breast cancer (including 2303 deaths and 70 019 person-years at risk) from 15 international case–control studies (ie, validation set). Fixed-effects meta-analysis was used to generate an overall effect estimate in the validation dataset and in combined SEARCH and validation datasets. All statistical tests were two-sided. Results In the hypothesis-generating dataset, SNP rs4778137 (C>G) of the OCA2 gene at 15q13.1 was statistically significantly associated with overall survival among patients with estrogen receptor–negative tumors, with the rare G allele being associated with increased overall survival (HR of death per rare allele carried = 0.56, 95% confidence interval [CI] = 0.41 to 0.75, P = 9.2 × 10−5). 
This association was also observed in the validation dataset (HR of death per rare allele carried = 0.88, 95% CI = 0.78 to 0.99, P = .03) and in the combined dataset (HR of death per rare allele carried = 0.82, 95% CI = 0.73 to 0.92, P = 5 × 10−4). Conclusion The rare G allele of the OCA2 polymorphism, rs4778137, may be associated with improved overall survival among patients with estrogen receptor–negative breast cancer. PMID:20308648
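The fixed-effects meta-analysis used to pool estimates is standard inverse-variance weighting of log hazard ratios, with each study's weight recovered from the width of its confidence interval. A sketch under that assumption (the exact pooling of the 15 validation studies is not reproduced here):

```python
import math

def fixed_effect_meta(hrs_and_cis):
    """Inverse-variance fixed-effects pooling of hazard ratios.
    Each entry is (HR, lower 95% CI, upper 95% CI); the SE of log HR
    is recovered from the CI width (2 x 1.96 SEs on the log scale)."""
    num = den = 0.0
    for hr, lo, hi in hrs_and_cis:
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
        w = 1.0 / se ** 2          # weight = inverse variance
        num += w * math.log(hr)
        den += w
    pooled_log = num / den
    se_pooled = math.sqrt(1.0 / den)
    ci = (math.exp(pooled_log - 1.96 * se_pooled),
          math.exp(pooled_log + 1.96 * se_pooled))
    return math.exp(pooled_log), ci

# Pooling the two rs4778137 estimates quoted above, HR 0.56 (0.41-0.75)
# and HR 0.88 (0.78-0.99), lands close to the reported combined HR of 0.82.
pooled_hr, ci95 = fixed_effect_meta([(0.56, 0.41, 0.75), (0.88, 0.78, 0.99)])
```

The larger validation set dominates the pooled estimate because its narrower confidence interval gives it a much larger inverse-variance weight.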
A SURVEY OF LABORATORY AND STATISTICAL ISSUES RELATED TO FARMWORKER EXPOSURE STUDIES
Developing internally valid, and perhaps generalizable, farmworker exposure studies is a complex process that involves many statistical and laboratory considerations. Statistics are an integral component of each study beginning with the design stage and continuing to the final da...
Statistics Online Computational Resource for Education
ERIC Educational Resources Information Center
Dinov, Ivo D.; Christou, Nicolas
2009-01-01
The Statistics Online Computational Resource (http://www.SOCR.ucla.edu) provides one of the largest collections of free Internet-based resources for probability and statistics education. SOCR develops, validates and disseminates two core types of materials--instructional resources and computational libraries. (Contains 2 figures.)
Kuretzki, Carlos Henrique; Campos, Antônio Carlos Ligocki; Malafaia, Osvaldo; Soares, Sandramara Scandelari Kusano de Paula; Tenório, Sérgio Bernardo; Timi, Jorge Rufino Ribas
2016-03-01
The use of information technology is widespread in healthcare. To support scientific research, SINPE(c) - Integrated Electronic Protocols - was created as a tool offering clinical data standardization to researchers. At the time, however, SINPE(c) lacked automatically computed statistical tests. The objective was to add to SINPE(c) features for automatic execution of the main statistical methods used in medicine. The study was divided into four topics: checking users' interest in the implementation of the tests; surveying the frequency of their use in healthcare research; carrying out the implementation; and validating the results with researchers and their protocols. It was applied to a group of users of this software working on master's and doctoral theses in one postgraduate program in surgery. To assess the reliability of the statistics, the data obtained automatically by SINPE(c) were compared with those calculated manually by a statistician experienced in this type of study. There was interest in the use of automatic statistical tests, with good acceptance. The chi-square, Mann-Whitney, Fisher exact, and Student's t tests were considered the tests most frequently used by participants in medical studies. These methods were implemented and thereafter approved as expected. The automatic statistical analysis incorporated into SINPE(c) was shown to be reliable and equal to the manual analysis, validating its use as a tool for medical research.
Indonesia’s Electricity Demand Dynamic Modelling
NASA Astrophysics Data System (ADS)
Sulistio, J.; Wirabhuana, A.; Wiratama, M. G.
2017-06-01
Electricity system modelling is one of the emerging areas in global energy policy studies. The system dynamics approach and computer simulation have become common methods in energy systems planning and evaluation under many conditions. At the same time, Indonesia is experiencing several major issues in its electricity system, such as fossil fuel domination, demand-supply imbalances, distribution inefficiency, and bio-devastation. This paper aims to explain the development of system dynamics modelling approaches and computer simulation techniques for representing and predicting electricity demand in Indonesia. In addition, it describes the typical characteristics and relationships of the commercial business, industrial, and family/domestic sectors as electricity subsystems in Indonesia. Moreover, it presents direct structure, behavioural, and statistical tests as the model validation approach, and ends with conclusions.
Statistical Modeling of Natural Backgrounds in Hyperspectral LWIR Data
2016-09-06
… extremely important for studying performance trades. First, we study the validity of this model using real hyperspectral data, and compare the relative … difficult to validate any statistical model created for a target of interest. However, since background measurements are plentiful, it is reasonable to … Golden, S., Less, D., Jin, X., and Rynes, P., "Modeling and analysis of LWIR signature variability associated with 3D and BRDF effects," 98400P (May 2016).
NASA Astrophysics Data System (ADS)
Kim, Hyun-Tae; Romanelli, M.; Yuan, X.; Kaye, S.; Sips, A. C. C.; Frassinetti, L.; Buchanan, J.; Contributors, JET
2017-06-01
This paper presents for the first time a statistical validation of predictive TRANSP simulations of plasma temperature using two transport models, GLF23 and TGLF, over a database of 80 baseline H-mode discharges in JET-ILW. While the accuracy of the predicted T e with TRANSP-GLF23 is affected by plasma collisionality, the dependency of predictions on collisionality is less significant when using TRANSP-TGLF, indicating that the latter model has a broader applicability across plasma regimes. TRANSP-TGLF also shows a good matching of predicted T i with experimental measurements allowing for a more accurate prediction of the neutron yields. The impact of input data and assumptions prescribed in the simulations are also investigated in this paper. The statistical validation and the assessment of uncertainty level in predictive TRANSP simulations for JET-ILW-DD will constitute the basis for the extrapolation to JET-ILW-DT experiments.
Perlin, Mark William
2015-01-01
Background: DNA mixtures of two or more people are a common type of forensic crime scene evidence. A match statistic that connects the evidence to a criminal defendant is usually needed for court. Jurors rely on this strength of match to help decide guilt or innocence. However, the reliability of unsophisticated match statistics for DNA mixtures has been questioned. Materials and Methods: The most prevalent match statistic for DNA mixtures is the combined probability of inclusion (CPI), used by crime labs for over 15 years. When testing 13 short tandem repeat (STR) genetic loci, the CPI⁻¹ value is typically around a million, regardless of DNA mixture composition. However, actual identification information, as measured by a likelihood ratio (LR), spans a much broader range. This study examined probability of inclusion (PI) mixture statistics for 517 locus experiments drawn from 16 reported cases and compared them with LR locus information calculated independently on the same data. The log(PI⁻¹) values were examined and compared with corresponding log(LR) values. Results: The LR and CPI methods were compared in case examples of false inclusion, false exclusion, a homicide, and criminal justice outcomes. Statistical analysis of crime laboratory STR data shows that inclusion match statistics exhibit a truncated normal distribution having zero center, with little correlation to actual identification information. By the law of large numbers (LLN), CPI⁻¹ increases with the number of tested genetic loci, regardless of DNA mixture composition or match information. These statistical findings explain why CPI is relatively constant, with implications for DNA policy, criminal justice, cost of crime, and crime prevention. Conclusions: Forensic crime laboratories have generated CPI statistics on hundreds of thousands of DNA mixture evidence items.
However, this commonly used match statistic behaves like a random generator of inclusionary values, following the LLN rather than measuring identification information. A quantitative CPI number adds little meaningful information beyond the analyst's initial qualitative assessment that a person's DNA is included in a mixture. Statistical methods for reporting on DNA mixture evidence should be scientifically validated before they are relied upon by criminal justice. PMID:26605124
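The law-of-large-numbers behaviour described above follows directly from the definition of CPI: it is a product of per-locus inclusion probabilities, so its inverse grows geometrically with the number of loci tested, independent of how much identification information the mixture actually carries. A sketch with hypothetical allele-frequency sums:

```python
def cpi(per_locus_pi):
    """Combined probability of inclusion: the product of per-locus PI
    values, where PI at a locus is the squared sum of the population
    frequencies of the alleles observed in the mixture."""
    product = 1.0
    for pi in per_locus_pi:
        product *= pi
    return product

# Hypothetical locus with included-allele frequencies summing to 0.59:
pi_per_locus = 0.59 ** 2                     # PI ~ 0.35 at each locus
inv_cpi_13 = 1.0 / cpi([pi_per_locus] * 13)  # 13 STR loci: CPI^-1 near a million
inv_cpi_20 = 1.0 / cpi([pi_per_locus] * 20)  # more loci: CPI^-1 keeps growing
```

The per-locus values are fabricated for illustration; the point is only that the product shrinks (so its inverse grows) with every additional locus, whatever the mixture composition.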
Esbenshade, Adam J; Zhao, Zhiguo; Aftandilian, Catherine; Saab, Raya; Wattier, Rachel L; Beauchemin, Melissa; Miller, Tamara P; Wilkes, Jennifer J; Kelly, Michael J; Fernbach, Alison; Jeng, Michael; Schwartz, Cindy L; Dvorak, Christopher C; Shyr, Yu; Moons, Karl G M; Sulis, Maria-Luisa; Friedman, Debra L
2017-10-01
Pediatric oncology patients are at an increased risk of invasive bacterial infection due to immunosuppression. The risk of such infection in the absence of severe neutropenia (absolute neutrophil count ≥ 500/μL) is not well established, and a validated prediction model for bloodstream infection (BSI) risk would offer clinical usefulness. A 6-site retrospective external validation was conducted using a previously published risk prediction model for BSI in febrile pediatric oncology patients without severe neutropenia: the Esbenshade/Vanderbilt (EsVan) model. A reduced model (EsVan2) excluding 2 less clinically reliable variables also was created using the initial EsVan model derivation cohort, and was validated using all 5 external validation cohorts. One data set was used only in sensitivity analyses because some variables were missing. From the 5 primary data sets, there were a total of 1197 febrile episodes and 76 episodes of bacteremia. The overall C statistic for predicting bacteremia was 0.695, with a calibration slope of 0.50 for the original model and a calibration slope of 1.0 when recalibration was applied to the model. The model performed better in predicting high-risk bacteremia (gram-negative or Staphylococcus aureus infection) versus BSI alone, with a C statistic of 0.801 and a calibration slope of 0.65. The EsVan2 model outperformed the EsVan model across data sets with a C statistic of 0.733 for predicting BSI and a C statistic of 0.841 for high-risk BSI. The results of this external validation demonstrated that the EsVan and EsVan2 models are able to predict BSI across multiple performance sites and, once validated and implemented prospectively, could assist in decision making in clinical practice. Cancer 2017;123:3781-3790. © 2017 American Cancer Society.
VALUE - A Framework to Validate Downscaling Approaches for Climate Change Studies
NASA Astrophysics Data System (ADS)
Maraun, Douglas; Widmann, Martin; Gutiérrez, José M.; Kotlarski, Sven; Chandler, Richard E.; Hertig, Elke; Wibig, Joanna; Huth, Radan; Wilcke, Renate A. I.
2015-04-01
VALUE is an open European network to validate and compare downscaling methods for climate change research. VALUE aims to foster collaboration and knowledge exchange between climatologists, impact modellers, statisticians, and stakeholders to establish an interdisciplinary downscaling community. A key deliverable of VALUE is the development of a systematic validation framework to enable the assessment and comparison of both dynamical and statistical downscaling methods. Here, we present the key ingredients of this framework. VALUE's main approach to validation is user-focused: starting from a specific user problem, a validation tree guides the selection of relevant validation indices and performance measures. Several experiments have been designed to isolate specific points in the downscaling procedure where problems may occur: what is the isolated downscaling skill? How do statistical and dynamical methods compare? How do methods perform at different spatial scales? Do methods fail in representing regional climate change? How is the overall representation of regional climate, including errors inherited from global climate models? The framework will be the basis for a comprehensive community-open downscaling intercomparison study, but is intended also to provide general guidance for other validation studies.
VALUE: A framework to validate downscaling approaches for climate change studies
NASA Astrophysics Data System (ADS)
Maraun, Douglas; Widmann, Martin; Gutiérrez, José M.; Kotlarski, Sven; Chandler, Richard E.; Hertig, Elke; Wibig, Joanna; Huth, Radan; Wilcke, Renate A. I.
2015-01-01
VALUE is an open European network to validate and compare downscaling methods for climate change research. VALUE aims to foster collaboration and knowledge exchange between climatologists, impact modellers, statisticians, and stakeholders to establish an interdisciplinary downscaling community. A key deliverable of VALUE is the development of a systematic validation framework to enable the assessment and comparison of both dynamical and statistical downscaling methods. In this paper, we present the key ingredients of this framework. VALUE's main approach to validation is user- focused: starting from a specific user problem, a validation tree guides the selection of relevant validation indices and performance measures. Several experiments have been designed to isolate specific points in the downscaling procedure where problems may occur: what is the isolated downscaling skill? How do statistical and dynamical methods compare? How do methods perform at different spatial scales? Do methods fail in representing regional climate change? How is the overall representation of regional climate, including errors inherited from global climate models? The framework will be the basis for a comprehensive community-open downscaling intercomparison study, but is intended also to provide general guidance for other validation studies.
Long, Blaine C; Jutte, Lisa S; Knight, Kenneth L
2010-01-01
Thermocouples and electrothermometers are used in therapeutic modality research. Until recently, researchers assumed that these instruments were valid and reliable. To examine 3 different thermocouple types in 5 degrees C, 15 degrees C, 18.4 degrees C, 25 degrees C, and 35 degrees C water baths. Randomized controlled trial. Therapeutic modality laboratory. Eighteen thermocouple leads were inserted through the wall of a foamed polystyrene cooler. The cooler was filled with water. Six thermocouples (2 of each model) were plugged into the 6 channels of the Datalogger and 6 randomly selected channels in the 2 Iso-Thermexes. A mercury thermometer was immersed into the water and was read every 10 seconds for 4 minutes during each of 6 trials. The entire process was repeated for each of 5 water bath temperatures (5 degrees C, 15 degrees C, 18.4 degrees C, 25 degrees C, 35 degrees C). Temperature and absolute temperature differences among 3 thermocouple types (IT-21, IT-18, PT-6) and 3 electrothermometers (Datalogger, Iso-Thermex calibrated from -50 degrees C to 50 degrees C, Iso-Thermex calibrated from -20 degrees C to 80 degrees C). Validity and reliability were dependent on thermocouple type, electrothermometer, and water bath temperature (P < .001; modified Levene P < .05). Statistically, the IT-18 and PT-6 thermocouples were not reliable in each electrothermometer; however, the differences were of no practical importance. The PT-6 thermocouples were more valid than the IT-18s, and both thermocouple types were more valid than the IT-21s, regardless of water bath temperature (P < .001). The validity and reliability of thermocouples interfaced to an electrothermometer under experimental conditions should be tested before data collection. We also recommend that investigators report the validity, the reliability, and the calculated uncertainty (validity + reliability) of their temperature measurements for therapeutic modalities research.
With this information, investigators and clinicians will be better able to interpret and compare results and conclusions.
Lam, Lucia L.; Ghadessi, Mercedeh; Erho, Nicholas; Vergara, Ismael A.; Alshalalfa, Mohammed; Buerki, Christine; Haddad, Zaid; Sierocinski, Thomas; Triche, Timothy J.; Skinner, Eila C.; Davicioni, Elai; Daneshmand, Siamak; Black, Peter C.
2014-01-01
Background Nearly half of muscle-invasive bladder cancer patients succumb to their disease following cystectomy. Selecting candidates for adjuvant therapy is currently based on clinical parameters with limited predictive power. This study aimed to develop and validate genomic-based signatures that can better identify patients at risk for recurrence than clinical models alone. Methods Transcriptome-wide expression profiles were generated using 1.4 million feature-arrays on archival tumors from 225 patients who underwent radical cystectomy and had muscle-invasive and/or node-positive bladder cancer. Genomic (GC) and clinical (CC) classifiers for predicting recurrence were developed on a discovery set (n = 133). Performances of GC, CC, an independent clinical nomogram (IBCNC), and genomic-clinicopathologic classifiers (G-CC, G-IBCNC) were assessed in the discovery and independent validation (n = 66) sets. GC was further validated on four external datasets (n = 341). Discrimination and prognostic abilities of classifiers were compared using area under receiver-operating characteristic curves (AUCs). All statistical tests were two-sided. Results A 15-feature GC was developed on the discovery set with area under curve (AUC) of 0.77 in the validation set. This was higher than individual clinical variables, IBCNC (AUC = 0.73), and comparable to CC (AUC = 0.78). Performance was improved upon combining GC with clinical nomograms (G-IBCNC, AUC = 0.82; G-CC, AUC = 0.86). G-CC high-risk patients had elevated recurrence probabilities (P < .001), with GC being the best predictor by multivariable analysis (P = .005). Genomic-clinicopathologic classifiers outperformed clinical nomograms by decision curve and reclassification analyses. GC performed the best in validation compared with seven prior signatures. GC markers remained prognostic across four independent datasets. 
Conclusions The validated genomic-based classifiers outperform clinical models for predicting postcystectomy bladder cancer recurrence. This may be used to better identify patients who need more aggressive management. PMID:25344601
Kramers, Cornelis; Derijks, Hieronymus J.; Wensing, Michel; Wetzels, Jack F. M.
2015-01-01
Background The Modification of Diet in Renal Disease (MDRD) formula is widely used in clinical practice to assess the correct drug dose. The formula is based on serum creatinine levels, which may be influenced by a chronic disease itself or by its effects. We conducted a systematic review to determine the validity of the MDRD formula in specific patient populations with renal impairment: elderly, hospitalized and obese patients, and patients with cardiovascular disease, cancer, chronic respiratory diseases, diabetes mellitus, liver cirrhosis or human immunodeficiency virus. Methods and Findings We searched for articles in PubMed published from January 1999 through January 2014. Selection criteria were (1) patients with a glomerular filtration rate (GFR) < 60 ml/min (/1.73m2), (2) comparison of the MDRD formula with a gold standard and (3) statistical analysis focused on bias, precision and/or accuracy. Data extraction was done by the first author and checked by a second author. A bias of 20% or less, a precision of 30% or less and an accuracy expressed as P30% of 80% or higher were considered indicators of the validity of the MDRD formula. In total we included 27 studies. The number of patients included ranged from 8 to 1831. The gold standard and measurement method used varied across the studies. For none of the specific patient populations did the studies provide sufficient evidence of validity of the MDRD formula on all three parameters. For patients with diabetes mellitus or liver cirrhosis, hospitalized patients, and elderly patients with moderate to severe renal impairment, we concluded that the MDRD formula is not valid. Limitations of the review are that neither the method of measuring serum creatinine levels nor the type of gold standard used was taken into account. Conclusion In several specific patient populations with renal impairment, the use of the MDRD formula is not valid or has uncertain validity. PMID:25741695
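As a point of reference for the formula under review, the 4-variable MDRD study equation can be sketched as follows (a minimal illustration using the IDMS-traceable 175 coefficient; the exact coefficients should be verified against the original publication, and nothing here is clinical guidance):

```python
def mdrd_egfr(scr_mg_dl, age_years, female, black):
    """Estimated GFR (mL/min/1.73 m^2) via the 4-variable MDRD study
    equation (IDMS-traceable form). Inputs: serum creatinine in mg/dL,
    age in years, and the two demographic flags the original equation uses."""
    egfr = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr
```

Because creatinine is the only laboratory input, any condition that alters creatinine production (e.g. reduced muscle mass) biases the estimate, which is exactly the concern the review investigates.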
Torabinia, Mansour; Mahmoudi, Sara; Dolatshahi, Mojtaba; Abyaz, Mohamad Reza
2017-01-01
Background: In keeping with the overall trend in psychology, researchers in work and organizational psychology have become progressively interested in employees' affective and positive experiences at work, such as work engagement. This study was conducted with 2 main purposes: assessing the psychometric properties of the Utrecht Work Engagement Scale (UWES), and examining the association between work engagement and burnout in nurses. Methods: The present methodological study was conducted in 2015 and included 248 females and 34 males with 6 months to 30 years of job experience. After the translation process, face and content validity were evaluated by qualitative and quantitative methods. Moreover, the content validity ratio, scale-level content validity index and item-level content validity index were measured for this scale. Construct validity was determined by factor analysis, and internal consistency and stability reliability were assessed. Factor analysis, test-retest, Cronbach's alpha, and association analysis were used as statistical methods. Results: Face and content validity were acceptable. Exploratory factor analysis suggested a new 3-factor model. In this new model, some items from the construct model of the original version were relocated, while the same 17 items were retained. The new model, as the Persian version of the UWES, was supported by divergent validity against the Copenhagen Burnout Inventory. Internal consistency reliability for the total scale and the subscales was 0.76 to 0.89. Results from the Pearson correlation test indicated a high degree of test-retest reliability (r = 0.89); the ICC was also 0.91. Engagement was negatively related to burnout and overtime per month, whereas it was positively related to age and job experience. Conclusion: The Persian 3-factor model of the Utrecht Work Engagement Scale is a valid and reliable instrument to measure work engagement in Iranian nurses as well as in other medical professionals. PMID:28955665
Reliability and validity of the Safe Routes to school parent and student surveys
2011-01-01
Background The purpose of this study is to assess the reliability and validity of the U.S. National Center for Safe Routes to School's in-class student travel tallies and written parent surveys. Over 65,000 tallies and 374,000 parent surveys have been completed, but no published studies have examined their measurement properties. Methods Students and parents from two Charlotte, NC (USA) elementary schools participated. Tallies were conducted on two consecutive days using a hand-raising protocol; on day two, students were also asked to recall the previous day's travel. The recall from day two was compared with day one to assess 24-hour test-retest reliability. Convergent validity was assessed by comparing parent reports of students' travel mode with student reports of travel mode. Two-week test-retest reliability of the parent survey was assessed by comparing within-parent responses. Reliability and validity were assessed using kappa statistics. Results A total of 542 students participated in the in-class student travel tally reliability assessment and 262 parent-student dyads participated in the validity assessment. Reliability was high for travel to and from school (kappa > 0.8); convergent validity was lower but still high (kappa > 0.75). There were no differences by student grade level. Two-week test-retest reliability of the parent survey (n = 112) ranged from moderate to very high for objective questions on travel mode and travel times (kappa range: 0.62 - 0.97) but was substantially lower for subjective assessments of barriers to walking to school (kappa range: 0.31 - 0.76). Conclusions The in-class student travel tally exhibited high reliability and validity at all elementary grades. The parent survey had high reliability on questions related to student travel mode, but lower reliability for attitudinal questions identifying barriers to walking to school. 
Parent survey design should be improved so that responses clearly indicate the issues that influence parental decision making with regard to their children's mode of travel to school. PMID:21651794
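The kappa statistic used throughout this study to quantify agreement can be sketched as follows (an illustrative implementation of Cohen's kappa for two categorical ratings, not the authors' code):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e), where p_o is
    the observed agreement and p_e the agreement expected by chance from
    the raters' marginal category frequencies."""
    n = len(rater1)
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2
    return (p_o - p_e) / (1 - p_e)
```

For travel-mode data each rating would be a mode such as 'walk' or 'car'; a kappa above 0.8, as reported here, indicates near-perfect agreement on common benchmarks.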
Development and validation of the French-Canadian Chronic Pain Self-efficacy Scale
Lacasse, Anaïs; Bourgault, Patricia; Tousignant-Laflamme, Yannick; Courtemanche-Harel, Roxanne; Choinière, Manon
2015-01-01
BACKGROUND: Perceived self-efficacy is a non-negligible outcome when measuring the impact of self-management interventions for chronic pain patients. However, no validated, chronic pain-specific self-efficacy scales exist for studies conducted with French-speaking populations. OBJECTIVES: To establish the validity of the use of the French-Canadian Chronic Pain Self-efficacy Scale (FC-CPSES) among chronic pain patients. METHODS: The Chronic Disease Self-Efficacy Scale is a validated 33-item self-administered questionnaire that measures perceived self-efficacy to perform self-management behaviours, manage chronic disease in general and achieve outcomes (a six-item version is also available). This scale was adapted to the context of chronic pain patients following cross-cultural adaptation guidelines. The FC-CPSES was administered to 109 fibromyalgia and 34 chronic low back pain patients (n=143) who participated in an evidence-based self-management intervention (the PASSAGE program) offered in 10 health care centres across the province of Quebec. Cronbach’s alpha coefficients (α) were calculated to determine the internal consistency of the 33- and six-item versions of the FC-CPSES. With regard to convergent construct validity, the association between the FC-CPSES baseline scores and related clinical outcomes was examined. With regard to the scale’s sensitivity to change, pre- and postintervention FC-CPSES scores were compared. RESULTS: Internal consistency was high for both versions of the FC-CPSES (α=0.86 to α=0.96). Higher self-efficacy was significantly associated with higher mental health-related quality of life and lower pain intensity and catastrophizing (P<0.05), supporting convergent validity of the scale. There was a statistically significant increase in FC-CPSES scores between pre- and postintervention measures for both versions of the FC-CPSES (P<0.003), which supports their sensitivity to clinical change during an intervention. 
CONCLUSIONS: These data suggest that both versions of the FC-CPSES are reliable and valid for the measurement of pain management self-efficacy among chronic pain patients. PMID:25848845
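The internal-consistency coefficient reported above can be sketched as follows (a plain implementation of Cronbach's alpha over item-score columns, for illustration only):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a scale. `items` is a list of columns, one
    list of respondent scores per item:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance; any consistent variant works
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(col) for col in items) / var(totals))
```

Values of 0.86 to 0.96, as reported for the FC-CPSES, indicate that the items co-vary strongly enough to be summed into a single self-efficacy score.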
Ravegnini, Gloria; Sammarini, Giulia; Angelini, Sabrina; Hrelia, Patrizia
2016-07-01
Gastrointestinal stromal tumors (GIST) and chronic myeloid leukemia (CML) are two deeply different tumor types. Despite the differences, these disorders share treatment with the tyrosine kinase inhibitor imatinib. Despite imatinib's success, response rates vary among individuals, and pharmacogenetics may play an important role in the final clinical outcome. In this review, the authors provide an overview of the pharmacogenetic literature, analyzing the role of polymorphisms in both GIST and CML treatment efficacy and toxicity. So far, several polymorphisms influencing the pharmacokinetic determinants of imatinib have been identified. However, the data are not yet conclusive enough to translate pharmacogenetic tests into clinical practice. In this context, the major obstacles to pharmacogenetic test validation are the small sample sizes of most studies, ethnicity and population admixture as confounding sources, and uncertainty about the genetic variants analyzed. In conclusion, a combination of different theoretical approaches, experimental model systems and statistical methods is clearly needed in order to bring pharmacogenetics into clinical practice in the near future.
Vinciotti, Veronica; Liu, Xiaohui; Turk, Rolf; de Meijer, Emile J; 't Hoen, Peter AC
2006-01-01
Background The identification of biologically interesting genes in a temporal expression profiling dataset is challenging and complicated by high levels of experimental noise. Most statistical methods used in the literature do not fully exploit the temporal ordering in the dataset and are not suited to the case where temporal profiles are measured for a number of different biological conditions. We present a statistical test that makes explicit use of the temporal order in the data by fitting polynomial functions to the temporal profile of each gene and for each biological condition. A Hotelling T2-statistic is derived to detect the genes for which the parameters of these polynomials are significantly different from each other. Results We validate the temporal Hotelling T2-test on muscular gene expression data from four mouse strains which were profiled at different ages: dystrophin-, beta-sarcoglycan and gamma-sarcoglycan deficient mice, and wild-type mice. The first three are animal models for different muscular dystrophies. Extensive biological validation shows that the method is capable of finding genes with temporal profiles significantly different across the four strains, as well as identifying potential biomarkers for each form of the disease. The added value of the temporal test compared to an identical test which does not make use of temporal ordering is demonstrated via a simulation study, and through confirmation of the expression profiles from selected genes by quantitative PCR experiments. The proposed method maximises the detection of the biologically interesting genes, whilst minimising false detections. Conclusion The temporal Hotelling T2-test is capable of finding relatively small and robust sets of genes that display different temporal profiles between the conditions of interest. 
The test is simple, it can be used on gene expression data generated from any experimental design and for any number of conditions, and it allows fast interpretation of the temporal behaviour of genes. The R code is available from V.V. The microarray data have been submitted to GEO under series GSE1574 and GSE3523. PMID:16584545
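The core of the temporal test, fitting a polynomial to each expression profile and comparing coefficient vectors with a Hotelling T2 statistic, can be sketched for two conditions as follows (a simplified replicate-based illustration, not the authors' R implementation):

```python
import numpy as np

def fit_coefs(times, replicates, degree=1):
    """Fit a polynomial of the given degree to each replicate profile,
    returning one coefficient vector per replicate."""
    return [np.polyfit(times, y, degree) for y in replicates]

def hotelling_t2(coefs_a, coefs_b):
    """Two-sample Hotelling T^2 on coefficient vectors (rows = replicates):
    T^2 = (na*nb/(na+nb)) * d' S_pooled^{-1} d, with d the mean difference
    and S_pooled the pooled sample covariance of the coefficients."""
    a, b = np.asarray(coefs_a), np.asarray(coefs_b)
    na, nb = len(a), len(b)
    d = a.mean(axis=0) - b.mean(axis=0)
    s_pooled = ((na - 1) * np.cov(a.T) + (nb - 1) * np.cov(b.T)) / (na + nb - 2)
    return na * nb / (na + nb) * d @ np.linalg.solve(s_pooled, d)
```

Genes whose profiles diverge across conditions yield large T2 values, while replicate-level noise keeps T2 small for genes with parallel profiles.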
SU-E-J-85: Leave-One-Out Perturbation (LOOP) Fitting Algorithm for Absolute Dose Film Calibration
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chu, A; Ahmad, M; Chen, Z
2014-06-01
Purpose: To introduce an outlier-recognition fitting routine for film dosimetry. It is not only flexible enough to accommodate any linear or non-linear regression, but also provides information on the minimal number of sampling points, critical sampling distributions, and the evaluation of analytical functions for absolute film-dose calibration. Methods: The technique of leave-one-out (LOO) cross validation is often used in statistical analyses of model performance. We used LOO analysis with perturbed bootstrap fitting, called leave-one-out perturbation (LOOP), for film-dose calibration. Given a threshold, the LOO process detects unfit points ("outliers") relative to the other cohorts, and a bootstrap fitting process follows to seek any possibility of further improvement through perturbations. Outliers were then reconfirmed by a traditional t-test and eliminated, and another LOOP feedback produced the final fit. An over-sampled film-dose-calibration dataset was collected as a reference (dose range: 0-800 cGy), and various simulated conditions for outliers and sampling distributions were derived from the reference. Comparisons were made across the various conditions, and the performance of two fitting functions, polynomial and rational, was evaluated. Results: (1) LOOP demonstrated sensitive outlier recognition through the statistical correlation between a left-out outlier and an exceptionally better goodness-of-fit. (2) With sufficient statistical information, LOOP can correct outliers under some low-sampling conditions where other "robust fits", e.g. Least Absolute Residuals, cannot. (3) Complete cross-validated analyses of LOOP indicate that the rational function performs much better than the polynomial. Even with 5 data points including one outlier, LOOP with a rational function recovered more than 95% of the reference values, while the polynomial fit failed completely under the same conditions. 
Conclusion: LOOP can cooperate with any fitting routine, functioning as a "robust fit". In addition, it can serve as a benchmark for film-dose calibration fitting performance.
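The leave-one-out step at the heart of LOOP can be sketched as follows (a stripped-down outlier screen only; the published routine adds perturbed bootstrap refitting and t-test reconfirmation, which are omitted here):

```python
import numpy as np

def loo_outliers(x, y, degree=2, z_thresh=3.0):
    """Refit the calibration curve with each point held out, record the
    held-out residual, and flag points whose residual is extreme on a
    robust (MAD-based) scale."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    resid = np.empty(len(x))
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        coefs = np.polyfit(x[keep], y[keep], degree)
        resid[i] = y[i] - np.polyval(coefs, x[i])
    mad = np.median(np.abs(resid - np.median(resid)))
    return np.where(np.abs(resid) > z_thresh * mad * 1.4826)[0]
```

Holding each point out isolates its disagreement with the rest of the calibration data, which keeps the screen sensitive even when an outlier would drag an ordinary least-squares fit toward itself.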
Card, Tim R.; West, Joe
2016-01-01
Background We have assessed whether the linkage between routine primary and secondary care records provided an opportunity to develop an improved population based co-morbidity score with the combined information on co-morbidities from both health care settings. Methods We extracted all people older than 20 years at the start of 2005 within the linkage between the Hospital Episodes Statistics, Clinical Practice Research Datalink, and Office for National Statistics death register in England. A random 50% sample was used to identify relevant diagnostic codes using a Bayesian hierarchy to share information between similar Read and ICD-10 code groupings. Internal validation of the score was performed in the remaining 50% and discrimination was assessed using Harrell's C statistic. Comparisons were made over time, age, and consultation rate with the Charlson and Elixhauser indexes. Results 657,264 people were followed up from 1 January 2005. 98 groupings of codes were derived from the Bayesian hierarchy, and 37 had an adjusted weighting of greater than zero in the Cox proportional hazards model. 11 of these groupings had a different weighting dependent on whether they were coded from hospital or primary care. The C statistic reduced from 0.88 (95% confidence interval 0.88-0.88) in the first year of follow up, to 0.85 (0.85-0.85) including all 5 years. When we stratified the linked score by consultation rate the association with mortality remained consistent, but there was a significant interaction with age, with improved discrimination and fit in those under 50 years old (C = 0.85, 0.83-0.87) compared to the Charlson (C = 0.79, 0.77-0.82) or Elixhauser index (C = 0.81, 0.79-0.83). Conclusions Using linked population-based primary and secondary care data, we developed a co-morbidity score with improved discrimination, particularly in younger age groups, and a greater effect when adjusting for co-morbidity than existing scores. PMID:27788230
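The Harrell's C statistic used here to assess discrimination can be sketched as follows (a naive O(n^2) illustration for right-censored survival data, not the authors' implementation):

```python
def harrells_c(times, events, risk_scores):
    """Concordance index: among usable pairs (the earlier time is an
    observed event), the fraction where the earlier-event subject has the
    higher predicted risk; tied risk scores count one half."""
    conc = ties = usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    conc += 1
                elif risk_scores[i] == risk_scores[j]:
                    ties += 1
    return (conc + 0.5 * ties) / usable
```

A C of 0.88, as reported for the first year of follow-up, means that in 88% of usable pairs the score ranks the person who died sooner as higher risk.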
Approach for Input Uncertainty Propagation and Robust Design in CFD Using Sensitivity Derivatives
NASA Technical Reports Server (NTRS)
Putko, Michele M.; Taylor, Arthur C., III; Newman, Perry A.; Green, Lawrence L.
2002-01-01
An implementation of the approximate statistical moment method for uncertainty propagation and robust optimization in a quasi 3-D Euler CFD code is presented. Given uncertainties in statistically independent, random, normally distributed input variables, first- and second-order statistical moment procedures are performed to approximate the uncertainty in the CFD output. Efficient calculation of both first- and second-order sensitivity derivatives is required. To assess the validity of the approximations, these moments are compared with statistical moments generated through Monte Carlo simulations. The uncertainties in the CFD input variables are also incorporated into a robust optimization procedure. For this optimization, statistical moments involving first-order sensitivity derivatives appear in the objective function and system constraints. Second-order sensitivity derivatives are used in a gradient-based search to successfully execute a robust optimization. The approximate methods used throughout the analyses are found to be valid when considering robustness about input parameter mean values.
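The first-order moment propagation compared against Monte Carlo above can be sketched as follows (a generic scalar-output illustration with finite-difference sensitivity derivatives standing in for the CFD code's efficiently computed ones):

```python
import random

def first_order_moments(f, mu, sigma, h=1e-5):
    """Approximate mean and variance of f(X) for independent normal inputs:
    mean ~= f(mu), var ~= sum_i (df/dx_i)^2 * sigma_i^2, using central
    finite differences for the sensitivity derivatives."""
    mean = f(mu)
    var = 0.0
    for i in range(len(mu)):
        up, dn = list(mu), list(mu)
        up[i] += h
        dn[i] -= h
        deriv = (f(up) - f(dn)) / (2 * h)
        var += (deriv * sigma[i]) ** 2
    return mean, var

def monte_carlo_moments(f, mu, sigma, n=100000, seed=1):
    """Reference moments by direct sampling of the input distribution."""
    rng = random.Random(seed)
    vals = [f([rng.gauss(m, s) for m, s in zip(mu, sigma)]) for _ in range(n)]
    mean = sum(vals) / n
    return mean, sum((v - mean) ** 2 for v in vals) / (n - 1)
```

The moment approximation needs only a handful of derivative evaluations per input, versus thousands of full model runs for Monte Carlo, which is the cost argument the paper exploits.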
A statistical method (cross-validation) for bone loss region detection after spaceflight
Zhao, Qian; Li, Wenjun; Li, Caixia; Chu, Philip W.; Kornak, John; Lang, Thomas F.
2010-01-01
Astronauts experience bone loss after long spaceflight missions. Identifying the specific regions that undergo the greatest losses (e.g. the proximal femur) could reveal information about the processes of bone loss in disuse and disease. Methods for detecting such regions, however, remain an open problem. This paper focuses on statistical methods to detect such regions. We perform statistical parametric mapping to obtain t-maps of changes in images, and propose a new cross-validation method to select an optimal suprathreshold for forming clusters of pixels. Once these candidate clusters are formed, we use permutation testing of longitudinal labels to identify significant changes. PMID:20632144
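The permutation-testing step can be sketched generically as follows (a minimal two-group mean-difference version; the paper permutes longitudinal labels over cluster statistics, which this sketch does not reproduce):

```python
import random

def permutation_pvalue(group_a, group_b, n_perm=2000, seed=0):
    """Permutation test for a difference in means: shuffle the pooled
    values, re-split into groups of the original sizes, and count
    permutations at least as extreme as the observed difference."""
    rng = random.Random(seed)
    obs = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

Because the null distribution is built from the data themselves, the test makes no normality assumption, which suits the small samples typical of spaceflight studies.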
The Statistics Teaching Inventory: A Survey on Statistics Teachers' Classroom Practices and Beliefs
ERIC Educational Resources Information Center
Zieffler, Andrew; Park, Jiyoon; Garfield, Joan; delMas, Robert; Bjornsdottir, Audbjorg
2012-01-01
This paper reports on an instrument designed to assess the practices and beliefs of instructors of introductory statistics courses across the disciplines. Funded by a grant from the National Science Foundation, this project developed, piloted, and gathered validity evidence for the Statistics Teaching Inventory (STI). The instrument consists of 50…
NASA Astrophysics Data System (ADS)
Lemaire, Vincent E. P.; Colette, Augustin; Menut, Laurent
2016-03-01
Because of its sensitivity to unfavorable weather patterns, air pollution is sensitive to climate change, so that in the future a climate penalty could jeopardize the expected efficiency of air pollution mitigation measures. A common method to assess the impact of climate on air quality consists in implementing chemistry-transport models forced by climate projections. However, the computing cost of such methods requires optimizing ensemble exploration techniques. By using a training dataset from a deterministic projection of climate and air quality over Europe, we identified the main meteorological drivers of air quality for eight regions in Europe and developed statistical models that can be used to predict air pollutant concentrations. The evolution of the key climate variables driving either particulate or gaseous pollution allows selecting the members of the EuroCordex ensemble of regional climate projections that should be prioritized for future air quality projections (CanESM2/RCA4, CNRM-CM5-LR/RCA4, CSIRO-Mk3-6-0/RCA4 and MPI-ESM-LR/CCLM, following the EuroCordex terminology). After testing the validity of the statistical model in predictive mode, we provide ranges of uncertainty attributed to the spread of the regional climate projection ensemble by the end of the century (2071-2100) for the RCP8.5 scenario. In the three regions where the statistical model of the impact of climate change on PM2.5 performs satisfactorily, we find a climate benefit (a decrease of PM2.5 concentrations under future climate) of -1.08 (±0.21), -1.03 (±0.32), and -0.83 (±0.14) µg m-3 for Eastern Europe, Mid-Europe and Northern Italy, respectively. In the British-Irish Isles, Scandinavia, France, the Iberian Peninsula and the Mediterranean, the statistical model is not considered skillful enough to draw any conclusion for PM2.5. 
In Eastern Europe, France, the Iberian Peninsula, Mid-Europe and Northern Italy, the statistical model of the impact of climate change on ozone was considered satisfactory and it confirms the climate penalty bearing upon ozone of 10.51 (±3.06), 11.70 (±3.63), 11.53 (±1.55), 9.86 (±4.41), 4.82 (±1.79) µg m-3, respectively. In the British-Irish Isles, Scandinavia and the Mediterranean, the skill of the statistical model was not considered robust enough to draw any conclusion for ozone pollution.
Pontes, Halley M.; Király, Orsolya; Demetrovics, Zsolt; Griffiths, Mark D.
2014-01-01
Background Over the last decade, there has been growing concern about ‘gaming addiction’ and its widely documented detrimental impacts on a minority of individuals who play excessively. The latest (fifth) edition of the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders (DSM-5) included nine criteria for the potential diagnosis of Internet Gaming Disorder (IGD) and noted that it was a condition that warranted further empirical study. Aim: The main aim of this study was to develop a valid, reliable, and psychometrically robust standardised tool, in addition to providing empirically supported cut-off points. Methods A sample of 1003 gamers (85.2% males; mean age 26 years) from 57 different countries were recruited via online gaming forums. Validity was assessed by confirmatory factor analysis (CFA), criterion-related validity, and concurrent validity. Latent profile analysis was also carried out to distinguish disordered gamers from non-disordered gamers. Sensitivity and specificity analyses were performed to determine an empirical cut-off for the test. Results The CFA confirmed the viability of the IGD-20 Test with a six-factor structure (salience, mood modification, tolerance, withdrawal, conflict and relapse) for the assessment of IGD according to the nine criteria from DSM-5. The IGD-20 Test proved to be valid and reliable. According to the latent profile analysis, 5.3% of the total participants were classed as disordered gamers. Additionally, an optimal empirical cut-off of 71 points (out of 100) appeared adequate according to the sensitivity and specificity analyses carried out. Conclusions The present findings support the viability of the IGD-20 Test as an adequate standardised psychometrically robust tool for assessing internet gaming disorder. Consequently, the new instrument represents a first step towards unification and consensus in the field of gaming studies. PMID:25313515
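Empirical cut-offs of this kind are often chosen by maximizing Youden's J (sensitivity + specificity - 1) over candidate thresholds; the abstract does not name the exact criterion used, so the sketch below is illustrative only:

```python
def youden_cutoff(scores, labels):
    """Scan observed score values as candidate cut-offs and return the one
    maximizing Youden's J; labels are 1 for cases, 0 for non-cases."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        sens = sum(s >= cut for s in pos) / len(pos)
        spec = sum(s < cut for s in neg) / len(neg)
        if sens + spec - 1 > best_j:
            best_cut, best_j = cut, sens + spec - 1
    return best_cut, best_j
```

Here the "cases" would be the gamers classed as disordered by the latent profile analysis, and the score the IGD-20 total.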
Less is more? Assessing the validity of the ICD-11 model of PTSD across multiple trauma samples
Hansen, Maj; Hyland, Philip; Armour, Cherie; Shevlin, Mark; Elklit, Ask
2015-01-01
Background In the 5th edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), the symptom profile of posttraumatic stress disorder (PTSD) was expanded to include 20 symptoms. An alternative model of PTSD is outlined in the proposed 11th edition of the International Classification of Diseases (ICD-11) that includes just six symptoms. Objectives and method The objectives of the current study are: 1) to independently investigate the fit of the ICD-11 model of PTSD, and three DSM-5-based models of PTSD, across seven different trauma samples (N=3,746) using confirmatory factor analysis; 2) to assess the concurrent validity of the ICD-11 model of PTSD; and 3) to determine if there are significant differences in diagnostic rates between the ICD-11 guidelines and the DSM-5 criteria. Results The ICD-11 model of PTSD was found to provide excellent model fit in six of the seven trauma samples, and tests of factorial invariance showed that the model performs equally well for males and females. DSM-5 models provided poor fit of the data. Concurrent validity was established as the ICD-11 PTSD factors were all moderately to strongly correlated with scores of depression, anxiety, dissociation, and aggression. Levels of association were similar for ICD-11 and DSM-5 suggesting that explanatory power is not affected due to the limited number of items included in the ICD-11 model. Diagnostic rates were significantly lower according to ICD-11 guidelines compared to the DSM-5 criteria. Conclusions The proposed factor structure of the ICD-11 model of PTSD appears valid across multiple trauma types, possesses good concurrent validity, and is more stringent in terms of diagnosis compared to the DSM-5 criteria. PMID:26450830
Injection Drug User Quality of Life Scale (IDUQOL): Findings from a content validation study
Hubley, Anita M; Palepu, Anita
2007-01-01
Background Quality of life studies among injection drug users have primarily focused on health-related measures. The chaotic life-style of many injection drug users (IDUs), however, extends far beyond their health, and impacts upon social relationships, employment opportunities, housing, and day to day survival. Most current quality of life instruments do not capture the realities of people living with addictions. The Injection Drug Users' Quality of Life Scale (IDUQOL) was developed to reflect the life areas of relevance to IDUs. The present study examined the content validity of the IDUQOL using judgmental methods based on subject matter experts' (SMEs) ratings of various elements of this measure (e.g., appropriateness of life areas or items, names and descriptions of life areas, instructions for administration and scoring). Methods Six SMEs were provided with a copy of the IDUQOL and its administration and scoring manual and a detailed content validation questionnaire. Two commonly used judgmental measures of inter-rater agreement, the Content Validity Index (CVI) and the Average Deviation Mean Index (ADM), were used to evaluate SMEs' agreement on ratings of IDUQOL elements. Results A total of 75 elements of the IDUQOL were examined. The CVI results showed that all elements were endorsed by the required number of SMEs or more. The ADM results showed that acceptable agreement (i.e., practical significance) was obtained for all elements but statistically significant agreement was missed for nine elements. For these elements, SMEs' feedback was examined for ways to improve the elements. Open-ended feedback also provided suggestions for other revisions to the IDUQOL. Conclusion The results of the study provided strong evidence in support of the content validity of the IDUQOL and direction for the revision of some IDUQOL elements. PMID:17663783
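The two agreement indices used in this study can be sketched as follows (textbook definitions, assuming the usual 4-point relevance scale for the CVI; these are not the authors' scripts):

```python
def item_cvi(ratings):
    """Item-level Content Validity Index: proportion of experts rating the
    element 3 or 4 on a 4-point relevance scale."""
    return sum(r >= 3 for r in ratings) / len(ratings)

def adm_index(ratings):
    """Average Deviation Mean index: mean absolute deviation of expert
    ratings from their mean; smaller values mean tighter agreement."""
    m = sum(ratings) / len(ratings)
    return sum(abs(r - m) for r in ratings) / len(ratings)
```

The CVI asks how many experts endorse an element, while the ADM asks how tightly their ratings cluster, which is why the study reports both.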
English, Devin; Bowleg, Lisa; del Río-González, Ana Maria; Tschann, Jeanne M.; Agans, Robert; Malebranche, David J
2017-01-01
Objectives Although social science research has examined police and law enforcement-perpetrated discrimination against Black men using policing statistics and implicit bias studies, there is little quantitative evidence detailing this phenomenon from the perspective of Black men. Consequently, there is a dearth of research detailing how Black men’s perspectives on police and law enforcement-related stress predict negative physiological and psychological health outcomes. This study addresses these gaps with the qualitative development and quantitative test of the Police and Law Enforcement (PLE) scale. Methods In Study 1, we employed thematic analysis of transcripts of individual qualitative interviews with 90 Black men to assess key themes and concepts and develop quantitative items. In Study 2, we used 2 focus groups of 5 Black men each (n=10), intensive cognitive interviewing with a separate sample of Black men (n=15), and piloting with another sample of Black men (n=13) to assess the ecological validity of the quantitative items. In Study 3, we analyzed data from a sample of 633 Black men between the ages of 18 and 65 to test the factor structure of the PLE, as well as its concurrent validity and convergent/discriminant validity. Results Qualitative analyses and confirmatory factor analyses suggested that a 5-item, 1-factor measure appropriately represented respondents’ experiences of police/law enforcement discrimination. As hypothesized, the PLE was positively associated with measures of racial discrimination and depressive symptoms. Conclusions Preliminary evidence suggests that the PLE is a reliable and valid measure of Black men’s experiences of discrimination with police/law enforcement. PMID:28080104
2013-01-01
Background The assessment of personality organization and its observable behavioral manifestations, i.e. personality functioning, has a long tradition in psychodynamic psychiatry. Recently, the DSM-5 Levels of Personality Functioning Scale has moved it into the focus of psychiatric diagnostics. Based on Kernberg’s concept of personality organization, the Structured Interview of Personality Organization (STIPO) was developed for diagnosing personality functioning. The STIPO covers seven dimensions: (1) identity, (2) object relations, (3) primitive defenses, (4) coping/rigidity, (5) aggression, (6) moral values, and (7) reality testing and perceptual distortions. The English version of the STIPO has previously shown satisfactory psychometric properties. Methods Validity and reliability of the German version of the 100-item instrument were evaluated in 122 psychiatric patients. All patients were diagnosed according to the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) and were assessed by means of the STIPO. Moreover, all patients completed eight questionnaires that served as criteria for the external validity of the STIPO. Results Interrater reliability varied between intraclass correlations of .89 and 1.0; Cronbach’s α for the seven dimensions was .69 to .93. All a priori selected questionnaire scales correlated significantly with the corresponding STIPO dimensions. Patients with personality disorder (PD) revealed significantly higher STIPO scores (i.e. worse personality functioning) than patients without PD, and patients with cluster B PD showed significantly higher STIPO scores than patients with cluster C PD. Conclusions Interrater reliability, Cronbach’s α, concurrent validity, and differential validity of the STIPO are satisfactory. The STIPO represents an appropriate instrument for the assessment of personality functioning in clinical and research settings. PMID:23941404
ClinicalTrials.gov and Drugs@FDA: A comparison of results reporting for new drug approval trials
Schwartz, Lisa M.; Woloshin, Steven; Zheng, Eugene; Tse, Tony; Zarin, Deborah A.
2016-01-01
Background Pharmaceutical companies and other trial sponsors must submit certain trial results to ClinicalTrials.gov. The validity of these results is unclear. Purpose To validate results posted on ClinicalTrials.gov against publicly available FDA reviews on Drugs@FDA. Data sources ClinicalTrials.gov (registry and results database) and Drugs@FDA (medical/statistical reviews). Study selection 100 parallel-group, randomized trials for new drug approvals (1/2013 – 7/2014) with results posted on ClinicalTrials.gov (3/15/2015). Data extraction Two assessors systematically extracted, and another verified, trial design, primary/secondary outcomes, adverse events, and deaths. Results The 100 trials were mostly phase 3 (90%), double-blind (92%), and placebo-controlled (73%), representing 32 drugs from 24 companies. Of 137 primary outcomes from ClinicalTrials.gov, 134 (98%) had corresponding data in Drugs@FDA, 130 (95%) had concordant definitions, and 107 (78%) had concordant results; most differences were nominal (i.e. relative difference < 10%). Primary outcome results in 14 of the 100 trials could not be validated. Of 1,927 secondary outcomes from ClinicalTrials.gov, definitions could be validated for 1,061 (55%) and results for 367 (19%). Of 96 trials with ≥ 1 serious adverse event in either source, 14 could be compared and 7 were discordant. Of 62 trials with ≥ 1 death in either source, 25 could be compared and 17 were discordant. Limitations Unknown generalizability to uncontrolled or crossover trial results. Conclusion Primary outcome definitions and results were largely concordant between ClinicalTrials.gov and Drugs@FDA. Half of the secondary outcomes could not be validated because Drugs@FDA includes only “key outcomes” for regulatory decision-making; nor could serious adverse events and deaths, because Drugs@FDA frequently includes only results aggregated across multiple trials. PMID:27294570
Validation of the Pediatric Cardiac Quality of Life Inventory
Marino, Bradley S.; Tomlinson, Ryan S.; Wernovsky, Gil; Drotar, Dennis; Newburger, Jane W.; Mahony, Lynn; Mussatto, Kathleen; Tong, Elizabeth; Cohen, Mitchell; Andersen, Charlotte; Shera, David; Khoury, Philip R.; Wray, Jo; Gaynor, J. William; Helfaer, Mark A.; Kazak, Anne E.; Shea, Judy A.
2012-01-01
OBJECTIVE The purpose of this multicenter study was to confirm the validity and reliability of the Pediatric Cardiac Quality of Life Inventory (PCQLI). METHODS Seven centers recruited pediatric patients (8–18 years of age) with heart disease (HD) and their parents to complete the PCQLI and generic health-related quality of life (Pediatric Quality of Life Inventory [PedsQL]) and non–quality of life (Self-Perception Profile for Children [SPPC]/Self-Perception Profile for Adolescents [SPPA] and Youth Self-Report [YSR]/Child Behavior Checklist [CBCL]) tools. PCQLI construct validity was assessed through correlations of PCQLI scores between patients and parents and with severity of congenital HD, medical care utilization, and PedsQL, SPPC/SPPA, and YSR/CBCL scores. PCQLI test-retest reliability was evaluated. RESULTS The study enrolled 1605 patient-parent pairs. Construct validity was substantiated by the association of lower PCQLI scores with Fontan palliation and increased numbers of cardiac operations, hospital admissions, and physician visits (P < .001); moderate to good correlations between patient and parent PCQLI scores (r = 0.41–0.61; P <.001); and fair to good correlations between PCQLI total scores and PedsQL total (r = 0.70–0.76), SPPC/SPPA global self-worth (r = 0.43–0.46), YSR/CBCL total competency (r = 0.28–0.37), and syndrome and Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition-oriented scale (r = −0.58 to −0.30; P < .001) scores. Test-retest reliability correlations were excellent (r = 0.78–0.90; P < .001). CONCLUSIONS PCQLI scores are valid and reliable for children and adolescents with congenital and acquired HD and may be useful for future research and clinical management. Pediatrics 2010;126:498–508 PMID:20805147
Bergeron, Lise; Smolla, Nicole; Berthiaume, Claude; Renaud, Johanne; Breton, Jean-Jacques; St-Georges, Marie; Morin, Pauline; Zavaglia, Elissa; Labelle, Réal
2017-03-01
The Dominic Interactive for Adolescents-Revised (DIA-R) is a multimedia self-report screen for 9 mental disorders, borderline personality traits, and suicidality defined by the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5). This study aimed to examine the reliability and validity of this instrument. French- and English-speaking adolescents aged 12 to 15 years (N = 447) were recruited from schools and clinical settings in Montreal and were evaluated twice. Internal consistency was estimated by Cronbach alpha coefficients and test-retest reliability by intraclass correlation coefficients. Cutoff points on the DIA-R scales were determined by using clinically relevant measures for defining external validation criteria: the Schedule for Affective Disorders and Schizophrenia for School-Aged Children, the Beck Hopelessness Scale, and the Abbreviated-Diagnostic Interview for Borderlines. Receiver operating characteristic (ROC) analyses provided accuracy estimates (area under the ROC curve, sensitivity, specificity, likelihood ratio) to evaluate the ability of the DIA-R scales to predict the external criteria. For most of the DIA-R scales, reliability coefficients were excellent or moderate. High or moderate accuracy estimates from the ROC analyses demonstrated the ability of the DIA-R thresholds to predict psychopathological conditions. These thresholds were generally able to discriminate between the clinical and school subsamples. However, the validity of the obsessions/compulsions scale was too low. Findings clearly support the reliability and validity of the DIA-R. This instrument may be useful for assessing a wide range of adolescents' mental health problems in the continuum of services. This conclusion applies to all scales except the obsessions/compulsions one.
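The accuracy estimates reported for the DIA-R thresholds (sensitivity, specificity, likelihood ratio) follow directly from cross-tabulating a dichotomized scale against the external criterion. A minimal Python sketch; the scores, criterion labels, and cutoff below are invented purely for illustration:

```python
def screen_accuracy(scores, has_condition, cutoff):
    """Sensitivity, specificity, and positive likelihood ratio for a
    screening scale dichotomized at `cutoff` (score >= cutoff flags a case)."""
    pairs = list(zip(scores, has_condition))
    tp = sum(1 for s, c in pairs if s >= cutoff and c)       # true positives
    fn = sum(1 for s, c in pairs if s < cutoff and c)        # missed cases
    tn = sum(1 for s, c in pairs if s < cutoff and not c)    # true negatives
    fp = sum(1 for s, c in pairs if s >= cutoff and not c)   # false alarms
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    return sens, spec, lr_pos
```

Sweeping `cutoff` over the observed score range and plotting sensitivity against 1 − specificity traces out the ROC curve used in the study.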
2012-01-01
Background A method for assessing the model validity of randomised controlled trials of homeopathy is needed. To date, only conventional standards for assessing intrinsic bias (internal validity) of trials have been invoked, with little recognition of the special characteristics of homeopathy. We aimed to identify relevant judgmental domains to use in assessing the model validity of homeopathic treatment (MVHT). We define MVHT as the extent to which a homeopathic intervention and the main measure of its outcome, as implemented in a randomised controlled trial (RCT), reflect 'state-of-the-art' homeopathic practice. Methods Using an iterative process, an international group of experts developed a set of six judgmental domains, with associated descriptive criteria. The domains address: (I) the rationale for the choice of the particular homeopathic intervention; (II) the homeopathic principles reflected in the intervention; (III) the extent of homeopathic practitioner input; (IV) the nature of the main outcome measure; (V) the capability of the main outcome measure to detect change; (VI) the length of follow-up to the endpoint of the study. Six papers reporting RCTs of homeopathy of varying design were randomly selected from the literature. A standard form was used to record each assessor's independent response per domain, using the optional verdicts 'Yes', 'Unclear', 'No'. Concordance among the eight verdicts per domain, across all six papers, was evaluated using the kappa (κ) statistic. Results The six judgmental domains enabled MVHT to be assessed with 'fair' to 'almost perfect' concordance in each case. For the six RCTs examined, the method allowed MVHT to be classified overall as 'acceptable' in three, 'unclear' in two, and 'inadequate' in one. Conclusion Future systematic reviews of RCTs in homeopathy should adopt the MVHT method as part of a complete appraisal of trial validity. PMID:22510227
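The concordance evaluation above rests on the kappa (κ) statistic. As a simplified illustration of the underlying calculation (the study pooled eight verdicts per domain across assessors; this sketch shows only the two-rater case of Cohen's kappa, with invented verdicts):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters giving categorical verdicts
    ('Yes' / 'Unclear' / 'No'): chance-corrected agreement."""
    n = len(rater_a)
    # observed proportion of agreeing verdicts
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # agreement expected by chance from each rater's marginal frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)
```

Values near 0 mean agreement no better than chance; by the common Landis–Koch labels, 0.21–0.40 is 'fair' and above 0.80 'almost perfect', the range cited in the abstract.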
NASA Astrophysics Data System (ADS)
Ranaie, Mehrdad; Soffianian, Alireza; Pourmanafi, Saeid; Mirghaffari, Noorollah; Tarkesh, Mostafa
2018-03-01
In the past decade, analysis of remotely sensed imagery has become one of the most common and widely used procedures in environmental studies, with supervised image classification techniques playing a central role. Using a high-resolution Worldview-3 image over a mixed urbanized landscape in Iran, three less commonly applied image classification methods (bagged CART, stochastic gradient boosting, and neural network with feature extraction) were tested and compared with two prevalent methods: random forest and support vector machine with a linear kernel. Each method was run ten times, and three validation techniques were used to estimate the accuracy statistics: cross validation, independent validation, and validation with the full training data. Statistically significant differences between the classification methods were assessed using ANOVA and Tukey tests. In general, the results showed that random forest, by a marginal difference over bagged CART and stochastic gradient boosting, was the best-performing method, although based on independent validation there was no significant difference between the performances of the classification methods. Finally, the neural network with feature extraction and the linear support vector machine had better processing speed than the other methods.
Model-Based Geostatistical Mapping of the Prevalence of Onchocerca volvulus in West Africa
O’Hanlon, Simon J.; Slater, Hannah C.; Cheke, Robert A.; Boatin, Boakye A.; Coffeng, Luc E.; Pion, Sébastien D. S.; Boussinesq, Michel; Zouré, Honorat G. M.; Stolk, Wilma A.; Basáñez, María-Gloria
2016-01-01
Background The initial endemicity (pre-control prevalence) of onchocerciasis has been shown to be an important determinant of the feasibility of elimination by mass ivermectin distribution. We present the first geostatistical map of microfilarial prevalence in the former Onchocerciasis Control Programme in West Africa (OCP) before commencement of antivectorial and antiparasitic interventions. Methods and Findings Pre-control microfilarial prevalence data from 737 villages across the 11 constituent countries in the OCP epidemiological database were used as ground-truth data. These 737 data points, plus a set of statistically selected environmental covariates, were used in a Bayesian model-based geostatistical (B-MBG) approach to generate a continuous surface (at pixel resolution of 5 km × 5 km) of microfilarial prevalence in West Africa prior to the commencement of the OCP. Uncertainty in model predictions was measured using a suite of validation statistics, performed on bootstrap samples of held-out validation data. The mean Pearson’s correlation between observed and estimated prevalence at validation locations was 0.693; the mean prediction error (average difference between observed and estimated values) was 0.77%, and the mean absolute prediction error (average magnitude of difference between observed and estimated values) was 12.2%. Within OCP boundaries, 17.8 million people were deemed to have been at risk, 7.55 million to have been infected, and mean microfilarial prevalence to have been 45% (range: 2–90%) in 1975. Conclusions and Significance This is the first map of initial onchocerciasis prevalence in West Africa using B-MBG. Important environmental predictors of infection prevalence were identified and used in a model out-performing those without spatial random effects or environmental covariates. Results may be compared with recent epidemiological mapping efforts to find areas of persisting transmission.
These methods may be extended to areas where data are sparse, and may be used to help inform the feasibility of elimination with current and novel tools. PMID:26771545
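The three validation statistics named above (Pearson's correlation, mean prediction error, and mean absolute prediction error) can be sketched in a few lines of Python. The prevalence values below are invented for illustration, and the sign convention for the error (estimated minus observed) is an assumption, since the abstract leaves it ambiguous:

```python
import math

def validation_stats(observed, predicted):
    """Pearson's r, mean prediction error (bias), and mean absolute
    prediction error between observed and model-estimated prevalences."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    r = cov / (so * sp)
    me = sum(p - o for o, p in zip(observed, predicted)) / n   # signed bias
    mae = sum(abs(p - o) for o, p in zip(observed, predicted)) / n
    return r, me, mae
```

In the study these statistics were computed on bootstrap samples of held-out validation locations, so each would be reported as a mean over resamples rather than a single value.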
The methodological quality of animal research in critical care: the public face of science
2014-01-01
Background Animal research (AR) findings often do not translate to humans; one potential reason is the poor methodological quality of AR. We aimed to determine this quality of AR reported in critical care journals. Methods All AR published from January to June 2012 in three high-impact critical care journals were reviewed. A case report form and instruction manual with clear definitions were created, based on published recommendations, including the ARRIVE guidelines. Data were analyzed with descriptive statistics. Results Seventy-seven AR publications were reviewed. Our primary outcome (animal strain, sex, and weight or age described) was reported in 52 (68%; 95% confidence interval, 56% to 77%). Of the 77 publications, 47 (61%) reported randomization; of these, 3 (6%) reported allocation concealment, and 1 (2%) the randomization procedure. Of the 77 publications, 31 (40%) reported some type of blinding; of these, disease induction (2, 7%), intervention (7, 23%), and/or subjective outcomes (17, 55%) were blinded. A sample size calculation was reported in 4/77 (5%). Animal numbers were missing in the Methods section in 16 (21%) publications; when stated, the median was 32 (range 6 to 320; interquartile range, 21 to 70). Extra animals used were mentioned in the Results section in 31 (40%) publications; this number was unclear in 23 (74%), and >100 for 12 (16%). When reporting most outcomes, numbers with denominators were given in 35 (45%), with no unaccounted numbers in 24 (31%), and no animals excluded from analysis in 20 (26%). Most (49, 64%) studies reported >40, and another 19 (25%) reported 21 to 40 statistical comparisons. Internal validity limitations were discussed in 7 (9%), and external validity (to humans) discussed in 71 (92%), most with no (30, 42%) or only a vague (9, 13%) limitation to this external validity mentioned. Conclusions The reported methodological quality of AR was poor. 
Unless the quality of AR significantly improves, the practice may be in serious jeopardy of losing public support. PMID:25114829
New Model for Estimating Glomerular Filtration Rate in Patients With Cancer
Janowitz, Tobias; Williams, Edward H.; Marshall, Andrea; Ainsworth, Nicola; Thomas, Peter B.; Sammut, Stephen J.; Shepherd, Scott; White, Jeff; Mark, Patrick B.; Lynch, Andy G.; Jodrell, Duncan I.; Tavaré, Simon; Earl, Helena
2017-01-01
Purpose The glomerular filtration rate (GFR) is essential for carboplatin chemotherapy dosing; however, the best method to estimate GFR in patients with cancer is unknown. We identify the most accurate and least biased method. Methods We obtained data on age, sex, height, weight, serum creatinine concentrations, and results for GFR from chromium-51 (51Cr) EDTA excretion measurements (51Cr-EDTA GFR) from white patients ≥ 18 years of age with histologically confirmed cancer diagnoses at the Cambridge University Hospital NHS Trust, United Kingdom. We developed a new multivariable linear model for GFR using statistical regression analysis. 51Cr-EDTA GFR was compared with the estimated GFR (eGFR) from seven published models and our new model, using the statistics root-mean-squared-error (RMSE) and median residual and on an internal and external validation data set. We performed a comparison of carboplatin dosing accuracy on the basis of an absolute percentage error > 20%. Results Between August 2006 and January 2013, data from 2,471 patients were obtained. The new model improved the eGFR accuracy (RMSE, 15.00 mL/min; 95% CI, 14.12 to 16.00 mL/min) compared with all published models. Body surface area (BSA)–adjusted chronic kidney disease epidemiology (CKD-EPI) was the most accurate published model for eGFR (RMSE, 16.30 mL/min; 95% CI, 15.34 to 17.38 mL/min) for the internal validation set. Importantly, the new model reduced the fraction of patients with a carboplatin dose absolute percentage error > 20% to 14.17% in contrast to 18.62% for the BSA-adjusted CKD-EPI and 25.51% for the Cockcroft-Gault formula. The results were externally validated. Conclusion In a large data set from patients with cancer, BSA-adjusted CKD-EPI is the most accurate published model to predict GFR. The new model improves this estimation and may present a new standard of care. PMID:28686534
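The model comparison above ranks GFR estimators by root-mean-squared error and median residual against measured 51Cr-EDTA GFR. A minimal sketch of those two statistics (toy values only; taking residuals as estimated minus measured is an assumption):

```python
import math
from statistics import median

def rmse(measured, estimated):
    """Root-mean-squared error of an eGFR model against measured GFR."""
    n = len(measured)
    return math.sqrt(sum((e - m) ** 2 for m, e in zip(measured, estimated)) / n)

def median_residual(measured, estimated):
    """Median of (estimated - measured): a robust measure of bias."""
    return median(e - m for m, e in zip(measured, estimated))
```

Applying both functions to each candidate model on the same validation set, and preferring lower RMSE and a median residual near zero, reproduces the comparison logic described in the Methods.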
Towards interoperable and reproducible QSAR analyses: Exchange of datasets
2010-01-01
Background QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constrain collaborations and re-use of data. Results We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Conclusions Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply. 
This makes it easy to join, extend, and combine datasets and hence work collectively, but also allows for analyzing the effect descriptors have on the statistical model's performance. The presented Bioclipse plugins equip scientists with graphical tools that make QSAR-ML easily accessible for the community. PMID:20591161
Di Lorenzo, Rosaria; Cabri, Giulio; Carretti, Eleonora; Galli, Giacomo; Giambalvo, Nina; Rioli, Giulia; Saraceni, Serena; Spiga, Giulia; Del Giovane, Cinzia; Ferri, Paola
2017-01-01
Purpose To investigate the perception of dignity among patients hospitalized in a psychiatric setting using the Patient Dignity Inventory (PDI), which was first validated among terminally ill oncology patients. Patients and methods After modifying two items, we administered the Italian version of the PDI to all patients hospitalized in a public psychiatric ward (Service of Psychiatric Diagnosis and Treatment of a northern Italian town) who provided their consent, to be completed at discharge, from October 21, 2015 to May 31, 2016. We excluded minors and patients with moderate/severe dementia, with poor knowledge of Italian, who had completed the PDI in previous hospitalizations, and/or who were hospitalized for <72 hours. We collected the demographic and clinical variables of our sample (n=135). We statistically analyzed PDI scores, computing Cronbach’s alpha coefficient and performing principal factor analysis, followed by orthogonal and oblique rotation. We concomitantly administered other scales (Hamilton Rating Scales for Depression and Anxiety, Global Assessment of Functioning, and Health of the Nation Outcome Scales) to analyze the concurrent validity of the PDI. Results With a response rate of 93%, we obtained a mean PDI score of 48.27 (±19.59 SD) with excellent internal consistency (Cronbach’s alpha coefficient =0.93). The factor analysis showed three factors with eigenvalue >1 (Kaiser’s criterion), which together explained >80% of total variance with good internal consistency: 1) “Loss of self-identity and social role”, 2) “Anxiety and uncertainty for the future”, and 3) “Loss of personal autonomy”. The PDI and three-factor scores were significantly positively correlated with the Hamilton Scales for Depression and Anxiety but not with the other scale scores.
Conclusion Our preliminary research suggests that PDI can be a reliable tool to assess patients’ dignity perception in a psychiatric setting, until now little investigated, helping professionals to improve quality of care and patients to accept treatments. PMID:28182110
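The internal-consistency figure reported above is Cronbach's alpha. A small Python sketch of the standard formula; the item score lists are invented for illustration, and using population variances is one common convention:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a scale.

    item_scores: one list per item, each listing the same respondents'
    scores in the same order. Alpha rises as items covary, i.e. as the
    variance of the total score exceeds the sum of per-item variances."""
    k = len(item_scores)
    item_var = sum(pvariance(item) for item in item_scores)
    totals = [sum(vals) for vals in zip(*item_scores)]  # per-respondent total
    return k / (k - 1) * (1 - item_var / pvariance(totals))
```

Perfectly parallel items give alpha = 1; the PDI's reported 0.93 indicates that respondents answered its items very consistently.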
Measuring socioeconomic status in multicountry studies: results from the eight-country MAL-ED study
2014-01-01
Background There is no standardized approach to comparing socioeconomic status (SES) across multiple sites in epidemiological studies. This is particularly problematic when cross-country comparisons are of interest. We sought to develop a simple measure of SES that would perform well across diverse, resource-limited settings. Methods A cross-sectional study was conducted with 800 children aged 24 to 60 months across eight resource-limited settings. Parents were asked to respond to a household SES questionnaire, and the height of each child was measured. A statistical analysis was done in two phases. First, the best approach for selecting and weighting household assets as a proxy for wealth was identified. We compared four approaches to measuring wealth: maternal education, principal components analysis, Multidimensional Poverty Index, and a novel variable selection approach based on the use of random forests. Second, the selected wealth measure was combined with other relevant variables to form a more complete measure of household SES. We used child height-for-age Z-score (HAZ) as the outcome of interest. Results Mean age of study children was 41 months, 52% were boys, and 42% were stunted. Using cross-validation, we found that random forests yielded the lowest prediction error when selecting assets as a measure of household wealth. The final SES index included access to improved water and sanitation, eight selected assets, maternal education, and household income (the WAMI index). A 25% difference in the WAMI index was positively associated with a difference of 0.38 standard deviations in HAZ (95% CI 0.22 to 0.55). Conclusions Statistical learning methods such as random forests provide an alternative to principal components analysis in the development of SES scores. Results from this multicountry study demonstrate the validity of a simplified SES index. 
With further validation, this simplified index may provide a standard approach for SES adjustment across resource-limited settings. PMID:24656134
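The study selected assets with random forests under cross-validation. As a far simpler stand-in that illustrates the same idea, ranking candidate wealth measures by cross-validated prediction error for HAZ, this sketch scores each candidate index by the leave-one-out error of a univariate linear fit (all data invented; a real analysis would use the random-forest procedure described above):

```python
def loo_mse(x, y):
    """Leave-one-out mean squared error of a univariate linear fit of
    y (e.g. child HAZ) on a candidate wealth index x. Lower is better."""
    n = len(x)
    errs = []
    for i in range(n):
        xs = [x[j] for j in range(n) if j != i]
        ys = [y[j] for j in range(n) if j != i]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / \
                sum((a - mx) ** 2 for a in xs)
        intercept = my - slope * mx
        errs.append((y[i] - (intercept + slope * x[i])) ** 2)  # held-out error
    return sum(errs) / n
```

Whichever candidate index minimizes this held-out error would be preferred, mirroring the paper's use of cross-validated prediction error to choose among wealth measures.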
Predicting high risk of exacerbations in bronchiectasis: the E-FACED score
Martinez-Garcia, MA; Athanazio, RA; Girón, R; Máiz-Carro, L; de la Rosa, D; Olveira, C; de Gracia, J; Vendrell, M; Prados-Sánchez, C; Gramblicka, G; Corso Pereira, M; Lundgren, FL; Fernandes De Figueiredo, M; Arancibia, F; Rached, SZ
2017-01-01
Background Although the FACED score has demonstrated great prognostic capacity in bronchiectasis, it does not include the number or severity of exacerbations as a separate variable, which is important in the natural history of these patients. Objective Construction and external validation of a new index, the E-FACED, to evaluate the predictive capacity for exacerbations and mortality. Methods The new score was constructed on the basis of the complete cohort used to construct the original FACED score, while the external validation was undertaken with six cohorts from three countries (Brazil, Argentina, and Chile). The main outcome was the number of annual exacerbations/hospitalizations, with all-cause and respiratory-related deaths as secondary outcomes. The statistical evaluation determined the relative weight of, and ideal cut-off point for, the number and severity of exacerbations, which were then incorporated into the FACED score (E-FACED). The results obtained after the application of FACED and E-FACED were compared in both cohorts. Results A total of 1,470 patients with bronchiectasis (819 from the construction cohorts and 651 from the external validation cohorts) were followed up for 5 years after diagnosis. The best cut-off point was at least two exacerbations in the previous year (two additional points), meaning that the E-FACED has nine points of increasing severity. The E-FACED presented excellent prognostic capacity for exacerbations (areas under the receiver operating characteristic curve: 0.82 for at least two exacerbations in 1 year and 0.87 for at least one hospitalization in 1 year), statistically better than that of the FACED score (0.72 and 0.78, P<0.05, respectively). The predictive capacities for all-cause and respiratory mortality were 0.87 and 0.86, respectively, both similar to those of the FACED.
Conclusion The E-FACED score significantly increases the capacity of FACED to predict future yearly exacerbations while maintaining the score’s simplicity and prognostic capacity for death. PMID:28182132
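The areas under the receiver operating characteristic curve reported above can be computed without explicitly tracing the curve, via the Mann-Whitney formulation. A sketch with invented scores:

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney statistic: the
    probability that a randomly chosen patient with the outcome (e.g.
    >= 2 exacerbations) scores higher than one without; ties count half."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

An AUC of 0.5 means the score carries no information; the E-FACED's 0.82 to 0.87 means a patient who went on to exacerbate or be hospitalized usually had the higher score.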
Mohammed, Mohammed A.; Rudge, Gavin; Watson, Duncan; Wood, Gordon; Smith, Gary B.; Prytherch, David R.; Girling, Alan; Stevens, Andrew
2013-01-01
Background We explored the use of routine blood tests and national early warning scores (NEWS) reported within ±24 hours of admission to predict in-hospital mortality in emergency admissions, using empirical decision Tree models because they are intuitive and may ultimately be used to support clinical decision making. Methodology A retrospective analysis of adult emergency admissions to a large acute hospital during April 2009 to March 2010 in the West Midlands, England, with a full set of index blood test results (albumin, creatinine, haemoglobin, potassium, sodium, urea, white cell count) and an index NEWS undertaken within ±24 hours of admission. We developed a Tree model by randomly splitting the admissions into a training dataset (50%) and a validation dataset (50%) and assessed its accuracy using the concordance (c-) statistic. Emergency admissions (about 30%) that did not have a full set of index blood tests and/or NEWS were not included in our analysis. Results There were 23,248 emergency admissions with a full set of blood tests and NEWS, with an in-hospital mortality of 5.69%. The Tree model identified age, NEWS, albumin, sodium, white cell count, and urea as significant (p<0.001) predictors of death, which described 17 homogeneous subgroups of admissions with mortality ranging from 0.2% to 60%. The c-statistic for the training model was 0.864 (95% CI 0.852 to 0.87), and when applied to the testing dataset it was 0.853 (95% CI 0.840 to 0.866). Conclusions An easy-to-interpret, validated risk adjustment Tree model using blood tests and NEWS taken within ±24 hours of admission provides good discrimination and offers a novel approach to risk adjustment which may potentially support clinical decision making. Given the nature of the clinical data, the results are likely to be generalisable, but further research is required to investigate this promising approach. PMID:23734195
NASA Astrophysics Data System (ADS)
Gutiérrez, Jose Manuel; Maraun, Douglas; Widmann, Martin; Huth, Radan; Hertig, Elke; Benestad, Rasmus; Roessler, Ole; Wibig, Joanna; Wilcke, Renate; Kotlarski, Sven
2016-04-01
VALUE is an open European network to validate and compare downscaling methods for climate change research (http://www.value-cost.eu). A key deliverable of VALUE is the development of a systematic validation framework to enable the assessment and comparison of both dynamical and statistical downscaling methods. This framework is based on a user-focused validation tree, guiding the selection of relevant validation indices and performance measures for different aspects of the validation (marginal, temporal, spatial, multi-variable). Moreover, several experiments have been designed to isolate specific points in the downscaling procedure where problems may occur (assessment of intrinsic performance, effect of errors inherited from the global models, effect of non-stationarity, etc.). The list of downscaling experiments includes 1) cross-validation with perfect predictors, 2) GCM predictors -aligned with EURO-CORDEX experiment- and 3) pseudo reality predictors (see Maraun et al. 2015, Earth's Future, 3, doi:10.1002/2014EF000259, for more details). The results of these experiments are gathered, validated and publicly distributed through the VALUE validation portal, allowing for a comprehensive community-open downscaling intercomparison study. In this contribution we describe the overall results from Experiment 1), consisting of a European wide 5-fold cross-validation (with consecutive 6-year periods from 1979 to 2008) using predictors from ERA-Interim to downscale precipitation and temperatures (minimum and maximum) over a set of 86 ECA&D stations representative of the main geographical and climatic regions in Europe. As a result of the open call for contribution to this experiment (closed in Dec. 2015), over 40 methods representative of the main approaches (MOS and Perfect Prognosis, PP) and techniques (linear scaling, quantile mapping, analogs, weather typing, linear and generalized regression, weather generators, etc.) 
were submitted, including both data (downscaled values) and metadata (characterizing different aspects of the downscaling methods). This constitutes the largest and most comprehensive intercomparison of statistical downscaling methods to date. Here, we present an overall validation, analyzing marginal and temporal aspects to assess the intrinsic performance and added value of statistical downscaling methods at both annual and seasonal levels. This validation takes into account the different properties and limitations of the various approaches and techniques (as reported in the provided metadata) in order to perform a fair comparison. It is pointed out that this experiment alone is not sufficient to evaluate the limitations of (MOS) bias correction techniques. Moreover, it also does not fully validate PP, since we do not learn whether we have the right predictors and whether the PP assumption is valid. These problems will be analyzed in the subsequent community-open VALUE experiments 2) and 3), which will be open for participation throughout the present year.
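Experiment 1's 5-fold cross-validation over consecutive 6-year periods (1979 to 2008) amounts to generating contiguous train/test blocks of years. A minimal sketch of that fold construction (the function name and interface are invented for illustration):

```python
def blocked_folds(first_year, last_year, n_folds):
    """Split consecutive years into n_folds contiguous test blocks,
    as in a 5-fold cross-validation over 1979-2008 with 6-year periods.
    Returns a list of (train_years, test_years) pairs."""
    years = list(range(first_year, last_year + 1))
    size = len(years) // n_folds  # assumes the period divides evenly
    folds = []
    for k in range(n_folds):
        test = years[k * size:(k + 1) * size]       # one contiguous block
        train = [y for y in years if y not in test]  # everything else
        folds.append((train, test))
    return folds
```

Contiguous blocks (rather than random year assignment) keep each test period temporally coherent, which matters when validating temporal aspects of downscaled series.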
ERIC Educational Resources Information Center
Riskowski, Jody L.; Olbricht, Gayla; Wilson, Jennifer
2010-01-01
Statistics is the art and science of gathering, analyzing, and making conclusions from data. However, many people do not fully understand how to interpret statistical results and conclusions. Placing students in a collaborative environment involving project-based learning may enable them to overcome misconceptions of probability and enhance the…
2013-08-01
in Sequential Design Optimization with Concurrent Calibration-Based Model Validation
Drignei, Dorin; Mourelatos, Zissimos; Pandey, Vijitashwa
14 CFR Sec. 19-7 - Passenger origin-destination survey.
Code of Federal Regulations, 2011 CFR
2011-01-01
... Transportation Statistics' Director of Airline Information. (c) A statistically valid sample of flight coupons... LAX Salt Lake City Northwest Operating Carrier Northwest Ticketed Carrier Fare Code Phoenix American...
Ranking of physiotherapeutic evaluation methods as outcome measures of stifle functionality in dogs
2013-01-01
Background Various physiotherapeutic evaluation methods are used to assess the functionality of dogs with stifle problems. Neither validity nor sensitivity of these methods has been investigated. This study aimed to determine the most valid and sensitive physiotherapeutic evaluation methods for assessing functional capacity in hind limbs of dogs with stifle problems and to serve as a basis for developing an indexed test for these dogs. A group of 43 dogs with unilateral surgically treated cranial cruciate ligament deficiency and osteoarthritic findings was used to test different physiotherapeutic evaluation methods. Twenty-one healthy dogs served as the control group and were used to determine normal variation in static weight bearing and range of motion. The protocol consisted of 14 different evaluation methods: visual evaluation of lameness, visual evaluation of diagonal movement, visual evaluation of functional active range of motion and difference in thrust of hind limbs via functional tests (sit-to-move and lie-to-move), movement on stairs, evaluation of hind limb muscle atrophy, manual evaluation of hind limb static weight bearing, quantitative measurement of static weight bearing of hind limbs with bathroom scales, and passive range of motion of hind limb stifle (flexion and extension) and tarsal (flexion and extension) joints using a universal goniometer. The results were compared with those from an orthopaedic examination, force plate analysis, radiographic evaluation, and a conclusive assessment. Congruity of the methods was assessed with a combination of three statistical approaches (Fisher’s exact test and two differently calculated proportions of agreeing observations), and the components were ranked from best to worst. Sensitivities of all of the physiotherapeutic evaluation methods against each standard were calculated.
Results Evaluation of asymmetry in a sitting and lying position, assessment of muscle atrophy, manual and measured static weight bearing, and measurement of stifle passive range of motion were the most valid and sensitive physiotherapeutic evaluation methods. Conclusions Ranking of the various physiotherapeutic evaluation methods was accomplished. Several of these methods can be considered valid and sensitive when examining the functionality of dogs with stifle problems. PMID:23566355
Klop, Corinne; de Vries, Frank; Bijlsma, Johannes W J; Leufkens, Hubert G M; Welsing, Paco M J
2016-01-01
Objectives FRAX incorporates rheumatoid arthritis (RA) as a dichotomous predictor for predicting the 10-year risk of hip and major osteoporotic fracture (MOF). However, fracture risk may deviate with disease severity, duration or treatment. Aims were to validate, and if needed to update, UK FRAX for patients with RA and to compare predictive performance with the general population (GP). Methods Cohort study within UK Clinical Practice Research Datalink (CPRD) (RA: n=11 582, GP: n=38 755), also linked to hospital admissions for hip fracture (CPRD-Hospital Episode Statistics, HES) (RA: n=7221, GP: n=24 227). Predictive performance of UK FRAX without bone mineral density was assessed by discrimination and calibration. Updating methods included recalibration and extension. Differences in predictive performance were assessed by the C-statistic and Net Reclassification Improvement (NRI) using the UK National Osteoporosis Guideline Group intervention thresholds. Results UK FRAX significantly overestimated fracture risk in patients with RA, both for MOF (mean predicted vs observed 10-year risk: 13.3% vs 8.4%) and hip fracture (CPRD: 5.5% vs 3.1%, CPRD-HES: 5.5% vs 4.1%). Calibration was good for hip fracture in the GP (CPRD-HES: 2.7% vs 2.4%). Discrimination was good for hip fracture (RA: 0.78, GP: 0.83) and moderate for MOF (RA: 0.69, GP: 0.71). Extension of the recalibrated UK FRAX using CPRD-HES with duration of RA disease, glucocorticoids (>7.5 mg/day) and secondary osteoporosis did not improve the NRI (0.01, 95% CI −0.04 to 0.05) or C-statistic (0.78). Conclusions UK FRAX overestimated fracture risk in RA, but performed well for hip fracture in the GP after linkage to hospitalisations. Extension of the recalibrated UK FRAX did not improve predictive performance. PMID:26984006
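The recalibration step reported in this abstract can be sketched as rescaling predictions so that their mean matches the observed event rate ("calibration-in-the-large"). Only the quoted MOF figures (mean predicted 13.3% vs observed 8.4%) come from the abstract; the individual risks and the rescaling helper below are hypothetical illustrations, not the authors' updating method.

```python
# Minimal sketch of calibration-in-the-large recalibration: when a model's
# mean predicted risk exceeds the observed event rate, rescale the
# predictions so the two agree. Illustrative only.

def recalibrate(predicted_risks, observed_rate):
    """Scale predicted risks so their mean equals the observed event rate."""
    mean_pred = sum(predicted_risks) / len(predicted_risks)
    factor = observed_rate / mean_pred
    return [p * factor for p in predicted_risks]

# Hypothetical individual 10-year MOF risks averaging the quoted 13.3%
preds = [0.10, 0.12, 0.133, 0.15, 0.162]
recal = recalibrate(preds, 0.084)        # observed 10-year MOF risk: 8.4%
mean_recal = sum(recal) / len(recal)
```

Note that this rescaling fixes average calibration but leaves discrimination (the C-statistic) unchanged, which matches the abstract's finding that extension did not improve the C-statistic.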
Lotan, Tamara L.; Wei, Wei; Morais, Carlos L.; Hawley, Sarah T.; Fazli, Ladan; Hurtado-Coll, Antonio; Troyer, Dean; McKenney, Jesse K.; Simko, Jeffrey; Carroll, Peter R.; Gleave, Martin; Lance, Raymond; Lin, Daniel W.; Nelson, Peter S.; Thompson, Ian M.; True, Lawrence D.; Feng, Ziding; Brooks, James D.
2015-01-01
Background PTEN is the most commonly deleted tumor suppressor gene in primary prostate cancer (PCa) and its loss is associated with poor clinical outcomes and ERG gene rearrangement. Objective We tested whether PTEN loss is associated with shorter recurrence-free survival (RFS) in surgically treated PCa patients with known ERG status. Design, setting, and participants A genetically validated, automated PTEN immunohistochemistry (IHC) protocol was used for 1275 primary prostate tumors from the Canary Foundation retrospective PCa tissue microarray cohort to assess homogeneous (in all tumor tissue sampled) or heterogeneous (in a subset of tumor tissue sampled) PTEN loss. ERG status as determined by a genetically validated IHC assay was available for a subset of 938 tumors. Outcome measurements and statistical analysis Associations between PTEN and ERG status were assessed using Fisher’s exact test. Kaplan-Meier and multivariate weighted Cox proportional models for RFS were constructed. Results and limitations When compared to intact PTEN, homogeneous (hazard ratio [HR] 1.66, p = 0.001) but not heterogeneous (HR 1.24, p = 0.14) PTEN loss was significantly associated with shorter RFS in multivariate models. Among ERG-positive tumors, homogeneous (HR 3.07, p < 0.0001) but not heterogeneous (HR 1.46, p = 0.10) PTEN loss was significantly associated with shorter RFS. Among ERG-negative tumors, PTEN did not reach significance for inclusion in the final multivariate models. The interaction term for PTEN and ERG status with respect to RFS did not reach statistical significance (p = 0.11) for the current sample size. Conclusions These data suggest that PTEN is a useful prognostic biomarker and that there is no statistically significant interaction between PTEN and ERG status for RFS. 
Patient summary We found that loss of the PTEN tumor suppressor gene in prostate tumors as assessed by tissue staining is correlated with shorter time to prostate cancer recurrence after radical prostatectomy. PMID:27617307
Oncology patient-reported claims: maximising the chance for success
Kitchen, H; Rofail, D; Caron, M; Emery, M-P
2011-01-01
Objectives/purpose: To review Patient Reported Outcome (PRO) labelling claims achieved in oncology in Europe and in the United States and consider the benefits, and challenges faced. Methods: PROLabels database was searched to identify oncology products with PRO labelling approved in Europe since 1995 or in the United States since 1998. The US Food and Drug Administration (FDA) and the European Medicines Agency (EMA) websites and guidance documents were reviewed. PUBMED was searched for articles on PRO claims in oncology. Results: Among all oncology products approved, 22 were identified with PRO claims; 10 in the United States, 7 in Europe, and 5 in both. The language used in the labelling was limited to benefit (e.g. “…resulted in symptom benefits by significantly prolonging time to deterioration in cough, dyspnoea, and pain, versus placebo”) and equivalence (e.g. “no statistical differences were observed between treatment groups for global QoL”). Seven products used a validated HRQoL tool; two used symptom tools; two used both; seven used single-item symptom measures (one was unknown). The following emerged as likely reasons for success: ensuring systematic PRO data collection; clear rationale for pre-specified endpoints; adequately powered trials to detect differences and clinically significant changes; adjusting for multiplicity; developing an a priori statistical analysis plan including primary and subgroup analyses, dealing with missing data, pooling multiple-site data; establishing clinical versus statistical significance; interpreting failure to detect change. End-stage patient drop-out rates and cessation of trials due to exceptional therapeutic benefit pose significant challenges to demonstrating treatment PRO improvement. Conclusions: PRO labelling claims demonstrate treatment impact and the trade-off between efficacy and side effects ultimately facilitating product differentiation. 
Reliable and valid instruments specific to the desired language, claim, and target population are required. Practical considerations include rationale for study endpoints, transparency in assumptions, and attention to subtle variations in data. PMID:22276055
Gerber, Madelyn M.; Hampel, Heather; Schulz, Nathan P.; Fernandez, Soledad; Wei, Lai; Zhou, Xiao-Ping; de la Chapelle, Albert; Toland, Amanda Ewart
2012-01-01
Background Tumors frequently exhibit loss of tumor suppressor genes or allelic gains of activated oncogenes. A significant proportion of cancer susceptibility loci in the mouse show somatic losses or gains consistent with the presence of a tumor susceptibility or resistance allele. Thus, allele-specific somatic gains or losses at loci may demarcate the presence of resistance or susceptibility alleles. The goal of this study was to determine if previously mapped susceptibility loci for colorectal cancer show evidence of allele-specific somatic events in colon tumors. Methods We performed quantitative genotyping of 16 single nucleotide polymorphisms (SNPs) showing statistically significant association with colorectal cancer in published genome-wide association studies (GWAS). We genotyped 194 paired normal and colorectal tumor DNA samples and 296 paired validation samples to investigate these SNPs for allele-specific somatic gains and losses. We combined analysis of our data with published data for seven of these SNPs. Results No statistically significant evidence for allele-specific somatic selection was observed for the tested polymorphisms in the discovery set. The rs6983267 variant, which has shown preferential loss of the non-risk T allele and relative gain of the risk G allele in previous studies, favored relative gain of the G allele in the combined discovery and validation samples (corrected p-value = 0.03). When we combined our data with published allele-specific imbalance data for this SNP, the G allele of rs6983267 showed statistically significant evidence of relative retention (p-value = 2.06×10−4). Conclusions Our results suggest that the majority of variants identified as colon cancer susceptibility alleles through GWAS do not exhibit somatic allele-specific imbalance in colon tumors. Our data confirm previously published results showing allele-specific imbalance for rs6983267. 
These results indicate that allele-specific imbalance of cancer susceptibility alleles may not be a common phenomenon in colon cancer. PMID:22629442
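The kind of allele-specific imbalance test described in this abstract can be sketched as a binomial test: among informative (heterozygous) tumors showing imbalance, ask whether the risk allele is retained more often than chance. The counts below are hypothetical and not taken from the study; only the idea of testing retention of the rs6983267 G allele comes from the abstract.

```python
# Minimal sketch of a one-sided binomial test for preferential allele
# retention, against the null of p = 0.5 (no allele-specific selection).
from math import comb

def binom_one_sided_p(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical: of 20 informative tumors with imbalance, 16 retain the risk allele
pval = binom_one_sided_p(16, 20)   # about 0.0059
```

A small p-value here would indicate retention beyond what chance alone predicts, analogous to the significant relative retention the authors report for rs6983267.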
Compulsive Cell Phone Use and History of Motor Vehicle Crash
O’Connor, Stephen S.; Whitehill, Jennifer M.; King, Kevin M.; Kernic, Mary A.; Boyle, Linda Ng; Bresnahan, Brian; Mack, Christopher D.; Ebel, Beth E.
2013-01-01
Introduction Few studies have examined the psychological factors underlying the association between cell phone use and motor vehicle crash. We sought to examine the factor structure and convergent validity of a measure of problematic cell phone use and explore whether compulsive cell phone use is associated with a history of motor vehicle crash. Methods We recruited a sample of 383 undergraduate college students to complete an on-line assessment that included cell phone use and driving history. We explored the dimensionality of the Cell Phone Overuse Scale (CPOS) using factor analytic methods. Ordinary least squares regression models were used to examine associations between identified subscales and measures of impulsivity, alcohol use, and anxious relationship style to establish convergent validity. We used negative binomial regression models to investigate associations between the CPOS and motor vehicle crash incidence. Results We found the CPOS to comprise four subscales: anticipation, activity interfering, emotional reaction, and problem recognition. Each displayed significant associations with aspects of impulsivity, problematic alcohol use, and anxious relationship style characteristics. Only the anticipation subscale demonstrated statistically significant associations with reported motor vehicle crash incidence, controlling for clinical and demographic characteristics (RR 1.13, CI 1.01 to 1.26). For each one-point increase on the 6-point anticipation subscale, risk for previous motor vehicle crash increased by 13%. Conclusions Crash risk is strongly associated with heightened anticipation about incoming phone calls or messages. The mean score on the CPOS is associated with increased risk of motor vehicle crash but does not reach statistical significance. PMID:23910571
Bacterial infections in childhood: A risk factor for gastrointestinal and other diseases?
Unverdorben, Alexandra; Weimer, Katja; Schlarb, Angelika Anita; Gulewitsch, Marco Daniel; Ellert, Ute; Enck, Paul
2015-01-01
Background There is evidence for post-infectious irritable bowel syndrome (PI-IBS) in adults, but little is known about PI-IBS in children. The nationwide representative German Health Interview and Examination Survey for Children and Adolescents (KiGGS) assessed children’s health. Objective and methods We identified 643 children (50.1% males) in the KiGGS cohort (N = 15,878, 51% males) with a history of Salmonella infection. The number was validated comparing this group with the known infection statistics from the Robert Koch-Institute registry. We compared this group to the remaining KiGGS cohort (n = 12,951) with respect to sociodemographic characteristics, pain and quality of life. To check for specificity, we repeated the comparisons with a group with a history of scarlet fever. Results Infection statistics predicted 504 cases of Salmonella infection in the KiGGS cohort, indicating high validity of the data. In children between 3 and 10 years with a history of Salmonella infection, significantly more abdominal pain (31.7% versus 21.9%, p < 0.001) and headache (27.2% versus 15.1%, p < 0.001) were reported. This group showed lower quality of life (p < 0.001). Comparison to a group of scarlet fever-infected children revealed poor specificity of the data. Conclusion Differences found between children with and without Salmonella infection reveal the role of gastrointestinal infection in the development of post-infectious abdominal problems, but poor specificity may point toward a psychosocial (“somatization”) rather than a Salmonella-specific mechanism. PMID:25653857
Mocellin, Simone; Thompson, John F; Pasquali, Sandro; Montesco, Maria C; Pilati, Pierluigi; Nitti, Donato; Saw, Robyn P; Scolyer, Richard A; Stretch, Jonathan R; Rossi, Carlo R
2009-12-01
To improve selection for sentinel node (SN) biopsy (SNB) in patients with cutaneous melanoma using statistical models predicting SN status. About 80% of patients currently undergoing SNB are node negative. In the absence of conclusive evidence of an SNB-associated survival benefit, these patients may be over-treated. Here, we tested the efficiency of 4 different models in predicting SN status. The clinicopathologic data (age, gender, tumor thickness, Clark level, regression, ulceration, histologic subtype, and mitotic index) of 1132 melanoma patients who had undergone SNB at institutions in Italy and Australia were analyzed. Logistic regression, classification tree, random forest, and support vector machine models were fitted to the data. The predictive models were built with the aim of maximizing the negative predictive value (NPV) and reducing the rate of SNB procedures through minimizing the error rate. After cross-validation, the logistic regression, classification tree, random forest, and support vector machine predictive models obtained clinically relevant NPV (93.6%, 94.0%, 97.1%, and 93.0%, respectively), SNB reduction (27.5%, 29.8%, 18.2%, and 30.1%, respectively), and error rates (1.8%, 1.8%, 0.5%, and 2.1%, respectively). Using commonly available clinicopathologic variables, predictive models can preoperatively identify a proportion of patients (approximately 25%) who might be spared SNB, with an acceptable (1%-2%) error. If validated in large prospective series, these models might be implemented in the clinical setting for improved patient selection, which ultimately would lead to better quality of life for patients and optimization of resource allocation for the health care system.
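The two headline metrics in this abstract, NPV and SNB reduction, are simple ratios over the predicted-negative group. A minimal sketch follows; the confusion-matrix counts are hypothetical and chosen only to illustrate the arithmetic, not taken from the study.

```python
# NPV = fraction of predicted node-negative patients who are truly negative;
# SNB reduction = fraction of the cohort who would be spared the biopsy
# (i.e. all predicted negatives).

def npv_and_reduction(tn, fn, total):
    predicted_negative = tn + fn
    npv = tn / predicted_negative
    reduction = predicted_negative / total
    return npv, reduction

# Hypothetical cohort: 1000 patients, 265 predicted negative, 260 truly negative
npv, reduction = npv_and_reduction(tn=260, fn=5, total=1000)
```

The trade-off the authors optimize is visible here: sparing more patients (larger reduction) tends to admit more false negatives, which lowers NPV.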
Welford, Mark R.; Bossak, Brian H.
2009-01-01
Background Recent studies have noted myriad qualitative and quantitative inconsistencies between the medieval Black Death (and subsequent “plagues”) and modern empirical Y. pestis plague data, most of which is derived from the Indian and Chinese plague outbreaks of A.D. 1900±15 years. Previous works have noted apparent differences in seasonal mortality peaks during Black Death outbreaks versus peaks of bubonic and pneumonic plagues attributed to Y. pestis infection, but have not provided spatiotemporal statistical support. Our objective here was to validate individual observations of this seasonal discrepancy in peak mortality between historical epidemics and modern empirical data. Methodology/Principal Findings We compiled and aggregated multiple daily, weekly and monthly datasets of both Y. pestis plague epidemics and suspected Black Death epidemics to compare seasonal differences in mortality peaks at a monthly resolution. Statistical and time series analyses of the epidemic data indicate that a seasonal inversion in peak mortality does exist between known Y. pestis plague and suspected Black Death epidemics. We provide possible explanations for this seasonal inversion. Conclusions/Significance These results add further evidence of inconsistency between historical plagues, including the Black Death, and our current understanding of Y. pestis-variant disease. We expect that the line of inquiry into the disputed cause of the greatest recorded epidemic will continue to intensify. Given the rapid pace of environmental change in the modern world, it is crucial that we understand past lethal outbreaks as fully as possible in order to prepare for future deadly pandemics. PMID:20027294
Sainz de Baranda, Pilar; Rodríguez-Iniesta, María; Ayala, Francisco; Santonja, Fernando; Cejudo, Antonio
2014-07-01
To examine the criterion-related validity of the horizontal hip joint angle (H-HJA) test and vertical hip joint angle (V-HJA) test for estimating hamstring flexibility measured through the passive straight-leg raise (PSLR) test using contemporary statistical measures. Validity study. Controlled laboratory environment. One hundred thirty-eight professional trampoline gymnasts (61 women and 77 men). Hamstring flexibility. Each participant performed 2 trials of H-HJA, V-HJA, and PSLR tests in a randomized order. The criterion-related validity of H-HJA and V-HJA tests was measured through the estimation equation, typical error of the estimate (TEEST), validity correlation (β), and their respective confidence limits. The findings from this study suggest that although H-HJA and V-HJA tests showed moderate to high validity scores for estimating hamstring flexibility (standardized TEEST = 0.63; β = 0.80), the TEEST statistic reported for both tests was not narrow enough for clinical purposes (H-HJA = 10.3 degrees; V-HJA = 9.5 degrees). Subsequently, the predicted likely thresholds for the true values that were generated were too wide (H-HJA = predicted value ± 13.2 degrees; V-HJA = predicted value ± 12.2 degrees). The results suggest that although the HJA test showed moderate to high validity scores for estimating hamstring flexibility, the prediction intervals between the HJA and PSLR tests are not strong enough to suggest that clinicians and sport medicine practitioners should use the HJA and PSLR tests interchangeably as gold standard measurement tools to evaluate and detect short hamstring muscle flexibility.
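The "typical error of the estimate" used in this abstract is, in essence, the standard error of estimate from regressing the criterion (PSLR) on the predictor (HJA). A minimal sketch with synthetic angle data follows; the values below are invented for illustration and are not the gymnasts' measurements.

```python
# Typical error of the estimate (TEE): fit criterion ~ predictor by least
# squares, then take the residual standard deviation (n - 2 degrees of
# freedom). Synthetic data, illustration only.
import math

def linreg(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def typical_error(x, y):
    slope, intercept = linreg(x, y)
    resid = [b - (slope * a + intercept) for a, b in zip(x, y)]
    return math.sqrt(sum(r * r for r in resid) / (len(resid) - 2))

hja  = [60, 65, 70, 75, 80, 85]    # synthetic HJA scores (degrees)
pslr = [72, 80, 83, 92, 95, 104]   # synthetic PSLR scores (degrees)
tee = typical_error(hja, pslr)
```

The paper's clinical point is that a TEE of ~10 degrees makes the ± prediction limits too wide for the two tests to be used interchangeably; the statistic itself is just this residual spread.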
Cederbye, Camilla Natasha; Palshof, Jesper Andreas; Hansen, Tine Plato; Duun-Henriksen, Anne Katrine; Linnemann, Dorte; Stenvang, Jan; Nielsen, Dorte Lisbet; Brünner, Nils; Viuff, Birgitte Martine
2016-01-01
Overexpression of the ATP-dependent drug efflux pump ABCG2 is a major molecular mechanism of multidrug resistance in cancer and might be a predictive biomarker for drug response. Contradictory results have been reported for immunohistochemical studies of ABCG2 protein expression in colorectal cancer (CRC), probably because of the use of different antibodies and scoring approaches. In this study, we systematically studied six commercially available anti-ABCG2 antibodies, using cell lines with up-regulation of ABCG2, and selected one antibody for validation in CRC tissue. Furthermore, we established scoring guidelines for ABCG2 expression based on the clinically used guidelines for HER2 immunohistochemistry assessment in gastric cancer. The guidelines provide a semi-quantitative measure of the basolateral membrane staining of ABCG2 and disregard the apical membrane staining and the cytoplasmic signal. Intra-tumor heterogeneity in ABCG2 immunoreactivity was observed; however, statistical analyses of tissue microarrays (TMAs) and the corresponding whole sections from primary tumors of 57 metastatic CRC patients revealed a strong positive correlation between maximum TMA scores and whole sections, especially when more than one core was used. In conclusion, here, we provide validated results to guide future studies on the associations between ABCG2 immunoreactivity in tumor cells and the benefits of chemotherapeutic treatment in patients with CRC. PMID:27257141
Saffan, Abdulrahman Al; AlHobail, Sultan; Bin Salem, Fares; AlFuraih, AlBara; AlTamimi, Mohammad
2016-01-01
Background and Aim. Esthetic concerns in primary teeth have been studied mainly from the point of view of parents. The aim of this study was to assess whether children aged 5–8 years can form an opinion regarding changes in the appearance of their teeth caused by dental caries and the materials used to restore those teeth, and to compare their opinions with those of their parents. Methodology. A total of 107 children and both of their parents (n = 321), who were seeking dental treatment, were included in this study. A tool comprising a questionnaire and pictures of carious lesions and their treatment arranged in the form of a presentation was validated and tested on 20 children and their parents. The validated tool was then tested on all participants. Results. Children had acceptable validity statistics for the tool, suggesting that they were able to make informed decisions regarding esthetic restorations. There was no difference between the responses of the children and their parents on most points. Zirconia crowns appeared to be the most acceptable full coverage restoration for primary anterior teeth among both children and their parents. Conclusion. Within the limitations of the study it can be concluded that children in their sixth year of life are capable of appreciating the esthetics of the restorations for their anterior teeth. PMID:27446212
Bahl, Gautam; Cruite, Irene; Wolfson, Tanya; Gamst, Anthony C.; Collins, Julie M.; Chavez, Alyssa D.; Barakat, Fatma; Hassanein, Tarek; Sirlin, Claude B.
2016-01-01
Purpose To demonstrate a proof of concept that quantitative texture feature analysis of double contrast-enhanced magnetic resonance imaging (MRI) can classify fibrosis noninvasively, using histology as a reference standard. Materials and Methods A Health Insurance Portability and Accountability Act (HIPAA)-compliant Institutional Review Board (IRB)-approved retrospective study of 68 patients with diffuse liver disease was performed at a tertiary liver center. All patients underwent double contrast-enhanced MRI, with histopathology-based staging of fibrosis obtained within 12 months of imaging. The MaZda software program was used to compute 279 texture parameters for each image. A statistical regularization technique, generalized linear model (GLM)-path, was used to develop a model based on texture features for dichotomous classification of fibrosis category (F ≤2 vs. F ≥3) of the 68 patients, with histology as the reference standard. The model's performance was assessed and cross-validated. There was no additional validation performed on an independent cohort. Results Cross-validated sensitivity, specificity, and total accuracy of the texture feature model in classifying fibrosis were 91.9%, 83.9%, and 88.2%, respectively. Conclusion This study shows proof of concept that accurate, noninvasive classification of liver fibrosis is possible by applying quantitative texture analysis to double contrast-enhanced MRI. Further studies are needed in independent cohorts of subjects. PMID:22851409
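The cross-validated performance figures in this abstract reduce to ratios over a 2x2 table of classifier output versus histology. The counts below are one hypothetical split of the 68 patients consistent with the quoted percentages; the study's actual per-patient table is not given in the abstract.

```python
# Sensitivity, specificity, and total accuracy from a 2x2 confusion table
# (classifier vs histology reference standard). Counts are hypothetical.

def diagnostic_metrics(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# 37 patients with advanced fibrosis (F >= 3), 31 without (F <= 2)
sens, spec, acc = diagnostic_metrics(tp=34, fn=3, tn=26, fp=5)
# sens ~ 0.919, spec ~ 0.839, acc ~ 0.882, matching the quoted figures
```
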
2010-01-01
Background The modular approach to analysis of genetically modified organisms (GMOs) relies on the independence of the modules combined (i.e. DNA extraction and GM quantification). The validity of this assumption has to be proved on the basis of specific performance criteria. Results An experiment was conducted using, as a reference, the validated quantitative real-time polymerase chain reaction (PCR) module for detection of glyphosate-tolerant Roundup Ready® GM soybean (RRS). Different DNA extraction modules (CTAB, Wizard and Dellaporta), were used to extract DNA from different food/feed matrices (feed, biscuit and certified reference material [CRM 1%]) containing the target of the real-time PCR module used for validation. Purity and structural integrity (absence of inhibition) were used as basic criteria that a DNA extraction module must satisfy in order to provide suitable template DNA for quantitative real-time (RT) PCR-based GMO analysis. When performance criteria were applied (removal of non-compliant DNA extracts), the independence of GMO quantification from the extraction method and matrix was statistically proved, except in the case of Wizard applied to biscuit. A fuzzy logic-based procedure also confirmed the relatively poor performance of the Wizard/biscuit combination. Conclusions For RRS, this study recognises that modularity can be generally accepted, with the limitation of avoiding combining highly processed material (i.e. biscuit) with a magnetic-beads system (i.e. Wizard). PMID:20687918
2009-01-01
Background Symptom-based surveys suggest that the prevalence of gastrointestinal diseases is lower in China than in Western countries. The aim of this study was to validate a methodology for the epidemiological investigation of gastrointestinal symptoms and endoscopic findings in China. Methods A randomized, stratified, multi-stage sampling methodology was used to select 18 000 adults aged 18-80 years from Shanghai, Beijing, Xi'an, Wuhan and Guangzhou. Participants from Shanghai were invited to provide blood samples and undergo upper gastrointestinal endoscopy. All participants completed Chinese versions of the Reflux Disease Questionnaire (RDQ) and the modified Rome II questionnaire; 20% were also invited to complete the 36-item Short Form Health Survey (SF-36) and Epworth Sleepiness Scale (ESS). The psychometric properties of the questionnaires were evaluated statistically. Results The study was completed by 16 091 individuals (response rate: 89.4%), with 3219 (89.4% of those invited) completing the SF-36 and ESS. All 3153 participants in Shanghai provided blood samples and 1030 (32.7%) underwent endoscopy. Cronbach's alpha coefficients were 0.89, 0.89, 0.80 and 0.91, respectively, for the RDQ, modified Rome II questionnaire, ESS and SF-36, supporting internal consistency. Factor analysis supported construct validity of all questionnaire dimensions except SF-36 psychosocial dimensions. Conclusion This population-based study has great potential to characterize the relationship between gastrointestinal symptoms and endoscopic findings in China. PMID:19925662
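Cronbach's alpha, the internal-consistency statistic reported in this abstract (0.89, 0.89, 0.80 and 0.91 for the four questionnaires), can be computed as below. The item responses are synthetic, for illustration only.

```python
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
# Values near 1 indicate that items measure a common underlying construct.

def cronbach_alpha(items):
    """items: one list of responses per item, aligned by respondent."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = sum(var(it) for it in items)
    totals = [sum(it[i] for it in items) for i in range(n)]
    return (k / (k - 1)) * (1 - item_vars / var(totals))

# Three synthetic questionnaire items, five respondents, highly consistent
items = [[1, 2, 3, 4, 5],
         [2, 2, 3, 5, 5],
         [1, 3, 3, 4, 4]]
alpha = cronbach_alpha(items)
```

Values above about 0.7-0.8 are conventionally read as acceptable internal consistency, which is why the quoted coefficients support the questionnaires' reliability.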
Carle, C; Alexander, P; Columb, M; Johal, J
2013-04-01
We designed and internally validated an aggregate weighted early warning scoring system specific to the obstetric population that has the potential for use in the ward environment. Direct obstetric admissions from the Intensive Care National Audit and Research Centre's Case Mix Programme Database were randomly allocated to model development (n = 2240) or validation (n = 2200) sets. Physiological variables collected during the first 24 h of critical care admission were analysed. Logistic regression analysis for mortality in the model development set was initially used to create a statistically based early warning score. The statistical score was then modified to create a clinically acceptable early warning score. Important features of this clinical obstetric early warning score are that the variables are weighted according to their statistical importance, a surrogate for the FiO2/PaO2 relationship is included, conscious level is assessed using a simplified alert/not alert variable, and the score, trigger thresholds and response are consistent with the new non-obstetric National Early Warning Score system. The statistical and clinical early warning scores were internally validated using the validation set. The area under the receiver operating characteristic curve was 0.995 (95% CI 0.992-0.998) for the statistical score and 0.957 (95% CI 0.923-0.991) for the clinical score. Pre-existing empirically designed early warning scores were also validated in the same way for comparison. The area under the receiver operating characteristic curve was 0.955 (95% CI 0.922-0.988) for Swanton et al.'s Modified Early Obstetric Warning System, 0.937 (95% CI 0.884-0.991) for the obstetric early warning score suggested in the 2003-2005 Report on Confidential Enquiries into Maternal Deaths in the UK, and 0.973 (95% CI 0.957-0.989) for the non-obstetric National Early Warning Score.
This highlights that the new clinical obstetric early warning score has an excellent ability to discriminate survivors from non-survivors in this critical care data set. Further work is needed to validate our new clinical early warning score externally in the obstetric ward environment. Anaesthesia © 2013 The Association of Anaesthetists of Great Britain and Ireland.
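The discrimination statistic quoted throughout this abstract, the area under the ROC curve, equals the probability that a randomly chosen non-survivor scores higher than a randomly chosen survivor. A minimal pairwise (Mann-Whitney) sketch follows; the early-warning scores below are synthetic, not from the ICNARC data.

```python
# AUC via pairwise comparison of positive (non-survivor) and negative
# (survivor) scores; ties count half. 0.5 = chance, 1.0 = perfect ranking.

def auc(scores_pos, scores_neg):
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Synthetic scores: non-survivors should tend to score higher
non_survivors = [9, 11, 12, 14]
survivors     = [2, 3, 5, 9, 10]
a = auc(non_survivors, survivors)   # 0.925
```

An AUC of 0.995, as reported for the statistical score, means almost every non-survivor outscored almost every survivor in the validation set.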
Normality Tests for Statistical Analysis: A Guide for Non-Statisticians
Ghasemi, Asghar; Zahediasl, Saleh
2012-01-01
Statistical errors are common in scientific literature and about 50% of the published articles have at least one error. The assumption of normality needs to be checked for many statistical procedures, namely parametric tests, because their validity depends on it. The aim of this commentary is to overview checking for normality in statistical analysis using SPSS. PMID:23843808
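Alongside the formal normality tests run in SPSS (such as Shapiro-Wilk or Kolmogorov-Smirnov), a quick numerical screen looks at sample skewness and excess kurtosis, both near zero for normal data. A minimal sketch with synthetic samples follows; this is a rule-of-thumb check, not a substitute for the tests the commentary describes.

```python
# Moment-based normality screen: standardized third moment (skewness) and
# fourth moment minus 3 (excess kurtosis). Both are ~0 for normal data;
# an exponential sample has theoretical skewness 2.
import math
import random

def skew_kurtosis(xs):
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    skew = sum(((x - m) / s) ** 3 for x in xs) / n
    kurt = sum(((x - m) / s) ** 4 for x in xs) / n - 3.0
    return skew, kurt

random.seed(0)
normal_sample = [random.gauss(0, 1) for _ in range(5000)]
skewed_sample = [random.expovariate(1.0) for _ in range(5000)]

s_norm, k_norm = skew_kurtosis(normal_sample)   # both near 0
s_exp, k_exp = skew_kurtosis(skewed_sample)     # clearly positive skew
```
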
ERIC Educational Resources Information Center
Thompson, Bruce
Web-based statistical instruction, like all statistical instruction, ought to focus on teaching the essence of the research endeavor: the exercise of reflective judgment. Using the framework of the recent report of the American Psychological Association (APA) Task Force on Statistical Inference (Wilkinson and the APA Task Force on Statistical…
Recent statistical methods for orientation data
NASA Technical Reports Server (NTRS)
Batschelet, E.
1972-01-01
The application of statistical methods in the areas of animal orientation and navigation is discussed. The method employed is limited to the two-dimensional case. Various tests for determining the validity of the statistical analysis are presented. Mathematical models are included to support the theoretical considerations, and tables of data are developed to show the value of the information obtained by statistical analysis.
38 CFR 1.15 - Standards for program evaluation.
Code of Federal Regulations, 2010 CFR
2010-07-01
... program operates. (3) Validity. The degree of statistical validity should be assessed within the research... decisions. (4) Reliability. Use of the same research design by others should yield the same findings. (g...
Analytical procedure validation and the quality by design paradigm.
Rozet, Eric; Lebrun, Pierre; Michiels, Jean-François; Sondag, Perceval; Scherder, Tara; Boulanger, Bruno
2015-01-01
Since the adoption of the ICH Q8 document concerning the development of pharmaceutical processes following a quality by design (QbD) approach, there have been many discussions on the opportunity for analytical procedure development to follow a similar approach. While the development and optimization of analytical procedures following QbD principles have been widely discussed and described, the place of analytical procedure validation in this framework has not been clarified. This article aims to show that analytical procedure validation is fully integrated into the QbD paradigm and is an essential step in developing analytical procedures that are effectively fit for purpose. Adequate statistical methodologies also have a role to play, such as design of experiments, statistical modeling and probabilistic statements. The outcome of analytical procedure validation is also an analytical procedure design space, from which a control strategy can be set.
Impact of syncope on quality of life: validation of a measure in patients undergoing tilt testing.
Nave-Leal, Elisabete; Oliveira, Mário; Pais-Ribeiro, José; Santos, Sofia; Oliveira, Eunice; Alves, Teresa; Cruz Ferreira, Rui
2015-03-01
Recurrent syncope has a significant impact on quality of life. The development of measurement scales to assess this impact that are easy to use in clinical settings is crucial. The objective of the present study is a preliminary validation of the Impact of Syncope on Quality of Life questionnaire for the Portuguese population. The instrument underwent a process of translation, validation, analysis of cultural appropriateness and cognitive debriefing. A population of 39 patients with a history of recurrent syncope (>1 year) who underwent tilt testing, aged 52.1 ± 16.4 years (21-83), 43.5% male, most in active employment (n=18) or retired (n=13), constituted a convenience sample. The resulting Portuguese version is similar to the original, with 12 items in a single aggregate score, and underwent statistical validation, with assessment of reliability, validity and stability over time. With regard to reliability, the internal consistency of the scale is 0.9. Assessment of convergent and discriminant validity showed statistically significant results (p<0.01). Regarding stability over time, a test-retest of this instrument at six months after tilt testing with 22 patients of the sample who had not undergone any clinical intervention found no statistically significant changes in quality of life. The results indicate that this instrument is of value for assessing quality of life in patients with recurrent syncope in Portugal. Copyright © 2014 Sociedade Portuguesa de Cardiologia. Published by Elsevier España. All rights reserved.
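The internal-consistency figure of 0.9 reported above is a Cronbach's alpha. A self-contained sketch of the computation on hypothetical questionnaire data (not the study's):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for internal consistency.
    `items` is a list of columns: one list of respondent scores per item."""
    k = len(items)
    n = len(items[0])

    def variance(xs):  # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    sum_item_vars = sum(variance(col) for col in items)
    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - sum_item_vars / variance(totals))

# Hypothetical 3-item questionnaire answered by 5 respondents
items = [[4, 3, 5, 2, 4],
         [4, 2, 5, 3, 4],
         [5, 3, 4, 2, 5]]
print(round(cronbach_alpha(items), 2))  # → 0.89
```

Values of 0.7 or above are conventionally taken as acceptable internal consistency; the 0.9 reported for the Impact of Syncope on Quality of Life scale is high.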
ERIC Educational Resources Information Center
Patterson, Brian F.; Mattern, Krista D.
2013-01-01
The continued accumulation of validity evidence for the core uses of educational assessments is critical to ensure that proper inferences will be made for those core purposes. To that end, the College Board has continued to follow previous cohorts of college students and this report provides updated validity evidence for using the SAT to predict…
The Fifth Calibration/Data Product Validation Panel Meeting
NASA Technical Reports Server (NTRS)
1992-01-01
The minutes and associated documents prepared from presentations and meetings at the Fifth Calibration/Data Product Validation Panel meeting in Boulder, Colorado, April 8 - 10, 1992, are presented. Key issues include (1) statistical characterization of data sets: finding statistics that characterize key attributes of the data sets, and defining ways to characterize the comparisons among data sets; (2) selection of specific intercomparison exercises: selecting characteristic spatial and temporal regions for intercomparisons, and impact of validation exercises on the logistics of current and planned field campaigns and model runs; and (3) preparation of data sets for intercomparisons: characterization of assumptions, transportable data formats, labeling data files, content of data sets, and data storage and distribution (EOSDIS interface).
Charan, J; Saxena, D
2014-01-01
Biased negative studies not only reflect poor research effort but also have an impact on patient care, as they prevent further research with similar objectives and leave potential research areas unexplored. Hence, published negative studies should be methodologically strong. All parameters that may help a reader judge the validity of results and conclusions should be reported in published negative studies. There is a paucity of data on the reporting of statistical and methodological parameters in negative studies published in Indian medical journals. The present systematic review was designed to critically evaluate negative studies published in prominent Indian medical journals for reporting of statistical and methodological parameters. Systematic review. All negative studies published in 15 Science Citation Indexed (SCI) medical journals published from India were included in the present study. Investigators involved in the study evaluated all negative studies for the reporting of various parameters. Primary endpoints were reporting of "power" and "confidence interval." Power was reported in 11.8% of studies and confidence intervals in 15.7%. Most other parameters were reported poorly: sample size calculation (13.2%), type of sampling method (50.8%), names of statistical tests (49.1%), adjustment for multiple endpoints (1%) and post hoc power calculation (2.1%). Reporting was more frequent in clinical trials than in other study designs, and in journals with an impact factor above 1 than in those with an impact factor below 1. Negative studies published in prominent Indian medical journals do not report statistical and methodological parameters adequately, which may hinder critical appraisal of their findings by readers.
Cardiac arrest risk standardization using administrative data compared to registry data
Gaieski, David F.; Donnino, Michael W.; Nelson, Joshua I. M.; Mutter, Eric L.; Carr, Brendan G.; Abella, Benjamin S.; Wiebe, Douglas J.
2017-01-01
Background Methods for comparing hospitals regarding cardiac arrest (CA) outcomes, vital for improving resuscitation performance, rely on data collected by cardiac arrest registries. However, most CA patients are treated at hospitals that do not participate in such registries. This study aimed to determine whether CA risk standardization modeling based on administrative data could perform as well as that based on registry data. Methods and results Two risk standardization logistic regression models were developed using 2453 patients treated from 2000–2015 at three hospitals in an academic health system. Registry and administrative data were accessed for all patients. The outcome was death at hospital discharge. The registry model was considered the “gold standard” with which to compare the administrative model, using metrics including comparing areas under the curve, calibration curves, and Bland-Altman plots. The administrative risk standardization model had a c-statistic of 0.891 (95% CI: 0.876–0.905) compared to a registry c-statistic of 0.907 (95% CI: 0.895–0.919). When limited to only non-modifiable factors, the administrative model had a c-statistic of 0.818 (95% CI: 0.799–0.838) compared to a registry c-statistic of 0.810 (95% CI: 0.788–0.831). All models were well-calibrated. There was no significant difference between c-statistics of the models, providing evidence that valid risk standardization can be performed using administrative data. Conclusions Risk standardization using administrative data performs comparably to standardization using registry data. This methodology represents a new tool that can enable opportunities to compare hospital performance in specific hospital systems or across the entire US in terms of survival after CA. PMID:28783754
FAITH, MYLES S.; STOREY, MEGAN; KRAL, TANJA V. E.; PIETROBELLI, ANGELO
2010-01-01
Background There are few validated instruments measuring parental beliefs about parent–child feeding relations and child compliance during meals. Objective To test the validity of the Feeding Demands Questionnaire, a parent-report instrument designed to measure parents’ beliefs about how their child should eat. Methods Participants were 85 mothers of 3- to 7-year-old same-sex twin pairs or sibling pairs, and their children. Mothers completed the eight-item Feeding Demands Questionnaire and the Child Feeding Questionnaire, plus measures of depression and fear of fat. Statistical analyses Psychometric evaluations of the Feeding Demands Questionnaire included principal components analysis, Cronbach’s α for internal consistency, tests for convergent and discriminant validities, and Flesch-Kincaid for readability. Results The Feeding Demands Questionnaire had three underlying factors: anger/frustration, food amount demandingness, and food type demandingness, for which subscales were computed. The Feeding Demands Questionnaire showed acceptable internal consistency (α ranging from .70 to .86) and was written at the 4.8th grade level. Mothers reporting greater anger/frustration during feeding were more likely to pressure their children to eat, while those reporting greater demands about the type of foods their children eat were more likely to monitor child fat intake. Mothers reporting greater demands about the amount of food their children eat were more likely to restrict eating, pressure children to eat, and monitor their fat intake. Conclusions The Feeding Demands Questionnaire appears valid for assessing maternal beliefs that children should comply with rules for eating and frustration during feeding. Different demand beliefs can underlie different feeding practices. PMID:18375218
Results of the 2015 Perfusionist Salary Study
Lewis, Doreen M.; Dove, Steven; Jordan, Ralph E.
2016-01-01
Abstract: Presently, there exists no published valid and reliable salary study of clinical perfusionists. The objective of the 2015 Perfusionist Salary Study was to gather verifiable employee information to determine current compensation market rates (salary averages) of clinical perfusionists working in the United States. A salary survey was conducted between April 2015 and March 2016. The survey required perfusionists to answer questions about work volume, scheduling, and employer-paid compensation including benefits. Participants were also required to submit a de-identified pay stub to validate the income they reported. Descriptive statistics were calculated for all survey questions (e.g., percentages, means, and ranges). The study procured 481 responses, of which 287 were validated (i.e., respondents provided income verification that matched reported earnings). Variables that were examined within the validated sample population include job title, type of institution of employment, education level, years of experience, and geographic region, among others. Additional forms of compensation which may affect base compensation rates were also calculated including benefits, call time, bonuses, and pay for ancillary services (e.g., extracorporeal membrane oxygenation and ventricular assist device). In conclusion, in 2015, the average salary for all perfusionists is $127,600 with 19 years' experience. This research explores the average salary within subpopulations based on other factors such as position role, employer type, and geography. Information from this study is presented to guide employer compensation programs and suggests the need for further study in consideration of attrition rates and generational changes (i.e., perfusionists reaching retirement age) occurring alongside the present perfusionist staffing shortage affecting many parts of the country. PMID:27994258
Construct, Concurrent and Predictive Validity of the URICA: Data from Two Multi-site Clinical Trials
Field, Craig A.; Adinoff, Bryon; Harris, T. Robert; Ball, Samuel A.; Carroll, Kathleen M.
2011-01-01
Background A better understanding of how to measure motivation to change and how it relates to behavior change in patients with drug and alcohol dependence would broaden our understanding of the role of motivation in addiction treatment. Methods Two multi-site, randomized clinical trials comparing brief motivational interventions with standard care were conducted in the National Institute on Drug Abuse Clinical Trials Network. Patients with primary drug dependence and alcohol dependence entering outpatient treatment participated in a study of either Motivational Enhancement Therapy (n=431) or Motivational Interviewing (n=423). The construct, concurrent, and predictive validity of two composite measures of motivation to change derived from the University of Rhode Island Change Assessment (URICA): Readiness to Change (RTC) and Committed Action (CA) were evaluated. Results Confirmatory factor analysis confirmed the a priori factor structure of the URICA. RTC was significantly associated with measures of addiction severity at baseline (r=.12-.52, p<.05). Although statistically significant (p<.01), the correlations between treatment outcomes and RTC were low (r = -.15 and -.18). Additional analyses did not support a moderating or mediating effect of motivation on treatment retention or substance use. Conclusions The construct validity of the URICA was confirmed separately in a large sample of drug- and alcohol-dependent patients. However, evidence for the predictive validity of composite scores was very limited and there were no moderating or mediating effects of either measure on treatment outcome. Thus, increased motivation to change, as measured by the composite scores of motivation derived from the URICA, does not appear to influence treatment outcome. PMID:19157723
Multicentre study for validation of the French addictovigilance network reports assessment tool
Hardouin, Jean Benoit; Rousselet, Morgane; Gerardin, Marie; Guerlais, Marylène; Guillou, Morgane; Bronnec, Marie; Sébille, Véronique; Jolliet, Pascale
2016-01-01
Aims The French health authority (ANSM) is responsible for monitoring medicinal and other drug dependencies. To support these activities, the ANSM manages a network of 13 drug dependence evaluation and information centres (Centres d'Evaluation et d'Information sur la Pharmacodépendance ‐ Addictovigilance ‐ CEIP‐A) throughout France. In 2006, the Nantes CEIP‐A created a new tool called the EGAP (Echelle de GrAvité de la Pharmacodépendance‐ drug dependence severity scale) based on DSM IV criteria. This tool allows the creation of a substance use profile that enables the drug dependence severity to be homogeneously quantified by assigning a score to each substance indicated in the reports from health professionals. This article describes the validation and psychometric properties of the drug dependence severity score obtained from the scale ( Clinicaltrials.gov NCT01052675). Method The validity of the EGAP construct, the concurrent validity and the discriminative ability of the EGAP score, the consistency of answers to EGAP items, the internal consistency and inter rater reliability of the EGAP score were assessed using statistical methods that are generally used for psychometric tests. Results The total EGAP score was a reliable and precise measure for evaluating drug dependence (Cronbach alpha = 0.84; ASI correlation = 0.70; global ICC = 0.92). In addition to its good psychometric properties, the EGAP is a simple and efficient tool that can be easily specified on the official ANSM notification form. Conclusion The good psychometric properties of the total EGAP score justify its use for evaluating the severity of drug dependence. PMID:27302554
Amosun, Seyi L.; Shilalukey-Ngoma, Mary P.; Kafaar, Zuhayr
2017-01-01
Background Very little is known on outcome measures for children with spina bifida (SB) in Zambia. If rehabilitation professionals managing children with SB in Zambia and other parts of sub-Saharan Africa are to instigate measuring outcomes routinely, a tool has to be made available. The main objective of this study was to develop an appropriate and culturally sensitive instrument for evaluating the impact of the interventions on children with SB in Zambia. Methods A mixed design method was used for the study. Domains were identified retrospectively and confirmation was done through a systematic review study. Items were generated through semi-structured interviews and focus group discussions. Qualitative data were downloaded, translated into English, transcribed verbatim and presented. These were then placed into categories of the main domains of care deductively through the process of manifest content analysis. Descriptive statistics, alpha coefficient and index of content validity were calculated using SPSS. Results Self-care, mobility and social function were identified as main domains, while participation and communication were sub-domains. A total of 100 statements were generated and 78 items were selected deductively. An alpha coefficient of 0.98 was computed and experts judged the items. Conclusions The new functional measure with an acceptable level of content validity titled Zambia Spina Bifida Functional Measure (ZSBFM) was developed. It was designed to evaluate effectiveness of interventions given to children with SB from the age of 6 months to 5 years. Psychometric properties of reliability and construct validity were tested and are reported in another study. PMID:28951850
Taylor, William J; Redden, David; Dalbeth, Nicola; Schumacher, H Ralph; Edwards, N Lawrence; Simon, Lee S; John, Markus R; Essex, Margaret N; Watson, Douglas J; Evans, Robert; Rome, Keith; Singh, Jasvinder A
2014-01-01
Objective To determine the extent to which instruments that measure core outcome domains in acute gout fulfil the OMERACT filter requirements of truth, discrimination and feasibility. Methods Patient-level data from four randomised controlled trials of agents designed to treat acute gout and one observational study of acute gout were analysed. For each available measure construct validity, test-retest reliability, within-group change using effect size, between-group change using the Kruskal-Wallis statistic and repeated measures generalised estimating equations were assessed. Floor and ceiling effects were also assessed and the MCID was estimated. These analyses were presented to participants at OMERACT 11 to help inform voting for possible endorsement. Results There was evidence for construct validity and discriminative ability for 3 measures of pain (0 to 4 Likert, 0 to 10 numeric rating scale, 0 to 100 mm visual analogue scale). Likewise, there appears to be sufficient evidence for a 4-point Likert scale to possess construct validity and discriminative ability for physician assessment of joint swelling and joint tenderness. There was some evidence for construct validity and within-group discriminative ability for the Health Assessment Questionnaire as a measure of activity limitations, but not for discrimination between groups allocated to different treatment. Conclusions There is sufficient evidence to support measures of pain (using Likert, numeric rating scale or visual analogue scales), joint tenderness and swelling (using Likert scale) as fulfilling the requirements of the OMERACT filter. Further research on a measure of activity limitations in acute gout clinical trials is required. PMID:24429178
Validating Actigraphy as a Measure of Sleep for Preschool Children
Bélanger, Marie-Ève; Bernier, Annie; Paquet, Jean; Simard, Valérie; Carrier, Julie
2013-01-01
Study Objectives: The algorithms used to derive sleep variables from actigraphy were developed with adults. Because children change position during sleep more often than adults, algorithms may detect wakefulness when the child is actually sleeping (false negative). This study compares the validity of three algorithms for detecting sleep with actigraphy by comparing them to PSG in preschoolers. The putative influence of device location (wrist or ankle) is also examined. Methods: Twelve children aged 2 to 5 years simultaneously wore an actigraph on an ankle and a wrist (Actiwatch-L, Mini-Mitter/Respironics) during a night of PSG recording at home. Three algorithms were tested: one recommended for adults and two designed to decrease false negative detection of sleep in children. Results: Actigraphy generally showed good sensitivity (> 95%; PSG sleep detection) but low specificity (± 50%; PSG wake detection). Intraclass correlations between PSG and actigraphy variables were strong (> 0.80) for sleep latency, sleep duration, and sleep efficiency, but weak for number of awakenings (< 0.40). The two algorithms designed for children enhanced the validity of actigraphy in preschoolers and increased the proportion of actigraphy-scored wake epochs that were also identified as wake by PSG. Sleep variables derived from the ankle and wrist were not statistically different. Conclusion: Despite the weak detection of wakefulness, Actiwatch-L appears to be a useful instrument for assessing sleep in preschoolers when used with an adapted algorithm. Citation: Bélanger M; Bernier A; Paquet J; Simard V; Carrier J. Validating actigraphy as a measure of sleep for pre-school children. J Clin Sleep Med 2013;9(7):701-706. PMID:23853565
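Epoch-by-epoch sensitivity and specificity of the kind reported above are computed by comparing device scores against PSG as the reference. A sketch with a hypothetical 10-epoch record (not the study's data):

```python
def sensitivity_specificity(actigraphy, psg):
    """Epoch-by-epoch agreement with PSG as the reference (1 = sleep,
    0 = wake). Sensitivity = fraction of PSG-sleep epochs the device
    also scores as sleep; specificity = fraction of PSG-wake epochs
    the device scores as wake."""
    tp = sum(1 for a, p in zip(actigraphy, psg) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actigraphy, psg) if a == 0 and p == 0)
    fn = sum(1 for a, p in zip(actigraphy, psg) if a == 0 and p == 1)
    fp = sum(1 for a, p in zip(actigraphy, psg) if a == 1 and p == 0)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical record: the device scores nearly every epoch as sleep,
# giving perfect sensitivity but poor specificity -- the pattern the
# study reports for actigraphy in preschoolers.
acti = [1, 1, 1, 1, 1, 1, 1, 0, 1, 1]
psg  = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(acti, psg)
print(sens, spec)
```

Because sleep epochs dominate a night's recording, a device biased toward scoring "sleep" can look highly sensitive while missing most wake epochs, which is why the adapted child algorithms target specificity.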
Wongpakaran, Nahathai
2012-01-01
Objective The Rosenberg Self-Esteem Scale (RSES) is a widely used instrument that has been tested for reliability and validity in many settings; however, some negatively worded items appear to have caused it to show low reliability in a number of studies. In this study, we revised one negative item that, in previous studies, had produced the worst outcome in terms of the structure of the scale, then re-analyzed the new version for its reliability and construct validity, comparing it to the original version with respect to fit indices. Methods In total, 851 students from Chiang Mai University (mean age: 19.51±1.7, 57% of whom were female), participated in this study. Of these, 664 students completed the Thai version of the original RSES - containing five positively worded and five negatively worded items, while 187 students used the revised version containing six positively worded and four negatively worded items. Confirmatory factor analysis was applied, using a uni-dimensional model with method effects and a correlated uniqueness approach. Results The revised version showed the same level of reliability (good) as the original, but yielded a better model fit. The revised RSES demonstrated excellent fit statistics, with χ2=29.19 (df=19, n=187, p=0.063), GFI=0.970, TFI=0.969, NFI=0.964, CFI=0.987, SRMR=0.040 and RMSEA=0.054. Conclusion The revised version of the Thai RSES demonstrated an equivalent level of reliability but better construct validity when compared to the original. PMID:22396685
A validation study of the Chinese-Cantonese Addenbrooke’s Cognitive Examination Revised (C-ACER)
Wong, LL; Chan, CC; Leung, JL; Yung, CY; Wu, KK; Cheung, SYY; Lam, CLM
2013-01-01
Background There is no valid instrument for multidomain cognitive assessment to aid the detection of mild cognitive impairment (MCI) and mild dementia in Hong Kong. This study aimed to validate the Cantonese Addenbrooke’s Cognitive Examination Revised (C-ACER) in the identification of MCI and dementia. Methods 147 participants (Dementia, n = 54; MCI, n = 50; controls, n = 43) aged 60 or above were assessed by a psychiatrist using C-ACER. The C-ACER scores were validated against the expert diagnosis according to DSM-IV criteria for dementia and Petersen criteria for MCI. Statistical analysis was performed using the receiver operating characteristic method and regression analyses. Results The optimal cut-off score for the C-ACER to differentiate MCI from normal controls was 79/80, giving the sensitivity of 0.74, specificity of 0.84 and area under curve (AUC) of 0.84. At the optimal cut-off of 73/74, C-ACER had satisfactory sensitivity (0.93), specificity (0.95) and AUC (0.98) to identify dementia from controls. Performance of C-ACER, as reflected by AUC, was not affected after adjustment of the effect of education level. Total C-ACER scores were significantly correlated with scores of global deterioration scale (Spearman’s rho = −0.73, P < 0.01). Conclusion C-ACER is a sensitive and specific bedside test to assess a broad spectrum of cognitive abilities, and to detect MCI and dementia of different severity. It can be used and interpreted with ease, without the need to adjust for education level in persons aged 60 or above. PMID:23785235
Kenyon, Lisa K.; Elliott, James M; Cheng, M. Samuel
2016-01-01
Purpose/Background Despite the availability of various field-tests for many competitive sports, a reliable and valid test specifically developed for use in men's gymnastics has not yet been developed. The Men's Gymnastics Functional Measurement Tool (MGFMT) was designed to assess sport-specific physical abilities in male competitive gymnasts. The purpose of this study was to develop the MGFMT by establishing a scoring system for individual test items and to initiate the process of establishing test-retest reliability and construct validity. Methods A total of 83 competitive male gymnasts ages 7-18 underwent testing using the MGFMT. Thirty of these subjects underwent re-testing one week later in order to assess test-retest reliability. Construct validity was assessed using a simple regression analysis between total MGFMT scores and the gymnasts’ USA-Gymnastics competitive level to calculate the coefficient of determination (r2). Test-retest reliability was analyzed using Model 1 Intraclass correlation coefficients (ICC). Statistical significance was set at the p<0.05 level. Results The relationship between total MGFMT scores and subjects’ current USA-Gymnastics competitive level was found to be good (r2 = 0.63). Reliability testing of the MGFMT composite test score showed excellent test-retest reliability over a one-week period (ICC = 0.97). Test-retest reliability of the individual component tests ranged from good to excellent (ICC = 0.75-0.97). Conclusions The results of this study provide initial support for the construct validity and test-retest reliability of the MGFMT. Level of Evidence Level 3 PMID:27999723
Liu, Sha; Zhang, Fuquan; Wang, Xijin; Shugart, Yin Yao; Zhao, Yingying; Li, Xinrong; Liu, Zhifen; Sun, Ning; Yang, Chunxia; Zhang, Kerang; Yue, Weihua; Yu, Xin; Xu, Yong
2017-11-10
There is an increasing interest in searching for biomarkers for schizophrenia (SZ) diagnosis, which would overcome the drawbacks inherent in subjective diagnostic methods. MicroRNA (miRNA) fingerprints have been explored for disease diagnosis. We performed a meta-analysis to examine the diagnostic value of miRNAs for SZ and further validated the meta-analysis results. Using the following terms: schizophrenia/SZ, microRNA/miRNA, diagnosis, sensitivity and specificity, we searched databases, restricted to the English language, and reviewed all articles published from January 1990 to October 2016. All extracted data were statistically analyzed and the results were further validated with peripheral blood mononuclear cells (PBMNCs) isolated from patients and healthy controls using RT-qPCR and receiver operating characteristic (ROC) analysis. A total of 6 studies involving 330 patients and 202 healthy controls were included for meta-analysis. The pooled sensitivity, specificity and diagnostic odds ratio were 0.81 (95% CI: 0.75-0.86), 0.81 (95% CI: 0.72-0.88) and 18 (95% CI: 9-34), respectively; the positive and negative likelihood ratios were 4.3 and 0.24, respectively; the area under the curve in summary ROC was 0.87 (95% CI: 0.84-0.90). Validation revealed that miR-181b-5p, miR-21-5p, miR-195-5p, miR-137, miR-346 and miR-34a-5p in PBMNCs had high diagnostic sensitivity and specificity in the context of schizophrenia. In conclusion, blood-derived miRNAs might be promising biomarkers for SZ diagnosis.
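The likelihood ratios and diagnostic odds ratio above follow arithmetically from sensitivity and specificity. A sketch of that arithmetic; applying it to the pooled summary values gives a negative likelihood ratio of about 0.23 rather than the reported 0.24, a small difference that presumably reflects pooling at the level of the individual 2×2 tables rather than of the summary estimates:

```python
def diagnostic_summaries(sens, spec):
    """Positive/negative likelihood ratios and diagnostic odds ratio
    from sensitivity and specificity."""
    lr_pos = sens / (1 - spec)        # how much a positive test raises the odds
    lr_neg = (1 - sens) / spec        # how much a negative test lowers the odds
    dor = lr_pos / lr_neg             # overall discriminatory power
    return lr_pos, lr_neg, dor

# Pooled values reported in the abstract: sensitivity 0.81, specificity 0.81
lr_pos, lr_neg, dor = diagnostic_summaries(0.81, 0.81)
print(round(lr_pos, 1), round(lr_neg, 2), round(dor))  # → 4.3 0.23 18
```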
Cancer-related Concerns of Spouses of Women with Breast Cancer
Fletcher, Kristin A.; Lewis, Frances Marcus; Haberman, Mel R.
2009-01-01
Objective To describe spouses' reported cancer-related demands attributed to their wife's breast cancer and to test the construct and predictive validity of a brief standardized measure of these demands. Methods Cross-sectional and longitudinal data were obtained from 151 spouses of women newly diagnosed with non-metastatic breast cancer. Descriptive statistics were computed to describe spouses' dominant cancer-related demands and multivariate regression analyses tested the construct and predictive validity of the standardized measure. Results Five categories of spouses' cancer-related demands were identified, such as concerns about: spouses' own functioning; wife's well being and response to treatment; couples' sexual activities; the family's and children's well-being; and the spouses' role in supporting their wives. A 33-item short version of the standardized measure of cancer demands demonstrated construct and predictive validity that was comparable to a 123-item version of the same questionnaire. Greater numbers of illness demands occurred when spouses were more depressed and had less confidence in their ability to manage the impact of the cancer (F=18.08 (3, 103), p<.001). Predictive validity was established by the short form's ability to significantly predict the quality of marital communication and spouses' self-efficacy at a two-month interval. Conclusion The short-version of the standardized measure of cancer-related demands shows promise for future application in clinic settings. Additional testing of the questionnaire is warranted. Spouses' breast cancer-related demands deserve attention by providers. In the absence of assisting them, spouses' illness pressures have deleterious consequences for the quality of marital communication and spouses' self-confidence. PMID:20014184
Ergo, Alex; Ritter, Julie; Gwatkin, Davidson R; Binkin, Nancy
2016-03-01
Equitable access to programs and health services is essential to achieving national and international health goals, but it is rarely assessed because of perceived measurement challenges. One of these challenges concerns the complexities of collecting the data needed to construct asset or wealth indices, which can involve asking as many as 40 survey questions, many with multiple responses. To determine whether the number of variables and questions could be reduced to a level low enough for more routine inclusion in evaluations and research without compromising programmatic conclusions, we used data from a program evaluation in Honduras that compared a pro-poor intervention with government clinic performance as well as data from a results-based financing project in Senegal. In both, the full Demographic and Health Survey (DHS) asset questionnaires had been used as part of the evaluations. Using the full DHS results as the "gold standard," we examined the effect of retaining successively smaller numbers of variables on the classification of the program clients in wealth quintiles. Principal components analysis was used to identify those variables in each country that demonstrated minimal absolute factor loading values for 8 different thresholds, ranging from 0.05 to 0.70. Cohen's kappa statistic was used to assess agreement. We found that the 111 asset variables and 41 questions in the Honduras DHS could be reduced to 9 variables, captured by only 8 survey questions (kappa statistic, 0.634), without substantially altering the wealth quintile distributions for either the pro-poor program or the government clinics or changing the resulting policy conclusions. In Senegal, the 103 asset variables and 36 questions could be reduced to 32 variables and 20 questions (kappa statistic, 0.882) while maintaining a consistent mix of users in each of the 2 lowest quintiles.
Less than 60% of the asset variables in the 2 countries' full DHS asset indices overlapped, and in none of the 8 simplified asset index iterations did this proportion exceed 50%. We conclude that substantially reducing the number of variables and questions used to assess equity is feasible, producing valid results and providing a less burdensome way for program implementers or researchers to evaluate whether their interventions are pro-poor. Developing a standardized, simplified asset questionnaire that could be used across countries may prove difficult, however, given that the variables that contribute the most to the asset index are largely country-specific. © Ergo et al.
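The reduce-then-compare workflow this abstract describes can be sketched in a few lines. The following is a minimal, self-contained illustration using synthetic data in place of the DHS asset variables (the sample size, number of indicators, and the 0.2 loading threshold are all illustrative assumptions, not the study's actual data or cut-offs): build a wealth index from the first principal component, drop variables with small absolute loadings, rebuild the index, and compare quintile assignments with Cohen's kappa.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical survey: 500 households, 20 binary asset indicators driven by
# one latent "wealth" factor (sizes are illustrative, not the DHS data).
n, p = 500, 20
wealth = rng.normal(size=n)
true_loadings = rng.uniform(0.2, 1.0, size=p)
X = ((wealth[:, None] * true_loadings + rng.normal(size=(n, p))) > 0).astype(float)

def first_pc_scores(X):
    """Score each household on the first principal component."""
    Xc = X - X.mean(axis=0)
    # SVD-based PCA: first right singular vector = PC1 loading vector
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    load = Vt[0]
    return Xc @ load, load

def quintiles(scores):
    """Assign wealth quintiles 0..4 by rank."""
    ranks = scores.argsort().argsort()
    return (ranks * 5 // len(scores)).astype(int)

full_scores, load = first_pc_scores(X)
full_q = quintiles(full_scores)

# Retain only variables whose absolute PC1 loading exceeds a threshold,
# mimicking the paper's loading-threshold reduction step (0.2 is invented).
keep = np.abs(load) >= 0.2
red_scores, _ = first_pc_scores(X[:, keep])
# PC sign is arbitrary; align the reduced index with the full one
if np.corrcoef(red_scores, full_scores)[0, 1] < 0:
    red_scores = -red_scores
red_q = quintiles(red_scores)

def cohens_kappa(a, b, k=5):
    """Unweighted Cohen's kappa for two categorical labelings."""
    conf = np.zeros((k, k))
    for i, j in zip(a, b):
        conf[i, j] += 1
    total = conf.sum()
    po = np.trace(conf) / total
    pe = (conf.sum(0) * conf.sum(1)).sum() / total**2
    return (po - pe) / (1 - pe)

kappa = cohens_kappa(full_q, red_q)
print(f"kept {keep.sum()} of {p} variables, kappa = {kappa:.3f}")
```

The kappa value then quantifies how well the reduced index reproduces the full index's quintile classification, which is the criterion the authors use to judge whether the simplification changes programmatic conclusions.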
Pasta nucleosynthesis: Molecular dynamics simulations of nuclear statistical equilibrium
NASA Astrophysics Data System (ADS)
Caplan, M. E.; Schneider, A. S.; Horowitz, C. J.; Berry, D. K.
2015-06-01
Background: Exotic nonspherical nuclear pasta shapes are expected in nuclear matter just below saturation density because of competition between short-range nuclear attraction and long-range Coulomb repulsion. Purpose: We explore the impact nuclear pasta may have on nucleosynthesis during neutron star mergers, when cold dense nuclear matter is ejected and decompressed. Methods: We use a hybrid CPU/GPU molecular dynamics (MD) code to perform decompression simulations of cold dense matter with 51,200 and 409,600 nucleons, from 0.080 fm^-3 down to 0.00125 fm^-3. Simulations are run for proton fractions YP = 0.05, 0.10, 0.20, 0.30, and 0.40 at temperatures T = 0.5, 0.75, and 1.0 MeV. The final composition of each simulation is obtained using a cluster algorithm and compared to a constant-density run. Results: The sizes of nuclei in the final state of the decompression runs are in good agreement with nuclear statistical equilibrium (NSE) models at temperatures of 1 MeV, while constant-density runs produce nuclei smaller than those obtained with NSE. Our MD simulations produce unphysical results, with large rod-like nuclei, in the final state of the T = 0.5 MeV runs. Conclusions: Our MD model is valid at higher densities than simple nuclear statistical equilibrium models and may help determine the initial temperatures and proton fractions of matter ejected in mergers.
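The cluster-identification step mentioned in the Methods can be sketched simply. The toy example below (synthetic 3D positions, an invented 3 fm linking distance, no periodic boundaries — all assumptions, not the authors' code) groups nucleons into "nuclei" by linking any pair closer than a cutoff, using a union-find structure:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy configuration: two well-separated blobs of "nucleons" (positions in fm)
blob_a = rng.normal(loc=0.0, scale=1.0, size=(30, 3))
blob_b = rng.normal(loc=20.0, scale=1.0, size=(20, 3))
pos = np.vstack([blob_a, blob_b])
cutoff = 3.0  # linking distance in fm (illustrative choice)

# Union-find: nucleons closer than the cutoff join the same cluster
parent = list(range(len(pos)))

def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path compression
        i = parent[i]
    return i

def union(i, j):
    parent[find(i)] = find(j)

for i in range(len(pos)):
    for j in range(i + 1, len(pos)):
        if np.linalg.norm(pos[i] - pos[j]) < cutoff:
            union(i, j)

roots = {find(i) for i in range(len(pos))}
sizes = sorted(sum(find(i) == r for i in range(len(pos))) for r in roots)
print(f"{len(roots)} clusters with sizes {sizes}")
```

A production MD analysis would use a neighbor list and periodic boundary conditions rather than the O(N²) loop shown here; the sketch only illustrates the clustering idea.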
What can 35 years and over 700,000 measurements tell us about noise exposure in the mining industry?
Roberts, Benjamin; Sun, Kan; Neitzel, Richard L.
2017-01-01
Objective To analyze over 700,000 cross-sectional measurements from the Mine Safety and Health Administration (MSHA) and develop statistical models to predict noise exposure for a worker. Design Descriptive statistics were used to summarize the data. Two linear regression models were used to predict noise exposure based on the MSHA permissible exposure limit (PEL) and action level (AL), respectively. Two-fold cross-validation was used to compare the exposure estimates from the models to actual measurements in the holdout data. The mean difference and t-statistic were calculated for each job title to determine whether the model exposure predictions were significantly different from the actual data. Study Sample Measurements were acquired from MSHA through a Freedom of Information Act request. Results From 1979 to 2014 the average noise measurement has decreased. Measurements taken before the implementation of MSHA's revised noise regulation in 2000 were on average 4.5 dBA higher than after the law came into effect. Both models produced mean exposure predictions that differed by less than 1 dBA from the holdout data. Conclusion Overall noise levels in mines have been decreasing. However, this decrease has not been uniform across all mining sectors. The exposure predictions from the model will be useful for predicting hearing loss in workers in the mining industry. PMID:27871188
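The modeling-and-validation design this abstract describes can be sketched as follows. This is a hypothetical illustration with synthetic data standing in for the MSHA measurements (the coefficients, sample size, and job coding are invented): fit a linear regression of noise level on year and job title, then use two-fold cross-validation to compare predictions against held-out measurements, job title by job title.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: noise dose (dBA) declining over time, with job-specific
# offsets (all parameters are illustrative, not the MSHA dataset).
n, n_jobs = 2000, 5
year = rng.integers(1979, 2015, size=n)
job = rng.integers(0, n_jobs, size=n)
job_offset = rng.normal(0, 3, size=n_jobs)
dba = 95 - 0.15 * (year - 1979) + job_offset[job] + rng.normal(0, 4, size=n)

# Design matrix: intercept, centered year, job-title dummies (reference = job 0)
X = np.column_stack([np.ones(n), year - year.mean(),
                     *(job == j for j in range(1, n_jobs))]).astype(float)

# Two-fold cross-validation: fit on one half, predict the held-out half
idx = rng.permutation(n)
halves = (idx[: n // 2], idx[n // 2:])
preds = np.empty(n)
for train, test in (halves, halves[::-1]):
    beta, *_ = np.linalg.lstsq(X[train], dba[train], rcond=None)
    preds[test] = X[test] @ beta

# Mean difference between predictions and held-out measurements per job title
for j in range(n_jobs):
    m = job == j
    print(f"job {j}: mean prediction error {(preds[m] - dba[m]).mean():+.2f} dBA")
```

Per-job mean differences near zero, as in the study's "less than 1 dBA" finding, indicate that the regression generalizes to measurements it was not fit on.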
Students' Initial Knowledge State and Test Design: Towards a Valid and Reliable Test Instrument
ERIC Educational Resources Information Center
CoPo, Antonio Roland I.
2015-01-01
Designing a good test instrument involves specifications, test construction, validation, try-out, analysis, and revision. The initial knowledge state of forty (40) tertiary students enrolled in a Business Statistics course was determined, and the same test instrument underwent validation. The designed test instrument not only revealed the baseline…
The Michigan Alcoholism Screening Test (MAST): A Statistical Validation Analysis
ERIC Educational Resources Information Center
Laux, John M.; Newman, Isadore; Brown, Russ
2004-01-01
This study extends the Michigan Alcoholism Screening Test (MAST; M. L. Selzer, 1971) literature base by examining 4 issues related to the validity of the MAST scores. Specifically, the authors examine the validity of the MAST scores in light of the presence of impression management, participant demographic variables, and item endorsement…
Complete Statistical Survey Results of 1982 Texas Competency Validation Project.
ERIC Educational Resources Information Center
Rogers, Sandra K.; Dahlberg, Maurine F.
This report documents a project to develop current statewide validated competencies for auto mechanics, diesel mechanics, welding, office occupations, and printing. Section 1 describes the four steps used in the current competency validation project and provides a standardized process for conducting future studies at the local or statewide level.…
Analyzing the Validity of the Adult-Adolescent Parenting Inventory for Low-Income Populations
ERIC Educational Resources Information Center
Lawson, Michael A.; Alameda-Lawson, Tania; Byrnes, Edward
2017-01-01
Objectives: The purpose of this study was to examine the construct and predictive validity of the Adult-Adolescent Parenting Inventory (AAPI-2). Methods: The validity of the AAPI-2 was evaluated using multiple statistical methods, including exploratory factor analysis, confirmatory factor analysis, and latent class analysis. These analyses were…
The discriminant (and convergent) validity of the Personality Inventory for DSM-5.
Crego, Cristina; Gore, Whitney L; Rojas, Stephanie L; Widiger, Thomas A
2015-10-01
A considerable body of research has rapidly accumulated with respect to the validity of the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5) dimensional trait model as it is assessed by the Personality Inventory for Diagnostic and Statistical Manual of Mental Disorders (PID-5; Krueger et al., 2012). This research though has not focused specifically on discriminant validity, although allusions to potentially problematic discriminant validity have been raised. The current study addressed discriminant validity, reporting for the first time the correlations among the PID-5 domain scales. Also reported are the bivariate correlations of the 25 PID-5 maladaptive trait scales with the personality domain scales of the NEO Personality Inventory-Revised (Costa & McCrae, 1992), the International Personality Item Pool-NEO (Goldberg et al., 2006), the Inventory of Personal Characteristics (Almagor et al., 1995), the 5-Dimensional Personality Test (van Kampen, 2012), and the HEXACO Personality Inventory-Revised (Lee & Ashton, 2004). The results are discussed with respect to the implications of and alternative explanations for potentially problematic discriminant validity. (PsycINFO Database Record (c) 2015 APA, all rights reserved).
Gaudin, Valérie
2017-09-01
Screening methods are used as a first-line approach to detect the presence of antibiotic residues in food of animal origin. The validation process guarantees that the method is fit-for-purpose, suited to regulatory requirements, and provides evidence of its performance. This article is focused on intra-laboratory validation. The first step in validation is characterisation of performance, and the second step is the validation itself with regard to pre-established criteria. The validation approaches can be absolute (a single method) or relative (comparison of methods), overall (combination of several characteristics in one) or criterion-by-criterion. Various approaches to validation, in the form of regulations, guidelines or standards, are presented and discussed to draw conclusions on their potential application for different residue screening methods, and to determine whether or not they reach the same conclusions. The approach by comparison of methods is not suitable for screening methods for antibiotic residues. The overall approaches, such as probability of detection (POD) and accuracy profile, are increasingly used in other fields of application. They may be of interest for screening methods for antibiotic residues. Finally, the criterion-by-criterion approach (Decision 2002/657/EC and of European guideline for the validation of screening methods), usually applied to the screening methods for antibiotic residues, introduced a major characteristic and an improvement in the validation, i.e. the detection capability (CCβ). In conclusion, screening methods are constantly evolving, thanks to the development of new biosensors or liquid chromatography coupled to tandem-mass spectrometry (LC-MS/MS) methods. There have been clear changes in validation approaches these last 20 years. Continued progress is required and perspectives for future development of guidelines, regulations and standards for validation are presented here.
Selecting the "Best" Factor Structure and Moving Measurement Validation Forward: An Illustration.
Schmitt, Thomas A; Sass, Daniel A; Chappelle, Wayne; Thompson, William
2018-04-09
Despite the broad literature base on factor analysis best practices, research seeking to evaluate a measure's psychometric properties frequently fails to consider or follow these recommendations. This leads to incorrect factor structures; numerous, often overly complex competing factor models; and, perhaps most harmful, biased model results. Our goal is to demonstrate a practical and actionable process for factor analysis through (a) an overview of six statistical and psychometric issues and approaches to be aware of, investigate, and report when engaging in factor structure validation, along with a flowchart of recommended procedures for understanding latent factor structures; (b) a demonstration of these issues via a summary of the updated Posttraumatic Stress Disorder Checklist (PCL-5) factor models and a rationale for validation; and (c) a comprehensive statistical and psychometric validation of the PCL-5 factor structure that illustrates all the issues described earlier. Considering previous research, the PCL-5 was evaluated using a sample of 1,403 U.S. Air Force remotely piloted aircraft operators with high levels of battlefield exposure. Previously proposed PCL-5 factor structures were not supported by the data; instead, a bifactor model was arguably more statistically appropriate.
Magalhães, Eunice; Calheiros, María M
2015-01-01
Despite significant scientific advances in the place attachment literature, no instruments have been specifically developed or adapted for residential care. A total of 410 adolescents (11 to 18 years old) participated in this study. The place attachment scale evaluates five dimensions: Place identity, Place dependence, Institutional bonding, Caregiver bonding, and Friend bonding. Data analysis included descriptive statistics, content validity, construct validity (confirmatory factor analysis), concurrent validity through correlations with satisfaction with life and with the institution, and reliability evidence. The relationship with individual characteristics and placement length was also examined. Content validity analysis revealed that more than half of the panellists perceived all items as relevant for assessing the construct in residential care. The five-dimension structure showed good fit statistics, and concurrent validity evidence was found, with significant correlations with satisfaction with life and with the institution. Acceptable internal consistency values and specific gender differences were found. The preliminary psychometric properties of this scale suggest its potential for use with youth in care.
Individualism: a valid and important dimension of cultural differences between nations.
Schimmack, Ulrich; Oishi, Shigehiro; Diener, Ed
2005-01-01
Oyserman, Coon, and Kemmelmeier's (2002) meta-analysis suggested problems in the measurement of individualism and collectivism. Studies using Hofstede's individualism scores show little convergent validity with more recent measures of individualism and collectivism. We propose that the lack of convergent validity is due to national differences in response styles. Whereas Hofstede statistically controlled for response styles, Oyserman et al.'s meta-analysis relied on uncorrected ratings. Data from an international student survey demonstrated convergent validity between Hofstede's individualism dimension and horizontal individualism when response styles were statistically controlled, whereas uncorrected scores correlated highly with the individualism scores in Oyserman et al.'s meta-analysis. Uncorrected horizontal individualism scores and meta-analytic individualism scores did not correlate significantly with nations' development, whereas corrected horizontal individualism scores and Hofstede's individualism dimension were significantly correlated with development. This pattern of results suggests that individualism is a valid construct for cross-cultural comparisons, but that the measurement of this construct needs improvement.
Mindful attention and awareness: relationships with psychopathology and emotion regulation.
Gregório, Sónia; Pinto-Gouveia, José
2013-01-01
The growing interest in mindfulness within the scientific community has given rise to several self-report measures of this psychological construct. The Mindful Attention and Awareness Scale (MAAS) is a self-report measure of mindfulness at the trait level. This paper aims to explore the psychometric characteristics of the MAAS and to validate it for the Portuguese population. The first two studies replicate some of the original authors' statistical procedures, in particular confirmatory factor analyses, in two different samples from the Portuguese general community population. Results from both analyses confirmed the scale's single-factor structure and indicated very good reliability. Moreover, cross-validation statistics showed that this single-factor structure is valid across different respondents from the general community population. In the third study, the Portuguese version of the MAAS was found to have good convergent and discriminant validity. Overall, the findings support the psychometric validity of the Portuguese version of the MAAS and suggest that it is a reliable self-report measure of trait mindfulness, a central construct in clinical psychology research and intervention.
Hazing DEOCS 4.1 Construct Validity Summary
2017-08-01
Hazing DEOCS 4.1 Construct Validity Summary. Defense Equal Opportunity Management Institute, Directorate of… the analysis. Tables 4–6 provide additional information regarding the descriptive statistics and reliability of the Hazing items. Table 7 provides
2013-01-01
Background In recent years response rates on telephone surveys have been declining. Rates for the Behavioral Risk Factor Surveillance System (BRFSS) have also declined, prompting the use of new methods of weighting and the inclusion of cell phone sampling frames. A number of scholars and researchers have conducted studies of the reliability and validity of the BRFSS estimates in the context of these changes. As the BRFSS makes changes in its methods of sampling and weighting, a review of reliability and validity studies of the BRFSS is needed. Methods To assess the reliability and validity of prevalence estimates taken from the BRFSS, scholarship published from 2004 to 2011 dealing with tests of reliability and validity of BRFSS measures was compiled and presented by health risk behavior topic. The quality of each publication was assessed using a categorical rubric. Higher rankings were achieved by authors who conducted reliability tests using repeated test/retest measures, or who conducted tests using multiple samples. A similar rubric was used to rank validity assessments. Validity tests that compared the BRFSS to physical measures were ranked higher than those comparing the BRFSS to other self-reported data. Literature that undertook more sophisticated statistical comparisons was also ranked higher. Results Overall findings indicated that BRFSS prevalence rates were comparable to those of other national surveys that rely on self-reports, although specific differences are noted for some categories of response. BRFSS prevalence rates were less similar to those of surveys that utilize physical measures in addition to self-reported data. There is very little research on reliability and validity for some health topics, but a great deal of information supporting the validity of the BRFSS data for others.
Conclusions This examination of the BRFSS was limited by question differences among the comparison surveys, as well as by differences in mode of data collection. As the BRFSS moves to incorporating cell phone data and changing weighting methods, a review of reliability and validity research indicated that past landline-only BRFSS data were reliable and valid as measured against other surveys. New analyses and comparisons of BRFSS data that include the new methodologies and cell phone data will be needed to ascertain the impact of these changes on estimates in the future. PMID:23522349
[The research protocol VI: How to choose the appropriate statistical test. Inferential statistics].
Flores-Ruiz, Eric; Miranda-Novales, María Guadalupe; Villasís-Keever, Miguel Ángel
2017-01-01
The statistical analysis can be divided into two main components: descriptive analysis and inferential analysis. Inference involves drawing conclusions about a population from tests performed on data obtained from a sample of that population. Statistical tests are used to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was drawn. However, choosing the appropriate statistical test often poses a challenge for novice researchers. Choosing a statistical test requires taking three aspects into account: the research design, the number of measurements, and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test makes it easier for readers to understand and apply the results.
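The decision rules the article describes can be captured in a small helper. The sketch below encodes a common textbook mapping from design (independent vs. paired), number of groups, measurement scale, and normality to a test; it is an illustration of the idea, not the article's own algorithm, and real analyses should also verify assumptions such as normality directly.

```python
# Toy decision helper reflecting the article's three criteria: research
# design, number of measurements/groups, and scale of measurement.
# The mapping is a common textbook summary, not an exhaustive rule set.

def choose_test(groups, paired, scale, normal):
    """Suggest a statistical test for comparing groups.

    groups: number of groups compared (2 or more)
    paired: True for repeated/paired measurements
    scale:  'interval' (continuous) or 'ordinal'
    normal: True if interval data look normally distributed
    """
    parametric = scale == "interval" and normal
    if groups == 2:
        if paired:
            return "paired t-test" if parametric else "Wilcoxon signed-rank"
        return "independent t-test" if parametric else "Mann-Whitney U"
    if paired:
        return "repeated-measures ANOVA" if parametric else "Friedman"
    return "one-way ANOVA" if parametric else "Kruskal-Wallis"

print(choose_test(2, False, "interval", True))   # independent t-test
print(choose_test(3, False, "ordinal", False))   # Kruskal-Wallis
```

Encoding the choice this way makes the parametric/nonparametric branching explicit: the nonparametric alternative is selected whenever the data are ordinal or non-normal.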
Hospital-based expert model for health technology procurement planning in hospitals.
Miniati, R; Cecconi, G; Frosini, F; Dori, F; Regolini, J; Iadanza, E; Biffi Gentili, G
2014-01-01
Although technology innovation in healthcare has brought major improvements in the level of care and in patient quality of life in recent years, hospital complexity and management costs have risen. For this reason, planning for medical equipment procurement within hospitals has become increasingly important in order to sustainably provide appropriate technology for both routine activity and innovative procedures. To support hospital decision makers in technology procurement planning, an expert model was designed, as reported in this paper. It combines the most widely used approaches for technology evaluation, taking into consideration Health Technology Assessment (HTA) and the Medical Equipment Replacement Model (MERM). The design phases included an initial definition of prioritization algorithms, a weighting process based on experts' interviews, and a final model validation step that included both statistical testing and comparison with real decisions. In conclusion, the designed model provides a semi-automated tool that uses multidisciplinary information to prioritize different requests for technology acquisition in hospitals. Validation outcomes improved the model's accuracy and led to different "user profiles" according to the specific needs of decision makers.
Riedel-Heller, S G; Schork, A; Matschinger, H; Angermeyer, M C
2000-02-01
In light of the growing clinical interest in early indicators of dementia, numerous studies have examined the association between subjective memory complaints and cognitive performance in old age. Their results are contradictory. In this paper, studies carried out over the last 10 years are compared with regard to study design and the assessment instruments used. The results are discussed with particular reference to the diagnostic validity of subjective memory complaints. The majority of case-control studies and cross-sectional studies of non-representative samples could not demonstrate an association between subjective memory complaints and cognitive performance. Most field studies of larger representative population samples, however, have come to the opposite conclusion. A consistent assessment of these statistically significant associations against the background of diagnostic validity showed that memory complaints cannot be taken as a clear clinical indicator of cognitive impairment. Subjective memory complaints may reflect depressive disorders and a multitude of other processes, of which an objective impairment of cognitive performance is just one aspect. As a consequence, the inclusion of subjective memory complaints as a criterion for the diagnosis of "mild cognitive disorder" according to ICD-10 is not justified.
Epithelial Membrane Protein-2 Expression is an Early Predictor of Endometrial Cancer Development
Habeeb, Omar; Goodglick, Lee; Soslow, Robert A.; Rao, Rajiv; Gordon, Lynn K.; Schirripa, Osvaldo; Horvath, Steve; Braun, Jonathan; Seligson, David B.; Wadehra, Madhuri
2010-01-01
BACKGROUND Endometrial cancer (EC) is a common malignancy worldwide. It is often preceded by endometrial hyperplasia, whose management and risk of neoplastic progression vary. Previously, we have shown that the tetraspan protein Epithelial Membrane Protein-2 (EMP2) is a prognostic indicator of EC aggressiveness and survival. Here we validate the expression of EMP2 in EC and further examine whether EMP2 expression within preneoplastic lesions is an early prognostic biomarker for EC development. METHODS A tissue microarray (TMA) was constructed with a wide representation of benign and malignant endometrial samples. The TMA contains a metachronous cohort of cases from individuals who either developed or did not develop EC. The intensity and frequency of EMP2 expression were assessed using immunohistochemistry. RESULTS There was a stepwise, statistically significant increase in average EMP2 expression from benign to hyperplasia to atypia to EC. Furthermore, detailed analysis of EMP2 expression in potentially premalignant cases demonstrated that EMP2 positivity was a strong predictor of EC development. CONCLUSION EMP2 is an early predictor of EC development in preneoplastic lesions. In addition, combined with our previous findings, these results validate EMP2 as a novel biomarker for EC development. PMID:20578181
Spatial analysis techniques applied to uranium prospecting in Chihuahua State, Mexico
NASA Astrophysics Data System (ADS)
Hinojosa de la Garza, Octavio R.; Montero Cabrera, María Elena; Sanín, Luz H.; Reyes Cortés, Manuel; Martínez Meyer, Enrique
2014-07-01
To estimate the distribution of uranium minerals in Chihuahua, the advanced statistical model known as the Maximum Entropy Method (MaxEnt) was applied. A distinguishing feature of this method is that it can fit relatively complex models to small datasets, such as the locations of uranium ores in the State of Chihuahua. For georeferencing uranium ores, a database from the United States Geological Survey and a workgroup of experts in Mexico was used. The main contribution of this paper is the proposal of maximum entropy techniques to obtain the mineral's potential distribution. The model used 24 environmental layers, including topography, gravimetry, climate (WorldClim), and soil properties, to project the uranium distribution across the study area. For validation of the areas predicted by the model, comparisons were made with other research by the Mexican Geological Survey, with direct exploration of specific areas, and through interviews with former exploration workers of the company "Uranio de Mexico". Results. New uranium areas predicted by the model were validated, and some relationship was found between the model predictions and geological faults. Conclusions. Modeling by spatial analysis provides additional information to the energy and mineral resources sectors.
Diagnostic Crossover in Anorexia Nervosa and Bulimia Nervosa: Implications for DSM-V
Eddy, Kamryn T.; Dorer, David J.; Franko, Debra L.; Tahilani, Kavita; Thompson-Brenner, Heather; Herzog, David B.
2011-01-01
Objective The Diagnostic and Statistical Manual of Mental Disorders (DSM) is designed primarily as a clinical tool. Yet high rates of diagnostic “crossover” among the anorexia nervosa subtypes and bulimia nervosa may reflect problems with the validity of the current diagnostic schema, thereby limiting its clinical utility. This study was designed to examine diagnostic crossover longitudinally in anorexia nervosa and bulimia nervosa to inform the validity of the DSM-IV-TR eating disorders classification system. Method A total of 216 women with a diagnosis of anorexia nervosa or bulimia nervosa were followed for 7 years; weekly eating disorder symptom data collected using the Eating Disorder Longitudinal Interval Follow-Up Examination allowed for diagnoses to be made throughout the follow-up period. Results Over 7 years, the majority of women with anorexia nervosa experienced diagnostic crossover: more than half crossed between the restricting and binge eating/purging anorexia nervosa subtypes over time; one-third crossed over to bulimia nervosa but were likely to relapse into anorexia nervosa. Women with bulimia nervosa were unlikely to cross over to anorexia nervosa. Conclusions These findings support the longitudinal distinction of anorexia nervosa and bulimia nervosa but do not support the anorexia nervosa subtyping schema. PMID:18198267
ERIC Educational Resources Information Center
Williams, Amanda
2014-01-01
The purpose of the current research was to investigate the relationship between preference for numerical information (PNI), math self-concept, and six types of statistics anxiety in an attempt to establish support for the nomological validity of the PNI. Correlations indicate that four types of statistics anxiety were strongly related to PNI, and…