A statistical approach to selecting and confirming validation targets in -omics experiments
2012-01-01
Background Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets. Results Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result. Conclusions For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results. PMID:22738145
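The core calculation behind this strategy, confirming a small random sample of significant results and inferring a lower confidence bound on the true-positive fraction of the whole list, can be sketched as follows. This is an illustrative outline rather than the authors' published procedure: it uses a Wilson score lower bound and treats the confirmed sample as approximately binomial (reasonable when the sample is small relative to the list), and all counts are hypothetical.

```python
import math

def wilson_lower_bound(successes, n, z=1.96):
    """One-sided lower Wilson score bound for a binomial proportion:
    a conservative estimate of the true-positive rate of the full list."""
    if n == 0:
        return 0.0
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return max(0.0, (centre - margin) / denom)

# Hypothetical example: 18 of 20 randomly sampled hits were confirmed,
# so at least ~70% of the whole list is supported at this confidence level.
bound = wilson_lower_bound(18, 20)
```

Because the sample is random, the bound applies to the entire significant list, which is what confirming only the top-ranked hits cannot provide.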
Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes.
Harrington, Peter de Boves
2018-01-02
Validation of multivariate models is of current importance for a wide range of chemical applications. Although important, it is neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to the single validation set of data. In addition, no statistics are available regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes efficient use of the data because each object is used once for validation. Although the approach was reviewed a decade earlier, that review focused primarily on the optimization of chemometric models; this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported and powerful, matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.
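As a rough sketch of the bootstrapped Latin partition idea, every object is used exactly once for validation within each partition set, and the random partitioning is repeated to yield a distribution (and hence a confidence interval) for the figure of merit. The nearest-centroid classifier and the data below are hypothetical stand-ins, not Harrington's implementation:

```python
import random
import statistics

def latin_partitions(labels, n_parts, rng):
    """Randomly assign each object to one of n_parts folds, stratified so
    class proportions are preserved in every fold (Latin partitions)."""
    folds = [[] for _ in range(n_parts)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for k, idx in enumerate(idxs):
            folds[k % n_parts].append(idx)
    return folds

def nearest_centroid_accuracy(x, y, train, test):
    """Toy 1-D classifier: assign each test object to the nearest class mean."""
    cents = {}
    for cls in set(y):
        vals = [x[i] for i in train if y[i] == cls]
        cents[cls] = sum(vals) / len(vals)
    hits = sum(1 for i in test
               if min(cents, key=lambda c: abs(x[i] - cents[c])) == y[i])
    return hits / len(test)

rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(50)] + [rng.gauss(3, 1) for _ in range(50)]
y = [0] * 50 + [1] * 50

accs = []
for _ in range(20):                      # 20 bootstrapped partition sets
    folds = latin_partitions(y, 4, rng)  # each object validated once per set
    all_idx = set(range(len(y)))
    per_set = []
    for fold in folds:
        train = sorted(all_idx - set(fold))
        per_set.append(nearest_centroid_accuracy(x, y, train, fold))
    accs.append(sum(per_set) / len(per_set))

mean_acc = statistics.mean(accs)   # average FOM across partition sets
sd_acc = statistics.stdev(accs)    # its spread, usable for an interval
```

The spread of `accs` is what a single external validation set cannot supply: an empirical precision estimate for the figure of merit.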
ERIC Educational Resources Information Center
Idris, Khairiani; Yang, Kai-Lin
2017-01-01
This article reports the results of a mixed-methods approach to develop and validate an instrument to measure Indonesian pre-service teachers' conceptions of statistics. First, a phenomenographic study involving a sample of 44 participants uncovered six categories of conceptions of statistics. Second, an instrument of conceptions of statistics was…
LaBudde, Robert A; Harnly, James M
2012-01-01
A qualitative botanical identification method (BIM) is an analytical procedure that returns a binary result (1 = Identified, 0 = Not Identified). A BIM may be used by a buyer, manufacturer, or regulator to determine whether a botanical material being tested is the same as the target (desired) material, or whether it contains excessive nontarget (undesirable) material. The report describes the development and validation studies for a BIM based on the proportion of replicates identified, or probability of identification (POI), as the basic observed statistic. The statistical procedures proposed for data analysis follow closely those of the probability of detection, and harmonize the statistical concepts and parameters between quantitative and qualitative method validation. Use of POI statistics also harmonizes statistical concepts for botanical, microbiological, toxin, and other analyte identification methods that produce binary results. The POI statistical model provides a tool for graphical representation of response curves for qualitative methods, reporting of descriptive statistics, and application of performance requirements. Single collaborator and multicollaborative study examples are given.
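Since the POI is simply the proportion of replicates identified, its point estimate and interval follow standard binomial formulas. A minimal sketch with hypothetical replicate counts (a Wilson interval is used here for illustration; the published protocol may prescribe a different interval):

```python
import math

def poi_with_ci(identified, replicates, z=1.96):
    """Probability of identification (POI) with a Wilson 95% interval."""
    p = identified / replicates
    denom = 1 + z * z / replicates
    centre = (p + z * z / (2 * replicates)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / replicates
                                   + z * z / (4 * replicates ** 2))
    return p, max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical response curve: identified/replicates at decreasing
# target concentrations (each replicate is a binary 1/0 outcome).
curve = {1.0: (12, 12), 0.5: (11, 12), 0.1: (6, 12), 0.0: (1, 12)}
for conc, (x, n) in sorted(curve.items(), reverse=True):
    p, lo, hi = poi_with_ci(x, n)   # one point on the POI response curve
```

Plotting POI against concentration with these intervals gives the graphical response curve the abstract describes, directly analogous to probability-of-detection curves for quantitative methods.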
Assessing Discriminative Performance at External Validation of Clinical Prediction Models
Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W.
2016-01-01
Introduction External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting. Methods We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated it in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. Results The permutation test indicated that the validation and development set were homogeneous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. Conclusion The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population.
To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients. PMID:26881753
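The c-statistic at the heart of this comparison is the probability that a randomly chosen event patient receives a higher predicted risk than a randomly chosen non-event patient. A small simulation, assuming a logistic model with hypothetical parameters, illustrates why a narrower case-mix alone (identical predictor effects, as in scenario 2) lowers the observed c-statistic:

```python
import math
import random

def c_statistic(risks, outcomes):
    """Concordance: probability that a random event case has a higher
    predicted risk than a random non-event case (ties count 1/2)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    nonevents = [r for r, y in zip(risks, outcomes) if y == 0]
    conc = sum(1.0 if e > n else 0.5 if e == n else 0.0
               for e in events for n in nonevents)
    return conc / (len(events) * len(nonevents))

def simulate(sd_lp, n, rng):
    """Outcomes from a logistic model whose linear predictor has spread
    sd_lp; a wider spread means a more heterogeneous case-mix."""
    lp = [rng.gauss(0, sd_lp) for _ in range(n)]
    y = [1 if rng.random() < 1 / (1 + math.exp(-v)) else 0 for v in lp]
    return lp, y

rng = random.Random(1)
lp_dev, y_dev = simulate(1.5, 2000, rng)   # development: heterogeneous
lp_val, y_val = simulate(0.5, 2000, rng)   # validation: narrower case-mix
c_dev = c_statistic(lp_dev, y_dev)
c_val = c_statistic(lp_val, y_val)         # lower despite identical effects
```

The drop from `c_dev` to `c_val` involves no miscalibration at all, which is exactly why case-mix differences must be disentangled from incorrect coefficients before the validation c-statistic is interpreted.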
45 CFR 153.350 - Risk adjustment data validation standards.
Code of Federal Regulations, 2012 CFR
2012-10-01
... implementation of any risk adjustment software and ensure proper validation of a statistically valid sample of... respect to implementation of risk adjustment software or as a result of data validation conducted pursuant... implementation of risk adjustment software or data validation. ...
Willis, Brian H; Riley, Richard D
2017-09-20
An important question for clinicians appraising a meta-analysis is: are the findings likely to be valid in their own practice-does the reported effect accurately represent the effect that would occur in their own clinical population? To this end we advance the concept of statistical validity-where the parameter being estimated equals the corresponding parameter for a new independent study. Using a simple ('leave-one-out') cross-validation technique, we demonstrate how we may test meta-analysis estimates for statistical validity using a new validation statistic, Vn, and derive its distribution. We compare this with the usual approach of investigating heterogeneity in meta-analyses and demonstrate the link between statistical validity and homogeneity. Using a simulation study, the properties of Vn and the Q statistic are compared for univariate random effects meta-analysis and a tailored meta-regression model, where information from the setting (included as model covariates) is used to calibrate the summary estimate to the setting of application. Their properties are found to be similar when there are 50 studies or more, but for fewer studies Vn has greater power but a higher type 1 error rate than Q. The power and type 1 error rate of Vn are also shown to depend on the within-study variance, between-study variance, study sample size, and the number of studies in the meta-analysis. Finally, we apply Vn to two published meta-analyses and conclude that it usefully augments standard methods when deciding upon the likely validity of summary meta-analysis estimates in clinical practice. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
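The published Vn statistic has its own derivation and distribution; the underlying leave-one-out logic, though, can be sketched with a simpler fixed-effect version: pool all studies except study i, then ask how many standard errors study i falls from that pooled estimate. The statistic and data below are illustrative only, not the authors' Vn:

```python
import math

def pooled(estimates, variances):
    """Fixed-effect inverse-variance pooled estimate and its variance."""
    w = [1 / v for v in variances]
    est = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    return est, 1 / sum(w)

def loo_z_scores(estimates, variances):
    """Leave-one-out discrepancies: how far each study falls from the
    pooled estimate of the remaining studies, in standard-error units."""
    zs = []
    for i, (yi, vi) in enumerate(zip(estimates, variances)):
        rest_y = estimates[:i] + estimates[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        est, var = pooled(rest_y, rest_v)
        zs.append((yi - est) / math.sqrt(vi + var))
    return zs

# Hypothetical log odds ratios and variances from 5 studies.
y = [0.20, 0.25, 0.15, 0.22, 0.90]   # the last study is discrepant
v = [0.02, 0.03, 0.025, 0.02, 0.03]
z = loo_z_scores(y, v)
```

Large |z| values flag studies for which the pooled estimate of the others would have been a poor prediction, the same question a clinician implicitly asks about their own, as-yet-unseen setting.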
PCA as a practical indicator of OPLS-DA model reliability.
Worley, Bradley; Powers, Robert
Principal Component Analysis (PCA) and Orthogonal Projections to Latent Structures Discriminant Analysis (OPLS-DA) are powerful statistical modeling tools that provide insights into separations between experimental groups based on high-dimensional spectral measurements from NMR, MS or other analytical instrumentation. However, when used without validation, these tools may lead investigators to statistically unreliable conclusions. This danger is especially real for Partial Least Squares (PLS) and OPLS, which aggressively force separations between experimental groups. As a result, OPLS-DA is often used as an alternative method when PCA fails to expose group separation, but this practice is highly dangerous. Without rigorous validation, OPLS-DA can easily yield statistically unreliable group separation. A Monte Carlo analysis of PCA group separations and OPLS-DA cross-validation metrics was performed on NMR datasets with statistically significant separations in scores-space. A linearly increasing amount of Gaussian noise was added to each data matrix followed by the construction and validation of PCA and OPLS-DA models. With increasing added noise, the PCA scores-space distance between groups rapidly decreased and the OPLS-DA cross-validation statistics simultaneously deteriorated. A decrease in correlation between the estimated loadings (added noise) and the true (original) loadings was also observed. While the validity of the OPLS-DA model diminished with increasing added noise, the group separation in scores-space remained basically unaffected. Supported by the results of Monte Carlo analyses of PCA group separations and OPLS-DA cross-validation metrics, we provide practical guidelines and cross-validatory recommendations for reliable inference from PCA and OPLS-DA models.
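The Monte Carlo design can be approximated in a few lines: add linearly increasing Gaussian noise and track a scores-space separation index. For brevity this sketch measures separation directly in the data space (centroid distance over pooled within-group spread) rather than fitting full PCA or OPLS-DA models, and all data are synthetic:

```python
import math
import random

def separation(group_a, group_b):
    """Distance between group centroids divided by pooled within-group
    standard deviation: a simple scores-space separation index."""
    dim = len(group_a[0])
    d2, spread = 0.0, 0.0
    for j in range(dim):
        ma = sum(p[j] for p in group_a) / len(group_a)
        mb = sum(p[j] for p in group_b) / len(group_b)
        d2 += (ma - mb) ** 2
        spread += (sum((p[j] - ma) ** 2 for p in group_a)
                   + sum((p[j] - mb) ** 2 for p in group_b)) \
                  / (len(group_a) + len(group_b) - 2)
    return math.sqrt(d2) / math.sqrt(spread)

def add_noise(points, sd, rng):
    """Corrupt every coordinate with zero-mean Gaussian noise."""
    return [[x + rng.gauss(0, sd) for x in p] for p in points]

rng = random.Random(7)
a = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(20)]
b = [[rng.gauss(1, 1) for _ in range(10)] for _ in range(20)]

seps = [separation(add_noise(a, sd, rng), add_noise(b, sd, rng))
        for sd in (0.0, 1.0, 2.0, 4.0)]   # linearly increasing noise
```

The monotone decay of `seps` mirrors the PCA behavior in the abstract; the warning is that a supervised method like OPLS-DA can still display apparent separation long after this unsupervised index has collapsed.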
Austin, Peter C.; van Klaveren, David; Vergouwe, Yvonne; Nieboer, Daan; Lee, Douglas S.; Steyerberg, Ewout W.
2017-01-01
Objective Validation of clinical prediction models traditionally refers to the assessment of model performance in new patients. We studied different approaches to geographic and temporal validation in the setting of multicenter data from two time periods. Study Design and Setting We illustrated different analytic methods for validation using a sample of 14,857 patients hospitalized with heart failure at 90 hospitals in two distinct time periods. Bootstrap resampling was used to assess internal validity. Meta-analytic methods were used to assess geographic transportability. Each hospital was used once as a validation sample, with the remaining hospitals used for model derivation. Hospital-specific estimates of discrimination (c-statistic) and calibration (calibration intercepts and slopes) were pooled using random effects meta-analysis methods. I2 statistics and prediction interval width quantified geographic transportability. Temporal transportability was assessed using patients from the earlier period for model derivation and patients from the later period for model validation. Results Estimates of reproducibility, pooled hospital-specific performance, and temporal transportability were on average very similar, with c-statistics of 0.75. Between-hospital variation was moderate according to I2 statistics and prediction intervals for c-statistics. Conclusion This study illustrates how performance of prediction models can be assessed in settings with multicenter data at different time periods. PMID:27262237
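The meta-analytic pooling step, in which hospital-specific c-statistics are combined by random-effects meta-analysis and I² quantifies between-hospital heterogeneity, can be sketched with the standard DerSimonian-Laird estimator. The abstract does not state which between-hospital variance estimator was used, and the hospital values below are hypothetical:

```python
import math

def dersimonian_laird(estimates, variances):
    """Random-effects pooling of per-hospital c-statistics with the
    DerSimonian-Laird between-hospital variance (tau^2) and I^2."""
    k = len(estimates)
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return pooled, tau2, i2

# Hypothetical hospital-specific c-statistics with their variances.
c_stats = [0.73, 0.78, 0.75, 0.71, 0.77, 0.76]
variances = [0.0004, 0.0005, 0.0004, 0.0006, 0.0005, 0.0004]
pooled, tau2, i2 = dersimonian_laird(c_stats, variances)
```

From `tau2` one can also form an approximate prediction interval for the c-statistic in a new hospital, the quantity the study uses to express geographic transportability.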
Fuzzy-logic based strategy for validation of multiplex methods: example with qualitative GMO assays.
Bellocchi, Gianni; Bertholet, Vincent; Hamels, Sandrine; Moens, W; Remacle, José; Van den Eede, Guy
2010-02-01
This paper illustrates the advantages that a fuzzy-based aggregation method could bring into the validation of a multiplex method for GMO detection (DualChip GMO kit, Eppendorf). Guidelines for validation of chemical, bio-chemical, pharmaceutical and genetic methods have been developed and ad hoc validation statistics are available and routinely used, for in-house and inter-laboratory testing, and decision-making. Fuzzy logic allows summarising the information obtained by independent validation statistics into one synthetic indicator of overall method performance. The microarray technology, introduced for simultaneous identification of multiple GMOs, poses specific validation issues (patterns of performance for a variety of GMOs at different concentrations). A fuzzy-based indicator for overall evaluation is illustrated in this paper, and applied to validation data for different genetically modified elements. Remarks were drawn on the analytical results. The fuzzy-logic based rules were shown to be applicable to improve interpretation of results and facilitate overall evaluation of the multiplex method.
TSP Symposium 2012 Proceedings
2012-11-01
… and Statistical Model … 7.3 Analysis and Results … 7.4 Threats to Validity and Limitations … 7.5 Conclusions … 7.6 Acknowledgments … Table 12: Overall Statistics of the Experiment … Table 13: Results of Pairwise ANOVA Analysis, Highlighting Statistically Significant Differences … we calculated the percentage of defects injected. The distribution statistics are shown in Table 2. Table 2: Mean, Lower and Upper Confidence Interval
Murphy, Thomas; Schwedock, Julie; Nguyen, Kham; Mills, Anna; Jones, David
2015-01-01
New recommendations for the validation of rapid microbiological methods have been included in the revised Technical Report 33 release from the PDA. The changes include a more comprehensive review of the statistical methods to be used to analyze data obtained during validation. This case study applies those statistical methods to accuracy, precision, ruggedness, and equivalence data obtained using a rapid microbiological methods system being evaluated for water bioburden testing. Results presented demonstrate that the statistical methods described in the PDA Technical Report 33 chapter can all be successfully applied to the rapid microbiological method data sets and gave the same interpretation for equivalence to the standard method. The rapid microbiological method was in general able to pass the requirements of PDA Technical Report 33, though the study shows that there can be occasional outlying results and that caution should be used when applying statistical methods to low average colony-forming unit values. Prior to use in a quality-controlled environment, any new method or technology has to be shown to work as designed by the manufacturer for the purpose required. For new rapid microbiological methods that detect and enumerate contaminating microorganisms, additional recommendations have been provided in the revised PDA Technical Report No. 33. The changes include a more comprehensive review of the statistical methods to be used to analyze data obtained during validation. This paper applies those statistical methods to analyze accuracy, precision, ruggedness, and equivalence data obtained using a rapid microbiological method system being validated for water bioburden testing. The case study demonstrates that the statistical methods described in the PDA Technical Report No. 33 chapter can be successfully applied to rapid microbiological method data sets and give the same comparability results for similarity or difference as the standard method. © PDA, Inc. 2015.
Derivation and Applicability of Asymptotic Results for Multiple Subtests Person-Fit Statistics
Albers, Casper J.; Meijer, Rob R.; Tendeiro, Jorge N.
2016-01-01
In high-stakes testing, it is important to check the validity of individual test scores. Although a test may, in general, result in valid test scores for most test takers, for some test takers, test scores may not provide a good description of a test taker’s proficiency level. Person-fit statistics have been proposed to check the validity of individual test scores. In this study, the theoretical asymptotic sampling distribution of two person-fit statistics that can be used for tests that consist of multiple subtests is first discussed. Second, a simulation study was conducted to investigate the applicability of this asymptotic theory for tests of finite length, in which the correlation between subtests and the number of items in the subtests were varied. The authors showed that these distributions provide reasonable approximations, even for tests consisting of subtests of only 10 items each. These results have practical value because researchers do not have to rely on extensive simulation studies to simulate sampling distributions. PMID:29881053
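The abstract does not name the two statistics studied, but the widely used standardized log-likelihood person-fit statistic l_z illustrates the general idea: score a response pattern against its model-implied probabilities and standardize the result. The item probabilities below are hypothetical:

```python
import math

def lz_person_fit(responses, probs):
    """Standardized log-likelihood person-fit statistic:
    l_z = (l0 - E[l0]) / sqrt(Var[l0]) under the fitted IRT model."""
    l0 = sum(u * math.log(p) + (1 - u) * math.log(1 - p)
             for u, p in zip(responses, probs))
    e = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    var = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - e) / math.sqrt(var)

# Model-implied success probabilities for one test taker on 10 items,
# ordered from easiest to hardest.
probs = [0.9, 0.85, 0.8, 0.75, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
fitting = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]   # plausible, Guttman-like
aberrant = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]  # misses easy items, hits hard ones
lz_fit = lz_person_fit(fitting, probs)
lz_ab = lz_person_fit(aberrant, probs)     # strongly negative: misfit
```

Large negative l_z values flag response patterns, like the aberrant one here, whose test score should not be taken at face value.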
Implementing statistical equating for MRCP(UK) Parts 1 and 2.
McManus, I C; Chis, Liliana; Fox, Ray; Waller, Derek; Tang, Peter
2014-09-26
The MRCP(UK) exam, in 2008 and 2010, changed the standard-setting of its Part 1 and Part 2 examinations from a hybrid Angoff/Hofstee method to statistical equating using Item Response Theory, the reference group being UK graduates. The present paper considers the implementation of the change, the question of whether the pass rate increased amongst non-UK candidates, any possible role of Differential Item Functioning (DIF), and changes in examination predictive validity after the change. Analysis of data of MRCP(UK) Part 1 exam from 2003 to 2013 and Part 2 exam from 2005 to 2013. Inspection suggested that Part 1 pass rates were stable after the introduction of statistical equating, but showed greater annual variation probably due to stronger candidates taking the examination earlier. Pass rates seemed to have increased in non-UK graduates after equating was introduced, but this was not associated with any changes in DIF after statistical equating. Statistical modelling of the pass rates for non-UK graduates found that pass rates, in both Part 1 and Part 2, were increasing year on year, with the changes probably beginning before the introduction of equating. The predictive validity of Part 1 for Part 2 was higher with statistical equating than with the previous hybrid Angoff/Hofstee method, confirming the utility of IRT-based statistical equating. Statistical equating was successfully introduced into the MRCP(UK) Part 1 and Part 2 written examinations, resulting in higher predictive validity than the previous Angoff/Hofstee standard setting. Concerns about an artefactual increase in pass rates for non-UK candidates after equating were shown not to be well-founded. Most likely the changes resulted from a genuine increase in candidate ability, albeit for reasons which remain unclear, coupled with a cognitive illusion giving the impression of a step-change immediately after equating began.
Statistical equating provides a robust standard-setting method, with a better theoretical foundation than judgemental techniques such as Angoff; it is also more straightforward, requires far less examiner time, and provides a more valid result. The present study provides a detailed case study of introducing statistical equating, and issues which may need to be considered with its introduction.
Development of a Research Methods and Statistics Concept Inventory
ERIC Educational Resources Information Center
Veilleux, Jennifer C.; Chapman, Kate M.
2017-01-01
Research methods and statistics are core courses in the undergraduate psychology major. To assess learning outcomes, it would be useful to have a measure that assesses research methods and statistical literacy beyond course grades. In two studies, we developed and provided initial validation results for a research methods and statistical knowledge…
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-01-01
Aims A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R2), using R2 as the primary metric of assay agreement. However, the use of R2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. Methods We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Results Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. Conclusions The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. PMID:28747393
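The two complementary analyses the authors describe, Bland-Altman bias with limits of agreement and Deming regression to separate constant from proportional error, reduce to short closed-form calculations. A sketch with fabricated illustrative values (a +1 constant error and a 5% proportional error are built into y):

```python
import math

def bland_altman(x, y):
    """Mean bias and 95% limits of agreement between two assays."""
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def deming(x, y, lam=1.0):
    """Deming regression slope/intercept, allowing measurement error in
    both assays (lam = ratio of y- to x-error variances)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    slope = ((syy - lam * sxx
              + math.sqrt((syy - lam * sxx) ** 2 + 4 * lam * sxy ** 2))
             / (2 * sxy))
    return slope, my - slope * mx

# Hypothetical variant allele fractions (%): reference assay (x) and the
# NGS assay under validation (y), with constant error +1 and 5%
# proportional error deliberately injected.
x = [5.0, 10.0, 20.0, 30.0, 40.0, 50.0]
y = [1 + 1.05 * v for v in x]
bias, lo, hi = bland_altman(x, y)
slope, intercept = deming(x, y)   # recovers slope 1.05, intercept 1
```

An R² from simple linear regression on these data would be essentially 1, concealing both errors; the Deming intercept and slope expose them directly, which is the abstract's point.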
2013-01-01
Background Relative validity (RV), a ratio of ANOVA F-statistics, is often used to compare the validity of patient-reported outcome (PRO) measures. We used the bootstrap to establish the statistical significance of the RV and to identify key factors affecting its significance. Methods Based on responses from 453 chronic kidney disease (CKD) patients to 16 CKD-specific and generic PRO measures, RVs were computed to determine how well each measure discriminated across clinically-defined groups of patients compared to the most discriminating (reference) measure. Statistical significance of RV was quantified by the 95% bootstrap confidence interval. Simulations examined the effects of sample size, denominator F-statistic, correlation between comparator and reference measures, and number of bootstrap replicates. Results The statistical significance of the RV increased as the magnitude of denominator F-statistic increased or as the correlation between comparator and reference measures increased. A denominator F-statistic of 57 conveyed sufficient power (80%) to detect an RV of 0.6 for two measures correlated at r = 0.7. Larger denominator F-statistics or higher correlations provided greater power. Larger sample size with a fixed denominator F-statistic or more bootstrap replicates (beyond 500) had minimal impact. Conclusions The bootstrap is valuable for establishing the statistical significance of RV estimates. A reasonably large denominator F-statistic (F > 57) is required for adequate power when using the RV to compare the validity of measures with small or moderate correlations (r < 0.7). Substantially greater power can be achieved when comparing measures of a very high correlation (r > 0.9). PMID:23721463
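A minimal version of the bootstrap procedure: resample patients within each clinical group, recompute both ANOVA F-statistics, and take percentiles of the resulting RV ratios. Everything here (group structure, measures, number of replicates) is a hypothetical sketch of the approach, not the authors' code:

```python
import random

def anova_f(groups):
    """One-way ANOVA F statistic across groups of scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ssb / (k - 1)) / (ssw / (n - k))

def bootstrap_rv(comp, ref, group_sizes, reps=500, seed=0):
    """Bootstrap 95% CI for relative validity RV = F_comp / F_ref.
    comp/ref are flat score lists ordered by clinical group."""
    rng = random.Random(seed)
    idx_groups, start = [], 0
    for size in group_sizes:
        idx_groups.append(list(range(start, start + size)))
        start += size
    rvs = []
    for _ in range(reps):
        boot = [[rng.choice(g) for _ in g] for g in idx_groups]
        f_c = anova_f([[comp[i] for i in g] for g in boot])
        f_r = anova_f([[ref[i] for i in g] for g in boot])
        rvs.append(f_c / f_r)
    rvs.sort()
    return rvs[int(0.025 * reps)], rvs[int(0.975 * reps)]

# Three hypothetical severity groups; the reference measure discriminates
# more strongly (group means 0/1/2) than the comparator (0/0.7/1.4).
rng = random.Random(1)
sizes = [40, 40, 40]
ref = [rng.gauss(m, 1) for m, s in zip((0, 1, 2), sizes) for _ in range(s)]
comp = [rng.gauss(m, 1) for m, s in zip((0, 0.7, 1.4), sizes) for _ in range(s)]
lo, hi = bootstrap_rv(comp, ref, sizes)
```

If the interval `(lo, hi)` excludes 1, the comparator's discrimination differs significantly from the reference measure's, which is the inference a bare RV point estimate cannot support.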
Improving the Validity of Activity of Daily Living Dependency Risk Assessment
Clark, Daniel O.; Stump, Timothy E.; Tu, Wanzhu; Miller, Douglas K.
2015-01-01
Objectives Efforts to prevent activity of daily living (ADL) dependency may be improved through models that assess older adults’ dependency risk. We evaluated whether cognition and gait speed measures improve the predictive validity of interview-based models. Method Participants were 8,095 self-respondents in the 2006 Health and Retirement Survey who were aged 65 years or over and independent in five ADLs. Incident ADL dependency was determined from the 2008 interview. Models were developed using random 2/3rd cohorts and validated in the remaining 1/3rd. Results Compared to a c-statistic of 0.79 in the best interview model, the model including cognitive measures had c-statistics of 0.82 and 0.80 while the best fitting gait speed model had c-statistics of 0.83 and 0.79 in the development and validation cohorts, respectively. Conclusion Two relatively brief models, one that requires an in-person assessment and one that does not, had excellent validity for predicting incident ADL dependency but did not significantly improve the predictive validity of the best fitting interview-based models. PMID:24652867
Sabour, Siamak
2018-03-08
The purpose of this letter, in response to Hall, Mehta, and Fackrell (2017), is to provide important knowledge about methodology and statistical issues in assessing the reliability and validity of an audiologist-administered tinnitus loudness matching test and a patient-reported tinnitus loudness rating. The author uses reference textbooks and published articles regarding scientific assessment of the validity and reliability of a clinical test to discuss the statistical test and the methodological approach in assessing validity and reliability in clinical research. Depending on the type of the variable (qualitative or quantitative), well-known statistical tests can be applied to assess reliability and validity. For qualitative variables, sensitivity, specificity, positive and negative predictive values, false positive and false negative rates, positive and negative likelihood ratios, and the odds ratio (i.e., the ratio of true to false results) are the most appropriate estimates for evaluating the validity of a test against a gold standard. In the case of quantitative variables, depending on the distribution of the variable, Pearson r or Spearman rho can be applied. Diagnostic accuracy (validity) and diagnostic precision (reliability or agreement) are two completely different methodological issues. Depending on the type of the variable (qualitative or quantitative), well-known statistical tests can be applied to assess validity.
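All of the validity estimates listed for qualitative variables follow from a single 2x2 table of test results against the gold standard; a compact sketch with a hypothetical table:

```python
def diagnostic_stats(tp, fp, fn, tn):
    """Validity estimates for a binary test against a gold standard."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    odds = (tp * tn) / (fp * fn)   # ratio of true to false results
    return dict(sensitivity=sens, specificity=spec, ppv=ppv, npv=npv,
                lr_pos=lr_pos, lr_neg=lr_neg, odds_ratio=odds)

# Hypothetical 2x2 table: loudness-matching test vs a gold standard.
stats = diagnostic_stats(tp=80, fp=10, fn=20, tn=90)
```

Note that none of these accuracy estimates says anything about precision: test-retest agreement (reliability) requires separate statistics, which is the letter's central distinction.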
Impact of syncope on quality of life: validation of a measure in patients undergoing tilt testing.
Nave-Leal, Elisabete; Oliveira, Mário; Pais-Ribeiro, José; Santos, Sofia; Oliveira, Eunice; Alves, Teresa; Cruz Ferreira, Rui
2015-03-01
Recurrent syncope has a significant impact on quality of life. The development of measurement scales to assess this impact that are easy to use in clinical settings is crucial. The objective of the present study is a preliminary validation of the Impact of Syncope on Quality of Life questionnaire for the Portuguese population. The instrument underwent a process of translation, validation, analysis of cultural appropriateness and cognitive debriefing. A population of 39 patients with a history of recurrent syncope (>1 year) who underwent tilt testing, aged 52.1 ± 16.4 years (21-83), 43.5% male, most in active employment (n=18) or retired (n=13), constituted a convenience sample. The resulting Portuguese version is similar to the original, with 12 items in a single aggregate score, and underwent statistical validation, with assessment of reliability, validity and stability over time. With regard to reliability, the internal consistency of the scale is 0.9. Assessment of convergent and discriminant validity showed statistically significant results (p<0.01). Regarding stability over time, a test-retest of this instrument at six months after tilt testing with 22 patients of the sample who had not undergone any clinical intervention found no statistically significant changes in quality of life. The results indicate that this instrument is of value for assessing quality of life in patients with recurrent syncope in Portugal. Copyright © 2014 Sociedade Portuguesa de Cardiologia. Published by Elsevier España. All rights reserved.
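The internal-consistency figure of 0.9 is presumably Cronbach's alpha (the abstract does not name the coefficient); for reference, it is computed from the item variances and the variance of the total score. A sketch with hypothetical item scores:

```python
def cronbach_alpha(items):
    """Cronbach's alpha from item-score columns
    (items[j][i] = score of respondent i on item j)."""
    k = len(items)
    n = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = sum(var(col) for col in items)
    totals = [sum(col[i] for col in items) for i in range(n)]
    return (k / (k - 1)) * (1 - item_vars / var(totals))

# Hypothetical scores of 6 respondents on 3 correlated items
# (the real instrument has 12 items in a single aggregate score).
items = [[1, 2, 3, 4, 4, 5],
         [1, 3, 3, 4, 5, 5],
         [2, 2, 3, 5, 4, 5]]
alpha = cronbach_alpha(items)
```

High alpha indicates the items move together and can reasonably be summed into one aggregate score, as the Portuguese version does with its 12 items.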
NASA Astrophysics Data System (ADS)
Gutiérrez, Jose Manuel; Maraun, Douglas; Widmann, Martin; Huth, Radan; Hertig, Elke; Benestad, Rasmus; Roessler, Ole; Wibig, Joanna; Wilcke, Renate; Kotlarski, Sven
2016-04-01
VALUE is an open European network to validate and compare downscaling methods for climate change research (http://www.value-cost.eu). A key deliverable of VALUE is the development of a systematic validation framework to enable the assessment and comparison of both dynamical and statistical downscaling methods. This framework is based on a user-focused validation tree, guiding the selection of relevant validation indices and performance measures for different aspects of the validation (marginal, temporal, spatial, multi-variable). Moreover, several experiments have been designed to isolate specific points in the downscaling procedure where problems may occur (assessment of intrinsic performance, effect of errors inherited from the global models, effect of non-stationarity, etc.). The list of downscaling experiments includes 1) cross-validation with perfect predictors, 2) GCM predictors (aligned with the EURO-CORDEX experiment) and 3) pseudo-reality predictors (see Maraun et al. 2015, Earth's Future, 3, doi:10.1002/2014EF000259, for more details). The results of these experiments are gathered, validated and publicly distributed through the VALUE validation portal, allowing for a comprehensive community-open downscaling intercomparison study. In this contribution we describe the overall results from Experiment 1), consisting of a European-wide 5-fold cross-validation (with consecutive 6-year periods from 1979 to 2008) using predictors from ERA-Interim to downscale precipitation and temperatures (minimum and maximum) over a set of 86 ECA&D stations representative of the main geographical and climatic regions in Europe. As a result of the open call for contribution to this experiment (closed in Dec. 2015), over 40 methods representative of the main approaches (MOS and Perfect Prognosis, PP) and techniques (linear scaling, quantile mapping, analogs, weather typing, linear and generalized regression, weather generators, etc.) 
were submitted, including both data (downscaled values) and metadata (characterizing different aspects of the downscaling methods). This constitutes the largest and most comprehensive intercomparison of statistical downscaling methods to date. Here, we present an overall validation, analyzing marginal and temporal aspects to assess the intrinsic performance and added value of statistical downscaling methods at both annual and seasonal levels. This validation takes into account the different properties/limitations of different approaches and techniques (as reported in the provided metadata) in order to perform a fair comparison. It is pointed out that this experiment alone is not sufficient to evaluate the limitations of (MOS) bias correction techniques. Moreover, it also does not fully validate PP methods, since we do not learn whether the right predictors were used and whether the PP assumption is valid. These problems will be analyzed in the subsequent community-open VALUE experiments 2) and 3), which will be open for participation throughout the present year.
2014-01-01
Background Thresholds for statistical significance are insufficiently demonstrated by 95% confidence intervals or P-values when assessing results from randomised clinical trials. First, a P-value only shows the probability of getting a result assuming that the null hypothesis is true and does not reflect the probability of getting a result assuming an alternative hypothesis to the null hypothesis is true. Second, a confidence interval or a P-value showing significance may be caused by multiplicity. Third, statistical significance does not necessarily result in clinical significance. Therefore, assessment of intervention effects in randomised clinical trials deserves more rigour in order to become more valid. Methods Several methodologies for assessing the statistical and clinical significance of intervention effects in randomised clinical trials were considered. Balancing simplicity and comprehensiveness, a simple five-step procedure was developed. Results For a more valid assessment of results from a randomised clinical trial we propose the following five-steps: (1) report the confidence intervals and the exact P-values; (2) report Bayes factor for the primary outcome, being the ratio of the probability that a given trial result is compatible with a ‘null’ effect (corresponding to the P-value) divided by the probability that the trial result is compatible with the intervention effect hypothesised in the sample size calculation; (3) adjust the confidence intervals and the statistical significance threshold if the trial is stopped early or if interim analyses have been conducted; (4) adjust the confidence intervals and the P-values for multiplicity due to number of outcome comparisons; and (5) assess clinical significance of the trial results. Conclusions If the proposed five-step procedure is followed, this may increase the validity of assessments of intervention effects in randomised clinical trials. PMID:24588900
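Step (2) above admits a compact closed form under a normal approximation: if the test statistic is N(0,1) under the null and N(z_alt, 1) under the intervention effect hypothesised in the sample size calculation, the Bayes factor is a simple density ratio. The sketch below is a generic textbook approximation, not the authors' exact procedure; the function name and inputs are illustrative.

```python
import math

def bayes_factor_null_vs_alt(z, z_alt):
    """Bayes factor P(result | null) / P(result | hypothesised effect) for a
    normally distributed test statistic: the density ratio of N(0, 1) to
    N(z_alt, 1) evaluated at the observed z. Textbook sketch, not the
    paper's exact recipe."""
    return math.exp(z_alt * (z_alt - 2 * z) / 2)

# Observed z = 1.96 (two-sided P = 0.05) against the design alternative z_alt = 2.80
bf = bayes_factor_null_vs_alt(z=1.96, z_alt=2.80)
```

A value well below 1 favours the hypothesised effect over the null; here bf is about 0.21, i.e. the result is roughly five times more compatible with the design effect than with the null.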
Bruner, L H; Carr, G J; Harbell, J W; Curren, R D
2002-06-01
An approach commonly used to measure new toxicity test method (NTM) performance in validation studies is to divide toxicity results into positive and negative classifications, and then to identify true positive (TP), true negative (TN), false positive (FP) and false negative (FN) results. After this step is completed, the contingent probability statistics (CPS), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), are calculated. Although these statistics are widely used and often the only statistics used to assess the performance of toxicity test methods, there is little specific guidance in the validation literature on what values for these statistics indicate adequate performance. The purpose of this study was to begin developing data-based answers to this question by characterizing the CPS obtained from an NTM whose data have a completely random association with a reference test method (RTM). Determining the CPS of this worst-case scenario is useful because it provides a lower baseline from which the performance of an NTM can be judged in future validation studies. It also provides an indication of relationships in the CPS that help identify random or near-random relationships in the data. The results from this study of randomly associated tests show that the values obtained for the statistics vary significantly depending on the cut-offs chosen, that high values can be obtained for individual statistics, and that the different measures cannot be considered independently when evaluating the performance of an NTM. When the association between results of an NTM and RTM is random, the sum of the complementary pairs of statistics (sensitivity + specificity, NPV + PPV) is approximately 1, and the prevalence (i.e., the proportion of toxic chemicals in the population of chemicals) and PPV are equal. 
Given that combinations of high sensitivity-low specificity or low sensitivity-high specificity (i.e., a sum of sensitivity and specificity approximately equal to 1) indicate a lack of predictive capacity, an NTM with these performance characteristics should be considered no better than chance at predicting toxicity.
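The random-association baseline described above is easy to reproduce by simulation. The following sketch (a hypothetical helper, standard library only) draws NTM calls independently of the RTM truth and recovers the reported relationships: sensitivity + specificity ≈ 1, NPV + PPV ≈ 1, and PPV ≈ prevalence.

```python
import random

def random_assay_cps(prevalence, positive_rate, n=100_000, seed=0):
    """Simulate an NTM whose positive/negative calls are independent of the
    RTM truth; return (sensitivity, specificity, PPV, NPV)."""
    rng = random.Random(seed)
    tp = fp = tn = fn = 0
    for _ in range(n):
        truth = rng.random() < prevalence    # RTM: chemical truly toxic?
        call = rng.random() < positive_rate  # NTM: random positive call
        if truth and call:
            tp += 1
        elif truth:
            fn += 1
        elif call:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp), tp / (tp + fp), tn / (tn + fn)

sens, spec, ppv, npv = random_assay_cps(prevalence=0.3, positive_rate=0.8)
# For a random association: sens + spec ≈ 1, NPV + PPV ≈ 1, and PPV ≈ prevalence.
```

Varying `positive_rate` shifts the sensitivity/specificity trade-off along the chance line without ever improving predictive capacity, which is exactly why the complementary sums, not the individual statistics, expose a random relationship.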
Validation of Statistical Sampling Algorithms in Visual Sample Plan (VSP): Summary Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nuffer, Lisa L; Sego, Landon H.; Wilson, John E.
2009-02-18
The U.S. Department of Homeland Security, Office of Technology Development (OTD) contracted with a set of U.S. Department of Energy national laboratories, including the Pacific Northwest National Laboratory (PNNL), to write a Remediation Guidance for Major Airports After a Chemical Attack. The report identifies key activities and issues that should be considered by a typical major airport following an incident involving release of a toxic chemical agent. Four experimental tasks were identified that would require further research in order to supplement the Remediation Guidance. One of the tasks, Task 4, OTD Chemical Remediation Statistical Sampling Design Validation, dealt with statistical sampling algorithm validation. This report documents the results of the sampling design validation conducted for Task 4. In 2005, the Government Accountability Office (GAO) performed a review of the past U.S. responses to Anthrax terrorist cases. Part of the motivation for this PNNL report was a major GAO finding that there was a lack of validated sampling strategies in the U.S. response to Anthrax cases. The report (GAO 2005) recommended that probability-based methods be used for sampling design in order to address confidence in the results, particularly when all sample results showed no remaining contamination. The GAO also expressed a desire that the methods be validated, which is the main purpose of this PNNL report. The objective of this study was to validate probability-based statistical sampling designs and the algorithms pertinent to within-building sampling that allow the user to prescribe or evaluate confidence levels of conclusions based on data collected as guided by the statistical sampling designs. Specifically, the designs found in the Visual Sample Plan (VSP) software were evaluated. VSP was used to calculate the number of samples and the sample location for a variety of sampling plans applied to an actual release site. 
Most of the sampling designs validated are probability based, meaning samples are located randomly (or on a randomly placed grid) so no bias enters into the placement of samples, and the number of samples is calculated such that IF the amount and spatial extent of contamination exceeds levels of concern, at least one of the samples would be taken from a contaminated area, at least X% of the time. Hence, "validation" of the statistical sampling algorithms is defined herein to mean ensuring that the "X%" (confidence) is actually met.
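The confidence statement above has a standard closed form under simple random sampling: if at least a fraction p of the area is contaminated and sample locations are independent, the chance of missing contamination entirely with n samples is (1 - p)^n. The sketch below solves for n; it is a generic hot-spot sampling formula under those assumptions, not necessarily the exact algorithm implemented in VSP.

```python
import math

def hotspot_sample_size(contaminated_fraction, confidence):
    """Smallest n with 1 - (1 - contaminated_fraction)**n >= confidence,
    i.e. at least one sample lands in a contaminated area X% of the time."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - contaminated_fraction))

n = hotspot_sample_size(0.01, 0.95)  # 1% contaminated area, 95% confidence
```

For a 1% contaminated fraction at 95% confidence this gives n = 299; doubling the contaminated fraction roughly halves the required number of samples, which is why the spatial extent of concern dominates the sampling budget.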
Statistical methodology: II. Reliability and validity assessment in study design, Part B.
Karras, D J
1997-02-01
Validity measures the correspondence between a test and other purported measures of the same or similar qualities. When a reference standard exists, a criterion-based validity coefficient can be calculated. If no such standard is available, the concepts of content and construct validity may be used, but quantitative analysis may not be possible. The Pearson and Spearman tests of correlation are often used to assess the correspondence between tests, but do not account for measurement biases and may yield misleading results. Techniques that measure intertest differences may be more meaningful in validity assessment, and the kappa statistic is useful for analyzing categorical variables. Questionnaires can often be designed to allow quantitative assessment of reliability and validity, although this may be difficult. Inclusion of homogeneous questions is necessary to assess reliability. Analysis is enhanced by using Likert scales or similar techniques that yield ordinal data. Validity assessment of questionnaires requires careful definition of the scope of the test and comparison with previously validated tools.
Complete Statistical Survey Results of 1982 Texas Competency Validation Project.
ERIC Educational Resources Information Center
Rogers, Sandra K.; Dahlberg, Maurine F.
This report documents a project to develop current statewide validated competencies for auto mechanics, diesel mechanics, welding, office occupations, and printing. Section 1 describes the four steps used in the current competency validation project and provides a standardized process for conducting future studies at the local or statewide level.…
Validating Coherence Measurements Using Aligned and Unaligned Coherence Functions
NASA Technical Reports Server (NTRS)
Miles, Jeffrey Hilton
2006-01-01
This paper describes a novel approach based on the use of coherence functions and statistical theory for sensor validation in a harsh environment. By the use of aligned and unaligned coherence functions and statistical theory, one can test for sensor degradation, total sensor failure, or changes in the signal. This advanced diagnostic approach and the novel data processing methodology discussed provide a single number that conveys this information. This number, as calculated with standard statistical procedures for comparing the means of two distributions, is compared with results obtained using Yuen's robust statistical method to create confidence intervals. Examination of experimental data from Kulite pressure transducers mounted in a Pratt & Whitney PW4098 combustor, using spectrum analysis methods on aligned and unaligned time histories, has verified the effectiveness of the proposed method. All the procedures produce good results, which demonstrates the robustness of the technique.
Stanisavljevic, Dejana; Trajkovic, Goran; Marinkovic, Jelena; Bukumiric, Zoran; Cirkovic, Andja; Milic, Natasa
2014-01-01
Background Medical statistics has become important and relevant for future doctors, enabling them to practice evidence-based medicine. Recent studies report that students’ attitudes towards statistics play an important role in their statistics achievements. The aim of the study was to test the psychometric properties of the Serbian version of the Survey of Attitudes Towards Statistics (SATS) in order to acquire a valid instrument to measure attitudes inside the Serbian educational context. Methods The validation study was performed on a cohort of 417 medical students who were enrolled in an obligatory introductory statistics course. The SATS adaptation was based on an internationally accepted methodology for translation and cultural adaptation. Psychometric properties of the Serbian version of the SATS were analyzed through the examination of factorial structure and internal consistency. Results Most medical students held positive attitudes towards statistics. The average total SATS score was above neutral (4.3±0.8), and varied from 1.9 to 6.2. Confirmatory factor analysis validated the six-factor structure of the questionnaire (Affect, Cognitive Competence, Value, Difficulty, Interest and Effort). Values for fit indices TLI (0.940) and CFI (0.961) were above the cut-off of ≥0.90. The RMSEA value of 0.064 (0.051–0.078) was below the suggested value of ≤0.08. Cronbach’s alpha of the entire scale was 0.90, indicating scale reliability. In a multivariate regression model, self-rating of ability in mathematics and current grade point average were significantly associated with the total SATS score after adjusting for age and gender. Conclusion The present study provided evidence for the appropriate metric properties of the Serbian version of the SATS. Confirmatory factor analysis validated the six-factor structure of the scale. The SATS might be a reliable and valid instrument for identifying medical students’ attitudes towards statistics in the Serbian educational context. 
PMID:25405489
On the analysis of very small samples of Gaussian repeated measurements: an alternative approach.
Westgate, Philip M; Burchett, Woodrow W
2017-03-15
The analysis of very small samples of Gaussian repeated measurements can be challenging. First, due to a very small number of independent subjects contributing outcomes over time, statistical power can be quite small. Second, nuisance covariance parameters must be appropriately accounted for in the analysis in order to maintain the nominal test size. However, available statistical strategies that ensure valid statistical inference may lack power, whereas more powerful methods may have the potential for inflated test sizes. Therefore, we explore an alternative approach to the analysis of very small samples of Gaussian repeated measurements, with the goal of maintaining valid inference while also improving statistical power relative to other valid methods. This approach uses generalized estimating equations with a bias-corrected empirical covariance matrix that accounts for all small-sample aspects of nuisance correlation parameter estimation in order to maintain valid inference. Furthermore, the approach utilizes correlation selection strategies with the goal of choosing the working structure that will result in the greatest power. In our study, we show that when accurate modeling of the nuisance correlation structure impacts the efficiency of regression parameter estimation, this method can improve power relative to existing methods that yield valid inference. Copyright © 2017 John Wiley & Sons, Ltd.
The Application of FT-IR Spectroscopy for Quality Control of Flours Obtained from Polish Producers
Ceglińska, Alicja; Reder, Magdalena; Ciemniewska-Żytkiewicz, Hanna
2017-01-01
Samples of wheat, spelt, rye, and triticale flours produced by different Polish mills were studied by both classic chemical methods and FT-IR MIR spectroscopy. An attempt was made to statistically correlate FT-IR spectral data with reference data with regard to content of various components, for example, proteins, fats, ash, and fatty acids, as well as properties such as moisture, falling number, and energetic value. This correlation resulted in calibrated and validated statistical models for versatile evaluation of unknown flour samples. The calibration data set was used to construct calibration models with use of CSR and PLS with leave-one-out cross-validation techniques. The calibrated models were validated with a validation data set. The results obtained confirmed that application of statistical models based on MIR spectral data is a robust, accurate, precise, rapid, inexpensive, and convenient methodology for determination of flour characteristics, as well as for detection of content of selected flour ingredients. The obtained models' characteristics were as follows: R2 = 0.97, PRESS = 2.14; R2 = 0.96, PRESS = 0.69; R2 = 0.95, PRESS = 1.27; R2 = 0.94, PRESS = 0.76, for content of proteins, lipids, ash, and moisture level, respectively. Best results of CSR models were obtained for protein, ash, and crude fat (R2 = 0.86, 0.82, and 0.78, respectively). PMID:28243483
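The PRESS values quoted above come from leave-one-out cross-validation: each sample is predicted by a model fitted to all remaining samples, and the squared prediction errors are summed. The sketch below shows the statistic for a one-predictor least-squares calibration; it is a minimal stand-in for the article's PLS/CSR models, intended only to illustrate how PRESS is computed.

```python
def loo_press(x, y):
    """Leave-one-out PRESS for a one-predictor least-squares calibration:
    hold out each sample, refit on the rest, and sum squared prediction errors."""
    n = len(x)
    press = 0.0
    for i in range(n):
        xs = [x[j] for j in range(n) if j != i]
        ys = [y[j] for j in range(n) if j != i]
        mx = sum(xs) / len(xs)
        my = sum(ys) / len(ys)
        sxx = sum((v - mx) ** 2 for v in xs)
        sxy = sum((xs[k] - mx) * (ys[k] - my) for k in range(len(xs)))
        b = sxy / sxx                    # slope of the held-out fit
        a = my - b * mx                  # intercept of the held-out fit
        press += (y[i] - (a + b * x[i])) ** 2
    return press
```

A perfectly linear calibration gives PRESS near zero; noise in the reference data inflates it, which is why PRESS complements R2 when judging a calibration model.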
Validating the simulation of large-scale parallel applications using statistical characteristics
Zhang, Deli; Wilke, Jeremiah; Hendry, Gilbert; ...
2016-03-01
Simulation is a widely adopted method to analyze and predict the performance of large-scale parallel applications. Validating the hardware model is highly important for complex simulations with a large number of parameters. Common practice involves calculating the percent error between the projected and the real execution time of a benchmark program. However, in a high-dimensional parameter space, this coarse-grained approach often suffers from parameter insensitivity, which may not be known a priori. Moreover, the traditional approach cannot be applied to the validation of software models, such as application skeletons used in online simulations. In this work, we present a methodology and a toolset for validating both hardware and software models by quantitatively comparing fine-grained statistical characteristics obtained from execution traces. Although statistical information has been used in tasks like performance optimization, this is the first attempt to apply it to simulation validation. Lastly, our experimental results show that the proposed evaluation approach offers significant improvement in fidelity when compared to evaluation using total execution time, and the proposed metrics serve as reliable criteria that progress toward automating the simulation tuning process.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erickson, Paul A.; Liao, Chang-hsien
2007-11-15
A passive flow disturbance has been proven to enhance the conversion of fuel in a methanol-steam reformer. This study presents a statistical validation of the experiment based on a standard 2^k factorial experiment design and the resulting empirical model of the enhanced hydrogen producing process. A factorial experiment design was used to statistically analyze the effects and interactions of various input factors in the experiment. Three input factors, including the number of flow disturbers, catalyst size, and reactant flow rate, were investigated for their effects on the fuel conversion in the steam-reformation process. Based on the experimental results, an empirical model was developed and further evaluated with an uncertainty analysis and interior point data. (author)
Toppi, J; Petti, M; Vecchiato, G; Cincotti, F; Salinari, S; Mattia, D; Babiloni, F; Astolfi, L
2013-01-01
Partial Directed Coherence (PDC) is a spectral multivariate estimator of effective connectivity, relying on the concept of Granger causality. Even if its original definition derived directly from information theory, two modifications were introduced in order to provide better physiological interpretations of the estimated networks: i) normalization of the estimator by rows, and ii) a squared transformation. In the present paper we investigated the effect of PDC normalization on the performance achieved by applying the statistical validation process to the investigated connectivity patterns under different conditions of signal-to-noise ratio (SNR) and amount of data available for the analysis. Results of the statistical analysis revealed an effect of PDC normalization only on the percentages of type I and type II errors committed when using the shuffling procedure for the assessment of connectivity patterns. No effect of the PDC formulation was found on the performance achieved during the validation process executed by means of the asymptotic statistic approach. Moreover, the percentages of both false positives and false negatives committed by the asymptotic statistic approach are always lower than those achieved by the shuffling procedure for each type of normalization.
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-02-01
A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R2), using R2 as the primary metric of assay agreement. However, the use of R2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing (NGS) assays. NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. The Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
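The Bland-Altman computation referred to above reduces to the mean and spread of paired differences. The sketch below (standard library only) returns the bias (constant error) and 95% limits of agreement; proportional error would additionally show as a trend of the differences against the pair means, which this minimal version does not test.

```python
from statistics import mean, stdev

def bland_altman(a, b):
    """Bias and 95% limits of agreement between paired values from two assays."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)                     # constant (systematic) error
    sd = stdev(diffs)                      # spread of disagreement
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# A constant offset of +1 between assays shows up directly as the bias
bias, limits = bland_altman([10.0, 12.0, 14.0], [9.0, 11.0, 13.0])
```

This is exactly the information R2 hides: two assays offset by a constant can still correlate perfectly, yet the bias term exposes the disagreement immediately.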
Benjamin, Sara E; Neelon, Brian; Ball, Sarah C; Bangdiwala, Shrikant I; Ammerman, Alice S; Ward, Dianne S
2007-01-01
Background Few assessment instruments have examined the nutrition and physical activity environments in child care, and none are self-administered. Given the emerging focus on child care settings as a target for intervention, a valid and reliable measure of the nutrition and physical activity environment is needed. Methods To measure inter-rater reliability, 59 child care center directors and 109 staff completed the self-assessment concurrently, but independently. Three weeks later, a repeat self-assessment was completed by a sub-sample of 38 directors to assess test-retest reliability. To assess criterion validity, a researcher-administered environmental assessment was conducted at 69 centers and was compared to a self-assessment completed by the director. A weighted kappa test statistic and percent agreement were calculated to assess agreement for each question on the self-assessment. Results For inter-rater reliability, kappa statistics ranged from 0.20 to 1.00 across all questions. Test-retest reliability of the self-assessment yielded kappa statistics that ranged from 0.07 to 1.00. The inter-quartile kappa statistic ranges for inter-rater and test-retest reliability were 0.45 to 0.63 and 0.27 to 0.45, respectively. When percent agreement was calculated, questions ranged from 52.6% to 100% for inter-rater reliability and 34.3% to 100% for test-retest reliability. Kappa statistics for validity ranged from -0.01 to 0.79, with an inter-quartile range of 0.08 to 0.34. Percent agreement for validity ranged from 12.9% to 93.7%. Conclusion This study provides estimates of criterion validity, inter-rater reliability and test-retest reliability for an environmental nutrition and physical activity self-assessment instrument for child care. Results indicate that the self-assessment is a stable and reasonably accurate instrument for use with child care interventions. 
We therefore recommend the Nutrition and Physical Activity Self-Assessment for Child Care (NAP SACC) instrument to researchers and practitioners interested in conducting healthy weight intervention in child care. However, a more robust, less subjective measure would be more appropriate for researchers seeking an outcome measure to assess intervention impact. PMID:17615078
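The agreement statistics reported above are based on kappa, which corrects raw percent agreement for the agreement expected by chance. The study used a weighted kappa; the sketch below shows the unweighted core of the computation for two raters over the same items.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa: (observed - chance agreement) / (1 - chance)."""
    n = len(rater1)
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # chance agreement from each rater's marginal category frequencies
    p_chance = sum((c1[cat] / n) * (c2[cat] / n) for cat in set(c1) | set(c2))
    return (p_obs - p_chance) / (1 - p_chance)

kappa = cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "yes", "no"])
```

Here the raters agree on half the items, but that is exactly the chance rate, so kappa = 0; perfect agreement gives kappa = 1. This is why kappa and percent agreement can diverge, as in the validity results above.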
ERIC Educational Resources Information Center
Baker, Eva L.
2013-01-01
Background/Context: Education policy over the past 40 years has focused on the importance of accountability in school improvement. Although much of the scholarly discourse around testing and assessment is technical and statistical, understanding of validity by a non-specialist audience is essential as long as test results drive our educational…
NASA Astrophysics Data System (ADS)
Riandry, M. A.; Ismet, I.; Akhsan, H.
2017-09-01
This study aims to produce a valid and practical statistical physics course handout on distribution function materials based on STEM. The Rowntree development model was used to produce this handout. The model consists of three stages: planning, development and evaluation. In this study, the evaluation stage used Tessmer formative evaluation, which consists of 5 stages: self-evaluation, expert review, one-to-one evaluation, small group evaluation and field test. However, testing of the handout was limited to the validity and practicality aspects, so the field test stage was not implemented. Data were collected through walkthroughs and questionnaires. Subjects of this study were 6th- and 8th-semester students (academic year 2016/2017) of the Physics Education Study Program of Sriwijaya University. The average result of expert review was 87.31% (very valid category). The one-to-one evaluation yielded an average result of 89.42%, and the small group evaluation 85.92%. Across the one-to-one and small group evaluation stages, the average student response to the handout was 87.67% (very practical category). Based on the results of the study, it can be concluded that the handout is valid and practical.
Ganna, Andrea; Lee, Donghwan; Ingelsson, Erik; Pawitan, Yudi
2015-07-01
It is common and advised practice in biomedical research to validate experimental or observational findings in a population different from the one where the findings were initially assessed. This practice increases the generalizability of the results and decreases the likelihood of reporting false-positive findings. Validation becomes critical when dealing with high-throughput experiments, where the large number of tests increases the chance to observe false-positive results. In this article, we review common approaches to determine statistical thresholds for validation and describe the factors influencing the proportion of significant findings from a 'training' sample that are replicated in a 'validation' sample. We refer to this proportion as rediscovery rate (RDR). In high-throughput studies, the RDR is a function of false-positive rate and power in both the training and validation samples. We illustrate the application of the RDR using simulated data and real data examples from metabolomics experiments. We further describe an online tool to calculate the RDR using t-statistics. We foresee two main applications. First, if the validation study has not yet been collected, the RDR can be used to decide the optimal combination between the proportion of findings taken to validation and the size of the validation study. Secondly, if a validation study has already been done, the RDR estimated using the training data can be compared with the observed RDR from the validation data; hence, the success of the validation study can be assessed. © The Author 2014. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
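Under the framing above, the expected RDR follows directly from the mix of true and false positives carried from training into validation. The sketch below assumes the proportion of true effects and the per-sample error rates are known, which is a simplification; the authors' online tool instead works from t-statistics.

```python
def expected_rdr(pi1, alpha_t, power_t, alpha_v, power_v):
    """Expected rediscovery rate: the fraction of training-significant findings
    that replicate in validation, given the proportion of true effects (pi1)
    and the false-positive rate (alpha) and power of each sample."""
    sig_train = pi1 * power_t + (1 - pi1) * alpha_t
    sig_both = pi1 * power_t * power_v + (1 - pi1) * alpha_t * alpha_v
    return sig_both / sig_train

# 1% true effects, alpha = 0.05 and 80% power in both samples
rdr = expected_rdr(pi1=0.01, alpha_t=0.05, power_t=0.8, alpha_v=0.05, power_v=0.8)
```

With only 1% true effects, most training hits are false positives, so the expected RDR is low (about 0.15 here) even at 80% power; when every finding is a true effect, the RDR collapses to the validation power.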
Interpreting carnivore scent-station surveys
Sargeant, G.A.; Johnson, D.H.; Berg, W.E.
1998-01-01
The scent-station survey method has been widely used to estimate trends in carnivore abundance. However, statistical properties of scent-station data are poorly understood, and the relation between scent-station indices and carnivore abundance has not been adequately evaluated. We assessed properties of scent-station indices by analyzing data collected in Minnesota during 1986-93. Visits to stations separated by <2 km were correlated for all species because individual carnivores sometimes visited several stations in succession. Thus, visits to stations had an intractable statistical distribution. Dichotomizing results for lines of 10 stations (0 or ≥1 visits) produced binomially distributed data that were robust to multiple visits by individuals. We abandoned 2-way comparisons among years in favor of tests for population trend, which are less susceptible to bias, and analyzed results separately for biogeographic sections of Minnesota because trends differed among sections. Before drawing inferences about carnivore population trends, we reevaluated published validation experiments. Results implicated low statistical power and confounding as possible explanations for equivocal or conflicting results of validation efforts. Long-term trends in visitation rates probably reflect real changes in populations, but poor spatial and temporal resolution, susceptibility to confounding, and low statistical power limit the usefulness of this survey method.
Analysis of model development strategies: predicting ventral hernia recurrence.
Holihan, Julie L; Li, Linda T; Askenasy, Erik P; Greenberg, Jacob A; Keith, Jerrod N; Martindale, Robert G; Roth, J Scott; Liang, Mike K
2016-11-01
There have been many attempts to identify variables associated with ventral hernia recurrence; however, it is unclear which statistical modeling approach results in models with greatest internal and external validity. We aim to assess the predictive accuracy of models developed using five common variable selection strategies to determine variables associated with hernia recurrence. Two multicenter ventral hernia databases were used. Database 1 was randomly split into "development" and "internal validation" cohorts. Database 2 was designated "external validation". The dependent variable for model development was hernia recurrence. Five variable selection strategies were used: (1) "clinical"-variables considered clinically relevant, (2) "selective stepwise"-all variables with a P value <0.20 were assessed in a step-backward model, (3) "liberal stepwise"-all variables were included and step-backward regression was performed, (4) "restrictive internal resampling," and (5) "liberal internal resampling." Variables were included with P < 0.05 for the Restrictive model and P < 0.10 for the Liberal model. A time-to-event analysis using Cox regression was performed using these strategies. The predictive accuracy of the developed models was tested on the internal and external validation cohorts using Harrell's C-statistic where C > 0.70 was considered "reasonable". The recurrence rate was 32.9% (n = 173/526; median/range follow-up, 20/1-58 mo) for the development cohort, 36.0% (n = 95/264, median/range follow-up 20/1-61 mo) for the internal validation cohort, and 12.7% (n = 155/1224, median/range follow-up 9/1-50 mo) for the external validation cohort. Internal validation demonstrated reasonable predictive accuracy (C-statistics = 0.772, 0.760, 0.767, 0.757, 0.763), while on external validation, predictive accuracy dipped precipitously (C-statistic = 0.561, 0.557, 0.562, 0.553, 0.560). 
Predictive accuracy was equally adequate on internal validation among models; however, on external validation, all five models failed to demonstrate utility. Future studies should report multiple variable selection techniques and demonstrate predictive accuracy on external data sets for model validation. Copyright © 2016 Elsevier Inc. All rights reserved.
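The abstract above hinges on Harrell's C-statistic computed on validation cohorts. As a minimal sketch (pure Python, with a toy cohort invented for illustration; not the authors' code), concordance for right-censored survival data can be computed as:

```python
from itertools import combinations

def harrells_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the subject with the shorter follow-up
    actually had the event; it is concordant when that subject also has
    the higher predicted risk. Ties in time are skipped in this sketch.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        a, b = (i, j) if times[i] < times[j] else (j, i)  # a: shorter follow-up
        if times[a] == times[b] or not events[a]:
            continue  # censored before the other subject's time: not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # ties in predicted risk count as half
    return concordant / comparable

# toy external cohort: follow-up in months, event indicator, predicted risk
times = [5, 12, 20, 30, 44]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.7, 0.4, 0.8, 0.1]
print(harrells_c(times, events, risks))  # 0.875 for this toy cohort
```

A value near 0.5 indicates no discrimination; the study's threshold of C > 0.70 would be applied to the value computed on each validation cohort.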
Lee, Myeongjun; Kim, Hyunjung; Shin, Donghee; Lee, Sangyun
2016-01-01
Harassment refers to systematic and repeated unethical acts. Research on workplace harassment has been conducted widely, and the NAQ-R has been widely used in such research. This tool, however, has limitations in revealing differences in sub-factors depending on culture and in reflecting the unique characteristics of Korean society. The workplace harassment questionnaire for Korean finance and service workers was therefore developed to assess the level of personal harassment at work. This study aims to develop a tool to assess the level of personal harassment at work and to test its validity and reliability while examining specific characteristics of workplace harassment against finance and service workers in Korea. The framework of the survey was established based on a literature review and focus-group interviews with Korean finance and service workers. To verify its reliability, Cronbach's alpha coefficient was calculated; to verify its validity, the items and factors of the tool were analyzed. A correlation matrix analysis was examined to verify the tool's convergent validity and discriminant validity. Structural validity was verified by checking statistical significance in relation to the BDI-K. Cronbach's alpha coefficient of this survey was 0.93, which indicates a quite high level of reliability. To verify the appropriateness of this survey tool, its construct validity was examined through factor analysis. As a result of the factor analysis, 3 factors were extracted, explaining 56.5% of the total variance. The loading values and communalities of the 20 items were 0.85 to 0.48 and 0.71 to 0.46, respectively. The convergent and discriminant validity were analyzed, and the rate of item discriminant validity was 100%. Finally, for concurrent validity, we examined the relationship between the WHI-KFSW and psychosocial stress by examining the correlation with the BDI-K.
The results of the chi-square test and multiple logistic regression analysis indicated that the correlation with the BDI-K was statistically significant. Workplace harassment in actual workplaces was investigated based on interviews, and the statistical analysis contributed to systematizing the types of actual workplace harassment. Through these statistical methods, we developed the questionnaire: 20 items in 3 categories.
ERIC Educational Resources Information Center
Nolan, Meaghan M.; Beran, Tanya; Hecker, Kent G.
2012-01-01
Students with positive attitudes toward statistics are likely to show strong academic performance in statistics courses. Multiple surveys measuring students' attitudes toward statistics exist; however, a comparison of the validity and reliability of interpretations based on their scores is needed. A systematic review of relevant electronic…
Individualism: a valid and important dimension of cultural differences between nations.
Schimmack, Ulrich; Oishi, Shigehiro; Diener, Ed
2005-01-01
Oyserman, Coon, and Kemmelmeier's (2002) meta-analysis suggested problems in the measurement of individualism and collectivism. Studies using Hofstede's individualism scores show little convergent validity with more recent measures of individualism and collectivism. We propose that the lack of convergent validity is due to national differences in response styles. Whereas Hofstede statistically controlled for response styles, Oyserman et al.'s meta-analysis relied on uncorrected ratings. Data from an international student survey demonstrated convergent validity between Hofstede's individualism dimension and horizontal individualism when response styles were statistically controlled, whereas uncorrected scores correlated highly with the individualism scores in Oyserman et al.'s meta-analysis. Uncorrected horizontal individualism scores and meta-analytic individualism scores did not correlate significantly with nations' development, whereas corrected horizontal individualism scores and Hofstede's individualism dimension were significantly correlated with development. This pattern of results suggests that individualism is a valid construct for cross-cultural comparisons, but that the measurement of this construct needs improvement.
Mindful attention and awareness: relationships with psychopathology and emotion regulation.
Gregório, Sónia; Pinto-Gouveia, José
2013-01-01
The growing interest in mindfulness from the scientific community has originated several self-report measures of this psychological construct. The Mindful Attention and Awareness Scale (MAAS) is a self-report measure of mindfulness at a trait-level. This paper aims at exploring MAAS psychometric characteristics and validating it for the Portuguese population. The first two studies replicate some of the original author's statistical procedures in two different samples from the Portuguese general community population, in particular confirmatory factor analyses. Results from both analyses confirmed the scale single-factor structure and indicated a very good reliability. Moreover, cross-validation statistics showed that this single-factor structure is valid for different respondents from the general community population. In the third study the Portuguese version of the MAAS was found to have good convergent and discriminant validities. Overall the findings support the psychometric validity of the Portuguese version of MAAS and suggest this is a reliable self-report measure of trait-mindfulness, a central construct in Clinical Psychology research and intervention fields.
NASA Astrophysics Data System (ADS)
Ranaie, Mehrdad; Soffianian, Alireza; Pourmanafi, Saeid; Mirghaffari, Noorollah; Tarkesh, Mostafa
2018-03-01
In the recent decade, analysis of remotely sensed imagery has become one of the most common and widely used procedures in environmental studies, in which supervised image classification techniques play a central role. Using a high-resolution WorldView-3 image over a mixed urbanized landscape in Iran, three less commonly applied image classification methods, bagged CART, stochastic gradient boosting and neural network with feature extraction, were tested and compared with two prevalent methods: random forest and support vector machine with a linear kernel. Each method was run ten times, and three validation techniques were used to estimate the accuracy statistics: cross-validation, independent validation and validation with the full set of training data. Moreover, the statistical significance of differences between the classification methods was assessed using ANOVA and Tukey tests. In general, the results showed that random forest, with a marginal difference compared to bagged CART and stochastic gradient boosting, was the best-performing method, whereas based on independent validation there was no significant difference between the performances of the classification methods. Finally, neural network with feature extraction and linear support vector machine had better processing speed than the others.
An experimental validation of a statistical-based damage detection approach.
DOT National Transportation Integrated Search
2011-01-01
In this work, a previously-developed, statistical-based, damage-detection approach was validated for its ability to : autonomously detect damage in bridges. The damage-detection approach uses statistical differences in the actual and : predicted beha...
ERIC Educational Resources Information Center
O'Bryant, Monique J.
2017-01-01
The aim of this study was to validate an instrument that can be used by instructors or social scientist who are interested in evaluating statistics anxiety. The psychometric properties of the English version of the Statistical Anxiety Scale (SAS) was examined through a confirmatory factor analysis of scores from a sample of 323 undergraduate…
Kuretzki, Carlos Henrique; Campos, Antônio Carlos Ligocki; Malafaia, Osvaldo; Soares, Sandramara Scandelari Kusano de Paula; Tenório, Sérgio Bernardo; Timi, Jorge Rufino Ribas
2016-03-01
The use of information technology is often applied in healthcare. With regard to scientific research, SINPE(c) - Integrated Electronic Protocols - was created as a tool to support researchers by offering clinical data standardization. Until then, SINPE(c) lacked statistical tests obtained by automatic analysis. The objective was to add to SINPE(c) features for the automatic execution of the main statistical methods used in medicine. The study was divided into four topics: checking users' interest in the implementation of the tests; surveying the frequency of their use in healthcare; carrying out the implementation; and validating the results with researchers and their protocols. It was applied to a group of users of this software working on their theses in stricto sensu master's and doctoral degrees in one postgraduate program in surgery. To assess the reliability of the statistics, the data obtained automatically by SINPE(c) were compared with those produced manually by a statistician with experience in this type of study. There was interest in the use of automatic statistical tests, with good acceptance. The chi-square, Mann-Whitney, Fisher and Student's t tests were considered by participants to be frequently used in medical studies. These methods were implemented and thereafter approved as expected. The automatic statistical analysis incorporated into SINPE(c) was shown to be reliable and equivalent to the manually performed analysis, validating its use as a tool for medical research.
Survey of statistical techniques used in validation studies of air pollution prediction models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bornstein, R D; Anderson, S F
1979-03-01
Statistical techniques used by meteorologists to validate predictions made by air pollution models are surveyed. Techniques are divided into the following three groups: graphical, tabular, and summary statistics. Some of the practical problems associated with verification are also discussed. Characteristics desired in any validation program are listed and a suggested combination of techniques that possesses many of these characteristics is presented.
Collins, Anne; Ross, Janine
2017-01-01
We performed a systematic review to identify all original publications describing the asymmetric inheritance of cellular organelles in normal animal eukaryotic cells and to critique the validity and imprecision of the evidence. Searches were performed in Embase, MEDLINE and Pubmed up to November 2015. Screening of titles, abstracts and full papers was performed by two independent reviewers. Data extraction and validity assessment were performed by one reviewer and checked by a second reviewer. Study quality was assessed using the SYRCLE risk of bias tool for animal studies, and by developing validity tools for the experimental model, organelle markers and imprecision. A narrative data synthesis was performed. We identified 31 studies (34 publications) of the asymmetric inheritance of organelles after mitotic or meiotic division. Studies of the asymmetric inheritance of centrosomes (n = 9), endosomes (n = 6), P granules (n = 4), the midbody (n = 3), mitochondria (n = 3), proteasomes (n = 2), spectrosomes (n = 2), cilia (n = 2) and endoplasmic reticulum (n = 2) were identified. Asymmetry was defined and quantified by variable methods. Assessment of the statistical reliability of the results indicated that only two studies (7%) were judged to have low concern; the majority of studies (77%) were 'unclear' and five (16%) were judged to have 'high concerns', the main reason being a low number of technical repeats (<10). Assessment of model validity indicated that the majority of studies (61%) were judged to be valid, ten studies (32%) were unclear and two studies (7%) were judged to have 'high concerns'; both described 'stem cells' without providing experimental evidence to confirm this (pluripotency and self-renewal). Assessment of marker validity indicated that no studies had low concern; most studies were unclear (96.5%), indicating there were insufficient details to judge whether the markers were appropriate.
One study had high concern for marker validity due to the contradictory results of two markers for the same organelle. For most studies the validity and imprecision of results could not be confirmed. In particular, data were limited due to a lack of reporting of interassay variability, sample size calculations, controls and functional validation of organelle markers. An evaluation of 16 systematic reviews containing cell assays found that only 50% reported adherence to PRISMA or ARRIVE reporting guidelines and 38% reported a formal risk of bias assessment. 44% of the reviews did not consider how relevant or valid the models were to the research question. 75% of the reviews did not consider how valid the markers were. 69% of the reviews did not consider the impact of the statistical reliability of the results. Future systematic reviews in basic or preclinical research should ensure the rigorous reporting of the statistical reliability of the results in addition to the validity of the methods. Increased awareness of the importance of reporting guidelines and validation tools is needed in the scientific community. PMID:28562636
The discriminant (and convergent) validity of the Personality Inventory for DSM-5.
Crego, Cristina; Gore, Whitney L; Rojas, Stephanie L; Widiger, Thomas A
2015-10-01
A considerable body of research has rapidly accumulated with respect to the validity of the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5) dimensional trait model as it is assessed by the Personality Inventory for Diagnostic and Statistical Manual of Mental Disorders (PID-5; Krueger et al., 2012). This research though has not focused specifically on discriminant validity, although allusions to potentially problematic discriminant validity have been raised. The current study addressed discriminant validity, reporting for the first time the correlations among the PID-5 domain scales. Also reported are the bivariate correlations of the 25 PID-5 maladaptive trait scales with the personality domain scales of the NEO Personality Inventory-Revised (Costa & McCrae, 1992), the International Personality Item Pool-NEO (Goldberg et al., 2006), the Inventory of Personal Characteristics (Almagor et al., 1995), the 5-Dimensional Personality Test (van Kampen, 2012), and the HEXACO Personality Inventory-Revised (Lee & Ashton, 2004). The results are discussed with respect to the implications of and alternative explanations for potentially problematic discriminant validity. (PsycINFO Database Record (c) 2015 APA, all rights reserved).
Enders, Felicity
2013-12-01
Although regression is widely used for reading and publishing in the medical literature, no instruments were previously available to assess students' understanding. The goal of this study was to design and assess such an instrument for graduate students in Clinical and Translational Science and Public Health. A 27-item REsearch on Global Regression Expectations in StatisticS (REGRESS) quiz was developed through an iterative process. Consenting students taking a course on linear regression in a Clinical and Translational Science program completed the quiz pre- and postcourse. Student results were compared to practicing statisticians with a master's or doctoral degree in statistics or a closely related field. Fifty-two students responded precourse, 59 responded postcourse, and 22 practicing statisticians completed the quiz. The mean (SD) score was 9.3 (4.3) for students precourse and 19.0 (3.5) postcourse (P < 0.001). Postcourse students had results similar to those of practicing statisticians (mean (SD) 20.1 (3.5); P = 0.21). Students also showed significant improvement pre/postcourse in each of six domain areas (P < 0.001). The REGRESS quiz was internally reliable (Cronbach's alpha 0.89). The initial validation is quite promising, with statistically significant and meaningful differences across time and study populations. Further work is needed to validate the quiz across multiple institutions. © 2013 Wiley Periodicals, Inc.
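The internal-consistency figure reported above for the REGRESS quiz (Cronbach's alpha 0.89) can be sketched in a few lines; the item scores below are invented purely for illustration:

```python
def cronbach_alpha(item_columns):
    """Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    k = len(item_columns)
    totals = [sum(scores) for scores in zip(*item_columns)]
    return k / (k - 1) * (1 - sum(var(col) for col in item_columns) / var(totals))

# hypothetical right/wrong scores: 3 items (columns are items, entries are respondents)
items = [[1, 0, 1, 1],
         [1, 0, 1, 0],
         [1, 1, 1, 0]]
print(cronbach_alpha(items))  # 0.5625 for this tiny toy data set
```

Alpha rises as items covary; values around 0.89, as reported for the 27-item quiz, indicate that the items largely measure one underlying construct.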
Ciani, Oriana; Davis, Sarah; Tappenden, Paul; Garside, Ruth; Stein, Ken; Cantrell, Anna; Saad, Everardo D; Buyse, Marc; Taylor, Rod S
2014-07-01
Licensing of, and coverage decisions on, new therapies should rely on evidence from patient-relevant endpoints such as overall survival (OS). Nevertheless, evidence from surrogate endpoints may also be useful, as it may not only expedite the regulatory approval of new therapies but also inform coverage decisions. It is, therefore, essential that candidate surrogate endpoints be properly validated. However, there is no consensus on statistical methods for such validation and on how the evidence thus derived should be applied by policy makers. We review current statistical approaches to surrogate-endpoint validation based on meta-analysis in various advanced-tumor settings. We assessed the suitability of two surrogates (progression-free survival [PFS] and time-to-progression [TTP]) using three current validation frameworks: Elston and Taylor's framework, the German Institute of Quality and Efficiency in Health Care's (IQWiG) framework and the Biomarker-Surrogacy Evaluation Schema (BSES3). A wide variety of statistical methods have been used to assess surrogacy. The strength of the association between the two surrogates and OS was generally low. The level of evidence (observation-level versus treatment-level) available varied considerably by cancer type and evaluation tool, and was not always consistent even within one specific cancer type. The treatment-level association between PFS or TTP and OS has not been investigated in all solid tumors. According to IQWiG's framework, only PFS achieved acceptable evidence of surrogacy in metastatic colorectal and ovarian cancer treated with cytotoxic agents. Our study emphasizes the challenges of surrogate-endpoint validation and the importance of building consensus on the development of evaluation frameworks.
Validation of virtual-reality-based simulations for endoscopic sinus surgery.
Dharmawardana, N; Ruthenbeck, G; Woods, C; Elmiyeh, B; Diment, L; Ooi, E H; Reynolds, K; Carney, A S
2015-12-01
Virtual reality (VR) simulators provide an alternative to real patients for practicing surgical skills but require validation to ensure accuracy. Here, we validate the use of a virtual reality sinus surgery simulator with haptic feedback for training in Otorhinolaryngology - Head & Neck Surgery (OHNS). Participants were recruited from final-year medical students, interns, resident medical officers (RMOs), OHNS registrars and consultants. All participants completed an online questionnaire after performing four separate simulation tasks. These were then used to assess face, content and construct validity. ANOVA with post hoc correction was used for statistical analysis. The following groups were compared: (i) medical students/interns, (ii) RMOs, (iii) registrars and (iv) consultants. Face validity results showed a statistically significant (P < 0.05) difference between the consultant group and the others, while there was no significant difference between medical students/interns and RMOs. Variability within groups was not significant. Content validity results based on consultant scoring and comments indicated that the simulations need further development in several areas to be effective for registrar-level teaching. However, students, interns and RMOs indicated that the simulations provide a useful tool for learning OHNS-related anatomy and as an introduction to ENT-specific procedures. The VR simulations have been validated for teaching sinus anatomy and nasendoscopy to medical students, interns and RMOs. However, they require further development before they can be regarded as a valid tool for more advanced surgical training. © 2015 John Wiley & Sons Ltd.
Selecting the "Best" Factor Structure and Moving Measurement Validation Forward: An Illustration.
Schmitt, Thomas A; Sass, Daniel A; Chappelle, Wayne; Thompson, William
2018-04-09
Despite the broad literature base on factor analysis best practices, research seeking to evaluate a measure's psychometric properties frequently fails to consider or follow these recommendations. This leads to incorrect factor structures, numerous and often overly complex competing factor models and, perhaps most harmful, biased model results. Our goal is to demonstrate a practical and actionable process for factor analysis through (a) an overview of six statistical and psychometric issues and approaches to be aware of, investigate, and report when engaging in factor structure validation, along with a flowchart for recommended procedures to understand latent factor structures; (b) demonstrating these issues to provide a summary of the updated Posttraumatic Stress Disorder Checklist (PCL-5) factor models and a rationale for validation; and (c) conducting a comprehensive statistical and psychometric validation of the PCL-5 factor structure to demonstrate all the issues we described earlier. Considering previous research, the PCL-5 was evaluated using a sample of 1,403 U.S. Air Force remotely piloted aircraft operators with high levels of battlefield exposure. Previously proposed PCL-5 factor structures were not supported by the data, but instead a bifactor model is arguably more statistically appropriate.
Statistical validation of normal tissue complication probability models.
Xu, Cheng-Jian; van der Schaaf, Arjen; Van't Veld, Aart A; Langendijk, Johannes A; Schilstra, Cornelis
2012-09-01
To investigate the applicability and value of double cross-validation and permutation tests as established statistical approaches in the validation of normal tissue complication probability (NTCP) models. A penalized regression method, LASSO (least absolute shrinkage and selection operator), was used to build NTCP models for xerostomia after radiation therapy treatment of head-and-neck cancer. Model assessment was based on the likelihood function and the area under the receiver operating characteristic curve. Repeated double cross-validation showed the uncertainty and instability of the NTCP models and indicated that the statistical significance of model performance can be obtained by permutation testing. Repeated double cross-validation and permutation tests are recommended to validate NTCP models before clinical use. Copyright © 2012 Elsevier Inc. All rights reserved.
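The permutation test recommended above asks whether a model's apparent performance could arise by chance. A minimal sketch (pure Python, invented toy data, AUC as the performance measure; not the authors' implementation): shuffle the outcome labels many times, rescore, and count how often the shuffled performance matches or beats the observed one.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise comparison."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_p_value(labels, scores, n_perm=2000, seed=1):
    """Fraction of label shufflings whose AUC matches or beats the observed AUC."""
    rng = random.Random(seed)
    observed = auc(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        hits += auc(shuffled, scores) >= observed
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# toy data: observed complications (1 = yes) and model-predicted NTCP scores
labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.3, 0.9]
print(auc(labels, scores), permutation_p_value(labels, scores))
```

A small p-value indicates the model's AUC is unlikely under random labeling; in the study this guards against over-optimistic NTCP models before clinical use.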
Vahedi, Shahram; Farrokhi, Farahman
2011-01-01
Objective The aim of this study is to explore the confirmatory factor analysis results of the Persian adaptation of Statistics Anxiety Measure (SAM), proposed by Earp. Method The validity and reliability assessments of the scale were performed on 298 college students chosen randomly from Tabriz University in Iran. Confirmatory factor analysis (CFA) was carried out to determine the factor structures of the Persian adaptation of SAM. Results As expected, the second order model provided a better fit to the data than the three alternative models. Conclusions Hence, SAM provides an equally valid measure for use among college students. The study both expands and adds support to the existing body of math anxiety literature. PMID:22952530
Adams, James; Kruger, Uwe; Geis, Elizabeth; Gehn, Eva; Fimbres, Valeria; Pollard, Elena; Mitchell, Jessica; Ingram, Julie; Hellmers, Robert; Quig, David; Hahn, Juergen
2017-01-01
Introduction A number of previous studies examined a possible association of toxic metals and autism, and over half of those studies suggest that toxic metal levels are different in individuals with Autism Spectrum Disorders (ASD). Additionally, several studies found that those levels correlate with the severity of ASD. Methods In order to further investigate these points, this paper performs the most detailed statistical analysis to date of a data set in this field. First morning urine samples were collected from 67 children and adults with ASD and 50 neurotypical controls of similar age and gender. The samples were analyzed to determine the levels of 10 urinary toxic metals (UTM). Autism-related symptoms were assessed with eleven behavioral measures. Statistical analysis was used to distinguish participants on the ASD spectrum from neurotypical participants based upon the UTM data alone. The analysis also examined the association of autism severity with toxic metal excretion data using linear and nonlinear methods. “Leave-one-out” cross-validation was used to ensure statistical independence of results. Results and Discussion Average excretion levels of several toxic metals (lead, tin, thallium, antimony) were significantly higher in the ASD group. ASD classification using univariate statistics proved difficult due to large variability, but nonlinear multivariate statistical analysis significantly improved ASD classification, with Type I/II errors of 15% and 18%, respectively. These results clearly indicate that the urinary toxic metal excretion profiles of participants in the ASD group were significantly different from those of the neurotypical participants. Similarly, nonlinear methods determined a significantly stronger association between the behavioral measures and toxic metal excretion.
The association was strongest for the Aberrant Behavior Checklist (including subscales on Irritability, Stereotypy, Hyperactivity, and Inappropriate Speech), but significant associations were found for UTM with all eleven autism-related assessments with cross-validation R2 values ranging from 0.12–0.48. PMID:28068407
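The "leave-one-out" cross-validation used above can be sketched generically: each sample is classified by a model trained on all the other samples. The sketch below uses invented one-feature data and a simple nearest-centroid rule in place of the study's nonlinear multivariate classifiers.

```python
def nearest_centroid(train_x, train_y, x):
    """Predict the class whose training-set mean is closest to x."""
    dists = {}
    for label in set(train_y):
        vals = [xi for xi, yi in zip(train_x, train_y) if yi == label]
        dists[label] = abs(x - sum(vals) / len(vals))
    return min(dists, key=dists.get)

def loocv_accuracy(xs, ys):
    """Leave-one-out CV: each sample is scored by a model fit on all the others."""
    correct = 0
    for i in range(len(xs)):
        pred = nearest_centroid(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i])
        correct += pred == ys[i]
    return correct / len(xs)

# hypothetical one-feature "excretion level" data; 1 = ASD group, 0 = control
xs = [1.0, 1.2, 1.1, 2.0, 2.2, 2.1, 1.9, 0.9]
ys = [0, 0, 0, 1, 1, 1, 1, 0]
print(loocv_accuracy(xs, ys))  # 1.0 on this cleanly separated toy set
```

Because the held-out sample never influences the model that scores it, the resulting accuracy is an honest estimate rather than a training-set fit, which is the "statistical independence" the abstract refers to.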
Risk-based Methodology for Validation of Pharmaceutical Batch Processes.
Wiles, Frederick
2013-01-01
In January 2011, the U.S. Food and Drug Administration published new process validation guidance for pharmaceutical processes. The new guidance debunks the long-held industry notion that three consecutive validation batches or runs are all that are required to demonstrate that a process is operating in a validated state. Instead, the new guidance now emphasizes that the level of monitoring and testing performed during process performance qualification (PPQ) studies must be sufficient to demonstrate statistical confidence both within and between batches. In some cases, three qualification runs may not be enough. Nearly two years after the guidance was first published, little has been written defining a statistical methodology for determining the number of samples and qualification runs required to satisfy Stage 2 requirements of the new guidance. This article proposes using a combination of risk assessment, control charting, and capability statistics to define the monitoring and testing scheme required to show that a pharmaceutical batch process is operating in a validated state. In this methodology, an assessment of process risk is performed through application of a process failure mode, effects, and criticality analysis (PFMECA). The output of PFMECA is used to select appropriate levels of statistical confidence and coverage which, in turn, are used in capability calculations to determine when significant Stage 2 (PPQ) milestones have been met. The achievement of Stage 2 milestones signals the release of batches for commercial distribution and the reduction of monitoring and testing to commercial production levels. Individuals, moving range, and range/sigma charts are used in conjunction with capability statistics to demonstrate that the commercial process is operating in a state of statistical control. The new process validation guidance published by the U.S. 
Food and Drug Administration in January of 2011 indicates that the number of process validation batches or runs required to demonstrate that a pharmaceutical process is operating in a validated state should be based on sound statistical principles. The old rule of "three consecutive batches and you're done" is no longer sufficient. The guidance, however, does not provide any specific methodology for determining the number of runs required, and little has been published to augment this shortcoming. The paper titled "Risk-based Methodology for Validation of Pharmaceutical Batch Processes" describes a statistically sound methodology for determining when a statistically valid number of validation runs has been acquired based on risk assessment and calculation of process capability.
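The capability statistics central to this methodology can be illustrated with a short sketch. The assay values and specification limits below are hypothetical, and Cpk is shown as one representative index; the article's approach couples such indices with confidence and coverage requirements derived from PFMECA.

```python
def cpk(samples, lsl, usl):
    """Process capability index: min(USL - mean, mean - LSL) / (3 * s)."""
    n = len(samples)
    mean = sum(samples) / n
    s = (sum((x - mean) ** 2 for x in samples) / (n - 1)) ** 0.5  # sample std dev
    return min(usl - mean, mean - lsl) / (3 * s)

# hypothetical PPQ assay results (% of label claim) with specs 95.0-105.0
batch = [99.8, 100.2, 100.0, 99.6, 100.4, 100.0, 99.9, 100.1]
print(round(cpk(batch, 95.0, 105.0), 2))
```

A Cpk well above the commonly cited 1.33 benchmark suggests the process comfortably meets specification; in the risk-based scheme, the required demonstrated capability (and sample size) would scale with the criticality assessed for each attribute.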
NASA Astrophysics Data System (ADS)
Steger, Stefan; Brenning, Alexander; Bell, Rainer; Glade, Thomas
2016-12-01
There is unanimous agreement that a precise spatial representation of past landslide occurrences is a prerequisite to produce high quality statistical landslide susceptibility models. Even though perfectly accurate landslide inventories rarely exist, investigations of how landslide inventory-based errors propagate into subsequent statistical landslide susceptibility models are scarce. The main objective of this research was to systematically examine whether and how inventory-based positional inaccuracies of different magnitudes influence modelled relationships, validation results, variable importance and the visual appearance of landslide susceptibility maps. The study was conducted for a landslide-prone site located in the districts of Amstetten and Waidhofen an der Ybbs, eastern Austria, where an earth-slide point inventory was available. The methodological approach comprised an artificial introduction of inventory-based positional errors into the present landslide data set and an in-depth evaluation of subsequent modelling results. Positional errors were introduced by artificially changing the original landslide position by a mean distance of 5, 10, 20, 50 and 120 m. The resulting differently precise response variables were separately used to train logistic regression models. Odds ratios of predictor variables provided insights into modelled relationships. Cross-validation and spatial cross-validation enabled an assessment of predictive performances and permutation-based variable importance. All analyses were additionally carried out with synthetically generated data sets to further verify the findings under rather controlled conditions. The results revealed that an increasing positional inventory-based error was generally related to increasing distortions of modelling and validation results. However, the findings also highlighted that interdependencies between inventory-based spatial inaccuracies and statistical landslide susceptibility models are complex. 
The systematic comparisons of 12 models provided valuable evidence that the respective error-propagation was not only determined by the degree of positional inaccuracy inherent in the landslide data, but also by the spatial representation of landslides and the environment, landslide magnitude, the characteristics of the study area, the selected classification method and an interplay of predictors within multiple variable models. Based on the results, we deduced that a direct propagation of minor to moderate inventory-based positional errors into modelling results can be partly counteracted by adapting the modelling design (e.g. generalization of input data, opting for strongly generalizing classifiers). Since positional errors within landslide inventories are common and subsequent modelling and validation results are likely to be distorted, the potential existence of inventory-based positional inaccuracies should always be considered when assessing landslide susceptibility by means of empirical models.
ERIC Educational Resources Information Center
Godfrey, Kelly E.; Jagesic, Sanja
2016-01-01
The College-Level Examination Program® (CLEP®) is a computer-based prior-learning assessment that allows examinees the opportunity to demonstrate mastery of knowledge and skills necessary to earn postsecondary course credit in higher education. Currently, there are 33 exams in five subject areas: composition and literature, world languages,…
ERIC Educational Resources Information Center
Clemens, Nathan H.; Hagan-Burke, Shanna; Luo, Wen; Cerda, Carissa; Blakely, Alane; Frosch, Jennifer; Gamez-Patience, Brenda; Jones, Meredith
2015-01-01
This study examined the predictive validity of a computer-adaptive assessment for measuring kindergarten reading skills using the STAR Early Literacy (SEL) test. The findings showed that the results of SEL assessments administered during the fall, winter, and spring of kindergarten were moderate and statistically significant predictors of year-end…
Putting the "But" Back in Meta-Analysis: Issues Affecting the Validity of Quantitative Reviews.
ERIC Educational Resources Information Center
L'Hommedieu, Randi; And Others
Some of the frustrations inherent in trying to incorporate qualifications of statistical results into meta-analysis are reviewed, and some solutions are proposed to prevent the loss of information in meta-analytic reports. The validity of a meta-analysis depends on several factors, including the thoroughness of the literature search; selection of…
ERIC Educational Resources Information Center
Koziol, Natalie A.; Bovaird, James A.
2018-01-01
Evaluations of measurement invariance provide essential construct validity evidence--a prerequisite for seeking meaning in psychological and educational research and ensuring fair testing procedures in high-stakes settings. However, the quality of such evidence is partly dependent on the validity of the resulting statistical conclusions. Type I or…
Sainz de Baranda, Pilar; Rodríguez-Iniesta, María; Ayala, Francisco; Santonja, Fernando; Cejudo, Antonio
2014-07-01
To examine the criterion-related validity of the horizontal hip joint angle (H-HJA) test and vertical hip joint angle (V-HJA) test for estimating hamstring flexibility measured through the passive straight-leg raise (PSLR) test using contemporary statistical measures. Validity study. Controlled laboratory environment. One hundred thirty-eight professional trampoline gymnasts (61 women and 77 men). Hamstring flexibility. Each participant performed 2 trials of H-HJA, V-HJA, and PSLR tests in a randomized order. The criterion-related validity of H-HJA and V-HJA tests was measured through the estimation equation, typical error of the estimate (TEEST), validity correlation (β), and their respective confidence limits. The findings from this study suggest that although H-HJA and V-HJA tests showed moderate to high validity scores for estimating hamstring flexibility (standardized TEEST = 0.63; β = 0.80), the TEEST statistic reported for both tests was not narrow enough for clinical purposes (H-HJA = 10.3 degrees; V-HJA = 9.5 degrees). Subsequently, the predicted likely thresholds for the true values that were generated were too wide (H-HJA = predicted value ± 13.2 degrees; V-HJA = predicted value ± 12.2 degrees). The results suggest that although the HJA test showed moderate to high validity scores for estimating hamstring flexibility, the prediction intervals between the HJA and PSLR tests are not strong enough to suggest that clinicians and sport medicine practitioners should use the HJA and PSLR tests interchangeably as gold standard measurement tools to evaluate and detect short hamstring muscle flexibility.
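The typical error of the estimate (TEEST) used above is the standard error of the estimate from regressing the criterion (PSLR) on the predictor (HJA). A generic sketch with invented angle data follows; the ±1.28 × TEEST "likely range" factor is an assumption inferred from the ratio of the reported limits to the reported TEEST values (e.g. 13.2/10.3 ≈ 1.28), not a formula stated in the abstract.

```python
def linear_fit(x, y):
    """Ordinary least-squares intercept and slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def typical_error_of_estimate(x, y):
    """Standard error of the estimate from regressing criterion y on predictor x."""
    a, b = linear_fit(x, y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return (sse / (len(x) - 2)) ** 0.5

# invented HJA predictor scores and PSLR criterion scores (degrees)
hja = [0.0, 1.0, 2.0, 3.0]
pslr = [0.0, 1.0, 2.0, 4.0]
teest = typical_error_of_estimate(hja, pslr)
likely_limit = 1.28 * teest  # assumed half-width of the "likely" prediction range
```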
Sertdemir, Y; Burgut, R
2009-01-01
In recent years the use of surrogate endpoints (S) has become an interesting issue. In clinical trials, it is important to obtain treatment outcomes as early as possible. For this reason there is a need for surrogate endpoints (S) that are measured earlier than the true endpoint (T). However, before a surrogate endpoint can be used it must be validated. For a candidate surrogate endpoint, for example time to recurrence, the validation result may change dramatically between clinical trials. The aim of this study was to show how the validation criterion (R(2)(trial)) proposed by Buyse et al. is influenced by the magnitude of the treatment effect, with an application using real data. The criterion R(2)(trial) proposed by Buyse et al. (2000) was applied to four data sets from colon cancer clinical trials (C-01, C-02, C-03 and C-04). Each clinical trial was analyzed separately for treatment effect on survival (true endpoint) and recurrence-free survival (surrogate endpoint), and this analysis was also done for each center in each trial. The results were used for a standard validation analysis. The centers were grouped by the Wald statistic into 3 equal groups. The validation criterion R(2)(trial) was 0.641 (95% CI 0.432-0.782), 0.223 (95% CI 0.008-0.503), 0.761 (95% CI 0.550-0.872) and 0.560 (95% CI 0.404-0.687) for C-01, C-02, C-03 and C-04, respectively. The R(2)(trial) criterion changed with the Wald statistics observed for the centers used in the validation process: the higher the Wald statistic group, the higher the observed R(2)(trial) value. Recurrence-free survival is not a good surrogate for overall survival in clinical trials with non-significant treatment effects, and only a moderate one in trials with significant treatment effects. This shows that the level of significance of the treatment effect should be taken into account in the validation process of surrogate endpoints.
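The trial-level criterion R(2)(trial) is, in essence, the coefficient of determination from regressing per-unit treatment effects on the true endpoint against those on the surrogate. A toy sketch with invented per-center effects (not the colon cancer data) follows:

```python
def r_squared(x, y):
    """Coefficient of determination from a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1.0 - sse / sst

# invented per-center treatment effects (e.g. log hazard ratios):
# on the surrogate (recurrence-free survival) and the true endpoint (survival)
effect_surrogate = [0.1, 0.2, 0.3, 0.4]
effect_true = [0.15, 0.25, 0.4, 0.5]
r2_trial = r_squared(effect_surrogate, effect_true)
```

A value near 1 would suggest that surrogate-level effects predict true-endpoint effects well across units; the abstract's point is that this value shifts with the magnitude (Wald statistic) of the treatment effects in the units used.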
Esbenshade, Adam J; Zhao, Zhiguo; Aftandilian, Catherine; Saab, Raya; Wattier, Rachel L; Beauchemin, Melissa; Miller, Tamara P; Wilkes, Jennifer J; Kelly, Michael J; Fernbach, Alison; Jeng, Michael; Schwartz, Cindy L; Dvorak, Christopher C; Shyr, Yu; Moons, Karl G M; Sulis, Maria-Luisa; Friedman, Debra L
2017-10-01
Pediatric oncology patients are at an increased risk of invasive bacterial infection due to immunosuppression. The risk of such infection in the absence of severe neutropenia (absolute neutrophil count ≥ 500/μL) is not well established, and a validated prediction model for bloodstream infection (BSI) risk would offer clinical usefulness. A 6-site retrospective external validation was conducted using a previously published risk prediction model for BSI in febrile pediatric oncology patients without severe neutropenia: the Esbenshade/Vanderbilt (EsVan) model. A reduced model (EsVan2) excluding 2 less clinically reliable variables was also created using the initial EsVan model derivation cohort, and was validated using all 5 external validation cohorts. One data set was used only in sensitivity analyses because some variables were missing. From the 5 primary data sets, there were a total of 1197 febrile episodes and 76 episodes of bacteremia. The overall C statistic for predicting bacteremia was 0.695, with a calibration slope of 0.50 for the original model and a calibration slope of 1.0 when recalibration was applied to the model. The model performed better in predicting high-risk bacteremia (gram-negative or Staphylococcus aureus infection) than BSI alone, with a C statistic of 0.801 and a calibration slope of 0.65. The EsVan2 model outperformed the EsVan model across data sets, with a C statistic of 0.733 for predicting BSI and a C statistic of 0.841 for high-risk BSI. The results of this external validation demonstrated that the EsVan and EsVan2 models are able to predict BSI across multiple performance sites and, once validated and implemented prospectively, could assist in decision making in clinical practice. Cancer 2017;123:3781-3790. © 2017 American Cancer Society.
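The C statistic reported above is the probability that a randomly chosen event receives a higher predicted risk than a randomly chosen non-event. A minimal sketch with invented risks and outcomes (not the EsVan data) follows:

```python
def c_statistic(risk, outcome):
    """Concordance (C) statistic over all (event, non-event) pairs;
    ties in predicted risk count as 1/2."""
    pairs = score = 0.0
    for ri, yi in zip(risk, outcome):
        for rj, yj in zip(risk, outcome):
            if yi == 1 and yj == 0:
                pairs += 1
                if ri > rj:
                    score += 1.0
                elif ri == rj:
                    score += 0.5
    return score / pairs

# invented predicted risks and observed outcomes (1 = bacteremia)
risk = [0.9, 0.8, 0.3, 0.2]
outcome = [1, 0, 1, 0]
c = c_statistic(risk, outcome)
```

The calibration slope mentioned in the abstract is a separate quantity, obtained by regressing observed outcomes on the model's linear predictor; a slope below 1 (e.g. 0.50 here) indicates predictions that are too extreme in the new population.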
Validation of a Survey Questionnaire on Organ Donation: An Arabic World Scenario
Agarwal, Tulika Mehta; Al-Thani, Hassan; Al Maslamani, Yousuf
2018-01-01
Objective To validate a questionnaire for measuring factors influencing organ donation and transplant. Methods The constructed questionnaire was based on the theory of planned behavior by Icek Ajzen and had 45 questions, including general inquiry and demographic information. Four experts on the topic, Arabic culture, and the Arabic and English languages established content validity through review; it was quantified by the content validity index (CVI). Construct validity was established by principal component analysis (PCA), whereas internal consistency was checked by Cronbach's alpha and the intraclass correlation coefficient (ICC). Statistical analysis was performed with the SPSS 22.0 statistical package. Results Content validity in the form of S-CVI/Average and S-CVI/UA was 0.95 and 0.82, respectively, suggesting adequate content relevance of the questionnaire. Factor analysis indicated that the construct validity for each domain (knowledge, attitudes, beliefs, and intention) was 65%, 71%, 77%, and 70%, respectively. Cronbach's alpha and ICC coefficients were 0.90, 0.67, 0.75, and 0.74 and 0.82, 0.58, 0.61, and 0.74, respectively, for the domains. Conclusion The questionnaire consists of 39 items on the knowledge, attitudes, beliefs, and intention domains and is a valid and reliable tool for organ donation and transplant surveys. PMID:29593894
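The content validity indices above follow standard definitions: an item-level CVI is the proportion of experts rating the item relevant, S-CVI/Average is the mean of the item-level CVIs, and S-CVI/UA is the proportion of items on which all experts agree. A small sketch with invented expert judgments:

```python
def content_validity(ratings):
    """ratings: one list per item, each entry 1 if an expert judged the
    item relevant, else 0. Returns item CVIs, S-CVI/Average, S-CVI/UA."""
    i_cvi = [sum(r) / len(r) for r in ratings]
    s_cvi_ave = sum(i_cvi) / len(i_cvi)
    s_cvi_ua = sum(1 for r in ratings if all(r)) / len(ratings)
    return i_cvi, s_cvi_ave, s_cvi_ua

# invented judgments from 4 experts on 3 items
i_cvi, ave, ua = content_validity([[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 1, 1]])
```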
Montei, Carolyn; McDougal, Susan; Mozola, Mark; Rice, Jennifer
2014-01-01
The Soleris Non-fermenting Total Viable Count method was previously validated for a wide variety of food products, including cocoa powder. A matrix extension study was conducted to validate the method for use with cocoa butter and cocoa liquor. Test samples included naturally contaminated cocoa liquor and cocoa butter inoculated with natural microbial flora derived from cocoa liquor. A probability of detection statistical model was used to compare Soleris results at multiple test thresholds (dilutions) with aerobic plate counts determined using the AOAC Official Method 966.23 dilution plating method. Results of the two methods were not statistically different at any dilution level in any of the three trials conducted. The Soleris method offers the advantage of results within 24 h, compared to the 48 h required by standard dilution plating methods.
Coble, M D; Buckleton, J; Butler, J M; Egeland, T; Fimmers, R; Gill, P; Gusmão, L; Guttman, B; Krawczak, M; Morling, N; Parson, W; Pinto, N; Schneider, P M; Sherry, S T; Willuweit, S; Prinz, M
2016-11-01
The use of biostatistical software programs to assist in data interpretation and calculate likelihood ratios is essential to forensic geneticists and part of the daily casework flow for both kinship and DNA identification laboratories. Previous recommendations issued by the DNA Commission of the International Society for Forensic Genetics (ISFG) covered the application of biostatistical evaluations for STR typing results in identification and kinship cases, and this is now being expanded to provide best practices regarding validation and verification of the software required for these calculations. With larger multiplexes, more complex mixtures, and increasing requests for extended family testing, laboratories are relying more than ever on specific software solutions, and sufficient validation, training and extensive documentation are of utmost importance. Here, we present recommendations for the minimum requirements to validate biostatistical software to be used in forensic genetics. We distinguish between developmental validation and the responsibilities of the software developer or provider, and the internal validation studies to be performed by the end user. Recommendations for the software provider address, for example, the documentation of the underlying models used by the software, validation data expectations, version control, implementation and training support, as well as continuity and user notifications. For the internal validations the recommendations include: creating a validation plan, requirements for the range of samples to be tested, Standard Operating Procedure development, and internal laboratory training and education. To ensure that all laboratories have access to a wide range of samples for validation and training purposes, the ISFG DNA Commission encourages collaborative studies and public repositories of STR typing results. Published by Elsevier Ireland Ltd.
ERIC Educational Resources Information Center
Stemler, Steven E.; Grigorenko, Elena L.; Jarvin, Linda; Sternberg, Robert J.
2006-01-01
Sternberg's theory of successful intelligence was used to create augmented exams in Advanced Placement Psychology and Statistics. Participants included 1895 high school students from 19 states and 56 schools throughout the U.S. The psychometric results support the validity of creating examinations that assess memory, analytical, creative, and…
Validation of the Mini-OAKHQOL for use in patients with osteoarthritis in Spain.
Gonzalez Sáenz de Tejada, Marta; Bilbao, Amaia; Herrera, Carmen; García, Lidia; Sarasqueta, Cristina; Escobar, Antonio
2017-08-01
The Mini-Osteoarthritis Knee and Hip Quality of Life (Mini-OAKHQOL) questionnaire is specific to individuals with knee or hip osteoarthritis. The objective of this study was to perform a validation of the Mini-OAKHQOL for use in Spain in terms of its psychometric properties of reliability, validity and responsiveness. Patients with osteoarthritis on the waiting list for a joint replacement completed the Mini-OAKHQOL, the Short Form 36 Health Survey and the Western Ontario and McMaster Universities Osteoarthritis Index. Reliability was assessed in terms of internal consistency and test-retest data, and convergent validity using Spearman's correlation coefficient. Structural validity was investigated by confirmatory factor analysis, and Rasch analysis was used to examine the unidimensionality of the scales. Responsiveness was assessed by calculating effect sizes. Confirmatory factor analysis confirmed the five-factor model, and the results of the Rasch analyses, with their infit and outfit statistics, supported the unidimensionality assumption. Cronbach's alpha ranged from 0.76 to 0.89 for all except the social dimensions. Statistically significant differences were observed between patients with different degrees of disease severity on all dimensions. There was convergent validity among dimensions expected to be correlated. The Mini-OAKHQOL questionnaire showed good responsiveness, with large changes for all dimensions apart from the two social dimensions, which had small effect sizes. The results of the study support the view that the Spanish version of the Mini-OAKHQOL questionnaire is a valid instrument to measure health-related quality of life in patients with osteoarthritis of the lower limb.
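Cronbach's alpha, the internal-consistency measure reported above, can be computed directly from item scores. A minimal sketch with invented questionnaire responses (not the Mini-OAKHQOL data):

```python
def cronbach_alpha(items):
    """items: one list of scores per questionnaire item, aligned by respondent.
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total score)."""
    k = len(items)
    n = len(items[0])

    def sample_var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(sample_var(it) for it in items) / sample_var(totals))

# invented scores: 3 items answered by 4 respondents
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]])
```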
NASA Astrophysics Data System (ADS)
Yi, Yong; Chen, Zhengying; Wang, Liming
2018-05-01
Corona-originated discharge on DC transmission lines is the main source of the radiated electromagnetic interference (EMI) field in the vicinity of the lines. A joint time-frequency analysis technique was proposed to extract the radiated EMI current (excitation current) of DC corona from statistical measurements of corona currents. A reduced-scale experimental platform was set up to measure the statistical distributions of the current waveform parameters of aluminum conductor steel-reinforced (ACSR) lines. Based on the measured results, the peak value, root-mean-square value and average value of the 0.5 MHz radiated EMI current, at 9 kHz and 200 Hz bandwidths, were calculated by the proposed technique and validated against the conventional excitation function method. Radio interference (RI) was calculated from the radiated EMI current, and a wire-to-plate platform was built to verify the validity of the RI computation results. The reasons for the deviation between the computations and measurements were analyzed in detail.
Sabel, Michael S.; Rice, John D.; Griffith, Kent A.; Lowe, Lori; Wong, Sandra L.; Chang, Alfred E.; Johnson, Timothy M.; Taylor, Jeremy M.G.
2013-01-01
Introduction To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid sentinel lymph node (SLN) biopsy (SLNB). Several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests and support vector machines. We sought to validate recently published models meant to predict sentinel node status. Methods We queried our comprehensive, prospectively collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon 4 published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false negative rate (FNR). Results Logistic regression performed comparably with our data when considering NPV (89.4% vs. 93.6%); however, the model's specificity was not high enough to significantly reduce the rate of biopsies (SLNB reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and biopsy reduction rates that were lower (87.7% vs. 94.1% and 29.8% vs. 14.3%, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software. Conclusions Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset or could not be validated. Differences in selection criteria and histopathologic interpretation likely resulted in underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow for both validation and, ultimately, clinical utility. PMID:21822550
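The three metrics reported above (NPV, RNP, FNR) all derive from a 2×2 table of model predictions against SLN status. A minimal sketch with invented counts:

```python
def sln_metrics(tn, fn, tp, fp):
    """Counts are with respect to sentinel-node positivity: a 'negative
    prediction' means the model would forgo SLN biopsy. tn/fn are
    node-negative/node-positive patients predicted negative; tp/fp are
    node-positive/node-negative patients predicted positive."""
    negatives = tn + fn
    total = tn + fn + tp + fp
    npv = tn / negatives      # negative predictive value
    rnp = negatives / total   # rate of negative predictions (biopsy reduction)
    fnr = fn / (fn + tp)      # false-negative rate among node-positive patients
    return npv, rnp, fnr

# invented counts for illustration
npv, rnp, fnr = sln_metrics(tn=90, fn=10, tp=40, fp=60)
```

The tension the abstract describes is visible here: raising RNP (skipping more biopsies) tends to push FN up and NPV down unless the model discriminates well.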
DOT National Transportation Integrated Search
1979-03-01
There are several conditions that can influence the calculation of the statistical validity of a test battery such as that used to selected Air Traffic Control Specialists. Two conditions of prime importance to statistical validity are recruitment pr...
Revisiting photon-statistics effects on multiphoton ionization
NASA Astrophysics Data System (ADS)
Mouloudakis, G.; Lambropoulos, P.
2018-05-01
We present a detailed analysis of the effects of photon statistics on multiphoton ionization. Through a detailed study of the role of intermediate states, we evaluate the conditions under which the premise of nonresonant processes is valid. The limitations of its validity are manifested in the dependence of the process on the stochastic properties of the radiation and found to be quite sensitive to the intensity. The results are quantified through detailed calculations for coherent, chaotic, and squeezed vacuum radiation. Their significance in the context of recent developments in radiation sources such as the short-wavelength free-electron laser and squeezed vacuum radiation is also discussed.
Statistical Anomalies of Bitflips in SRAMs to Discriminate SBUs From MCUs
NASA Astrophysics Data System (ADS)
Clemente, Juan Antonio; Franco, Francisco J.; Villa, Francesca; Baylac, Maud; Rey, Solenne; Mecha, Hortensia; Agapito, Juan A.; Puchner, Helmut; Hubert, Guillaume; Velazco, Raoul
2016-08-01
Recently, the occurrence of multiple events in static tests has been investigated by checking the statistical distribution of the difference between the addresses of the words containing bitflips. That method has been successfully applied to Field Programmable Gate Arrays (FPGAs) and the original authors indicate that it is also valid for SRAMs. This paper presents a modified methodology that is based on checking the XORed addresses with bitflips, rather than on the difference. Irradiation tests on CMOS 130 & 90 nm SRAMs with 14-MeV neutrons have been performed to validate this methodology. Results in high-altitude environments are also presented and cross-checked with theoretical predictions. In addition, this methodology has also been used to detect modifications in the organization of said memories. Theoretical predictions have been validated with actual data provided by the manufacturer.
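The XOR methodology above can be sketched in a few lines: XOR every pair of word addresses containing bitflips and look for XOR values that recur far more often than a uniform-random model would predict, which flags multiple-cell upsets in physically related words. The addresses below are invented for illustration:

```python
from collections import Counter
from itertools import combinations

def xor_signature(addresses):
    """Count occurrences of each pairwise XOR of bitflip word addresses."""
    return Counter(a ^ b for a, b in combinations(addresses, 2))

# invented bitflip addresses: two suspicious pairs differing only in bit 0
sig = xor_signature([0b0000, 0b0001, 0b1000, 0b1001])
```

In a real static test the analysis would compare these counts against the statistical distribution expected from independent single-bit upsets; here, the repeated XOR value 0b0001 is the kind of anomaly that would suggest MCUs (or reveal the memory's internal organization).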
Packham, B; Barnes, G; Dos Santos, G Sato; Aristovich, K; Gilad, O; Ghosh, A; Oh, T; Holder, D
2016-06-01
Electrical impedance tomography (EIT) allows for the reconstruction of internal conductivity from surface measurements. A change in conductivity occurs as ion channels open during neural activity, making EIT a potential tool for functional brain imaging. EIT images can have >10 000 voxels, which means statistical analysis of such images presents a substantial multiple testing problem. One way to optimally correct for these issues and still maintain the flexibility of complicated experimental designs is to use random field theory. This parametric method estimates the distribution of peaks one would expect by chance in a smooth random field of a given size. Random field theory has been used in several other neuroimaging techniques but never validated for EIT images of fast neural activity; such validation can be achieved using non-parametric techniques. Both parametric and non-parametric techniques were used to analyze a set of 22 images collected from 8 rats. Significant group activations were detected using both techniques (corrected p < 0.05). Both parametric and non-parametric analyses yielded similar results, although the latter was less conservative. These results represent the first statistical analysis of such an image set and indicate that such an analysis is a feasible approach for EIT images of neural activity.
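The non-parametric cross-check mentioned above is typically a permutation test. For a one-sample group analysis, the sign-flip variant permutes the signs of per-subject values and compares the observed statistic against the resulting null distribution. A generic sketch with invented per-animal activation values (not the rat EIT data):

```python
import random

def sign_flip_pvalue(diffs, n_perm=999, seed=0):
    """One-sample sign-flip permutation test: randomly flip the sign of
    each per-subject value and count how often the permuted |sum| meets
    or exceeds the observed |sum| (with the +1 correction)."""
    rng = random.Random(seed)
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(n_perm):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# invented activation values for 8 animals, all positive
p = sign_flip_pvalue([1.0] * 8)
```

For images, the same idea is applied to the maximum statistic over voxels, which controls the family-wise error without the smoothness assumptions random field theory requires.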
Investigating Validity of Math 105 as Prerequisite to Math 201 among Undergraduate Students, Nigeria
ERIC Educational Resources Information Center
Zakariya, Yusuf F.
2016-01-01
In this study, the author examined the validity of MATH 105 as a prerequisite to MATH 201. The data for this study was extracted directly from the examination results logic of the university. Descriptive statistics in form of correlations and linear regressions were used to analyze the obtained data. Three research questions were formulated and…
Statistical Validation of Image Segmentation Quality Based on a Spatial Overlap Index1
Zou, Kelly H.; Warfield, Simon K.; Bharatha, Aditya; Tempany, Clare M.C.; Kaus, Michael R.; Haker, Steven J.; Wells, William M.; Jolesz, Ferenc A.; Kikinis, Ron
2005-01-01
Rationale and Objectives To examine a statistical validation method based on the spatial overlap between two sets of segmentations of the same anatomy. Materials and Methods The Dice similarity coefficient (DSC) was used as a statistical validation metric to evaluate the performance of both the reproducibility of manual segmentations and the spatial overlap accuracy of automated probabilistic fractional segmentation of MR images, illustrated on two clinical examples. Example 1: 10 consecutive cases of prostate brachytherapy patients underwent both preoperative 1.5T and intraoperative 0.5T MR imaging. For each case, 5 repeated manual segmentations of the prostate peripheral zone were performed separately on preoperative and on intraoperative images. Example 2: A semi-automated probabilistic fractional segmentation algorithm was applied to MR imaging of 9 cases with 3 types of brain tumors. DSC values were computed and logit-transformed values were compared in the mean with the analysis of variance (ANOVA). Results Example 1: The mean DSCs of 0.883 (range, 0.876–0.893) with 1.5T preoperative MRI and 0.838 (range, 0.819–0.852) with 0.5T intraoperative MRI (P < .001) were within and at the margin of the range of good reproducibility, respectively. Example 2: Wide ranges of DSC were observed in brain tumor segmentations: Meningiomas (0.519–0.893), astrocytomas (0.487–0.972), and other mixed gliomas (0.490–0.899). Conclusion The DSC value is a simple and useful summary measure of spatial overlap, which can be applied to studies of reproducibility and accuracy in image segmentation. We observed generally satisfactory but variable validation results in two clinical applications. This metric may be adapted for similar validation tasks. PMID:14974593
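The Dice similarity coefficient used above is simple to compute from two binary segmentations: twice the overlap divided by the sum of the two foreground sizes. A minimal sketch with invented voxel sets (not the prostate or brain-tumor data):

```python
def dice(a, b):
    """Dice similarity coefficient between two segmentations,
    given as sets of foreground voxel indices."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))

# invented 2D voxel labellings: manual vs. automated segmentation
seg_manual = {(0, 0), (0, 1), (1, 0), (1, 1)}
seg_auto = {(1, 0), (1, 1), (2, 0), (2, 1)}
dsc = dice(seg_manual, seg_auto)
```

DSC ranges from 0 (no overlap) to 1 (identical masks); the study's logit transform before ANOVA reflects that values cluster near the upper end of this range.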
Systematic review of prediction models for delirium in the older adult inpatient.
Lindroth, Heidi; Bratzke, Lisa; Purvis, Suzanne; Brown, Roger; Coburn, Mark; Mrkobrada, Marko; Chan, Matthew T V; Davis, Daniel H J; Pandharipande, Pratik; Carlsson, Cynthia M; Sanders, Robert D
2018-04-28
To identify existing prognostic delirium prediction models and evaluate their validity and statistical methodology in the older adult (≥60 years) acute hospital population. Systematic review. PubMed, CINAHL, PsychINFO, SocINFO, Cochrane, Web of Science and Embase were searched from 1 January 1990 to 31 December 2016. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses and the CHARMS Statement guided protocol development. Inclusion criteria: age >60 years, inpatient, developed/validated a prognostic delirium prediction model. Exclusion criteria: alcohol-related delirium, sample size ≤50. The primary performance measures were calibration and discrimination statistics. Two authors independently conducted the search and extracted data. The synthesis of data was done by the first author. Disagreement was resolved by the mentoring author. The initial search resulted in 7,502 studies. Following full-text review of 192 studies, 33 were excluded based on age criteria (<60 years) and 27 met the defined criteria. Twenty-three delirium prediction models were identified; 14 were externally validated and 3 were internally validated. The following populations were represented: 11 medical, 3 medical/surgical and 13 surgical. The assessment of delirium was often non-systematic, resulting in varied incidence. The 14 externally validated models had areas under the receiver operating curve ranging from 0.52 to 0.94. Limitations in design, data collection methods and model metric reporting statistics were identified. Delirium prediction models for older adults show variable and typically inadequate predictive capabilities. Our review highlights the need for development of robust models to predict delirium in older inpatients. We provide recommendations for the development of such models. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Light propagation and the distance-redshift relation in a realistic inhomogeneous universe
NASA Technical Reports Server (NTRS)
Futamase, Toshifumi; Sasaki, Misao
1989-01-01
The propagation of light rays in a clumpy universe constructed by a cosmological version of the post-Newtonian approximation was investigated. It is shown that the linear approximation to the propagation equations is valid in the region where zeta is approximately less than 1 even if the density contrast is much larger than unity. Based on a general order-of-magnitude statistical consideration, it is argued that the linear approximation is still valid where zeta is approximately greater than 1. A general formula for the distance-redshift relation in a clumpy universe is given. An explicit expression is derived for a simplified situation in which the effect of the gravitational potential of inhomogeneities dominates. In the light of the derived relation, the validity of the Dyer-Roeder distance is discussed. Also, statistical properties of light rays are investigated for a simple model of an inhomogeneous universe. The result of this example supports the validity of the linear approximation.
Corron, Louise; Marchal, François; Condemi, Silvana; Chaumoître, Kathia; Adalian, Pascal
2017-01-01
Juvenile age estimation methods used in forensic anthropology generally lack methodological consistency and/or statistical validity. Considering this, a standard approach using nonparametric Multivariate Adaptive Regression Splines (MARS) models was tested to predict age from iliac biometric variables of male and female juveniles from Marseilles, France, aged 0-12 years. Models using unidimensional (length and width) and bidimensional iliac data (module and surface) were constructed on a training sample of 176 individuals and validated on an independent test sample of 68 individuals. Results show that MARS prediction models using iliac width, module and surface give overall better and statistically valid age estimates. These models integrate punctual nonlinearities of the relationship between age and the osteometric variables. By constructing valid prediction intervals whose size increases with age, MARS models take into account the normal increase of individual variability. MARS models can qualify as a practical and standardized approach for juvenile age estimation. © 2016 American Academy of Forensic Sciences.
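MARS captures the "punctual nonlinearities" mentioned above with hinge basis functions max(0, x − t) and max(0, t − x) around data-driven knots. The sketch below shows the form of such a model; the knot position and coefficients are invented for illustration and are NOT the published model:

```python
def hinge(u):
    """MARS basis function max(0, u)."""
    return u if u > 0.0 else 0.0

def mars_age(width_mm, knot=90.0, b0=7.0, b_up=0.17, b_down=0.13):
    """Piecewise-linear MARS-style age prediction from iliac width:
    one slope below the knot, a different slope above it.
    All numeric values here are hypothetical."""
    return b0 + b_up * hinge(width_mm - knot) - b_down * hinge(knot - width_mm)

ages = [mars_age(w) for w in (80.0, 90.0, 100.0)]
```

In a fitted MARS model the knots and coefficients are selected by forward addition and backward pruning of such terms, which is what lets the age-width relationship change slope where the data demand it.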
Correcting for Optimistic Prediction in Small Data Sets
Smith, Gordon C. S.; Seaman, Shaun R.; Wood, Angela M.; Royston, Patrick; White, Ian R.
2014-01-01
The C statistic is a commonly reported measure of screening test performance. Optimistic estimation of the C statistic is a frequent problem because of overfitting of statistical models in small data sets, and methods exist to correct for this issue. However, many studies do not use such methods, and those that do correct for optimism use diverse methods, some of which are known to be biased. We used clinical data sets (United Kingdom Down syndrome screening data from Glasgow (1991–2003), Edinburgh (1999–2003), and Cambridge (1990–2006), as well as Scottish national pregnancy discharge data (2004–2007)) to evaluate different approaches to adjustment for optimism. We found that sample splitting, cross-validation without replication, and leave-1-out cross-validation produced optimism-adjusted estimates of the C statistic that were biased and/or associated with greater absolute error than other available methods. Cross-validation with replication, bootstrapping, and a new method (leave-pair-out cross-validation) all generated unbiased optimism-adjusted estimates of the C statistic and had similar absolute errors in the clinical data set. Larger simulation studies confirmed that all 3 methods performed similarly with 10 or more events per variable, or when the C statistic was 0.9 or greater. However, with lower events per variable or lower C statistics, bootstrapping tended to be optimistic but with lower absolute and mean squared errors than both methods of cross-validation. PMID:24966219
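The leave-pair-out idea introduced above can be sketched for a univariate score: hold out each (event, non-event) pair, "fit" the score's direction on the remaining data, and score concordance on the held-out pair. This toy version (invented data, trivially simple "model") illustrates the mechanics, not the paper's clinical models:

```python
def leave_pair_out_c(x, y):
    """Leave-pair-out cross-validated C statistic for a univariate score.
    The only 'fitting' on each training fold is choosing the score
    direction by comparing event and non-event means; ties count 1/2."""
    events = [i for i, yi in enumerate(y) if yi == 1]
    nonevents = [i for i, yi in enumerate(y) if yi == 0]
    score = 0.0
    for i in events:
        for j in nonevents:
            train = [k for k in range(len(y)) if k not in (i, j)]
            n1 = sum(1 for k in train if y[k] == 1)
            n0 = sum(1 for k in train if y[k] == 0)
            mu1 = sum(x[k] for k in train if y[k] == 1) / n1
            mu0 = sum(x[k] for k in train if y[k] == 0) / n0
            sign = 1.0 if mu1 >= mu0 else -1.0
            si, sj = sign * x[i], sign * x[j]
            score += 1.0 if si > sj else 0.5 if si == sj else 0.0
    return score / (len(events) * len(nonevents))

# invented screening marker values and outcomes (1 = case)
c_lpo = leave_pair_out_c([3.0, 4.0, 1.0, 2.0], [1, 1, 0, 0])
```

Because the held-out pair never influences its own fold's fit, the resulting C estimate avoids the optimism that plagues apparent (resubstitution) C statistics in small data sets.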
Enhancement and Validation of an Arab Surname Database
Schwartz, Kendra; Beebani, Ganj; Sedki, Mai; Tahhan, Mamon; Ruterbusch, Julie J.
2015-01-01
Objectives Arab Americans constitute a large, heterogeneous, and quickly growing subpopulation in the United States. Health statistics for this group are difficult to find because US governmental offices do not recognize Arab as separate from white. The development and validation of an Arab- and Chaldean-American name database will enhance research efforts in this population subgroup. Methods A previously validated name database was supplemented with newly identified names gathered primarily from vital statistic records and then evaluated using a multistep process. This process included 1) review by 4 Arabic- and Chaldean-speaking reviewers, 2) ethnicity assessment by social media searches, and 3) self-report of ancestry obtained from a telephone survey. Results Our Arab- and Chaldean-American name algorithm has a positive predictive value of 91% and a negative predictive value of 100%. Conclusions This enhanced name database and algorithm can be used to identify Arab Americans in health statistics data, such as cancer and hospital registries, where they are often coded as white, to determine the extent of health disparities in this population. PMID:24625771
Enhancement of CFD validation exercise along the roof profile of a low-rise building
NASA Astrophysics Data System (ADS)
Deraman, S. N. C.; Majid, T. A.; Zaini, S. S.; Yahya, W. N. W.; Abdullah, J.; Ismail, M. A.
2018-04-01
The aim of this study is to enhance the validation of a CFD exercise along the roof profile of a low-rise building. An isolated gabled-roof house with a 26.6° roof pitch was simulated to obtain the pressure coefficients around the house. Validating a CFD analysis against experimental data requires many input parameters, so this study performed the CFD simulation based on the data from a previous study; where input parameters were not clearly stated, new ones were established from the open literature. The numerical simulations were performed in FLUENT 14.0 by applying the Computational Fluid Dynamics (CFD) approach based on the steady RANS equations together with the RNG k-ɛ model. The CFD results were then analysed quantitatively (statistical analysis) and compared with the CFD results from the previous study. The statistical analysis, an ANOVA test and error measures, showed that the CFD results from the current study were in good agreement with, and exhibited the smallest error relative to, the previous study. All the input data used in this study can be extended to other types of CFD simulation involving wind flow over an isolated single-storey house.
The validity of multiphase DNS initialized on the basis of single-point statistics
NASA Astrophysics Data System (ADS)
Subramaniam, Shankar
1999-11-01
A study of the point-process statistical representation of a spray reveals that the single-point statistical information contained in the droplet distribution function (ddf) is related to a sequence of single surrogate-droplet PDFs, which are in general different from the physical single-droplet PDFs. The results of this study have important consequences for the initialization and evolution of direct numerical simulations (DNS) of multiphase flows, which are usually initialized on the basis of single-point statistics such as the average number density in physical space. If multiphase DNS are initialized in this way, then even the initial representation contains certain implicit assumptions concerning the complete ensemble of realizations, which are invalid for general multiphase flows. Also, the evolution of a DNS initialized in this manner is shown to be valid only if an as yet unproven commutation hypothesis holds true. Therefore, it is questionable to what extent DNS initialized in this manner constitute a direct simulation of the physical droplets.
Investigation of Super Learner Methodology on HIV-1 Small Sample: Application on Jaguar Trial Data.
Houssaïni, Allal; Assoumou, Lambert; Marcelin, Anne Geneviève; Molina, Jean Michel; Calvez, Vincent; Flandre, Philippe
2012-01-01
Background. Many statistical models have been tested to predict phenotypic or virological response from genotypic data. A statistical framework called Super Learner has been introduced either to compare different methods/learners (discrete Super Learner) or to combine them in a Super Learner prediction method. Methods. The Jaguar trial is used to apply the Super Learner framework. The Jaguar study is an "add-on" trial comparing the efficacy of adding didanosine to an ongoing failing regimen. Our aim was also to investigate the impact of different cross-validation strategies and different loss functions. Four different partitions between training and validation sets were tested with two loss functions, and six statistical methods were compared. We assessed performance by evaluating R² values and accuracy by calculating the rates of patients correctly classified. Results. Our results indicated that the more recent Super Learner methodology of building a new predictor based on a weighted combination of different methods/learners provided good performance. A simple linear model provided results similar to those of this new predictor. Slight discrepancies arise between the two loss functions investigated, and also between results based on cross-validated risks and results from the full dataset. The Super Learner methodology and the linear model correctly classified around 80% of patients; the difference between the lowest and highest rates is around 10 percentage points. The number of mutations retained by the different learners also varies from 1 to 41. Conclusions. The more recent Super Learner methodology, combining the predictions of many learners, provided good performance on our small dataset.
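The two Super Learner variants mentioned above can be sketched compactly: the discrete Super Learner picks the single learner with the lowest cross-validated risk, while the full Super Learner weights the learners' cross-validated predictions to minimise that risk. A hedged sketch; the learners, the simulated data, and the non-negative least-squares weighting are assumptions for illustration, not the trial's actual setup:

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(4)
n = 100
X = rng.normal(size=(n, 5))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * rng.normal(size=n)

learners = [LinearRegression(), Ridge(alpha=1.0),
            DecisionTreeRegressor(max_depth=3, random_state=0)]

# Column j of Z holds learner j's out-of-fold (cross-validated) predictions.
Z = np.zeros((n, len(learners)))
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    for j, learner in enumerate(learners):
        Z[test, j] = learner.fit(X[train], y[train]).predict(X[test])

# Discrete Super Learner: the single learner with the lowest CV risk.
cv_risk = ((Z - y[:, None]) ** 2).mean(axis=0)
best = int(np.argmin(cv_risk))

# Super Learner: non-negative weights minimising the risk of the combination.
weights, _ = nnls(Z, y)
weights /= weights.sum()
print(best, np.round(weights, 2))
```

Here the weights are found by non-negative least squares and renormalised to sum to one, one common convention among several used in the Super Learner literature.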
DBS-LC-MS/MS assay for caffeine: validation and neonatal application.
Bruschettini, Matteo; Barco, Sebastiano; Romantsik, Olga; Risso, Francesco; Gennai, Iulian; Chinea, Benito; Ramenghi, Luca A; Tripodi, Gino; Cangemi, Giuliana
2016-09-01
DBS might be an appropriate microsampling technique for therapeutic drug monitoring of caffeine in infants. Nevertheless, its application presents several issues that still limit its use. This paper describes a validated DBS-LC-MS/MS method for caffeine. The results of the method validation showed a hematocrit dependence. In the analysis of 96 paired plasma and DBS clinical samples, caffeine levels measured in DBS were statistically significantly lower than in plasma, but the observed differences were independent of hematocrit. These results clearly show the need for extensive validation of DBS-based methods with real-life samples. DBS-LC-MS/MS can be considered a good alternative to traditional methods for therapeutic drug monitoring or PK studies in preterm infants.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Martin, Spencer; Rodrigues, George, E-mail: george.rodrigues@lhsc.on.ca; Department of Epidemiology/Biostatistics, University of Western Ontario, London
2013-01-01
Purpose: To perform a rigorous technological assessment and statistical validation of a software technology for anatomic delineation of the prostate on MRI datasets. Methods and Materials: A 3-phase validation strategy was used. Phase I consisted of anatomic atlas building using 100 prostate cancer MRI datasets to provide training data for the segmentation algorithms. In phase II, 2 experts contoured 15 new MRI prostate cancer cases using 3 approaches (manual, N points, and region of interest). In phase III, 5 new physicians with variable MRI prostate contouring experience segmented the same 15 phase II datasets using 3 approaches: manual, N points with no editing, and full autosegmentation with user editing allowed. Statistical analyses for the time and accuracy (Dice similarity coefficient) endpoints used traditional descriptive statistics, analysis of variance, analysis of covariance, and the pooled Student t test. Results: In phase I, average (SD) total and per-slice contouring times for the 2 physicians were 228 (75) and 17 (3.5) seconds, and 209 (65) and 15 (3.9) seconds, respectively. In phase II, statistically significant differences in physician contouring time were observed based on physician, type of contouring, and case sequence. The N points strategy resulted in superior segmentation accuracy when initial autosegmented contours were compared with final contours. In phase III, statistically significant differences in contouring time were again observed based on physician, type of contouring, and case sequence. The average relative time savings for N points and autosegmentation were 49% and 27%, respectively, compared with manual contouring. The N points and autosegmentation strategies resulted in average Dice values of 0.89 and 0.88, respectively. Pre- and post-edited autosegmented contours demonstrated a higher average Dice similarity coefficient of 0.94. Conclusion: The software provided robust contours with minimal editing required. Observed time savings were seen for all physicians irrespective of experience level and baseline manual contouring speed.
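The Dice similarity coefficient used as the accuracy endpoint above is simple to compute from two binary masks. A minimal sketch with toy 10×10 contour masks (hypothetical, not the study's data):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks:
    2|A ∩ B| / (|A| + |B|). 1.0 means perfect overlap."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

auto = np.zeros((10, 10), int)
auto[2:8, 2:8] = 1      # toy "autosegmented" contour
manual = np.zeros((10, 10), int)
manual[3:9, 3:9] = 1    # toy "manual" contour, shifted by one voxel
print(round(dice(auto, manual), 3))  # → 0.694
```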
Improving the Validity and Reliability of a Health Promotion Survey for Physical Therapists
Stephens, Jaca L.; Lowman, John D.; Graham, Cecilia L.; Morris, David M.; Kohler, Connie L.; Waugh, Jonathan B.
2013-01-01
Purpose Physical therapists (PTs) have a unique opportunity to intervene in the area of health promotion. However, no instrument has been validated to measure PTs’ views on health promotion in physical therapy practice. The purpose of this study was to evaluate the content validity and test-retest reliability of a health promotion survey designed for PTs. Methods An expert panel of PTs assessed the content validity of “The Role of Health Promotion in Physical Therapy Survey” and provided suggestions for revision. Item content validity was assessed using the content validity ratio (CVR) as well as the modified kappa statistic. Therapists then participated in the test-retest reliability assessment of the revised health promotion survey, which was assessed using a weighted kappa statistic. Results Based on feedback from the expert panelists, significant revisions were made to the original survey. The expert panel reached at least a majority consensus agreement for all items in the revised survey and the survey-CVR improved from 0.44 to 0.66. Only one item on the revised survey had substantial test-retest agreement, with 55% of the items having moderate agreement and 43% poor agreement. Conclusions All items on the revised health promotion survey demonstrated at least fair validity, but few items had reasonable test-retest reliability. Further modifications should be made to strengthen the validity and improve the reliability of this survey. PMID:23754935
Morales, Daniel R; Flynn, Rob; Zhang, Jianguo; Trucco, Emmanuel; Quint, Jennifer K; Zutis, Kris
2018-05-01
Several models for predicting the risk of death in people with chronic obstructive pulmonary disease (COPD) exist but have not undergone large-scale validation in primary care. The objective of this study was to externally validate these models using statistical and machine learning approaches. We used a primary care COPD cohort identified using data from the UK Clinical Practice Research Datalink. Age-standardised mortality rates were calculated for the population by gender, and the discrimination of ADO (age, dyspnoea, airflow obstruction), COTE (COPD-specific comorbidity test), DOSE (dyspnoea, airflow obstruction, smoking, exacerbations) and CODEX (comorbidity, dyspnoea, airflow obstruction, exacerbations) at predicting death over 1-3 years was measured using logistic regression and a support vector machine (SVM) method of analysis. The age-standardised mortality rate was 32.8 (95% CI 32.5-33.1) and 25.2 (95% CI 25.4-25.7) per 1000 person-years for men and women respectively. Complete data were available for 54,879 patients to predict 1-year mortality. ADO performed best (c-statistic 0.730) compared with DOSE (c-statistic 0.645), COTE (c-statistic 0.655) and CODEX (c-statistic 0.649) at predicting 1-year mortality. Discrimination of ADO and DOSE improved at predicting 1-year mortality when combined with COTE comorbidities (c-statistic 0.780 for ADO + COTE; 0.727 for DOSE + COTE). Discrimination did not change significantly over 1-3 years. Comparable results were observed using SVM. In primary care, ADO appears superior at predicting death in COPD. The performance of ADO and DOSE improved when combined with COTE comorbidities, suggesting better models may be generated with additional data facilitated using novel approaches. Copyright © 2018. Published by Elsevier Ltd.
Results of the Intelligence Test for Visually Impaired Children (ITVIC).
ERIC Educational Resources Information Center
Dekker, R.; And Others
1991-01-01
Statistical analyses of scores on subtests of the Intelligence Test for Visually Impaired Children were done for two groups of children, either with or without usable vision. Results suggest that the battery has differential factorial and predictive validity. (Author/DB)
ERIC Educational Resources Information Center
Hassad, Rossi A.
2009-01-01
This study examined the teaching practices of 227 college instructors of introductory statistics (from the health and behavioral sciences). Using primarily multidimensional scaling (MDS) techniques, a two-dimensional, 10-item teaching practice scale, TISS (Teaching of Introductory Statistics Scale), was developed and validated. The two dimensions…
Majumdar, Subhabrata; Basak, Subhash C
2018-04-26
Proper validation is an important aspect of QSAR modelling. External validation is one of the widely used validation methods in QSAR, where the model is built on a subset of the data and validated on the remaining samples. However, its effectiveness for datasets with a small number of samples but a large number of predictors remains suspect. Calculating hundreds or thousands of molecular descriptors using currently available software has become the norm in QSAR research, owing to computational advances in the past few decades. Thus, for n chemical compounds and p descriptors calculated for each molecule, the typical chemometric dataset today has a high value of p but small n (i.e. n < p). Motivated by the evidence of inadequacies of external validation in estimating the true predictive capability of a statistical model in recent literature, this paper performs an extensive comparative study of this method with several other validation techniques. We compared four validation methods: leave-one-out, K-fold, external and multi-split validation, using statistical models built using LASSO regression, which simultaneously performs variable selection and modelling. We used 300 simulated datasets and one real dataset of 95 congeneric amine mutagens for this evaluation. External validation metrics have high variation among different random splits of the data and hence are not recommended for predictive QSAR models. LOO had the best overall performance among all validation methods applied in our scenario. Results from external validation were too unstable for the datasets we analyzed. Based on our findings, we recommend using the LOO procedure for validating QSAR predictive models built on high-dimensional small-sample data. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
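The comparison of leave-one-out and external validation on an n < p dataset can be sketched as follows; the simulated descriptors, the LASSO penalty, and the split sizes are assumptions for illustration, not the paper's settings:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import LeaveOneOut, train_test_split

rng = np.random.default_rng(1)
n, p = 40, 200                       # small n, large p (n < p), as in QSAR data
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0                       # only 5 truly active descriptors
y = X @ beta + rng.normal(size=n)

# Leave-one-out: every sample is predicted once by a model that never saw it.
loo_pred = np.empty(n)
for train, test in LeaveOneOut().split(X):
    loo_pred[test] = Lasso(alpha=0.3).fit(X[train], y[train]).predict(X[test])
q2_loo = 1 - np.sum((y - loo_pred) ** 2) / np.sum((y - y.mean()) ** 2)

# External validation: one random split; repeating shows how unstable it is.
q2_ext = []
for seed in range(20):
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=seed)
    pred = Lasso(alpha=0.3).fit(Xtr, ytr).predict(Xte)
    q2_ext.append(1 - np.sum((yte - pred) ** 2) / np.sum((yte - yte.mean()) ** 2))

print(f"LOO Q2 = {q2_loo:.2f}; external Q2 range = "
      f"[{min(q2_ext):.2f}, {max(q2_ext):.2f}]")
```

The spread of the external Q² values across random splits is the instability the authors point to; the single LOO figure uses every sample for validation exactly once.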
The construction and assessment of a statistical model for the prediction of protein assay data.
Pittman, J; Sacks, J; Young, S Stanley
2002-01-01
The focus of this work is the development of a statistical model for a bioinformatics database whose distinctive structure makes model assessment an interesting and challenging problem. The key components of the statistical methodology, including a fast approximation to the singular value decomposition and the use of adaptive spline modeling and tree-based methods, are described, and preliminary results are presented. These results compare favorably to selected results achieved using comparative methods. An attempt to determine the predictive ability of the model through cross-validation experiments is discussed. In conclusion, a synopsis of the results of these experiments and their implications for the analysis of bioinformatics databases in general is presented.
The Chinese version of the Outcome Expectations for Exercise scale: validation study.
Lee, Ling-Ling; Chiu, Yu-Yun; Ho, Chin-Chih; Wu, Shu-Chen; Watson, Roger
2011-06-01
Estimates of the reliability and validity of the English nine-item Outcome Expectations for Exercise (OEE) scale have been tested and found to be valid for use in various settings, particularly among older people, with good internal consistency and validity. Data on the use of the OEE scale among older Chinese people living in the community and how cultural differences might affect the administration of the OEE scale are limited. To test the validity and reliability of the Chinese version of the Outcome Expectations for Exercise scale among older people. A cross-sectional validation study was designed to test the Chinese version of the OEE scale (OEE-C). Reliability was examined by testing both the internal consistency for the overall scale and the squared multiple correlation coefficient for the single item measure. The validity of the scale was tested on the basis of both a traditional psychometric test and a confirmatory factor analysis using structural equation modelling. The Mokken Scaling Procedure (MSP) was used to investigate if there were any hierarchical, cumulative sets of items in the measure. The OEE-C scale was tested in a group of older people in Taiwan (n=108, mean age=77.1). There was acceptable internal consistency (alpha=.85) and model fit in the scale. Evidence of the validity of the measure was demonstrated by the tests for criterion-related validity and construct validity. There was a statistically significant correlation between exercise outcome expectations and exercise self-efficacy (r=.34, p<.01). An analysis of the Mokken Scaling Procedure found that nine items of the scale were all retained in the analysis and the resulting scale was reliable and statistically significant (p=.0008). The results obtained in the present study provided acceptable levels of reliability and validity evidence for the Chinese Outcome Expectations for Exercise scale when used with older people in Taiwan. 
Future testing of the OEE-C scale needs to be carried out to see whether these results are generalisable to older Chinese people living in urban areas. Copyright © 2010 Elsevier Ltd. All rights reserved.
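The internal consistency reported above (alpha = .85) is Cronbach's alpha, which is straightforward to compute from an item-score matrix. A minimal sketch on synthetic scale data; the latent-trait simulation is an assumption, and only the 9-item, n = 108 shape mirrors the study:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    scores = np.asarray(scores, float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Simulated 9-item scale for 108 respondents sharing one latent trait.
rng = np.random.default_rng(3)
latent = rng.normal(size=(108, 1))
scores = latent + 0.5 * rng.normal(size=(108, 9))
print(round(cronbach_alpha(scores), 2))
```

High alpha here reflects that all nine simulated items load on the same latent trait; uncorrelated items would drive alpha toward zero.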
Statistical Simulation of the Performance and Degradation of a PEMFC Membrane Electrode Assembly
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harvey, David; Bellemare-Davis, Alexander; Karan, Kunal
2012-07-01
A 1-D MEA performance model was developed that considered transport of liquid water, agglomerate catalyst structure, and the statistical variation of the MEA characteristic parameters. The model was validated against a low-surface-area carbon-supported catalyst across various platinum loadings and operational conditions. The statistical variation was found to play a significant role in creating noise in the validation data, and there was a coupling effect between movement in material properties and liquid water transport. Further, in studying the low-platinum-loaded catalyst layers, liquid water was found to play a significant role in increasing the overall transport losses. The model was then applied to study platinum dissolution via potential-cycling accelerated stress tests, in which the platinum was found to dissolve nearest the membrane, effectively resulting in reaction distribution shifts within the layer.
Pattern statistics on Markov chains and sensitivity to parameter estimation
Nuel, Grégory
2006-01-01
Background: In order to compute pattern statistics in computational biology, a Markov model is commonly used to take the sequence composition into account, and its parameters must usually be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what the consequences of this variability are for pattern studies (finding the most over-represented words in a genome, the most significant words common to a set of sequences, ...). Results: In the particular case where pattern statistics (overlap counting only) are computed through binomial approximations, we use the delta method to give an explicit expression for σ, the standard deviation of a pattern statistic. This result is validated using simulations, and a simple pattern study is also considered. Conclusion: We establish that the use of high-order Markov models could easily lead to major mistakes, due to the high sensitivity of pattern statistics to parameter estimation. PMID:17044916
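The binomial approximation referred to above can be illustrated by estimating first-order Markov transition probabilities from a sequence and comparing a word's observed count with its expectation. The random sequence and the word are assumptions for the example, and the paper's delta-method expression for σ is not reproduced here:

```python
import numpy as np
from collections import Counter

def markov1_expected_count(seq, word):
    """Expected count of `word` in `seq` under a first-order Markov model
    whose transition probabilities are estimated from `seq` itself
    (binomial approximation: E[N] = (L - k + 1) * P(word))."""
    L, k = len(seq), len(word)
    pairs = Counter(zip(seq, seq[1:]))      # observed dinucleotide counts
    firsts = Counter(seq[:-1])

    def trans(a, b):
        return pairs[(a, b)] / firsts[a]

    p = Counter(seq)[word[0]] / L           # crude stationary start probability
    for a, b in zip(word, word[1:]):
        p *= trans(a, b)
    return (L - k + 1) * p

rng = np.random.default_rng(2)
seq = "".join(rng.choice(list("ACGT"), size=5000))
obs = sum(seq[i:i + 3] == "CGT" for i in range(len(seq) - 2))
exp = markov1_expected_count(seq, "CGT")
print(obs, round(exp, 1))
```

On a uniform random sequence the expected count of a 3-letter word is roughly (L − 2)/64; the paper's point is that the estimation error in the transition probabilities propagates into such expectations, increasingly so for higher-order models.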
Teshima, Tara Lynn; Patel, Vaibhav; Mainprize, James G; Edwards, Glenn; Antonyshyn, Oleh M
2015-07-01
The utilization of three-dimensional modeling technology in craniomaxillofacial surgery has grown exponentially during the last decade. Future development, however, is hindered by the lack of a normative three-dimensional anatomic dataset and a statistical mean three-dimensional virtual model. The purpose of this study is to develop and validate a protocol to generate a statistical three-dimensional virtual model based on a normative dataset of adult skulls. Two hundred adult skull CT images were reviewed. The average three-dimensional skull was computed by processing each CT image in the series using thin-plate spline geometric morphometric protocol. Our statistical average three-dimensional skull was validated by reconstructing patient-specific topography in cranial defects. The experiment was repeated 4 times. In each case, computer-generated cranioplasties were compared directly to the original intact skull. The errors describing the difference between the prediction and the original were calculated. A normative database of 33 adult human skulls was collected. Using 21 anthropometric landmark points, a protocol for three-dimensional skull landmarking and data reduction was developed and a statistical average three-dimensional skull was generated. Our results show the root mean square error (RMSE) for restoration of a known defect using the native best match skull, our statistical average skull, and worst match skull was 0.58, 0.74, and 4.4 mm, respectively. The ability to statistically average craniofacial surface topography will be a valuable instrument for deriving missing anatomy in complex craniofacial defects and deficiencies as well as in evaluating morphologic results of surgery.
Riedl, Janet; Esslinger, Susanne; Fauhl-Hassek, Carsten
2015-07-23
Food fingerprinting approaches are expected to become a very potent tool in authentication processes aiming at a comprehensive characterization of complex food matrices. By non-targeted spectrometric or spectroscopic chemical analysis with a subsequent (multivariate) statistical evaluation of the acquired data, food matrices can be investigated in terms of their geographical origin, species variety, or possible adulterations. Although many successful research projects have already demonstrated the feasibility of non-targeted fingerprinting approaches, their uptake and implementation in routine analysis and food surveillance is still limited. In many proof-of-principle studies, the prediction ability of only one dataset was explored, measured within a limited period of time using one instrument in one laboratory. Thorough validation strategies that guarantee the reliability of the underlying data and that allow conclusions on the fitness for purpose of the respective approaches have not yet been proposed. Within this review, critical steps of the fingerprinting workflow were explored to develop a generic scheme for multivariate model validation. As a result, a proposed scheme for "good practice" shall guide users through the validation and reporting of non-targeted fingerprinting results. Furthermore, food fingerprinting studies were selected by a systematic search approach and reviewed with regard to (a) transparency of data processing and (b) validity of study results. Subsequently, the studies were inspected for measures of statistical model validation, analytical method validation, and quality assurance. In this context, issues and recommendations were found that might be considered a starting point for developing validation standards for non-targeted metabolomics approaches for food authentication in the future.
Hence, this review intends to contribute to the harmonization and standardization of food fingerprinting, both required as a prior condition for the authentication of food in routine analysis and official control. Copyright © 2015 Elsevier B.V. All rights reserved.
Air Combat Training: Good Stick Index Validation. Final Report for Period 3 April 1978-1 April 1979.
ERIC Educational Resources Information Center
Moore, Samuel B.; And Others
A study was conducted to investigate and statistically validate a performance measuring system (the Good Stick Index) in the Tactical Air Command Combat Engagement Simulator I (TAC ACES I) Air Combat Maneuvering (ACM) training program. The study utilized a twelve-week sample of eighty-nine student pilots to statistically validate the Good Stick…
NASA Technical Reports Server (NTRS)
Walker, Eric L.
2005-01-01
Wind tunnel experiments will continue to be a primary source of validation data for many types of mathematical and computational models in the aerospace industry. The increased emphasis on accuracy of data acquired from these facilities requires understanding of the uncertainty of not only the measurement data but also any correction applied to the data. One of the largest and most critical corrections made to these data is due to wall interference. In an effort to understand the accuracy and suitability of these corrections, a statistical validation process for wall interference correction methods has been developed. This process is based on the use of independent cases which, after correction, are expected to produce the same result. Comparison of these independent cases with respect to the uncertainty in the correction process establishes a domain of applicability based on the capability of the method to provide reasonable corrections with respect to customer accuracy requirements. The statistical validation method was applied to the version of the Transonic Wall Interference Correction System (TWICS) recently implemented in the National Transonic Facility at NASA Langley Research Center. The TWICS code generates corrections for solid and slotted wall interference in the model pitch plane based on boundary pressure measurements. Before validation could be performed on this method, it was necessary to calibrate the ventilated wall boundary condition parameters. Discrimination comparisons are used to determine the most representative of three linear boundary condition models which have historically been used to represent longitudinally slotted test section walls. Of the three linear boundary condition models implemented for ventilated walls, the general slotted wall model was the most representative of the data. 
The TWICS code using the calibrated general slotted wall model was found to be valid to within the process uncertainty for test section Mach numbers less than or equal to 0.60. The scatter among the mean corrected results of the bodies of revolution validation cases was within one count of drag on a typical transport aircraft configuration for Mach numbers at or below 0.80 and two counts of drag for Mach numbers at or below 0.90.
Hussain, Bilal; Sultana, Tayyaba; Sultana, Salma; Al-Ghanim, Khalid Abdullah; Masoud, Muhammad Shahreef; Mahboob, Shahid
2018-04-01
Cirrhinus mrigala, Labeo rohita, and Catla catla are economically important fish for human consumption in Pakistan, but industrial and sewage pollution has drastically reduced their population in the River Chenab. Statistics are an important tool to analyze and interpret comet assay results. The specific aims of the study were to determine the DNA damage in Cirrhinus mrigala, Labeo rohita, and Catla catla due to chemical pollution and to assess the validity of statistical analyses to determine the viability of the comet assay for a possible use with these freshwater fish species as a good indicator of pollution load and habitat degradation. Comet assay results indicated a significant (P < 0.05) degree of DNA fragmentation in Cirrhinus mrigala followed by Labeo rohita and Catla catla in respect to comet head diameter, comet tail length, and % DNA damage. Regression analysis and correlation matrices conducted among the parameters of the comet assay affirmed the precision and the legitimacy of the results. The present study, therefore, strongly recommends that genotoxicological studies conduct appropriate analysis of the various components of comet assays to offer better interpretation of the assay data.
Validating LES for Jet Aeroacoustics
NASA Technical Reports Server (NTRS)
Bridges, James
2011-01-01
Engineers charged with making jet aircraft quieter have long dreamed of being able to see exactly how turbulent eddies produce sound, and this dream is now coming true with the advent of large eddy simulation (LES). Two obvious challenges remain: validating the LES codes at the resolution required to see the fluid-acoustic coupling, and interpreting the massive datasets that result. This paper primarily addresses the former: the use of advanced experimental techniques, such as particle image velocimetry (PIV) and Raman and Rayleigh scattering, to validate the computer codes and procedures used to create LES solutions. It also addresses the latter problem in discussing which measures, critical for aeroacoustics, should be used in validating LES codes. These new diagnostic techniques deliver measurements and flow statistics of increasing sophistication and capability, but what of their accuracy? And what measures should be used in validation? This paper argues that the issue of accuracy be addressed by cross-facility and cross-disciplinary examination of modern datasets, along with increased reporting of internal quality checks in PIV analysis. Further, it argues that the appropriate validation metrics for aeroacoustic applications are increasingly complicated statistics that aeroacoustic theory has shown to be critical to flow-generated sound.
Degrees of separation as a statistical tool for evaluating candidate genes.
Nelson, Ronald M; Pettersson, Mats E
2014-12-01
Selection of candidate genes is an important step in the exploration of complex genetic architecture. The number of gene networks available is increasing, and these can provide information to help with candidate gene selection. It is currently common to use the degree of connectedness in gene networks as validation in Genome-Wide Association (GWA) and Quantitative Trait Locus (QTL) mapping studies. However, this practice can produce misleading results if not done properly. Here we present a method and tool for validating the gene pairs from GWA studies given the context of the network they co-occur in. It ensures that proposed interactions and gene associations are not statistical artefacts inherent to the specific gene network architecture. The CandidateBacon package provides an easy and efficient method to calculate the average degree of separation (DoS) between pairs of genes in currently available gene networks. We show how these empirical estimates of average connectedness are used to validate candidate gene pairs. Validating interacting genes by comparing their connectedness with the average connectedness of the gene network will provide support for said interactions by utilising the growing amount of gene network information available. Copyright © 2014 Elsevier Ltd. All rights reserved.
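The degree of separation between a gene pair is the shortest-path distance in the network, which a breadth-first search computes directly. A minimal sketch with a toy network and hypothetical gene names; the CandidateBacon package's actual interface is not reproduced here:

```python
from collections import deque

def degrees_of_separation(adj, a, b):
    """Shortest-path length between genes a and b in an undirected network
    given as an adjacency dict; returns None if they are disconnected."""
    if a == b:
        return 0
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == b:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

# Toy network (hypothetical gene names):
adj = {
    "G1": {"G2", "G3"}, "G2": {"G1", "G4"}, "G3": {"G1"},
    "G4": {"G2", "G5"}, "G5": {"G4"},
}
print(degrees_of_separation(adj, "G1", "G5"))  # G1-G2-G4-G5 → 3
```

Averaging this distance over all gene pairs gives the network-wide baseline against which a candidate pair's connectedness can be compared.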
Jad, Seyyed Mohammad Moosavi; Geravandi, Sahar; Mohammadi, Mohammad Javad; Alizadeh, Rashin; Sarvarian, Mohammad; Rastegarimehr, Babak; Afkar, Abolhasan; Yari, Ahmad Reza; Momtazan, Mahboobeh; Valipour, Aliasghar; Mahboubi, Mohammad; Karimyan, Azimeh; Mazraehkar, Alireza; Nejad, Ali Soleimani; Mohammadi, Hafez
2017-12-01
The aim of this study was to identify the relationship between knowledge-oriented leadership and knowledge management practices. In terms of its quantitative procedure and means of obtaining information, this research is descriptive and correlational. The statistical population consists of all employees of a food industry company in the Kurdistan province of Iran who were employed in 2016, about 1800 people in total. Using the Cochran formula, 316 employees in the Kurdistan food industry (Kurdistan FI) were selected. A non-random sampling method and validated (standard) questions were used to collect the data, and their reliability and validity were confirmed. Statistical analysis of the data was carried out using SPSS 16. The analysis showed a relationship between knowledge-oriented leadership and knowledge management activities as mediator variables. The results of the data and the tested hypotheses suggest that knowledge management activities (knowledge transfer, knowledge storage, application of knowledge, and creation of knowledge) play an important role in product innovation performance.
Brackley, Victoria; Ball, Kevin; Tor, Elaine
2018-05-12
The effectiveness of the swimming turn is highly influential to overall performance in competitive swimming. The push-off or wall contact, within the turn phase, is directly involved in determining the speed at which the swimmer leaves the wall. Therefore, it is paramount to develop reliable methods to measure wall-contact-time during the turn phase for training and research purposes. The aim of this study was to determine the concurrent validity and reliability of the Pool Pad App for measuring wall-contact-time during the freestyle and backstroke tumble turn. The wall-contact-times of nine elite and sub-elite participants were recorded during their regular training sessions. Concurrent validity statistics included the standardised typical error estimate, linear analysis and effect sizes, while the intraclass correlation coefficient (ICC) was used for the reliability statistics. The standardised typical error estimate resulted in a moderate Cohen's d effect size with an R² value of 0.80, and the ICC between the Pool Pad and 2D video footage was 0.89. Despite these measurement differences, the results from these concurrent validity and reliability analyses demonstrated that the Pool Pad is suitable for measuring wall-contact-time during the freestyle and backstroke tumble turn within a training environment.
Parsons, Nick R; Price, Charlotte L; Hiskens, Richard; Achten, Juul; Costa, Matthew L
2012-04-25
The application of statistics in reported research in trauma and orthopaedic surgery has become ever more important and complex. Despite the extensive use of statistical analysis, it is still a subject which is often not conceptually well understood, resulting in clear methodological flaws and inadequate reporting in many papers. A detailed statistical survey sampled 100 representative orthopaedic papers using a validated questionnaire that assessed the quality of the trial design and statistical analysis methods. The survey found evidence of failings in study design, statistical methodology and presentation of the results. Overall, in 17% (95% confidence interval; 10-26%) of the studies investigated the conclusions were not clearly justified by the results, in 39% (30-49%) of studies a different analysis should have been undertaken and in 17% (10-26%) a different analysis could have made a difference to the overall conclusions. It is only by an improved dialogue between statistician, clinician, reviewer and journal editor that the failings in design methodology and analysis highlighted by this survey can be addressed.
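The survey's headline proportions come with 95% confidence intervals (e.g. 17% with CI 10-26% from a sample of 100 papers). As a hedged illustration of how such intervals arise (the authors' exact interval method is not stated; an exact Clopper-Pearson interval is likely, while the Wilson score interval below is a close, stdlib-only approximation):

```python
import math

def wilson_ci(successes, n, z=1.96):
    # Wilson score interval for a binomial proportion (95% default via z=1.96)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half
```

For 17 of 100 studies, this yields roughly (0.11, 0.26), consistent with the reported 10-26%.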
Risk prediction models of breast cancer: a systematic review of model performances.
Anothaisintawee, Thunyarat; Teerawattananon, Yot; Wiratkapun, Chollathip; Kasamesup, Vijj; Thakkinstian, Ammarin
2012-05-01
Risk prediction models for estimating breast cancer risk in individual women have been increasingly developed. However, the performance of these models is questionable. We therefore conducted a study with the aim of systematically reviewing previous risk prediction models. The results of this review help to identify the most reliable model and indicate the strengths and weaknesses of each model, guiding future model development. We searched MEDLINE (PubMed) from 1949 and EMBASE (Ovid) from 1974 until October 2010. Observational studies which constructed models using regression methods were selected. Information about model development and performance was extracted. Twenty-five out of 453 studies were eligible. Of these, 18 developed prediction models and 7 validated existing prediction models. Up to 13 variables were included in the models, and sample sizes for each study ranged from 550 to 2,404,636. Internal validation was performed in four models, while five models had external validation. The Gail and the Rosner and Colditz models were the seminal models subsequently modified by other scholars. Calibration performance of most models was fair to good (expected/observed ratio: 0.87-1.12), but discriminatory accuracy was poor to fair both in internal validation (concordance statistic: 0.53-0.66) and in external validation (concordance statistic: 0.56-0.63). Most models yielded relatively poor discrimination in both internal and external validation. This poor discriminatory accuracy of existing models might be due to a lack of knowledge about risk factors, heterogeneous subtypes of breast cancer, and different distributions of risk factors across populations. In addition, the concordance statistic itself is insensitive for measuring improvements in discrimination. Therefore, newer methods such as the net reclassification index should be considered to evaluate the improvement in performance of a newly developed model.
Prabhakaran, Shyam; Jovin, Tudor G.; Tayal, Ashis H.; Hussain, Muhammad S.; Nguyen, Thanh N.; Sheth, Kevin N.; Terry, John B.; Nogueira, Raul G.; Horev, Anat; Gandhi, Dheeraj; Wisco, Dolora; Glenn, Brenda A.; Ludwig, Bryan; Clemmons, Paul F.; Cronin, Carolyn A.; Tian, Melissa; Liebeskind, David; Zaidat, Osama O.; Castonguay, Alicia C.; Martin, Coleman; Mueller-Kronast, Nils; English, Joey D.; Linfante, Italo; Malisch, Timothy W.; Gupta, Rishi
2014-01-01
Background There are multiple clinical and radiographic factors that influence outcomes after endovascular reperfusion therapy (ERT) in acute ischemic stroke (AIS). We sought to derive and validate an outcome prediction score for AIS patients undergoing ERT based on readily available pretreatment and posttreatment factors. Methods The derivation cohort included 511 patients with anterior circulation AIS treated with ERT at 10 centers between September 2009 and July 2011. The prospective validation cohort included 223 patients with anterior circulation AIS treated in the North American Solitaire Acute Stroke registry. Multivariable logistic regression identified predictors of good outcome (modified Rankin score ≤2 at 3 months) in the derivation cohort; model β coefficients were used to assign points and calculate a risk score. Discrimination was tested using C statistics with 95% confidence intervals (CIs) in the derivation and validation cohorts. Calibration was assessed using the Hosmer-Lemeshow test and plots of observed to expected outcomes. We assessed the net reclassification improvement for the derived score compared to the Totaled Health Risks in Vascular Events (THRIVE) score. Subgroup analysis in patients with pretreatment Alberta Stroke Program Early CT Score (ASPECTS) and posttreatment final infarct volume measurements was also performed to identify whether these radiographic predictors improved the model compared to simpler models. Results Good outcome was noted in 186 (36.4%) and 100 patients (44.8%) in the derivation and validation cohorts, respectively. 
Combining readily available pretreatment and posttreatment variables, we created a score (acronym: SNARL) based on the following parameters: symptomatic hemorrhage [2 points: none, hemorrhagic infarction (HI)1–2 or parenchymal hematoma (PH) type 1; 0 points: PH2], baseline National Institutes of Health Stroke Scale score (3 points: 0–10; 1 point: 11–20; 0 points: >20), age (2 points: <60 years; 1 point: 60–79 years; 0 points: >79 years), reperfusion (3 points: Thrombolysis In Cerebral Ischemia score 2b or 3) and location of clot (1 point: M2; 0 points: M1 or internal carotid artery). The SNARL score demonstrated good discrimination in the derivation (C statistic 0.79, 95% CI 0.75–0.83) and validation cohorts (C statistic 0.74, 95% CI 0.68–0.81) and was superior to the THRIVE score (derivation cohort: C statistic 0.65, 95% CI 0.60–0.70; validation cohort: C-statistic 0.59, 95% CI 0.52–0.67; p < 0.01 in both cohorts) but was inferior to a score that included age, ASPECTS, reperfusion status and final infarct volume (C statistic 0.86, 95% CI 0.82–0.91; p = 0.04). Compared with the THRIVE score, the SNARL score resulted in a net reclassification improvement of 34.8%. Conclusions Among AIS patients treated with ERT, pretreatment scores such as the THRIVE score provide only fair prognostic information. Inclusion of posttreatment variables such as reperfusion and symptomatic hemorrhage greatly influences outcome and results in improved outcome prediction. PMID:24942008
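The abstract specifies the SNARL point assignments in full, so the score can be sketched directly. The function below is an illustrative reimplementation rather than the authors' code, and the category strings (e.g. "none", "HI1", "PH2", "2b", "M2") are assumed encodings of the clinical categories:

```python
def snarl_score(hemorrhage, nihss, age, tici, clot_location):
    """Sketch of the SNARL point scheme as described in the abstract."""
    score = 0
    # Symptomatic hemorrhage: 2 points for none, HI1-2 or PH1; 0 points for PH2
    if hemorrhage in ("none", "HI1", "HI2", "PH1"):
        score += 2
    # Baseline NIHSS: 3 points for 0-10; 1 point for 11-20; 0 points for >20
    if nihss <= 10:
        score += 3
    elif nihss <= 20:
        score += 1
    # Age: 2 points for <60; 1 point for 60-79; 0 points for >79
    if age < 60:
        score += 2
    elif age <= 79:
        score += 1
    # Reperfusion: 3 points for TICI 2b or 3
    if tici in ("2b", "3"):
        score += 3
    # Clot location: 1 point for M2; 0 points for M1 or ICA
    if clot_location == "M2":
        score += 1
    return score
```

The maximum score is 11 (2 + 3 + 2 + 3 + 1), obtained by a young patient with low NIHSS, an M2 clot, full reperfusion and no symptomatic hemorrhage.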
Load Model Verification, Validation and Calibration Framework by Statistical Analysis on Field Data
NASA Astrophysics Data System (ADS)
Jiao, Xiangqing; Liao, Yuan; Nguyen, Thai
2017-11-01
Accurate load models are critical for power system analysis and operation. A large amount of research work has been done on load modeling. Most of the existing research focuses on developing load models, while little has been done on developing formal load model verification and validation (V&V) methodologies or procedures. Most existing load model validation is based on qualitative rather than quantitative analysis. In addition, not all aspects of the model V&V problem have been addressed by the existing approaches. To complement the existing methods, this paper proposes a novel load model verification and validation framework that can systematically and more comprehensively examine a load model's effectiveness and accuracy. Statistical analysis, instead of visual checks, quantifies the load model's accuracy and provides a confidence level of the developed load model for model users. The analysis results can also be used to calibrate load models. The proposed framework can be used as guidance for utility engineers and researchers to systematically examine load models. The proposed method is demonstrated through analysis of field measurements collected from a utility system.
ERIC Educational Resources Information Center
Osler, James Edward, II
2015-01-01
This monograph provides an epistemological rationale for the Accumulative Manifold Validation Analysis [also referred to by the acronym "AMOVA"] statistical methodology designed to test psychometric instruments. This form of inquiry is a form of mathematical optimization in the discipline of linear stochastic modelling. AMOVA is an in-depth…
Haslam, Divna; Filus, Ania; Morawska, Alina; Sanders, Matthew R; Fletcher, Renee
2015-06-01
This paper outlines the development and validation of the Work-Family Conflict Scale (WAFCS) designed to measure work-to-family conflict (WFC) and family-to-work conflict (FWC) for use with parents of young children. An expert informant and consumer feedback approach was utilised to develop and refine 20 items, which were subjected to a rigorous validation process using two separate samples of parents of 2-12 year old children (n = 305 and n = 264). As a result of statistical analyses several items were dropped resulting in a brief 10-item scale comprising two subscales assessing theoretically distinct but related constructs: FWC (five items) and WFC (five items). Analyses revealed both subscales have good internal consistency, construct validity as well as concurrent and predictive validity. The results indicate the WAFCS is a promising brief measure for the assessment of work-family conflict in parents. Benefits of the measure as well as potential uses are discussed.
Probability of Detection (POD) as a statistical model for the validation of qualitative methods.
Wehling, Paul; LaBudde, Robert A; Brunelle, Sharon L; Nelson, Maria T
2011-01-01
A statistical model is presented for use in validation of qualitative methods. This model, termed Probability of Detection (POD), harmonizes the statistical concepts and parameters between quantitative and qualitative method validation. POD characterizes method response with respect to concentration as a continuous variable. The POD model provides a tool for graphical representation of response curves for qualitative methods. In addition, the model allows comparisons between candidate and reference methods, and provides calculations of repeatability, reproducibility, and laboratory effects from collaborative study data. Single laboratory study and collaborative study examples are given.
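As a minimal sketch of the POD idea (hypothetical counts; the full model also includes a fitted response curve and interlaboratory variance components not shown here), POD at each concentration level is the fraction of positive results, and a candidate method can be compared against a reference method through their difference:

```python
def pod(positives, trials):
    """Point estimate of the probability of detection at one level."""
    return positives / trials

# Hypothetical collaborative-study counts: concentration -> (positives, trials)
candidate_counts = {0.1: (12, 60), 1.0: (45, 60), 10.0: (60, 60)}
reference_counts = {0.1: (9, 60), 1.0: (48, 60), 10.0: (59, 60)}

candidate = {c: pod(*pn) for c, pn in candidate_counts.items()}
reference = {c: pod(*pn) for c, pn in reference_counts.items()}
# dPOD: candidate minus reference at each concentration; values near zero
# indicate comparable performance at that level
dpod = {c: candidate[c] - reference[c] for c in candidate}
```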
Validation of 2 commercial Neospora caninum antibody enzyme linked immunosorbent assays
Wu, John T.Y.; Dreger, Sally; Chow, Eva Y.W.; Bowlby, Evelyn E.
2002-01-01
Abstract This is a validation study of 2 commercially available enzyme linked immunosorbent assays (ELISA) for the detection of antibodies against Neospora caninum in bovine serum. The results of the reference sera (n = 30) and field sera from an infected beef herd (n = 150) were tested by both ELISAs and the results were compared statistically. When the immunoblotting results of the reference bovine sera were compared to the ELISA results, the same identity score (96.67%) and kappa values (K) (0.93) were obtained for both ELISAs. The sensitivity and specificity values for the IDEXX test were 100% and 93.33% respectively. For the Biovet test 93.33% and 100% were obtained. The corresponding positive (PV+) and negative predictive (PV−) values for the 2 assays were 93.75% and 100% (IDEXX), and 100% and 93.75% (Biovet). In the 2nd study, competitive inhibition ELISA (c-ELISA) results on bovine sera from an infected herd were compared to the 2 sets of ELISA results. The identity scores of the 2 ELISAs were 98% (IDEXX) and 97.33% (Biovet). The K values calculated were 0.96 (IDEXX) and 0.95 (Biovet). For the IDEXX test the sensitivity and specificity were 97.56% and 98.53%, whereas for the Biovet assay 95.12% and 100% were recorded, respectively. The corresponding PV+ and PV− values were 98.77% and 97.1% (IDEXX), and 100% and 94.44% (Biovet). Our validation results showed that the 2 ELISAs worked equally well and there was no statistically significant difference between the performance of the 2 tests. Both tests showed high reproducibility, repeatability and substantial agreement with results from 2 other laboratories. A quality assurance based on the requirement of the ISO/IEC 17025 standards has been adopted throughout this project for test validation procedures. PMID:12418782
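The reported IDEXX reference-sera figures are consistent with a 2x2 table of 15 true positives, 1 false positive, 0 false negatives and 14 true negatives (an inferred split, not stated in the abstract). A sketch of the standard computations:

```python
def diagnostics(tp, fp, fn, tn):
    # Standard 2x2 diagnostic-test statistics plus Cohen's kappa
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)            # sensitivity
    spec = tn / (tn + fp)            # specificity
    ppv = tp / (tp + fp)             # positive predictive value
    npv = tn / (tn + fn)             # negative predictive value
    po = (tp + tn) / n               # observed agreement
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)  # chance agreement
    kappa = (po - pe) / (1 - pe)
    return sens, spec, ppv, npv, kappa
```

With the inferred table, this reproduces the abstract's values: sensitivity 100%, specificity 93.33%, PV+ 93.75%, PV- 100%, kappa 0.93.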
Job Satisfaction DEOCS 4.1 Construct Validity Summary
2017-08-01
focuses more specifically on satisfaction with the job. Included is a review of the 4.0 description and items, followed by the proposed modifications to...the factor. The DEOCS 4.0 description provided for job satisfaction is “the perception of personal fulfillment in a specific vocation, and sense of...piloting items on the DEOCS; (4) examining the descriptive statistics, exploratory factor analysis results, and aggregation statistics; and (5
Confidence crisis of results in biomechanics research.
Knudson, Duane
2017-11-01
Many biomechanics studies have small sample sizes and incorrect statistical analyses, so reporting of inaccurate inferences and inflated magnitude of effects are common in the field. This review examines these issues in biomechanics research and summarises potential solutions from research in other fields to increase the confidence in the experimental effects reported in biomechanics. Authors, reviewers and editors of biomechanics research reports are encouraged to improve sample sizes and the resulting statistical power, improve reporting transparency, improve the rigour of statistical analyses used, and increase the acceptance of replication studies to improve the validity of inferences from data in biomechanics research. The application of sports biomechanics research results would also improve if a larger percentage of unbiased effects and their uncertainty were reported in the literature.
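One of the review's remedies, adequate sample size, can be estimated before data collection. A normal-approximation sketch for a two-group comparison (defaults correspond to a two-sided alpha of 0.05 and 80% power; exact t-based values run slightly higher):

```python
import math

def n_per_group(d, z_alpha=1.96, z_beta=0.84):
    """Approximate per-group sample size for a two-sample comparison.

    d is the target effect size (Cohen's d). Normal approximation:
    n per group ~ 2 * ((z_alpha + z_beta) / d) ** 2.
    """
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)
```

For a medium effect (d = 0.5) this gives 63 participants per group, far larger than typical biomechanics samples, which illustrates the review's concern about underpowered studies.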
Shahabi, Himan; Hashim, Mazlan
2015-01-01
This research presents the results of the GIS-based statistical models for generation of landslide susceptibility mapping using geographic information system (GIS) and remote-sensing data for Cameron Highlands area in Malaysia. Ten factors including slope, aspect, soil, lithology, NDVI, land cover, distance to drainage, precipitation, distance to fault, and distance to road were extracted from SAR data, SPOT 5 and WorldView-1 images. The relationships between the detected landslide locations and these ten related factors were identified by using GIS-based statistical models including analytical hierarchy process (AHP), weighted linear combination (WLC) and spatial multi-criteria evaluation (SMCE) models. The landslide inventory map which has a total of 92 landslide locations was created based on numerous resources such as digital aerial photographs, AIRSAR data, WorldView-1 images, and field surveys. Then, 80% of the landslide inventory was used for training the statistical models and the remaining 20% was used for validation purpose. The validation results using the Relative landslide density index (R-index) and Receiver operating characteristic (ROC) demonstrated that the SMCE model (accuracy is 96%) is better in prediction than AHP (accuracy is 91%) and WLC (accuracy is 89%) models. These landslide susceptibility maps would be useful for hazard mitigation purpose and regional planning. PMID:25898919
Virtual Model Validation of Complex Multiscale Systems: Applications to Nonlinear Elastostatics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oden, John Tinsley; Prudencio, Ernest E.; Bauman, Paul T.
We propose a virtual statistical validation process as an aid to the design of experiments for the validation of phenomenological models of the behavior of material bodies, with focus on those cases in which knowledge of the fabrication process used to manufacture the body can provide information on the micro-molecular-scale properties underlying macroscale behavior. One example is given by models of elastomeric solids fabricated using polymerization processes. We describe a framework for model validation that involves Bayesian updates of parameters in statistical calibration and validation phases. The process enables the quantification of uncertainty in quantities of interest (QoIs) and the determination of model consistency using tools of statistical information theory. We assert that microscale information drawn from molecular models of the fabrication of the body provides a valuable source of prior information on parameters as well as a means for estimating model bias and designing virtual validation experiments to provide information gain over calibration posteriors.
Statistical analysis for validating ACO-KNN algorithm as feature selection in sentiment analysis
NASA Astrophysics Data System (ADS)
Ahmad, Siti Rohaidah; Yusop, Nurhafizah Moziyana Mohd; Bakar, Azuraliza Abu; Yaakub, Mohd Ridzwan
2017-10-01
This research paper aims to propose a hybrid of ant colony optimization (ACO) and k-nearest neighbor (KNN) algorithms for feature selection, i.e., selecting relevant features from customer review datasets. Information gain (IG), genetic algorithm (GA), and rough set attribute reduction (RSAR) were used as baseline algorithms in a performance comparison with the proposed algorithm. This paper also discusses the significance test, which was used to evaluate the performance differences between the ACO-KNN, IG-GA, and IG-RSAR algorithms. This study evaluated the performance of the ACO-KNN algorithm using precision, recall, and F-score, which were validated using parametric statistical significance tests. The evaluation process statistically proved that the ACO-KNN algorithm is significantly improved compared to the baseline algorithms. In addition, the experimental results have proven that ACO-KNN can be used as a feature selection technique in sentiment analysis to obtain a quality, optimal feature subset that can represent the actual data in customer review data.
Towards sound epistemological foundations of statistical methods for high-dimensional biology.
Mehta, Tapan; Tanik, Murat; Allison, David B
2004-09-01
A sound epistemological foundation for biological inquiry comes, in part, from application of valid statistical procedures. This tenet is widely appreciated by scientists studying the new realm of high-dimensional biology, or 'omic' research, which involves multiplicity at unprecedented scales. Many papers aimed at the high-dimensional biology community describe the development or application of statistical techniques. The validity of many of these is questionable, and a shared understanding about the epistemological foundations of the statistical methods themselves seems to be lacking. Here we offer a framework in which the epistemological foundation of proposed statistical methods can be evaluated.
49 CFR Appendix B to Part 222 - Alternative Safety Measures
Code of Federal Regulations, 2014 CFR
2014-10-01
... statistically valid baseline violation rate must be established through automated or systematic manual... enforcement, a program of public education and awareness directed at motor vehicle drivers, pedestrians and..., a statistically valid baseline violation rate must be established through automated or systematic...
49 CFR Appendix B to Part 222 - Alternative Safety Measures
Code of Federal Regulations, 2013 CFR
2013-10-01
... statistically valid baseline violation rate must be established through automated or systematic manual... enforcement, a program of public education and awareness directed at motor vehicle drivers, pedestrians and..., a statistically valid baseline violation rate must be established through automated or systematic...
NASA Astrophysics Data System (ADS)
Most, S.; Nowak, W.; Bijeljic, B.
2014-12-01
Transport processes in porous media are frequently simulated as particle movement. This process can be formulated as a stochastic process of particle position increments. At the pore scale, the geometry and micro-heterogeneities prohibit the commonly made assumption of independent and normally distributed increments to represent dispersion. Many recent particle methods seek to loosen this assumption. Recent experimental data suggest that we have not yet reached the end of the need to generalize, because particle increments show statistical dependency beyond linear correlation and over many time steps. The goal of this work is to better understand the validity regions of commonly made assumptions. We are investigating after what transport distances we can observe: (1) a statistical dependence between increments that can be modelled as an order-k Markov process boiling down to order 1; this would be the Markovian distance for the process, where the validity of yet-unexplored non-Gaussian-but-Markovian random walks would start; (2) a bivariate statistical dependence that simplifies to a multi-Gaussian dependence based on simple linear correlation (validity of correlated PTRW); (3) complete absence of statistical dependence (validity of classical PTRW/CTRW). The approach is to derive a statistical model for pore-scale transport from a powerful experimental data set via copula analysis. The model is formulated as a non-Gaussian, mutually dependent Markov process of higher order, which allows us to investigate the validity ranges of simpler models.
Testing prediction methods: Earthquake clustering versus the Poisson model
Michael, A.J.
1997-01-01
Testing earthquake prediction methods requires statistical techniques that compare observed success to random chance. One technique is to produce simulated earthquake catalogs and measure the relative success of predicting real and simulated earthquakes. The accuracy of these tests depends on the validity of the statistical model used to simulate the earthquakes. This study tests the effect of clustering in the statistical earthquake model on the results. Three simulation models were used to produce significance levels for a VLF earthquake prediction method. As the degree of simulated clustering increases, the statistical significance drops. Hence, the use of a seismicity model with insufficient clustering can lead to overly optimistic results. A successful method must pass the statistical tests with a model that fully replicates the observed clustering. However, a method can be rejected based on tests with a model that contains insufficient clustering. U.S. copyright. Published in 1997 by the American Geophysical Union.
Stanisavljevic, Dejana; Trajkovic, Goran; Marinkovic, Jelena; Bukumiric, Zoran; Cirkovic, Andja; Milic, Natasa
2014-01-01
Medical statistics has become important and relevant for future doctors, enabling them to practice evidence-based medicine. Recent studies report that students' attitudes towards statistics play an important role in their statistics achievements. The aim of the study was to test the psychometric properties of the Serbian version of the Survey of Attitudes Towards Statistics (SATS) in order to acquire a valid instrument to measure attitudes inside the Serbian educational context. The validation study was performed on a cohort of 417 medical students who were enrolled in an obligatory introductory statistics course. The SATS adaptation was based on an internationally accepted methodology for translation and cultural adaptation. Psychometric properties of the Serbian version of the SATS were analyzed through the examination of factorial structure and internal consistency. Most medical students held positive attitudes towards statistics. The average total SATS score was above neutral (4.3±0.8), and varied from 1.9 to 6.2. Confirmatory factor analysis validated the six-factor structure of the questionnaire (Affect, Cognitive Competence, Value, Difficulty, Interest and Effort). Values for the fit indices TLI (0.940) and CFI (0.961) were above the cut-off of ≥0.90. The RMSEA value of 0.064 (0.051-0.078) was below the suggested value of ≤0.08. Cronbach's alpha for the entire scale was 0.90, indicating scale reliability. In a multivariate regression model, self-rating of ability in mathematics and current grade point average were significantly associated with the total SATS score after adjusting for age and gender. The present study provided evidence for the appropriate metric properties of the Serbian version of the SATS. Confirmatory factor analysis validated the six-factor structure of the scale. The SATS might be a reliable and valid instrument for identifying medical students' attitudes towards statistics in the Serbian educational context.
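Cronbach's alpha, reported as 0.90 for the full scale, can be computed from item scores as follows (a stdlib-only sketch using population variances):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Internal-consistency reliability.

    items: one list of scores per item, aligned across respondents,
    i.e. items[i][j] is respondent j's score on item i.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]      # per-respondent totals
    item_variance = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))
```

Perfectly redundant items give alpha = 1; uncorrelated items drive it toward 0.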
Eisenberg, Dan T A; Kuzawa, Christopher W; Hayes, M Geoffrey
2015-01-01
Telomere length (TL) is commonly measured using quantitative PCR (qPCR). Although easier than the Southern blot of terminal restriction fragments (TRF) TL measurement method, one drawback of qPCR is that it introduces greater measurement error and thus reduces the statistical power of analyses. To address a potential source of measurement error, we consider the effect of well position on qPCR TL measurements. qPCR TL data from 3,638 people run on a Bio-Rad iCycler iQ are reanalyzed here. To evaluate measurement validity, correspondence with TRF, age, and between mother and offspring are examined. First, we present evidence for systematic variation in qPCR TL measurements in relation to thermocycler well position. Controlling for these well-position effects consistently improves measurement validity and yields estimated improvements in statistical power equivalent to increasing sample sizes by 16%. We additionally evaluated the linearity of the relationships between telomere and single-copy gene control amplicons and between qPCR and TRF measures. We find that, unlike some previous reports, our data exhibit linear relationships. We introduce the standard error in percent, a superior method for quantifying measurement error compared to the commonly used coefficient of variation. Using this measure, we find that excluding samples with high measurement error does not improve measurement validity in our study. Future studies using block-based thermocyclers should consider well-position effects. Since additional information can be gleaned from well-position corrections, rerunning analyses of previous results with well-position correction could serve as an independent test of the validity of those results. © 2015 Wiley Periodicals, Inc.
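The paper's well-position correction is presumably regression-based; as a simplified sketch of the idea, each measurement can be centred on its well's mean deviation from the overall mean (hypothetical data; the function name and data layout are illustrative, not from the paper):

```python
from collections import defaultdict
from statistics import mean

def well_position_adjust(measurements):
    """Remove per-well mean offsets from (well, value) pairs.

    A mean-centring sketch of a well-position correction; assumes
    samples are assigned to wells independently of true TL.
    """
    overall = mean(v for _, v in measurements)
    by_well = defaultdict(list)
    for well, value in measurements:
        by_well[well].append(value)
    # Each well's systematic offset is its mean deviation from the overall mean
    offset = {w: mean(vs) - overall for w, vs in by_well.items()}
    return [(w, v - offset[w]) for w, v in measurements]
```

After adjustment, every well has the same mean, so position no longer confounds between-sample comparisons.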
Does rational selection of training and test sets improve the outcome of QSAR modeling?
Martin, Todd M; Harten, Paul; Young, Douglas M; Muratov, Eugene N; Golbraikh, Alexander; Zhu, Hao; Tropsha, Alexander
2012-10-22
Prior to using a quantitative structure-activity relationship (QSAR) model for external predictions, its predictive power should be established and validated. In the absence of a true external data set, the best way to validate the predictive ability of a model is to perform its statistical external validation. In statistical external validation, the overall data set is divided into training and test sets. Commonly, this splitting is performed using random division. Rational splitting methods can divide data sets into training and test sets in an intelligent fashion. The purpose of this study was to determine whether rational division methods lead to more predictive models compared to random division. A special data splitting procedure was used to facilitate the comparison between random and rational division methods. For each toxicity end point, the overall data set was divided into a modeling set (80% of the overall set) and an external evaluation set (20% of the overall set) using random division. The modeling set was then subdivided into a training set (80% of the modeling set) and a test set (20% of the modeling set) using rational division methods and by using random division. The Kennard-Stone, minimal test set dissimilarity, and sphere exclusion algorithms were used as the rational division methods. The hierarchical clustering, random forest, and k-nearest neighbor (kNN) methods were used to develop QSAR models based on the training sets. For kNN QSAR, multiple training and test sets were generated, and multiple QSAR models were built. The results of this study indicate that models based on rational division methods generate better statistical results for the test sets than models based on random division, but the predictive power of both types of models is comparable.
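The Kennard-Stone algorithm named among the rational division methods can be sketched as follows: seed the training set with the two most mutually distant samples, then repeatedly add the candidate whose nearest selected neighbour is farthest away. A minimal Euclidean-distance implementation (an illustrative sketch, not the authors' code):

```python
import math
from itertools import combinations

def kennard_stone(points, n_train):
    """Select n_train indices by the Kennard-Stone algorithm."""
    # Seed with the two most mutually distant points.
    i0, j0 = max(combinations(range(len(points)), 2),
                 key=lambda ij: math.dist(points[ij[0]], points[ij[1]]))
    train = [i0, j0]
    pool = [i for i in range(len(points)) if i not in (i0, j0)]
    while len(train) < n_train:
        # Add the candidate whose minimum distance to the selected set is maximal.
        best = max(pool,
                   key=lambda c: min(math.dist(points[c], points[t]) for t in train))
        train.append(best)
        pool.remove(best)
    return train, pool  # remaining pool becomes the test set
```

Because selection is greedy on coverage, the training set spans the descriptor space, which is exactly the property the study compares against random splits.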
Steganalysis based on reducing the differences of image statistical characteristics
NASA Astrophysics Data System (ADS)
Wang, Ran; Niu, Shaozhang; Ping, Xijian; Zhang, Tao
2018-04-01
Compared with the process of embedding, the image contents make a more significant impact on the differences of image statistical characteristics. This makes image steganalysis a classification problem with bigger within-class scatter distances and smaller between-class scatter distances. As a result, the steganalysis features will be inseparable due to the differences of image statistical characteristics. In this paper, a new steganalysis framework which can reduce the differences of image statistical characteristics caused by various contents and processing methods is proposed. The given images are segmented into several sub-images according to texture complexity. Steganalysis features are separately extracted from each subset with the same or close texture complexity to build a classifier. The final steganalysis result is figured out through a weighted fusing process. The theoretical analysis and experimental results demonstrate the validity of the framework.
Testing alternative ground water models using cross-validation and other methods
Foglia, L.; Mehl, S.W.; Hill, M.C.; Perona, P.; Burlando, P.
2007-01-01
Many methods can be used to test alternative ground water models. Of concern in this work are methods able to (1) rank alternative models (also called model discrimination) and (2) identify observations important to parameter estimates and predictions (equivalent to the purpose served by some types of sensitivity analysis). Some of the measures investigated are computationally efficient; others are computationally demanding. The latter are generally needed to account for model nonlinearity. The efficient model discrimination methods investigated include the information criteria: the corrected Akaike information criterion, Bayesian information criterion, and generalized cross-validation. The efficient sensitivity analysis measures used are dimensionless scaled sensitivity (DSS), composite scaled sensitivity, and parameter correlation coefficient (PCC); the other statistics are DFBETAS, Cook's D, and observation-prediction statistic. Acronyms are explained in the introduction. Cross-validation (CV) is a computationally intensive nonlinear method that is used for both model discrimination and sensitivity analysis. The methods are tested using up to five alternative parsimoniously constructed models of the ground water system of the Maggia Valley in southern Switzerland. The alternative models differ in their representation of hydraulic conductivity. A new method for graphically representing CV and sensitivity analysis results for complex models is presented and used to evaluate the utility of the efficient statistics. The results indicate that for model selection, the information criteria produce similar results at much smaller computational cost than CV. For identifying important observations, the only obviously inferior linear measure is DSS; the poor performance was expected because DSS does not include the effects of parameter correlation and PCC reveals large parameter correlations. © 2007 National Ground Water Association.
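The efficient information criteria named above can be computed directly from a model's residual sum of squares. The formulas below are the standard least-squares forms (counting the error variance among the estimated parameters), shown as an illustrative sketch rather than the exact statistics used in the study:

```python
import math

def aicc(rss, n, k):
    """Corrected Akaike information criterion for a least-squares model
    with n observations and k estimated parameters (requires n > k + 1)."""
    aic = n * math.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)

def bic(rss, n, k):
    """Bayesian information criterion; penalizes parameters more heavily
    than AIC once n exceeds about 8 observations."""
    return n * math.log(rss / n) + k * math.log(n)

def rank_models(models, n):
    """Model discrimination: rank alternative models, given as
    (name, rss, k) tuples, by AICc with the smallest (best) first."""
    return sorted(models, key=lambda m: aicc(m[1], n, m[2]))
```

With a small observation set, a more complex model must reduce the residual sum of squares substantially before either criterion prefers it, which is the parsimony principle behind the alternative models compared in the paper.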
Measuring Microaggression and Organizational Climate Factors in Military Units
2011-04-01
i.e., items) to accurately assess what we intend for them to measure. To assess construct and convergent validity, the author assessed the statistical ...sample indicated both convergent and construct validity of the microaggression scale. Table 5 presents these statistics. Measuring Microaggressions...models. As shown in Table 7, the measurement models had acceptable fit indices. That is, the Chi-square statistics were at their minimum; although the
Spectral signature verification using statistical analysis and text mining
NASA Astrophysics Data System (ADS)
DeCoster, Mallory E.; Firpi, Alexe H.; Jacobs, Samantha K.; Cone, Shelli R.; Tzeng, Nigel H.; Rodriguez, Benjamin M.
2016-05-01
In the spectral science community, numerous spectral signatures are stored in databases representative of many sample materials collected from a variety of spectrometers and spectroscopists. Due to the variety and variability of the spectra that comprise many spectral databases, it is necessary to establish a metric for validating the quality of spectral signatures. This has been an area of great discussion and debate in the spectral science community. This paper discusses a method that independently validates two different aspects of a spectral signature, the textual meta-data and the numerical spectral data, to arrive at a final qualitative assessment. Results associated with the spectral data stored in the Signature Database1 (SigDB) are presented. The numerical data comprising a sample material's spectrum is validated based on statistical properties derived from an ideal population set. The quality of the test spectrum is ranked based on a spectral angle mapper (SAM) comparison to the mean spectrum derived from the population set. Additionally, the contextual data of a test spectrum is qualitatively analyzed using lexical analysis text mining. This technique analyzes the syntax of the meta-data to provide local learning patterns and trends within the spectral data, indicative of the test spectrum's quality. Text mining applications have successfully been implemented for security2 (text encryption/decryption), biomedical3, and marketing4 applications. The text mining lexical analysis algorithm is trained on the meta-data patterns of a subset of high and low quality spectra, in order to have a model to apply to the entire SigDB data set. The statistical and textual methods combine to assess the quality of a test spectrum existing in a database without the need of an expert user.
This method has been compared to other validation methods accepted by the spectral science community, and has provided promising results when a baseline spectral signature is present for comparison. The spectral validation method proposed is described from a practical application and analytical perspective.
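The SAM comparison mentioned above is a standard measure: the angle between two spectra treated as vectors, which is insensitive to overall illumination or gain. A minimal sketch of the statistic (not the SigDB implementation):

```python
import math

def spectral_angle(test, reference):
    """Spectral angle mapper (SAM): angle in radians between a test
    spectrum and a reference spectrum (e.g. the mean spectrum of an
    ideal population set). Smaller angles mean closer spectral shape,
    independent of overall scale."""
    dot = sum(t * r for t, r in zip(test, reference))
    norm_t = math.sqrt(sum(t * t for t in test))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # clamp to guard against floating-point drift outside [-1, 1]
    return math.acos(max(-1.0, min(1.0, dot / (norm_t * norm_r))))
```

A spectrum that is simply a scaled copy of the reference scores an angle of zero, so ranking test spectra by SAM against the population mean grades shape fidelity rather than absolute intensity.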
2014-01-01
Background The UK Clinical Aptitude Test (UKCAT) was designed to address issues identified with traditional methods of selection. This study aims to examine the predictive validity of the UKCAT and compare this to traditional selection methods in the senior years of medical school. This was a follow-up study of two cohorts of students from two medical schools who had previously taken part in a study examining the predictive validity of the UKCAT in first year. Methods The sample consisted of 4th and 5th Year students who commenced their studies at the University of Aberdeen or University of Dundee medical schools in 2007. Data collected were: demographics (gender and age group), UKCAT scores; Universities and Colleges Admissions Service (UCAS) form scores; admission interview scores; Year 4 and 5 degree examination scores. Pearson’s correlations were used to examine the relationships between admissions variables, examination scores, gender and age group, and to select variables for multiple linear regression analysis to predict examination scores. Results Ninety-nine and 89 students at Aberdeen medical school from Years 4 and 5 respectively, and 51 Year 4 students in Dundee, were included in the analysis. Neither UCAS form nor interview scores were statistically significant predictors of examination performance. Conversely, the UKCAT yielded statistically significant validity coefficients between .24 and .36 in four of five assessments investigated. Multiple regression analysis showed the UKCAT made a statistically significant unique contribution to variance in examination performance in the senior years. Conclusions Results suggest the UKCAT appears to predict performance better in the later years of medical school compared to earlier years and provides modest supportive evidence for the UKCAT’s role in student selection within these institutions. 
Further research is needed to assess the predictive validity of the UKCAT against professional and behavioural outcomes as the cohort commences working life. PMID:24762134
On the error statistics of Viterbi decoding and the performance of concatenated codes
NASA Technical Reports Server (NTRS)
Miller, R. L.; Deutsch, L. J.; Butman, S. A.
1981-01-01
Computer simulation results are presented on the performance of convolutional codes of constraint lengths 7 and 10 concatenated with the (255, 223) Reed-Solomon code (a proposed NASA standard). These results indicate that as much as 0.8 dB can be gained by concatenating this Reed-Solomon code with a (10, 1/3) convolutional code, instead of the (7, 1/2) code currently used by the DSN. A mathematical model of Viterbi decoder burst-error statistics is developed and is validated through additional computer simulations.
van Walraven, Carl; Jackson, Timothy D; Daneman, Nick
2016-04-01
OBJECTIVE Surgical site infections (SSIs) are common hospital-acquired infections. Tracking SSIs is important to monitor their incidence, and this process requires primary data collection. In this study, we derived and validated a method using health administrative data to predict the probability that a person who had surgery would develop an SSI within 30 days. METHODS All patients enrolled in the National Surgical Quality Improvement Program (NSQIP) from 2 sites were linked to population-based administrative datasets in Ontario, Canada. We derived a multivariate model, stratified by surgical specialty, to determine the independent association of SSI status with patient and hospitalization covariates as well as physician claim codes. This SSI risk model was validated in 2 cohorts. RESULTS The derivation cohort included 5,359 patients with a 30-day SSI incidence of 6.0% (n=118). The SSI risk model predicted the probability that a person had an SSI based on 7 covariates: index hospitalization diagnostic score; physician claims score; emergency visit diagnostic score; operation duration; surgical service; and potential SSI codes. More than 90% of patients had predicted SSI risks lower than 10%. In the derivation group, model discrimination and calibration was excellent (C statistic, 0.912; Hosmer-Lemeshow [H-L] statistic, P=.47). In the 2 validation groups, performance decreased slightly (C statistics, 0.853 and 0.812; H-L statistics, 26.4 [P=.0009] and 8.0 [P=.42]), but low-risk patients were accurately identified. CONCLUSION Health administrative data can effectively identify postoperative patients with a very low risk of surgical site infection within 30 days of their procedure. Records of higher-risk patients can be reviewed to confirm SSI status.
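The C statistic reported for this risk model is the concordance probability, computable directly from predicted risks and observed SSI status. A minimal sketch of the statistic (an illustrative version, not the study's code):

```python
def c_statistic(probs, outcomes):
    """Concordance (C) statistic: the probability that a randomly chosen
    patient who developed an SSI was assigned a higher predicted risk than
    a randomly chosen patient who did not (ties count one half).
    Equivalent to the area under the ROC curve."""
    cases = [p for p, y in zip(probs, outcomes) if y == 1]
    controls = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                concordant += 1.0
            elif c == d:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))
```

A value of 0.912, as in the derivation cohort, means the model orders case/non-case pairs correctly about 91% of the time; calibration (the Hosmer-Lemeshow statistic) is assessed separately.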
Development and Calibration of Highway Safety Manual Equations for Florida Conditions
DOT National Transportation Integrated Search
2011-08-31
The Highway Safety Manual (HSM) provides statistically-valid analytical tools and techniques for quantifying the potential effects on crashes as a result of decisions made in planning, design, operations, and maintenance. Implementation of the new te...
Variability-aware compact modeling and statistical circuit validation on SRAM test array
NASA Astrophysics Data System (ADS)
Qiao, Ying; Spanos, Costas J.
2016-03-01
Variability modeling at the compact transistor model level can enable statistically optimized designs in view of limitations imposed by the fabrication technology. In this work we propose a variability-aware compact model characterization methodology based on stepwise parameter selection. Transistor I-V measurements are obtained from a bit-transistor-accessible SRAM test array fabricated in a collaborating foundry's 28 nm FDSOI technology. Our in-house customized Monte Carlo simulation bench can incorporate these statistical compact models, and the simulated distributions of SRAM writability performance closely match measurements. Our proposed statistical compact model parameter extraction methodology also has the potential of predicting non-Gaussian behavior in statistical circuit performances through mixtures of Gaussian distributions.
Statistical considerations on prognostic models for glioma
Molinaro, Annette M.; Wrensch, Margaret R.; Jenkins, Robert B.; Eckel-Passow, Jeanette E.
2016-01-01
Given the lack of beneficial treatments in glioma, there is a need for prognostic models for therapeutic decision making and life planning. Recently several studies defining subtypes of glioma have been published. Here, we review the statistical considerations of how to build and validate prognostic models, explain the models presented in the current glioma literature, and discuss advantages and disadvantages of each model. The 3 statistical considerations to establishing clinically useful prognostic models are: study design, model building, and validation. Careful study design helps to ensure that the model is unbiased and generalizable to the population of interest. During model building, a discovery cohort of patients can be used to choose variables, construct models, and estimate prediction performance via internal validation. Via external validation, an independent dataset can assess how well the model performs. It is imperative that published models properly detail the study design and methods for both model building and validation. This provides readers the information necessary to assess the bias in a study, compare other published models, and determine the model's clinical usefulness. As editors, reviewers, and readers of the relevant literature, we should be cognizant of the needed statistical considerations and insist on their use. PMID:26657835
A diagnostic model for chronic hypersensitivity pneumonitis
Johannson, Kerri A; Elicker, Brett M; Vittinghoff, Eric; Assayag, Deborah; de Boer, Kaïssa; Golden, Jeffrey A; Jones, Kirk D; King, Talmadge E; Koth, Laura L; Lee, Joyce S; Ley, Brett; Wolters, Paul J; Collard, Harold R
2017-01-01
The objective of this study was to develop a diagnostic model that allows for a highly specific diagnosis of chronic hypersensitivity pneumonitis using clinical and radiological variables alone. Chronic hypersensitivity pneumonitis and other interstitial lung disease cases were retrospectively identified from a longitudinal database. High-resolution CT scans were blindly scored for radiographic features (eg, ground-glass opacity, mosaic perfusion) as well as the radiologist’s diagnostic impression. Candidate models were developed then evaluated using clinical and radiographic variables and assessed by the cross-validated C-statistic. Forty-four chronic hypersensitivity pneumonitis and eighty other interstitial lung disease cases were identified. Two models were selected based on their statistical performance, clinical applicability and face validity. Key model variables included age, down feather and/or bird exposure, radiographic presence of ground-glass opacity and mosaic perfusion and moderate or high confidence in the radiographic impression of chronic hypersensitivity pneumonitis. Models were internally validated with good performance, and cut-off values were established that resulted in high specificity for a diagnosis of chronic hypersensitivity pneumonitis. PMID:27245779
Hydrostatic weighing without head submersion in morbidly obese females.
Evans, P E; Israel, R G; Flickinger, E G; O'Brien, K F; Donnelly, J E
1989-08-01
This study tests the validity of hydrostatic weighing without head submersion (HWNS) for determining the body density (Db) of morbidly obese (MO) females. Eighty MO females who were able to perform traditional hydrostatic weighing at residual volume (HW) underwent four counterbalanced trials for each procedure (HW and HWNS) to determine Db. Residual volume was determined by oxygen dilution. Twenty subjects were randomly excluded from the experimental group (EG) and assigned to a cross-validation group (CV). Simple linear regression was performed on EG data (n = 60, mean age = 36.8 y, mean % fat = 50.1) to predict Db from HWNS (Db = 0.569563 [Db HWNS] + 0.408621, SEE = 0.0066). Comparison of the predicted and actual Db for the CV group yielded r = 0.69, SEE = 0.0066, E statistic = 0.0067, mean difference = 0.0013 kg/L. The SEE and E statistic for body fat were 3.31 and 3.39, respectively. Mean difference for percent fat was 0.66%. Results indicate that HWNS is a valid technique for assessing body composition in MO females.
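The published regression can be applied directly. The density-to-fat conversion below uses the Siri equation, which is an assumption on our part; the abstract does not state which conversion the authors used:

```python
def predict_density(db_hwns):
    """Regression from the abstract: predict body density at residual
    volume (kg/L) from hydrostatic weighing without head submersion."""
    return 0.569563 * db_hwns + 0.408621

def percent_fat(db):
    """Convert body density to percent body fat using the Siri equation.
    NOTE: assumed conversion -- the abstract does not name the equation."""
    return 495.0 / db - 450.0
```

For example, a measured HWNS density of 1.000 kg/L maps to a predicted traditional-weighing density of about 0.978 kg/L before conversion to percent fat.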
Griest, Susan; Zaugg, Tara L.; Thielman, Emily; Kaelin, Christine; Galvez, Gino; Carlson, Kathleen F.
2015-01-01
Purpose Individuals complaining of tinnitus often attribute hearing problems to the tinnitus. In such cases some (or all) of their reported “tinnitus distress” may in fact be caused by trouble communicating due to hearing problems. We developed the Tinnitus and Hearing Survey (THS) as a tool to rapidly differentiate hearing problems from tinnitus problems. Method For 2 of our research studies, we administered the THS twice (mean of 16.5 days between tests) to 67 participants who did not receive intervention. These data allow for measures of statistical validation of the THS. Results Reliability of the THS was good to excellent regarding internal consistency (α = .86–.94), test–retest reliability (r = .76–.83), and convergent validity between the Tinnitus Handicap Inventory (Newman, Jacobson, & Spitzer, 1996; Newman, Sandridge, & Jacobson, 1998) and the A (Tinnitus) subscale of the THS (r = .78). Factor analysis confirmed that the 2 subscales, A (Tinnitus) and B (Hearing), have strong internal structure, explaining 71.7% of the total variance, and low correlation with each other (r = .46), resulting in a small amount of shared variance (21%). Conclusion These results provide evidence that the THS is statistically validated and reliable for use in assisting patients and clinicians in quickly (and collaboratively) determining whether intervention for tinnitus is appropriate. PMID:25551458
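The internal-consistency figures quoted above (α = .86-.94) are Cronbach's alpha values. A minimal sketch of the statistic, for illustration rather than the authors' analysis pipeline:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for internal consistency. `items` is a list of
    item-score lists, one list per questionnaire item, aligned so that
    position j in every list is the same respondent."""
    k = len(items)
    n = len(items[0])
    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    # each respondent's total score across items
    totals = [sum(item[j] for item in items) for j in range(n)]
    return k / (k - 1) * (1 - sum(var(i) for i in items) / var(totals))
```

Alpha approaches 1.0 when items rise and fall together across respondents, which is why values in the .86-.94 range are read as good to excellent internal consistency for the two THS subscales.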
CARVALHO, Suzana Papile Maciel; BRITO, Liz Magalhães; de PAIVA, Luiz Airton Saavedra; BICUDO, Lucilene Arilho Ribeiro; CROSATO, Edgard Michel; de OLIVEIRA, Rogério Nogueira
2013-01-01
Validation studies of physical anthropology methods in the different population groups are extremely important, especially in cases in which the population variations may cause problems in the identification of a native individual by the application of norms developed for different communities. Objective This study aimed to estimate the gender of skeletons by application of the method of Oliveira, et al. (1995), previously used in a population sample from Northeast Brazil. Material and Methods The accuracy of this method was assessed for a population from Southeast Brazil and validated by statistical tests. The method used two mandibular measurements, namely the bigonial distance and the mandibular ramus height. The sample was composed of 66 skulls and the method was applied by two examiners. The results were statistically analyzed by the paired t test, logistic discriminant analysis and logistic regression. Results The results demonstrated that the application of the method of Oliveira, et al. (1995) in this population achieved very different outcomes between genders, with 100% for females and only 11% for males, which may be explained by ethnic differences. However, statistical adjustment of measurement data for the population analyzed allowed accuracy of 76.47% for males and 78.13% for females, with the creation of a new discriminant formula. Conclusion It was concluded that methods involving physical anthropology present high rate of accuracy for human identification, easy application, low cost and simplicity; however, the methodologies must be validated for the different populations due to differences in ethnic patterns, which are directly related to the phenotypic aspects. In this specific case, the method of Oliveira, et al. 
(1995) presented good accuracy and may be used for gender estimation in Brazil in two geographic regions, namely Northeast and Southeast; however, for other regions of the country (North, Central West and South), previous methodological adjustment is recommended as demonstrated in this study. PMID:24037076
IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics
2016-01-01
Background We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. Objective To report the features of IBMWA and discuss how this software subjectively and objectively compares to other data mining programs. Methods The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Results Using a validated dataset, IBMWA delivered similar predictions compared with several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. 
Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence intervals, or a confusion matrix. Conclusions IBMWA is a new alternative for data analytics software that automates descriptive, predictive, and visual analytics. This program is very user-friendly but requires data preprocessing, statistical conceptual understanding, and domain expertise. PMID:27729304
Vlieg-Boerstra, Berber J; Bijleveld, Charles M A; van der Heide, Sicco; Beusekamp, Berta J; Wolt-Plompen, Saskia A A; Kukler, Jeanet; Brinkman, Joep; Duiverman, Eric J; Dubois, Anthony E J
2004-02-01
The use of double-blind, placebo-controlled food challenges (DBPCFCs) is considered the gold standard for the diagnosis of food allergy. Despite this, materials and methods used in DBPCFCs have not been standardized. The purpose of this study was to develop and validate recipes for use in DBPCFCs in children by using allergenic foods, preferably in their usual edible form. Recipes containing milk, soy, cooked egg, raw whole egg, peanut, hazelnut, and wheat were developed. For each food, placebo and active test food recipes were developed that met the requirements of acceptable taste, allowance of a challenge dose high enough to elicit reactions in an acceptable volume, optimal matrix ingredients, and good matching of sensory properties of placebo and active test food recipes. Validation was conducted on the basis of sensory tests for difference by using the triangle test and the paired comparison test. Recipes were first tested by volunteers from the hospital staff and subsequently by a professional panel of food tasters in a food laboratory designed for sensory testing. Recipes were considered to be validated if no statistically significant differences were found. Twenty-seven recipes were developed and found to be valid by the volunteer panel. Of these 27 recipes, 17 could be validated by the professional panel. Sensory testing with appropriate statistical analysis allows for objective validation of challenge materials. We recommend the use of professional tasters in the setting of a food laboratory for best results.
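In a triangle test, a taster who cannot distinguish the samples picks the odd one out correctly with probability 1/3, so "no statistically significant difference" reduces to a one-sided binomial test. A sketch of that logic, under the assumption that the study used the conventional significance threshold:

```python
import math

def triangle_test_p(correct, n):
    """One-sided binomial p-value for a sensory triangle test: the chance
    of `correct` or more right identifications out of n panel judgments
    if tasters are only guessing (p = 1/3)."""
    return sum(math.comb(n, k) * (1 / 3) ** k * (2 / 3) ** (n - k)
               for k in range(correct, n + 1))

def recipe_validated(correct, n, alpha=0.05):
    """A recipe pair is considered validated when the panel cannot
    reliably tell active from placebo, i.e. no significant difference
    (alpha = 0.05 is an assumed threshold)."""
    return triangle_test_p(correct, n) >= alpha
```

Note the direction of the test: unlike most validation settings, here a *non-significant* result is the desired outcome, since matched active and placebo foods should be indistinguishable.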
Statistical Analysis of CFD Solutions from the Drag Prediction Workshop
NASA Technical Reports Server (NTRS)
Hemsch, Michael J.
2002-01-01
A simple, graphical framework is presented for robust statistical evaluation of results obtained from N-Version testing of a series of RANS CFD codes. The solutions were obtained by a variety of code developers and users for the June 2001 Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration used for the computational tests is the DLR-F4 wing-body combination previously tested in several European wind tunnels and for which a previous N-Version test had been conducted. The statistical framework is used to evaluate code results for (1) a single cruise design point, (2) drag polars and (3) drag rise. The paper concludes with a discussion of the meaning of the results, especially with respect to predictability, validation, and reporting of solutions.
Li, Zhenghua; Cheng, Fansheng; Xia, Zhining
2011-01-01
The chemical structures of 114 polycyclic aromatic sulfur heterocycles (PASHs) have been studied by molecular electronegativity-distance vector (MEDV). The linear relationships between gas chromatographic retention index and the MEDV have been established by a multiple linear regression (MLR) model. The results of variable selection by stepwise multiple regression (SMR) and the predictive abilities of the optimized model, appraised by leave-one-out cross-validation, showed that the optimized model possessed the best statistical quality, with a correlation coefficient (R) of 0.9947 and a cross-validated correlation coefficient (Rcv) of 0.9940. Furthermore, when the 114 PASH compounds were divided into calibration and test sets in the ratio of 2:1, statistical analysis showed that the models possessed almost equal statistical quality, very similar regression coefficients, and good robustness. The quantitative structure-retention relationship (QSRR) model established may provide a convenient and powerful method for predicting the gas chromatographic retention of PASHs.
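Leave-one-out cross-validation, used above to appraise predictive ability, refits the model with each compound held out and predicts it from the rest. A minimal one-descriptor sketch (the actual QSRR model uses multiple MEDV descriptors):

```python
def loo_predictions(xs, ys):
    """Leave-one-out cross-validation for a one-descriptor linear model:
    refit the regression with compound i held out, then predict y for i."""
    preds = []
    for i in range(len(xs)):
        x = [v for j, v in enumerate(xs) if j != i]
        y = [v for j, v in enumerate(ys) if j != i]
        mx, my = sum(x) / len(x), sum(y) / len(y)
        slope = (sum((a - mx) * (c - my) for a, c in zip(x, y))
                 / sum((a - mx) ** 2 for a in x))
        intercept = my - slope * mx
        preds.append(intercept + slope * xs[i])
    return preds

def q2(ys, preds):
    """Cross-validated determination coefficient (Q^2), the squared
    analogue of the Rcv reported in the abstract: 1 - PRESS / SS."""
    my = sum(ys) / len(ys)
    press = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss = sum((y - my) ** 2 for y in ys)
    return 1 - press / ss
```

Because every prediction comes from a model that never saw the predicted compound, a high Q² (here Rcv² ≈ 0.988) is much stronger evidence of predictive power than the fitted R alone.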
Statistically Valid Planting Trials
C. B. Briscoe
1961-01-01
More than 100 million tree seedlings are planted each year in Latin America, and at least ten times that many should be planted. Rational control and development of a program of such magnitude require establishing and interpreting carefully planned trial plantings which will yield statistically valid answers to real and important questions. Unfortunately, many...
Performance comparison of LUR and OK in PM2.5 concentration mapping: a multidimensional perspective
Zou, Bin; Luo, Yanqing; Wan, Neng; Zheng, Zhong; Sternberg, Troy; Liao, Yilan
2015-01-01
Methods of Land Use Regression (LUR) modeling and Ordinary Kriging (OK) interpolation have been widely used to offset the shortcomings of PM2.5 data observed at sparse monitoring sites. However, traditional point-based performance evaluation strategy for these methods remains stagnant, which could cause unreasonable mapping results. To address this challenge, this study employs ‘information entropy’, an area-based statistic, along with traditional point-based statistics (e.g. error rate, RMSE) to evaluate the performance of LUR model and OK interpolation in mapping PM2.5 concentrations in Houston from a multidimensional perspective. The point-based validation reveals significant differences between LUR and OK at different test sites despite the similar end-result accuracy (e.g. error rate 6.13% vs. 7.01%). Meanwhile, the area-based validation demonstrates that the PM2.5 concentrations simulated by the LUR model exhibits more detailed variations than those interpolated by the OK method (i.e. information entropy, 7.79 vs. 3.63). Results suggest that LUR modeling could better refine the spatial distribution scenario of PM2.5 concentrations compared to OK interpolation. The significance of this study primarily lies in promoting the integration of point- and area-based statistics for model performance evaluation in air pollution mapping. PMID:25731103
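The area-based statistic used above is Shannon entropy applied to the simulated concentration surface. A minimal sketch of that idea (an assumed discretization into equal-width bins; the paper's exact binning is not stated in the abstract):

```python
import math

def information_entropy(values, bins=8):
    """Shannon entropy (bits) of a mapped concentration surface:
    discretize the simulated PM2.5 values into equal-width bins and
    measure how evenly the map occupies them. An over-smoothed surface
    concentrates in few bins and scores low."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant surface
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)
```

This is why the LUR surface (entropy 7.79) is read as capturing more spatial detail than the OK surface (3.63) even though their point-based error rates are similar.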
Uncertainty propagation for statistical impact prediction of space debris
NASA Astrophysics Data System (ADS)
Hoogendoorn, R.; Mooij, E.; Geul, J.
2018-01-01
Predictions of the impact time and location of space debris in a decaying trajectory are highly influenced by uncertainties. The traditional Monte Carlo (MC) method can be used to perform accurate statistical impact predictions, but requires a large computational effort. A method is investigated that directly propagates a Probability Density Function (PDF) in time, which has the potential to obtain more accurate results with less computational effort. The decaying trajectory of Delta-K rocket stages was used to test the methods using a six degrees-of-freedom state model. The PDF of the state of the body was propagated in time to obtain impact-time distributions. This Direct PDF Propagation (DPP) method results in a multi-dimensional scattered dataset of the PDF of the state, which is highly challenging to process. No accurate results could be obtained, because of the structure of the DPP data and the high dimensionality. Therefore, the DPP method is less suitable for practical uncontrolled entry problems and the traditional MC method remains superior. Additionally, the MC method was used with two improved uncertainty models to obtain impact-time distributions, which were validated using observations of true impacts. For one of the two uncertainty models, statistically more valid impact-time distributions were obtained than in previous research.
Pageler, Natalie M; Grazier G'Sell, Max Jacob; Chandler, Warren; Mailes, Emily; Yang, Christine; Longhurst, Christopher A
2016-09-01
The objective of this project was to use statistical techniques to determine the completeness and accuracy of data migrated during electronic health record conversion. Data validation during migration consists of mapped record testing and validation of a sample of the data for completeness and accuracy. We statistically determined a randomized sample size for each data type based on the desired confidence level and error limits. The only error identified in the post go-live period was a failure to migrate some clinical notes, which was unrelated to the validation process. No errors in the migrated data were found during the 12-month post-implementation period. Compared to the typical industry approach, we have demonstrated that a statistical approach to sampling size for data validation can ensure consistent confidence levels while maximizing efficiency of the validation process during a major electronic health record conversion. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association.
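The abstract does not give the sample-size formula used; a common choice for this kind of record validation is the proportion-based formula with a finite-population correction, sketched here as an assumption rather than the authors' method:

```python
import math

def validation_sample_size(population, confidence_z=1.96, error=0.05, p=0.5):
    """Sample size for validating migrated records per data type:
    classic proportion formula with finite-population correction.
    confidence_z = 1.96 gives 95% confidence; p = 0.5 is the
    conservative worst-case error-rate assumption."""
    n0 = confidence_z ** 2 * p * (1 - p) / error ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)
```

Under these assumptions, even a data type with a million records needs only a few hundred sampled records for a 95% confidence level with a 5% error limit, which is the efficiency gain over exhaustive record-by-record checking.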
NASA Technical Reports Server (NTRS)
Grosveld, Ferdinand W.; Schiller, Noah H.; Cabell, Randolph H.
2011-01-01
Comet Enflow is a commercially available, high-frequency vibroacoustic analysis software package founded on Energy Finite Element Analysis (EFEA) and Energy Boundary Element Analysis (EBEA). The EFEA approach was validated on a floor-equipped composite cylinder by comparing EFEA vibroacoustic response predictions with Statistical Energy Analysis (SEA) and experimental results. The SEA predictions were made using the commercial software program VA One 2009 from ESI Group. The frequency region of interest for this study covers the one-third octave bands with center frequencies from 100 Hz to 4000 Hz.
Sabel, Michael S; Rice, John D; Griffith, Kent A; Lowe, Lori; Wong, Sandra L; Chang, Alfred E; Johnson, Timothy M; Taylor, Jeremy M G
2012-01-01
To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid sentinel lymph node biopsy (SLNB), several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests, and support vector machines. We sought to validate recently published models meant to predict sentinel node status. We queried our comprehensive, prospectively collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon four published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false-negative rate (FNR). Logistic regression performed comparably with our data when considering NPV (89.4 versus 93.6%); however, the model's specificity was not high enough to significantly reduce the rate of biopsies (SLN reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and reduction in biopsy rates that were lower (87.7 versus 94.1 and 29.8 versus 14.3, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software. Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset, or could not be validated. Differences in selection criteria and histopathologic interpretation likely resulted in underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow for both validation and ultimately clinical utility.
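The three reported metrics (NPV, RNP, FNR) can all be computed from model calls and observed sentinel node status. A minimal sketch of the comparison metrics, not any of the published models themselves:

```python
def sln_model_metrics(predictions, node_positive):
    """Metrics used to compare SLNB-sparing models: negative predictive
    value (NPV), rate of negative predictions (RNP, i.e. the potential
    biopsy-reduction rate), and false-negative rate (FNR).
    `predictions`: model calls (1 = biopsy advised, 0 = biopsy spared);
    `node_positive`: observed sentinel node status (1 = metastasis)."""
    neg = [(p, y) for p, y in zip(predictions, node_positive) if p == 0]
    positives = sum(node_positive)
    missed = sum(y for _, y in neg)  # node-positive patients spared biopsy
    npv = (len(neg) - missed) / len(neg) if neg else float("nan")
    rnp = len(neg) / len(predictions)
    fnr = missed / positives if positives else 0.0
    return npv, rnp, fnr
```

The tension described in the abstract is visible in these definitions: raising RNP (sparing more biopsies) tends to admit more missed node-positive patients, pulling NPV down and FNR up.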
Model Identification in Time-Series Analysis: Some Empirical Results.
ERIC Educational Resources Information Center
Padia, William L.
Model identification of time-series data is essential to valid statistical tests of intervention effects. Model identification is, at best, inexact in the social and behavioral sciences where one is often confronted with small numbers of observations. These problems are discussed, and the results of independent identifications of 130 social and…
Hippisley-Cox, Julia; Coupland, Carol; Brindle, Peter
2014-01-01
Objectives To validate the performance of a set of risk prediction algorithms developed using the QResearch database, in an independent sample from general practices contributing to the Clinical Practice Research Datalink (CPRD). Setting Prospective open cohort study using practices contributing to the CPRD database and practices contributing to the QResearch database. Participants The CPRD validation cohort consisted of 3.3 million patients, aged 25–99 years, registered at 357 general practices between 1 Jan 1998 and 31 July 2012. The validation statistics for QResearch were obtained from the original published papers, which used a one-third sample of practices separate from those used to derive the score. A cohort from QResearch was used to compare incidence rates and baseline characteristics and consisted of 6.8 million patients from 753 practices registered between 1 Jan 1998 and 31 July 2013. Outcome measures Incident events relating to seven different risk prediction scores: QRISK2 (cardiovascular disease); QStroke (ischaemic stroke); QDiabetes (type 2 diabetes); QFracture (osteoporotic fracture and hip fracture); QKidney (moderate and severe kidney failure); QThrombosis (venous thromboembolism); QBleed (intracranial bleed and upper gastrointestinal haemorrhage). Measures of discrimination and calibration were calculated. Results Overall, the baseline characteristics of the CPRD and QResearch cohorts were similar, though QResearch had higher recording levels for ethnicity and family history. The validation statistics for each of the risk prediction scores were very similar in the CPRD cohort compared with the published results from the QResearch validation cohorts. For example, in women, the QDiabetes algorithm explained 50% of the variation within CPRD compared with 51% on QResearch, and the receiver operating characteristic curve value was 0.85 on both databases. The scores were well calibrated in CPRD. 
Conclusions Each of the algorithms performed practically as well in the external independent CPRD validation cohorts as they had in the original published QResearch validation cohorts. PMID:25168040
Statistical Selection of Biological Models for Genome-Wide Association Analyses.
Bi, Wenjian; Kang, Guolian; Pounds, Stanley B
2018-05-24
Genome-wide association studies have discovered many biologically important associations of genes with phenotypes. Typically, genome-wide association analyses formally test the association of each genetic feature (SNP, CNV, etc) with the phenotype of interest and summarize the results with multiplicity-adjusted p-values. However, very small p-values only provide evidence against the null hypothesis of no association without indicating which biological model best explains the observed data. Correctly identifying a specific biological model may improve the scientific interpretation and can be used to more effectively select and design a follow-up validation study. Thus, statistical methodology to identify the correct biological model for a particular genotype-phenotype association can be very useful to investigators. Here, we propose a general statistical method to summarize how accurately each of five biological models (null, additive, dominant, recessive, co-dominant) represents the data observed for each variant in a GWAS study. We show that the new method stringently controls the false discovery rate and asymptotically selects the correct biological model. Simulations of two-stage discovery-validation studies show that the new method has these properties and that its validation power is similar to or exceeds that of simple methods that use the same statistical model for all SNPs. Example analyses of three data sets also highlight these advantages of the new method. An R package is freely available at www.stjuderesearch.org/site/depts/biostats/maew. Copyright © 2018. Published by Elsevier Inc.
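The paper's FDR-controlling procedure is not reproduced here, but the underlying idea of scoring competing biological models can be sketched by recoding the genotype under each candidate model and comparing fits with an information criterion such as BIC. The simulation below is illustrative only (simulated data, not the authors' method) and assumes a truly recessive variant:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a variant whose true effect is recessive.
g = rng.integers(0, 3, size=2000)             # genotypes 0/1/2
y = 0.8 * (g == 2) + rng.normal(size=g.size)  # phenotype

# Genotype recodings for three of the candidate biological models.
codings = {
    "additive":  g.astype(float),
    "dominant":  (g >= 1).astype(float),
    "recessive": (g == 2).astype(float),
}

def bic(y, x):
    """BIC of a simple linear regression y ~ 1 + x (Gaussian errors)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    n, k = y.size, X.shape[1]
    return n * np.log(rss / n) + k * np.log(n)

best = min(codings, key=lambda m: bic(y, codings[m]))
print(best)  # the recessive coding is expected to fit best here
```

The paper's actual method additionally includes null and co-dominant models and controls the false discovery rate of the selection; this sketch only shows why model choice matters beyond a small p-value.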
Vesterinen, Hanna M; Egan, Kieren; Deister, Amelie; Schlattmann, Peter; Macleod, Malcolm R; Dirnagl, Ulrich
2011-04-01
Translating experimental findings into clinically effective therapies is one of the major bottlenecks of modern medicine. As this has been particularly true for cerebrovascular research, attention has turned to the quality and validity of experimental cerebrovascular studies. We set out to assess the study design, statistical analyses, and reporting of cerebrovascular research. We assessed all original articles published in the Journal of Cerebral Blood Flow and Metabolism during the year 2008 against a checklist designed to capture the key attributes relating to study design, statistical analyses, and reporting. A total of 156 original publications were included (animal, in vitro, human). Few studies reported a primary research hypothesis, statement of purpose, or measures to safeguard internal validity (such as randomization, blinding, exclusion or inclusion criteria). Many studies lacked sufficient information regarding methods and results to form a reasonable judgment about their validity. In nearly 20% of studies, statistical tests were either not appropriate or information to allow assessment of appropriateness was lacking. This study identifies a number of factors that should be addressed if the quality of research in basic and translational biomedicine is to be improved. We support the widespread implementation of the ARRIVE (Animal Research Reporting In Vivo Experiments) statement for the reporting of experimental studies in biomedicine, for improving training in proper study design and analysis, and that reviewers and editors adopt a more constructively critical approach in the assessment of manuscripts for publication. PMID:21157472
Uniting statistical and individual-based approaches for animal movement modelling.
Latombe, Guillaume; Parrott, Lael; Basille, Mathieu; Fortin, Daniel
2014-01-01
The dynamic nature of their internal states and the environment directly shape animals' spatial behaviours and give rise to emergent properties at broader scales in natural systems. However, integrating these dynamic features into habitat selection studies remains challenging, due to practically impossible field work to access internal states and the inability of current statistical models to produce dynamic outputs. To address these issues, we developed a robust method, which combines statistical and individual-based modelling. Using a statistical technique for forward modelling of the IBM has the advantage of being faster for parameterization than a pure inverse modelling technique and allows for robust selection of parameters. Using GPS locations from caribou monitored in Québec, caribou movements were modelled based on generative mechanisms accounting for dynamic variables at a low level of emergence. These variables were accessed by replicating real individuals' movements in parallel sub-models, and movement parameters were then empirically parameterized using Step Selection Functions. The final IBM model was validated using both k-fold cross-validation and emergent patterns validation and was tested for two different scenarios, with varying hardwood encroachment. Our results highlighted a functional response in habitat selection, which suggests that our method was able to capture the complexity of the natural system, and adequately provided projections on future possible states of the system in response to different management plans. This is especially relevant for testing the long-term impact of scenarios corresponding to environmental configurations that have yet to be observed in real systems. PMID:24979047
Statistically Controlling for Confounding Constructs Is Harder than You Think
Westfall, Jacob; Yarkoni, Tal
2016-01-01
Social scientists often seek to demonstrate that a construct has incremental validity over and above other related constructs. However, these claims are typically supported by measurement-level models that fail to consider the effects of measurement (un)reliability. We use intuitive examples, Monte Carlo simulations, and a novel analytical framework to demonstrate that common strategies for establishing incremental construct validity using multiple regression analysis exhibit extremely high Type I error rates under parameter regimes common in many psychological domains. Counterintuitively, we find that error rates are highest—in some cases approaching 100%—when sample sizes are large and reliability is moderate. Our findings suggest that a potentially large proportion of incremental validity claims made in the literature are spurious. We present a web application (http://jakewestfall.org/ivy/) that readers can use to explore the statistical properties of these and other incremental validity arguments. We conclude by reviewing SEM-based statistical approaches that appropriately control the Type I error rate when attempting to establish incremental validity. PMID:27031707
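The mechanism the authors describe is easy to reproduce: if the outcome and a candidate predictor share only a common confounder, and that confounder is "controlled for" through an unreliable measurement of it, the predictor's partial regression coefficient is systematically nonzero. A minimal Monte Carlo sketch (an illustration of the phenomenon, not the authors' simulation code; sample size and reliability values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps, rel = 500, 500, 0.6   # sample size, simulations, covariate reliability

false_positives = 0
for _ in range(reps):
    c = rng.normal(size=n)            # latent confounder
    x = c + rng.normal(size=n)        # "new" predictor: shares only c with y
    y = c + rng.normal(size=n)        # outcome driven solely by c
    # Observed covariate = attenuated confounder + measurement error.
    c_obs = np.sqrt(rel) * c + np.sqrt(1 - rel) * rng.normal(size=n)

    X = np.column_stack([np.ones(n), c_obs, x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 3)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t = beta[2] / np.sqrt(cov[2, 2])
    false_positives += abs(t) > 1.96  # nominal 5% two-sided test

rate = false_positives / reps
print(rate)  # well above the nominal 0.05
```

Because x has no effect on y once c is fully accounted for, every rejection here is a spurious "incremental validity" finding caused purely by the covariate's unreliability.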
NASA Technical Reports Server (NTRS)
Generazio, Edward R.
2014-01-01
Unknown risks are introduced into failure critical systems when probability of detection (POD) capabilities are accepted without a complete understanding of the statistical method applied and the interpretation of the statistical results. The presence of this risk in the nondestructive evaluation (NDE) community is revealed in common statements about POD. These statements are often interpreted in a variety of ways and therefore, the very existence of the statements identifies the need for a more comprehensive understanding of POD methodologies. Statistical methodologies have data requirements to be met, procedures to be followed, and requirements for validation or demonstration of adequacy of the POD estimates. Risks are further enhanced due to the wide range of statistical methodologies used for determining the POD capability. Receiver/Relative Operating Characteristics (ROC) Display, simple binomial, logistic regression, and Bayes' rule POD methodologies are widely used in determining POD capability. This work focuses on Hit-Miss data to reveal the framework of the interrelationships between Receiver/Relative Operating Characteristics Display, simple binomial, logistic regression, and Bayes' Rule methodologies as they are applied to POD. Knowledge of these interrelationships leads to an intuitive and global understanding of the statistical data, procedural and validation requirements for establishing credible POD estimates.
NASA Astrophysics Data System (ADS)
Cánovas-García, Fulgencio; Alonso-Sarría, Francisco; Gomariz-Castillo, Francisco; Oñate-Valdivieso, Fernando
2017-06-01
Random forest is a classification technique widely used in remote sensing. One of its advantages is that it produces an estimate of classification accuracy based on the so-called out-of-bag cross-validation method. It is usually assumed that this estimate is not biased and may be used instead of validation based on an external data set or cross-validation external to the algorithm. In this paper we show that this is not necessarily the case when classifying remote sensing imagery using training areas with several pixels or objects. According to our results, out-of-bag cross-validation clearly overestimates accuracy, both overall and per class. The reason is that, within a training patch, pixels or objects are not statistically independent of each other; however, they are split by bootstrapping into in-bag and out-of-bag sets as if they were. We believe that putting the whole patch, rather than its pixels/objects, into one set or the other would produce a less biased out-of-bag cross-validation. To deal with the problem, we propose a modification of the random forest algorithm that splits training patches instead of the pixels (or objects) that compose them. This modified algorithm does not overestimate accuracy and has no lower predictive capability than the original. When its results are validated with an external data set, the accuracy is not different from that obtained with the original algorithm. We analysed three remote sensing images with different classification approaches (pixel- and object-based); in the three cases reported, the modification we propose produces a less biased accuracy estimation.
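The bias described above can be reproduced without any imagery: when near-duplicate samples from the same training patch are split at random, each held-out sample has same-patch "twins" in the training set, so accuracy is overestimated relative to holding out whole patches. A toy sketch with a 1-nearest-neighbour classifier standing in for random forest's out-of-bag estimate (all data simulated):

```python
import numpy as np

rng = np.random.default_rng(1)

# 40 patches of 10 near-identical pixels; 30% of patch labels are flipped noise.
n_patches, per_patch = 40, 10
centres = 3 * rng.normal(size=(n_patches, 2))
labels = ((centres[:, 0] + centres[:, 1] > 0) ^ (rng.random(n_patches) < 0.3)).astype(int)
X = np.repeat(centres, per_patch, axis=0) + rng.normal(scale=0.05, size=(n_patches * per_patch, 2))
y = np.repeat(labels, per_patch)
patch = np.repeat(np.arange(n_patches), per_patch)

def knn_accuracy(train, test):
    """1-nearest-neighbour accuracy on the held-out indices."""
    d = np.linalg.norm(X[test][:, None, :] - X[train][None, :, :], axis=2)
    return float(np.mean(y[train][np.argmin(d, axis=1)] == y[test]))

idx = rng.permutation(X.shape[0])
acc_random = knn_accuracy(idx[100:], idx[:100])   # pixels split at random (leaky)
held = np.isin(patch, np.arange(10))              # whole patches held out
acc_patch = knn_accuracy(np.flatnonzero(~held), np.flatnonzero(held))
```

The random pixel split almost always finds a same-patch twin (which carries the same, possibly noisy, label), so `acc_random` is optimistic; the patch-wise split, like the authors' modified algorithm, removes that leakage.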
Ganasegeran, Kurubaran; Selvaraj, Kamaraj; Rashid, Abdul
2017-01-01
Background The six item Confusion, Hubbub and Order Scale (CHAOS-6) has been validated as a reliable tool to measure levels of household disorder. We aimed to investigate the goodness of fit and reliability of a new Malay version of the CHAOS-6. Methods The original English version of the CHAOS-6 underwent forward-backward translation into the Malay language. The finalised Malay version was administered to 105 myocardial infarction survivors in a Malaysian cardiac health facility. We performed confirmatory factor analyses (CFAs) using structural equation modelling. A path diagram and fit statistics were yielded to determine the Malay version’s validity. Composite reliability was tested to determine the scale’s reliability. Results All 105 myocardial infarction survivors participated in the study. The CFA yielded a six-item, one-factor model with excellent fit statistics. Composite reliability for the single factor CHAOS-6 was 0.65, confirming that the scale is reliable for Malay speakers. Conclusion The Malay version of the CHAOS-6 was reliable and showed the best fit statistics for our study sample. We thus offer a simple, brief, validated, reliable and novel instrument to measure chaos, the Skala Kecelaruan, Keriuhan & Tertib Terubahsuai (CHAOS-6), for the Malaysian population. PMID:28951688
Nurse Education, Center of Excellence for Remote and Medically Under-Served Areas (CERMUSA)
2014-04-01
Didactic portion: online pre-test and post-test; online survey. Statistical validity: the study investigators enrolled 134 participants. Table 2.09 (Summary Statistics) asked whether the curriculum teaches students how to develop a personal… Pre-test/post-test results indicated that the delivery of didactic material via an online course management system is effective.
Cohen, Alan A; Milot, Emmanuel; Li, Qing; Legault, Véronique; Fried, Linda P; Ferrucci, Luigi
2014-09-01
Measuring physiological dysregulation during aging could be a key tool both to understand underlying aging mechanisms and to predict clinical outcomes in patients. However, most existing indices are either circular or hard to interpret biologically. Recently, we showed that statistical distance of 14 common blood biomarkers (a measure of how strange an individual's biomarker profile is) was associated with age and mortality in the WHAS II data set, validating its use as a measure of physiological dysregulation. Here, we extend the analyses to other data sets (WHAS I and InCHIANTI) to assess the stability of the measure across populations. We found that the statistical criteria used to determine the original 14 biomarkers produced diverging results across populations; in other words, had we started with a different data set, we would have chosen a different set of markers. Nonetheless, the same 14 markers (or the subset of 12 available for InCHIANTI) produced highly similar predictions of age and mortality. We include analyses of all combinatorial subsets of the markers and show that results do not depend much on biomarker choice or data set, but that more markers produce a stronger signal. We conclude that statistical distance as a measure of physiological dysregulation is stable across populations in Europe and North America. Copyright © 2014 Elsevier Inc. All rights reserved.
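The "statistical distance" used in this line of work is essentially the Mahalanobis distance of an individual's biomarker vector from a reference population's centroid, which accounts for the correlations among biomarkers. A minimal sketch with simulated data (the 14-biomarker dimension mirrors the study; the values are illustrative):

```python
import numpy as np

def mahalanobis(profiles, reference):
    """Statistical distance of each biomarker profile from a reference sample.

    profiles:  (n, p) biomarker values for the individuals being scored
    reference: (m, p) biomarker values defining the 'normal' population
    """
    mu = reference.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(reference, rowvar=False))
    diff = profiles - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

rng = np.random.default_rng(0)
ref = rng.normal(size=(500, 14))         # 14 biomarkers, as in the study
typical = rng.normal(size=(1, 14))
aberrant = typical + 4.0                 # shifted on every biomarker
d = mahalanobis(np.vstack([typical, aberrant]), ref)
```

An individual with a "strange" profile relative to the reference distribution receives a larger distance, which is the quantity the authors relate to age and mortality.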
Yu, Wenxi; Liu, Yang; Ma, Zongwei; Bi, Jun
2017-08-01
Using satellite-based aerosol optical depth (AOD) measurements and statistical models to estimate ground-level PM2.5 is a promising way to fill in the areas that are not covered by ground PM2.5 monitors. The statistical models used in previous studies are primarily Linear Mixed Effects (LME) and Geographically Weighted Regression (GWR) models. In this study, we developed a new regression model between PM2.5 and AOD using Gaussian processes in a Bayesian hierarchical setting. Gaussian processes model the stochastic nature of the spatial random effects, where the mean surface and the covariance function are specified. The spatial stochastic process is incorporated under the Bayesian hierarchical framework to explain the variation of PM2.5 concentrations together with other factors, such as AOD, spatial and non-spatial random effects. We evaluate the results of our model and compare them with those of other, conventional statistical models (GWR and LME) by within-sample model fitting and out-of-sample validation (cross-validation, CV). The results show that our model possesses a CV result (R² = 0.81) that reflects higher accuracy than that of GWR and LME (0.74 and 0.48, respectively). Our results indicate that Gaussian process models have the potential to improve the accuracy of satellite-based PM2.5 estimates.
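The model above is Bayesian and hierarchical; as a much simpler stand-in, the core mechanics of Gaussian process regression (an RBF-kernel posterior mean) can be sketched in a few lines of NumPy. The data and hyperparameters below are illustrative assumptions, not the paper's PM2.5 setup:

```python
import numpy as np

def gp_predict(X_train, y_train, X_test, length=1.0, sigma_n=0.1):
    """Posterior mean of GP regression with an RBF kernel (zero prior mean)."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length**2)
    K = rbf(X_train, X_train) + sigma_n**2 * np.eye(len(X_train))
    return rbf(X_test, X_train) @ np.linalg.solve(K, y_train)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=80)
Xt = np.linspace(-2.5, 2.5, 20)[:, None]
pred = gp_predict(X, y, Xt)
```

The covariance function plays the role the abstract describes: nearby inputs (here, 1-D points; in the paper, spatial locations) get correlated predictions, which is what lets the spatial random effect smooth PM2.5 estimates between monitors.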
Gharehbaghi, Arash; Linden, Maria
2017-10-12
This paper presents a novel method for learning the cyclic contents of stochastic time series: the deep time-growing neural network (DTGNN). The DTGNN combines supervised and unsupervised methods at different levels of learning for enhanced performance. It is deployed within a multiscale learning structure to classify cyclic time series (CTS), in which the dynamic contents of the time series are preserved in an efficient manner. This paper suggests a systematic procedure for finding the design parameters of the classification method for a one-versus-multiple-class application. A novel validation method is also suggested for evaluating the structural risk, both quantitatively and qualitatively. The effect of the DTGNN on the performance of the classifier is statistically validated through repeated random subsampling using different sets of CTS from different medical applications. The validation involves four medical databases, comprising 108 recordings of the electroencephalogram signal, 90 recordings of the electromyogram signal, 130 recordings of the heart sound signal, and 50 recordings of the respiratory sound signal. Results of the statistical validations show that the DTGNN significantly improves the performance of the classification and also exhibits an optimal structural risk.
Barmou, Maher M; Hussain, Saba F; Abu Hassan, Mohamed I
2018-06-01
The aim of the study was to assess the reliability and validity of cephalometric variables from the MicroScribe-3DXL. Seven cephalometric variables (facial angle, ANB, maxillary depth, U1/FH, FMA, IMPA, FMIA) were measured by a dentist in 60 Malay subjects (30 males and 30 females) with class I occlusion and balanced faces. Two standard images were taken for each subject, one with conventional cephalometric radiography and one with the MicroScribe-3DXL. All the images were traced and analysed. SPSS version 2.0 was used for statistical analysis, with the significance level set at P<0.05. The results revealed a statistically significant difference in four measurements (U1/FH, FMA, IMPA, FMIA), with P-values ranging from 0.00 to 0.03. The difference in the measurements was considered clinically acceptable. The overall reliability of the MicroScribe-3DXL was 92.7% and its validity was 91.8%. The MicroScribe-3DXL is reliable and valid for most of the cephalometric variables, with the advantages of saving time and cost. This is a promising device to assist in diverse areas of dental practice and research. Copyright © 2018. Published by Elsevier Masson SAS.
Fault detection, isolation, and diagnosis of self-validating multifunctional sensors.
Yang, Jing-Li; Chen, Yin-Sheng; Zhang, Li-Li; Sun, Zhen
2016-06-01
A novel fault detection, isolation, and diagnosis (FDID) strategy for self-validating multifunctional sensors is presented in this paper. The sparse non-negative matrix factorization-based method can effectively detect faults by using the squared prediction error (SPE) statistic, and variable contribution plots based on the SPE statistic can help to locate and isolate the faulty sensitive units. The complete ensemble empirical mode decomposition is employed to decompose the fault signals into a series of intrinsic mode functions (IMFs) and a residual. The sample entropy (SampEn)-weighted energy values of each IMF and the residual are estimated to represent the characteristics of the fault signals. A multi-class support vector machine is introduced to identify the fault mode, with the purpose of diagnosing the status of the faulty sensitive units. The performance of the proposed strategy is compared with that of other fault detection strategies, such as principal component analysis and independent component analysis, and fault diagnosis strategies, such as empirical mode decomposition coupled with a support vector machine. The proposed strategy is fully evaluated in a real self-validating multifunctional sensor experimental system, and the experimental results demonstrate that it provides an excellent solution to the FDID research topic for self-validating multifunctional sensors.
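The SPE statistic used above is the squared residual left after projecting a new measurement onto a low-dimensional model of normal operation; a fault pushes the measurement out of the model subspace and the SPE crosses a control limit. A minimal sketch using PCA as a simpler stand-in for the paper's sparse NMF model (all data simulated):

```python
import numpy as np

rng = np.random.default_rng(0)

# Normal operating data: 5 correlated channels of a multifunctional sensor.
latent = rng.normal(size=(300, 2))
W = rng.normal(size=(2, 5))
X = latent @ W + rng.normal(scale=0.1, size=(300, 5))

# Fit a 2-component PCA model of normal behaviour.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
P = Vt[:2].T                      # loading matrix (5 x 2)

def spe(x):
    """Squared prediction error: residual after projecting onto the model."""
    r = (x - mu) - (x - mu) @ P @ P.T
    return float(r @ r)

# Control limit taken as the 99th percentile of SPE on normal data.
limit = np.percentile([spe(x) for x in X], 99)

faulty = X[0].copy()
faulty[3] += 2.0                  # bias fault injected on sensitive unit 3
```

In the full strategy, the per-variable terms of the residual (the contribution plot) then point to channel 3 as the faulty sensitive unit.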
Reference datasets for 2-treatment, 2-sequence, 2-period bioequivalence studies.
Schütz, Helmut; Labes, Detlew; Fuglsang, Anders
2014-11-01
It is difficult to validate statistical software used to assess bioequivalence since very few datasets with known results are in the public domain, and the few that are published are of moderate size and balanced. The purpose of this paper is therefore to introduce reference datasets of varying complexity in terms of dataset size and characteristics (balance, range, outlier presence, residual error distribution) for 2-treatment, 2-period, 2-sequence bioequivalence studies and to report their point estimates and 90% confidence intervals which companies can use to validate their installations. The results for these datasets were calculated using the commercial packages EquivTest, Kinetica, SAS and WinNonlin, and the non-commercial package R. The results of three of these packages mostly agree, but imbalance between sequences seems to provoke questionable results with one package, which illustrates well the need for proper software validation.
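The published reference datasets require a full crossover analysis (sequence, period, and subject effects); but for a balanced, complete dataset the calculation reduces to a paired analysis on the log scale, which is enough to show what the point estimate and 90% confidence interval are. The sketch below uses hypothetical AUC values, not the reference datasets, and hardcodes the t critical value for 11 degrees of freedom:

```python
import numpy as np

def be_90ci(test, ref, t_crit):
    """Point estimate and 90% CI for the geometric mean ratio (paired sketch).

    Assumes one test and one reference observation per subject and no
    period/sequence effects; real 2-treatment, 2-sequence, 2-period analyses
    fit a linear model with those effects included.
    """
    d = np.log(np.asarray(test, float)) - np.log(np.asarray(ref, float))
    n = d.size
    m, se = d.mean(), d.std(ddof=1) / np.sqrt(n)
    return np.exp(m), np.exp(m - t_crit * se), np.exp(m + t_crit * se)

# Hypothetical AUC values for 12 subjects (not from the reference datasets).
ref_auc  = [100, 92, 111, 105, 98, 120, 87, 101, 95, 108, 99, 113]
test_auc = [ 97, 95, 118, 100, 94, 125, 84, 105, 90, 104, 103, 110]
pe, ci_lo, ci_hi = be_90ci(test_auc, ref_auc, t_crit=1.796)  # t(0.95, df=11)
```

A software installation would be judged validated on such data if it reproduces the published point estimate and 90% CI to the reported precision; the common acceptance range for the CI is 0.80–1.25.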
[Suicide due to mental diseases based on the Vital Statistics Survey Death Form].
Takizawa, Tohru
2012-06-01
Mental diseases such as schizophrenia and depression put patients at risk for suicide. It is extremely important to understand that one way of preventing suicide is to determine the actual mental state of the individual. The purpose of this study was to analyze the true mental state of suicide victims reported in the vital statistics. This study investigated the vital statistics of 30,299 suicide victims in Japan in 2008. The use of these basic statistics for non-statistical purposes was approved by the Japanese Ministry of Health, Labour and Welfare. The method involved reviewing the Vital Statistics Survey Death Form at the Ministry of Health, Labour and Welfare as well as analyzing its Online Reporting of Vital Statistics. Furthermore, this study was able to validate 29,799 of the 30,299 suicides (98.3%) that occurred in 2008. Mental diseases were validated not only from the "Cause of death" section as marked on the death certificate, but also from information found in the sections for "Additional items for death by external cause" and "Other special remarks." Results: From the Vital Statistics Survey Death Form and Online Reporting of Vital Statistics, 2,964 individuals with either a mental disease or mental disorder were identified. Of these, 55 had dementia (of which 13 were dementia in Alzheimer's disease), 116 had alcohol dependence/psychotic disorder, 550 had schizophrenia, 101 had bipolar affective disorder, 1,913 had had a depressive episode, 13 had obsessive-compulsive disorder, 22 had adjustment disorders, 14 had eating disorders, 49 had nonorganic sleep disorders, 24 had personality disorders, and 6 had pervasive developmental disorders. In addition, 125 individuals had more than one mental disease. The national police statistics from 2008 show that 1,368 suicide victims had schizophrenia and 6,490 had depression. These figures show quite a difference between the results of this study and the police statistics. 
Further, there have been controversies regarding autopsies of suicide victims. Thus, further investigation into the cause of death is of great importance.
A Model for Investigating Predictive Validity at Highly Selective Institutions.
ERIC Educational Resources Information Center
Gross, Alan L.; And Others
A statistical model for investigating predictive validity at highly selective institutions is described. When the selection ratio is small, one must typically deal with a data set containing relatively large amounts of missing data on both criterion and predictor variables. Standard statistical approaches are based on the strong assumption that…
Selection of Marine Corps Drill Instructors
1980-03-01
[Front-matter tables include key-construction and cross-validation statistics for Drill Instructor School performance success keys, and statistics on race and school attrition.] A specially constructed form, the Alternation Ranking of Series Drill Instructors, was used; in this form, DIs in a Series are ranked from highest to lowest in terms of their…
Muller, David C; Johansson, Mattias; Brennan, Paul
2017-03-10
Purpose Several lung cancer risk prediction models have been developed, but none to date have assessed the predictive ability of lung function in a population-based cohort. We sought to develop and internally validate a model incorporating lung function using data from the UK Biobank prospective cohort study. Methods This analysis included 502,321 participants without a previous diagnosis of lung cancer, predominantly between 40 and 70 years of age. We used flexible parametric survival models to estimate the 2-year probability of lung cancer, accounting for the competing risk of death. Models included predictors previously shown to be associated with lung cancer risk, including sex, variables related to smoking history and nicotine addiction, medical history, family history of lung cancer, and lung function (forced expiratory volume in 1 second [FEV1]). Results During accumulated follow-up of 1,469,518 person-years, there were 738 lung cancer diagnoses. A model incorporating all predictors had excellent discrimination (concordance (c)-statistic [95% CI] = 0.85 [0.82 to 0.87]). Internal validation suggested that the model will discriminate well when applied to new data (optimism-corrected c-statistic = 0.84). The full model, including FEV1, also had modestly superior discriminatory power to one that was designed solely on the basis of questionnaire variables (c-statistic = 0.84 [0.82 to 0.86]; optimism-corrected c-statistic = 0.83; p for FEV1 = 3.4 × 10⁻¹³). The full model had better discrimination than standard lung cancer screening eligibility criteria (c-statistic = 0.66 [0.64 to 0.69]). Conclusion A risk prediction model that includes lung function has strong predictive ability, which could improve eligibility criteria for lung cancer screening programs.
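For a binary outcome, the c-statistic reported throughout this abstract reduces to the probability that a randomly chosen case is ranked above a randomly chosen non-case (the ROC AUC). The survival-analysis version used in the study (Harrell's c) additionally handles follow-up time and censoring; the sketch below is the binary simplification, with made-up risk scores:

```python
def c_statistic(scores, events):
    """Concordance for binary outcomes: the fraction of (event, non-event)
    pairs in which the event received the higher risk score (ties count 0.5).
    """
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Illustrative scores and outcomes (not from the study):
c = c_statistic([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
```

A value of 0.5 means the score is no better than chance at ranking cases above non-cases, which is why the jump from 0.66 (screening criteria) to 0.85 (full model) indicates substantially better discrimination.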
Neman, R
1975-03-01
The Zigler and Seitz (1975) critique was carefully examined with respect to the conclusions of the Neman et al. (1975) study. Particular attention was given to the following questions: (a) did experimenter bias or commitment account for the results, (b) were unreliable and invalid psychometric instruments used, (c) were the statistical analyses insufficient or incorrect, (d) did the results reflect no more than the operation of chance, and (e) were the results biased by artifactually inflated profile scores. Experimenter bias and commitment were shown to be insufficient to account for the results; a further review of Buros (1972) showed that there was no need for apprehension about the testing instruments; the statistical analyses were shown to exceed prevailing standards for research reporting; the results were shown to reflect valid findings at the .05 probability level; and the Neman et al. (1975) results for the profile measure were equally significant using either "raw" neurological scores or "scaled" neurological age scores. Zigler, Seitz, and I agreed on the needs for (a) using multivariate analyses, where applicable, in studies having more than one dependent variable; (b) defining the population for which sensorimotor training procedures may be appropriately prescribed; and (c) validating the profile measure as a tool to assess neurological disorganization.
NASA Astrophysics Data System (ADS)
Dubrovsky, M.; Hirschi, M.; Spirig, C.
2014-12-01
To quantify the impact of climate change on a specific pest (or any weather-dependent process) at a specific site, we may use a site-calibrated pest (or other) model and compare its outputs obtained with site-specific weather data representing present vs. perturbed climates. The input weather data may be produced by a stochastic weather generator. Apart from the quality of the pest model, the reliability of the results obtained in such an experiment depends on the ability of the generator to represent the statistical structure of the real-world weather series, and on the sensitivity of the pest model to possible imperfections of the generator. This contribution deals with the multivariate HOWGH weather generator, which is based on a combination of parametric and non-parametric statistical methods. Here, HOWGH is used to generate synthetic hourly series of three weather variables (solar radiation, temperature and precipitation) required by the dynamic pest model SOPRA to simulate the development of codling moth. The contribution presents results of the direct and indirect validation of HOWGH. In the direct validation, the synthetic series generated by HOWGH (various settings of its underlying model are assumed) are validated in terms of multiple climatic characteristics, focusing on the subdaily wet/dry and hot/cold spells. In the indirect validation, we assess the generator in terms of characteristics derived from the outputs of the SOPRA model fed by the observed vs. synthetic series. The weather generator may be used to produce weather series representing present and future climates. In the latter case, the parameters of the generator may be modified by climate change scenarios based on Global or Regional Climate Models. To demonstrate this feature, the results of codling moth simulations for future climate will be shown.
Acknowledgements: The weather generator is developed and validated within the frame of projects WG4VALUE (project LD12029 sponsored by the Ministry of Education, Youth and Sports of CR), and VALUE (COST ES 1102 action).
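Direct validation of the kind described above hinges on extracting spell statistics from a wet/dry indicator series. A minimal sketch of spell-length extraction (the toy series is invented; HOWGH's actual diagnostics are more extensive):

```python
def spell_lengths(wet_flags):
    """Lengths of consecutive runs of True (e.g., wet hours) in a series."""
    spells, run = [], 0
    for w in wet_flags:
        if w:
            run += 1
        elif run:
            spells.append(run)
            run = 0
    if run:                      # close a run that reaches the series end
        spells.append(run)
    return spells

obs = [True, True, False, True, True, True, False, False, True]
print(spell_lengths(obs))  # [2, 3, 1]
```

Comparing the spell-length distributions (or simply their means) between observed and synthetic series then gives one of the direct-validation diagnostics.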
MO-G-12A-01: Quantitative Imaging Metrology: What Should Be Assessed and How?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Giger, M; Petrick, N; Obuchowski, N
The first two symposia in the Quantitative Imaging Track focused on 1) the introduction of quantitative imaging (QI) challenges and opportunities, and QI efforts of agencies and organizations such as the RSNA, NCI, FDA, and NIST, and 2) the techniques, applications, and challenges of QI, with specific examples from CT, PET/CT, and MR. This third symposium in the QI Track will focus on metrology and its importance in successfully advancing the QI field. While the specific focus will be on QI, many of the concepts presented are more broadly applicable to many areas of medical physics research and applications. As such, the topics discussed should be of interest to medical physicists involved in imaging as well as therapy. The first talk of the session will focus on the introduction to metrology and why it is critically important in QI. The second talk will focus on appropriate methods for technical performance assessment. The third talk will address statistically valid methods for algorithm comparison, a common problem not only in QI but also in other areas of medical physics. The final talk in the session will address strategies for publication of results that will allow statistically valid meta-analyses, which is critical for combining results of individual studies with typically small sample sizes in a manner that can best inform decisions and advance the field. Learning Objectives: Understand the importance of metrology in the QI efforts. Understand appropriate methods for technical performance assessment. Understand methods for comparing algorithms with or without reference data (i.e., “ground truth”). Understand the challenges and importance of reporting results in a manner that allows for statistically valid meta-analyses.
PIV Data Validation Software Package
NASA Technical Reports Server (NTRS)
Blackshire, James L.
1997-01-01
A PIV data validation and post-processing software package was developed to provide semi-automated data validation and data reduction capabilities for Particle Image Velocimetry data sets. The software provides three primary capabilities including (1) removal of spurious vector data, (2) filtering, smoothing, and interpolating of PIV data, and (3) calculations of out-of-plane vorticity, ensemble statistics, and turbulence statistics information. The software runs on an IBM PC/AT host computer working either under Microsoft Windows 3.1 or Windows 95 operating systems.
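Spurious-vector removal in PIV post-processing is commonly implemented with a local normalized median test. The sketch below uses that standard technique (Westerweel–Scarano style) as an assumption; it is not necessarily the exact algorithm of this NASA package:

```python
import numpy as np

def flag_spurious(u, thresh=2.0, eps=0.1):
    """Normalized median test on one velocity component: compare each vector
    with the median of its 3x3 neighbourhood, normalized by the median
    residual of that neighbourhood. Returns a boolean mask of suspect vectors."""
    ny, nx = u.shape
    mask = np.zeros_like(u, dtype=bool)
    for j in range(ny):
        for i in range(nx):
            nb = [u[jj, ii]
                  for jj in range(max(0, j - 1), min(ny, j + 2))
                  for ii in range(max(0, i - 1), min(nx, i + 2))
                  if (jj, ii) != (j, i)]
            med = np.median(nb)
            fluct = np.median(np.abs(np.array(nb) - med))
            if abs(u[j, i] - med) / (fluct + eps) > thresh:
                mask[j, i] = True
    return mask

u = np.ones((5, 5))
u[2, 2] = 25.0                 # one obvious outlier vector
print(flag_spurious(u))        # only the (2, 2) entry is flagged
```

Flagged vectors would then be replaced by interpolation from valid neighbours, the package's second capability.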
Parameter Estimation and Model Validation of Nonlinear Dynamical Networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abarbanel, Henry; Gill, Philip
In the performance period of this work under a DOE contract, the co-PIs, Philip Gill and Henry Abarbanel, developed new methods for statistical data assimilation for problems of DOE interest, including geophysical and biological problems. This included numerical optimization algorithms for variational principles and new parallel-processing Monte Carlo routines for performing the path integrals of statistical data assimilation. These results have been summarized in the monograph "Predicting the Future: Completing Models of Observed Complex Systems" by Henry Abarbanel, published by Springer-Verlag in June 2013. Additional results and details have appeared in the peer-reviewed literature.
Validating MEDIQUAL Constructs
NASA Astrophysics Data System (ADS)
Lee, Sang-Gun; Min, Jae H.
In this paper, we validate MEDIQUAL constructs through the different media users in help desk service. In previous research, only two end-users' constructs were used: assurance and responsiveness. In this paper, we extend MEDIQUAL constructs to include reliability, empathy, assurance, tangibles, and responsiveness, which are based on the SERVQUAL theory. The results suggest that: 1) the five MEDIQUAL constructs are validated by factor analysis; that is, measures of the same construct obtained by different methods show relatively high correlations, while measures of constructs expected to differ show low correlations; and 2) regression analysis shows that the five MEDIQUAL constructs are statistically significant predictors of media users' satisfaction in help desk service.
VALUE - Validating and Integrating Downscaling Methods for Climate Change Research
NASA Astrophysics Data System (ADS)
Maraun, Douglas; Widmann, Martin; Benestad, Rasmus; Kotlarski, Sven; Huth, Radan; Hertig, Elke; Wibig, Joanna; Gutierrez, Jose
2013-04-01
Our understanding of global climate change is mainly based on General Circulation Models (GCMs) with a relatively coarse resolution. Since climate change impacts are mainly experienced on regional scales, high-resolution climate change scenarios need to be derived from GCM simulations by downscaling. Several projects have been carried out over recent years to validate the performance of statistical and dynamical downscaling, yet several aspects have not been systematically addressed: variability on sub-daily, decadal and longer time-scales, extreme events, spatial variability and inter-variable relationships. Different downscaling approaches such as dynamical downscaling, statistical downscaling and bias correction approaches have not been systematically compared. Furthermore, collaboration between different communities, in particular regional climate modellers, statistical downscalers and statisticians, has been limited. To address these gaps, the EU Cooperation in Science and Technology (COST) action VALUE (www.value-cost.eu) has been established. VALUE is a research network with participants from currently 23 European countries, running from 2012 to 2015. Its main aim is to systematically validate and develop downscaling methods for climate change research in order to improve regional climate change scenarios for use in climate impact studies. Inspired by the co-design idea of the international research initiative "future earth", stakeholders of climate change information have been involved in the definition of research questions to be addressed and are actively participating in the network. The key idea of VALUE is to identify the relevant weather and climate characteristics required as input for a wide range of impact models and to define an open framework to systematically validate these characteristics. Based on a range of benchmark data sets, in principle every downscaling method can be validated and compared with competing methods.
The results of this exercise will directly provide end users with important information about the uncertainty of regional climate scenarios, and will furthermore provide the basis for further developing downscaling methods. This presentation will provide background information on VALUE and discuss the identified characteristics and the validation framework.
Kelder, Johannes C; Cowie, Martin R; McDonagh, Theresa A; Hardman, Suzanna M C; Grobbee, Diederick E; Cost, Bernard; Hoes, Arno W
2011-06-01
Diagnosing early stages of heart failure with mild symptoms is difficult. B-type natriuretic peptide (BNP) has promising biochemical test characteristics, but its diagnostic yield on top of readily available diagnostic knowledge has not been sufficiently quantified in early stages of heart failure. To quantify the added diagnostic value of BNP for the diagnosis of heart failure in a population relevant to GPs and validate the findings in an independent primary care patient population. Individual patient data meta-analysis followed by external validation. The additional diagnostic yield of BNP above standard clinical information was compared with ECG and chest x-ray results. Derivation was performed on two existing datasets from Hillingdon (n=127) and Rotterdam (n=149) while the UK Natriuretic Peptide Study (n=306) served as validation dataset. Included were patients with suspected heart failure referred to a rapid-access diagnostic outpatient clinic. Case definition was according to the ESC guideline. Logistic regression was used to assess discrimination (with the c-statistic) and calibration. Of the 276 patients in the derivation set, 30.8% had heart failure. The clinical model (encompassing age, gender, known coronary artery disease, diabetes, orthopnoea, elevated jugular venous pressure, crackles, pitting oedema and S3 gallop) had a c-statistic of 0.79. Adding, respectively, chest x-ray results, ECG results or BNP to the clinical model increased the c-statistic to 0.84, 0.85 and 0.92. Neither ECG nor chest x-ray added significantly to the 'clinical plus BNP' model. All models had adequate calibration. The 'clinical plus BNP' diagnostic model performed well in an independent cohort with comparable inclusion criteria (c-statistic=0.91 and adequate calibration). 
Using separate cut-off values for 'ruling in' (typically implying referral for echocardiography) and for 'ruling out' heart failure, thereby creating a grey zone, left an insufficient proportion of patients with a correct diagnosis. BNP has considerable diagnostic value in addition to signs and symptoms in patients suspected of heart failure in primary care. However, using BNP alone with the currently recommended cut-off levels is not sufficient to make a reliable diagnosis of heart failure.
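The two-threshold ('rule-out'/'rule-in') logic discussed above can be sketched as follows; the numeric thresholds are illustrative placeholders, not the cut-offs evaluated in the study:

```python
def triage(bnp, rule_out=100, rule_in=400):
    """Two-threshold BNP interpretation: below rule_out, heart failure is
    considered unlikely; at or above rule_in, refer for echocardiography;
    in between lies the grey zone. Thresholds here are invented examples."""
    if bnp < rule_out:
        return "rule-out"
    if bnp >= rule_in:
        return "rule-in"
    return "grey zone"

values = [40, 150, 420]
print([triage(v) for v in values])  # ['rule-out', 'grey zone', 'rule-in']
```

The abstract's point is that, with recommended cut-offs, too many patients land in the grey zone for BNP alone to settle the diagnosis.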
Evaluation of tools used to measure calcium and/or dairy consumption in adults.
Magarey, Anthea; Baulderstone, Lauren; Yaxley, Alison; Markow, Kylie; Miller, Michelle
2015-05-01
To identify and critique tools for the assessment of Ca and/or dairy intake in adults, in order to ascertain the most accurate and reliable tools available. A systematic review of the literature was conducted using defined inclusion and exclusion criteria. Articles reporting on originally developed tools or testing the reliability or validity of existing tools that measure Ca and/or dairy intake in adults were included. Author-defined criteria for reporting reliability and validity properties were applied. Studies conducted in Western countries. Adults. Thirty papers, utilising thirty-six tools assessing intake of dairy, Ca or both, were identified. Reliability testing was conducted on only two dairy and five Ca tools, with results indicating that only one dairy and two Ca tools were reliable. Validity testing was conducted for all but four Ca-only tools. There was high reliance in validity testing on lower-order tests such as correlation and failure to differentiate between statistical and clinically meaningful differences. Results of the validity testing suggest one dairy and five Ca tools are valid. Thus one tool was considered both reliable and valid for the assessment of dairy intake and only two tools proved reliable and valid for the assessment of Ca intake. While several tools are reliable and valid, their application across adult populations is limited by the populations in which they were tested. These results indicate a need for tools that assess Ca and/or dairy intake in adults to be rigorously tested for reliability and validity.
Fernandez-Lozano, Carlos; Gestal, Marcos; Munteanu, Cristian R; Dorado, Julian; Pazos, Alejandro
2016-01-01
The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant using this kind of algorithms. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as for bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.
NASA Technical Reports Server (NTRS)
Bremner, Paul G.; Vazquez, Gabriel; Christiano, Daniel J.; Trout, Dawn H.
2016-01-01
Prediction of the maximum expected electromagnetic pick-up of conductors inside a realistic shielding enclosure is an important canonical problem for system-level EMC design of space craft, launch vehicles, aircraft and automobiles. This paper introduces a simple statistical power balance model for prediction of the maximum expected current in a wire conductor inside an aperture enclosure. It calculates both the statistical mean and variance of the immission from the physical design parameters of the problem. Familiar probability density functions can then be used to predict the maximum expected immission for deign purposes. The statistical power balance model requires minimal EMC design information and solves orders of magnitude faster than existing numerical models, making it ultimately viable for scaled-up, full system-level modeling. Both experimental test results and full wave simulation results are used to validate the foundational model.
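A power balance model predicts the statistical mean (and variance) of the induced current or received power; a design maximum then follows from an assumed probability density, as the abstract notes. As a hedged sketch, the snippet below assumes the received power in an electrically large, multi-mode enclosure is exponentially distributed (the classical reverberant-field result); the paper's own density choice may differ:

```python
import math

def max_expected_immission(mean_power_w, confidence=0.99):
    """Design maximum under an assumed exponential power distribution
    (chi-square with 2 degrees of freedom): the `confidence` quantile of
    an exponential with the given mean. The distributional assumption is
    mine, not taken from the paper."""
    return -mean_power_w * math.log(1.0 - confidence)

# a 1 mW mean immission implies a ~4.6 mW design maximum at 99% confidence
print(max_expected_immission(1e-3))
```

The appeal of the approach is visible even in this toy: only the mean (from the power balance) and a confidence level are needed, with no full-wave solve.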
Estimating Classification Consistency and Accuracy for Cognitive Diagnostic Assessment
ERIC Educational Resources Information Center
Cui, Ying; Gierl, Mark J.; Chang, Hua-Hua
2012-01-01
This article introduces procedures for the computation and asymptotic statistical inference for classification consistency and accuracy indices specifically designed for cognitive diagnostic assessments. The new classification indices can be used as important indicators of the reliability and validity of classification results produced by…
Sakunpak, Apirak; Suksaeree, Jirapornchai; Monton, Chaowalit; Pathompak, Pathamaporn; Kraisintu, Krisana
2014-01-01
Objective To develop and validate an image analysis method for quantitative analysis of γ-oryzanol in cold pressed rice bran oil. Methods TLC-densitometric and TLC-image analysis methods were developed, validated, and used for quantitative analysis of γ-oryzanol in cold pressed rice bran oil. The results obtained by these two different quantification methods were compared by paired t-test. Results Both assays provided good linearity, accuracy, reproducibility and selectivity for determination of γ-oryzanol. Conclusions The TLC-densitometric and TLC-image analysis methods provided a similar reproducibility, accuracy and selectivity for the quantitative determination of γ-oryzanol in cold pressed rice bran oil. A statistical comparison of the quantitative determinations of γ-oryzanol in samples did not show any statistically significant difference between TLC-densitometric and TLC-image analysis methods. As both methods were found to be equal, they therefore can be used for the determination of γ-oryzanol in cold pressed rice bran oil. PMID:25182282
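The paired t-test used above to compare the two quantification methods can be sketched as follows; the sample values are invented for illustration, not taken from the paper:

```python
import math

def paired_t(x, y):
    """Paired t statistic for matched measurements x_i, y_i."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((di - mean) ** 2 for di in d) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# hypothetical gamma-oryzanol determinations on the same five samples
densitometric = [1.52, 1.48, 1.61, 1.55, 1.50]
image_based   = [1.50, 1.49, 1.60, 1.57, 1.49]
print(round(paired_t(densitometric, image_based), 3))
```

With df = n − 1 = 4, |t| ≈ 0.27 is far below the 5% critical value (≈ 2.78), so these toy methods would not be declared different, mirroring the paper's conclusion for its real data.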
Empirical validation of an agent-based model of wood markets in Switzerland
Hilty, Lorenz M.; Lemm, Renato; Thees, Oliver
2018-01-01
We present an agent-based model of wood markets and show our efforts to validate this model using empirical data from different sources, including interviews, workshops, experiments, and official statistics. Our own surveys closed gaps where data were not available. Our approach to model validation used a variety of techniques, including the replication of historical production amounts, prices, and survey results, as well as a historical case study of a large sawmill entering the market and becoming insolvent only a few years later. Validating the model using this case provided additional insights, showing how the model can be used to simulate scenarios of resource availability and resource allocation. We conclude that the outcome of the rigorous validation qualifies the model to simulate scenarios concerning resource availability and allocation in our study region. PMID:29351300
A scoring system for ascertainment of incident stroke; the Risk Index Score (RISc).
Kass-Hout, T A; Moyé, L A; Smith, M A; Morgenstern, L B
2006-01-01
The main objective of this study was to develop and validate a computer-based statistical algorithm that could be translated into a simple scoring system in order to ascertain incident stroke cases using hospital admission medical records data. The Risk Index Score (RISc) algorithm was developed using data collected prospectively by the Brain Attack Surveillance in Corpus Christi (BASIC) project, 2000. The validity of RISc was evaluated by estimating the concordance of scoring system stroke ascertainment to stroke ascertainment by physician and/or abstractor review of hospital admission records. RISc was developed on 1718 randomly selected patients (training set) and then statistically validated on an independent sample of 858 patients (validation set). A multivariable logistic model was used to develop RISc and subsequently evaluated by goodness-of-fit and receiver operating characteristic (ROC) analyses. The higher the value of RISc, the higher the patient's risk of potential stroke. The study showed RISc was well calibrated and discriminated those who had potential stroke from those that did not on initial screening. In this study we developed and validated a rapid, easy, efficient, and accurate method to ascertain incident stroke cases from routine hospital admission records for epidemiologic investigations. Validation of this scoring system was achieved statistically; however, clinical validation in a community hospital setting is warranted.
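A common way to optimize a scoring-system cutoff from an ROC analysis, as done above for RISc, is to maximize Youden's J = sensitivity + specificity − 1 over candidate cutoffs. A minimal sketch on toy data (scores, labels, and the resulting cutoff are invented, not the BASIC data):

```python
def best_cutoff(scores, labels):
    """Call score >= cutoff positive; return (J, cutoff, sens, spec) for the
    cutoff maximising Youden's J."""
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < c and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < c and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 0)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best

scores = [5, 8, 11, 13, 14, 17, 20, 22]   # hypothetical RISc-style scores
labels = [0, 0, 0,  1,  0,  1,  1,  1]    # 1 = confirmed stroke
print(best_cutoff(scores, labels))
```

In practice one would also weigh the clinical costs of false negatives vs. false positives rather than rely on J alone.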
Spörrle, Matthias; Strobel, Maria; Tumasjan, Andranik
2010-11-01
This research examines the incremental validity of irrational thinking as conceptualized by Albert Ellis to predict diverse aspects of subjective well-being while controlling for the influence of personality factors. Rational-emotive behavior therapy (REBT) argues that irrational beliefs result in maladaptive emotions leading to reduced well-being. Although there is some early scientific evidence for this relation, it has never been investigated whether this connection would still persist when statistically controlling for the Big Five personality factors, which were consistently found to be important determinants of well-being. Regression analyses revealed significant incremental validity of irrationality over personality factors when predicting life satisfaction, but not when predicting subjective happiness. Results are discussed with respect to conceptual differences between these two aspects of subjective well-being.
Hickey, Graeme L; Blackstone, Eugene H
2016-08-01
Clinical risk-prediction models serve an important role in healthcare. They are used for clinical decision-making and measuring the performance of healthcare providers. To establish confidence in a model, external model validation is imperative. When designing such an external model validation study, thought must be given to patient selection, risk factor and outcome definitions, missing data, and the transparent reporting of the analysis. In addition, there are a number of statistical methods available for external model validation. Execution of a rigorous external validation study rests in proper study design, application of suitable statistical methods, and transparent reporting. Copyright © 2016 The American Association for Thoracic Surgery. Published by Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Krantz, Timothy L.
2002-01-01
The Weibull distribution has been widely adopted for the statistical description and inference of fatigue data. This document provides user instructions, examples, and verification for software to analyze gear fatigue test data. The software was developed presuming the data are adequately modeled using a two-parameter Weibull distribution. The calculations are based on likelihood methods, and the approach taken is valid for data that include type I censoring. The software was verified by reproducing results published by others.
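A two-parameter Weibull fit by maximum likelihood with type I (right) censoring reduces to a one-dimensional root-find for the shape parameter, after which the scale follows in closed form. The sketch below uses the standard reliability-textbook estimating equations; it is not the NASA software itself:

```python
import math, random

def weibull_mle(times, failed, lo=0.05, hi=20.0, iters=200):
    """Two-parameter Weibull MLE with right censoring.
    times: all observed times (failures and censorings); failed: booleans.
    Bisects the profile-likelihood equation for the shape k, then recovers
    the scale in closed form."""
    r = sum(failed)                                   # number of failures
    mean_log_fail = sum(math.log(t) for t, f in zip(times, failed) if f) / r

    def g(k):  # monotone increasing in k; root is the shape MLE
        s = sum(t ** k for t in times)
        s_log = sum(t ** k * math.log(t) for t in times)
        return s_log / s - 1.0 / k - mean_log_fail

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    scale = (sum(t ** k for t in times) / r) ** (1.0 / k)
    return k, scale

# simulated fatigue lives, Weibull(scale=100, shape=2), censored at 120
random.seed(1)
raw = [random.weibullvariate(100.0, 2.0) for _ in range(300)]
times = [min(t, 120.0) for t in raw]
failed = [t < 120.0 for t in raw]
print(weibull_mle(times, failed))  # estimates near (2, 100)
```

Note that the censored observations enter the power sums but only failures contribute to the log term, which is what makes the censored-data likelihood differ from the complete-sample one.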
Proposal and validation of a clinical trunk control test in individuals with spinal cord injury.
Quinzaños, J; Villa, A R; Flores, A A; Pérez, R
2014-06-01
One of the problems that arise in spinal cord injury (SCI) is alteration in trunk control. Despite the need for standardized scales, these do not exist for evaluating trunk control in SCI. To propose and validate a trunk control test in individuals with SCI. National Institute of Rehabilitation, Mexico. The test was developed and later evaluated for reliability and criteria, content, and construct validity. We carried out 531 tests on 177 patients and found high inter- and intra-rater reliability. In terms of criterion validity, analysis of variance demonstrated a statistically significant difference in the test score of patients with adequate or inadequate trunk control according to the assessment of a group of experts. A receiver operating characteristic curve was plotted for optimizing the instrument's cutoff point, which was determined at 13 points, with a sensitivity of 98% and a specificity of 92.2%. With regard to construct validity, the correlation between the proposed test and the spinal cord independence measure (SCIM) was 0.873 (P=0.001) and that with the evolution time was 0.437 (P=0.001). For testing the hypothesis with qualitative variables, the Kruskal-Wallis test was performed, which resulted in a statistically significant difference between the scores in the proposed scale of each group defined by these variables. It was proven experimentally that the proposed trunk control test is valid and reliable. Furthermore, the test can be used for all patients with SCI despite the type and level of injury.
Chabirand, Aude; Loiseau, Marianne; Renaudin, Isabelle; Poliakoff, Françoise
2017-01-01
A working group established in the framework of the EUPHRESCO European collaborative project aimed to compare and validate diagnostic protocols for the detection of “Flavescence dorée” (FD) phytoplasma in grapevines. Seven molecular protocols were compared in an interlaboratory test performance study where each laboratory had to analyze the same panel of samples consisting of DNA extracts prepared by the organizing laboratory. The tested molecular methods consisted of universal and group-specific real-time and end-point nested PCR tests. Different statistical approaches were applied to this collaborative study. Firstly, there was the standard statistical approach consisting in analyzing samples which are known to be positive and samples which are known to be negative and reporting the proportion of false-positive and false-negative results to respectively calculate diagnostic specificity and sensitivity. This approach was supplemented by the calculation of repeatability and reproducibility for qualitative methods based on the notions of accordance and concordance. Other new approaches were also implemented, based, on the one hand, on the probability of detection model, and, on the other hand, on Bayes’ theorem. These various statistical approaches are complementary and give consistent results. Their combination, and in particular, the introduction of new statistical approaches give overall information on the performance and limitations of the different methods, and are particularly useful for selecting the most appropriate detection scheme with regards to the prevalence of the pathogen. Three real-time PCR protocols (methods M4, M5 and M6 respectively developed by Hren (2007), Pelletier (2009) and under patent oligonucleotides) achieved the highest levels of performance for FD phytoplasma detection. This paper also addresses the issue of indeterminate results and the identification of outlier results. 
The statistical tools presented in this paper and their combination can be applied to many other studies concerning plant pathogens and other disciplines that use qualitative detection methods. PMID:28384335
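The accordance and concordance measures mentioned above are simple pairwise agreement probabilities for qualitative methods: accordance compares replicate results within a laboratory, concordance across laboratories. A minimal sketch on invented replicate calls (1 = phytoplasma detected):

```python
from itertools import combinations

def accordance(lab_results):
    """Mean probability that two results drawn from the SAME lab agree."""
    probs = []
    for res in lab_results:
        pairs = list(combinations(res, 2))
        probs.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(probs) / len(probs)

def concordance(lab_results):
    """Probability that two results drawn from DIFFERENT labs agree."""
    agree = total = 0
    for i, j in combinations(range(len(lab_results)), 2):
        for a in lab_results[i]:
            for b in lab_results[j]:
                total += 1
                agree += (a == b)
    return agree / total

# three labs, four replicate calls each, on the same known-positive sample
labs = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 1, 1]]
print(accordance(labs), concordance(labs))
```

When accordance exceeds concordance, within-lab repeatability is better than between-lab reproducibility, which is the typical pattern such collaborative studies quantify.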
IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics.
Hoyt, Robert Eugene; Snider, Dallas; Thompson, Carla; Mantravadi, Sarita
2016-10-11
We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. To report the features of IBMWA and discuss how this software subjectively and objectively compares to other data mining programs. The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Using a validated dataset, IBMWA delivered similar predictions compared with several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence intervals, or a confusion matrix. 
IBMWA is a new alternative for data analytics software that automates descriptive, predictive, and visual analytics. This program is very user-friendly but requires data preprocessing, statistical conceptual understanding, and domain expertise.
Quantifying falsifiability of scientific theories
NASA Astrophysics Data System (ADS)
Nemenman, Ilya
I argue that the notion of falsifiability, a key concept in defining a valid scientific theory, can be quantified using Bayesian Model Selection, which is a standard tool in modern statistics. This relates falsifiability to the quantitative version of the statistical Occam's razor, and allows transforming some long-running arguments about validity of scientific theories from philosophical discussions to rigorous mathematical calculations.
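As a hedged sketch of the idea in this abstract (the record itself gives no worked example), Bayesian Model Selection scores each theory by its marginal likelihood, so a sharper, more falsifiable model is rewarded when the data land where it says they must. A minimal two-model coin example, with all function names illustrative:

```python
from math import comb

def marginal_fair(k, n):
    # P(k heads in n flips | "fair coin" model): binomial with p = 0.5
    return comb(n, k) * 0.5 ** n

def marginal_flexible(k, n):
    # P(k heads | "unknown bias" model, uniform prior on p):
    # integral of C(n,k) * p^k * (1-p)^(n-k) dp over [0,1] = 1 / (n + 1)
    return 1.0 / (n + 1)

def bayes_factor(k, n):
    """Bayes factor of the sharp (more falsifiable) model over the
    flexible one; values > 1 favor the sharp model."""
    return marginal_fair(k, n) / marginal_flexible(k, n)
```

Near-even data favor the fair-coin model (the statistical Occam's razor), while strongly skewed data falsify it.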
Statistical analysis of the MODIS atmosphere products for the Tomsk region
NASA Astrophysics Data System (ADS)
Afonin, Sergey V.; Belov, Vladimir V.; Engel, Marina V.
2005-10-01
The paper presents the results of using the MODIS Atmosphere Products satellite information to study the atmospheric characteristics (the aerosol and water vapor) in the Tomsk Region (56-61°N, 75-90°E) in 2001-2004. The satellite data were received from the NASA Goddard Distributed Active Archive Center (DAAC) through the Internet. To use satellite data for the solution of scientific and applied problems, it is very important to know their accuracy. Although results of validation of the MODIS data are already available in the literature, we decided to carry out additional investigations for the Tomsk Region. The paper presents the results of validation of the aerosol optical thickness (AOT) and total column precipitable water (TCPW), which are in good agreement with the test data. The statistical analysis revealed some interesting facts. For example, analyzing the data on the spatial distribution of the average seasonal values of AOT or TCPW for 2001-2003 in the Tomsk Region, we established that, instead of the expected spatial homogeneity, these distributions have similar spatial structures.
A psychometric evaluation of the Rorschach comprehensive system's perceptual thinking index.
Dao, Tam K; Prevatt, Frances
2006-04-01
In this study, we investigated evidence for reliability and validity of the Perceptual Thinking Index (PTI; Exner, 2000a, 2000b) among an adult inpatient population. We conducted reliability and validity analyses on 107 patients who met the Diagnostic and Statistical Manual of Mental Disorders (4th ed., text revision; American Psychiatric Association, 2000) criteria for a schizophrenia-spectrum disorder (SSD) or mood disorder with no psychotic features (MD). Results provided support for interrater reliability as well as internal consistency of the PTI. Furthermore, the PTI was an effective index in differentiating SSD patients from patients diagnosed with an MD. Finally, the PTI demonstrated adequate diagnostic statistics that can be useful in the classification of patients diagnosed with SSD and MD. We discuss methodological issues, implications for assessment practice, and directions for future research.
The Serbian version of the Juvenile Arthritis Multidimensional Assessment Report (JAMAR).
Susic, Gordana; Vojinovic, Jelena; Vijatov-Djuric, Gordana; Stevanovic, Dejan; Lazarevic, Dragana; Djurovic, Nada; Novakovic, Dusica; Consolaro, Alessandro; Bovis, Francesca; Ruperto, Nicolino
2018-04-01
The Juvenile Arthritis Multidimensional Assessment Report (JAMAR) is a new parent/patient-reported outcome measure that enables a thorough assessment of the disease status in children with juvenile idiopathic arthritis (JIA). We report the results of the cross-cultural adaptation and validation of the parent and patient versions of the JAMAR in the Serbian language. The reading comprehension of the questionnaire was tested in 10 JIA parents and patients. Each participating centre was asked to collect demographic, clinical data and the JAMAR in 100 consecutive JIA patients or all consecutive patients seen in a 6-month period and to administer the JAMAR to 100 healthy children and their parents. The statistical validation phase explored descriptive statistics and the psychometric issues of the JAMAR: the three Likert assumptions, floor/ceiling effects, internal consistency, Cronbach's alpha, interscale correlations, test-retest reliability, and construct validity (convergent and discriminant validity). A total of 248 JIA patients (5.2% systemic, 44.3% oligoarticular, 23.8% RF-negative polyarthritis, 26.7% other categories) and 100 healthy children were enrolled in three centres. The JAMAR components discriminated healthy subjects from JIA patients. All JAMAR components revealed good psychometric performances. In conclusion, the Serbian version of the JAMAR is a valid tool for the assessment of children with JIA and is suitable for use both in routine clinical practice and clinical research.
The German version of the Juvenile Arthritis Multidimensional Assessment Report (JAMAR).
Holzinger, Dirk; Foell, Dirk; Horneff, Gerd; Foeldvari, Ivan; Tzaribachev, Nikolay; Tzaribachev, Catrin; Minden, Kirsten; Kallinich, Tilmann; Ganser, Gerd; Clara, Lucia; Haas, Johannes-Peter; Hügle, Boris; Huppertz, Hans-Iko; Weller, Frank; Consolaro, Alessandro; Bovis, Francesca; Ruperto, Nicolino
2018-04-01
The Juvenile Arthritis Multidimensional Assessment Report (JAMAR) is a new parent/patient reported outcome measure that enables a thorough assessment of the disease status in children with juvenile idiopathic arthritis (JIA). We report the results of the cross-cultural adaptation and validation of the parent and patient versions of the JAMAR in the German language. The reading comprehension of the questionnaire was tested in 10 JIA parents and patients. The participating centres were asked to collect demographic and clinical data along the JAMAR questionnaire in 100 consecutive JIA patients or all consecutive patients seen in a 6-month period and to administer the JAMAR to 100 healthy children and their parents. The statistical validation phase explored descriptive statistics and the psychometric issues of the JAMAR: the three Likert assumptions, floor/ceiling effects, internal consistency, Cronbach's alpha, interscale correlations, test-retest reliability, and construct validity (convergent and discriminant validity). A total of 319 JIA patients (2.8% systemic, 36.7% oligoarticular, 23.5% RF negative polyarthritis, and 37% other categories) and 100 healthy children were enrolled in eight centres. The JAMAR components discriminated well healthy subjects from JIA patients. All JAMAR components revealed good psychometric performances. In conclusion, the German version of the JAMAR is a valid tool for the assessment of children with JIA and is suitable for use both in routine clinical practice and in clinical research.
El Khattabi, Laïla Allach; Rouillac-Le Sciellour, Christelle; Le Tessier, Dominique; Luscan, Armelle; Coustier, Audrey; Porcher, Raphael; Bhouri, Rakia; Nectoux, Juliette; Sérazin, Valérie; Quibel, Thibaut; Mandelbrot, Laurent; Tsatsaris, Vassilis
2016-01-01
Objective NIPT for fetal aneuploidy by digital PCR has been hampered by the large number of PCR reactions needed to meet statistical requirements, preventing clinical application. Here, we designed an octoplex droplet digital PCR (ddPCR) assay which allows increasing the number of available targets and thus overcomes statistical obstacles. Method After technical optimization of the multiplex PCR on mixtures of trisomic and euploid DNA, we performed a validation study on samples of plasma DNA from 213 pregnant women. Molecular counting of circulating cell-free DNA was performed using a mix of hydrolysis probes targeting chromosome 21 and a reference chromosome. Results The results of our validation experiments showed that ddPCR detected trisomy 21 even when the sample’s trisomic DNA content is as low as 5%. In a validation study of plasma samples from 213 pregnant women, ddPCR discriminated clearly between the trisomy 21 and the euploidy groups. Conclusion Our results demonstrate that digital PCR can meet the requirements for non-invasive prenatal testing of trisomy 21. This approach is technically simple, relatively cheap, easy to implement in a diagnostic setting and compatible with ethical concerns regarding access to nucleotide sequence information. These advantages make it a potential technique of choice for population-wide screening for trisomy 21 in pregnant women. PMID:27167625
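As an illustrative sketch only (the record does not give the assay's actual statistics), molecular counting for trisomy 21 reduces to comparing chromosome-21 molecule counts against a reference chromosome: under euploidy the two counts share the same expectation, while a trisomic fetus contributing a fetal fraction ff of the cell-free DNA shifts the chr21 share to (2 + ff)/(4 + ff). All counts below are made up for illustration:

```python
from math import sqrt

def trisomy_z(chr21_count, ref_count):
    """z-score for excess chr21 molecules over a matched reference;
    under euploidy the expected chr21 share of the total is 0.5."""
    n = chr21_count + ref_count
    p_hat = chr21_count / n
    se = sqrt(0.25 / n)  # sd of p_hat when the true share is 0.5
    return (p_hat - 0.5) / se

def expected_trisomic_share(fetal_fraction):
    # chr21 cfDNA: 2 copies from the mother, 3 from a trisomic fetus;
    # the reference chromosome contributes 2 copies from both
    return (2 + fetal_fraction) / (4 + fetal_fraction)
```

The shift at a 5% fetal fraction is tiny (about 0.506 versus 0.5), which is why a large number of counted molecules, and hence the multiplexing described above, is needed.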
Précis of statistical significance: rationale, validity, and utility.
Chow, S L
1998-04-01
The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. 
At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of data. Neither can any of them fulfill the nonstatistical functions expected of them by critics.
Pattern statistics on Markov chains and sensitivity to parameter estimation.
Nuel, Grégory
2006-10-17
In order to compute pattern statistics in computational biology, a Markov model is commonly used to take into account the sequence composition, and its parameters usually must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what the consequences of this variability are for pattern studies (finding the most over-represented words in a genome, the most significant words common to a set of sequences, ...). In the particular case where pattern statistics (overlap counting only) are computed through binomial approximations, we use the delta-method to give an explicit expression for sigma, the standard deviation of a pattern statistic. This result is validated using simulations, and a simple pattern study is also considered. We establish that the use of high-order Markov models could easily lead to major mistakes due to the high sensitivity of pattern statistics to parameter estimation.
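A hedged, heavily simplified sketch of the delta method in this setting (an i.i.d. order-0 model and a homogeneous length-k word, both illustrative simplifications of the paper's Markov framework): the expected count g(p) = N·p^k inherits variance from the estimated letter frequency via sigma ≈ |g'(p)| · sd(p̂):

```python
from math import sqrt

def delta_sigma(p_hat, n_est, n_seq, k):
    """Delta-method sd of the expected count g(p) = n_seq * p**k of a
    homogeneous length-k word, when the letter frequency p is
    estimated from n_est observed letters."""
    g_prime = n_seq * k * p_hat ** (k - 1)    # dg/dp at p_hat
    sd_p = sqrt(p_hat * (1 - p_hat) / n_est)  # binomial sd of p_hat
    return abs(g_prime) * sd_p
```

The relative error k · sd(p̂)/p grows with the word length k, a toy echo of the paper's warning that richer models amplify sensitivity to parameter estimation.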
ERIC Educational Resources Information Center
Martuza, Victor R.; Engel, John D.
Results from classical power analysis (Brewer, 1972) suggest that a researcher should not set α = p (when p < α) in a posteriori fashion when a study yields statistically significant results, because of a resulting decrease in power. The purpose of the present report is to use Bayesian theory in examining the validity of this…
Miao, Hui; Hartman, Mikael; Bhoo-Pathy, Nirmala; Lee, Soo-Chin; Taib, Nur Aishah; Tan, Ern-Yu; Chan, Patrick; Moons, Karel G. M.; Wong, Hoong-Seam; Goh, Jeremy; Rahim, Siti Mastura; Yip, Cheng-Har; Verkooijen, Helena M.
2014-01-01
Background In Asia, up to 25% of breast cancer patients present with distant metastases at diagnosis. Given the heterogeneous survival probabilities of de novo metastatic breast cancer, individual outcome prediction is challenging. The aim of the study is to identify existing prognostic models for patients with de novo metastatic breast cancer and validate them in Asia. Materials and Methods We performed a systematic review to identify prediction models for metastatic breast cancer. Models were validated in 642 women with de novo metastatic breast cancer registered between 2000 and 2010 in the Singapore Malaysia Hospital Based Breast Cancer Registry. Survival curves for low, intermediate and high-risk groups according to each prognostic score were compared by log-rank test and discrimination of the models was assessed by concordance statistic (C-statistic). Results We identified 16 prediction models, seven of which were for patients with brain metastases only. Performance status, estrogen receptor status, metastatic site(s) and disease-free interval were the most common predictors. We were able to validate nine prediction models. The capacity of the models to discriminate between poor and good survivors varied from poor to fair with C-statistics ranging from 0.50 (95% CI, 0.48–0.53) to 0.63 (95% CI, 0.60–0.66). Conclusion The discriminatory performance of existing prediction models for de novo metastatic breast cancer in Asia is modest. Development of an Asian-specific prediction model is needed to improve prognostication and guide decision making. PMID:24695692
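The concordance statistic used above has a simple concrete form for a binary endpoint: the probability that a randomly chosen event case receives a higher risk score than a randomly chosen non-event case, with ties counted as half. A minimal sketch (an illustrative function, not the study's code):

```python
def c_statistic(scores, events):
    """Concordance (C-) statistic for a binary endpoint: fraction of
    (event, non-event) pairs in which the event case scored higher,
    counting ties as half; 0.5 means no discrimination."""
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    conc = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return conc / (len(pos) * len(neg))
```

The 0.50-0.63 range reported above therefore means the models ranked patients little better than chance.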
NASA Astrophysics Data System (ADS)
Langley, Robin S.
2018-03-01
This work is concerned with the statistical properties of the frequency response function of the energy of a random system. Earlier studies have considered the statistical distribution of the function at a single frequency, or alternatively the statistics of a band-average of the function. In contrast the present analysis considers the statistical fluctuations over a frequency band, and results are obtained for the mean rate at which the function crosses a specified level (or equivalently, the average number of times the level is crossed within the band). Results are also obtained for the probability of crossing a specified level at least once, the mean rate of occurrence of peaks, and the mean trough-to-peak height. The analysis is based on the assumption that the natural frequencies and mode shapes of the system have statistical properties that are governed by the Gaussian Orthogonal Ensemble (GOE), and the validity of this assumption is demonstrated by comparison with numerical simulations for a random plate. The work has application to the assessment of the performance of dynamic systems that are sensitive to random imperfections.
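As a toy discrete analogue (the paper's results are analytic, derived under the GOE assumption, not computed this way), the level-crossing quantities can be estimated from a sampled frequency response simply by counting up-crossings:

```python
def up_crossings(series, level):
    """Count up-crossings of a level by a sampled response curve,
    a discrete stand-in for the mean crossing rate over a band."""
    return sum(1 for a, b in zip(series, series[1:]) if a < level <= b)
```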
ERIC Educational Resources Information Center
Acar, Tülin
2014-01-01
In literature, it has been observed that many enhanced criteria are limited by factor analysis techniques. Besides examinations of statistical structure and/or psychological structure, such validity studies as cross validation and classification-sequencing studies should be performed frequently. The purpose of this study is to examine cross…
45 CFR 153.350 - Risk adjustment data validation standards.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 45 Public Welfare 1 2013-10-01 2013-10-01 false Risk adjustment data validation standards. 153.350... validation standards. (a) General requirement. The State, or HHS on behalf of the State, must ensure proper implementation of any risk adjustment software and ensure proper validation of a statistically valid sample of...
45 CFR 153.350 - Risk adjustment data validation standards.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 45 Public Welfare 1 2014-10-01 2014-10-01 false Risk adjustment data validation standards. 153.350... validation standards. (a) General requirement. The State, or HHS on behalf of the State, must ensure proper implementation of any risk adjustment software and ensure proper validation of a statistically valid sample of...
Cross-Validation of Survival Bump Hunting by Recursive Peeling Methods.
Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J Sunil
2014-08-01
We introduce a survival/risk bump hunting framework to build a bump hunting model with a possibly censored time-to-event type of response and to validate model estimates. First, we describe the use of adequate survival peeling criteria to build a survival/risk bump hunting model based on recursive peeling methods. Our method called "Patient Recursive Survival Peeling" is a rule-induction method that makes use of specific peeling criteria such as hazard ratio or log-rank statistics. Second, to validate our model estimates and improve survival prediction accuracy, we describe a resampling-based validation technique specifically designed for the joint task of decision rule making by recursive peeling (i.e. decision-box) and survival estimation. This alternative technique, called "combined" cross-validation is done by combining test samples over the cross-validation loops, a design allowing for bump hunting by recursive peeling in a survival setting. We provide empirical results showing the importance of cross-validation and replication.
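A sketch of the "combined" idea under simplifying assumptions (generic `fit`/`predict` callables and a plain classification statistic, not the authors' survival-peeling code): pool the held-out predictions from every fold and evaluate one statistic on the pooled test sample, instead of averaging per-fold statistics:

```python
import random

def combined_cv(xs, ys, fit, predict, n_folds=5, seed=0):
    """'Combined' cross-validation: pool the held-out predictions
    from every fold and return them for a single pooled evaluation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    pooled_true, pooled_pred = [], []
    for k in range(n_folds):
        held_out = set(folds[k])
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        pooled_true += [ys[i] for i in folds[k]]
        pooled_pred += [predict(model, xs[i]) for i in folds[k]]
    return pooled_true, pooled_pred
```

Pooling matters when the per-fold statistic is unstable on small test folds, which is exactly the situation with censored survival data.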
A Possible Tool for Checking Errors in the INAA Results, Based on Neutron Data and Method Validation
NASA Astrophysics Data System (ADS)
Cincu, Em.; Grigore, Ioana Manea; Barbos, D.; Cazan, I. L.; Manu, V.
2008-08-01
This work presents preliminary results of a new type of possible application in INAA elemental analysis, useful for checking errors that occur during the investigation of unknown samples; it relies on the INAA method-validation experiments and on the accuracy of neutron data from the literature. The paper comprises two sections. The first presents, in short, the steps of the experimental tests carried out for INAA method validation and for establishing the `ACTIVA-N' laboratory performance, which is at the same time an illustration of the laboratory's evolution toward performance. Section 2 presents our recent INAA results on CRMs, whose interpretation opens a discussion about the usefulness of a tool for checking possible errors, different from the usual statistical procedures. The questionable aspects and the requirements for developing a practical checking tool are discussed.
Accounting for Unexpected Test Responses through Examinees' and Their Teachers' Explanations
ERIC Educational Resources Information Center
Petridou, Alexandra; Williams, Julian
2010-01-01
Researchers have developed indices to identify persons whose test results "misfit" and are considered statistically "aberrant" or "unexpected" and whose measures are consequently potentially invalid, drawing the test's validity into question. This study draws on interviews of pupils and their teachers, using a sample…
The Consolidation/Transition Model in Moral Reasoning Development.
ERIC Educational Resources Information Center
Walker, Lawrence J.; Gustafson, Paul; Hennig, Karl H.
2001-01-01
This longitudinal study with 62 children and adolescents examined the validity of the consolidation/transition model in the context of moral reasoning development. Results of standard statistical and Bayesian techniques supported the hypotheses regarding cyclical patterns of change and predictors of stage transition, and demonstrated the utility…
NASA Astrophysics Data System (ADS)
Mullan, Donal; Chen, Jie; Zhang, Xunchang John
2016-02-01
Statistical downscaling (SD) methods have become a popular, low-cost and accessible means of bridging the gap between the coarse spatial resolution at which climate models output climate scenarios and the finer spatial scale at which impact modellers require these scenarios, with various SD techniques used for a wide range of applications across the world. This paper compares the Generator for Point Climate Change (GPCC) model and the Statistical DownScaling Model (SDSM), two contrasting SD methods, in terms of their ability to generate precipitation series under non-stationary conditions across ten contrasting global climates. The mean, maximum and a selection of distribution statistics as well as the cumulative frequencies of dry and wet spells for four different temporal resolutions were compared between the models and the observed series for a validation period. Results indicate that both methods can generate daily precipitation series that generally closely mirror observed series for a wide range of non-stationary climates. However, GPCC tends to overestimate higher precipitation amounts, whilst SDSM tends to underestimate these. This implies that GPCC is more likely to overestimate the effects of precipitation on a given impact sector, whilst SDSM is likely to underestimate the effects. GPCC performs better than SDSM in reproducing wet and dry day frequency, which is a key advantage for many impact sectors. Overall, the mixed performance of the two methods illustrates the importance of users performing a thorough validation in order to determine the influence of simulated precipitation on their chosen impact sector.
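One of the validation statistics mentioned, wet and dry day frequency, is simple to state; a minimal sketch, with the 0.1 mm wet-day threshold chosen here purely for illustration:

```python
def dry_wet_frequencies(precip_mm, wet_threshold=0.1):
    """Fractions of dry and wet days in a daily precipitation
    series, a basic statistic for validating downscaled output
    against observations."""
    wet = sum(1 for p in precip_mm if p >= wet_threshold)
    n = len(precip_mm)
    return (n - wet) / n, wet / n
```

Comparing these fractions between a downscaled series and the observed series for the validation period gives the kind of check on which GPCC outperformed SDSM above.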
Diagnosis checking of statistical analysis in RCTs indexed in PubMed.
Lee, Paul H; Tse, Andy C Y
2017-11-01
Statistical analysis is essential for reporting the results of randomized controlled trials (RCTs) and for evaluating their effectiveness. However, the validity of a statistical analysis also depends on whether the assumptions of that analysis are valid. Our objective was to review all RCTs published in journals indexed in PubMed during December 2014 to provide a complete picture of how RCTs handle the assumptions of statistical analysis. We reviewed all RCTs published in December 2014 that appeared in journals indexed in PubMed, using the Cochrane highly sensitive search strategy. The 2014 impact factors of the journals were used as proxies for their quality. The type of statistical analysis used and whether the assumptions of the analysis were tested were reviewed. In total, 451 papers were included. Of the 278 papers that reported a crude analysis for the primary outcomes, 31 (27·2%) reported whether the outcome was normally distributed. Of the 172 papers that reported an adjusted analysis for the primary outcomes, diagnosis checking was rarely conducted: assumptions were checked for only 20% of generalized linear models, 8·6% of Cox proportional hazards models and 7% of multilevel models. Study characteristics (study type, drug trial, funding sources, journal type and endorsement of CONSORT guidelines) were not associated with the reporting of diagnosis checking. Diagnosis checking of statistical analyses in RCTs published in PubMed-indexed journals was usually absent. Journals should provide guidelines about the reporting of diagnosis checking of assumptions. © 2017 Stichting European Society for Clinical Investigation Journal Foundation.
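As an illustration of what "diagnosis checking" means in practice, here is a crude skewness screen standing in for a formal normality test such as Shapiro-Wilk (which requires scipy); the threshold of 1.0 is an illustrative rule of thumb, not a standard:

```python
from statistics import mean, stdev

def skewness(xs):
    m, s = mean(xs), stdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def looks_normal(xs, max_abs_skew=1.0):
    """Crude diagnosis check: flag a sample too skewed for a
    normal-theory test (e.g. a t-test) to be trusted."""
    return abs(skewness(xs)) <= max_abs_skew
```

Running a check like this, and reporting its outcome, is precisely the step the review found missing from most published RCT analyses.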
77 FR 46096 - Statistical Process Controls for Blood Establishments; Public Workshop
Federal Register 2010, 2011, 2012, 2013, 2014
2012-08-02
...] Statistical Process Controls for Blood Establishments; Public Workshop AGENCY: Food and Drug Administration... workshop entitled: ``Statistical Process Controls for Blood Establishments.'' The purpose of this public workshop is to discuss the implementation of statistical process controls to validate and monitor...
Errors in reporting on dissolution research: methodological and statistical implications.
Jasińska-Stroschein, Magdalena; Kurczewska, Urszula; Orszulak-Michalak, Daria
2017-02-01
In vitro dissolution testing provides useful information at the clinical and preclinical stages of the drug development process. The study includes pharmaceutical papers on dissolution research published in Polish journals between 2010 and 2015. They were analyzed with regard to the information provided by authors about the chosen methods, the validation performed, statistical reporting, and the assumptions used to properly compare release profiles, in light of the current guideline documents addressing dissolution methodology and its validation. Of all the papers included in the study, 23.86% presented at least one set of validation parameters, 63.64% gave the results of the weight uniformity test, 55.68% of content determination, and 97.73% of dissolution testing conditions; 50% discussed a comparison of release profiles. The assumptions for the methods used to compare dissolution profiles were discussed in 6.82% of papers. By means of example analyses, we demonstrate that the outcome can be influenced by the violation of several assumptions or by the selection of an improper method to compare dissolution profiles. A clearer description of the procedures would undoubtedly increase the quality of papers in this area.
Harrison, Jay M; Breeze, Matthew L; Harrigan, George G
2011-08-01
Statistical comparisons of compositional data generated on genetically modified (GM) crops and their near-isogenic conventional (non-GM) counterparts typically rely on classical significance testing. This manuscript presents an introduction to Bayesian methods for compositional analysis along with recommendations for model validation. The approach is illustrated using protein and fat data from two herbicide tolerant GM soybeans (MON87708 and MON87708×MON89788) and a conventional comparator grown in the US in 2008 and 2009. Guidelines recommended by the US Food and Drug Administration (FDA) in conducting Bayesian analyses of clinical studies on medical devices were followed. This study is the first Bayesian approach to GM and non-GM compositional comparisons. The evaluation presented here supports a conclusion that a Bayesian approach to analyzing compositional data can provide meaningful and interpretable results. We further describe the importance of method validation and approaches to model checking if Bayesian approaches to compositional data analysis are to be considered viable by scientists involved in GM research and regulation. Copyright © 2011 Elsevier Inc. All rights reserved.
Validation of the Fatigue Impact Scale in Hungarian patients with multiple sclerosis.
Losonczi, Erika; Bencsik, Krisztina; Rajda, Cecília; Lencsés, Gyula; Török, Margit; Vécsei, László
2011-03-01
Fatigue is one of the most frequent complaints of patients with multiple sclerosis (MS). The Fatigue Impact Scale (FIS), one of the 30 available fatigue questionnaires, is commonly applied because it evaluates multidimensional aspects of fatigue. The main purposes of this study were to test the validity, test-retest reliability, and internal consistency of the Hungarian version of the FIS. One hundred and eleven MS patients and 85 healthy control (HC) subjects completed the FIS and the Beck Depression Inventory, a large majority of them on two occasions, 3 months apart. The total FIS score and subscale scores differed statistically between the MS patients and the HC subjects in both FIS sessions. In the test-retest reliability assessment, the intraclass correlation coefficients were high in both the MS and HC groups. Cronbach's alpha values were also notably high. The results of this study indicate that the FIS can be regarded as a valid and reliable scale with which to improve our understanding of the impact of fatigue on the health-related quality of life in MS patients without severe disability.
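Cronbach's alpha, the internal-consistency statistic relied on here, can be computed directly from item-score columns. A minimal sketch (population variances; the data shapes are illustrative, not the study's):

```python
from statistics import pvariance

def cronbach_alpha(item_columns):
    """Cronbach's alpha from item-score columns (each column holds
    one item's scores across respondents); assumes the total score
    has nonzero variance and at least two items."""
    k = len(item_columns)
    item_var_sum = sum(pvariance(col) for col in item_columns)
    totals = [sum(scores) for scores in zip(*item_columns)]
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))
```

Perfectly correlated items give alpha = 1; items whose covariances cancel give alpha near 0, which is why "notably high" alpha values support internal consistency.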
An Empirical Taxonomy of Hospital Governing Board Roles
Lee, Shoou-Yih D; Alexander, Jeffrey A; Wang, Virginia; Margolin, Frances S; Combes, John R
2008-01-01
Objective To develop a taxonomy of governing board roles in U.S. hospitals. Data Sources 2005 AHA Hospital Governance Survey, 2004 AHA Annual Survey of Hospitals, and Area Resource File. Study Design A governing board taxonomy was developed using cluster analysis. Results were validated and reviewed by industry experts. Differences in hospital and environmental characteristics across clusters were examined. Data Extraction Methods One-thousand three-hundred thirty-four hospitals with complete information on the study variables were included in the analysis. Principal Findings Five distinct clusters of hospital governing boards were identified. Statistical tests showed that the five clusters had high internal reliability and high internal validity. Statistically significant differences in hospital and environmental conditions were found among clusters. Conclusions The developed taxonomy provides policy makers, health care executives, and researchers a useful way to describe and understand hospital governing board roles. The taxonomy may also facilitate valid and systematic assessment of governance performance. Further, the taxonomy could be used as a framework for governing boards themselves to identify areas for improvement and direction for change. PMID:18355260
Longobardi, Francesco; Innamorato, Valentina; Di Gioia, Annalisa; Ventrella, Andrea; Lippolis, Vincenzo; Logrieco, Antonio F; Catucci, Lucia; Agostiano, Angela
2017-12-15
Lentil samples coming from two different countries, i.e. Italy and Canada, were analysed using untargeted ¹H NMR fingerprinting in combination with chemometrics in order to build models able to classify them according to their geographical origin. For such aim, Soft Independent Modelling of Class Analogy (SIMCA), k-Nearest Neighbor (k-NN), Principal Component Analysis followed by Linear Discriminant Analysis (PCA-LDA) and Partial Least Squares-Discriminant Analysis (PLS-DA) were applied to the NMR data and the results were compared. The best combination of average recognition (100%) and cross-validation prediction abilities (96.7%) was obtained for the PCA-LDA. All the statistical models were validated both by using a test set and by carrying out a Monte Carlo Cross Validation: the obtained performances were found to be satisfying for all the models, with prediction abilities higher than 95% demonstrating the suitability of the developed methods. Finally, the metabolites that mostly contributed to the lentil discrimination were indicated. Copyright © 2017 Elsevier Ltd. All rights reserved.
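The Monte Carlo cross-validation used above can be sketched generically (with placeholder `fit`/`predict` callables and a plain accuracy score, not the chemometric models of the paper): repeatedly draw a random train/test split and average the test performance.

```python
import random
from statistics import mean

def monte_carlo_cv(xs, ys, fit, predict, n_splits=50,
                   test_frac=0.25, seed=0):
    """Monte Carlo cross-validation: average test accuracy over
    repeated random train/test splits."""
    rng = random.Random(seed)
    accs = []
    for _ in range(n_splits):
        idx = list(range(len(xs)))
        rng.shuffle(idx)
        cut = int(len(xs) * (1 - test_frac))
        train, test = idx[:cut], idx[cut:]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        accs.append(mean(predict(model, xs[i]) == ys[i] for i in test))
    return mean(accs)
```

Unlike a single external test set, the averaged estimate comes with spread across splits, which is what makes the >95% prediction abilities above credible.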
El Khattabi, Laïla Allach; Rouillac-Le Sciellour, Christelle; Le Tessier, Dominique; Luscan, Armelle; Coustier, Audrey; Porcher, Raphael; Bhouri, Rakia; Nectoux, Juliette; Sérazin, Valérie; Quibel, Thibaut; Mandelbrot, Laurent; Tsatsaris, Vassilis; Vialard, François; Dupont, Jean-Michel
2016-01-01
NIPT for fetal aneuploidy by digital PCR has been hampered by the large number of PCR reactions needed to meet statistical requirements, preventing clinical application. Here, we designed an octoplex droplet digital PCR (ddPCR) assay which increases the number of available targets and thus overcomes these statistical obstacles. After technical optimization of the multiplex PCR on mixtures of trisomic and euploid DNA, we performed a validation study on plasma DNA samples from 213 pregnant women. Molecular counting of circulating cell-free DNA was performed using a mix of hydrolysis probes targeting chromosome 21 and a reference chromosome. The results of our validation experiments showed that ddPCR detected trisomy 21 even when the sample's trisomic DNA content was as low as 5%. In this validation study of plasma samples from 213 pregnant women, ddPCR discriminated clearly between the trisomy 21 and euploidy groups. Our results demonstrate that digital PCR can meet the requirements for non-invasive prenatal testing of trisomy 21. This approach is technically simple, relatively cheap, easy to implement in a diagnostic setting, and compatible with ethical concerns regarding access to nucleotide sequence information. These advantages make it a potential technique of choice for population-wide screening for trisomy 21 in pregnant women.
Onwujekwe, Obinna; Fox-Rushby, Julia; Hanson, Kara
2008-01-01
This study examines whether making question formats better fit the cultural context of markets would improve the construct validity of estimates of willingness to pay (WTP). WTP for insecticide-treated mosquito nets was elicited using the bidding game, binary with follow-up (BWFU), and a novel structured haggling technique (SH) that mimicked price taking in marketplaces in the study area. The results show that different question formats generated different distributions of WTP. Following a comparison of alternative models for each question format, construct validity was compared using the most consistently appropriate model across question formats for the positive WTP values, in this case ordinary least squares. Three criteria (the number of statistically significant explanatory variables that had the anticipated sign, the value of the adjusted R², and the proportion that were statistically significant with the anticipated sign) used to assess the relative performance of each question format indicated that SH performed best and BWFU worst. However, differences in the levels of income, education, and percentage of household heads responding to the different question formats across the samples complicate this conclusion. Hence, the results suggest that the SH technique is worthy of further investigation and use.
Skates, Steven J.; Gillette, Michael A.; LaBaer, Joshua; Carr, Steven A.; Anderson, N. Leigh; Liebler, Daniel C.; Ransohoff, David; Rifai, Nader; Kondratovich, Marina; Težak, Živana; Mansfield, Elizabeth; Oberg, Ann L.; Wright, Ian; Barnes, Grady; Gail, Mitchell; Mesri, Mehdi; Kinsinger, Christopher R.; Rodriguez, Henry; Boja, Emily S.
2014-01-01
Protein biomarkers are needed to deepen our understanding of cancer biology and to improve our ability to diagnose, monitor and treat cancers. Important analytical and clinical hurdles must be overcome to allow the most promising protein biomarker candidates to advance into clinical validation studies. Although contemporary proteomics technologies support the measurement of large numbers of proteins in individual clinical specimens, sample throughput remains comparatively low. This problem is amplified in typical clinical proteomics research studies, which routinely suffer from a lack of proper experimental design, resulting in analysis of too few biospecimens to achieve adequate statistical power at each stage of a biomarker pipeline. To address this critical shortcoming, a joint workshop was held by the National Cancer Institute (NCI), National Heart, Lung and Blood Institute (NHLBI), and American Association for Clinical Chemistry (AACC), with participation from the U.S. Food and Drug Administration (FDA). An important output from the workshop was a statistical framework for the design of biomarker discovery and verification studies. Herein, we describe the use of quantitative clinical judgments to set statistical criteria for clinical relevance, and the development of an approach to calculate biospecimen sample size for proteomic studies in discovery and verification stages prior to clinical validation stage. This represents a first step towards building a consensus on quantitative criteria for statistical design of proteomics biomarker discovery and verification research. PMID:24063748
Skates, Steven J; Gillette, Michael A; LaBaer, Joshua; Carr, Steven A; Anderson, Leigh; Liebler, Daniel C; Ransohoff, David; Rifai, Nader; Kondratovich, Marina; Težak, Živana; Mansfield, Elizabeth; Oberg, Ann L; Wright, Ian; Barnes, Grady; Gail, Mitchell; Mesri, Mehdi; Kinsinger, Christopher R; Rodriguez, Henry; Boja, Emily S
2013-12-06
Protein biomarkers are needed to deepen our understanding of cancer biology and to improve our ability to diagnose, monitor, and treat cancers. Important analytical and clinical hurdles must be overcome to allow the most promising protein biomarker candidates to advance into clinical validation studies. Although contemporary proteomics technologies support the measurement of large numbers of proteins in individual clinical specimens, sample throughput remains comparatively low. This problem is amplified in typical clinical proteomics research studies, which routinely suffer from a lack of proper experimental design, resulting in analysis of too few biospecimens to achieve adequate statistical power at each stage of a biomarker pipeline. To address this critical shortcoming, a joint workshop was held by the National Cancer Institute (NCI), National Heart, Lung, and Blood Institute (NHLBI), and American Association for Clinical Chemistry (AACC) with participation from the U.S. Food and Drug Administration (FDA). An important output from the workshop was a statistical framework for the design of biomarker discovery and verification studies. Herein, we describe the use of quantitative clinical judgments to set statistical criteria for clinical relevance and the development of an approach to calculate biospecimen sample size for proteomic studies in discovery and verification stages prior to clinical validation stage. This represents a first step toward building a consensus on quantitative criteria for statistical design of proteomics biomarker discovery and verification research.
Validation of a scenario-based assessment of critical thinking using an externally validated tool.
Buur, Jennifer L; Schmidt, Peggy; Smylie, Dean; Irizarry, Kris; Crocker, Carlos; Tyler, John; Barr, Margaret
2012-01-01
With medical education transitioning from knowledge-based curricula to competency-based curricula, critical thinking skills have emerged as a major competency. While there are validated external instruments for assessing critical thinking, many educators have created their own custom assessments of critical thinking. However, the face validity of these assessments has not been challenged. The purpose of this study was to compare results from a custom assessment of critical thinking with the results from a validated external instrument of critical thinking. Students from the College of Veterinary Medicine at Western University of Health Sciences were administered a custom assessment of critical thinking (ACT) examination and the externally validated instrument, California Critical Thinking Skills Test (CCTST), in the spring of 2011. Total scores and sub-scores from each exam were analyzed for significant correlations using Pearson correlation coefficients. Significant correlations between ACT Blooms 2 and deductive reasoning and total ACT score and deductive reasoning were demonstrated with correlation coefficients of 0.24 and 0.22, respectively. No other statistically significant correlations were found. The lack of significant correlation between the two examinations illustrates the need in medical education to externally validate internal custom assessments. Ultimately, the development and validation of custom assessments of non-knowledge-based competencies will produce higher quality medical professionals.
Method development and validation of potent pyrimidine derivative by UV-VIS spectrophotometer.
Chaudhary, Anshu; Singh, Anoop; Verma, Prabhakar Kumar
2014-12-01
A rapid and sensitive ultraviolet-visible (UV-VIS) spectroscopic method was developed for the estimation of the pyrimidine derivative 6-bromo-3-(6-(2,6-dichlorophenyl)-2-(morpholinomethylamino)pyrimidin-4-yl)-2H-chromen-2-one (BT10M) in bulk form. The pyrimidine derivative was monitored at 275 nm with UV detection, with no interference from diluents at 275 nm. The method was found to be linear in the range of 50 to 150 μg/ml. The accuracy and precision were determined and validated statistically. The method was validated according to guidelines. The results showed that the proposed method is suitable for the accurate, precise, and rapid determination of the pyrimidine derivative. Graphical Abstract: Method development and validation of potent pyrimidine derivative by UV spectroscopy.
NASA DOE POD NDE Capabilities Data Book
NASA Technical Reports Server (NTRS)
Generazio, Edward R.
2015-01-01
This data book contains the Directed Design of Experiments for Validating Probability of Detection (POD) Capability of NDE Systems (DOEPOD) analyses of the nondestructive inspection data presented in the NTIAC Nondestructive Evaluation (NDE) Capabilities Data Book, 3rd ed., NTIAC DB-97-02. DOEPOD is designed as a decision support system to validate that inspection systems, personnel, and protocols demonstrate 0.90 POD with 95% confidence at critical flaw sizes (a90/95). The test methodology used in DOEPOD is based on the field of statistical sequential analysis founded by Abraham Wald. Sequential analysis is a method of statistical inference whose characteristic feature is that the number of observations required by the procedure is not determined in advance of the experiment. The decision to terminate the experiment depends, at each stage, on the results of the observations previously made. A merit of the sequential method, as applied to testing statistical hypotheses, is that test procedures can be constructed which require, on average, a substantially smaller number of observations than equally reliable test procedures based on a predetermined number of observations.
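The Wald-style sequential testing described above can be illustrated with a minimal sequential probability ratio test (SPRT) for a Bernoulli detection probability. The hypotheses, error rates, and hit/miss data below are illustrative only, not DOEPOD's actual parameters:

```python
import math

# SPRT sketch: decide between H0 (POD = p0) and H1 (POD = p1)
# from a sequence of hit (1) / miss (0) inspection outcomes.
p0, p1 = 0.90, 0.98          # illustrative hypotheses
alpha, beta = 0.05, 0.05     # illustrative error rates
A = math.log((1 - beta) / alpha)   # accept-H1 boundary
B = math.log(beta / (1 - alpha))   # accept-H0 boundary

def sprt(observations):
    """Return 'H1', 'H0', or 'continue' after the given hit/miss sequence."""
    llr = 0.0
    for hit in observations:
        llr += math.log(p1 / p0) if hit else math.log((1 - p1) / (1 - p0))
        if llr >= A:
            return "H1"
        if llr <= B:
            return "H0"
    return "continue"

print(sprt([1] * 40))  # a long run of detections favors the higher POD
```

Note how the sample size is not fixed in advance: the test stops as soon as the accumulated evidence crosses either boundary, which is the efficiency gain the abstract describes.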
NASA Astrophysics Data System (ADS)
Fauziah, D.; Mardiyana; Saputro, D. R. S.
2018-05-01
Assessment is an integral part of the learning process. The process and the result should be aligned with respect to measuring the ability of learners. Authentic assessment refers to a form of assessment that measures the competence of attitudes, knowledge, and skills. In practice, many teachers, including mathematics teachers who have implemented the 2013 curriculum, feel confused and find it difficult to master the use of authentic assessment instruments. Therefore, it is necessary to design an authentic assessment instrument with an interactive mini-project medium that teachers can adopt in their assessment. This is developmental research, following the 4D development model, which consists of four stages: define, design, develop, and disseminate. The purpose of the research is to create a valid interactive mini-project medium for statistics material in junior high school. The instrument was judged valid by experts, with scores of 3.1 for the construction aspect, 3.2 for the presentation aspect, 3.25 for the content aspect, and 2.9 for the didactic aspect. The research produced interactive mini-project media for statistics material using Adobe Flash, which can help teachers and students achieve learning objectives.
Martin, Kevin D; Amendola, Annunziato; Phisitkul, Phinit
2016-01-01
Abstract Purpose Orthopedic education continues to move towards evidence-based curricula in order to comply with new residency accreditation mandates. There are currently three high-fidelity arthroscopic virtual reality (VR) simulators available, each with multiple instructional modules and simulated arthroscopic procedures. The aim of the current study is to assess face validity, defined as the degree to which a procedure appears effective in terms of its stated aims, of the three available VR simulators. Methods Thirty subjects were recruited from a single orthopedic residency training program. Each subject completed one training session on each of the three leading VR arthroscopic simulators (ARTHRO Mentor, Simbionix; ArthroS, VirtaMed; ArthroSim, ToLTech). Each arthroscopic session involved simulator-specific modules. After the training sessions, subjects completed a previously validated simulator questionnaire for face validity. Results The median external appearance scores for the ARTHRO Mentor (9.3, range 6.7-10.0; p=0.0036) and ArthroS (9.3, range 7.3-10.0; p=0.0003) were statistically higher than for ArthroSim (6.7, range 3.3-9.7). There was no statistical difference in intraarticular appearance, instrument appearance, or user friendliness between the three groups. Most simulators reached an appropriate proportion of sufficient scores for each category (≥70%), except for ARTHRO Mentor (intraarticular appearance, 50%; instrument appearance, 61.1%) and ArthroSim (external appearance, 50%; user friendliness, 68.8%). Conclusion These results demonstrate that ArthroS has the highest overall face validity of the three current arthroscopic VR simulators. However, only the external appearance of ArthroS reached statistical significance when compared with the other simulators. Additionally, each simulator had satisfactory intraarticular quality. This study helps further the understanding of VR simulation and the features necessary for accurate arthroscopic representation. These results also provide objective data for educators when selecting equipment that will best facilitate residency training. PMID:27528830
Carvalho, Suzana Papile Maciel; Brito, Liz Magalhães; Paiva, Luiz Airton Saavedra de; Bicudo, Lucilene Arilho Ribeiro; Crosato, Edgard Michel; Oliveira, Rogério Nogueira de
2013-01-01
Validation studies of physical anthropology methods in different population groups are extremely important, especially in cases in which population variations may cause problems in the identification of a native individual by the application of norms developed for different communities. This study aimed to estimate the gender of skeletons by application of the method of Oliveira et al. (1995), previously used in a population sample from Northeast Brazil. The accuracy of this method was assessed for a population from Southeast Brazil and validated by statistical tests. The method used two mandibular measurements, namely the bigonial distance and the mandibular ramus height. The sample was composed of 66 skulls, and the method was applied by two examiners. The results were statistically analyzed by the paired t test, logistic discriminant analysis, and logistic regression. The results demonstrated that the application of the method of Oliveira et al. (1995) in this population achieved very different outcomes between genders, with 100% accuracy for females and only 11% for males, which may be explained by ethnic differences. However, statistical adjustment of the measurement data for the population analyzed allowed accuracy of 76.47% for males and 78.13% for females, with the creation of a new discriminant formula. It was concluded that methods involving physical anthropology present a high rate of accuracy for human identification, easy application, low cost, and simplicity; however, the methodologies must be validated for different populations due to differences in ethnic patterns, which are directly related to phenotypic aspects. In this specific case, the method of Oliveira et al. (1995) presented good accuracy and may be used for gender estimation in Brazil in two geographic regions, namely Northeast and Southeast; however, for other regions of the country (North, Central-West and South), previous methodological adjustment is recommended, as demonstrated in this study.
Singal, Amit G.; Mukherjee, Ashin; Elmunzer, B. Joseph; Higgins, Peter DR; Lok, Anna S.; Zhu, Ji; Marrero, Jorge A; Waljee, Akbar K
2015-01-01
Background Predictive models for hepatocellular carcinoma (HCC) have been limited by modest accuracy and lack of validation. Machine learning algorithms offer a novel methodology, which may improve HCC risk prognostication among patients with cirrhosis. Our study's aim was to develop and compare predictive models for HCC development among cirrhotic patients, using conventional regression analysis and machine learning algorithms. Methods We enrolled 442 patients with Child A or B cirrhosis at the University of Michigan between January 2004 and September 2006 (UM cohort) and prospectively followed them until HCC development, liver transplantation, death, or study termination. Regression analysis and machine learning algorithms were used to construct predictive models for HCC development, which were tested on an independent validation cohort from the Hepatitis C Antiviral Long-term Treatment against Cirrhosis (HALT-C) Trial. Both models were also compared to the previously published HALT-C model. Discrimination was assessed using receiver operating characteristic curve analysis, and diagnostic accuracy was assessed with net reclassification improvement and integrated discrimination improvement statistics. Results After a median follow-up of 3.5 years, 41 patients developed HCC. The UM regression model had a c-statistic of 0.61 (95% CI 0.56-0.67), whereas the machine learning algorithm had a c-statistic of 0.64 (95% CI 0.60-0.69) in the validation cohort. The machine learning algorithm had significantly better diagnostic accuracy as assessed by net reclassification improvement (p<0.001) and integrated discrimination improvement (p=0.04). The HALT-C model had a c-statistic of 0.60 (95% CI 0.50-0.70) in the validation cohort and was outperformed by the machine learning algorithm (p=0.047). Conclusion Machine learning algorithms improve the accuracy of risk stratifying patients with cirrhosis and can be used to accurately identify patients at high risk for developing HCC.
PMID:24169273
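The c-statistic reported in this record is, for a binary outcome, the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. A minimal sketch with fabricated risk scores:

```python
from itertools import product

# Toy predicted risks (fabricated): patients who developed the outcome
# vs. patients who did not.
cases    = [0.62, 0.55, 0.71]
controls = [0.40, 0.58, 0.33, 0.49]

def c_statistic(case_scores, control_scores):
    """Concordance: fraction of case/control pairs ranked correctly,
    counting ties as half."""
    pairs = list(product(case_scores, control_scores))
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0 for c, n in pairs)
    return wins / len(pairs)

print(round(c_statistic(cases, controls), 3))
```

A value of 0.5 corresponds to chance-level discrimination, which puts the 0.60-0.64 c-statistics above in context.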
An Assessment Blueprint for EncStat: A Statistics Anxiety Intervention Program.
ERIC Educational Resources Information Center
Watson, Freda S.; Lang, Thomas R.; Kromrey, Jeffrey D.; Ferron, John M.; Hess, Melinda R.; Hogarty, Kristine Y.
EncStat (Encouraged about Statistics) is a multimedia program being developed to identify and assist students with statistics anxiety or negative attitudes about statistics. This study explored the validity of the assessment instruments included in EncStat with respect to their diagnostic value for statistics anxiety and negative attitudes about…
Villagómez-Ornelas, Paloma; Hernández-López, Pedro; Carrasco-Enríquez, Brenda; Barrios-Sánchez, Karina; Pérez-Escamilla, Rafael; Melgar-Quiñónez, Hugo
2014-01-01
This article validates the statistical consistency of two food security scales: the Mexican Food Security Scale (EMSA) and the Latin American and Caribbean Food Security Scale (ELCSA). Validity tests were conducted in order to verify that both scales were consistent instruments, composed of independent, properly calibrated and adequately sorted items, arranged in a continuum of severity. The following tests were performed: sorting of items; Cronbach's alpha analysis; parallelism of prevalence curves; Rasch models; and sensitivity analysis through hypothesis tests on mean differences. The tests showed that both scales meet the required attributes and are robust statistical instruments for food security measurement. This is relevant given that the lack-of-access-to-food indicator, included in the multidimensional poverty measurement in Mexico, is calculated with the EMSA.
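Cronbach's alpha, one of the consistency tests listed above, can be computed from item-level responses as follows; the binary responses below are fabricated for illustration, not data from the EMSA or ELCSA:

```python
import statistics

def cronbach_alpha(items):
    """items: one list of responses per scale item (respondents aligned)."""
    k = len(items)
    item_vars = sum(statistics.variance(it) for it in items)
    totals = [sum(vals) for vals in zip(*items)]  # per-respondent totals
    total_var = statistics.variance(totals)
    return k / (k - 1) * (1 - item_vars / total_var)

# Fabricated yes/no responses: 3 items x 6 respondents
items = [
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 1],
]
print(round(cronbach_alpha(items), 2))
```

Values near 1 indicate that the items vary together, i.e. the scale behaves as a single consistent instrument.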
Coupland, Carol
2015-01-01
Study question Is it possible to develop and externally validate risk prediction equations to estimate the 10 year risk of blindness and lower limb amputation in patients with diabetes aged 25-84 years? Methods This was a prospective cohort study using routinely collected data from general practices in England contributing to the QResearch and Clinical Practice Research Datalink (CPRD) databases during the study period 1998-2014. The equations were developed using 763 QResearch practices (n=454 575 patients with diabetes) and validated in 254 different QResearch practices (n=142 419) and 357 CPRD practices (n=206 050). Cox proportional hazards models were used to derive separate risk equations for blindness and amputation in men and women that could be evaluated at 10 years. Measures of calibration and discrimination were calculated in the two validation cohorts. Study answer and limitations Risk prediction equations to quantify absolute risk of blindness and amputation in men and women with diabetes have been developed and externally validated. In the QResearch derivation cohort, 4822 new cases of lower limb amputation and 8063 new cases of blindness occurred during follow-up. The risk equations were well calibrated in both validation cohorts. Discrimination was good in men in the external CPRD cohort for amputation (D statistic 1.69, Harrell’s C statistic 0.77) and blindness (D statistic 1.40, Harrell’s C statistic 0.73), with similar results in women and in the QResearch validation cohort. The algorithms are based on variables that patients are likely to know or that are routinely recorded in general practice computer systems. They can be used to identify patients at high risk for prevention or further assessment. Limitations include lack of formally adjudicated outcomes, information bias, and missing data. 
What this study adds Patients with type 1 or type 2 diabetes are at increased risk of blindness and amputation but generally do not have accurate assessments of the magnitude of their individual risks. The new algorithms calculate the absolute risk of developing these complications over a 10 year period in patients with diabetes, taking account of their individual risk factors. Funding, competing interests, data sharing JH-C is co-director of QResearch, a not for profit organisation which is a joint partnership between the University of Nottingham and Egton Medical Information Systems, and is also a paid director of ClinRisk Ltd. CC is a paid consultant statistician for ClinRisk Ltd. PMID:26560308
Hippisley-Cox, Julia; Coupland, Carol
2015-11-11
Is it possible to develop and externally validate risk prediction equations to estimate the 10 year risk of blindness and lower limb amputation in patients with diabetes aged 25-84 years? This was a prospective cohort study using routinely collected data from general practices in England contributing to the QResearch and Clinical Practice Research Datalink (CPRD) databases during the study period 1998-2014. The equations were developed using 763 QResearch practices (n=454,575 patients with diabetes) and validated in 254 different QResearch practices (n=142,419) and 357 CPRD practices (n=206,050). Cox proportional hazards models were used to derive separate risk equations for blindness and amputation in men and women that could be evaluated at 10 years. Measures of calibration and discrimination were calculated in the two validation cohorts. Risk prediction equations to quantify absolute risk of blindness and amputation in men and women with diabetes have been developed and externally validated. In the QResearch derivation cohort, 4822 new cases of lower limb amputation and 8063 new cases of blindness occurred during follow-up. The risk equations were well calibrated in both validation cohorts. Discrimination was good in men in the external CPRD cohort for amputation (D statistic 1.69, Harrell's C statistic 0.77) and blindness (D statistic 1.40, Harrell's C statistic 0.73), with similar results in women and in the QResearch validation cohort. The algorithms are based on variables that patients are likely to know or that are routinely recorded in general practice computer systems. They can be used to identify patients at high risk for prevention or further assessment. Limitations include lack of formally adjudicated outcomes, information bias, and missing data. Patients with type 1 or type 2 diabetes are at increased risk of blindness and amputation but generally do not have accurate assessments of the magnitude of their individual risks. 
The new algorithms calculate the absolute risk of developing these complications over a 10 year period in patients with diabetes, taking account of their individual risk factors. JH-C is co-director of QResearch, a not for profit organisation which is a joint partnership between the University of Nottingham and Egton Medical Information Systems, and is also a paid director of ClinRisk Ltd. CC is a paid consultant statistician for ClinRisk Ltd. © Hippisley-Cox et al 2015.
Modeling of the reactant conversion rate in a turbulent shear flow
NASA Technical Reports Server (NTRS)
Frankel, S. H.; Madnia, C. K.; Givi, P.
1992-01-01
Results are presented of direct numerical simulations (DNS) of spatially developing shear flows under the influence of infinitely fast chemical reactions of the type A + B yields Products. The simulation results are used to construct the compositional structure of the scalar field in a statistical manner. The results of this statistical analysis indicate that the use of a Beta density for the probability density function (PDF) of an appropriate Shvab-Zeldovich mixture fraction provides a very good estimate of the limiting bounds of the reactant conversion rate within the shear layer. This provides a strong justification for the implementation of this density in practical modeling of non-homogeneous turbulent reacting flows. However, the validity of the model cannot be generalized for predictions of higher order statistical quantities. A closed form analytical expression is presented for predicting the maximum rate of reactant conversion in non-homogeneous reacting turbulence.
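The Beta-density closure discussed above determines the PDF of the mixture fraction from its first two moments. A minimal method-of-moments sketch (the mean and variance values are illustrative, not from the simulations):

```python
# Method-of-moments fit of a Beta(a, b) density to a mixture fraction
# with given mean and variance (valid when var < mean * (1 - mean)).
mean, var = 0.4, 0.05   # illustrative Shvab-Zeldovich mixture-fraction moments

common = mean * (1 - mean) / var - 1
a = mean * common
b = (1 - mean) * common

print(round(a, 3), round(b, 3))
```

With the two shape parameters fixed by the mean and variance, bounds on quantities such as the reactant conversion rate can be estimated by integrating against this assumed density.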
NASA Technical Reports Server (NTRS)
Moin, Parviz; Reynolds, William C.
1988-01-01
Lagrangian techniques have found widespread application to the prediction and understanding of turbulent transport phenomena and have yielded satisfactory results for different cases of shear flow problems. However, it must be kept in mind that in most experiments what is really available are Eulerian statistics, and it is far from obvious how to extract from them the information relevant to the Lagrangian behavior of the flow; in consequence, Lagrangian models still include some hypotheses for which no adequate supporting evidence was until now available. Direct numerical simulation of turbulence offers a new way to obtain Lagrangian statistics and so verify the validity of the current predictive models and the accuracy of their results. After the pioneering work of Riley (Riley and Patterson, 1974) in the 1970s, some such results have recently appeared in the literature (Lee et al.; Yeung and Pope). The present contribution follows in part similar lines, but focuses on two-particle statistics and comparison with existing models.
Translation and Validation of the Knee Society Score - KSS for Brazilian Portuguese
Silva, Adriana Lucia Pastore e; Demange, Marco Kawamura; Gobbi, Riccardo Gomes; da Silva, Tânia Fernanda Cardoso; Pécora, José Ricardo; Croci, Alberto Tesconi
2012-01-01
Objective To translate, culturally adapt, and validate the Knee Society Score (KSS) for the Portuguese language and determine its measurement properties, reproducibility, and validity. Methods We analyzed 70 patients of both sexes, aged between 55 and 85 years, in a cross-sectional clinical trial, with a diagnosis of primary osteoarthritis, undergoing total knee arthroplasty surgery. We assessed the patients with the English version of the KSS questionnaire and, after 30 minutes, with the Portuguese version of the KSS questionnaire, administered by a different evaluator. All the patients were assessed preoperatively, and again at three and six months postoperatively. Results There was no statistical difference, using Cronbach's alpha index and Bland-Altman graphical analysis, for the knee score during the preoperative period (p = 1), or at three months (p = 0.991) and six months postoperatively (p = 0.985). There was no statistical difference in the knee function score for all three periods (p = 1.0). Conclusion The Brazilian version of the Knee Society Score is easy to apply and provides a valid and reliable instrument for measuring the knee score and function of Brazilian patients undergoing TKA. Level of Evidence: Level I - Diagnostic Studies - Investigating a Diagnostic Test - Testing of previously developed diagnostic criteria on consecutive patients (with universally applied 'gold' reference standard). PMID:24453576
Sensor data validation and reconstruction. Phase 1: System architecture study
NASA Technical Reports Server (NTRS)
1991-01-01
The sensor validation and data reconstruction task reviewed relevant literature and selected applicable validation and reconstruction techniques for further study; analyzed the selected techniques and emphasized those which could be used for both validation and reconstruction; analyzed Space Shuttle Main Engine (SSME) hot fire test data to determine statistical and physical relationships between various parameters; developed statistical and empirical correlations between parameters to perform validation and reconstruction tasks, using a computer aided engineering (CAE) package; and conceptually designed an expert system based knowledge fusion tool, which allows the user to relate diverse types of information when validating sensor data. The host hardware for the system is intended to be a Sun SPARCstation, but could be any RISC workstation with a UNIX operating system and a windowing/graphics system such as Motif or Dataviews. The information fusion tool is intended to be developed using the NEXPERT Object expert system shell, and the C programming language.
Hourdel, Véronique; Volant, Stevenn; O'Brien, Darragh P; Chenal, Alexandre; Chamot-Rooke, Julia; Dillies, Marie-Agnès; Brier, Sébastien
2016-11-15
With the continued improvement of requisite mass spectrometers and UHPLC systems, Hydrogen/Deuterium eXchange Mass Spectrometry (HDX-MS) workflows are rapidly evolving towards the investigation of more challenging biological systems, including large protein complexes and membrane proteins. The analysis of such extensive systems results in very large HDX-MS datasets for which specific analysis tools are required to speed up data validation and interpretation. We introduce a web application and a new R package named 'MEMHDX' to help users analyze, validate and visualize large HDX-MS datasets. MEMHDX is composed of two elements. A statistical tool aids in the validation of the results by applying a mixed-effects model for each peptide, in each experimental condition, and at each time point, taking into account the time dependency of the HDX reaction and the number of independent replicates. Two adjusted P-values are generated per peptide, one for the 'Change in dynamics' and one for the 'Magnitude of ΔD', and are used to classify the data by means of a 'Logit' representation. A user-friendly interface developed with Shiny by RStudio facilitates the use of the package. This interactive tool allows the user to easily and rapidly validate, visualize and compare the relative deuterium incorporation on the amino acid sequence and 3D structure, providing both spatial and temporal information. MEMHDX is freely available as a web tool at the project home page http://memhdx.c3bi.pasteur.fr. Contact: marie-agnes.dillies@pasteur.fr or sebastien.brier@pasteur.fr. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
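MEMHDX reports adjusted P-values per peptide to control for multiple testing across many peptides and time points. The Benjamini-Hochberg procedure is one standard choice for such adjustment (the specific method MEMHDX uses is not detailed in this record, and the p-values below are fabricated):

```python
# Benjamini-Hochberg step-up adjustment of a list of p-values.
def benjamini_hochberg(pvals):
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices, ascending p
    adjusted = [0.0] * n
    prev = 1.0
    # Walk from the largest rank down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        prev = min(prev, pvals[i] * n / rank)
        adjusted[i] = prev
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Each raw p-value is scaled by n/rank and capped so the adjusted values never decrease with rank, keeping the false discovery rate at the chosen threshold.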
Myers, Tony; Balmer, Nigel
2012-01-01
Numerous factors have been proposed to explain the home advantage in sport. Several authors have suggested that a partisan home crowd enhances home advantage and that this is at least in part a consequence of their influence on officiating. However, while experimental studies examining this phenomenon have high levels of internal validity (since only the “crowd noise” intervention is allowed to vary), they suffer from a lack of external validity, with decision-making in a laboratory setting typically bearing little resemblance to decision-making in live sports settings. Conversely, observational and quasi-experimental studies with high levels of external validity suffer from low levels of internal validity as countless factors besides crowd noise vary. The present study provides a unique opportunity to address these criticisms, by conducting a controlled experiment on the impact of crowd noise on officiating in a live tournament setting. Seventeen qualified judges officiated on thirty Thai boxing bouts in a live international tournament setting featuring “home” and “away” boxers. In each bout, judges were randomized into a “noise” (live sound) or “no crowd noise” (noise-canceling headphones and white noise) condition, resulting in 59 judgments in the “no crowd noise” and 61 in the “crowd noise” condition. The results provide the first experimental evidence of the impact of live crowd noise on officials in sport. A cross-classified statistical model indicated that crowd noise had a statistically significant impact, equating to just over half a point per bout (in the context of five round bouts with the “10-point must” scoring system shared with professional boxing). The practical significance of the findings, their implications for officiating and for the future conduct of crowd noise studies are discussed. PMID:23049520
Shim, Eun-Jung; Ha, Hyeju; Lee, Sun Hee; Kim, Nam Joong; Kim, Eu Suk; Bang, Ji Hwan; Song, Kyoung-Ho; Sohn, Bo Kyung; Park, Hye Youn; Son, Kyung-Lak; Hwang, Heesung; Lee, Kwang-Min; Hahm, Bong-Jin
2018-05-15
Precise assessment of health-related quality of life (HRQOL) with a reliable and valid measure is a prerequisite to the enhancement of HRQOL. This study examined the psychometric properties of the Korean version of the Medical Outcomes Study HIV Health Survey (K-MOS-HIV). The reliability and validity of the K-MOS-HIV were examined in a multicenter survey involving 201 outpatients with human immunodeficiency virus (HIV)/ acquired immunodeficiency syndrome (AIDS) from four teaching hospitals throughout Korea. Ceiling effects were observed in six subscales scores, particularly, for the role functioning (71.1%), social functioning (63.2%), and pain (48.8%) scores. The Cronbach's α for the physical health summary and mental health summary were 0.90 and 0.94, respectively, and it ranged from 0.78 to 0.95 for the subscales. The results of the exploratory structural equation modeling supported the two-factor structure of the K-MOS-HIV (physical health summary and mental health summary). An examination of the mean square statistics values from the Rasch analysis showed that the information-weighted fit and outlier-sensitive fit statistics were within the acceptable ranges of 0.6-1.4 except for two items in the mental health summary. The convergent validity of the K-MOS-HIV was supported by its significant positive correlations with the World Health Organization Quality of Life-HIV-BREF subscale scores. Its known-group validity was proven with its ability to detect significant differences in several K-MOS-HIV subscale scores among participants with different sociodemographic and clinical characteristics. The K-MOS-HIV health survey appears to be a reliable and valid measure of HRQOL.
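Cronbach's α, reported above for the summaries and subscales, can be computed from item-level scores. A minimal sketch with made-up data (not the K-MOS-HIV items):

```python
def cronbach_alpha(items):
    """items: one list of scores per item, same respondents in each.
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k, n = len(items), len(items[0])

    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[j] for item in items) for j in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# two perfectly consistent hypothetical items
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 4, 6, 8]])
```

Values near 0.90, as reported for the summary scales, indicate high internal consistency.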
40 CFR 798.3260 - Chronic toxicity.
Code of Federal Regulations, 2011 CFR
2011-07-01
... dose groups and in the controls should be low to permit a meaningful evaluation of the results. For non... meaningful and valid statistical evaluation of chronic effects. (2) Control groups. (i) A concurrent control group is suggested. This group should be an untreated or sham treated control group or, if a vehicle is...
Reliability and Validity of the Research Methods Skills Assessment
ERIC Educational Resources Information Center
Smith, Tamarah; Smith, Samantha
2018-01-01
The Research Methods Skills Assessment (RMSA) was created to measure psychology majors' statistics knowledge and skills. The American Psychological Association's Guidelines for the Undergraduate Major in Psychology (APA, 2007, 2013) served as a framework for development. Results from a Rasch analysis with data from n = 330 undergraduates showed…
A Comparison of Conjoint Analysis Response Formats
Kevin J. Boyle; Thomas P. Holmes; Mario F. Teisl; Brian Roe
2001-01-01
A split-sample design is used to evaluate the convergent validity of three response formats used in conjoint analysis experiments. We investigate whether recoding rating data to ranking and choose-one formats, and recoding ranking data to choose-one, results in structural models and welfare estimates that are statistically indistinguishable from...
Sensory Integration and Ego Development in a Schizophrenic Adolescent Male.
ERIC Educational Resources Information Center
Pettit, Karen A.
1987-01-01
A retrospective study compared hours spent by a schizophrenic adolescent in "time out" before and after initiation of treatment. The study evaluated the effects of sensory integrative treatment on the ability to handle anger and frustration. Results demonstrate the utility of statistical analysis versus visual comparison to validate effectiveness…
NASA Astrophysics Data System (ADS)
Lufri, L.; Fitri, R.; Yogica, R.
2018-04-01
The purpose of this study is to produce a learning model based on problem solving and meaningful learning, standardized by expert assessment (validation), for the course Animal Development. This is development research producing a learning model that consists of two sub-products: the syntax of the learning model and student worksheets. All of these products were standardized through expert validation. The research data are the validity levels of the sub-products, obtained using a questionnaire filled in by validators from various fields of expertise (subject matter, learning strategy, and language). Data were analysed using descriptive statistics. The results show that the problem-solving and meaningful-learning model has been produced; the sub-products declared appropriate by the experts include the syntax of the learning model and the student worksheet.
Quantitative validation of carbon-fiber laminate low velocity impact simulations
English, Shawn A.; Briggs, Timothy M.; Nelson, Stacy M.
2015-09-26
Simulations of low velocity impact with a flat cylindrical indenter upon a carbon fiber fabric reinforced polymer laminate are rigorously validated. Comparison of the impact energy absorption between the model and experiment is used as the validation metric. Additionally, non-destructive evaluation, including ultrasonic scans and three-dimensional computed tomography, provide qualitative validation of the models. The simulations include delamination, matrix cracks and fiber breaks. An orthotropic damage and failure constitutive model, capable of predicting progressive damage and failure, is developed in conjunction and described. An ensemble of simulations incorporating model parameter uncertainties is used to predict a response distribution which is then compared to experimental output using appropriate statistical methods. Lastly, the model form errors are exposed and corrected for use in an additional blind validation analysis. The result is a quantifiable confidence in material characterization and model physics when simulating low velocity impact in structures of interest.
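Comparing the predicted response distribution from the simulation ensemble with experimental output calls for a distribution-level statistic. The paper does not name its method; as one plausible choice, a two-sample Kolmogorov-Smirnov statistic in pure Python:

```python
def ks_statistic(a, b):
    """Two-sample KS statistic: the maximum gap between the empirical
    CDFs of two samples (e.g. ensemble predictions vs. experiments)."""
    xs = sorted(set(a) | set(b))
    cdf = lambda sample, x: sum(v <= x for v in sample) / len(sample)
    return max(abs(cdf(a, x) - cdf(b, x)) for x in xs)
```

A small statistic means the ensemble's predicted distribution is consistent with the experimental one.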
40 CFR 86.1341-90 - Test cycle validation criteria.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 40 Protection of Environment 19 2011-07-01 2011-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-90 Test cycle validation criteria. (a) To minimize the biasing effect of the time lag... brake horsepower-hour. (c) Regression line analysis to calculate validation statistics. (1) Linear...
Teaching "Instant Experience" with Graphical Model Validation Techniques
ERIC Educational Resources Information Center
Ekstrøm, Claus Thorn
2014-01-01
Graphical model validation techniques for linear normal models are often used to check the assumptions underlying a statistical model. We describe an approach to provide "instant experience" in looking at a graphical model validation plot, so it becomes easier to validate if any of the underlying assumptions are violated.
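The "instant experience" idea — judging a real residual plot against panels simulated under the model's own assumptions — can be sketched as follows (an illustration of the concept, not the authors' implementation):

```python
import random

def null_residual_panels(n_points, sigma, n_panels=8, seed=1):
    """Simulate residual sets under the assumed model (iid normal errors).
    Plotting these panels next to the real residual plot shows what
    'no violation' looks like, so deviations in the real plot stand out."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(n_points)]
            for _ in range(n_panels)]

panels = null_residual_panels(50, 1.0)
```

If the real residual plot is indistinguishable from the simulated panels, the assumption checked by that plot is plausible.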
The Development of Statistics Textbook Supported with ICT and Portfolio-Based Assessment
NASA Astrophysics Data System (ADS)
Hendikawati, Putriaji; Yuni Arini, Florentina
2016-02-01
This research was development research that aimed to develop and produce a Statistics textbook model supported with information and communication technology (ICT) and portfolio-based assessment. The book was designed for undergraduate mathematics students to improve their ability in mathematical connection and communication. There were three stages in this research, i.e. define, design, and develop. The textbook consists of 10 chapters, each containing an introduction, core material, and examples and exercises. Development began with the design of an early draft of the book (draft 1), which was then validated by experts. Revision of draft 1 produced draft 2, which underwent a limited readability test. Revision of draft 2 then produced draft 3, which was trialled on a small sample to produce a valid textbook model. The data were analysed with descriptive statistics. The analysis showed that the Statistics textbook model supported with ICT and portfolio-based assessment is valid and fulfils the criteria of practicality.
Creating, generating and comparing random network models with NetworkRandomizer.
Tosadori, Gabriele; Bestvina, Ivan; Spoto, Fausto; Laudanna, Carlo; Scardoni, Giovanni
2016-01-01
Biological networks are becoming a fundamental tool for the investigation of high-throughput data in several fields of biology and biotechnology. With the increasing amount of information, network-based models are gaining more and more interest and new techniques are required in order to mine the information and to validate the results. To fill the validation gap we present an app, for the Cytoscape platform, which aims at creating randomised networks and randomising existing, real networks. Since there is a lack of tools that allow performing such operations, our app aims at enabling researchers to exploit different, well known random network models that could be used as a benchmark for validating real, biological datasets. We also propose a novel methodology for creating random weighted networks, i.e. the multiplication algorithm, starting from real, quantitative data. Finally, the app provides a statistical tool that compares real versus randomly computed attributes, in order to validate the numerical findings. In summary, our app aims at creating a standardised methodology for the validation of the results in the context of the Cytoscape platform.
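A sketch of the simplest benchmark model such a tool might offer (illustrative only, not NetworkRandomizer's code): an Erdős-Rényi G(n, p) random network generator.

```python
import random

def erdos_renyi(n, p, seed=0):
    """G(n, p): include each of the n*(n-1)/2 possible undirected edges
    independently with probability p. Returned as an edge list."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]
```

Attributes computed on many such randomized networks give the null distribution against which the real network's attributes are compared.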
NASA Astrophysics Data System (ADS)
Steger, Stefan; Brenning, Alexander; Bell, Rainer; Petschko, Helene; Glade, Thomas
2016-06-01
Empirical models are frequently applied to produce landslide susceptibility maps for large areas. Subsequent quantitative validation results are routinely used as the primary criteria to infer the validity and applicability of the final maps or to select one of several models. This study hypothesizes that such direct deductions can be misleading. The main objective was to explore discrepancies between the predictive performance of a landslide susceptibility model and the geomorphic plausibility of subsequent landslide susceptibility maps while a particular emphasis was placed on the influence of incomplete landslide inventories on modelling and validation results. The study was conducted within the Flysch Zone of Lower Austria (1,354 km2) which is known to be highly susceptible to landslides of the slide-type movement. Sixteen susceptibility models were generated by applying two statistical classifiers (logistic regression and generalized additive model) and two machine learning techniques (random forest and support vector machine) separately for two landslide inventories of differing completeness and two predictor sets. The results were validated quantitatively by estimating the area under the receiver operating characteristic curve (AUROC) with single holdout and spatial cross-validation technique. The heuristic evaluation of the geomorphic plausibility of the final results was supported by findings of an exploratory data analysis, an estimation of odds ratios and an evaluation of the spatial structure of the final maps. The results showed that maps generated by different inventories, classifiers and predictors appeared differently while holdout validation revealed similar high predictive performances. Spatial cross-validation proved useful to expose spatially varying inconsistencies of the modelling results while additionally providing evidence for slightly overfitted machine learning-based models. 
However, the highest predictive performances were obtained for maps that explicitly expressed geomorphically implausible relationships indicating that the predictive performance of a model might be misleading in the case a predictor systematically relates to a spatially consistent bias of the inventory. Furthermore, we observed that random forest-based maps displayed spatial artifacts. The most plausible susceptibility map of the study area showed smooth prediction surfaces while the underlying model revealed a high predictive capability and was generated with an accurate landslide inventory and predictors that did not directly describe a bias. However, none of the presented models was found to be completely unbiased. This study showed that high predictive performances cannot be equated with a high plausibility and applicability of subsequent landslide susceptibility maps. We suggest that greater emphasis should be placed on identifying confounding factors and biases in landslide inventories. A joint discussion between modelers and decision makers of the spatial pattern of the final susceptibility maps in the field might increase their acceptance and applicability.
Critical analysis of adsorption data statistically
NASA Astrophysics Data System (ADS)
Kaushal, Achla; Singh, S. K.
2017-10-01
Experimental data can be presented, computed, and critically analysed in different ways using statistics. A variety of statistical tests are used to make decisions about the significance and validity of the experimental data. In the present study, adsorption was carried out to remove zinc ions from contaminated aqueous solution using mango leaf powder. The experimental data were analysed statistically by hypothesis testing, applying the t test, paired t test and Chi-square test to (a) test the optimum value of the process pH, (b) verify the success of the experiment and (c) study the effect of adsorbent dose on zinc ion removal from aqueous solutions. Comparison of calculated and tabulated values of t and χ2 showed the results in favour of the data collected from the experiment, and this has been shown on probability charts. The K value for the Langmuir isotherm was 0.8582 and the m value obtained for the Freundlich adsorption isotherm was 0.725; both are <1, indicating favourable isotherms. Karl Pearson's correlation coefficient values for the Langmuir and Freundlich adsorption isotherms were obtained as 0.99 and 0.95 respectively, which show a high degree of correlation between the variables. This validates the data obtained for adsorption of zinc ions from the contaminated aqueous solution with the help of mango leaf powder.
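The hypothesis tests above compare a calculated statistic against a tabulated critical value. A minimal one-sample t statistic, with made-up data rather than the adsorption measurements:

```python
from math import sqrt

def one_sample_t(xs, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)); compare |t| with the
    tabulated critical value at n - 1 degrees of freedom."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance
    return (mean - mu0) / sqrt(s2 / n)

t = one_sample_t([5.0, 6.0, 7.0], 5.0)
```

If the calculated |t| falls below the tabulated value, the data are consistent with the hypothesized mean.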
Polygenic scores via penalized regression on summary statistics.
Mak, Timothy Shin Heng; Porsch, Robert Milan; Choi, Shing Wan; Zhou, Xueya; Sham, Pak Chung
2017-09-01
Polygenic scores (PGS) summarize the genetic contribution of a person's genotype to a disease or phenotype. They can be used to group participants into different risk categories for diseases, and are also used as covariates in epidemiological analyses. A number of possible ways of calculating PGS have been proposed, and recently there is much interest in methods that incorporate information available in published summary statistics. As there is no inherent information on linkage disequilibrium (LD) in summary statistics, a pertinent question is how we can use LD information available elsewhere to supplement such analyses. To answer this question, we propose a method for constructing PGS using summary statistics and a reference panel in a penalized regression framework, which we call lassosum. We also propose a general method for choosing the value of the tuning parameter in the absence of validation data. In our simulations, we showed that pseudovalidation often resulted in prediction accuracy that is comparable to using a dataset with validation phenotype and was clearly superior to the conservative option of setting the tuning parameter of lassosum to its lowest value. We also showed that lassosum achieved better prediction accuracy than simple clumping and P-value thresholding in almost all scenarios. It was also substantially faster and more accurate than the recently proposed LDpred. © 2017 WILEY PERIODICALS, INC.
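When predictors are standardized and uncorrelated (an identity LD matrix), the lasso solution reduces to soft thresholding of each SNP's marginal effect estimate; lassosum generalizes this to reference-panel LD. A sketch of the uncorrelated special case with hypothetical effect sizes:

```python
def soft_threshold(beta_hat, lam):
    """Lasso solution for one SNP when predictors are uncorrelated:
    shrink the marginal estimate toward zero by lam, truncating at zero."""
    if beta_hat > lam:
        return beta_hat - lam
    if beta_hat < -lam:
        return beta_hat + lam
    return 0.0

# hypothetical summary-statistic effect sizes, tuning parameter 0.10
shrunk = [soft_threshold(b, 0.10) for b in [0.30, -0.05, 0.12, -0.40]]
```

Small effects are zeroed out entirely, which is how the penalty produces a sparse polygenic score.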
Dimitrov, Borislav D; Motterlini, Nicola; Fahey, Tom
2015-01-01
Objective Estimating calibration performance of clinical prediction rules (CPRs) in systematic reviews of validation studies is not possible when predicted values are neither published nor accessible or sufficient, and no individual participant or patient data are available. Our aims were to describe a simplified approach for outcome prediction and calibration assessment and to evaluate its functionality and validity. Study design and methods: Methodological study of systematic reviews of validation studies of CPRs: a) the ABCD2 rule for prediction of 7-day stroke; and b) the CRB-65 rule for prediction of 30-day mortality. Predicted outcomes in a sample validation study were computed by CPR distribution patterns (“derivation model”). As confirmation, a logistic regression model (with derivation study coefficients) was applied to CPR-based dummy variables in the validation study. Meta-analysis of validation studies provided pooled estimates of “predicted:observed” risk ratios (RRs), 95% confidence intervals (CIs), and indexes of heterogeneity (I2) on forest plots (fixed and random effects models), with and without adjustment of intercepts. The above approach was also applied to the CRB-65 rule. Results Our simplified method, applied to the ABCD2 rule in three risk strata (low, 0–3; intermediate, 4–5; high, 6–7 points), indicated that predictions are identical to those computed by a univariate, CPR-based logistic regression model. Discrimination was good (c-statistics = 0.61–0.82); however, calibration in some studies was low. In such cases of miscalibration, the under-prediction (RRs = 0.73–0.91, 95% CIs 0.41–1.48) could be further corrected by intercept adjustment to account for incidence differences. An improvement of both heterogeneities and P-values (Hosmer-Lemeshow goodness-of-fit test) was observed. Better calibration and improved pooled RRs (0.90–1.06), with narrower 95% CIs (0.57–1.41), were achieved.
Conclusion Our results have an immediate clinical implication in situations when predicted outcomes in CPR validation studies are lacking or deficient by describing how such predictions can be obtained by everyone using the derivation study alone, without any need for highly specialized knowledge or sophisticated statistics. PMID:25931829
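The pooled "predicted:observed" risk ratios above come from meta-analysis. A minimal fixed-effect, inverse-variance pooling sketch on the log scale (illustrative inputs, not the review's data):

```python
from math import exp, log

def pooled_rr(rrs, ses):
    """Fixed-effect inverse-variance pooling of risk ratios.
    ses are standard errors of log(RR). Returns (pooled RR, lo95, hi95)."""
    ws = [1.0 / se ** 2 for se in ses]
    mean = sum(w * log(r) for w, r in zip(ws, rrs)) / sum(ws)
    se = (1.0 / sum(ws)) ** 0.5
    return exp(mean), exp(mean - 1.96 * se), exp(mean + 1.96 * se)

# two hypothetical validation studies
rr, lo, hi = pooled_rr([0.80, 1.10], [0.20, 0.20])
```

A pooled RR near 1 with a narrow CI indicates good calibration of predicted against observed risks.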
Supply Chain Collaboration: Information Sharing in a Tactical Operating Environment
2013-06-01
architecture, there are four tiers: Client (Web Application Clients), Presentation (Web-Server), Processing (Application-Server), Data (Database...organization in each period. This data will be collected for analysis. i) Analyses and Validation: We will do a statistical test on this data, Pareto ...notes, outstanding deliveries, and inventory. i) Analyses and Validation: We will do a statistical test on this data, Pareto analyses and confirmation
Clark, Robin A; Shoaib, Mohammed; Hewitt, Katherine N; Stanford, S Clare; Bate, Simon T
2012-08-01
InVivoStat is a free-to-use statistical software package for analysis of data generated from animal experiments. The package is designed specifically for researchers in the behavioural sciences, where exploiting the experimental design is crucial for reliable statistical analyses. This paper compares the analysis of three experiments conducted using InVivoStat with other widely used statistical packages: SPSS (V19), PRISM (V5), UniStat (V5.6) and Statistica (V9). We show that InVivoStat provides results that are similar to those from the other packages and, in some cases, are more advanced. This investigation provides evidence of further validation of InVivoStat and should strengthen users' confidence in this new software package.
Methodology and issues of integral experiments selection for nuclear data validation
NASA Astrophysics Data System (ADS)
Tatiana, Ivanova; Ivanov, Evgeny; Hill, Ian
2017-09-01
Nuclear data validation involves a large suite of Integral Experiments (IEs) for criticality, reactor physics and dosimetry applications. [1] Often benchmarks are taken from international Handbooks. [2, 3] Depending on the application, IEs have different degrees of usefulness in validation, and usually the use of a single benchmark is not advised; indeed, it may lead to erroneous interpretation and results. [1] This work aims at quantifying the importance of benchmarks used in application dependent cross section validation. The approach is based on the well-known Generalized Linear Least Squares Method (GLLSM) extended to establish biases and uncertainties for given cross sections (within a given energy interval). The statistical treatment results in a vector of weighting factors for the integral benchmarks. These factors characterize the value added by a benchmark for nuclear data validation for the given application. The methodology is illustrated by one example, selecting benchmarks for 239Pu cross section validation. The studies were performed in the framework of Subgroup 39 (Methods and approaches to provide feedback from nuclear and covariance data adjustment for improvement of nuclear data files) established at the Working Party on International Nuclear Data Evaluation Cooperation (WPEC) of the Nuclear Science Committee under the Nuclear Energy Agency (NEA/OECD).
Research Education in Undergraduate Occupational Therapy Programs.
ERIC Educational Resources Information Center
Petersen, Paul; And Others
1992-01-01
Of 63 undergraduate occupational therapy programs surveyed, the 38 responses revealed some common areas covered: elementary descriptive statistics, validity, reliability, and measurement. Areas underrepresented include statistical analysis with or without computers, research design, and advanced statistics. (SK)
Choudhry, Shahid A.; Li, Jing; Davis, Darcy; Erdmann, Cole; Sikka, Rishi; Sutariya, Bharat
2013-01-01
Introduction: Preventing the occurrence of hospital readmissions is needed to improve quality of care and foster population health across the care continuum. Hospitals are being held accountable for improving transitions of care to avert unnecessary readmissions. Advocate Health Care in Chicago and Cerner (ACC) collaborated to develop all-cause, 30-day hospital readmission risk prediction models to identify patients that need interventional resources. Ideally, prediction models should encompass several qualities: they should have high predictive ability; use reliable and clinically relevant data; use vigorous performance metrics to assess the models; be validated in populations where they are applied; and be scalable in heterogeneous populations. However, a systematic review of prediction models for hospital readmission risk determined that most performed poorly (average C-statistic of 0.66) and efforts to improve their performance are needed for widespread usage. Methods: The ACC team incorporated electronic health record data, utilized a mixed-method approach to evaluate risk factors, and externally validated their prediction models for generalizability. Inclusion and exclusion criteria were applied on the patient cohort and then split for derivation and internal validation. Stepwise logistic regression was performed to develop two predictive models: one for admission and one for discharge. The prediction models were assessed for discrimination ability, calibration, overall performance, and then externally validated. Results: The ACC Admission and Discharge Models demonstrated modest discrimination ability during derivation, internal and external validation post-recalibration (C-statistic of 0.76 and 0.78, respectively), and reasonable model fit during external validation for utility in heterogeneous populations. Conclusions: The ACC Admission and Discharge Models embody the design qualities of ideal prediction models. 
The ACC plans to continue its partnership to further improve and develop valuable clinical models. PMID:24224068
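The C-statistic used to evaluate these models is the probability that a randomly chosen patient who experienced the outcome receives a higher predicted risk than one who did not (ties counting half). A minimal pairwise sketch with toy scores:

```python
def c_statistic(scores, labels):
    """C-statistic / AUC: fraction of (positive, negative) pairs in which
    the positive case gets the higher risk score; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

c = c_statistic([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```

A value of 0.5 is chance-level discrimination; the ACC models' 0.76-0.78 sits in the "modest to good" range the abstract describes.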
NASA Astrophysics Data System (ADS)
Basith, Abdul; Prakoso, Yudhono; Kongko, Widjo
2017-07-01
A tsunami model using high-resolution geometric data is indispensable in tsunami mitigation efforts, especially in tsunami-prone areas; such data are one of the factors that affect the accuracy of numerical tsunami modeling. Sadeng Port is a new infrastructure on the southern coast of Java which could potentially be hit by a massive tsunami originating from the seismic gap. This paper discusses validation and error estimation of a tsunami model created using high-resolution geometric data at Sadeng Port. The model validation uses the wave height of the 2006 Pangandaran tsunami recorded by the Sadeng tide gauge. The tsunami model will be used for numerical tsunami modeling involving earthquake-tsunami parameters derived from the seismic gap. The validation results using the t-test (Student) show that the modeled and observed tsunami heights at the Sadeng tide gauge are statistically equal at the 95% confidence level; the RMSE and NRMSE values are 0.428 m and 22.12%, while the difference in tsunami wave travel time is 12 minutes.
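The reported RMSE and NRMSE can be computed directly from paired model and tide-gauge series. A sketch with toy values (normalizing by the observed range is an assumption here; the paper does not state its normalization):

```python
from math import sqrt

def rmse(model, obs):
    """Root-mean-square error between modeled and observed values."""
    return sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))

def nrmse_percent(model, obs):
    """RMSE normalized by the observed range, in percent (one common choice)."""
    return 100.0 * rmse(model, obs) / (max(obs) - min(obs))

err = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Lower values mean closer agreement between the simulated and gauge-recorded wave heights.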
NASA Astrophysics Data System (ADS)
Most, Sebastian; Nowak, Wolfgang; Bijeljic, Branko
2015-04-01
Fickian transport in groundwater flow is the exception rather than the rule. Transport in porous media is frequently simulated via particle methods (i.e. particle tracking random walk (PTRW) or continuous time random walk (CTRW)). These methods formulate transport as a stochastic process of particle position increments. At the pore scale, geometry and micro-heterogeneities prohibit the commonly made assumption of independent and normally distributed increments to represent dispersion. Many recent particle methods seek to loosen this assumption. Hence, it is important to get a better understanding of the processes at pore scale. For our analysis we track the positions of 10,000 particles migrating through the pore space over time. The data we use come from micro CT scans of a homogeneous sandstone and encompass about 10 grain sizes. Based on those images we discretize the pore structure and simulate flow at the pore scale based on the Navier-Stokes equation. This flow field realistically describes flow inside the pore space and we do not need to add artificial dispersion during the transport simulation. Next, we use particle tracking random walk and simulate pore-scale transport. Finally, we use the obtained particle trajectories to do a multivariate statistical analysis of the particle motion at the pore scale. Our analysis is based on copulas. Every multivariate joint distribution is a combination of its univariate marginal distributions. The copula represents the dependence structure of those univariate marginals and is therefore useful to observe correlation and non-Gaussian interactions (i.e. non-Fickian transport). The first goal of this analysis is to better understand the validity regions of commonly made assumptions. We are investigating three different transport distances: 1) The distance where the statistical dependence between particle increments can be modelled as an order-one Markov process.
This would be the Markovian distance for the process, where the validity of yet-unexplored non-Gaussian-but-Markovian random walks starts. 2) The distance where bivariate statistical dependence simplifies to a multi-Gaussian dependence based on simple linear correlation (validity of correlated PTRW/CTRW). 3) The distance of complete statistical independence (validity of classical PTRW/CTRW). The second objective is to reveal the characteristic dependencies that influence transport the most. Those dependencies can be very complex. Copulas are highly capable of representing linear dependence as well as non-linear dependence. With that tool we are able to detect persistent characteristics dominating transport even across different scales. The results derived from our experimental data set suggest that there are many more non-Fickian aspects of pore-scale transport than are captured by the univariate statistics of longitudinal displacements. Non-Fickianity can also be found in transverse displacements, and in the relations between increments at different time steps. Also, the dependence we find is non-linear (i.e. beyond simple correlation) and persists over long distances. Thus, our results strongly support the further refinement of techniques like correlated PTRW or correlated CTRW towards non-linear statistical relations.
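The copula diagnostic this abstract describes can be sketched numerically: map each increment series to its normalized ranks (pseudo-observations) and inspect dependence on the copula scale, where rank correlation captures monotone but possibly non-linear dependence. This is a minimal sketch with synthetic, hypothetical increments, not the authors' pore-scale trajectories.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic stand-ins for successive particle displacement increments
n = 5000
x = rng.standard_normal(n)
dx1 = np.exp(0.5 * x)                                 # heavy-tailed increments at step t
dx2 = np.exp(0.3 * x) + 0.1 * rng.standard_normal(n)  # dependent increments at step t+1

def pseudo_obs(v):
    """Map samples to normalized ranks in (0, 1), i.e. the empirical copula scale."""
    return (np.argsort(np.argsort(v)) + 1) / (len(v) + 1)

u, w = pseudo_obs(dx1), pseudo_obs(dx2)
# linear correlation of pseudo-observations equals Spearman's rank correlation,
# which detects monotone dependence that plain Pearson correlation can miss
rho = np.corrcoef(u, w)[0, 1]
```

Fitting a parametric copula family to (u, w) would then separate the dependence structure from the (here lognormal-like) marginals.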
Moss, Travis J.; Clark, Matthew T.; Calland, James Forrest; Enfield, Kyle B.; Voss, John D.; Lake, Douglas E.; Moorman, J. Randall
2017-01-01
Background Charted vital signs and laboratory results represent intermittent samples of a patient’s dynamic physiologic state and have been used to calculate early warning scores to identify patients at risk of clinical deterioration. We hypothesized that the addition of cardiorespiratory dynamics measured from continuous electrocardiography (ECG) monitoring to intermittently sampled data improves the predictive validity of models trained to detect clinical deterioration prior to intensive care unit (ICU) transfer or unanticipated death. Methods and findings We analyzed 63 patient-years of ECG data from 8,105 acute care patient admissions at a tertiary care academic medical center. We developed models to predict deterioration resulting in ICU transfer or unanticipated death within the next 24 hours using either vital signs, laboratory results, or cardiorespiratory dynamics from continuous ECG monitoring and also evaluated models using all available data sources. We calculated the predictive validity (C-statistic), the net reclassification improvement, and the probability of achieving the difference in likelihood ratio χ2 for the additional degrees of freedom. The primary outcome occurred 755 times in 586 admissions (7%). We analyzed 395 clinical deteriorations with continuous ECG data in the 24 hours prior to an event. Using only continuous ECG measures resulted in a C-statistic of 0.65, similar to models using only laboratory results and vital signs (0.63 and 0.69, respectively). Addition of continuous ECG measures to models using conventional measurements improved the C-statistic by 0.01 and 0.07; a model integrating all data sources had a C-statistic of 0.73 with categorical net reclassification improvement of 0.09 for a change of 1 decile in risk. The difference in likelihood ratio χ2 between integrated models with and without cardiorespiratory dynamics was 2158 (p < 0.001). 
Conclusions Cardiorespiratory dynamics from continuous ECG monitoring detect clinical deterioration in acute care patients and improve performance of conventional models that use only laboratory results and vital signs. PMID:28771487
Validation of the H-SAF precipitation product H03 over Greece using rain gauge data
NASA Astrophysics Data System (ADS)
Feidas, H.; Porcu, F.; Puca, S.; Rinollo, A.; Lagouvardos, C.; Kotroni, V.
2018-01-01
This paper presents an extensive validation of the combined infrared/microwave H-SAF (EUMETSAT Satellite Application Facility on Support to Operational Hydrology and Water Management) precipitation product H03, for a 1-year period, using gauge observations from a relatively dense network of 233 stations over Greece. First, the quality of the interpolated data used to validate the precipitation product is assessed and a quality index is constructed based on parameters such as the density of the station network and the orography. Then, a validation analysis is conducted based on comparisons of satellite (H03) with interpolated rain gauge data to produce continuous and multi-categorical statistics at monthly and annual timescales by taking into account the different geophysical characteristics of the terrain (land, coast, sea, elevation). Finally, the impact of the quality of interpolated data on the validation statistics is examined in terms of different configurations of the interpolation model and the rain gauge network characteristics used in the interpolation. The possibility of using a quality index of the interpolated data as a filter in the validation procedure is also investigated. The continuous validation statistics show yearly root mean squared error (RMSE) and mean absolute error (MAE) corresponding to 225% and 105% of the mean rain rate, respectively. Mean error (ME) indicates a slight overall tendency for underestimation of the rain gauge rates, which becomes large for high rain rates. In general, the H03 algorithm does not retrieve light (<1 mm/h) or convective (>10 mm/h) precipitation well. The poor correlation between satellite and gauge data points to algorithm problems in co-locating precipitation patterns. Seasonal comparison shows that retrieval errors are lower in the cold months than in the summer months. 
The multi-categorical statistics indicate that the H03 algorithm is able to discriminate rain from no-rain events efficiently, although a large number of rain events are missed. The most prominent features are the very high false alarm ratio (FAR) (more than 70%), the relatively low probability of detection (POD) (less than 40%), and the overestimation of the rainy pixels. Although the different geophysical features of the terrain (land, coast, sea, elevation) and the quality of the interpolated data have an effect on the validation statistics, the effect is generally not significant and is more pronounced in the categorical than in the continuous statistics.
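The categorical scores quoted above (POD, FAR) come from a standard 2x2 rain/no-rain contingency table of hits, misses, and false alarms. A minimal sketch with a small, hypothetical set of co-located satellite and gauge flags rather than the H03 data:

```python
import numpy as np

# hypothetical rain (1) / no-rain (0) flags for co-located satellite pixels and gauges
sat   = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])
gauge = np.array([1, 0, 0, 1, 1, 0, 0, 0, 0, 1])

hits         = np.sum((sat == 1) & (gauge == 1))  # both report rain
false_alarms = np.sum((sat == 1) & (gauge == 0))  # satellite rain, gauge dry
misses       = np.sum((sat == 0) & (gauge == 1))  # satellite dry, gauge rain

pod = hits / (hits + misses)                # probability of detection: fraction of gauge rain caught
far = false_alarms / (hits + false_alarms)  # false alarm ratio: fraction of satellite rain that is spurious
```

A FAR above 70% with POD below 40%, as reported for H03, would correspond to a table dominated by false alarms and misses.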
Hanskamp-Sebregts, Mirelle; Zegers, Marieke; Vincent, Charles; van Gurp, Petra J; de Vet, Henrica C W; Wollersheim, Hub
2016-01-01
Objectives Record review is the most widely used method to quantify patient safety. We systematically reviewed the reliability and validity of adverse event detection with record review. Design A systematic review of the literature. Methods We searched PubMed, EMBASE, CINAHL, PsycINFO and the Cochrane Library from their inception through February 2015. We included all studies that aimed to describe the reliability and/or validity of record review. Two reviewers conducted data extraction. We pooled kappa (κ) values and analysed the differences in subgroups according to number of reviewers, reviewer experience and training level, adjusted for the prevalence of adverse events. Results In 25 studies, the psychometric data of the Global Trigger Tool (GTT) and the Harvard Medical Practice Study (HMPS) were reported and 24 studies were included for statistical pooling. The inter-rater reliability of the GTT and HMPS showed a pooled κ of 0.65 and 0.55, respectively. The inter-rater agreement was statistically significantly higher when the group of reviewers within a study consisted of a maximum of five reviewers. We found no studies reporting on the validity of the GTT and HMPS. Conclusions The reliability of record review is moderate to substantial and improved when a small group of reviewers carried out record review. The validity of the record review method has never been evaluated, while clinical data registries, autopsy or direct observations of patient care are potential reference methods that can be used to test concurrent validity. PMID:27550650
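The inter-rater statistic pooled in this review is Cohen's kappa, which corrects observed agreement for the agreement expected by chance given each rater's marginal rates. A minimal sketch with hypothetical adverse-event judgments from two record reviewers:

```python
import numpy as np

# hypothetical adverse-event judgments (1 = event present) from two record reviewers
r1 = np.array([1, 0, 0, 1, 1, 0, 0, 1, 0, 0])
r2 = np.array([1, 0, 1, 1, 0, 0, 0, 1, 0, 0])

po = np.mean(r1 == r2)              # observed agreement
p1, p2 = r1.mean(), r2.mean()       # each reviewer's event rate
pe = p1 * p2 + (1 - p1) * (1 - p2)  # agreement expected by chance
kappa = (po - pe) / (1 - pe)        # chance-corrected agreement
```

Values around 0.55-0.65, as pooled for the HMPS and GTT, are conventionally read as moderate to substantial agreement.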
Papadopoulou, Soultana L; Exarchakos, Georgios; Christodoulou, Dimitrios; Theodorou, Stavroula; Beris, Alexandre; Ploumis, Avraam
2017-01-01
Introduction The Ohkuma questionnaire is a validated screening tool originally used to detect dysphagia among patients hospitalized in Japanese nursing facilities. Objective The purpose of this study is to evaluate the reliability and validity of the adapted Greek version of the Ohkuma questionnaire. Methods Following the steps for cross-cultural adaptation, we delivered the validated Ohkuma questionnaire to 70 patients (53 men, 17 women) who were either suffering from dysphagia or not. All of them completed the questionnaire a second time within a month. For all of them, we performed a bedside and VFSS study of dysphagia and asked participants to undergo a second VFSS screening, with the exception of nine individuals. Statistical analysis included measurement of internal consistency with Cronbach's α coefficient, reliability with Cohen's kappa and Pearson's correlation coefficient, and construct validity with categorical components and a one-way ANOVA test. Results According to Cronbach's α coefficient (0.976) for total score, there was high internal consistency for the Ohkuma Dysphagia questionnaire. Test-retest reliability (Cohen's kappa) ranged from 0.586 to 1.00, exhibiting acceptable stability. We also estimated the Pearson's correlation coefficient for the test-retest total score, which reached high levels (0.952; p < 0.001). The one-way ANOVA across the two measurement times showed statistically significant results in both measurements (p = 0.02 and p = 0.016). Conclusion The adapted Greek version of the questionnaire is valid and reliable and can be used for the screening of dysphagia in Greek-speaking patients.
Orsi, Rebecca
2017-02-01
Concept mapping is now a commonly used technique for articulating and evaluating programmatic outcomes. However, research regarding the validity of knowledge and outcomes produced with concept mapping is sparse. The current study describes quantitative validity analyses using a concept mapping dataset. We sought to increase the validity of concept mapping evaluation results by running multiple cluster analysis methods and then using several metrics to choose from among solutions. We present four different clustering methods based on analyses using the R statistical software package: partitioning around medoids (PAM), fuzzy analysis (FANNY), agglomerative nesting (AGNES) and divisive analysis (DIANA). We then used the Dunn and Davies-Bouldin indices to assist in choosing a valid cluster solution for a concept mapping outcomes evaluation. We conclude that the validity of the outcomes map is high, based on the analyses described. Finally, we discuss areas for further concept mapping methods research. Copyright © 2016 Elsevier Ltd. All rights reserved.
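The index-guided choice among candidate cluster solutions can be sketched as follows. The study itself used R clusterers (PAM, FANNY, AGNES, DIANA); this sketch substitutes k-means as a stand-in clusterer and uses scikit-learn's Davies-Bouldin score (lower is better) on hypothetical 2-D concept coordinates:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(1)
# hypothetical 2-D concept-map point coordinates with three latent groups
pts = np.vstack([rng.normal(c, 0.3, size=(40, 2))
                 for c in ([0.0, 0.0], [3.0, 0.0], [0.0, 3.0])])

# score candidate solutions; Davies-Bouldin rewards compact, well-separated clusters
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pts)
    scores[k] = davies_bouldin_score(pts, labels)

best_k = min(scores, key=scores.get)  # solution with the lowest (best) index
```

In practice one would compare the index across several clustering algorithms, as the study does, rather than across k alone.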
NASA Technical Reports Server (NTRS)
Xu, Kuan-Man
2006-01-01
A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.
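The proposed test can be sketched generically: compute a distance between two normalized histograms, then bootstrap resamples from the pooled data to approximate the null distribution of that distance. A sketch using the Euclidean distance, with synthetic data standing in for the cloud-object measurements:

```python
import numpy as np

rng = np.random.default_rng(2)

def norm_hist(x, edges):
    h, _ = np.histogram(x, bins=edges)
    return h / h.sum()  # normalize so histograms of different sizes are comparable

# synthetic stand-ins for the measurements behind two summary histograms
a = rng.normal(0.0, 1.0, 1500)
b = rng.normal(0.6, 1.0, 1500)
edges = np.linspace(-4, 5, 21)

d_obs = np.linalg.norm(norm_hist(a, edges) - norm_hist(b, edges))  # Euclidean distance

# bootstrap the null hypothesis (no difference) by resampling from the pooled data
pooled = np.concatenate([a, b])
d_null = np.empty(999)
for i in range(999):
    s = rng.choice(pooled, size=pooled.size, replace=True)
    d_null[i] = np.linalg.norm(norm_hist(s[:1500], edges) - norm_hist(s[1500:], edges))

# significance level: fraction of null distances at least as large as the observed one
p_value = (1 + np.sum(d_null >= d_obs)) / (1 + 999)
```

Swapping in the Kuiper or Jeffries-Matusita distance only changes the distance function, not the bootstrap machinery, which is why the paper can compare the three statistics on equal footing.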
Shinzato, Takashi
2016-12-01
The portfolio optimization problem in which the variances of the return rates of assets are not identical is analyzed in this paper using the methodology of statistical mechanical informatics, specifically, replica analysis. We defined two characteristic quantities of an optimal portfolio, namely, minimal investment risk and investment concentration, in order to solve the portfolio optimization problem and analytically determined their asymptotic behaviors using replica analysis. Numerical experiments were also performed, and a comparison between the results of our simulation and those obtained via replica analysis validated our proposed method.
Saraf, Sanatan; Mathew, Thomas; Roy, Anindya
2015-01-01
For the statistical validation of surrogate endpoints, an alternative formulation is proposed for testing Prentice's fourth criterion, under a bivariate normal model. In such a setup, the criterion involves inference concerning an appropriate regression parameter, and the criterion holds if the regression parameter is zero. Testing such a null hypothesis has been criticized in the literature since it can only be used to reject a poor surrogate, and not to validate a good surrogate. In order to circumvent this, an equivalence hypothesis is formulated for the regression parameter, namely the hypothesis that the parameter is equivalent to zero. Such an equivalence hypothesis is formulated as an alternative hypothesis, so that the surrogate endpoint is statistically validated when the null hypothesis is rejected. Confidence intervals for the regression parameter and tests for the equivalence hypothesis are proposed using bootstrap methods and small-sample asymptotics, and their performance is numerically evaluated and recommendations are made. The choice of the equivalence margin is a regulatory issue that needs to be addressed. The proposed equivalence testing formulation is also adopted for other parameters that have been proposed in the literature on surrogate endpoint validation, namely, the relative effect and proportion explained.
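The equivalence logic can be sketched with a bootstrap confidence interval: estimate the coefficient of treatment in a regression of the true endpoint on surrogate and treatment, and declare the surrogate statistically validated only if the whole interval lies inside a pre-specified margin. All data and the margin below are hypothetical (the abstract notes the margin choice is a regulatory issue), and the sketch is not the authors' exact procedure:

```python
import numpy as np

rng = np.random.default_rng(3)

# hypothetical trial data: treatment z, surrogate s, true endpoint y;
# under Prentice's fourth criterion the coefficient of z given s should be zero
n = 200
z = rng.integers(0, 2, n).astype(float)
s = z + rng.standard_normal(n)                   # surrogate carries the treatment effect
y = 0.9 * s + 0.02 * z + 0.3 * rng.standard_normal(n)

def treatment_slope(y, s, z):
    X = np.column_stack([np.ones_like(s), s, z])
    return np.linalg.lstsq(X, y, rcond=None)[0][2]  # coefficient of z given s

# percentile bootstrap confidence interval for the coefficient of z
boots = np.array([treatment_slope(y[i], s[i], z[i])
                  for i in (rng.integers(0, n, n) for _ in range(1000))])
lo, hi = np.percentile(boots, [2.5, 97.5])

margin = 0.3  # hypothetical equivalence margin
equivalent = (-margin < lo) and (hi < margin)  # validated only if CI sits inside the margin
```

Note the reversal of roles: equivalence is the alternative hypothesis, so a surrogate is validated by rejecting non-equivalence, not by failing to reject zero.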
Statistical Validation for Clinical Measures: Repeatability and Agreement of Kinect™-Based Software.
Lopez, Natalia; Perez, Elisa; Tello, Emanuel; Rodrigo, Alejandro; Valentinuzzi, Max E
2018-01-01
The rehabilitation process is a fundamental stage for recovery of people's capabilities. However, the evaluation of the process is performed by physiatrists and medical doctors, mostly based on their observations, that is, a subjective appreciation of the patient's evolution. This paper proposes a tracking platform for the movement of an individual's upper limb using Kinect sensor(s), to be applied to the patient during the rehabilitation process. The main contribution is the development of quantifying software and the statistical validation of its performance, repeatability, and clinical use in the rehabilitation process. The software determines joint angles and upper limb trajectories for the construction of a specific rehabilitation protocol and quantifies the treatment evolution. In turn, the information is presented via a graphical interface that allows the recording, storage, and report of the patient's data. For clinical purposes, the software information is statistically validated with three different methodologies, comparing the measures with a goniometer in terms of agreement and repeatability. The agreement of joint angles measured with the proposed software and goniometer is evaluated with Bland-Altman plots; all measurements fell well within the limits of agreement, indicating that the two techniques are interchangeable. Additionally, the results of Bland-Altman analysis of repeatability show 95% confidence. Finally, the physiotherapists' qualitative assessment shows encouraging results for clinical use. The main conclusion is that the software is capable of offering a clinical history of the patient and is useful for quantification of the rehabilitation success. The simplicity, low cost, and visualization possibilities enhance the use of the Kinect-based software for rehabilitation and other applications, and the experts' opinion endorses the choice of our approach for clinical practice. 
Comparison of the new measurement technique with established goniometric methods shows that the proposed software agrees sufficiently to be used interchangeably.
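The Bland-Altman agreement check used here computes the mean difference (bias) between paired measurements and the 95% limits of agreement, bias ± 1.96 SD of the differences. A minimal sketch with hypothetical paired joint-angle readings, not the study's data:

```python
import numpy as np

# hypothetical paired joint-angle measurements in degrees
software   = np.array([30.5, 45.2, 60.1, 89.7, 120.3, 44.8, 61.0, 90.5])
goniometer = np.array([31.0, 44.5, 61.0, 90.0, 119.5, 45.5, 60.2, 91.0])

diff = software - goniometer
bias = diff.mean()                  # systematic offset between the two methods
sd = diff.std(ddof=1)
loa_low, loa_high = bias - 1.96 * sd, bias + 1.96 * sd  # 95% limits of agreement

# interchangeability is supported when the differences fall within the limits
within = bool(np.all((diff >= loa_low) & (diff <= loa_high)))
```

A Bland-Altman plot is simply diff against the pairwise means with horizontal lines at bias and the two limits; the judgment of whether the limits are clinically acceptable remains with the clinician.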
Korjus, Kristjan; Hebart, Martin N.; Vicente, Raul
2016-01-01
Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier’s generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach that we term “Cross-validation and cross-testing” improving this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do. PMID:27564393
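The baseline procedure the authors improve on (cross-validation on a training chunk to select parameters, then a single untouched test chunk to estimate generalization) can be sketched with scikit-learn. The data here are synthetic; the paper's novel cross-testing step itself is not reproduced:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# hypothetical data standing in for the electrophysiological recordings
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# cross-validation on the training chunk selects the regularization parameter C
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      {"C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X_tr, y_tr)

# the held-out test chunk gives an unbiased generalization estimate, at the cost
# of the statistical power lost to the split: the trade-off the paper addresses
test_acc = search.score(X_te, y_te)
```

The paper's contribution is a scheme for re-using the test data without biasing this final estimate, while keeping the selected parameters (here, C) interpretable.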
Bujold, M; El Sherif, R; Bush, P L; Johnson-Lafleur, J; Doray, G; Pluye, P
2018-02-01
This mixed-methods study content-validated the Information Assessment Method for parents (IAM-parent), which allows users to systematically rate and comment on online parenting information. Quantitative data and results: 22,407 IAM ratings were collected; of the initial 32 items, descriptive statistics showed that 10 had low relevance. Qualitative data and results: IAM-based comments were collected, and 20 IAM users were interviewed (maximum variation sample); the qualitative data analysis assessed the representativeness of IAM items, and identified items with problematic wording. Researchers, the program director, and Web editors integrated quantitative and qualitative results, which led to a shorter and clearer IAM-parent. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
Harizanova, Stanislava N; Mateva, Nonka G; Tarnovska, Tanya Ch
2016-12-01
Burnout syndrome is a phenomenon that has been studied globally in all types of populations. The staff in the system of correctional institutions in Bulgaria, however, is oddly left out of this tendency. There is no standardized model in Bulgaria that can be used to detect possible susceptibility to professional burnout. The methods available at present only register the irreversible changes that have already taken hold in the functioning of the individual. V. Boyko's method for burnout assessment allows clinicians to use an individual approach to patients and affords easy comparability of results with data from other psychodiagnostic instruments. Adaptation of assessment instruments to fit the specificities of a study population (linguistic, ethno-cultural, etc.) is obligatory so that the instrument can be correctly used and yield valid results. Validation is one of the most frequently used techniques to achieve this. The aim of the present study was to adapt and validate V. Boyko's burnout inventory for diagnosing burnout and assessment of the severity of the burnout syndrome in correctional officers. We conducted a pilot study with 50 officers working in the Plovdiv Regional Correction Facility by test-retest survey performed at an interval of 2 to 4 months. All participants completed the adapted questionnaire translated into Bulgarian voluntarily and anonymously. Statistical analysis was performed using SPSS v.17. We found a mild-to-strong statistically significant correlation (P < 0.01) across all subscales between the most frequently used questionnaire for assessing the burnout syndrome, the Maslach Burnout Inventory, and the tool we propose here. The high Cronbach's α coefficient (α = 0.94) and Spearman-Brown coefficient (rSB = 0.86), and the low mean between-item correlation (r = 0.30) demonstrated the instrument's good reliability and validity. 
With the validation presented here, we offer a highly reliable Bulgarian version of Boyko's method for burnout assessment and research.
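Cronbach's alpha, the internal-consistency statistic reported above, compares the sum of the item variances with the variance of the total score. A minimal sketch with a hypothetical item-response matrix, not the study's questionnaire data:

```python
import numpy as np

# hypothetical item scores: rows = respondents, columns = questionnaire items
scores = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])

k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1).sum()   # sum of per-item variances
total_var = scores.sum(axis=1).var(ddof=1)     # variance of each respondent's total score
alpha = k / (k - 1) * (1 - item_vars / total_var)
```

When items covary strongly, the total-score variance dwarfs the summed item variances and alpha approaches 1, as with the 0.94 reported above.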
Tipton, John; Hooten, Mevin B.; Goring, Simon
2017-01-01
Scientific records of temperature and precipitation have been kept for several hundred years, but for many areas, only a shorter record exists. To understand climate change, there is a need for rigorous statistical reconstructions of the paleoclimate using proxy data. Paleoclimate proxy data are often sparse, noisy, indirect measurements of the climate process of interest, making each proxy uniquely challenging to model statistically. We reconstruct spatially explicit temperature surfaces from sparse and noisy measurements recorded at historical United States military forts and other observer stations from 1820 to 1894. One common method for reconstructing the paleoclimate from proxy data is principal component regression (PCR). With PCR, one learns a statistical relationship between the paleoclimate proxy data and a set of climate observations that are used as patterns for potential reconstruction scenarios. We explore PCR in a Bayesian hierarchical framework, extending classical PCR in a variety of ways. First, we model the latent principal components probabilistically, accounting for measurement error in the observational data. Next, we extend our method to better accommodate outliers that occur in the proxy data. Finally, we explore alternatives to the truncation of lower-order principal components using different regularization techniques. One fundamental challenge in paleoclimate reconstruction efforts is the lack of out-of-sample data for predictive validation. Cross-validation is of potential value, but is computationally expensive and potentially sensitive to outliers in sparse data scenarios. To overcome the limitations that a lack of out-of-sample records presents, we test our methods using a simulation study, applying proper scoring rules including a computationally efficient approximation to leave-one-out cross-validation using the log score to validate model performance. 
The result of our analysis is a spatially explicit reconstruction of spatio-temporal temperature from a very sparse historical record.
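Classical PCR, the starting point the authors extend hierarchically, regresses the target on the scores of the leading principal components of the predictor field. A sketch on synthetic low-rank data (not the fort-record temperatures; the two-mode structure below is a hypothetical stand-in):

```python
import numpy as np

rng = np.random.default_rng(4)

# synthetic low-rank "climate" field: two dominant modes drive p stations
n, p = 60, 10
f = rng.standard_normal((n, 2))
X = f @ rng.standard_normal((2, p)) + 0.05 * rng.standard_normal((n, p))
y = 2.0 * f[:, 0] + 0.3 * rng.standard_normal(n)   # proxy-like target series

# classical PCR: project predictors onto the leading principal components, then regress
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
T = Xc @ Vt[:2].T                                  # scores on the 2 leading components
A = np.column_stack([np.ones(n), T])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
r2 = 1 - ((y - A @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

The Bayesian extensions in the paper replace the fixed scores T with latent variables carrying measurement error, and replace the hard truncation at 2 components with regularization.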
Measuring the quality of child health care at first-level facilities.
Gouws, Eleanor; Bryce, Jennifer; Pariyo, George; Armstrong Schellenberg, Joanna; Amaral, João; Habicht, Jean-Pierre
2005-08-01
Sound policy and program decisions require timely information based on valid and relevant measures. Recent findings suggest that despite the availability of effective and affordable guidelines for the management of sick children in first-level health facilities in developing countries, the quality and coverage of these services remain low. We report on the development and evaluation of a set of summary indices reflecting the quality of care received by sick children in first-level facilities. The indices were first developed through a consultative process to achieve face validity by involving technical experts and policymakers. The definition of evaluation measures for many public health programs stops at this point. We added a second phase in which standard statistical techniques were used to evaluate the content and construct validity of the indices and their reliability, drawing on data sets from the multi-country evaluation of integrated management of childhood illness (MCE) in Brazil, Tanzania and Uganda. The statistical evaluation identified important conceptual errors in the indices arising from the theory-driven expert review. The experts had combined items into inappropriate indicators resulting in summary indices that were difficult to interpret and had limited validity for program decision making. We propose a revised set of summary indices for the measurement of child health care in developing countries that is supported by both expert and statistical reviews and that led to similar programmatic insights across the three countries. We advocate increased cross-disciplinary research within public health to improve measurement approaches. Child survival policymakers, program planners and implementers can use these tools to improve their monitoring and so increase the health impact of investments in health facility care.
The French version of the Juvenile Arthritis Multidimensional Assessment Report (JAMAR).
Quartier, Pierre; Hofer, Michael; Wouters, Carine; Truong, Thi Thanh Thao; Duong, Ngoc-Phoi; Agbo-Kpati, Kokou-Placide; Uettwiller, Florence; Melki, Isabelle; Mouy, Richard; Bader-Meunier, Brigitte; Consolaro, Alessandro; Bovis, Francesca; Ruperto, Nicolino
2018-04-01
The Juvenile Arthritis Multidimensional Assessment Report (JAMAR) is a new parent/patient reported outcome measure that enables a thorough assessment of the disease status in children with juvenile idiopathic arthritis (JIA). We report the results of the cross-cultural adaptation and validation of the parent and patient versions of the JAMAR in the French language. The reading comprehension of the questionnaire was tested in 10 JIA parents and patients. Each participating centre was asked to collect demographic, clinical data and the JAMAR in 100 consecutive JIA patients or all consecutive patients seen in a 6-month period and to administer the JAMAR to 100 healthy children and their parents. The statistical validation phase explored descriptive statistics and the psychometric issues of the JAMAR: the three Likert assumptions, floor/ceiling effects, internal consistency, Cronbach's alpha, interscale correlations and construct validity (convergent and discriminant validity). A total of 100 JIA patients (23% systemic, 45% oligoarticular, 20% RF negative polyarthritis, 12% other categories) and 122 healthy children were enrolled at the paediatric rheumatology centre of the Necker Children's Hospital in Paris. Notably, none of the enrolled JIA patients was affected by psoriatic arthritis. The JAMAR components discriminated healthy subjects from JIA patients well. All JAMAR components revealed good psychometric performances. In conclusion, the French version of the JAMAR is a valid tool for the assessment of children with JIA and is suitable for use both in routine clinical practice and clinical research.
The Italian version of the Juvenile Arthritis Multidimensional Assessment Report (JAMAR).
Consolaro, Alessandro; Bovis, Francesca; Pistorio, Angela; Cimaz, Rolando; De Benedetti, Fabrizio; Miniaci, Angela; Corona, Fabrizia; Gerloni, Valeria; Martino, Silvana; Pastore, Serena; Barone, Patrizia; Pieropan, Sara; Cortis, Elisabetta; Podda, Rosa Anna; Gallizzi, Romina; Civino, Adele; Torre, Francesco La; Rigante, Donato; Consolini, Rita; Maggio, Maria Cristina; Magni-Manzoni, Silvia; Perfetti, Francesca; Filocamo, Giovanni; Toppino, Claudia; Licciardi, Francesco; Garrone, Marco; Scala, Silvia; Patrone, Elisa; Tonelli, Monica; Tani, Daniela; Ravelli, Angelo; Martini, Alberto; Ruperto, Nicolino
2018-04-01
The Juvenile Arthritis Multidimensional Assessment Report (JAMAR) is a new parent/patient reported outcome measure that enables a thorough assessment of the disease status in children with juvenile idiopathic arthritis (JIA). We report the results of the cross-cultural adaptation and validation of the parent and patient versions of the JAMAR in the Italian language. The reading comprehension of the questionnaire was tested in 10 JIA parents and patients. Each participating centre was asked to collect demographic and clinical data and the JAMAR in 100 consecutive JIA patients or all consecutive patients seen in a 6-month period and to administer the JAMAR to 100 healthy children and their parents. The statistical validation phase explored descriptive statistics and the psychometric issues of the JAMAR: the three Likert assumptions, floor/ceiling effects, internal consistency, Cronbach's alpha, interscale correlations, test-retest reliability, and construct validity (convergent and discriminant validity). A total of 1296 JIA patients (7.2% systemic, 59.5% oligoarticular, 21.4% RF negative polyarthritis, 11.9% other categories) and 100 healthy children were enrolled in 18 centres. The JAMAR components discriminated well between healthy subjects and JIA patients except for the Health Related Quality of Life (HRQoL) Psychosocial Health (PsH) subscales. All JAMAR components revealed good psychometric performances. In conclusion, the Italian version of the JAMAR is a valid tool for the assessment of children with JIA and is suitable for use both in routine clinical practice and clinical research.
The Paraguayan Spanish version of the Juvenile Arthritis Multidimensional Assessment Report (JAMAR).
Morel Ayala, Zoilo; Burgos-Vargas, Ruben; Consolaro, Alessandro; Bovis, Francesca; Ruperto, Nicolino
2018-04-01
The Juvenile Arthritis Multidimensional Assessment Report (JAMAR) is a new parent/patient reported outcome measure that enables a thorough assessment of the disease status in children with juvenile idiopathic arthritis (JIA). We report the results of the cross-cultural adaptation and validation of the parent and patient versions of the JAMAR in the Paraguayan Spanish language. The reading comprehension of the questionnaire was tested in 10 JIA parents and patients. Each participating centre was asked to collect demographic and clinical data and the JAMAR in 100 consecutive JIA patients or all consecutive patients seen in a 6-month period and to administer the JAMAR to 100 healthy children and their parents. The statistical validation phase explored descriptive statistics and the psychometric issues of the JAMAR: the three Likert assumptions, floor/ceiling effects, internal consistency, Cronbach's alpha, interscale correlations, and construct validity (convergent and discriminant validity). A total of 51 JIA patients (2% systemic, 27.4% oligoarticular, 37.2% RF negative polyarthritis, 33.4% other categories) and 100 healthy children were enrolled. The JAMAR components discriminated well between healthy subjects and JIA patients. Notably, there was no significant difference between healthy subjects and their affected peers in the school-related problem variable. All JAMAR components revealed good psychometric performances. In conclusion, the Paraguayan Spanish version of the JAMAR is a valid tool for the assessment of children with JIA and is suitable for use both in routine clinical practice and clinical research.
Husbands, Adrian; Mathieson, Alistair; Dowell, Jonathan; Cleland, Jennifer; MacKenzie, Rhoda
2014-04-23
The UK Clinical Aptitude Test (UKCAT) was designed to address issues identified with traditional methods of selection. This study aims to examine the predictive validity of the UKCAT and compare this to traditional selection methods in the senior years of medical school. This was a follow-up study of two cohorts of students from two medical schools who had previously taken part in a study examining the predictive validity of the UKCAT in first year. The sample consisted of 4th and 5th Year students who commenced their studies at the University of Aberdeen or University of Dundee medical schools in 2007. Data collected were: demographics (gender and age group), UKCAT scores; Universities and Colleges Admissions Service (UCAS) form scores; admission interview scores; Year 4 and 5 degree examination scores. Pearson's correlations were used to examine the relationships between admissions variables, examination scores, gender and age group, and to select variables for multiple linear regression analysis to predict examination scores. Ninety-nine and 89 students at Aberdeen medical school from Years 4 and 5 respectively, and 51 Year 4 students in Dundee, were included in the analysis. Neither UCAS form nor interview scores were statistically significant predictors of examination performance. Conversely, the UKCAT yielded statistically significant validity coefficients between .24 and .36 in four of five assessments investigated. Multiple regression analysis showed the UKCAT made a statistically significant unique contribution to variance in examination performance in the senior years. Results suggest the UKCAT appears to predict performance better in the later years of medical school compared to earlier years and provides modest supportive evidence for the UKCAT's role in student selection within these institutions. Further research is needed to assess the predictive validity of the UKCAT against professional and behavioural outcomes as the cohort commences working life.
Statistics of a neuron model driven by asymmetric colored noise.
Müller-Hansen, Finn; Droste, Felix; Lindner, Benjamin
2015-02-01
Irregular firing of neurons can be modeled as a stochastic process. Here we study the perfect integrate-and-fire neuron driven by dichotomous noise, a Markovian process that jumps between two states (i.e., possesses non-Gaussian statistics) and exhibits nonvanishing temporal correlations (i.e., represents a colored noise). Specifically, we consider asymmetric dichotomous noise with two different transition rates. Using a first-passage-time formulation, we derive exact expressions for the probability density and the serial correlation coefficient of the interspike interval (time interval between two subsequent neural action potentials) and the power spectrum of the spike train. Furthermore, we extend the model by including additional Gaussian white noise, and we give approximations for the interspike interval (ISI) statistics in this case. Numerical simulations are used to validate the exact analytical results for pure dichotomous noise, and to test the approximations of the ISI statistics when Gaussian white noise is included. The results may help to understand how correlations and asymmetry of noise and signals in nerve cells shape neuronal firing statistics.
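A minimal simulation sketch of this kind of model, assuming a perfect integrate-and-fire neuron with a constant base drive plus two-state (telegraph) noise with asymmetric switching rates. All parameter values here are illustrative, not taken from the paper:

```python
import random

def simulate_isi(mu=1.0, a_plus=0.5, a_minus=-0.3,
                 k_plus=2.0, k_minus=1.0,
                 threshold=1.0, dt=1e-3, n_spikes=500, seed=1):
    """Euler simulation of a perfect integrate-and-fire neuron driven by
    asymmetric dichotomous noise; returns the list of interspike intervals."""
    rng = random.Random(seed)
    v, state = 0.0, a_plus              # start in the '+' noise state
    t, last_spike, isis = 0.0, 0.0, []
    while len(isis) < n_spikes:
        # switch the noise state with rate k_plus (from +) or k_minus (from -)
        rate = k_plus if state == a_plus else k_minus
        if rng.random() < rate * dt:
            state = a_minus if state == a_plus else a_plus
        v += (mu + state) * dt          # perfect integration: dv/dt = mu + eta(t)
        t += dt
        if v >= threshold:              # spike: record the ISI and reset
            isis.append(t - last_spike)
            last_spike, v = t, 0.0
    return isis
```

With these parameters the noise spends a fraction k_minus/(k_plus + k_minus) of the time in the '+' state, so the mean drift stays positive and the empirical mean ISI is close to threshold divided by the mean drift; the simulated ISI histogram and serial correlations are what one would compare against the exact first-passage-time expressions.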
Validating an Air Traffic Management Concept of Operation Using Statistical Modeling
NASA Technical Reports Server (NTRS)
He, Yuning; Davies, Misty Dawn
2013-01-01
Validating a concept of operation for a complex, safety-critical system (like the National Airspace System) is challenging because of the high dimensionality of the controllable parameters and the infinite number of states of the system. In this paper, we use statistical modeling techniques to explore the behavior of a conflict detection and resolution algorithm designed for the terminal airspace. These techniques predict the robustness of the system simulation to both nominal and off-nominal behaviors within the overall airspace. They also can be used to evaluate the output of the simulation against recorded airspace data. Additionally, the techniques carry with them a mathematical value of the worth of each prediction: a statistical uncertainty for any robustness estimate. Uncertainty Quantification (UQ) is the process of quantitative characterization and ultimately a reduction of uncertainties in complex systems. UQ is important for understanding the influence of uncertainties on the behavior of a system and therefore is valuable for design, analysis, and verification and validation. In this paper, we apply advanced statistical modeling methodologies and techniques on an advanced air traffic management system, namely the Terminal Tactical Separation Assured Flight Environment (T-TSAFE). We show initial results for a parameter analysis and safety boundary (envelope) detection in the high-dimensional parameter space. For our boundary analysis, we developed a new sequential approach based upon the design of computer experiments, allowing us to incorporate knowledge from domain experts into our modeling and to determine the most likely boundary shapes and their parameters. We carried out the analysis on system parameters and describe an initial approach that will allow us to include time-series inputs, such as the radar track data, into the analysis.
NASA Astrophysics Data System (ADS)
Sohn, Soo-Jin; Min, Young-Mi; Lee, June-Yi; Tam, Chi-Yung; Kang, In-Sik; Wang, Bin; Ahn, Joong-Bae; Yamagata, Toshio
2012-02-01
The performance of the probabilistic multimodel prediction (PMMP) system of the APEC Climate Center (APCC) in predicting the Asian summer monsoon (ASM) precipitation at a four-month lead (with February initial condition) was compared with that of a statistical model using hindcast data for 1983-2005 and real-time forecasts for 2006-2011. Particular attention was paid to probabilistic precipitation forecasts for the boreal summer after the mature phase of El Niño and Southern Oscillation (ENSO). Taking into account the fact that coupled models' skill for boreal spring and summer precipitation mainly comes from their ability to capture ENSO teleconnection, we developed the statistical model using linear regression with the preceding winter ENSO condition as the predictor. Our results reveal several advantages and disadvantages in both forecast systems. First, the PMMP appears to have higher skills for both above- and below-normal categories in the six-year real-time forecast period, whereas the cross-validated statistical model has higher skills during the 23-year hindcast period. This implies that the cross-validated statistical skill may be overestimated. Second, the PMMP is the better tool for capturing atypical ENSO (or non-canonical ENSO related) teleconnection, which has affected the ASM precipitation during the early 1990s and in the recent decade. Third, the statistical model is more sensitive to the ENSO phase and has an advantage in predicting the ASM precipitation after the mature phase of La Niña.
Atmospheric convective velocities and the Fourier phase spectrum
NASA Technical Reports Server (NTRS)
Cliff, W. C.
1974-01-01
The relationship between convective velocity and the Fourier phase spectrum of the cross correlation is developed. By examining the convective velocity as a function of frequency, one may determine if Taylor's conversion from time statistics to space statistics is valid. It is felt that the high shear regions of the atmospheric boundary layer need to be explored to determine the validity of the use of Taylor's hypothesis for this region.
Ortuño-Sierra, Javier; Aritio-Solana, Rebeca; Inchausti, Félix; Chocarro de Luis, Edurne; Lucas Molina, Beatriz; Pérez de Albéniz, Alicia; Fonseca-Pedrero, Eduardo
2017-01-01
The main purpose of the present study was to assess the depressive symptomatology and to gather new validity evidences of the Reynolds Depression Scale-Short form (RADS-SF) in a representative sample of youths. The sample consisted of 2914 adolescents with a mean age of 15.85 years (SD = 1.68). We calculated the descriptive statistics and internal consistency of the RADS-SF scores. Also, confirmatory factor analyses (CFAs) at the item level and successive multigroup CFAs to test measurement invariance were conducted. Latent mean differences across gender and educational level groups were estimated, and finally, we studied the sources of validity evidences with other external variables. The level of internal consistency of the RADS-SF Total score by means of Ordinal alpha was .89. Results from CFAs showed that the one-dimensional model displayed appropriate goodness-of-fit indices, with a CFI value over .95 and an RMSEA value under .08. In addition, the results support the strong measurement invariance of the RADS-SF scores across gender and age. When latent means were compared, statistically significant differences were found by gender and age. Females scored 0.347 higher than males on the Depression latent variable, whereas older adolescents scored 0.111 higher than the younger group. In addition, the RADS-SF score was associated with the RADS scores. The results suggest that the RADS-SF could be used as an efficient screening test to assess self-reported depressive symptoms in adolescents from the general population.
FLiGS Score: A New Method of Outcome Assessment for Lip Carcinoma–Treated Patients
Grassi, Rita; Toia, Francesca; Di Rosa, Luigi; Cordova, Adriana
2015-01-01
Background: Lip cancer and its treatment have considerable functional and cosmetic effects with resultant nutritional and physical detriments. As we continue to investigate new treatment regimens, we are simultaneously required to assess postoperative outcomes to design interventions that lessen the adverse impact of this disease process. We wish to introduce the Functional Lip Glasgow Scale (FLiGS) score as a new method of outcome assessment to measure the effect of lip cancer and its treatment on patients’ daily functioning. Methods: Fifty patients affected by lip squamous cell carcinoma were recruited between 2009 and 2013. Patients were asked to fill in the FLiGS questionnaire before surgery and 1 month, 6 months, and 1 year after surgery. The subscores were used to calculate a total FLiGS score of global oral disability. Statistical analysis was performed to test validity and reliability. Results: FLiGS scores improved significantly from preoperative to 12-month postoperative values (P < 0.001). Statistical evidence of validity was provided through the Spearman correlation coefficient (rs), which was >0.30 for all surveys, with P < 0.001. FLiGS score reliability was shown through examination of internal consistency and test-retest reliability. Conclusions: FLiGS score is a simple way of assessing functional impairment related to lip cancer before and after surgery; it is sensitive, valid, reliable, and clinically relevant: it provides useful information to orient the physician in the postoperative management and in the rehabilitation program. PMID:26034652
External validation of a Cox prognostic model: principles and methods
2013-01-01
Background A prognostic model should not enter clinical practice unless it has been demonstrated that it performs a useful role. External validation denotes evaluation of model performance in a sample independent of that used to develop the model. Unlike for logistic regression models, external validation of Cox models is sparsely treated in the literature. Successful validation of a model means achieving satisfactory discrimination and calibration (prediction accuracy) in the validation sample. Validating Cox models is not straightforward because event probabilities are estimated relative to an unspecified baseline function. Methods We describe statistical approaches to external validation of a published Cox model according to the level of published information, specifically (1) the prognostic index only, (2) the prognostic index together with Kaplan-Meier curves for risk groups, and (3) the first two plus the baseline survival curve (the estimated survival function at the mean prognostic index across the sample). The most challenging task, requiring level 3 information, is assessing calibration, for which we suggest a method of approximating the baseline survival function. Results We apply the methods to two comparable datasets in primary breast cancer, treating one as derivation and the other as validation sample. Results are presented for discrimination and calibration. We demonstrate plots of survival probabilities that can assist model evaluation. Conclusions Our validation methods are applicable to a wide range of prognostic studies and provide researchers with a toolkit for external validation of a published Cox model. PMID:23496923
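Discrimination in external validation of a Cox model is commonly summarised by Harrell's concordance index over the validation sample. The following is a simplified sketch, not the authors' implementation: it ignores tied event times, which need extra care in a real analysis.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: the fraction of usable pairs in which the subject
    with the shorter observed event time has the higher predicted risk.
    times: follow-up times; events: 1 = event, 0 = censored; risk_scores:
    higher score = predicted higher risk (e.g. the Cox prognostic index)."""
    concordant = tied = usable = 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            # order the pair so subject `a` has the earlier observed time
            a, b = (i, j) if times[i] < times[j] else (j, i)
            if not events[a]:          # earlier subject censored: unusable pair
                continue
            usable += 1
            if risk_scores[a] > risk_scores[b]:
                concordant += 1
            elif risk_scores[a] == risk_scores[b]:
                tied += 1              # tied risks count half
    return (concordant + 0.5 * tied) / usable
```

A C-index of 0.5 corresponds to no discrimination and 1.0 to perfect ranking; in the level-1 scenario described above (prognostic index only), this is essentially all the discrimination assessment one can do.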
Non-statistical effects in bond fission reactions of 1,2-difluoroethane
NASA Astrophysics Data System (ADS)
Schranz, Harold W.; Raff, Lionel M.; Thompson, Donald L.
1991-08-01
A microcanonical, classical variational transition-state theory based on the use of the efficient microcanonical sampling (EMS) procedure is applied to simple bond fission in 1,2-difluoroethane. Comparison is made with results of trajectory calculations performed on the same global potential-energy surface. Agreement between the statistical theory and trajectory results for C-C, C-F, and C-H bond fissions is poor, with differences as large as a factor of 125. Most importantly, at the lower energy studied, 6.0 eV, the statistical calculations predict considerably slower rates than those computed from trajectories. We conclude from these results that the statistical assumptions inherent in the transition-state theory method are not valid for 1,2-difluoroethane in spite of the fact that the total intramolecular energy transfer rate out of C-H and C-C normal and local modes is large relative to the bond fission rates. The IVR rate is not globally rapid and the trajectories do not access all of the energetically available phase space uniformly on the timescale of the reactions.
USDA-ARS?s Scientific Manuscript database
A qualitative botanical identification method (BIM) is an analytical procedure which returns a binary result (1 = Identified, 0 = Not Identified). A BIM may be used by a buyer, manufacturer, or regulator to determine whether a botanical material being tested is the same as the target (desired) mate...
Federal Register 2010, 2011, 2012, 2013, 2014
2011-03-28
... experimental design requires this quantity of salmon to ensure statistically valid results. The applicant also... encounters sufficient concentrations of salmon and pollock for meeting the experimental design. Groundfish... of the groundfish harvested is expected to be pollock. The experimental design requires this quantity...
A Psychometric Investigation of the Marlowe-Crowne Social Desirability Scale Using Rasch Measurement
ERIC Educational Resources Information Center
Seol, Hyunsoo
2007-01-01
The author used Rasch measurement to examine the reliability and validity of 382 Korean university students' scores on the Marlowe-Crowne Social Desirability Scale (MCSDS; D. P. Crowne and D. Marlowe, 1960). Results revealed that item-fit statistics and principal component analysis with standardized residuals provide evidence of MCSDS'…
NASA Astrophysics Data System (ADS)
Ballari, D.; Castro, E.; Campozano, L.
2016-06-01
Precipitation monitoring is of utmost importance for water resource management. However, in regions of complex terrain such as Ecuador, the high spatio-temporal precipitation variability and the scarcity of rain gauges make it difficult to obtain accurate estimates of precipitation. Remotely sensed precipitation estimates, such as the Multi-satellite Precipitation Analysis TRMM, can cope with this problem after a validation process, which must be representative in space and time. In this work we validate monthly estimates from TRMM 3B43 satellite precipitation (0.25° x 0.25° resolution) using ground data from 14 rain gauges in Ecuador. The stations are located in the three most differentiated regions of the country: the Pacific coastal plains, the Andean highlands, and the Amazon rainforest. Time series of imagery and rain gauge data from 1998 to 2010 were compared using statistical error metrics such as bias, root mean square error, and Pearson correlation, and detection indexes such as probability of detection, equitable threat score, false alarm rate and frequency bias index. The results showed that precipitation seasonality is well represented and that TRMM 3B43 acceptably estimates the monthly precipitation in the three regions of the country. According to both statistical error metrics and detection indexes, the coastal and Amazon regions are better estimated quantitatively than the Andean highlands. Additionally, estimates were found to be better for light precipitation rates. The present validation of TRMM 3B43 provides important results to support further studies on calibration and bias correction of precipitation in ungauged watershed basins.
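The error metrics and detection indexes named above can be sketched as follows. This is an illustration, not the study's code; the detection indexes assume the monthly series have already been dichotomised into rain/no-rain events with some hypothetical threshold, yielding a 2x2 contingency table.

```python
import math

def error_metrics(sat, gauge):
    """Continuous comparison statistics: bias, RMSE, Pearson correlation."""
    n = len(sat)
    bias = sum(s - g for s, g in zip(sat, gauge)) / n
    rmse = math.sqrt(sum((s - g) ** 2 for s, g in zip(sat, gauge)) / n)
    ms, mg = sum(sat) / n, sum(gauge) / n
    cov = sum((s - ms) * (g - mg) for s, g in zip(sat, gauge))
    r = cov / math.sqrt(sum((s - ms) ** 2 for s in sat) *
                        sum((g - mg) ** 2 for g in gauge))
    return bias, rmse, r

def detection_indexes(hits, misses, false_alarms, correct_negatives):
    """Categorical scores from a 2x2 rain/no-rain contingency table."""
    pod = hits / (hits + misses)                   # probability of detection
    far = false_alarms / (hits + false_alarms)     # false alarm ratio
    fbi = (hits + false_alarms) / (hits + misses)  # frequency bias index
    total = hits + misses + false_alarms + correct_negatives
    h_rand = (hits + misses) * (hits + false_alarms) / total  # chance hits
    ets = (hits - h_rand) / (hits + misses + false_alarms - h_rand)
    return pod, far, fbi, ets
```

A perfect product would give bias 0, RMSE 0, r = 1, POD = 1, FAR = 0, FBI = 1 and ETS = 1; the equitable threat score discounts hits expected by chance, which is why it is preferred over raw hit rates for comparing regions with different rain frequencies.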
Mitchell, Travis D.; Urli, Kristina E.; Breitenbach, Jacques; Yelverton, Chris
2007-01-01
Abstract Objective This study aimed to evaluate the validity of the sacral base pressure test in diagnosing sacroiliac joint dysfunction. It also determined the predictive powers of the test in determining which type of sacroiliac joint dysfunction was present. Methods This was a double-blind experimental study with 62 participants. The results from the sacral base pressure test were compared against a cluster of previously validated tests of sacroiliac joint dysfunction to determine its validity and predictive powers. The external rotation of the feet, occurring during the sacral base pressure test, was measured using a digital inclinometer. Results There was no statistically significant difference in the results of the sacral base pressure test between the types of sacroiliac joint dysfunction. In terms of validity, the sacral base pressure test was useful in identifying positive cases of sacroiliac joint dysfunction. It was fairly helpful in correctly diagnosing patients with negative test results; however, it had only “slight” agreement with the diagnosis under κ interpretation. Conclusions In this study, the sacral base pressure test was not a valid test for determining the presence of sacroiliac joint dysfunction or the type of dysfunction present. Further research comparing the agreement of the sacral base pressure test or other sacroiliac joint dysfunction tests with a criterion standard of diagnosis is necessary. PMID:19674694
3D Texture Analysis in Renal Cell Carcinoma Tissue Image Grading
Cho, Nam-Hoon; Choi, Heung-Kook
2014-01-01
One of the most significant processes in cancer cell and tissue image analysis is the efficient extraction of features for grading purposes. This research applied two types of three-dimensional texture analysis methods to the extraction of feature values from renal cell carcinoma tissue images, and then evaluated the validity of the methods statistically through grade classification. First, we used a confocal laser scanning microscope to obtain image slices of four grades of renal cell carcinoma, which were then reconstructed into 3D volumes. Next, we extracted quantitative values using a 3D gray-level co-occurrence matrix (GLCM) and a 3D wavelet based on two types of basis functions. To evaluate their validity, we predefined 6 different statistical classifiers and applied these to the extracted feature sets. In the grade classification results, 3D Haar wavelet texture features combined with principal component analysis showed the best discrimination results. Classification using 3D wavelet texture features was significantly better than 3D GLCM, suggesting that the former has potential for use in a computer-based grading system. PMID:25371701
The Earthquake‐Source Inversion Validation (SIV) Project
Mai, P. Martin; Schorlemmer, Danijel; Page, Morgan T.; Ampuero, Jean-Paul; Asano, Kimiyuki; Causse, Mathieu; Custodio, Susana; Fan, Wenyuan; Festa, Gaetano; Galis, Martin; Gallovic, Frantisek; Imperatori, Walter; Käser, Martin; Malytskyy, Dmytro; Okuwaki, Ryo; Pollitz, Fred; Passone, Luca; Razafindrakoto, Hoby N. T.; Sekiguchi, Haruko; Song, Seok Goo; Somala, Surendra N.; Thingbaijam, Kiran K. S.; Twardzik, Cedric; van Driel, Martin; Vyas, Jagdish C.; Wang, Rongjiang; Yagi, Yuji; Zielke, Olaf
2016-01-01
Finite‐fault earthquake source inversions infer the (time‐dependent) displacement on the rupture surface from geophysical data. The resulting earthquake source models document the complexity of the rupture process. However, multiple source models for the same earthquake, obtained by different research teams, often exhibit remarkable dissimilarities. To address the uncertainties in earthquake‐source inversion methods and to understand strengths and weaknesses of the various approaches used, the Source Inversion Validation (SIV) project conducts a set of forward‐modeling exercises and inversion benchmarks. In this article, we describe the SIV strategy, the initial benchmarks, and current SIV results. Furthermore, we apply statistical tools for quantitative waveform comparison and for investigating source‐model (dis)similarities that enable us to rank the solutions, and to identify particularly promising source inversion approaches. All SIV exercises (with related data and descriptions) and statistical comparison tools are available via an online collaboration platform, and we encourage source modelers to use the SIV benchmarks for developing and testing new methods. We envision that the SIV efforts will lead to new developments for tackling the earthquake‐source imaging problem.
Bayesian statistical ionospheric tomography improved by incorporating ionosonde measurements
NASA Astrophysics Data System (ADS)
Norberg, Johannes; Virtanen, Ilkka I.; Roininen, Lassi; Vierinen, Juha; Orispää, Mikko; Kauristie, Kirsti; Lehtinen, Markku S.
2016-04-01
We validate two-dimensional ionospheric tomography reconstructions against EISCAT incoherent scatter radar measurements. Our tomography method is based on Bayesian statistical inversion with prior distribution given by its mean and covariance. We employ ionosonde measurements for the choice of the prior mean and covariance parameters and use the Gaussian Markov random fields as a sparse matrix approximation for the numerical computations. This results in a computationally efficient tomographic inversion algorithm with clear probabilistic interpretation. We demonstrate how this method works with simultaneous beacon satellite and ionosonde measurements obtained in northern Scandinavia. The performance is compared with results obtained with a zero-mean prior and with the prior mean taken from the International Reference Ionosphere 2007 model. In validating the results, we use EISCAT ultra-high-frequency incoherent scatter radar measurements as the ground truth for the ionization profile shape. We find that in comparison to the alternative prior information sources, ionosonde measurements improve the reconstruction by adding accurate information about the absolute value and the altitude distribution of electron density. With an ionosonde at continuous disposal, the presented method enhances stand-alone near-real-time ionospheric tomography for the given conditions significantly.
Christensen, Karl Bang; Makransky, Guido; Horton, Mike
2016-01-01
The assumption of local independence is central to all item response theory (IRT) models. Violations can lead to inflated estimates of reliability and problems with construct validity. For the most widely used fit statistic Q3, there are currently no well-documented suggestions of the critical values which should be used to indicate local dependence (LD), and for this reason, a variety of arbitrary rules of thumb are used. In this study, an empirical data example and Monte Carlo simulation were used to investigate the different factors that can influence the null distribution of residual correlations, with the objective of proposing guidelines that researchers and practitioners can follow when making decisions about LD during scale development and validation. We recommend that a parametric bootstrapping procedure be implemented in each separate situation to obtain the critical value of LD applicable to the data set, and we provide example critical values for a number of data structures. The results show that for the Q3 fit statistic, no single critical value is appropriate for all situations, as the percentiles in the empirical null distribution are influenced by the number of items, the sample size, and the number of response categories. Furthermore, the results show that LD should be considered relative to the average observed residual correlation, rather than to a uniform value, as this results in more stable percentiles for the null distribution of an adjusted fit statistic. PMID:29881087
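A bare-bones version of the parametric bootstrap idea, with iid standard normals standing in for residuals simulated under local independence. This is a simplifying assumption for illustration: a real application would simulate item responses from the fitted Rasch model and recompute the model residuals in each replicate. The statistic is the adjusted form mentioned above, the maximum pairwise residual correlation minus the average one.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def q3_critical_value(n_persons, n_items, n_boot=200, alpha=0.05, seed=7):
    """Parametric-bootstrap sketch of a data-set-specific LD cutoff:
    simulate residual columns under independence, record
    max(correlation) - mean(correlation), take the (1 - alpha) percentile."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        cols = [[rng.gauss(0, 1) for _ in range(n_persons)]
                for _ in range(n_items)]
        rs = [pearson(cols[i], cols[j])
              for i in range(n_items) for j in range(i + 1, n_items)]
        stats.append(max(rs) - sum(rs) / len(rs))
    stats.sort()
    return stats[min(int((1 - alpha) * n_boot), n_boot - 1)]
```

The returned cutoff shrinks as the sample size grows and changes with the number of items, which is exactly why the study argues against a single universal rule of thumb.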
Soo, Danielle H E; Pendharkar, Sayali A; Jivanji, Chirag J; Gillies, Nicola A; Windsor, John A; Petrov, Maxim S
2017-10-01
Approximately 40% of patients develop abnormal glucose metabolism after a single episode of acute pancreatitis. This study aimed to develop and validate a prediabetes self-assessment screening score for patients after acute pancreatitis. Data from non-overlapping training (n=82) and validation (n=80) cohorts were analysed. Univariate logistic and linear regression identified variables associated with prediabetes after acute pancreatitis. Multivariate logistic regression developed the score, ranging from 0 to 215. The area under the receiver-operating characteristic curve (AUROC), Hosmer-Lemeshow χ² statistic, and calibration plots were used to assess model discrimination and calibration. The developed score was validated using data from the validation cohort. The score had an AUROC of 0.88 (95% CI, 0.80-0.97) and Hosmer-Lemeshow χ² statistic of 5.75 (p=0.676). Patients with a score of ≥75 had a 94.1% probability of having prediabetes, and were 29 times more likely to have prediabetes than those with a score of <75. The AUROC in the validation cohort was 0.81 (95% CI, 0.70-0.92) and the Hosmer-Lemeshow χ² statistic was 5.50 (p=0.599). Model calibration of the score showed good calibration in both cohorts. The developed and validated score, called PERSEUS, is the first instrument to identify individuals who are at high risk of developing abnormal glucose metabolism following an episode of acute pancreatitis. Copyright © 2017 Editrice Gastroenterologica Italiana S.r.l. Published by Elsevier Ltd. All rights reserved.
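The AUROC used here for discrimination equals the Mann-Whitney probability that a randomly chosen prediabetic patient receives a higher score than a randomly chosen non-prediabetic one. A minimal sketch (the labels and scores below are made up for illustration, not study data):

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic; tied scores get half credit.
    labels: 1 = case, 0 = control; scores: predicted risk scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A value of 0.5 means the score ranks cases no better than chance, and 1.0 means perfect separation; calibration (the Hosmer-Lemeshow side of the assessment) must be checked separately, since a well-discriminating score can still be systematically miscalibrated.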
Assessing the significance of pedobarographic signals using random field theory.
Pataky, Todd C
2008-08-07
Traditional pedobarographic statistical analyses are conducted over discrete regions. Recent studies have demonstrated that regionalization can corrupt pedobarographic field data through conflation when arbitrary dividing lines inappropriately delineate smooth field processes. An alternative is to register images such that homologous structures optimally overlap and then conduct statistical tests at each pixel to generate statistical parametric maps (SPMs). The significance of SPM processes may be assessed within the framework of random field theory (RFT). RFT is ideally suited to pedobarographic image analysis because its fundamental data unit is a lattice sampling of a smooth and continuous spatial field. To correct for the vast number of multiple comparisons inherent in such data, recent pedobarographic studies have employed a Bonferroni correction to retain a constant family-wise error rate. This approach unfortunately neglects the spatial correlation of neighbouring pixels, so provides an overly conservative (albeit valid) statistical threshold. RFT generally relaxes the threshold depending on field smoothness and on the geometry of the search area, but it also provides a framework for assigning p values to suprathreshold clusters based on their spatial extent. The current paper provides an overview of basic RFT concepts and uses simulated and experimental data to validate both RFT-relevant field smoothness estimations and RFT predictions regarding the topological characteristics of random pedobarographic fields. Finally, previously published experimental data are re-analysed using RFT inference procedures to demonstrate how RFT yields easily understandable statistical results that may be incorporated into routine clinical and laboratory analyses.
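To illustrate why RFT generally relaxes the threshold relative to Bonferroni for smooth fields, the following sketch compares a Bonferroni z threshold over pixels with an RFT threshold derived from the expected Euler characteristic of a 2D Gaussian field. This uses the standard high-threshold approximation, and the resel and pixel counts in the test are hypothetical:

```python
import math

def rft_z_threshold(resels, alpha=0.05):
    """Approximate familywise z threshold for a smooth 2D Gaussian field:
    solve E[Euler characteristic] = alpha, using the high-threshold
    approximation E[EC] = R * 4*ln(2) / (2*pi)^(3/2) * z * exp(-z^2/2)."""
    def expected_ec(z):
        return (resels * 4 * math.log(2) / (2 * math.pi) ** 1.5
                * z * math.exp(-z * z / 2))
    lo, hi = 1.0, 10.0            # E[EC] is decreasing in z on this interval
    for _ in range(60):           # bisection
        mid = (lo + hi) / 2
        if expected_ec(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def bonferroni_z_threshold(n_pixels, alpha=0.05):
    """Per-pixel z threshold for a one-sided Bonferroni-corrected test."""
    p = alpha / n_pixels
    lo, hi = 0.0, 10.0
    for _ in range(60):           # invert the normal upper-tail by bisection
        mid = (lo + hi) / 2
        if 0.5 * math.erfc(mid / math.sqrt(2)) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For a field of, say, 10,000 pixels whose smoothness reduces it to about 100 resels, the RFT threshold sits well below the Bonferroni one, reflecting that neighbouring pixels are correlated and the effective number of independent tests is far smaller than the pixel count.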
Bjerregaard, Peter; Becker, Ulrik
2013-01-01
Questionnaires are widely used to obtain information on health-related behaviour, and they are more often than not the only method that can be used to assess the distribution of behaviour in subgroups of the population. No validation studies of reported consumption of tobacco or alcohol have been published from circumpolar indigenous communities. The purpose of the study is to compare information on the consumption of tobacco and alcohol obtained from 3 population surveys in Greenland with import statistics. Estimates of consumption of cigarettes and alcohol using several different survey instruments in cross-sectional population studies from 1993-1994, 1999-2001 and 2005-2010 were compared with import statistics from the same years. For cigarettes, survey results accounted for virtually the total import. Alcohol consumption was significantly under-reported, with reporting completeness ranging from 40% to 51% for different estimates of habitual weekly consumption in the 3 study periods. Including an estimate of binge drinking increased the estimated total consumption to 78% of the import. Compared with import statistics, questionnaire-based population surveys capture the consumption of cigarettes well in Greenland. Consumption of alcohol is under-reported, but asking about binge episodes in addition to the usual intake considerably increased the reported intake in this population and made it more in agreement with import statistics. It is unknown to what extent these findings at the population level can be extrapolated to population subgroups.
Baqué, Michèle; Amendt, Jens
2013-01-01
Developmental data of juvenile blow flies (Diptera: Calliphoridae) are typically used to calculate the age of immature stages found on or around a corpse and thus to estimate a minimum post-mortem interval (PMI(min)). However, many of those data sets do not take into account that immature blow flies grow in a non-linear fashion. Linear models do not provide sufficiently reliable age estimates and may even lead to an erroneous determination of the PMI(min). In line with the Daubert standard and the need for improvements in forensic science, new statistical tools such as smoothing methods and mixed models allow the modelling of non-linear relationships and expand the field of statistical analyses. The present study introduces the background and application of these statistical techniques by analysing a model which describes the development of the forensically important blow fly Calliphora vicina at different temperatures. The comparison of three statistical methods (linear regression, generalised additive modelling and generalised additive mixed modelling) clearly demonstrates that only the latter provided regression parameters that reflect the data adequately. We focus explicitly on both the exploration of the data--to assure their quality and to show the importance of checking them carefully prior to conducting the statistical tests--and the validation of the resulting models. Hence, we present a common method for evaluating and testing forensic entomological data sets using, for the first time, generalised additive mixed models.
Using entropy measures to characterize human locomotion.
Leverick, Graham; Szturm, Tony; Wu, Christine Q
2014-12-01
Entropy measures have been widely used to quantify the complexity of theoretical and experimental dynamical systems. In this paper, the value of using entropy measures to characterize human locomotion is demonstrated based on their construct validity, predictive validity in a simple model of human walking and convergent validity in an experimental study. Results show that four of the five considered entropy measures increase meaningfully with the increased probability of falling in a simple passive bipedal walker model. The same four entropy measures also experienced statistically significant increases in response to increasing age and gait impairment caused by cognitive interference in an experimental study. Of the considered entropy measures, the proposed quantized dynamical entropy (QDE) and quantization-based approximation of sample entropy (QASE) offered the best combination of sensitivity to changes in gait dynamics and computational efficiency. Based on these results, entropy appears to be a viable candidate for assessing the stability of human locomotion.
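Sample entropy, one of the measures compared above, admits a compact definition: SampEn = -ln(A/B), where B counts pairs of length-m templates matching within a tolerance and A counts pairs of length-(m+1) templates. The sketch below is a simplified textbook form with toy signals, not the paper's QDE or QASE variants:

```python
# Simplified sample-entropy sketch: a regular signal should yield lower
# entropy than an irregular one. Toy signals only; real gait series are
# longer and the paper's quantization-based measures differ in detail.
import math
import random
import statistics

def _matches(x, m, tol):
    # Count template pairs (length m) within Chebyshev distance tol.
    t = [x[i:i + m] for i in range(len(x) - m + 1)]
    return sum(
        1
        for i in range(len(t))
        for j in range(i + 1, len(t))
        if max(abs(a - b) for a, b in zip(t[i], t[j])) <= tol
    )

def sample_entropy(x, m=2, r=0.2):
    tol = r * statistics.pstdev(x)      # tolerance scaled by signal spread
    b = _matches(x, m, tol)
    a = _matches(x, m + 1, tol)
    return -math.log(a / b) if a and b else float("inf")

random.seed(0)
periodic = [0.0, 1.0] * 50                     # perfectly regular signal
noisy = [random.random() for _ in range(100)]  # irregular signal
print(sample_entropy(periodic) < sample_entropy(noisy))  # True
```

The ordering, lower entropy for the regular signal, is the property the paper exploits to track gait impairment.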
Validation of Physics Standardized Test Items
NASA Astrophysics Data System (ADS)
Marshall, Jill
2008-10-01
The Texas Physics Assessment Team (TPAT) examined the Texas Assessment of Knowledge and Skills (TAKS) to determine whether it is a valid indicator of physics preparation for future course work and employment, and of the knowledge and skills needed to act as an informed citizen in a technological society. We categorized science items from the 2003 and 2004 10th and 11th grade TAKS by content area(s) covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5000 student sample of item-level results from the 2004 11th grade exam using standard statistical methods employed by test developers (factor analysis and Item Response Theory). Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation and we make recommendations for increasing the validity of standardized physics testing.
Endpoints and surrogate endpoints in colorectal cancer: a review of recent developments.
Piedbois, Pascal; Buyse, Marc
2008-07-01
The purpose of this review is to discuss recently published work on endpoints for early and advanced colorectal cancer, as well as the statistical approaches used to validate surrogate endpoints. Most attempts to validate surrogate endpoints have estimated the correlation between the surrogate and the true endpoint, and between the treatment effects on these endpoints. The correlation approach has made it possible to validate disease-free survival and progression-free survival as acceptable surrogates for overall survival in early and advanced disease, respectively. The search for surrogate endpoints will intensify over the coming years. In parallel, efforts to either standardize or extend the endpoints or both will improve the reliability and relevance of clinical trial results.
Feu, Sebastián; Ibáñez, Sergio José; Graça, Amândio; Sampaio, Jaime
2007-11-01
The purpose of this study was to develop a questionnaire to investigate volleyball coaches' orientations toward the coaching process. The study was preceded by four developmental stages in order to improve user understanding, validate the content, and refine the psychometric properties of the instrument. Participants for the reliability and validity study were 334 Spanish volleyball team coaches, 86.5% men and 13.2% women. The following 6 factors emerged from the exploratory factor analysis: team-work orientation, technological orientation, innovative orientation, dialogue orientation, directive orientation, and social climate orientation. Statistical results indicated that the instrument produced reliable and valid scores in all the obtained factors (α > .70), showing that this questionnaire is a useful tool to examine coaches' orientations towards coaching.
A Framework for Assessing High School Students' Statistical Reasoning.
Chan, Shiau Wei; Ismail, Zaleha; Sumintono, Bambang
2016-01-01
Based on a synthesis of literature, earlier studies, analyses and observations on high school students, this study developed an initial framework for assessing students' statistical reasoning about descriptive statistics. Framework descriptors were established across five levels of statistical reasoning and four key constructs. The former consisted of idiosyncratic reasoning, verbal reasoning, transitional reasoning, procedural reasoning, and integrated process reasoning. The latter include describing data, organizing and reducing data, representing data, and analyzing and interpreting data. In contrast to earlier studies, this initial framework formulated a complete and coherent statistical reasoning framework. A statistical reasoning assessment tool was then constructed from this initial framework. The tool was administered to 10 tenth-grade students in a task-based interview. The initial framework was refined, and the statistical reasoning assessment tool was revised. The ten students then participated in the second task-based interview, and the data obtained were used to validate the framework. The findings showed that the students' statistical reasoning levels were consistent across the four constructs, and this result confirmed the framework's cohesion. Developed to contribute to statistics education, this newly developed statistical reasoning framework provides a guide for planning learning goals and designing instruction and assessments.
Statistical Validation for Clinical Measures: Repeatability and Agreement of Kinect™-Based Software
Tello, Emanuel; Rodrigo, Alejandro; Valentinuzzi, Max E.
2018-01-01
Background The rehabilitation process is a fundamental stage for recovery of people's capabilities. However, the evaluation of the process is performed by physiatrists and medical doctors, mostly based on their observations, that is, a subjective appreciation of the patient's evolution. This paper proposes a platform for tracking the movement of an individual's upper limb using Kinect sensor(s), to be applied to the patient during the rehabilitation process. The main contribution is the development of quantifying software and the statistical validation of its performance, repeatability, and clinical use in the rehabilitation process. Methods The software determines joint angles and upper limb trajectories for the construction of a specific rehabilitation protocol and quantifies the treatment evolution. In turn, the information is presented via a graphical interface that allows the recording, storage, and report of the patient's data. For clinical purposes, the software information is statistically validated with three different methodologies, comparing the measures with a goniometer in terms of agreement and repeatability. Results The agreement of joint angles measured with the proposed software and goniometer is evaluated with Bland-Altman plots; all measurements fell well within the limits of agreement, meaning interchangeability of both techniques. Additionally, the results of Bland-Altman analysis of repeatability show 95% confidence. Finally, the physiotherapists' qualitative assessment shows encouraging results for clinical use. Conclusion The main conclusion is that the software is capable of offering a clinical history of the patient and is useful for quantification of the rehabilitation success. The simplicity, low cost, and visualization possibilities enhance the use of the Kinect-based software for rehabilitation and other applications, and the expert's opinion endorses the choice of our approach for clinical practice.
Comparison of the new measurement technique with established goniometric methods determines that the proposed software agrees sufficiently to be used interchangeably. PMID:29750166
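The Bland-Altman analysis behind the agreement claim reduces to a bias and 95% limits of agreement computed from paired differences. A sketch with hypothetical joint-angle pairs in degrees, not the study's measurements:

```python
# Bland-Altman bias and 95% limits of agreement between two methods.
# Paired joint-angle readings (degrees) below are illustrative only,
# not data from the Kinect/goniometer study.
import statistics

kinect     = [45.2, 90.1, 30.4, 60.8, 120.3, 75.0, 15.2, 100.7]
goniometer = [44.8, 91.0, 30.0, 61.5, 119.5, 74.2, 16.0, 101.3]

diffs = [k - g for k, g in zip(kinect, goniometer)]
bias = statistics.mean(diffs)            # systematic difference
sd = statistics.stdev(diffs)             # spread of the differences
loa = (bias - 1.96 * sd, bias + 1.96 * sd)  # 95% limits of agreement
print(round(bias, 2), [round(v, 2) for v in loa])
```

If nearly all paired differences fall inside the limits of agreement, and the limits are clinically acceptable, the two methods can be used interchangeably, which is the paper's conclusion.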
Yu, Ping; Pan, Yuesong; Wang, Yongjun; Wang, Xianwei; Liu, Liping; Ji, Ruijun; Meng, Xia; Jing, Jing; Tong, Xu; Guo, Li; Wang, Yilong
2016-01-01
Background and Purpose A case-mix adjustment model has been developed and externally validated, demonstrating promise. However, the model has not been thoroughly tested among populations in China. In our study, we evaluated the performance of the model in Chinese patients with acute stroke. Methods The case-mix adjustment model A includes items on age, presence of atrial fibrillation on admission, National Institutes of Health Stroke Severity Scale (NIHSS) score on admission, and stroke type. Model B is similar to Model A but includes only the consciousness component of the NIHSS score. Both model A and B were evaluated to predict 30-day mortality rates in 13,948 patients with acute stroke from the China National Stroke Registry. The discrimination of the models was quantified by c-statistic. Calibration was assessed using Pearson’s correlation coefficient. Results The c-statistic of model A in our external validation cohort was 0.80 (95% confidence interval, 0.79–0.82), and the c-statistic of model B was 0.82 (95% confidence interval, 0.81–0.84). Excellent calibration was reported in the two models with Pearson’s correlation coefficient (0.892 for model A, p<0.001; 0.927 for model B, p = 0.008). Conclusions The case-mix adjustment model could be used to effectively predict 30-day mortality rates in Chinese patients with acute stroke. PMID:27846282
NASA Astrophysics Data System (ADS)
Eckert, Nicolas; Schläppy, Romain; Jomelli, Vincent; Naaim, Mohamed
2013-04-01
A crucial step in proposing relevant long-term mitigation measures in avalanche forecasting is the accurate definition of high return period avalanches. Recently, "statistical-dynamical" approaches combining a numerical model with stochastic operators describing the variability of its inputs and outputs have emerged. Their main interest is to take into account the topographic dependency of snow avalanche runout distances, and to constrain the correlation structure between the model's variables by physical rules, so as to simulate the different marginal distributions of interest (pressure, flow depth, etc.) with reasonable realism. Bayesian methods have been shown to be well adapted to achieve model inference, getting rid of identifiability problems thanks to prior information. An important problem which has virtually never been considered before is the validation of the predictions resulting from a statistical-dynamical approach (or from any other engineering method for computing extreme avalanches). In hydrology, independent "fossil" data such as flood deposits in caves are sometimes compared with design discharges corresponding to high return periods. Hence, the aim of this work is to implement a similar comparison between high return period avalanches obtained with a statistical-dynamical approach and independent validation data resulting from careful dendrogeomorphological reconstructions. To do so, an up-to-date statistical model based on the depth-averaged equations and the classical Voellmy friction law is used on a well-documented case study. First, parameter values resulting from another path are applied, and the dendrological validation sample shows that this approach fails to provide realistic predictions for the case study. This may be due to the strongly bounded behaviour of runouts in this case (the extreme of their distribution is identified as belonging to the Weibull attraction domain).
Second, local calibration on the available avalanche chronicle is performed with various prior distributions resulting from expert knowledge and/or other paths. For all calibrations, a very successful convergence is obtained, which confirms the robustness of the used Metropolis-Hastings estimation algorithm. This also demonstrates the interest of the Bayesian framework for aggregating information by sequential assimilation in the frequently encountered case of limited data quantity. Confrontation with the dendrological sample stresses the predominant role of the Coulombian friction coefficient distribution's variance on predicted high magnitude runouts. The optimal fit is obtained for a strong prior reflecting the local bounded behavior, and results in a 10-40 m difference for return periods ranging between 10 and 300 years. Implementing predictive simulations shows that this is largely within the range of magnitude of uncertainties to be taken into account. On the other hand, the different priors tested for the turbulent friction coefficient influence predictive performances only slightly, but have a large influence on predicted velocity and flow depth distributions. This all may be of high interest to refine calibration and predictive use of the statistical-dynamical model for any engineering application.
Validation of Milliflex® Quantum for Bioburden Testing of Pharmaceutical Products.
Gordon, Oliver; Goverde, Marcel; Staerk, Alexandra; Roesti, David
2017-01-01
This article reports the validation strategy used to demonstrate that the Milliflex ® Quantum yielded non-inferior results to the traditional bioburden method. It was validated according to USP <1223>, European Pharmacopoeia 5.1.6, and Parenteral Drug Association Technical Report No. 33 and comprised the validation parameters robustness, ruggedness, repeatability, specificity, limit of detection and quantification, accuracy, precision, linearity, range, and equivalence in routine operation. For the validation, a combination of pharmacopeial ATCC strains and a broad selection of in-house isolates was used. In-house isolates were used in a stressed state. Results were statistically evaluated against the pharmacopeial acceptance criterion of ≥70% recovery compared to the traditional method. Post-hoc test power calculations verified the appropriateness of the sample size used to detect such a difference. Furthermore, equivalence tests verified non-inferiority of the rapid method as compared to the traditional method. In conclusion, the rapid bioburden test based on the Milliflex ® Quantum was successfully validated as an alternative method to the traditional bioburden test. LAY ABSTRACT: Pharmaceutical drug products must fulfill specified quality criteria regarding their microbial content in order to ensure patient safety. Drugs that are delivered into the body via injection, infusion, or implantation must be sterile (i.e., devoid of living microorganisms). Bioburden testing measures the levels of microbes present in the bulk solution of a drug before sterilization, and thus it provides important information for manufacturing a safe product. In general, bioburden testing has to be performed using the methods described in the pharmacopoeias (membrane filtration or plate count). These methods are well established and validated regarding their effectiveness; however, the incubation time required to visually identify microbial colonies is long.
Thus, alternative methods that detect microbial contamination faster will improve control over the manufacturing process and speed up product release. Before alternative methods may be used, they must undergo a side-by-side comparison with pharmacopeial methods. In this comparison, referred to as validation, it must be shown in a statistically verified manner that the effectiveness of the alternative method is at least equivalent to that of the pharmacopeial methods. Here we describe the successful validation of an alternative bioburden testing method based on fluorescent staining of growing microorganisms applying the Milliflex ® Quantum system by MilliporeSigma. © PDA, Inc. 2017.
Debray, Thomas P A; Vergouwe, Yvonne; Koffijberg, Hendrik; Nieboer, Daan; Steyerberg, Ewout W; Moons, Karel G M
2015-03-01
It is widely acknowledged that the performance of diagnostic and prognostic prediction models should be assessed in external validation studies with independent data from "different but related" samples as compared with that of the development sample. We developed a framework of methodological steps and statistical methods for analyzing and enhancing the interpretation of results from external validation studies of prediction models. We propose to quantify the degree of relatedness between development and validation samples on a scale ranging from reproducibility to transportability by evaluating their corresponding case-mix differences. We subsequently assess the models' performance in the validation sample and interpret the performance in view of the case-mix differences. Finally, we may adjust the model to the validation setting. We illustrate this three-step framework with a prediction model for diagnosing deep venous thrombosis using three validation samples with varying case mix. While one external validation sample merely assessed the model's reproducibility, two other samples rather assessed model transportability. The performance in all validation samples was adequate, and the model did not require extensive updating to correct for miscalibration or poor fit to the validation settings. The proposed framework enhances the interpretation of findings at external validation of prediction models. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Bastianello, Alvise; Piroli, Lorenzo; Calabrese, Pasquale
2018-05-01
We derive exact analytic expressions for the n-body local correlations in the one-dimensional Bose gas with contact repulsive interactions (Lieb-Liniger model) in the thermodynamic limit. Our results are valid for arbitrary states of the model, including ground and thermal states, stationary states after a quantum quench, and nonequilibrium steady states arising in transport settings. Calculations for these states are explicitly presented and physical consequences are critically discussed. We also show that the n-body local correlations are directly related to the full counting statistics for the particle-number fluctuations in a short interval, for which we provide an explicit analytic result.
NASA Technical Reports Server (NTRS)
Krajewski, Witold F.; Rexroth, David T.; Kiriaki, Kiriakie
1991-01-01
Two problems related to radar rainfall estimation are described. The first part is a description of a preliminary data analysis for the purpose of statistical estimation of rainfall from multiple (radar and raingage) sensors. Raingage, radar, and joint radar-raingage estimation is described, and some results are given. Statistical parameters of rainfall spatial dependence are calculated and discussed in the context of optimal estimation. Quality control of radar data is also described. The second part describes radar scattering by ellipsoidal raindrops. An analytical solution is derived for the Rayleigh scattering regime. Single and volume scattering are presented. Comparison calculations with the known results for spheres and oblate spheroids are shown.
Misconceptions of the p-value among Chilean and Italian Academic Psychologists
Badenes-Ribera, Laura; Frias-Navarro, Dolores; Iotti, Bryan; Bonilla-Campos, Amparo; Longobardi, Claudio
2016-01-01
Common misconceptions of p-values are based on certain beliefs and attributions about the significance of the results. Thus, they affect the professionals' decisions and jeopardize the quality of interventions and the accumulation of valid scientific knowledge. We surveyed 164 academic psychologists (134 Italian, 30 Chilean) on this topic. Our findings are consistent with previous research and suggest that some participants do not know how to correctly interpret p-values. The inverse probability fallacy presents the greatest comprehension problems, followed by the replication fallacy. These results highlight the importance of the statistical re-education of researchers. Recommendations for improving statistical cognition are proposed. PMID:27602007
A SURVEY OF LABORATORY AND STATISTICAL ISSUES RELATED TO FARMWORKER EXPOSURE STUDIES
Developing internally valid, and perhaps generalizable, farmworker exposure studies is a complex process that involves many statistical and laboratory considerations. Statistics are an integral component of each study beginning with the design stage and continuing to the final da...
Statistics Online Computational Resource for Education
ERIC Educational Resources Information Center
Dinov, Ivo D.; Christou, Nicolas
2009-01-01
The Statistics Online Computational Resource (http://www.SOCR.ucla.edu) provides one of the largest collections of free Internet-based resources for probability and statistics education. SOCR develops, validates and disseminates two core types of materials--instructional resources and computational libraries. (Contains 2 figures.)
New robust statistical procedures for the polytomous logistic regression models.
Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro
2018-05-17
This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
Statistical validity of using ratio variables in human kinetics research.
Liu, Yuanlong; Schutz, Robert W
2003-09-01
The purposes of this study were to investigate the validity of the simple ratio and three alternative deflation models and examine how the variation of the numerator and denominator variables affects the reliability of a ratio variable. A simple ratio and three alternative deflation models were fitted to four empirical data sets, and common criteria were applied to determine the best model for deflation. Intraclass correlation was used to examine the component effect on the reliability of a ratio variable. The results indicate that the validity of a deflation model depends on the statistical characteristics of the particular component variables used, and an optimal deflation model for all ratio variables may not exist. Therefore, it is recommended that different models be fitted to each empirical data set to determine the best deflation model. It was found that the reliability of a simple ratio is affected by the coefficients of variation and the within- and between-trial correlations between the numerator and denominator variables. It was recommended that researchers compute the reliability of the derived ratio scores and not assume that strong reliabilities in the numerator and denominator measures automatically lead to high reliability in the ratio measures.
A diagnostic model for chronic hypersensitivity pneumonitis.
Johannson, Kerri A; Elicker, Brett M; Vittinghoff, Eric; Assayag, Deborah; de Boer, Kaïssa; Golden, Jeffrey A; Jones, Kirk D; King, Talmadge E; Koth, Laura L; Lee, Joyce S; Ley, Brett; Wolters, Paul J; Collard, Harold R
2016-10-01
The objective of this study was to develop a diagnostic model that allows for a highly specific diagnosis of chronic hypersensitivity pneumonitis using clinical and radiological variables alone. Chronic hypersensitivity pneumonitis and other interstitial lung disease cases were retrospectively identified from a longitudinal database. High-resolution CT scans were blindly scored for radiographic features (eg, ground-glass opacity, mosaic perfusion) as well as the radiologist's diagnostic impression. Candidate models were developed and then evaluated using clinical and radiographic variables, and assessed by the cross-validated C-statistic. Forty-four chronic hypersensitivity pneumonitis and eighty other interstitial lung disease cases were identified. Two models were selected based on their statistical performance, clinical applicability and face validity. Key model variables included age, down feather and/or bird exposure, radiographic presence of ground-glass opacity and mosaic perfusion and moderate or high confidence in the radiographic impression of chronic hypersensitivity pneumonitis. Models were internally validated with good performance, and cut-off values were established that resulted in high specificity for a diagnosis of chronic hypersensitivity pneumonitis. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
The validity of ultrasound estimation of muscle volumes.
Infantolino, Benjamin W; Gales, Daniel J; Winter, Samantha L; Challis, John H
2007-08-01
The purpose of this study was to validate ultrasound muscle volume estimation in vivo. To examine validity, vastus lateralis ultrasound images were collected from cadavers before muscle dissection; after dissection, the volumes were determined by hydrostatic weighing. Seven thighs from cadaver specimens were scanned using a 7.5-MHz ultrasound probe (SSD-1000, Aloka, Japan). The perimeter of the vastus lateralis was identified in the ultrasound images and manually digitized. Volumes were then estimated using the Cavalieri principle, by measuring the image areas of sets of parallel two-dimensional slices through the muscles. The muscles were then dissected from the cadavers, and muscle volume was determined via hydrostatic weighing. There was no statistically significant difference between the ultrasound estimation of muscle volume and that estimated using hydrostatic weighing (p > 0.05). The mean percentage error between the two volume estimates was 0.4% ± 6.9%. Three operators all performed four digitizations of all images from one randomly selected muscle; there was no statistical difference between operators or trials and the intraclass correlation was high (>0.8). The results of this study indicate that ultrasound is an accurate method for estimating muscle volumes in vivo.
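The Cavalieri estimate described above is simply the spacing between parallel slices times the sum of the digitized cross-sectional areas. A sketch with hypothetical areas, not the cadaver data:

```python
# Cavalieri-principle volume estimate: slice spacing x sum of areas.
# Cross-sectional areas (cm^2) along the muscle are hypothetical,
# sampled every 2 cm -- not measurements from the cadaver study.

slice_areas = [4.1, 9.8, 15.2, 18.6, 17.9, 12.4, 5.0]  # cm^2 per slice
spacing = 2.0  # cm between parallel ultrasound slices

volume = spacing * sum(slice_areas)  # cm^3
print(volume)  # 166.0
```

Finer slice spacing reduces the discretization error of the estimate, at the cost of more images to digitize.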
Developing a cosmic ray muon sampling capability for muon tomography and monitoring applications
NASA Astrophysics Data System (ADS)
Chatzidakis, S.; Chrysikopoulou, S.; Tsoukalas, L. H.
2015-12-01
In this study, a cosmic ray muon sampling capability using a phenomenological model that captures the main characteristics of the experimentally measured spectrum coupled with a set of statistical algorithms is developed. The "muon generator" produces muons with zenith angles in the range 0-90° and energies in the range 1-100 GeV and is suitable for Monte Carlo simulations with emphasis on muon tomographic and monitoring applications. The muon energy distribution is described by the Smith and Duller (1959) [35] phenomenological model. Statistical algorithms are then employed for generating random samples. The inverse transform provides a means to generate samples from the muon angular distribution, whereas the Acceptance-Rejection and Metropolis-Hastings algorithms are employed to provide the energy component. The predictions for muon energies 1-60 GeV and zenith angles 0-90° are validated with a series of actual spectrum measurements and with estimates from the software library CRY. The results confirm the validity of the phenomenological model and the applicability of the statistical algorithms to generate polyenergetic-polydirectional muons. The response of the algorithms and the impact of critical parameters on computation time and computed results were investigated. Final output from the proposed "muon generator" is a look-up table that contains the sampled muon angles and energies and can be easily integrated into Monte Carlo particle simulation codes such as Geant4 and MCNP.
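Acceptance-rejection sampling, one of the algorithms named above, can be sketched for a bounded one-dimensional spectrum. The target density here is a generic power-law stand-in over the paper's stated energy range, not the Smith and Duller model itself:

```python
# Acceptance-rejection sampling sketch for a bounded 1-D energy spectrum.
# The toy target density e**-2.7 is a stand-in, NOT the Smith-Duller
# muon spectrum used by the paper; the 1-100 GeV range matches the text.
import random

random.seed(1)

def target(e):
    return e ** -2.7  # unnormalized, monotonically decreasing toy spectrum

def sample(n, lo=1.0, hi=100.0):
    fmax = target(lo)  # density decreases, so its maximum is at lo
    out = []
    while len(out) < n:
        e = random.uniform(lo, hi)            # propose uniformly in range
        if random.uniform(0.0, fmax) <= target(e):
            out.append(e)                     # accept w.p. target(e)/fmax
    return out

energies = sample(1000)
print(min(energies) >= 1.0 and max(energies) <= 100.0)  # True
```

Because the toy spectrum falls steeply, most accepted energies cluster near the lower bound, mirroring the dominance of low-energy muons in measured spectra.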
Open-source platform to benchmark fingerprints for ligand-based virtual screening
2013-01-01
Similarity-search methods using molecular fingerprints are an important tool for ligand-based virtual screening. A huge variety of fingerprints exist and their performance, usually assessed in retrospective benchmarking studies using data sets with known actives and known or assumed inactives, depends largely on the validation data sets used and the similarity measure used. Comparing new methods to existing ones in any systematic way is rather difficult due to the lack of standard data sets and evaluation procedures. Here, we present a standard platform for the benchmarking of 2D fingerprints. The open-source platform contains all source code, structural data for the actives and inactives used (drawn from three publicly available collections of data sets), and lists of randomly selected query molecules to be used for statistically valid comparisons of methods. This allows the exact reproduction and comparison of results for future studies. The results for 12 standard fingerprints together with two simple baseline fingerprints assessed by seven evaluation methods are shown together with the correlations between methods. High correlations were found between the 12 fingerprints and a careful statistical analysis showed that only the two baseline fingerprints were different from the others in a statistically significant way. High correlations were also found between six of the seven evaluation methods, indicating that despite their seeming differences, many of these methods are similar to each other. PMID:23721588
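Fingerprint similarity searches of the kind benchmarked above are most commonly scored with the Tanimoto coefficient. A minimal sketch over bit-set fingerprints (the bit positions are illustrative):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two binary fingerprints
    represented as sets of 'on' bit positions."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

# Illustrative bit-sets for a query molecule and a database molecule
query = {1, 5, 9, 12}
candidate = {1, 5, 9, 30}
score = tanimoto(query, candidate)  # 3 shared bits / 5 total bits = 0.6
```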
MISR Global Aerosol Product Assessment by Comparison with AERONET
NASA Technical Reports Server (NTRS)
Kahn, Ralph A.; Gaitley, Barbara J.; Garay, Michael J.; Diner, David J.; Eck, Thomas F.; Smirnov, Alexander; Holben, Brent N.
2010-01-01
A statistical approach is used to assess the quality of the MISR Version 22 (V22) aerosol products. Aerosol Optical Depth (AOD) retrieval results are improved relative to the early post-launch values reported by Kahn et al. [2005a], varying with particle type category. Overall, about 70% to 75% of MISR AOD retrievals fall within 0.05 or 20% AOD of the paired validation data, and about 50% to 55% are within 0.03 or 10% AOD, except at sites where dust, or mixed dust and smoke, are commonly found. Retrieved particle microphysical properties amount to categorical values, such as three groupings in size: "small," "medium," and "large." For particle size, ground-based AERONET sun photometer Angstrom Exponents are used to assess statistically the corresponding MISR values, which are interpreted in terms of retrieved size categories. Coincident Single-Scattering Albedo (SSA) and fraction AOD spherical data are too limited for statistical validation. V22 distinguishes two or three size bins, depending on aerosol type, and about two bins in SSA (absorbing vs. non-absorbing), as well as spherical vs. non-spherical particles, under good retrieval conditions. Particle type sensitivity varies considerably with conditions, and is diminished for mid-visible AOD below about 0.15 or 0.2. Based on these results, specific algorithm upgrades are proposed, and are being investigated by the MISR team for possible implementation in future versions of the product.
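The "within 0.05 or 20% AOD" agreement criterion above amounts to counting the fraction of matched retrievals that fall inside an absolute-or-relative envelope. A small sketch (the paired values are illustrative):

```python
def within_envelope(misr, aeronet, abs_tol=0.05, rel_tol=0.20):
    """Fraction of MISR AOD retrievals within max(abs_tol, rel_tol * AOD)
    of the paired AERONET validation value."""
    hits = sum(1 for m, a in zip(misr, aeronet)
               if abs(m - a) <= max(abs_tol, rel_tol * a))
    return hits / len(misr)

# Illustrative matched AOD pairs: two of three agree within the envelope
frac = within_envelope([0.10, 0.30, 0.50], [0.12, 0.40, 0.52])
```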
Statistical Modeling of Natural Backgrounds in Hyperspectral LWIR Data
2016-09-06
…extremely important for studying performance trades. First, we study the validity of this model using real hyperspectral data, and compare the relative… …difficult to validate any statistical model created for a target of interest. However, since background measurements are plentiful, it is reasonable to… …Golden, S., Less, D., Jin, X., and Rynes, P., "Modeling and analysis of LWIR signature variability associated with 3d and BRDF effects," 98400P (May 2016)…
NASA Astrophysics Data System (ADS)
Kim, Hyun-Tae; Romanelli, M.; Yuan, X.; Kaye, S.; Sips, A. C. C.; Frassinetti, L.; Buchanan, J.; JET Contributors
2017-06-01
This paper presents for the first time a statistical validation of predictive TRANSP simulations of plasma temperature using two transport models, GLF23 and TGLF, over a database of 80 baseline H-mode discharges in JET-ILW. While the accuracy of the predicted T e with TRANSP-GLF23 is affected by plasma collisionality, the dependency of predictions on collisionality is less significant when using TRANSP-TGLF, indicating that the latter model has a broader applicability across plasma regimes. TRANSP-TGLF also shows a good matching of predicted T i with experimental measurements allowing for a more accurate prediction of the neutron yields. The impact of input data and assumptions prescribed in the simulations are also investigated in this paper. The statistical validation and the assessment of uncertainty level in predictive TRANSP simulations for JET-ILW-DD will constitute the basis for the extrapolation to JET-ILW-DT experiments.
Churcher, Frances P; Mills, Jeremy F; Forth, Adelle E
2016-08-01
Over the past few decades, many structured risk appraisal measures have been created to meet the need for violence risk assessment. The Two-Tiered Violence Risk Estimates Scale (TTV) is a measure designed to integrate an actuarial estimate of violence risk with critical risk management indicators. The current study examined the interrater reliability and predictive validity of the TTV in a sample of violent offenders (n = 120) over an average follow-up period of 17.75 years. The TTV was retrospectively scored and compared with the Violence Risk Appraisal Guide (VRAG), the Statistical Information on Recidivism Scale-Revised (SIR-R1), and the Psychopathy Checklist-Revised (PCL-R). Approximately 53% of the sample reoffended violently, with an overall recidivism rate of 74%. Although the VRAG was the strongest predictor of violent recidivism in the sample, the Actuarial Risk Estimates (ARE) scale of the TTV produced a small but significant effect. The Risk Management Indicators (RMI) produced nonsignificant area under the curve (AUC) values for all recidivism outcomes. Comparisons between measures using AUC values and Cox regression showed no statistical differences in predictive validity. These results will inform the validation and reliability literature on the TTV and contribute to the overall risk assessment literature.
VALUE - A Framework to Validate Downscaling Approaches for Climate Change Studies
NASA Astrophysics Data System (ADS)
Maraun, Douglas; Widmann, Martin; Gutiérrez, José M.; Kotlarski, Sven; Chandler, Richard E.; Hertig, Elke; Wibig, Joanna; Huth, Radan; Wilcke, Renate A. I.
2015-04-01
VALUE is an open European network to validate and compare downscaling methods for climate change research. VALUE aims to foster collaboration and knowledge exchange between climatologists, impact modellers, statisticians, and stakeholders to establish an interdisciplinary downscaling community. A key deliverable of VALUE is the development of a systematic validation framework to enable the assessment and comparison of both dynamical and statistical downscaling methods. Here, we present the key ingredients of this framework. VALUE's main approach to validation is user-focused: starting from a specific user problem, a validation tree guides the selection of relevant validation indices and performance measures. Several experiments have been designed to isolate specific points in the downscaling procedure where problems may occur: what is the isolated downscaling skill? How do statistical and dynamical methods compare? How do methods perform at different spatial scales? Do methods fail in representing regional climate change? How is the overall representation of regional climate, including errors inherited from global climate models? The framework will be the basis for a comprehensive community-open downscaling intercomparison study, but is intended also to provide general guidance for other validation studies.
Nakae, Ken; Ikegaya, Yuji; Ishikawa, Tomoe; Oba, Shigeyuki; Urakubo, Hidetoshi; Koyama, Masanori; Ishii, Shin
2014-01-01
Crosstalk between neurons and glia may constitute a significant part of information processing in the brain. We present a novel method of statistically identifying interactions in a neuron–glia network. We attempted to identify neuron–glia interactions from neuronal and glial activities via maximum-a-posteriori (MAP)-based parameter estimation by developing a generalized linear model (GLM) of a neuron–glia network. The interactions in our interest included functional connectivity and response functions. We evaluated the cross-validated likelihood of GLMs that resulted from the addition or removal of connections to confirm the existence of specific neuron-to-glia or glia-to-neuron connections. We only accepted addition or removal when the modification improved the cross-validated likelihood. We applied the method to a high-throughput, multicellular in vitro Ca2+ imaging dataset obtained from the CA3 region of a rat hippocampus, and then evaluated the reliability of connectivity estimates using a statistical test based on a surrogate method. Our findings based on the estimated connectivity were in good agreement with currently available physiological knowledge, suggesting our method can elucidate undiscovered functions of neuron–glia systems. PMID:25393874
Approach for Input Uncertainty Propagation and Robust Design in CFD Using Sensitivity Derivatives
NASA Technical Reports Server (NTRS)
Putko, Michele M.; Taylor, Arthur C., III; Newman, Perry A.; Green, Lawrence L.
2002-01-01
An implementation of the approximate statistical moment method for uncertainty propagation and robust optimization for quasi 3-D Euler CFD code is presented. Given uncertainties in statistically independent, random, normally distributed input variables, first- and second-order statistical moment procedures are performed to approximate the uncertainty in the CFD output. Efficient calculation of both first- and second-order sensitivity derivatives is required. In order to assess the validity of the approximations, these moments are compared with statistical moments generated through Monte Carlo simulations. The uncertainties in the CFD input variables are also incorporated into a robust optimization procedure. For this optimization, statistical moments involving first-order sensitivity derivatives appear in the objective function and system constraints. Second-order sensitivity derivatives are used in a gradient-based search to successfully execute a robust optimization. The approximate methods used throughout the analyses are found to be valid when considering robustness about input parameter mean values.
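The first-order moment procedure described above can be illustrated on a toy function standing in for the CFD output: the output mean is approximated by evaluating at the input means, and the output variance by a sensitivity-weighted sum over the independent normal inputs. The function and numbers below are illustrative only:

```python
import random
import statistics

def first_order_moments(f, mu, sigma, h=1e-6):
    """Approximate output mean and standard deviation for statistically
    independent, normally distributed inputs using first-order
    sensitivity derivatives: mean ~ f(mu), var ~ sum_i (df/dx_i * sigma_i)^2."""
    mean = f(mu)
    var = 0.0
    for i in range(len(mu)):
        x = list(mu)
        x[i] += h
        dfdx = (f(x) - mean) / h  # forward-difference sensitivity derivative
        var += (dfdx * sigma[i]) ** 2
    return mean, var ** 0.5

# Toy stand-in for a CFD output: a smooth function of two input variables
f = lambda x: x[0] ** 2 + 3.0 * x[1]
mu, sigma = [2.0, 1.0], [0.1, 0.2]
m_approx, s_approx = first_order_moments(f, mu, sigma)

# Monte Carlo check of the approximation, as in the paper's validation
random.seed(1)
mc = [f([random.gauss(mu[0], sigma[0]), random.gauss(mu[1], sigma[1])])
      for _ in range(20000)]
m_mc, s_mc = statistics.fmean(mc), statistics.stdev(mc)
```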
Spectrofluorimetric estimation of the new antiviral agent ledipasvir in presence of sofosbuvir
NASA Astrophysics Data System (ADS)
Salama, Fathy M.; Attia, Khalid A.; Abouserie, Ahmed A.; El-Olemy, Ahmed; Abolmagd, Ebrahim
2018-02-01
A spectrofluorimetric method has been developed and validated for the selective quantitative determination of ledipasvir in the presence of sofosbuvir. In this method, the native fluorescence of ledipasvir in ethanol at 405 nm was measured after excitation at 340 nm. The proposed method was validated according to ICH guidelines and showed high sensitivity, accuracy, and precision. Furthermore, the method was successfully applied to the analysis of ledipasvir in its pharmaceutical dosage form without interference from sofosbuvir and other additives; the results were statistically compared with those of a reported method and no significant difference was found.
An optimization model to agroindustrial sector in antioquia (Colombia, South America)
NASA Astrophysics Data System (ADS)
Fernandez, J.
2015-06-01
This paper develops a general optimization model for the flower industry, defined using discrete simulation and nonlinear optimization; the mathematical models are solved using ProModel simulation tools and GAMS optimization. The operations that constitute production and marketing in the sector are defined, data taken directly from each operation through field work are statistically validated, and the discrete simulation model of the operations and the linear optimization model of the entire industry chain are formulated. The model is solved with the tools described above, and the results are validated in a case study.
Kim, Won Kuel; Seo, Kyung Mook; Kang, Si Hyun
2014-01-01
Objective To determine the reliability and validity of hand-held dynamometer (HHD) depending on its fixation in measuring isometric knee extensor strength by comparing the results with an isokinetic dynamometer. Methods Twenty-seven healthy female volunteers participated in this study. The subjects were tested in seated and supine position using three measurement methods: isometric knee extension by isokinetic dynamometer, non-fixed HHD, and fixed HHD. During the measurement, the knee joints of subjects were fixed at a 35° angle from the extended position. The fixed HHD measurement was conducted with the HHD fixed to distal tibia with a Velcro strap; non-fixed HHD was performed with a hand-held method without Velcro fixation. All the measurements were repeated three times and among them, the maximum values of peak torque were used for the analysis. Results The data from the fixed HHD method showed higher validity than the non-fixed method compared with the results of the isokinetic dynamometer. Pearson correlation coefficients (r) between fixed HHD and isokinetic dynamometer method were statistically significant (supine-right: r=0.806, p<0.05; seating-right: r=0.473, p<0.05; supine-left: r=0.524, p<0.05), whereas Pearson correlation coefficients between non-fixed dynamometer and isokinetic dynamometer methods were not statistically significant, except for the result of the supine position of the left leg (r=0.384, p<0.05). Both fixed and non-fixed HHD methods showed excellent inter-rater reliability. However, the fixed HHD method showed a higher reliability than the non-fixed HHD method by considering the intraclass correlation coefficient (fixed HHD, 0.952-0.984; non-fixed HHD, 0.940-0.963). Conclusion Fixation of HHD during measurement in the supine position increases the reliability and validity in measuring the quadriceps strength. PMID:24639931
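The intraclass correlation coefficients reported above can be illustrated with a one-way random-effects ICC(1,1) over a subjects-by-raters score table. This is a generic sketch of one common ICC form, not necessarily the exact variant used in the study:

```python
def icc_oneway(scores):
    """One-way random-effects ICC(1,1) for a subjects-by-raters table
    (each inner list holds one subject's scores from all raters)."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    # between-subjects and within-subject mean squares
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(scores, row_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Perfect rater agreement on three subjects gives an ICC of 1
icc_perfect = icc_oneway([[1, 1], [2, 2], [3, 3]])
```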
A statistical method (cross-validation) for bone loss region detection after spaceflight
Zhao, Qian; Li, Wenjun; Li, Caixia; Chu, Philip W.; Kornak, John; Lang, Thomas F.
2010-01-01
Astronauts experience bone loss after long spaceflight missions. Identifying the specific regions that undergo the greatest losses (e.g. the proximal femur) could reveal information about the processes of bone loss in disuse and disease. Methods for detecting such regions, however, remain an open problem. This paper focuses on statistical methods to detect such regions. We perform statistical parametric mapping to obtain t-maps of changes in images, and propose a new cross-validation method to select an optimum suprathreshold for forming clusters of pixels. Once these candidate clusters are formed, we use permutation testing of longitudinal labels to derive significant changes. PMID:20632144
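Permutation testing of longitudinal labels, as used above, can be sketched with a sign-flip test on paired before/after scans: under the null hypothesis of no change, each subject's difference is equally likely to carry either sign. This is a generic sketch of the idea, not the paper's exact procedure, and the data are illustrative:

```python
import random

def permutation_pvalue(before, after, n_perm=5000, seed=0):
    """Sign-flip permutation test for a paired mean change: flip the
    sign of each paired difference at random and count how often the
    permuted mean change is at least as extreme as the observed one."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(after, before)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1
    return hits / n_perm

# Ten subjects, each losing density between scans (values illustrative)
before = [1.00, 1.20, 0.90, 1.10, 1.30, 1.00, 0.80, 1.20, 1.10, 0.90]
after = [b - 0.05 for b in before]
p = permutation_pvalue(before, after)
```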
Argilés, Josep M.; Betancourt, Angelica; Guàrdia-Olmos, Joan; Peró-Cebollero, Maribel; López-Soriano, Francisco J.; Madeddu, Clelia; Serpe, Roberto; Busquets, Sílvia
2017-01-01
The CAchexia SCOre (CASCO) was described as a tool for the staging of cachectic cancer patients. The aim of this study is to show the metric properties of CASCO in order to classify cachectic cancer patients into three different groups, which are associated with a numerical scoring. The final aim was to clinically validate CASCO for its use in the classification of cachectic cancer patients in clinical practice. We carried out a case-control study that prospectively enrolled 186 cancer patients and 95 age-matched controls. The score includes five components: (1) body weight loss and composition, (2) inflammation/metabolic disturbances/immunosuppression, (3) physical performance, (4) anorexia, and (5) quality of life. The present study provides clinical validation for the use of the score. In order to show the metric properties of CASCO, three different groups of cachectic cancer patients were established according to the results obtained with the statistical approach used: mild cachexia (15 ≤ x ≤ 28), moderate cachexia (29 ≤ x ≤ 46), and severe cachexia (47 ≤ x ≤ 100). In addition, a simplified version of CASCO, MiniCASCO (MCASCO), was also presented, and it constitutes a valid and easy-to-use tool for cachexia staging. Statistically significant correlations were found between CASCO and other validated indexes such as the Eastern Cooperative Oncology Group (ECOG) scale and the subjective diagnosis of cachexia by specialized oncologists. A very significant estimated correlation between CASCO and MCASCO was found, suggesting that MCASCO might constitute an easy and valid tool for the staging of cachectic cancer patients. CASCO and MCASCO provide a new tool for the quantitative staging of cachectic cancer patients with a clear advantage over previous classifications. PMID:28261113
NASA Technical Reports Server (NTRS)
Wong, K. W.
1974-01-01
In lunar phototriangulation, there is a complete lack of accurate ground control points. The accuracy analysis of the results of lunar phototriangulation must, therefore, be completely dependent on statistical procedures. It was the objective of this investigation to examine the validity of the commonly used statistical procedures, and to develop both mathematical techniques and computer software for evaluating (1) the accuracy of lunar phototriangulation; (2) the contribution of the different types of photo support data to the accuracy of lunar phototriangulation; (3) the accuracy of absolute orientation as a function of the accuracy and distribution of both the ground and model points; and (4) the relative slope accuracy between any triangulated pass points.
Empirical performance of interpolation techniques in risk-neutral density (RND) estimation
NASA Astrophysics Data System (ADS)
Bahaludin, H.; Abdullah, M. H.
2017-03-01
The objective of this study is to evaluate the empirical performance of interpolation techniques in risk-neutral density (RND) estimation. First, the empirical performance is evaluated using statistical analysis based on the implied mean and the implied variance of the RND. Second, interpolation performance is measured in terms of pricing error. We propose using the leave-one-out cross-validation (LOOCV) pricing error for interpolation selection purposes. The statistical analyses indicate that there are statistical differences between the interpolation techniques: second-order polynomial, fourth-order polynomial, and smoothing spline. The LOOCV pricing-error results show that fourth-order polynomial interpolation provides the best fit to option prices, yielding the lowest error.
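The LOOCV pricing error proposed above leaves each observed option price out in turn, re-interpolates it from the remaining strikes, and averages the squared errors. In the sketch below, plain linear interpolation stands in for the polynomial and spline interpolants the paper compares, and all data are illustrative:

```python
def interp_linear(x, xs, ys):
    """Piecewise-linear interpolation (xs assumed sorted, x within range)."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x outside interpolation range")

def loocv_pricing_error(strikes, prices):
    """Mean squared LOOCV pricing error: leave out each interior point,
    re-interpolate it from the remaining points, and compare with the
    observed price."""
    errs = []
    for i in range(1, len(strikes) - 1):  # endpoints cannot be left out
        xr = strikes[:i] + strikes[i + 1:]
        yr = prices[:i] + prices[i + 1:]
        errs.append((interp_linear(strikes[i], xr, yr) - prices[i]) ** 2)
    return sum(errs) / len(errs)
```

A perfectly linear price curve has zero LOOCV error under this interpolant; curvature in the prices shows up as positive error.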
The Statistics Teaching Inventory: A Survey on Statistics Teachers' Classroom Practices and Beliefs
ERIC Educational Resources Information Center
Zieffler, Andrew; Park, Jiyoon; Garfield, Joan; delMas, Robert; Bjornsdottir, Audbjorg
2012-01-01
This paper reports on an instrument designed to assess the practices and beliefs of instructors of introductory statistics courses across the disciplines. Funded by a grant from the National Science Foundation, this project developed, piloted, and gathered validity evidence for the Statistics Teaching Inventory (STI). The instrument consists of 50…
An entropy-based statistic for genomewide association studies.
Zhao, Jinying; Boerwinkle, Eric; Xiong, Momiao
2005-07-01
Efficient genotyping methods and the availability of a large collection of single-nucleotide polymorphisms provide valuable tools for genetic studies of human disease. The standard chi2 statistic for case-control studies, which uses a linear function of allele frequencies, has limited power when the number of marker loci is large. We introduce a novel test statistic for genetic association studies that uses Shannon entropy and a nonlinear function of allele frequencies to amplify the differences in allele and haplotype frequencies to maintain statistical power with large numbers of marker loci. We investigate the relationship between the entropy-based test statistic and the standard chi2 statistic and show that, in most cases, the power of the entropy-based statistic is greater than that of the standard chi2 statistic. The distribution of the entropy-based statistic and the type I error rates are validated using simulation studies. Finally, we apply the new entropy-based test statistic to two real data sets, one for the COMT gene and schizophrenia and one for the MMP-2 gene and esophageal carcinoma, to evaluate the performance of the new method for genetic association studies. The results show that the entropy-based statistic obtained smaller P values than did the standard chi2 statistic.
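The contrast above, between a linear chi-square statistic and a nonlinear entropy-based one, can be sketched on a 2 x k allele-count table. The `entropy_contrast` function below is only an illustration of the Shannon-entropy idea, NOT the paper's exact test statistic:

```python
import math

def shannon_entropy(freqs):
    """Shannon entropy of a frequency distribution (natural log)."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def chi2_statistic(case_counts, control_counts):
    """Standard chi-square statistic on a 2 x k table of allele counts."""
    n_case, n_ctrl = sum(case_counts), sum(control_counts)
    total = n_case + n_ctrl
    stat = 0.0
    for i in range(len(case_counts)):
        col = case_counts[i] + control_counts[i]
        for obs, n in ((case_counts[i], n_case), (control_counts[i], n_ctrl)):
            exp = n * col / total  # expected count under independence
            stat += (obs - exp) ** 2 / exp
    return stat

def entropy_contrast(case_counts, control_counts):
    """Illustrative nonlinear contrast: absolute difference in Shannon
    entropy between case and control allele frequencies (a sketch of
    the entropy idea, not the paper's statistic)."""
    pc = [c / sum(case_counts) for c in case_counts]
    pk = [c / sum(control_counts) for c in control_counts]
    return abs(shannon_entropy(pc) - shannon_entropy(pk))
```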
Vieira, Gisele de Lacerda Chaves; Pagano, Adriana Silvino; Reis, Ilka Afonso; Rodrigues, Júlia Santos Nunes; Torres, Heloísa de Carvalho
2018-01-01
Objective: to perform the translation, adaptation and validation of the Diabetes Attitudes Scale - third version instrument into Brazilian Portuguese. Methods: methodological study carried out in six stages: initial translation, synthesis of the initial translation, back-translation, evaluation of the translated version by the Committee of Judges (27 linguists and 29 health professionals), pre-test and validation. The pre-test and validation (test-retest) steps included 22 and 120 health professionals, respectively. The Content Validity Index was calculated, and analyses of internal consistency and reproducibility were performed using the R statistical program. Results: in the content validation, the instrument presented good acceptance among the Judges, with a mean Content Validity Index of 0.94. The scale presented acceptable internal consistency (Cronbach's alpha = 0.60), while the correlation of the total score between the test and retest moments was considered high (polychoric correlation coefficient = 0.86). The intraclass correlation coefficient for the total score was 0.65. Conclusion: the Brazilian version of the instrument (Escala de Atitudes dos Profissionais em relação ao Diabetes Mellitus) was considered valid and reliable for application by health professionals in Brazil. PMID:29319739
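The internal-consistency figure reported above is Cronbach's alpha, computed from the item variances and the variance of the total score. A minimal sketch (the score table is illustrative):

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha from item-score columns (each inner list holds
    one item's scores across all respondents):
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = len(items)
    n = len(items[0])
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_var = sum(statistics.pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / statistics.pvariance(totals))

# Three perfectly parallel items yield an alpha of 1
alpha = cronbach_alpha([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
```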
Validation Of The Airspace Concept Evaluation System Using Real World Data
NASA Technical Reports Server (NTRS)
Zelinski, Shannon
2005-01-01
This paper discusses the process of performing a validation of the Airspace Concept Evaluation System (ACES) using real world historical flight operational data. ACES inputs are generated from select real world data and processed to create a realistic reproduction of a single day of operations within the National Airspace System (NAS). ACES outputs are then compared to real world operational metrics and delay statistics for the reproduced day. Preliminary results indicate that ACES produces delays and airport operational metrics similar to the real world with minor variations of delay by phase of flight. ACES is a nation-wide fast-time simulation tool developed at NASA Ames Research Center. ACES models and simulates the NAS using interacting agents representing center control, terminal flow management, airports, individual flights, and other NAS elements. These agents pass messages between one another similar to real world communications. This distributed agent-based system is designed to emulate the highly unpredictable nature of the NAS, making it a suitable tool to evaluate current and envisioned airspace concepts. To ensure that ACES produces the most realistic results, the system must be validated. There is no way to validate future concept scenarios using real world historical data, but current day scenario validations increase confidence in the validity of future scenario results. Each operational day has unique weather and traffic demand schedules. The more a simulation utilizes the unique characteristics of a specific day, the more realistic the results should be. ACES is able to simulate the full-scale demand traffic necessary to perform a validation using real world data. Through direct comparison with the real world, models may continue to be improved, and unusual trends and biases may be filtered out of the system or used to normalize the results of future concept simulations.
Khan, Mohammad Jakir Hossain; Hussain, Mohd Azlan; Mujtaba, Iqbal Mohammed
2014-01-01
Polypropylene is a plastic that is widely used in our everyday life. This study focuses on the identification and justification of the optimum process parameters for polypropylene production in a novel pilot-plant-based fluidized bed reactor. This first-of-its-kind statistical modeling with experimental validation of the process parameters of polypropylene production was conducted by applying the ANOVA (analysis of variance) method to Response Surface Methodology (RSM). Three important process variables, i.e., reaction temperature, system pressure and hydrogen percentage, were considered as the important input factors for polypropylene production in the analysis performed. In order to examine the effect of the process parameters and their interactions, the ANOVA method was utilized among a range of other statistical diagnostic tools, such as the correlation between actual and predicted values, the residuals and predicted response, outlier t plots, 3D response surface and contour analysis plots. The statistical analysis showed that the proposed quadratic model fit the experimental results well. At the optimum conditions, with a temperature of 75°C, a system pressure of 25 bar and a hydrogen percentage of 2%, the highest polypropylene production obtained is 5.82% per pass. Hence it is concluded that the developed experimental design and proposed model can be successfully employed, with over a 95% confidence level, for optimum polypropylene production in a fluidized bed catalytic reactor (FBCR). PMID:28788576
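Once an RSM-style quadratic model has been fitted, the optimum operating point can be located by searching the surface over the experimental ranges. The coefficients below are purely illustrative (not the paper's fitted model); they are chosen so the surface peaks at the optimum the abstract reports (75°C, 25 bar, 2% hydrogen):

```python
# Hypothetical quadratic response surface for polypropylene yield as a
# function of temperature (C), pressure (bar) and hydrogen percentage.
def predicted_yield(t, p, h):
    return (1.0 * t - 0.00667 * t ** 2
            + 1.2 * p - 0.024 * p ** 2
            + 2.0 * h - 0.5 * h ** 2)

# Grid search for the optimum inside assumed experimental ranges
best = max(
    ((t, p, h)
     for t in range(60, 91)   # temperature, C
     for p in range(15, 36)   # pressure, bar
     for h in range(0, 5)),   # hydrogen percentage
    key=lambda x: predicted_yield(*x),
)
# best == (75, 25, 2)
```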
Reviewing Reliability and Validity of Information for University Educational Evaluation
NASA Astrophysics Data System (ADS)
Otsuka, Yusaku
To better utilize evaluations in higher education, it is necessary to share the methods of reviewing reliability and validity of examination scores and grades, and to accumulate and share data for confirming results. Before the GPA system is first introduced into a university or college, the reliability of examination scores and grades, especially for essay examinations, must be assured. Validity is a complicated concept, so should be assured in various ways, including using professional audits, theoretical models, and statistical data analysis. Because individual students and teachers are continually improving, using evaluations to appraise their progress is not always compatible with using evaluations in appraising the implementation of accountability in various departments or the university overall. To better utilize evaluations and improve higher education, evaluations should be integrated into the current system by sharing the vision of an academic learning community and promoting interaction between students and teachers based on sufficiently reliable and validated evaluation tools.
Perception that "everything requires a lot of effort": transcultural SCL-25 item validation.
Moreau, Nicolas; Hassan, Ghayda; Rousseau, Cécile; Chenguiti, Khalid
2009-09-01
This brief report illustrates how the migration context can affect specific item validity of mental health measures. The SCL-25 was administered to 432 recently settled immigrants (220 Haitian and 212 Arabs). We performed descriptive analyses, as well as Infit and Outfit statistics analyses using WINSTEPS Rasch Measurement Software based on Item Response Theory. The participants' comments about the item You feel everything requires a lot of effort in the SCL-25 were also qualitatively analyzed. Results revealed that the item You feel everything requires a lot of effort is an outlier and does not adjust in an expected and valid fashion with its cluster items, as it is over-endorsed by Haitian and Arab healthy participants. Our study thus shows that, in transcultural mental health research, the cultural and migratory contexts may interact and significantly influence the meaning of some symptom items and consequently, the validity of symptom scales.
DOE Office of Scientific and Technical Information (OSTI.GOV)
English, Shawn A.; Briggs, Timothy M.; Nelson, Stacy M.
Simulations of low velocity impact with a flat cylindrical indenter upon a carbon fiber fabric reinforced polymer laminate are rigorously validated. Comparison of the impact energy absorption between the model and experiment is used as the validation metric. Additionally, non-destructive evaluation, including ultrasonic scans and three-dimensional computed tomography, provides qualitative validation of the models. The simulations include delamination, matrix cracks and fiber breaks. An orthotropic damage and failure constitutive model, capable of predicting progressive damage and failure, is developed in conjunction and described. An ensemble of simulations incorporating model parameter uncertainties is used to predict a response distribution, which is then compared to experimental output using appropriate statistical methods. Lastly, the model form errors are exposed and corrected for use in an additional blind validation analysis. The result is a quantifiable confidence in material characterization and model physics when simulating low velocity impact in structures of interest.
Singh, Aparna; Singh, Girish; Patwardhan, Kishor; Gehlot, Sangeeta
2017-01-01
According to Ayurveda, the traditional system of healthcare of Indian origin, Agni is the factor responsible for digestion and metabolism. Four functional states (Agnibala) of Agni have been recognized: regular, irregular, intense, and weak. The objective of the present study was to develop and validate a self-assessment tool to estimate Agnibala The developed tool was evaluated for its reliability and validity by administering it to 300 healthy volunteers of either gender belonging to 18 to 40-year age group. Besides confirming the statistical validity and reliability, the practical utility of the newly developed tool was also evaluated by recording serum lipid parameters of all the volunteers. The results show that the lipid parameters vary significantly according to the status of Agni The tool, therefore, may be used to screen normal population to look for possible susceptibility to certain health conditions. © The Author(s) 2016.
McLean, Rachael M; Farmer, Victoria L; Nettleton, Alice; Cameron, Claire M; Cook, Nancy R; Campbell, Norman R C
2017-12-01
Food frequency questionnaires (FFQs) are often used to assess dietary sodium intake, although 24-hour urinary excretion is the most accurate measure of intake. The authors conducted a systematic review to investigate whether FFQs are a reliable and valid way of measuring usual dietary sodium intake. Results from 18 studies are described in this review, including 16 validation studies. The methods of study design and analysis varied widely with respect to FFQ instrument, number of 24-hour urine collections collected per participant, methods used to assess completeness of urine collections, and statistical analysis. Overall, there was poor agreement between estimates from FFQ and 24-hour urine. The authors suggest a framework for validation and reporting based on a consensus statement (2004), and recommend that all FFQs used to estimate dietary sodium intake undergo validation against multiple 24-hour urine collections. ©2017 Wiley Periodicals, Inc.
Does bad inference drive out good?
Marozzi, Marco
2015-07-01
The (mis)use of statistics in practice is widely debated, and a field where the debate is particularly active is medicine. Many scholars emphasize that a large proportion of published medical research contains statistical errors. It has been noted that top class journals like Nature Medicine and The New England Journal of Medicine publish a considerable proportion of papers that contain statistical errors and poorly document the application of statistical methods. This paper joins the debate on the (mis)use of statistics in the medical literature. Even though the validation process of a statistical result may be quite elusive, a careful assessment of underlying assumptions is central in medicine as well as in other fields where a statistical method is applied. Unfortunately, a careful assessment of underlying assumptions is missing in many papers, including those published in top class journals. In this paper, it is shown that nonparametric methods are good alternatives to parametric methods when the assumptions for the latter ones are not satisfied. A key point to solve the problem of the misuse of statistics in the medical literature is that all journals have their own statisticians to review the statistical method/analysis section in each submitted paper. © 2015 Wiley Publishing Asia Pty Ltd.
ERIC Educational Resources Information Center
Larbi-Apau, Josephine A.; Guerra-Lopez, Ingrid; Moseley, James L.; Spannaus, Timothy; Yaprak, Attila
2017-01-01
The study examined teaching faculty's educational technology-related performances (ETRP) as a measure for predicting eLearning management in Ghana. Valid data (n = 164) were collected and analyzed against the applied ISTE-NETS-T Performance Standards using descriptive and ANOVA statistics. Results showed an overall moderate performance with the…
Sugar maple height-diameter and age-diameter relationships in an uneven-aged northern hardwood stand
Laura S. Kenefic; R.D. Nyland
1999-01-01
Sugar maple (Acer saccharum Marsh.) height-diameter and age-diameter relationships are explored in a balanced uneven-aged northern hardwood stand in central New York. Results show that although both height and age vary considerably with diameter, these relationships can be described by statistically valid equations. The age-diameter relationship...
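As a hedged illustration of what a statistically valid height-diameter equation can look like, the sketch below fits a Chapman-Richards curve to simulated data. The functional form, parameter values, and data are assumptions for illustration, not the equations from this stand study:

```python
import numpy as np
from scipy.optimize import curve_fit

def chapman_richards(d, a, b, c):
    # Total height (m) as a function of dbh (cm); 1.3 m = breast height
    return 1.3 + a * (1.0 - np.exp(-b * d)) ** c

rng = np.random.default_rng(1)
dbh = rng.uniform(5, 60, 200)                       # hypothetical tree diameters
true_height = chapman_richards(dbh, 25.0, 0.06, 1.3)
height = true_height + rng.normal(0, 0.8, 200)      # measurement noise

# Nonlinear least squares fit, then residual check of the equation
params, _ = curve_fit(chapman_richards, dbh, height, p0=[20, 0.05, 1.0])
residuals = height - chapman_richards(dbh, *params)
rmse = float(np.sqrt(np.mean(residuals ** 2)))
```

The same scatter the abstract notes (height varying considerably at a given diameter) shows up here as the residual spread around the fitted curve.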
ERIC Educational Resources Information Center
Hohlfeld, Tina N.; Ritzhaupt, Albert D.; Barron, Ann E.
2013-01-01
This paper examines gender differences related to Information and Communication Technology (ICT) literacy using two valid and internally consistent measures with eighth grade students (N = 1,513) from Florida public schools. The results of t test statistical analyses, which examined only gender differences in demonstrated and perceived ICT skills,…
ERIC Educational Resources Information Center
Marson, Stephen M.; DeAngelis, Donna; Mittal, Nisha
2010-01-01
Objectives: The purpose of this article is to create transparency for the psychometric methods employed for the development of the Association of Social Work Boards' (ASWB) exams. Results: The article includes an assessment of the macro (political) and micro (statistical) environments of testing social work competence. The seven-step process used…
Code of Federal Regulations, 2011 CFR
2011-07-01
... profiles shall be obtained from a statistically valid number of TLEVs, LEVs, and ULEVs. (3) The speciated... methanol or liquefied petroleum gas, the quotient calculated above shall be multiplied by 1.1. The resulting value shall constitute the “reactivity adjustment factor” for the methanol or liquefied petroleum...
Code of Federal Regulations, 2010 CFR
2010-07-01
... profiles shall be obtained from a statistically valid number of TLEVs, LEVs, and ULEVs. (3) The speciated... methanol or liquefied petroleum gas, the quotient calculated above shall be multiplied by 1.1. The resulting value shall constitute the “reactivity adjustment factor” for the methanol or liquefied petroleum...
On the dispute between Boltzmann and Gibbs entropy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buonsante, Pierfrancesco; Franzosi, Roberto, E-mail: roberto.franzosi@ino.it; Smerzi, Augusto
2016-12-15
The validity of the concept of negative temperature has been recently challenged by arguing that the Boltzmann entropy (that allows negative temperatures) is inconsistent from a mathematical and statistical point of view, whereas the Gibbs entropy (that does not admit negative temperatures) provides the correct definition for the microcanonical entropy. Here we prove that the Boltzmann entropy is thermodynamically and mathematically consistent. Analytical results on two systems supporting negative temperatures illustrate the scenario we propose. In addition we numerically study a lattice system to show that negative temperature equilibrium states are accessible and obey standard statistical mechanics prediction.
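The Boltzmann-vs-Gibbs distinction can be made concrete with a toy system of N two-level spins, where the Boltzmann entropy ln Ω(E) decreases past half filling (giving negative temperature), while the Gibbs entropy of the cumulative number of states never decreases:

```python
from math import comb, log

# N spins, each contributing energy 0 or 1; Omega(E) = C(N, E) microstates
N = 100
omega = [comb(N, E) for E in range(N + 1)]
S_boltzmann = [log(w) for w in omega]

# Gibbs entropy counts all states with energy <= E, so it is monotone in E
cum = 0
S_gibbs = []
for w in omega:
    cum += w
    S_gibbs.append(log(cum))

# Finite-difference inverse temperature 1/T ~ dS/dE above half filling:
# the Boltzmann slope turns negative, the Gibbs slope stays positive.
dS_B = S_boltzmann[71] - S_boltzmann[70]
dS_G = S_gibbs[71] - S_gibbs[70]
```

This reproduces the qualitative disagreement the abstract describes; the paper's claim is that the Boltzmann definition is nonetheless the thermodynamically consistent one.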
NASA Astrophysics Data System (ADS)
Adib, Artur B.
In the last two decades or so, a collection of results in nonequilibrium statistical mechanics that departs from the traditional near-equilibrium framework introduced by Lars Onsager in 1931 has been derived, yielding new fundamental insights into far-from-equilibrium processes in general. Apart from offering a more quantitative statement of the second law of thermodynamics, some of these results---typified by the so-called "Jarzynski equality"---have also offered novel means of estimating equilibrium quantities from nonequilibrium processes, such as free energy differences from single-molecule "pulling" experiments. This thesis contributes to such efforts by offering three novel results in nonequilibrium statistical mechanics: (a) The entropic analog of the Jarzynski equality; (b) A methodology for estimating free energies from "clamp-and-release" nonequilibrium processes; and (c) A directly measurable symmetry relation in chemical kinetics similar to (but more general than) chemical detailed balance. These results share in common the feature of remaining valid outside Onsager's near-equilibrium regime, and bear direct applicability in protein folding kinetics as well as in single-molecule free energy estimation.
2013-08-01
…in Sequential Design Optimization with Concurrent Calibration-Based Model Validation
Drignei, Dorin; Mourelatos, Zissimos; Pandey, Vijitashwa
14 CFR Sec. 19-7 - Passenger origin-destination survey.
Code of Federal Regulations, 2011 CFR
2011-01-01
... Transportation Statistics' Director of Airline Information. (c) A statistically valid sample of flight coupons... LAX Salt Lake City Northwest Operating Carrier Northwest Ticketed Carrier Fare Code Phoenix American...
Shmulewitz, D.; Wall, M.M.; Aharonovich, E.; Spivak, B.; Weizman, A.; Frisch, A.; Grant, B. F.; Hasin, D.
2013-01-01
Background The fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) proposes aligning nicotine use disorder (NUD) criteria with those for other substances, by including the current DSM fourth edition (DSM-IV) nicotine dependence (ND) criteria, three abuse criteria (neglect roles, hazardous use, interpersonal problems) and craving. Although NUD criteria indicate one latent trait, evidence is lacking on: (1) validity of each criterion; (2) validity of the criteria as a set; (3) comparative validity between DSM-5 NUD and DSM-IV ND criterion sets; and (4) NUD prevalence. Method Nicotine criteria (DSM-IV ND, abuse and craving) and external validators (e.g. smoking soon after awakening, number of cigarettes per day) were assessed with a structured interview in 734 lifetime smokers from an Israeli household sample. Regression analysis evaluated the association between validators and each criterion. Receiver operating characteristic analysis assessed the association of the validators with the DSM-5 NUD set (number of criteria endorsed) and tested whether DSM-5 or DSM-IV provided the most discriminating criterion set. Changes in prevalence were examined. Results Each DSM-5 NUD criterion was significantly associated with the validators, with strength of associations similar across the criteria. As a set, DSM-5 criteria were significantly associated with the validators, were significantly more discriminating than DSM-IV ND criteria, and led to increased prevalence of binary NUD (two or more criteria) over ND. Conclusions All findings address previous concerns about the DSM-IV nicotine diagnosis and its criteria and support the proposed changes for DSM-5 NUD, which should result in improved diagnosis of nicotine disorders. PMID:23312475
ClinicalTrials.gov and Drugs@FDA: A comparison of results reporting for new drug approval trials
Schwartz, Lisa M.; Woloshin, Steven; Zheng, Eugene; Tse, Tony; Zarin, Deborah A.
2016-01-01
Background Pharmaceutical companies and other trial sponsors must submit certain trial results to ClinicalTrials.gov. The validity of these results is unclear. Purpose To validate results posted on ClinicalTrials.gov against publicly-available FDA reviews on Drugs@FDA. Data sources ClinicalTrials.gov (registry and results database) and Drugs@FDA (medical/statistical reviews). Study selection 100 parallel-group, randomized trials for new drug approvals (1/2013 – 7/2014) with results posted on ClinicalTrials.gov (3/15/2015). Data extraction Two assessors systematically extracted, and another verified, trial design, primary/secondary outcomes, adverse events, and deaths. Results The 100 trials were mostly phase 3 (90%), double-blind (92%), placebo-controlled (73%), representing 32 drugs from 24 companies. Of 137 primary outcomes from ClinicalTrials.gov, 134 (98%) had corresponding data in Drugs@FDA, 130 (95%) had concordant definitions, and 107 (78%) had concordant results; most differences were nominal (i.e. relative difference < 10%). Of 100 trials, primary outcome results in 14 could not be validated. Of 1,927 secondary outcomes from ClinicalTrials.gov, 1,061 (55%) definitions could be validated and 367 (19%) had results. Of 96 trials with ≥ 1 serious adverse event in either source, 14 could be compared and 7 were discordant. Of 62 trials with ≥ 1 death in either source, 25 could be compared and 17 were discordant. Limitations Unknown generalizability to uncontrolled or crossover trial results. Conclusion Primary outcome definitions and results were largely concordant between ClinicalTrials.gov and Drugs@FDA. Half of secondary outcomes could not be validated because Drugs@FDA only includes “key outcomes” for regulatory decision-making; nor could serious adverse events and deaths, because Drugs@FDA frequently only includes results aggregated across multiple trials. PMID:27294570
Dantas, Raquel Batista; Oliveira, Graziella Lage; Silveira, Andréa Maria
2017-01-01
ABSTRACT OBJECTIVE Adapt and evaluate the psychometric properties of the Vulnerability to Abuse Screening Scale to identify risk of domestic violence against older adults in Brazil. METHODS The instrument was adapted and validated in a sample of 151 older adults from a geriatric reference center in the municipality of Belo Horizonte, State of Minas Gerais, in 2014. We collected sociodemographic, clinical, and abuse-related information, and verified reliability by reproducibility in a sample of 55 older people, who underwent re-testing of the instrument seven days after the first application. Descriptive and comparative analyses were performed for all variables, with a significance level of 5%. The construct validity was analyzed by the principal components method with a tetrachoric correlation matrix, the reliability of the scale by the weighted Kappa (Kp) statistic, and the internal consistency by the Kuder-Richardson formula 20 estimator (KR-20). RESULTS The average age of the participants was 72.1 years (SD = 6.96; 95%CI 70.94–73.17), with a maximum of 92 years, and they were predominantly female (76.2%; 95%CI 69.82–83.03). When analyzing the relationship between the scores of the Vulnerability to Abuse Screening Scale, categorized by presence (score ≥ 3) or absence (score < 3) of vulnerability to abuse, and clinical and health conditions, we found statistically significant differences for self-perception of health (p = 0.002), depressive symptoms (p = 0.000), and presence of rheumatism (p = 0.003). There were no statistically significant differences between sexes. The Vulnerability to Abuse Screening Scale showed acceptable validity in the transcultural adaptation process, demonstrating dimensionality coherent with the original proposal (four factors). In the internal consistency analysis, the instrument presented good results (KR-20 = 0.69), and its reliability via reproducibility was considered excellent for the global scale (Kp = 0.92).
CONCLUSIONS The Vulnerability to Abuse Screening Scale proved to be a valid instrument with good psychometric capacity for screening domestic abuse against older adults in Brazil. PMID:28423137
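For reference, the KR-20 internal-consistency estimator used above can be sketched as follows. The formula is standard; the simulated item responses are illustrative and not the study's data:

```python
import numpy as np

def kr20(items):
    """Kuder-Richardson formula 20 for a persons x items matrix of 0/1 scores."""
    k = items.shape[1]
    p = items.mean(axis=0)                     # proportion endorsing each item
    q = 1.0 - p
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1.0 - np.sum(p * q) / total_var)

rng = np.random.default_rng(2)
trait = rng.normal(size=300)                   # latent vulnerability level
# 10 dichotomous items that all load on the same trait -> consistent scale
logits = trait[:, None] + rng.normal(0, 0.5, (300, 10))
probs = 1.0 / (1.0 + np.exp(-logits))
items = (rng.random((300, 10)) < probs).astype(float)
reliability = float(kr20(items))
```

Items unrelated to the latent trait would drive the estimate toward zero, which is why KR-20 serves as an internal-consistency check.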
Olsen, L R; Jensen, D V; Noerholm, V; Martiny, K; Bech, P
2003-02-01
We have developed the Major Depression Inventory (MDI), consisting of 10 items, covering the DSM-IV as well as the ICD-10 symptoms of depressive illness. We aimed to evaluate this as a scale measuring severity of depressive states with reference to both internal and external validity. Patients representing the score range from no depression to marked depression on the Hamilton Depression Scale (HAM-D) completed the MDI. Both classical and modern psychometric methods were applied for the evaluation of validity, including the Rasch analysis. In total, 91 patients were included. The results showed that the MDI had an adequate internal validity in being a unidimensional scale (the total score an appropriate or sufficient statistic). The external validity of the MDI was also confirmed as the total score of the MDI correlated significantly with the HAM-D (Pearson's coefficient 0.86, P < or = 0.01, Spearman 0.80, P < or = 0.01). When used in a sample of patients with different states of depression the MDI has an adequate internal and external validity.
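The external-validity check reported above (Pearson and Spearman correlation of MDI totals against HAM-D totals) can be sketched with hypothetical data; the scale ranges and noise level are assumptions:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(4)
# Hypothetical severity scores for 91 patients on two depression scales
ham_d = rng.uniform(0, 30, 91)                 # clinician-rated HAM-D totals
mdi = 0.9 * ham_d + rng.normal(0, 3.0, 91)     # self-rated MDI tracks HAM-D

r_pearson, p_pearson = pearsonr(ham_d, mdi)    # linear association
rho_spearman, p_spearman = spearmanr(ham_d, mdi)  # rank-order association
```

A significant correlation of this kind is what the abstract reports (Pearson 0.86, Spearman 0.80) as evidence of external validity.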
Oliveira, Lanuza Borges; Soares, Fernanda Amaral; Silveira, Marise Fagundes; de Pinho, Lucinéia; Caldeira, Antônio Prates; Leite, Maísa Tavares de Souza
2016-01-01
ABSTRACT Objective: to develop and validate an instrument to evaluate the knowledge of health professionals about domestic violence on children. Method: this was a study conducted with 194 physicians, nurses and dentists. A literature review was performed for preparation of the items and identification of the dimensions. Apparent and content validation was performed using analysis of three experts and 27 professors of the pediatric health discipline. For construct validation, Cronbach's alpha was used, and the Kappa test was applied to verify reproducibility. The criterion validation was conducted using the Student's t-test. Results: the final instrument included 56 items; the Cronbach alpha was 0.734, the Kappa test showed a correlation greater than 0.6 for most items, and the Student t-test showed a statistically significant value to the level of 5% for the two selected variables: years of education and using the Family Health Strategy. Conclusion: the instrument is valid and can be used as a promising tool to develop or direct actions in public health and evaluate knowledge about domestic violence on children. PMID:27556878
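Cronbach's alpha, used here for construct validation, can be computed directly from an item-score matrix. The sketch below uses simulated item scores sharing a latent factor, not the study's data:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items matrix of item scores."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

rng = np.random.default_rng(3)
knowledge = rng.normal(size=200)                 # latent knowledge level
# 8 items = shared latent factor plus item-specific noise
scores = knowledge[:, None] + rng.normal(0, 1.0, (200, 8))
alpha = float(cronbach_alpha(scores))
```

Values around 0.7, like the instrument's 0.734, are conventionally read as acceptable internal consistency.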
Roumelioti, Maria; Leotsinidis, Michalis
2009-01-01
Background The use of food frequency questionnaires (FFQs) has become increasingly important in epidemiologic studies. During the past few decades, a wide variety of nutritional studies have used the semiquantitative FFQ as a tool for assessing and evaluating dietary intake. One of the main concerns in a dietary analysis is the validity of the collected dietary data. Methods This paper discusses several methodological and statistical issues related to the validation of a semiquantitative FFQ. This questionnaire was used to assess the nutritional habits of schoolchildren in western Greece. For validation purposes, we selected 200 schoolchildren and contacted their respective parents. We evaluated the relative validity of 400 FFQs (200 children's FFQs and 200 parents' FFQs). Results The correlations between the children's and the parents' questionnaire responses showed that the questionnaire we designed was appropriate for fulfilling the purposes of our study and in ranking subjects according to food group intake. Conclusion Our study shows that the semiquantitative FFQ provides a reasonably reliable measure of dietary intake and corroborates the relative validity of our questionnaire. PMID:19196469
Carle, C; Alexander, P; Columb, M; Johal, J
2013-04-01
We designed and internally validated an aggregate weighted early warning scoring system specific to the obstetric population that has the potential for use in the ward environment. Direct obstetric admissions from the Intensive Care National Audit and Research Centre's Case Mix Programme Database were randomly allocated to model development (n = 2240) or validation (n = 2200) sets. Physiological variables collected during the first 24 h of critical care admission were analysed. Logistic regression analysis for mortality in the model development set was initially used to create a statistically based early warning score. The statistical score was then modified to create a clinically acceptable early warning score. Important features of this clinical obstetric early warning score are that the variables are weighted according to their statistical importance, a surrogate for the FiO2/PaO2 relationship is included, conscious level is assessed using a simplified alert/not alert variable, and the score, trigger thresholds and response are consistent with the new non-obstetric National Early Warning Score system. The statistical and clinical early warning scores were internally validated using the validation set. The area under the receiver operating characteristic curve was 0.995 (95% CI 0.992-0.998) for the statistical score and 0.957 (95% CI 0.923-0.991) for the clinical score. Pre-existing empirically designed early warning scores were also validated in the same way for comparison. The area under the receiver operating characteristic curve was 0.955 (95% CI 0.922-0.988) for Swanton et al.'s Modified Early Obstetric Warning System, 0.937 (95% CI 0.884-0.991) for the obstetric early warning score suggested in the 2003-2005 Report on Confidential Enquiries into Maternal Deaths in the UK, and 0.973 (95% CI 0.957-0.989) for the non-obstetric National Early Warning Score.
This highlights that the new clinical obstetric early warning score has an excellent ability to discriminate survivors from non-survivors in this critical care data set. Further work is needed to validate our new clinical early warning score externally in the obstetric ward environment. Anaesthesia © 2013 The Association of Anaesthetists of Great Britain and Ireland.
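The discrimination metric used throughout, the area under the ROC curve, can be computed without plotting via its Mann-Whitney interpretation. The scores below are hypothetical, not drawn from the Case Mix Programme Database:

```python
import numpy as np

def auc(scores, died):
    """Area under the ROC curve via the Mann-Whitney statistic: the
    probability that a randomly chosen non-survivor scores higher than a
    randomly chosen survivor (ties count one half)."""
    pos = scores[died == 1]
    neg = scores[died == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

rng = np.random.default_rng(5)
# Hypothetical early-warning scores: non-survivors tend to score higher
survivor_scores = rng.normal(3.0, 2.0, 500).round()
nonsurvivor_scores = rng.normal(9.0, 2.0, 40).round()
scores = np.concatenate([survivor_scores, nonsurvivor_scores])
died = np.concatenate([np.zeros(500), np.ones(40)])
discrimination = float(auc(scores, died))
```

An AUC near 1.0, as reported for the statistical (0.995) and clinical (0.957) scores, means almost every non-survivor outranks almost every survivor.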
Lee, Jason; Morishima, Toshitaka; Kunisawa, Susumu; Sasaki, Noriko; Otsubo, Tetsuya; Ikai, Hiroshi; Imanaka, Yuichi
2013-01-01
Stroke and other cerebrovascular diseases are a major cause of death and disability. Predicting in-hospital mortality in ischaemic stroke patients can help to identify high-risk patients and guide treatment approaches. Chart reviews provide important clinical information for mortality prediction, but are laborious and limiting in sample sizes. Administrative data allow for large-scale multi-institutional analyses but lack the necessary clinical information for outcome research. However, administrative claims data in Japan has seen the recent inclusion of patient consciousness and disability information, which may allow more accurate mortality prediction using administrative data alone. The aim of this study was to derive and validate models to predict in-hospital mortality in patients admitted for ischaemic stroke using administrative data. The sample consisted of 21,445 patients from 176 Japanese hospitals, who were randomly divided into derivation and validation subgroups. Multivariable logistic regression models were developed using 7- and 30-day and overall in-hospital mortality as dependent variables. Independent variables included patient age, sex, comorbidities upon admission, Japan Coma Scale (JCS) score, Barthel Index score, modified Rankin Scale (mRS) score, and admissions after hours and on weekends/public holidays. Models were developed in the derivation subgroup, and coefficients from these models were applied to the validation subgroup. Predictive ability was analysed using C-statistics; calibration was evaluated with Hosmer-Lemeshow χ(2) tests. All three models showed predictive abilities similar or surpassing that of chart review-based models. The C-statistics were highest in the 7-day in-hospital mortality prediction model, at 0.906 and 0.901 in the derivation and validation subgroups, respectively. 
For the 30-day in-hospital mortality prediction models, the C-statistics for the derivation and validation subgroups were 0.893 and 0.872, respectively; in overall in-hospital mortality prediction these values were 0.883 and 0.876. In this study, we have derived and validated in-hospital mortality prediction models for three different time spans using a large population of ischaemic stroke patients in a multi-institutional analysis. The recent inclusion of JCS, Barthel Index, and mRS scores in Japanese administrative data has allowed the prediction of in-hospital mortality with accuracy comparable to that of chart review analyses. The models developed using administrative data had consistently high predictive abilities for all models in both the derivation and validation subgroups. These results have implications in the role of administrative data in future mortality prediction analyses. Copyright © 2013 S. Karger AG, Basel.
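The Hosmer-Lemeshow calibration check mentioned above can be sketched with a standard decile-of-risk implementation, applied here to simulated predictions that are well calibrated by construction:

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(p_pred, y, groups=10):
    """Hosmer-Lemeshow chi-squared statistic: split subjects into groups
    by predicted risk and compare observed vs expected event counts."""
    order = np.argsort(p_pred)
    stat = 0.0
    for idx in np.array_split(order, groups):
        n = len(idx)
        expected = p_pred[idx].sum()               # expected deaths in group
        observed = y[idx].sum()                    # observed deaths in group
        variance = expected * (1.0 - expected / n)
        stat += (observed - expected) ** 2 / variance
    p_value = 1.0 - chi2.cdf(stat, groups - 2)
    return float(stat), float(p_value)

rng = np.random.default_rng(6)
p_pred = rng.uniform(0.01, 0.5, 2000)              # model's predicted risks
y = (rng.random(2000) < p_pred).astype(float)      # outcomes drawn from them
stat, p_value = hosmer_lemeshow(p_pred, y)
```

A high C-statistic with a non-significant Hosmer-Lemeshow test, as in the study's models, indicates both good discrimination and good calibration.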
Validating a Monotonically-Integrated Large Eddy Simulation Code for Subsonic Jet Acoustics
NASA Technical Reports Server (NTRS)
Ingraham, Daniel; Bridges, James
2017-01-01
The results of subsonic jet validation cases for the Naval Research Lab's Jet Engine Noise REduction (JENRE) code are reported. Two set points from the Tanna matrix, set point 3 (Ma = 0.5, unheated) and set point 7 (Ma = 0.9, unheated), are attempted on three different meshes. After a brief discussion of the JENRE code and the meshes constructed for this work, the turbulent statistics for the axial velocity are presented and compared to experimental data, with favorable results. Preliminary simulations for set point 23 (Ma = 0.5, Tj/T∞ = 1.764) on one of the meshes are also described. Finally, the proposed configuration for the far-field noise prediction with JENRE's Ffowcs Williams-Hawkings solver is detailed.
Normality Tests for Statistical Analysis: A Guide for Non-Statisticians
Ghasemi, Asghar; Zahediasl, Saleh
2012-01-01
Statistical errors are common in scientific literature and about 50% of the published articles have at least one error. The assumption of normality needs to be checked for many statistical procedures, namely parametric tests, because their validity depends on it. The aim of this commentary is to overview checking for normality in statistical analysis using SPSS. PMID:23843808
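A minimal example of such a normality check, here using SciPy's Shapiro-Wilk test rather than SPSS, on simulated data:

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(7)
normal_sample = rng.normal(0.0, 1.0, 200)      # satisfies the assumption
skewed_sample = rng.exponential(1.0, 200)      # clearly non-normal

w_norm, p_norm = shapiro(normal_sample)
w_skew, p_skew = shapiro(skewed_sample)
# A small p-value rejects normality; parametric tests on skewed_sample
# would violate their distributional assumption.
```

As the commentary argues, running this kind of check before a t-test or ANOVA is cheap insurance against invalid parametric inferences.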
ERIC Educational Resources Information Center
Thompson, Bruce
Web-based statistical instruction, like all statistical instruction, ought to focus on teaching the essence of the research endeavor: the exercise of reflective judgment. Using the framework of the recent report of the American Psychological Association (APA) Task Force on Statistical Inference (Wilkinson and the APA Task Force on Statistical…
Recent statistical methods for orientation data
NASA Technical Reports Server (NTRS)
Batschelet, E.
1972-01-01
The application of statistical methods to problems of animal orientation and navigation is discussed. The method employed is limited to the two-dimensional case. Various tests for assessing the validity of the statistical analysis are presented. Mathematical models are included to support the theoretical considerations, and tables of data are developed to show the value of the information obtained by statistical analysis.
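A standard tool from this literature is the Rayleigh test of uniformity for two-dimensional directional data. The sketch below implements the usual large-sample approximation with hypothetical headings (the clustering parameters are assumptions for illustration):

```python
import numpy as np

def rayleigh_test(angles):
    """Rayleigh test of uniformity for circular (orientation) data.
    Returns the mean resultant length R-bar and an approximate p-value."""
    n = len(angles)
    c = np.cos(angles).sum()
    s = np.sin(angles).sum()
    r_bar = np.sqrt(c**2 + s**2) / n          # 0 = uniform, 1 = one direction
    z = n * r_bar**2
    # Standard large-sample approximation for the p-value, clamped to [0, 1]
    p = np.exp(-z) * (1.0 + (2.0 * z - z**2) / (4.0 * n))
    return float(r_bar), float(min(max(p, 0.0), 1.0))

rng = np.random.default_rng(8)
# Hypothetical animals clustered around a 45-degree heading vs random headings
oriented = rng.vonmises(np.pi / 4, 2.0, 60)
random_dirs = rng.uniform(0.0, 2.0 * np.pi, 60)

r1, p1 = rayleigh_test(oriented)
r2, p2 = rayleigh_test(random_dirs)
```

Rejecting uniformity (small p) is the statistical evidence that the animals share a preferred direction.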
Parametric vs. non-parametric statistics of low resolution electromagnetic tomography (LORETA).
Thatcher, R W; North, D; Biver, C
2005-01-01
This study compared the relative statistical sensitivity of non-parametric and parametric statistics of 3-dimensional current sources as estimated by the EEG inverse solution Low Resolution Electromagnetic Tomography (LORETA). One would expect approximately 5% false positives (classification of a normal as abnormal) at the P < .025 level of probability (two tailed test) and approximately 1% false positives at the P < .005 level. EEG digital samples (2 second intervals sampled 128 Hz, 1 to 2 minutes eyes closed) from 43 normal adult subjects were imported into the Key Institute's LORETA program. We then used the Key Institute's cross-spectrum and the Key Institute's LORETA output files (*.lor) as the 2,394 gray matter pixel representation of 3-dimensional currents at different frequencies. The mean and standard deviation *.lor files were computed for each of the 2,394 gray matter pixels for each of the 43 subjects. Tests of Gaussianity and different transforms were computed in order to best approximate a normal distribution for each frequency and gray matter pixel. The relative sensitivity of parametric vs. non-parametric statistics was compared using a "leave-one-out" cross validation method in which individual normal subjects were withdrawn and then statistically classified as being either normal or abnormal based on the remaining subjects. Log10 transforms approximated a Gaussian distribution with 95% to 99% accuracy. Parametric Z score tests at P < .05 cross-validation demonstrated an average misclassification rate of approximately 4.25%, with a range over the 2,394 gray matter pixels of 27.66% to 0.11%. At P < .01, parametric Z score cross-validation false positives averaged 0.26% and ranged from 6.65% to 0% false positives. The non-parametric Key Institute's t-max statistic at P < .05 had an average misclassification error rate of 7.64% and ranged from 43.37% to 0.04% false positives.
The nonparametric t-max at P < .01 had an average misclassification rate of 6.67% and ranged from 41.34% to 0% false positives of the 2,394 gray matter pixels for any cross-validated normal subject. In conclusion, adequate approximation to Gaussian distribution and high cross-validation can be achieved by the Key Institute's LORETA programs by using a log10 transform and parametric statistics, and parametric normative comparisons had lower false positive rates than the non-parametric tests.
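The leave-one-out cross-validation scheme described above can be sketched for a single gray-matter pixel. The values are simulated log-transformed data, purely for illustration:

```python
import numpy as np

def loo_false_positives(values, z_crit=1.96):
    """Leave-one-out cross-validation: withdraw each normal subject, build
    norms (mean, SD) from the remaining subjects, and count how often the
    withdrawn subject is misclassified as abnormal (|Z| above criterion)."""
    n = len(values)
    errors = 0
    for i in range(n):
        rest = np.delete(values, i)
        z = (values[i] - rest.mean()) / rest.std(ddof=1)
        if abs(z) > z_crit:
            errors += 1
    return errors / n

rng = np.random.default_rng(9)
# Hypothetical log10-transformed current densities for 43 normal subjects
log_power = rng.normal(1.5, 0.3, 43)
fp_rate = loo_false_positives(log_power)   # should be near the nominal 5%
```

A pixel-wise false-positive rate close to the nominal alpha, as the parametric Z tests achieved after the log10 transform, is the study's criterion for adequate normative statistics.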
NASA Astrophysics Data System (ADS)
Garza, Jennifer M.
The purpose of this study is to inform and further the discussion of academic (i.e. teachers and school counselors) and non-academic (i.e. parents, family, friends, etc.) validating agents on Latina students' mathematics and science self-concepts. This study found a relationship between Latina students' interactions with academic and non-academic validating agents and their math and science self-concept at the K-12 level. Through the review of the literature, the researcher addresses identifiable factors and strategies that inform the field of education in the areas of validation theory, family characteristics, and access to STEM fields for Latina students. The researcher used an established instrument designed, administered, and validated through the National Center for Education Statistics (NCES). For purposes of this study, a categorical subset of participants who self-identified as Latina students was used. As a result, the total subset number in this study was N = 1,882. To determine whether academic and non-academic validating agents had an observable, statistically significant relationship with Latina students' math and science self-concept, a series of one-way ANOVAs was calculated to compare differences in students' math and science self-concept based on academic and non-academic validating agents for the weighted sample of Latinas from the HSLS:09 survey. A path analysis was also employed to assess the factors involved in Latina students' math and science self-concepts. The findings are consistent with previous research on the influence that academic and non-academic validating agents have on the math and science self-concept of Latina students. The results indicated that students who had teachers who believed in them, regardless of family background, socioeconomic status, or home environment, had higher math and science self-concepts than those who did not.
Similarly, it was found that students who had counselors who set high standards of learning and believed that all students could do well had higher math and science self-concepts than those who did not. Students whose parents encouraged and discussed taking more math and science courses had higher math and science self-concepts than those who did not.
38 CFR 1.15 - Standards for program evaluation.
Code of Federal Regulations, 2010 CFR
2010-07-01
... program operates. (3) Validity. The degree of statistical validity should be assessed within the research... decisions. (4) Reliability. Use of the same research design by others should yield the same findings. (g...
Analytical procedure validation and the quality by design paradigm.
Rozet, Eric; Lebrun, Pierre; Michiels, Jean-François; Sondag, Perceval; Scherder, Tara; Boulanger, Bruno
2015-01-01
Since the adoption of the ICH Q8 document concerning the development of pharmaceutical processes following a quality by design (QbD) approach, there have been many discussions on the opportunity for analytical procedure development to follow a similar approach. While the development and optimization of analytical procedures following QbD principles have been largely discussed and described, the place of analytical procedure validation in this framework has not been clarified. This article aims at showing that analytical procedure validation is fully integrated into the QbD paradigm and is an essential step in developing analytical procedures that are effectively fit for purpose. Adequate statistical methodologies also have their role to play, such as design of experiments, statistical modeling, and probabilistic statements. The outcome of analytical procedure validation is also an analytical procedure design space, from which a control strategy can be set.
Morgan, Patrick; Nissi, Mikko J; Hughes, John; Mortazavi, Shabnam; Ellerman, Jutta
2017-07-01
Objectives: The purpose of this study was to validate T2* mapping as an objective, noninvasive method for the prediction of acetabular cartilage damage. Methods: This is the second step in the validation of T2*. In a previous study, we established a quantitative predictive model for identifying and grading acetabular cartilage damage. In this study, the model was applied to a second cohort of 27 consecutive hips to validate the model. A clinical 3.0-T imaging protocol with T2* mapping was used. Acetabular regions of interest (ROIs) were identified on magnetic resonance and graded using the previously established model. Each ROI was then graded in a blinded fashion by arthroscopy. Accurate surgical location of ROIs was facilitated with a 2-dimensional map projection of the acetabulum. A total of 459 ROIs were studied. Results: When T2* mapping and arthroscopic assessment were compared, 82% of ROIs were within 1 Beck group (of a total 6 possible) and 32% of ROIs were classified identically. Disease prediction based on receiver operating characteristic curve analysis demonstrated a sensitivity of 0.713 and a specificity of 0.804. Model stability evaluation required no significant changes to the predictive model produced in the initial study. Conclusions: These results validate that T2* mapping provides statistically comparable information regarding acetabular cartilage when compared to arthroscopy. In contrast to arthroscopy, T2* mapping is quantitative, noninvasive, and can be used in follow-up. Unlike research quantitative magnetic resonance protocols, T2* takes little time and does not require a contrast agent. This may facilitate its use in the clinical sphere.
Cancela Carral, José María; Lago Ballesteros, Joaquín; Ayán Pérez, Carlos; Mosquera Morono, María Belén
2016-01-01
To analyse the reliability and validity of the Weekly Activity Checklist (WAC), the One Week Recall (OWR), and the Godin-Shephard Leisure Time Exercise Questionnaire (GLTEQ) in Spanish adolescents. A total of 78 adolescents wore a pedometer for one week, filled out the questionnaires at the end of this period, and underwent a test to estimate their maximal oxygen consumption (VO2max). The reliability of the questionnaires was determined by means of a factor analysis. Convergent validity was obtained by comparing the questionnaires' scores against the amount of physical activity quantified by the pedometer and the VO2max reported. The questionnaires showed a weak internal consistency (WAC: α=0.59-0.78; OWR: α=0.53-0.73; GLTEQ: α=0.60). Moderate statistically significant correlations were found between the pedometer and the WAC (r=0.69; p<0.01) and the OWR (r=0.42; p<0.01), while a low statistically significant correlation was found for the GLTEQ (r=0.36; p=0.01). The estimated VO2max showed a low level of association with the WAC results (r=0.30; p<0.05) and the OWR results (r=0.29; p<0.05). When classifying the participants as active or inactive, the level of agreement with the pedometer was moderate for the WAC (k=0.46) and the OWR (k=0.44), and slight for the GLTEQ (k=0.20). Of the three questionnaires analysed, the WAC showed the best psychometric performance, as it was the only one with respectable convergent validity, while sharing low reliability with the OWR and the GLTEQ. Copyright © 2016 SESPAS. Published by Elsevier España, S.L.U. All rights reserved.
Reliability and factorial validity of flexibility tests for team sports.
Sporis, Goran; Vucetic, Vlatko; Jovanovic, Mario; Jukic, Igor; Omrcen, Darija
2011-04-01
The main goal of this methodological paper was to evaluate the reliability and factorial validity of flexibility tests used in soccer, and to conduct a cross-validation study in 2 other team sports using handball and basketball players. The second aim was to compare the validity of the different tests and evaluate the flexibility of soccer players; the third was to determine the positional differences between attackers, defenders, and midfielders in all flexibility tests. One hundred and fifty (n = 150) elite male junior soccer players, members of the First Croatian Junior League Teams, volunteered to participate in the study, and 60 (n = 60) handball and 60 (n = 60) basketball players, also members of the First Croatian Junior League Teams, were tested for the purpose of cross-validation. The SAR and V-SAR had the greatest AVR and ICC. The within-subjects variation ranged between 0.3 and 3.8%. The lowest value of CV was found between the LSPL and LSPR. Low to moderate statistically significant correlation coefficients were found among all the measured flexibility tests. The greatest correlations were observed between the SAR and V-SAR (r = 0.65) and between the LLSR and LLSL (r = 0.56). Statistically significant correlations were also observed between the BLPL and BLPR (r = 0.62). The principal components factor analysis of 9 flexibility tests resulted in the extraction of 3 significant components. The results of this study have the following implications for the assessment of flexibility in soccer: (a) all flexibility tests used in this study have acceptable between- and within-subjects reliability and can be used to estimate the flexibility of soccer players; (b) the LSPL and LSPR tests are the most reliable and valid flexibility tests for the estimation of flexibility of professional soccer players.
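The within-subjects variation quoted above (0.3 to 3.8%) is typically a within-subject coefficient of variation across repeated trials. A minimal sketch in Python, using hypothetical trial scores invented for illustration (not the study's data) and assuming the common pairwise CV formulation:

```python
from statistics import mean, pstdev

def within_subject_cv(trial1, trial2):
    """Mean percent within-subject coefficient of variation across two
    trials of a test, with paired scores per athlete."""
    cvs = []
    for a, b in zip(trial1, trial2):
        m = mean([a, b])
        cvs.append(pstdev([a, b]) / m * 100)  # per-athlete CV (%)
    return mean(cvs)

# Hypothetical sit-and-reach scores (cm), two trials, five players:
t1 = [28.0, 31.5, 25.0, 30.0, 27.5]
t2 = [28.5, 31.0, 25.5, 29.0, 28.0]
print(round(within_subject_cv(t1, t2), 2))  # → 1.05
```

With these invented scores the CV falls inside the 0.3 to 3.8% band reported for the real tests.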
NASA Astrophysics Data System (ADS)
Rubin, D.; Aldering, G.; Barbary, K.; Boone, K.; Chappell, G.; Currie, M.; Deustua, S.; Fagrelius, P.; Fruchter, A.; Hayden, B.; Lidman, C.; Nordin, J.; Perlmutter, S.; Saunders, C.; Sofiatti, C.; Supernova Cosmology Project, The
2015-11-01
While recent supernova (SN) cosmology research has benefited from improved measurements, current analysis approaches are not statistically optimal and will prove insufficient for future surveys. This paper discusses the limitations of current SN cosmological analyses in treating outliers, selection effects, shape- and color-standardization relations, unexplained dispersion, and heterogeneous observations. We present a new Bayesian framework, called UNITY (Unified Nonlinear Inference for Type-Ia cosmologY), that incorporates significant improvements in our ability to confront these effects. We apply the framework to real SN observations and demonstrate smaller statistical and systematic uncertainties. We verify earlier results that SNe Ia require nonlinear shape and color standardizations, but we now include these nonlinear relations in a statistically well-justified way. This analysis was primarily performed blinded, in that the basic framework was first validated on simulated data before transitioning to real data. We also discuss possible extensions of the method.
ERIC Educational Resources Information Center
Patterson, Brian F.; Mattern, Krista D.
2013-01-01
The continued accumulation of validity evidence for the core uses of educational assessments is critical to ensure that proper inferences will be made for those core purposes. To that end, the College Board has continued to follow previous cohorts of college students and this report provides updated validity evidence for using the SAT to predict…
Improving Maintenance Data Collection Via Point-of-Maintenance (POMX) Implementation
2006-03-01
accurate documentation, (3) identifying and correcting the root causes for poor data integrity, and (4) educating the unit on the critical need for data ...the validity of the results. The data in this study were analyzed using the SAS JMP 6.0 statistical software package. The results for the tests...traditional keyboard data entry methods at a computer terminal. These terminals are typically located in the aircraft maintenance unit (AMU) facility, away
Park, Hyun; Bothe, Denise; Holsinger, Eva; Kirchner, H Lester; Olness, Karen; Mandalakas, Anna
2011-01-01
Internationally adopted children often arrive from institutional settings where they have experienced medical, nutritional and psychosocial deprivation. This study uses a validated research assessment tool to prospectively assess the impact of baseline (immediately post adoption) nutritional status on fifty-eight children as measured by weight-for-age, height-for-age, weight-for-height and head circumference-for-age z scores, as a determinant of cognitive (MDI) and psychomotor development (PDI) scores longitudinally. A statistical model was developed to allow for different ages at time of initial assessment as well as variable intervals between follow up visits. The study results show that both acute and chronic measures of malnutrition significantly affect baseline developmental status as well as the rate of improvement in both MDI and PDI scores. This study contributes to the body of literature with its prospective nature, unique statistical model for longitudinal evaluation, and use of a validated assessment tool to assess outcomes.
Conesa, Claudia; García-Breijo, Eduardo; Loeff, Edwin; Seguí, Lucía; Fito, Pedro; Laguarda-Miró, Nicolás
2015-01-01
Electrochemical Impedance Spectroscopy (EIS) has been used to develop a methodology able to identify and quantify fermentable sugars present in the enzymatic hydrolysis phase of second-generation bioethanol production from pineapple waste. Thus, a low-cost, non-destructive system was developed, consisting of a stainless-steel double-needle electrode connected to electronic equipment that implements EIS. In order to validate the system, different concentrations of glucose, fructose and sucrose were added to the pineapple waste and analyzed both individually and in combination. Next, statistical data treatment enabled the design of specific Artificial Neural Network-based mathematical models for each of the studied sugars and their respective combinations. The obtained prediction models are robust, reliable, and considered statistically valid (CCR% > 93.443%). These results allow us to introduce this EIS-based technique as an easy, fast, non-destructive, in-situ alternative to traditional laboratory methods for enzymatic hydrolysis monitoring. PMID:26378537
Sakunpak, Apirak; Suksaeree, Jirapornchai; Monton, Chaowalit; Pathompak, Pathamaporn; Kraisintu, Krisana
2014-02-01
To develop and validate an image analysis method for quantitative analysis of γ-oryzanol in cold pressed rice bran oil. TLC-densitometric and TLC-image analysis methods were developed, validated, and used for quantitative analysis of γ-oryzanol in cold pressed rice bran oil. The results obtained by these two different quantification methods were compared by paired t-test. Both assays provided good linearity, accuracy, reproducibility and selectivity for determination of γ-oryzanol. The TLC-densitometric and TLC-image analysis methods provided a similar reproducibility, accuracy and selectivity for the quantitative determination of γ-oryzanol in cold pressed rice bran oil. A statistical comparison of the quantitative determinations of γ-oryzanol in samples did not show any statistically significant difference between TLC-densitometric and TLC-image analysis methods. As both methods were found to be equal, they therefore can be used for the determination of γ-oryzanol in cold pressed rice bran oil.
NASA Astrophysics Data System (ADS)
da Silva Oliveira, C. I.; Martinez-Martinez, D.; Al-Rjoub, A.; Rebouta, L.; Menezes, R.; Cunha, L.
2018-04-01
In this paper, we present a statistical method that allows evaluating the degree of transparency of a thin film. To do so, the color coordinates are measured on different substrates, and their standard deviation is evaluated. In the case of low values, the color depends on the film and not on the substrate, and intrinsic colors are obtained. In contrast, transparent films lead to high values of the standard deviation, since the color coordinates depend on the substrate. Between both extremes, colored films with a certain degree of transparency can be found. This method allows an objective and simple evaluation of the transparency of any film, improving on subjective visual inspection and avoiding the thickness problems related to optical spectroscopy evaluation. Zirconium oxynitride films deposited on three different substrates (Si, steel and glass) are used for testing the validity of this method, whose results have been validated with optical spectroscopy and agree with the visual impression of the samples.
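The decision rule described here, that a low spread of color coordinates across substrates indicates an intrinsic (opaque) film color while a high spread indicates transparency, can be sketched as follows. All measurements below are hypothetical, invented for illustration, and `transparency_score` is not a function from the paper:

```python
from statistics import pstdev

def transparency_score(coords_by_substrate):
    """Mean standard deviation of L*a*b* color coordinates measured on
    different substrates: low -> color set by the film (opaque/intrinsic),
    high -> color set by the substrate (transparent)."""
    # coords_by_substrate: {substrate_name: (L, a, b)}
    channels = zip(*coords_by_substrate.values())  # group L's, a's, b's
    return sum(pstdev(c) for c in channels) / 3

# Hypothetical coordinates on the three substrates used in the paper:
opaque = {"Si": (52.0, 3.1, 10.2), "steel": (51.6, 3.0, 10.5), "glass": (52.3, 3.3, 9.9)}
transparent = {"Si": (35.0, 1.0, 2.0), "steel": (60.0, 0.5, 8.0), "glass": (80.0, -2.0, 1.0)}
print(transparency_score(opaque) < transparency_score(transparent))  # → True
```

The threshold separating "low" from "high" spread would have to be calibrated against reference films, as the paper does via optical spectroscopy.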
The Fifth Calibration/Data Product Validation Panel Meeting
NASA Technical Reports Server (NTRS)
1992-01-01
The minutes and associated documents prepared from presentations and meetings at the Fifth Calibration/Data Product Validation Panel meeting in Boulder, Colorado, April 8 - 10, 1992, are presented. Key issues include (1) statistical characterization of data sets: finding statistics that characterize key attributes of the data sets, and defining ways to characterize the comparisons among data sets; (2) selection of specific intercomparison exercises: selecting characteristic spatial and temporal regions for intercomparisons, and impact of validation exercises on the logistics of current and planned field campaigns and model runs; and (3) preparation of data sets for intercomparisons: characterization of assumptions, transportable data formats, labeling data files, content of data sets, and data storage and distribution (EOSDIS interface).
A user-targeted synthesis of the VALUE perfect predictor experiment
NASA Astrophysics Data System (ADS)
Maraun, Douglas; Widmann, Martin; Gutierrez, Jose; Kotlarski, Sven; Hertig, Elke; Wibig, Joanna; Rössler, Ole; Huth, Radan
2016-04-01
VALUE is an open European network to validate and compare downscaling methods for climate change research. A key deliverable of VALUE is the development of a systematic validation framework to enable the assessment and comparison of both dynamical and statistical downscaling methods. VALUE's main approach to validation is user-focused: starting from a specific user problem, a validation tree guides the selection of relevant validation indices and performance measures. We consider different aspects: (1) marginal aspects such as mean, variance and extremes; (2) temporal aspects such as spell length characteristics; (3) spatial aspects such as the de-correlation length of precipitation extremes; and (4) multi-variate aspects such as the interplay of temperature and precipitation or scale interactions. Several experiments have been designed to isolate specific points in the downscaling procedure where problems may occur. Experiment 1 (perfect predictors): what is the isolated downscaling skill? How do statistical and dynamical methods compare? How do methods perform at different spatial scales? Experiment 2 (global climate model predictors): how good is the overall representation of regional climate, including errors inherited from global climate models? Experiment 3 (pseudo reality): do methods fail in representing regional climate change? Here, we present a user-targeted synthesis of the results of the first VALUE experiment. In this experiment, downscaling methods are driven with ERA-Interim reanalysis data to eliminate global climate model errors, over the period 1979-2008. As reference data we use, depending on the question addressed, (1) observations from 86 meteorological stations distributed across Europe; (2) gridded observations at the corresponding 86 locations; or (3) gridded, spatially extended observations for selected European regions. With more than 40 contributing methods, this study is the most comprehensive downscaling intercomparison project so far.
The results clearly indicate that for several aspects, the downscaling skill varies considerably between different methods. For specific purposes, some methods can therefore clearly be excluded.
NASA Astrophysics Data System (ADS)
Schultz, E.; Genuer, V.; Marcoux, P.; Gal, O.; Belafdil, C.; Decq, D.; Maurin, Max; Morales, S.
2018-02-01
Elastic Light Scattering (ELS) is an innovative technique for identifying bacterial pathogens directly on culture plates. Compelling results have already been reported for agri-food applications. Here, we have developed ELS for clinical diagnosis, starting with Staphylococcus aureus early screening. Our goal is to deliver a result (positive/negative) after only 6 h of growth to fight surgical-site infections. The method starts with the acquisition of the scattering pattern arising from the interaction between a laser beam and a single bacterial colony growing on a culture medium. Then, the resulting image, considered as the bacterial species signature, is analyzed using statistical learning techniques. We present a custom optical setup able to target bacterial colonies of various sizes (30-500 microns). This system was used to collect a reference dataset of 38 strains of S. aureus and other Staphylococcus species (5459 images) on ChromID SAID/MRSA bi-plates. A validation set from 20 patients was then acquired and clinically validated according to chromogenic enzymatic tests. The best correct-identification rate between S. aureus and S. non-aureus (94.7%) was obtained using a support vector machine classifier trained on a combination of Fourier-Bessel moments and Local-Binary-Pattern extracted features. This statistical model applied to the validation set provided a sensitivity and a specificity of 90.0% and 56.9%, or alternatively, a positive predictive value of 47% and a negative predictive value of 93%. From a clinical point of view, the results head in the right direction and pave the way toward the WHO's requirements for rapid, low-cost, and automated diagnosis tools.
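The reported figures of merit (sensitivity 90.0%, specificity 56.9%, PPV 47%, NPV 93%) all derive from a single 2x2 confusion matrix. A minimal sketch follows; the counts are hypothetical, chosen only to roughly reproduce the reported rates, since the actual patient-level counts are not given in the abstract:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard figures of merit from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical counts roughly consistent with the reported rates:
m = diagnostic_metrics(tp=27, fp=31, fn=3, tn=41)
print(round(m["sensitivity"], 2), round(m["specificity"], 2))  # → 0.9 0.57
```

Note how PPV and NPV, unlike sensitivity and specificity, depend on how common positives are in the validation set, which is why a sensitive test can still have a modest PPV.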
Students' Initial Knowledge State and Test Design: Towards a Valid and Reliable Test Instrument
ERIC Educational Resources Information Center
CoPo, Antonio Roland I.
2015-01-01
Designing a good test instrument involves specifications, test construction, validation, try-out, analysis and revision. The initial knowledge state of forty (40) tertiary students enrolled in a Business Statistics course was determined, and the same test instrument underwent validation. The designed test instrument not only revealed the baseline…
The Michigan Alcoholism Screening Test (MAST): A Statistical Validation Analysis
ERIC Educational Resources Information Center
Laux, John M.; Newman, Isadore; Brown, Russ
2004-01-01
This study extends the Michigan Alcoholism Screening Test (MAST; M. L. Selzer, 1971) literature base by examining 4 issues related to the validity of the MAST scores. Specifically, the authors examine the validity of the MAST scores in light of the presence of impression management, participant demographic variables, and item endorsement…
Analyzing the Validity of the Adult-Adolescent Parenting Inventory for Low-Income Populations
ERIC Educational Resources Information Center
Lawson, Michael A.; Alameda-Lawson, Tania; Byrnes, Edward
2017-01-01
Objectives: The purpose of this study was to examine the construct and predictive validity of the Adult-Adolescent Parenting Inventory (AAPI-2). Methods: The validity of the AAPI-2 was evaluated using multiple statistical methods, including exploratory factor analysis, confirmatory factor analysis, and latent class analysis. These analyses were…
McAllister, Sue; Lincoln, Michelle; Ferguson, Allison; McAllister, Lindy
2013-01-01
Valid assessment of health science students' ability to perform in the real world of workplace practice is critical for promoting quality learning and ultimately certifying students as fit to enter the world of professional practice. Current practice in performance assessment in the health sciences field has been hampered by multiple issues regarding assessment content and process. Evidence for the validity of scores derived from assessment tools is usually evaluated against traditional validity categories, with reliability evidence privileged over validity, resulting in the paradoxical effect of compromising the assessment validity and learning processes the assessments seek to promote. Furthermore, the dominant statistical approaches used to validate scores from these assessments fall under the umbrella of classical test theory. This paper reports on the successful national development and validation of measures derived from an assessment of Australian speech pathology students' performance in the workplace. Validation of these measures considered each of Messick's interrelated validity evidence categories and included using evidence generated through Rasch analyses to support score interpretation and related action. This research demonstrated that it is possible to develop an assessment of real, complex, work-based performance of speech pathology students that generates valid measures without compromising the learning processes the assessment seeks to promote. The process described provides a model for other health professional education programs to trial.
Currens, J.C.
1999-01-01
Analytical data for nitrate and triazines from 566 samples collected over a 3-year period at Pleasant Grove Spring, Logan County, KY, were statistically analyzed to determine the minimum data set needed to calculate meaningful yearly averages for a conduit-flow karst spring. Results indicate that a biweekly sampling schedule augmented with bihourly samples from high-flow events will provide meaningful suspended-constituent and dissolved-constituent statistics. Unless collected over an extensive period of time, daily samples may not be representative and may also be autocorrelated. All high-flow events resulting in a significant deflection of a constituent from base-line concentrations should be sampled. Either the geometric mean or the flow-weighted average of the suspended constituents should be used. If automatic samplers are used, then they may be programmed to collect storm samples as frequently as every few minutes to provide details on the arrival time of constituents of interest. However, only samples collected bihourly should be used to calculate averages. By adopting a biweekly sampling schedule augmented with high-flow samples, the need to continuously monitor discharge, or to search for and analyze existing data to develop a statistically valid monitoring plan, is lessened.
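The two averaging choices recommended for the suspended constituents can be sketched as follows. The concentrations and discharges below are hypothetical, not Pleasant Grove Spring data; the point is that the flow-weighted mean lets a high-flow event dominate, which is exactly why event samples must be included:

```python
from math import prod

def flow_weighted_mean(concs, flows):
    """Average concentration weighted by discharge at sampling time."""
    return sum(c * q for c, q in zip(concs, flows)) / sum(flows)

def geometric_mean(concs):
    """n-th root of the product of n concentrations."""
    return prod(concs) ** (1 / len(concs))

# Hypothetical biweekly nitrate samples (mg/L) and spring discharge (L/s);
# the third sample represents a high-flow storm event:
concs = [4.2, 3.8, 9.5, 4.0]
flows = [20.0, 18.0, 250.0, 22.0]
print(round(flow_weighted_mean(concs, flows), 2))  # → 8.44
print(round(geometric_mean(concs), 2))             # → 4.96
```

The simple arithmetic mean of these samples is 5.38 mg/L; the flow-weighted mean is much higher because most of the water (and thus most of the constituent load) passed during the storm.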
2012-01-01
Background Oestrogen and progestogen have the potential to influence gastro-intestinal motility; both are key components of hormone replacement therapy (HRT). Results of observational studies in women taking HRT rely on self-reporting of gastro-oesophageal symptoms, and the aetiology of gastro-oesophageal reflux disease (GORD) remains unclear. This study investigated the association between HRT and GORD in menopausal women using validated general practice records. Methods 51,182 menopausal women were identified using the UK General Practice Research Database between 1995 and 2004. Of these, 8,831 were matched with and without hormone use. Odds ratios (ORs) were calculated for GORD and proton-pump inhibitor (PPI) use in hormone and non-hormone users, adjusting for age, co-morbidities, and co-pharmacy. Results In unadjusted analysis, all forms of hormone use (oestrogen-only, tibolone, combined HRT and progestogen) were statistically significantly associated with GORD. In adjusted models, this association remained statistically significant for oestrogen-only treatment (OR 1.49; 1.18–1.89). Unadjusted analysis showed a statistically significant association between PPI use and oestrogen-only and combined HRT treatment. When adjusted for covariates, oestrogen-only treatment was significant (OR 1.34; 95% CI 1.03–1.74). Findings from the adjusted model demonstrated greater use of PPIs by progestogen users (OR 1.50; 1.01–2.22). Conclusions This first large cohort study of the association between GORD and HRT found a statistically significant association between oestrogen-only hormone use and GORD and PPI use. This should be further investigated using prospective follow-up to validate the strength of association and describe its clinical significance. PMID:22642788
NASA Astrophysics Data System (ADS)
Jiang, Zhou-Ting; Zhang, Lin-Xi; Sun, Ting-Ting; Wu, Tai-Quan
2009-10-01
The formation of long-range contacts deeply affects the three-dimensional structure of globular proteins. Because the 20 types of amino acids and the 4 categories of globular proteins differ in their ability to form long-range contacts, their statistical properties are thoroughly discussed in this paper. Two parameters, NC and ND, are defined to delimit the valid residues in detail. The relationship between hydrophobicity scales and the valid-residue percentage of each amino acid is given in the present work, and linear functions are obtained in our statistical results. It is concluded that the hydrophobicity scale defined by chemical derivatives of the amino acids and the nonpolar phase of large unilamellar vesicle membranes is the most effective at characterising the hydrophobic behavior of amino acid residues. Meanwhile, the residue percentage Pi and sequential residue length Li of a given protein i are calculated under different conditions. The statistical results show that the average values of both Pi and Li are lowest for all-α proteins among these 4 classes of globular proteins, indicating that all-α proteins are hardly capable of forming long-range contacts one by one along their linear amino acid sequences. All-β proteins have a higher tendency to form long-range contacts along their primary sequences, related to the secondary configurations, i.e. the parallel and anti-parallel configurations of β sheets. The investigation of the interior properties of globular proteins connects the three-dimensional structure with its primary sequence data and secondary configurations, and helps us to better understand protein structure and the folding process.
Pat, Lucio; Ali, Bassam; Guerrero, Armando; Córdova, Atl V.; Garduza, José P.
2016-01-01
Attenuated total reflectance-Fourier transform infrared spectrometry and a chemometrics model were used for determination of physicochemical properties (pH, redox potential, free acidity, electrical conductivity, moisture, total soluble solids (TSS), ash, and HMF) in honey samples. The reference values of 189 honey samples of different botanical origin were determined using Association of Official Analytical Chemists (AOAC, 1990), Codex Alimentarius (2001), and International Honey Commission (2002) methods. Multivariate calibration models were built using partial least squares (PLS) for the measurands studied. The developed models were validated using cross-validation and external validation; several statistical parameters were obtained to determine the robustness of the calibration models: the optimum number of principal components (PCs), the standard error of cross-validation (SECV), the coefficient of determination of cross-validation (R2cal), the standard error of validation (SEP), the coefficient of determination for external validation (R2val), and the coefficient of variation (CV). The prediction accuracy for pH, redox potential, electrical conductivity, moisture, TSS, and ash was good, while for free acidity and HMF it was poor. The results demonstrate that attenuated total reflectance-Fourier transform infrared spectrometry is a valuable, rapid, and nondestructive tool for the quantification of physicochemical properties of honey. PMID:28070445
López-Ortega, Mariana; Torres-Castro, Sara; Rosas-Carrasco, Oscar
2016-12-09
The Satisfaction with Life Scale (SWLS) has been widely used and has proven to be a valid and reliable instrument for assessing satisfaction with life in diverse population groups; however, research on satisfaction with life and validation of measuring instruments in Mexican adults is still lacking. The objective was to evaluate the psychometric properties of the SWLS in a representative sample of Mexican adults. This is a methodological study evaluating a satisfaction with life scale in a sample of 13,220 Mexican adults 50 years of age or older from the 2012 Mexican Health and Aging Study. The scale's reliability (internal consistency) was analysed using Cronbach's alpha and inter-item correlations. An exploratory factor analysis was also performed. Known-groups validity was evaluated by comparing good-health and bad-health participants. Comorbidity, perceived financial situation, self-reported general health, depression symptoms, and social support were included to evaluate the validity between these measures and the total score of the scale using Spearman's correlations. The analysis of the scale's reliability showed good internal consistency (α = 0.74). The exploratory factor analysis confirmed the existence of a unique factor structure that explained 54% of the variance. The SWLS was related to depression, perceived health, financial situation, and social support, and these relations were all statistically significant (P < .01). There was a significant difference in life satisfaction between the good- and bad-health groups. Results show good internal consistency and construct validity of the SWLS and are comparable with results from previous studies. Meeting the study's objective to validate the scale, the results show that the Spanish version of the SWLS is a reliable and valid measure of satisfaction with life in the Mexican context.
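Cronbach's alpha, the internal-consistency statistic used above, is computed from per-item variances and the variance of total scores. A minimal sketch with hypothetical 5-item Likert responses (invented for illustration, not MHAS data):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """items: one inner list of scores per item, respondents aligned
    by index. alpha = k/(k-1) * (1 - sum(item variances)/variance(totals)).
    Variance ratios are unchanged whether population or sample variance
    is used, as long as the choice is consistent."""
    k = len(items)
    item_vars = sum(pvariance(scores) for scores in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent total
    return k / (k - 1) * (1 - item_vars / pvariance(totals))

# Hypothetical 5-item responses (1-7 Likert) for six respondents:
items = [
    [5, 6, 3, 7, 4, 5],
    [4, 6, 2, 7, 5, 5],
    [5, 7, 3, 6, 4, 6],
    [4, 5, 2, 6, 3, 5],
    [6, 6, 3, 7, 4, 6],
]
print(round(cronbach_alpha(items), 2))  # → 0.97
```

The invented responses are highly inter-correlated, so the sketch yields an alpha well above the 0.74 reported for the actual SWLS sample.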
Magalhães, Eunice; Calheiros, María M
2015-01-01
Despite significant scientific advances in the place attachment literature, no instruments exist that were specifically developed for, or adapted to, residential care. A total of 410 adolescents (11-18 years old) participated in this study. The place attachment scale evaluates five dimensions: place identity, place dependence, institutional bonding, caregiver bonding, and friend bonding. Data analysis included descriptive statistics, content validity, construct validity (confirmatory factor analysis), concurrent validity (correlations with satisfaction with life and with the institution), and reliability evidence. The relationship with individual characteristics and placement length was also examined. Content validity analysis revealed that more than half of the panellists perceived all the items as relevant for assessing the construct in residential care. The five-dimension structure showed good fit statistics, and concurrent validity evidence was found, with significant correlations with satisfaction with life and with the institution. Acceptable values of internal consistency and specific gender differences were found. The preliminary psychometric properties of this scale suggest its potential for use with youth in care.
Just add water: Accuracy of analysis of diluted human milk samples using mid-infrared spectroscopy.
Smith, R W; Adamkin, D H; Farris, A; Radmacher, P G
2017-01-01
To determine the maximum dilution of human milk (HM) that yields reliable results for protein, fat and lactose when analyzed by mid-infrared spectroscopy. De-identified samples of frozen HM were obtained. Milk was thawed and warmed (40°C) prior to analysis. Undiluted (native) HM was analyzed by mid-infrared spectroscopy for macronutrient composition: total protein (P), fat (F), carbohydrate (C); Energy (E) was calculated from the macronutrient results. Subsequent analyses were done with 1 : 2, 1 : 3, 1 : 5 and 1 : 10 dilutions of each sample with distilled water. Additional samples were sent to a certified lab for external validation. Quantitatively, F and P showed statistically significant but clinically non-critical differences in 1 : 2 and 1 : 3 dilutions. Differences at higher dilutions were statistically significant and deviated from native values enough to render those dilutions unreliable. External validation studies also showed statistically significant but clinically unimportant differences at 1 : 2 and 1 : 3 dilutions. The Calais Human Milk Analyzer can be used with HM samples diluted 1 : 2 and 1 : 3 and return results within 5% of values from undiluted HM. At a 1 : 5 or 1 : 10 dilution, however, results vary as much as 10%, especially with P and F. At the 1 : 2 and 1 : 3 dilutions these differences appear to be insignificant in the context of nutritional management. However, the accuracy and reliability of the 1 : 5 and 1 : 10 dilutions are questionable.
Adding statistical regularity results in a global slowdown in visual search.
Vaskevich, Anna; Luria, Roy
2018-05-01
Current statistical learning theories predict that embedding implicit regularities within a task should further improve online performance, beyond general practice. We challenged this assumption by contrasting performance in a visual search task containing either a consistent-mapping (regularity) condition, a random-mapping condition, or both conditions, mixed. Surprisingly, performance in a random visual search, without any regularity, was better than performance in a mixed design search that contained a beneficial regularity. This result was replicated using different stimuli and different regularities, suggesting that mixing consistent and random conditions leads to an overall slowing down of performance. Relying on the predictive-processing framework, we suggest that this global detrimental effect depends on the validity of the regularity: when its predictive value is low, as it is in the case of a mixed design, reliance on all prior information is reduced, resulting in a general slowdown. Our results suggest that our cognitive system does not maximize speed, but rather continues to gather and implement statistical information at the expense of a possible slowdown in performance. Copyright © 2018 Elsevier B.V. All rights reserved.
Publication Bias (The "File-Drawer Problem") in Scientific Inference
NASA Technical Reports Server (NTRS)
Scargle, Jeffrey D.; DeVincenzi, Donald (Technical Monitor)
1999-01-01
Publication bias arises whenever the probability that a study is published depends on the statistical significance of its results. This bias, often called the file-drawer effect since the unpublished results are imagined to be tucked away in researchers' file cabinets, is potentially a severe impediment to combining the statistical results of studies collected from the literature. With almost any reasonable quantitative model for publication bias, only a small number of studies lost in the file drawer will produce a significant bias. This result contradicts the well-known Fail Safe File Drawer (FSFD) method for setting limits on the potential harm of publication bias, widely used in social, medical, and psychic research. This method incorrectly treats the file drawer as unbiased, and almost always misestimates the seriousness of publication bias. A large body of not only psychic research, but medical and social science studies, has mistakenly relied on this method to validate claimed discoveries. Statistical combination can be trusted only if it is known with certainty that all studies that have been carried out are included. Such certainty is virtually impossible to achieve in literature surveys.
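The core claim, that even simple selection on significance biases a combined result, is easy to see in a small simulation. The sketch below (illustrative, not the paper's model) draws study z-scores under a true null and "publishes" only the nominally significant positive ones; combining the published subset by Stouffer's method is then guaranteed to exceed the significance threshold:

```python
import math
import random

random.seed(1)

def stouffer_z(zs):
    """Combine study z-scores by Stouffer's method."""
    return sum(zs) / math.sqrt(len(zs))

# simulate 1000 studies of a true null effect (z ~ N(0, 1))
all_z = [random.gauss(0, 1) for _ in range(1000)]

# file-drawer: only nominally significant positive results get published
published = [z for z in all_z if z > 1.645]

# inflated by construction: every published z exceeds 1.645
z_published = stouffer_z(published)
```

Treating the unpublished remainder as unbiased, as the FSFD method does, misses exactly this mechanism.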
Development and Validation of a Job Exposure Matrix for Physical Risk Factors in Low Back Pain
Solovieva, Svetlana; Pehkonen, Irmeli; Kausto, Johanna; Miranda, Helena; Shiri, Rahman; Kauppinen, Timo; Heliövaara, Markku; Burdorf, Alex; Husgafvel-Pursiainen, Kirsti; Viikari-Juntura, Eira
2012-01-01
Objectives The aim was to construct and validate a gender-specific job exposure matrix (JEM) for physical exposures to be used in epidemiological studies of low back pain (LBP). Materials and Methods We utilized two large Finnish population surveys, one to construct the JEM and another to test matrix validity. The exposure axis of the matrix included exposures relevant to LBP (heavy physical work, heavy lifting, awkward trunk posture and whole body vibration) and exposures that increase the biomechanical load on the low back (arm elevation) or those that in combination with other known risk factors could be related to LBP (kneeling or squatting). Job titles with similar work tasks and exposures were grouped. Exposure information was based on face-to-face interviews. Validity of the matrix was explored by comparing the JEM (group-based) binary measures with individual-based measures. The predictive validity of the matrix against LBP was evaluated by comparing the associations of the group-based (JEM) exposures with those of individual-based exposures. Results The matrix includes 348 job titles, representing 81% of all Finnish job titles in the early 2000s. The specificity of the constructed matrix was good, especially in women. The validity measured with kappa-statistic ranged from good to poor, being fair for most exposures. In men, all group-based (JEM) exposures were statistically significantly associated with one-month prevalence of LBP. In women, four out of six group-based exposures showed an association with LBP. Conclusions The gender-specific JEM for physical exposures showed relatively high specificity without compromising sensitivity. The matrix can therefore be considered as a valid instrument for exposure assessment in large-scale epidemiological studies, when more precise but more labour-intensive methods are not feasible. 
Although the matrix was based on Finnish data we foresee that it could be applicable, with some modifications, in other countries with a similar level of technology. PMID:23152793
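The kappa statistic used above to compare the group-based (JEM) binary measures with individual-based measures is a chance-corrected agreement index. A minimal sketch with hypothetical exposure ratings (the data are illustrative):

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two binary ratings (0/1), e.g. JEM
    group-based vs individual-based exposure classifications."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    p_a1 = sum(a) / n
    p_b1 = sum(b) / n
    # agreement expected by chance from the marginal frequencies
    p_exp = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (p_obs - p_exp) / (1 - p_exp)

# hypothetical heavy-lifting exposure for 10 workers
jem_based        = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
individual_based = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
kappa = cohens_kappa(jem_based, individual_based)
```

On common benchmarks, values near 0.6 are "moderate" agreement, which is the range the abstract describes as fair to good for most exposures.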
Jin, X F; Wang, J; Li, Y J; Liu, J F; Ni, D F
2016-09-20
Objective: To cross-culturally translate the Questionnaire of Olfactory Disorders (QOD) into a simplified Chinese version and to evaluate its reliability and validity in clinical practice. Method: The simplified Chinese version of the QOD was evaluated for test-retest reliability, split-half reliability, and internal consistency. Its validity was then evaluated, including content validity, criterion-related validity, and responsiveness. Criterion-related validity was assessed using the Medical Outcome Study 36-item Short Form Health Survey (SF-36) and the World Health Organization Quality of Life-Brief (WHOQOL-BREF) for comparison. Result: A total of 239 patients with olfactory dysfunction were enrolled and tested, of whom 195 completed all three surveys (QOD, SF-36, WHOQOL-BREF). The test-retest reliabilities of the QOD-parosmia statements (QOD-P), QOD-quality of life (QOD-QoL), and QOD-visual analogue scale (QOD-VAS) sections were 0.799 (P < 0.01), 0.781 (P < 0.01), and 0.488 (P < 0.01), respectively, and the Cronbach's α coefficients were 0.477, 0.812, and 0.889, respectively. The split-half reliability of the QOD-QoL was 0.89. There was no correlation between the QOD-P section and the SF-36, but there were statistically significant correlations between the QOD-QoL and QOD-VAS sections and the SF-36. There was no correlation between the QOD-P section and the WHOQOL-BREF, but there were statistically significant correlations between the QOD-QoL and QOD-VAS sections and the WHOQOL-BREF in most sections. Conclusion: The simplified Chinese version of the QOD was shown to be a reliable and valid questionnaire for evaluating patients with olfactory dysfunction living in mainland China. The QOD-P section needs further modification to properly adapt to patients with a Chinese cultural and knowledge background. Copyright© by the Editorial Department of Journal of Clinical Otorhinolaryngology Head and Neck Surgery.
Hazing DEOCS 4.1 Construct Validity Summary
2017-08-01
Defense Equal Opportunity Management Institute, Directorate of ... the analysis. Tables 4-6 provide additional information regarding the descriptive statistics and reliability of the Hazing items. Table 7 provides ...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Xiaolin; Ye, Li; Wang, Xiaoxiang
2012-12-15
Several recent reports suggested that hydroxylated polybrominated diphenyl ethers (HO-PBDEs) may disturb thyroid hormone homeostasis. To illuminate the structural features underlying the thyroid hormone activity of HO-PBDEs and the binding mode between HO-PBDEs and the thyroid hormone receptor (TR), the hormone activity of a series of HO-PBDEs toward thyroid hormone receptor β was studied using a combination of 3D-QSAR, molecular docking, and molecular dynamics (MD) methods. The ligand- and receptor-based 3D-QSAR models were obtained using the Comparative Molecular Similarity Index Analysis (CoMSIA) method. The optimum CoMSIA model with region focusing yielded satisfactory statistical results: the leave-one-out cross-validation correlation coefficient (q²) was 0.571 and the non-cross-validation correlation coefficient (r²) was 0.951. Furthermore, the results of internal validation such as bootstrapping, leave-many-out cross-validation, and progressive scrambling, as well as external validation, indicated the rationality and good predictive ability of the best model. In addition, molecular docking elucidated the conformations of compounds and key amino acid residues at the docking pocket; MD simulation further determined the binding process and validated the rationality of the docking results. Highlights: The thyroid hormone activities of HO-PBDEs were studied by 3D-QSAR. The binding modes between HO-PBDEs and TRβ were explored. 3D-QSAR, molecular docking, and molecular dynamics (MD) methods were performed.
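The q² figure of merit is a leave-one-out cross-validated analogue of r²: each compound is predicted by a model fitted without it, and the resulting PRESS replaces the residual sum of squares. A minimal sketch of the idea, using a simple univariate least-squares fit as a stand-in for the CoMSIA model (data are illustrative):

```python
def loo_q2(xs, ys):
    """Leave-one-out cross-validated q^2 for a univariate
    least-squares fit (a stand-in for a QSAR model)."""
    n = len(xs)
    press = 0.0
    for i in range(n):
        xt = [x for j, x in enumerate(xs) if j != i]
        yt = [y for j, y in enumerate(ys) if j != i]
        mx = sum(xt) / len(xt)
        my = sum(yt) / len(yt)
        sxx = sum((x - mx) ** 2 for x in xt)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xt, yt))
        b = sxy / sxx
        a = my - b * mx
        press += (ys[i] - (a + b * xs[i])) ** 2  # prediction error
    mean_y = sum(ys) / n
    ss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss

# hypothetical descriptor/activity pairs
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 2.1, 2.8, 4.2, 4.9, 6.1]
q2 = loo_q2(xs, ys)
```

Because each prediction is made out-of-sample, q² is always at or below the fitted r², which is why a q² of 0.571 alongside an r² of 0.951 is plausible.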
VALFAST: Secure Probabilistic Validation of Hundreds of Kepler Planet Candidates
NASA Astrophysics Data System (ADS)
Morton, Tim; Petigura, E.; Johnson, J. A.; Howard, A.; Marcy, G. W.; Baranec, C.; Law, N. M.; Riddle, R. L.; Ciardi, D. R.; Robo-AO Team
2014-01-01
The scope, scale, and tremendous success of the Kepler mission has necessitated the rapid development of probabilistic validation as a new conceptual framework for analyzing transiting planet candidate signals. While several planet validation methods have been independently developed and presented in the literature, none has yet come close to addressing the entire Kepler survey. I present the results of applying VALFAST---a planet validation code based on the methodology described in Morton (2012)---to every Kepler Object of Interest. VALFAST is unique in its combination of detail, completeness, and speed. Using the transit light curve shape, realistic population simulations, and (optionally) diverse follow-up observations, it calculates the probability that a transit candidate signal is the result of a true transiting planet or any of a number of astrophysical false positive scenarios, all in just a few minutes on a laptop computer. In addition to efficiently validating the planetary nature of hundreds of new KOIs, this broad application of VALFAST also demonstrates its ability to reliably identify likely false positives. This extensive validation effort is also the first to incorporate data from all of the largest Kepler follow-up observing efforts: the CKS survey of ~1000 KOIs with Keck/HIRES, the Robo-AO survey of >1700 KOIs, and high-resolution images obtained through the Kepler Follow-up Observing Program. In addition to enabling the core science that the Kepler mission was designed for, this methodology will be critical to obtain statistical results from future surveys such as TESS and PLATO.
Model identification using stochastic differential equation grey-box models in diabetes.
Duun-Henriksen, Anne Katrine; Schmidt, Signe; Røge, Rikke Meldgaard; Møller, Jonas Bech; Nørgaard, Kirsten; Jørgensen, John Bagterp; Madsen, Henrik
2013-03-01
The acceptance of virtual preclinical testing of control algorithms is growing and thus also the need for robust and reliable models. Models based on ordinary differential equations (ODEs) can rarely be validated with standard statistical tools. Stochastic differential equations (SDEs) offer the possibility of building models that can be validated statistically and that are capable of predicting not only a realistic trajectory, but also the uncertainty of the prediction. In an SDE, the prediction error is split into two noise terms. This separation ensures that the errors are uncorrelated and provides the possibility to pinpoint model deficiencies. An identifiable model of the glucoregulatory system in a type 1 diabetes mellitus (T1DM) patient is used as the basis for development of a stochastic-differential-equation-based grey-box model (SDE-GB). The parameters are estimated on clinical data from four T1DM patients. The optimal SDE-GB is determined from likelihood-ratio tests. Finally, parameter tracking is used to track the variation in the "time to peak of meal response" parameter. We found that the transformation of the ODE model into an SDE-GB resulted in a significant improvement in the prediction and uncorrelated errors. Tracking of the "peak time of meal absorption" parameter showed that the absorption rate varied according to meal type. This study shows the potential of using SDE-GBs in diabetes modeling. Improved model predictions were obtained due to the separation of the prediction error. SDE-GBs offer a solid framework for using statistical tools for model validation and model development. © 2013 Diabetes Technology Society.
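The likelihood-ratio tests used to select the optimal SDE-GB compare nested models via the deviance, which is asymptotically chi-square distributed with degrees of freedom equal to the number of extra parameters. A minimal sketch for one extra parameter (df = 1; the log-likelihood values are hypothetical, not from the study):

```python
import math

def lr_test_df1(loglik_reduced, loglik_full):
    """Likelihood-ratio test for one extra parameter (df = 1);
    returns the deviance and its chi-square(1) p-value."""
    d = 2.0 * (loglik_full - loglik_reduced)
    # chi-square(1) survival function: P(X > d) = erfc(sqrt(d/2))
    p = math.erfc(math.sqrt(d / 2.0))
    return d, p

# hypothetical log-likelihoods: ODE model vs its SDE-GB extension
d, p = lr_test_df1(loglik_reduced=-250.3, loglik_full=-243.1)
```

A small p-value would justify keeping the richer SDE-GB formulation over the nested ODE model.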
Smith, William Pastor
2013-09-01
The primary purpose of this two-phase study was to examine the structural validity and statistical utility of a racism scale specific to Black men who have sex with men (MSM) who resided in the Washington, DC, metropolitan area and Baltimore, Maryland. Phase I involved pretesting a 10-item racism measure with 20 Black MSM. Based on pretest findings, the scale was adapted into a 21-item racism scale for use in collecting data on 166 respondents in Phase II. Exploratory factor analysis of the 21-item racism scale resulted in a 19-item, two-factor solution. The two factors or subscales were the following: General Racism, and Relationships and Racism. Confirmatory factor analysis was used to test the construct validity of the factored racism scale. Specifically, the two racism factors were combined with three homophobia factors into a confirmatory factor analysis model. Both the comparative and incremental fit indices were equal to .90, suggesting an adequate convergence of the racism and homophobia dimensions into a single social oppression construct. Statistical utility of the two racism subscales was demonstrated when regression analysis revealed that gay-identified men, compared with bisexual-identified men in the sample, were more likely to experience increased racism within the context of intimate relationships and less likely to be exposed to repeated experiences of general racism. Overall, the findings in this study highlight the importance of continuing to explore the psychometric properties of a racism scale that accounts for the unique psychosocial concerns experienced by Black MSM.
NASA Astrophysics Data System (ADS)
Dabanlı, İsmail; Şen, Zekai
2018-04-01
The statistical climate downscaling model by the Turkish Water Foundation (TWF) is further developed and applied to a set of monthly precipitation records. The model is structured in two phases: spatial (regional) and temporal downscaling of global circulation model (GCM) scenarios. The TWF model takes into consideration the regional dependence function (RDF) for spatial structure and the Markov whitening process (MWP) for temporal characteristics of the records to set projections. The impact of climate change on monthly precipitation is studied by downscaling Intergovernmental Panel on Climate Change-Special Report on Emission Scenarios (IPCC-SRES) A2 and B2 emission scenarios from the Max Planck Institute (EH40PYC) and the Hadley Centre (HadCM3). The main purposes are to explain the TWF statistical climate downscaling model procedures and to present the validation tests, which are rated as "very good" for all stations except one (Suhut) in the Akarcay basin in west central Turkey. Even though the validation score is slightly lower at the Suhut station, the results are still "satisfactory." It is therefore possible to say that the TWF model has reasonably acceptable skill for highly accurate estimation with respect to the standard deviation ratio (SDR), Nash-Sutcliffe efficiency (NSE), and percent bias (PBIAS) criteria. Based on the validated model, precipitation predictions are generated from 2011 to 2100 using a 30-year reference observation period (1981-2010). Precipitation arithmetic average and standard deviation have less than 5% error for the EH40PYC and HadCM3 SRES (A2 and B2) scenarios.
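The SDR, NSE, and PBIAS criteria mentioned above are all computed from paired observed/simulated series. A minimal sketch with hypothetical monthly precipitation values (not the study's data):

```python
from statistics import pstdev

def skill_scores(obs, sim):
    """Nash-Sutcliffe efficiency (NSE), percent bias (PBIAS, %),
    and the simulated/observed standard deviation ratio (SDR)."""
    mo = sum(obs) / len(obs)
    nse = 1.0 - (sum((o - s) ** 2 for o, s in zip(obs, sim)) /
                 sum((o - mo) ** 2 for o in obs))
    pbias = 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)
    sdr = pstdev(sim) / pstdev(obs)
    return nse, pbias, sdr

# hypothetical monthly precipitation (mm): observed vs downscaled
obs = [32.0, 45.0, 51.0, 38.0, 60.0, 41.0]
sim = [30.0, 47.0, 49.0, 40.0, 58.0, 43.0]
nse, pbias, sdr = skill_scores(obs, sim)
```

NSE close to 1, PBIAS close to 0, and SDR close to 1 are the patterns that earn a "very good" rating on such criteria.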
Szyda, Joanna; Liu, Zengting; Zatoń-Dobrowolska, Magdalena; Wierzbicki, Heliodor; Rzasa, Anna
2008-01-01
We analysed data from a selective DNA pooling experiment with 130 individuals of the arctic fox (Alopex lagopus), originating from 2 types that differ in body size. The association between alleles of 6 selected unlinked molecular markers and body size was tested using univariate and multinomial logistic regression models, applying odds ratios and test statistics from the power divergence family. Due to the small sample size and the resulting sparseness of the data table, hypothesis testing could not rely on the asymptotic distributions of the tests. Instead, we tried to account for data sparseness by (i) modifying the confidence intervals of the odds ratio; (ii) using a normal approximation of the asymptotic distribution of the power divergence tests, with different approaches for calculating the moments of the statistics; and (iii) assessing P values empirically, based on bootstrap samples. As a result, a significant association was observed for 3 markers. Furthermore, we used simulations to assess the validity of the normal approximation of the asymptotic distribution of the test statistics under the conditions of small and sparse samples.
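Assessing P values empirically, as in approach (iii), means simulating the null distribution instead of trusting the asymptotic one. The sketch below illustrates the idea for a single-marker allele-frequency test; the counts and the null model (binomial with p = 0.5) are illustrative assumptions, not the study's actual test statistic:

```python
import random

random.seed(7)

def empirical_p(n_carriers, n_total, p_null=0.5, n_sim=5000):
    """Empirical two-sided p-value for an allele-frequency deviation
    in a small sample, by simulating the null distribution."""
    obs_dev = abs(n_carriers / n_total - p_null)
    hits = 0
    for _ in range(n_sim):
        k = sum(random.random() < p_null for _ in range(n_total))
        if abs(k / n_total - p_null) >= obs_dev:
            hits += 1
    return (hits + 1) / (n_sim + 1)  # add-one to avoid p = 0

# hypothetical: 22 of 26 pooled alleles carry the variant
p = empirical_p(22, 26)
```

With samples this small, the simulated null avoids the chi-square approximation entirely, which is the point of the bootstrap-based P values in the abstract.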
Bashir, Saba; Qamar, Usman; Khan, Farhan Hassan
2015-06-01
Conventional clinical decision support systems are based on individual classifiers or simple combinations of these classifiers, which tend to show moderate performance. This research paper presents a novel classifier ensemble framework based on an enhanced bagging approach with a multi-objective weighted voting scheme for the prediction and analysis of heart disease. The proposed model overcomes the limitations of conventional performance by utilizing an ensemble of five heterogeneous classifiers: naïve Bayes, linear regression, quadratic discriminant analysis, instance-based learner, and support vector machines. Five different datasets are used for experimentation, evaluation, and validation. The datasets are obtained from publicly available data repositories. Effectiveness of the proposed ensemble is investigated by comparison of results with several classifiers. Prediction results of the proposed ensemble model are assessed by ten-fold cross-validation and ANOVA statistics. The experimental evaluation shows that the proposed framework deals with all types of attributes and achieved a high diagnosis accuracy of 84.16%, sensitivity of 93.29%, specificity of 96.70%, and f-measure of 82.15%. An f-ratio higher than the f-critical value and a p value less than 0.05 for a 95% confidence interval indicate that the results are statistically significant for most of the datasets.
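A weighted voting scheme over heterogeneous classifiers can be sketched as below. This is a generic weighted majority vote, not necessarily the paper's exact multi-objective scheme, and the weights are hypothetical per-classifier cross-validation accuracies:

```python
def weighted_vote(predictions, weights):
    """Combine class predictions from heterogeneous classifiers by
    weighted voting; the label with the largest weight total wins."""
    scores = {}
    for label, w in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

# hypothetical: five classifiers vote on 'disease' vs 'healthy',
# weighted by their (assumed) cross-validation accuracies
preds   = ['disease', 'healthy', 'disease', 'disease', 'healthy']
weights = [0.84, 0.78, 0.81, 0.75, 0.90]
label = weighted_vote(preds, weights)
```

Weighting lets a minority of highly accurate classifiers outvote a less reliable majority, which is the usual motivation over a plain majority vote.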
Hales, M.; Biros, E.
2015-01-01
Background: Since 1982, the International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI) has been used to classify sensation of spinal cord injury (SCI) through pinprick and light touch scores. The absence of proprioception, pain, and temperature within this scale creates questions about its validity and accuracy. Objectives: To assess whether the sensory component of the ISNCSCI represents a reliable and valid measure of classification of SCI. Methods: A systematic review of studies examining the reliability and validity of the sensory component of the ISNCSCI published between 1982 and February 2013 was conducted. The electronic databases MEDLINE via Ovid, CINAHL, PEDro, and Scopus were searched for relevant articles. A secondary search of reference lists was also completed. Chosen articles were assessed according to the Oxford Centre for Evidence-Based Medicine hierarchy of evidence and critically appraised using the McMasters Critical Review Form. A statistical analysis was conducted to investigate the variability of the results given by reliability studies. Results: Twelve studies were identified: 9 reviewed reliability and 3 reviewed validity. All studies demonstrated low levels of evidence and moderate critical appraisal scores. The majority of the articles (~67%; 6/9) assessing the reliability suggested that training was positively associated with better posttest results. The results of the 3 studies that assessed the validity of the ISNCSCI scale were confounding. Conclusions: Due to the low to moderate quality of the current literature, the sensory component of the ISNCSCI requires further revision and investigation if it is to be a useful tool in clinical trials. PMID:26363591
Pullin, A N; Pairis-Garcia, M D; Campbell, B J; Campler, M R; Proudfoot, K L
2017-11-01
When considering methodologies for collecting behavioral data, continuous sampling provides the most complete and accurate data set, whereas instantaneous sampling can provide similar results and also increase the efficiency of data collection. However, instantaneous time intervals require validation to ensure accurate estimation of the data. Therefore, the objective of this study was to validate scan sampling intervals for lambs housed in a feedlot environment. Feeding, lying, standing, drinking, locomotion, and oral manipulation were measured on 18 crossbred lambs housed in an indoor feedlot facility for 14 h (0600-2000 h). Data from continuous sampling were compared with data from instantaneous scan sampling intervals of 5, 10, 15, and 20 min using a linear regression analysis. Three criteria determined whether a time interval accurately estimated behaviors: 1) R² ≥ 0.90, 2) slope not statistically different from 1 (P > 0.05), and 3) intercept not statistically different from 0 (P > 0.05). Estimations for lying behavior were accurate up to 20-min intervals, whereas feeding and standing behaviors were accurate only at 5-min intervals (i.e., met all 3 regression criteria). Drinking, locomotion, and oral manipulation demonstrated poor associations for all tested intervals. The results from this study suggest that a 5-min instantaneous sampling interval will accurately estimate lying, feeding, and standing behaviors for lambs housed in a feedlot, whereas continuous sampling is recommended for the remaining behaviors. This methodology will contribute toward the efficiency, accuracy, and transparency of future behavioral data collection in lamb behavior research.
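The three regression criteria can be checked with an ordinary least-squares fit of scan-derived estimates on continuous observations. A minimal sketch with hypothetical lying-time proportions; this gives point estimates only, and the significance tests on the slope and intercept would additionally require their standard errors:

```python
def interval_criteria(continuous, scan):
    """OLS fit of scan-sampled estimates on continuous observations;
    returns (r2, slope, intercept) for checking the three criteria."""
    n = len(continuous)
    mx = sum(continuous) / n
    my = sum(scan) / n
    sxx = sum((x - mx) ** 2 for x in continuous)
    sxy = sum((x - mx) * (y - my) for x, y in zip(continuous, scan))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(continuous, scan))
    syy = sum((y - my) ** 2 for y in scan)
    r2 = 1.0 - ss_res / syy
    return r2, slope, intercept

# hypothetical lying-time proportions: continuous vs 5-min scans
cont = [0.55, 0.62, 0.48, 0.70, 0.60, 0.52]
scan = [0.54, 0.63, 0.50, 0.69, 0.58, 0.53]
r2, slope, intercept = interval_criteria(cont, scan)
```

An interval passes when R² is at least 0.90 and the slope and intercept are statistically indistinguishable from 1 and 0, respectively.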
A preliminary validation of a new measure of occupational health and safety.
Cadieux, Jean; Roy, Mario; Desmarais, Lise
2006-01-01
This article describes the validation of an instrument designed to conduct an Occupational Health and Safety (OHS) self-diagnosis using workers' observations of tangible facts and actions in the workplace. The instrument puts the emphasis on observable factors that make it possible to act proactively before accidents actually occur. Three companies were recruited from the printing sector in Québec based on their OHS performance (low, medium, and high), their interest in the project, and their availability. The master sample was composed of 269 people. Partial least squares (PLS) and vanishing tetrad analyses were used to study the behavior of the formative scales developed. The results indicate that partial revision of the instrument will be necessary to reach a fully satisfying level of validity. With regard to the statistical approach adopted, the use of a macro called CTA-SAS 2.0 developed by Bollen and Ting [Bollen, K.A., & Ting, K.-F. (1993). Confirmatory tetrad analysis. In P.V. Marsden (Ed.), Sociological methodology, Vol. 23 (pp. 147-176). Cambridge, MA: Blackwell] has proven to be effective in providing a suitable statistical treatment of the latent formative variables on which our instrument's development is based. The results suggest that the instrument appears to be relatively robust and appropriate to diagnose OHS leading measures. In the long run, this instrument will make it possible for companies to draw a more refined picture of OHS evolution, even in the absence of occupational injuries.
Collins, Simon N; Dyson, Sue J; Murray, Rachel C; Newton, J Richard; Burden, Faith; Trawford, Andrew F
2012-08-01
To establish and validate an objective method of radiographic diagnosis of anatomic changes in laminitic forefeet of donkeys on the basis of data from a comprehensive series of radiographic measurements. 85 donkeys with and 85 without forelimb laminitis for baseline data determination; a cohort of 44 donkeys with and 18 without forelimb laminitis was used for validation analyses. For each donkey, lateromedial radiographic views of 1 weight-bearing forelimb were obtained; images from 11 laminitic and 2 nonlaminitic donkeys were excluded (motion artifact) from baseline data determination. Data from an a priori selection of 19 measurements of anatomic features of laminitic and nonlaminitic donkey feet were analyzed by use of a novel application of multivariate statistical techniques. The resultant diagnostic models were validated in a blinded manner with data from the separate cohort of laminitic and nonlaminitic donkeys. Data were modeled, and robust statistical rules were established for the diagnosis of anatomic changes within laminitic donkey forefeet. Component 1 scores ≤ -3.5 were indicative of extreme anatomic change, and scores from -2.0 to 0.0 denoted modest change. Nonlaminitic donkeys with a score from 0.5 to 1.0 should be considered as at risk for laminitis. Results indicated that the radiographic procedures evaluated can be used for the identification, assessment, and monitoring of anatomic changes associated with laminitis. Screening assessments by use of this method may enable early detection of mild anatomic change and identification of at-risk donkeys.
Park, Joanne; Roberts, Mary Roduta; Esmail, Shaniff; Rayani, Fahreen; Norris, Colleen M; Gross, Douglas P
2018-06-01
Purpose To examine the construct and concurrent validity of the Readiness for Return-To-Work (RRTW) Scale with injured workers participating in an outpatient occupational rehabilitation program. Methods Lost-time claimants (n = 389) with sub-acute or chronic musculoskeletal disorders completed the RRTW Scale on the first day of their occupational rehabilitation program. Statistical analysis included exploratory and confirmatory factor analyses of the readiness items, reliability analyses, and correlation with related scales and questionnaires. Results For claimants in the non-job attached/not working group (n = 165), three factors were found: (1) Contemplation, (2) Prepared for Action-Self-evaluative, and (3) Prepared for Action-Behavioural. The precontemplation stage was not identified within this sample of injured workers. For claimants in the job attached/working group (n = 224), two factors were identified: (1) Uncertain Maintenance and (2) Proactive Maintenance. Expected relationships and statistically significant differences were found among the identified Return-To-Work (RTW) readiness factors and the related constructs of pain, physical and mental health, and RTW expectations. Conclusion The construct and concurrent validity of the RRTW Scale were supported in this study. The results indicate that the construct of readiness for RTW can vary by disability duration and occupational category. Physical health appears to be a significant barrier to RRTW for the job attached/working group, while mental health significantly compromises RRTW in the non-job attached/not working group.
Soil Moisture Active Passive Mission L4_SM Data Product Assessment (Version 2 Validated Release)
NASA Technical Reports Server (NTRS)
Reichle, Rolf Helmut; De Lannoy, Gabrielle J. M.; Liu, Qing; Ardizzone, Joseph V.; Chen, Fan; Colliander, Andreas; Conaty, Austin; Crow, Wade; Jackson, Thomas; Kimball, John;
2016-01-01
During the post-launch SMAP calibration and validation (Cal/Val) phase there are two objectives for each science data product team: 1) calibrate, verify, and improve the performance of the science algorithm, and 2) validate the accuracy of the science data product as specified in the science requirements and according to the Cal/Val schedule. This report provides an assessment of the SMAP Level 4 Surface and Root Zone Soil Moisture Passive (L4_SM) product specifically for the product's public Version 2 validated release scheduled for 29 April 2016. The assessment of the Version 2 L4_SM data product includes comparisons of SMAP L4_SM soil moisture estimates with in situ soil moisture observations from core validation sites and sparse networks. The assessment further includes a global evaluation of the internal diagnostics from the ensemble-based data assimilation system that is used to generate the L4_SM product. This evaluation focuses on the statistics of the observation-minus-forecast (O-F) residuals and the analysis increments. Together, the core validation site comparisons and the statistics of the assimilation diagnostics are considered primary validation methodologies for the L4_SM product. Comparisons against in situ measurements from regional-scale sparse networks are considered a secondary validation methodology because such in situ measurements are subject to up-scaling errors from the point-scale to the grid cell scale of the data product. Based on the limited set of core validation sites, the wide geographic range of the sparse network sites, and the global assessment of the assimilation diagnostics, the assessment presented here meets the criteria established by the Committee on Earth Observing Satellites for Stage 2 validation and supports the validated release of the data. An analysis of the time average surface and root zone soil moisture shows that the global pattern of arid and humid regions is captured by the L4_SM estimates.
Results from the core validation site comparisons indicate that "Version 2" of the L4_SM data product meets the self-imposed L4_SM accuracy requirement, which is formulated in terms of the ubRMSE: the RMSE (Root Mean Square Error) after removal of the long-term mean difference. The overall ubRMSE of the 3-hourly L4_SM surface soil moisture at the 9 km scale is 0.035 cubic meters per cubic meter. The corresponding ubRMSE for L4_SM root zone soil moisture is 0.024 cubic meters per cubic meter. Both of these metrics are comfortably below the 0.04 cubic meters per cubic meter requirement. The L4_SM estimates are an improvement over estimates from a model-only SMAP Nature Run version 4 (NRv4), which demonstrates the beneficial impact of the SMAP brightness temperature data. L4_SM surface soil moisture estimates are consistently more skillful than NRv4 estimates, although not by a statistically significant margin. The lack of statistical significance is not surprising given the limited data record available to date. Root zone soil moisture estimates from L4_SM and NRv4 have similar skill. Results from comparisons of the L4_SM product to in situ measurements from nearly 400 sparse network sites corroborate the core validation site results. The instantaneous soil moisture and soil temperature analysis increments are within a reasonable range and result in spatially smooth soil moisture analyses. The O-F residuals exhibit only small biases on the order of 1-3 degrees Kelvin between the (re-scaled) SMAP brightness temperature observations and the L4_SM model forecast, which indicates that the assimilation system is largely unbiased. The spatially averaged time series standard deviation of the O-F residuals is 5.9 degrees Kelvin, which reduces to 4.0 degrees Kelvin for the observation-minus-analysis (O-A) residuals, reflecting the impact of the SMAP observations on the L4_SM system.
Averaged globally, the time series standard deviation of the normalized O-F residuals is close to unity, which would suggest that the magnitude of the modeled errors approximately reflects that of the actual errors. The assessment report also notes several limitations of the "Version 2" L4_SM data product and science algorithm calibration that will be addressed in future releases. Regionally, the time series standard deviation of the normalized O-F residuals deviates considerably from unity, which indicates that the L4_SM assimilation algorithm either over- or under-estimates the actual errors that are present in the system. Planned improvements include revised land model parameters, revised error parameters for the land model and the assimilated SMAP observations, and revised surface meteorological forcing data for the operational period and underlying climatological data. Moreover, a refined analysis of the impact of SMAP observations will be facilitated by the construction of additional variants of the model-only reference data. Nevertheless, the “Version 2” validated release of the L4_SM product is sufficiently mature and of adequate quality for distribution to and use by the larger science and application communities.
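The ubRMSE metric used throughout this assessment is defined in the abstract itself: the RMSE after removal of the long-term mean difference. A minimal sketch of that computation, using made-up soil moisture values rather than SMAP or in situ data:

```python
import math

def ubrmse(estimate, truth):
    """Unbiased RMSE: RMSE after removing the long-term mean difference."""
    diffs = [e - t for e, t in zip(estimate, truth)]
    bias = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - bias) ** 2 for d in diffs) / len(diffs))

# Illustrative soil moisture series (m^3/m^3); these are invented values,
# not SMAP data.
truth = [0.20, 0.25, 0.22, 0.30, 0.28]
noise = [0.01, -0.02, 0.00, 0.02, -0.01]
est = [t + 0.05 + n for t, n in zip(truth, noise)]  # constant bias + noise

print(round(ubrmse(est, truth), 4))
```

Because the constant 0.05 bias is removed before the RMSE is taken, only the noise term contributes, which is why ubRMSE is a stricter test of random error than plain RMSE.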
ERIC Educational Resources Information Center
Welborn, Cliff Alan; Lester, Don; Parnell, John
2015-01-01
The American College Test (ACT) is utilized to determine academic success in business core courses at a midlevel state university. Specifically, subscores are compared to subsequent student grades in selected courses. The results indicate that ACT Mathematics and English subscores are valid predictors of success or failure of business students in…
Butterfly Floquet Spectrum in Driven SU(2) Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang Jiao; Department of Physics, Institute of Theoretical Physics and Astrophysics, Xiamen University, Xiamen 361005; Gong Jiangbin
2009-06-19
The Floquet spectrum of a class of driven SU(2) systems is shown to display a butterfly pattern with multifractal properties. The level crossing between Floquet states of the same parity or different parities is studied. The results are relevant to studies of fractal statistics, quantum chaos, coherent destruction of tunneling, and the validity of mean-field descriptions of Bose-Einstein condensates.
Ielpo, Pierina; Leardi, Riccardo; Pappagallo, Giuseppe; Uricchio, Vito Felice
2017-06-01
In this paper, the results obtained from multivariate statistical techniques such as PCA (principal component analysis) and LDA (linear discriminant analysis) applied to a wide soil data set are presented. The results have been compared with those obtained on a groundwater data set, whose samples were collected together with the soil ones, within the project "Improvement of the Regional Agro-meteorological Monitoring Network (2004-2007)". LDA, applied to the soil data, made it possible to assign each sample to one of two macroareas (Bari and Foggia provinces vs Brindisi, Lecce and Taranto provinces), with a percentage of correct predictions in cross validation of 87%. In the case of the groundwater data set, the best classification was obtained when the samples were grouped into three macroareas (Foggia province, Bari province, and Brindisi, Lecce and Taranto provinces), reaching a percentage of correct predictions in cross validation of 84%. The obtained information can be very useful in supporting soil and water resource management, such as the reduction of water consumption and of energy and chemical (nutrient and pesticide) inputs in agriculture.
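As a hedged illustration of the kind of cross-validated LDA accuracy reported above (this is not the authors' code, and synthetic Gaussian features stand in for the soil chemistry measurements):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for soil chemistry features from two macroareas
X = np.vstack([rng.normal(0.0, 1.0, (60, 4)), rng.normal(1.5, 1.0, (60, 4))])
y = np.array([0] * 60 + [1] * 60)

def lda_fit(X, y):
    """Two-class LDA: discriminant direction and midpoint threshold."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    S = (np.cov(X[y == 0].T) + np.cov(X[y == 1].T)) / 2  # pooled covariance
    w = np.linalg.solve(S, m1 - m0)
    c = w @ (m0 + m1) / 2
    return w, c

def cv_accuracy(X, y, k=5):
    """k-fold cross-validated fraction of correct predictions."""
    idx = rng.permutation(len(y))
    correct = 0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        w, c = lda_fit(X[train], y[train])
        correct += np.sum((X[fold] @ w > c).astype(int) == y[fold])
    return correct / len(y)

print(f"cross-validated accuracy: {cv_accuracy(X, y):.0%}")
```

With well-separated synthetic classes the cross-validated accuracy lands in the same high range as the 84-87% reported in the study, but the number itself depends entirely on the simulated class separation.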
Utturkar, Sagar M.; Klingeman, Dawn Marie; Land, Miriam L.; ...
2014-06-14
Our motivation with this work was to assess the potential of different types of sequence data, combined with de novo and hybrid assembly approaches, to improve existing draft genome sequences. Illumina, 454 and PacBio sequencing technologies were used to generate de novo and hybrid genome assemblies for four different bacteria, which were assessed for quality using summary statistics (e.g. number of contigs, N50) and in silico evaluation tools. Differences in predictions of multiple copies of rDNA operons for each respective bacterium were evaluated by PCR and Sanger sequencing, and the validated results were then applied as an additional criterion to rank assemblies. In general, assemblies using longer PacBio reads were better able to resolve repetitive regions. In this study, the combination of Illumina and PacBio sequence data assembled through the ALLPATHS-LG algorithm gave the best summary statistics and most accurate rDNA operon number predictions. This study will aid others looking to improve existing draft genome assemblies. Availability and implementation: all assembly tools except CLC Genomics Workbench are freely available under the GNU General Public License.
Roberson, David W; Kentala, Erna; Forbes, Peter
2005-12-01
The goals of this project were 1) to develop and validate an objective instrument to measure surgical performance at tonsillectomy, 2) to assess its interobserver and interobservation reliability and construct validity, and 3) to select those items with best reliability and most independent information to design a simplified form suitable for routine use in otolaryngology surgical evaluation. Prospective, observational data collection for an educational quality improvement project. The evaluation instrument was based on previous instruments developed in general surgery with input from attending otolaryngologic surgeons and experts in medical education. It was pilot tested and subjected to iterative improvements. After the instrument was finalized, a total of 55 tonsillectomies were observed and scored during academic year 2002 to 2003: 45 cases by residents at different points during their rotation, 5 by fellows, and 5 by faculty. Results were assessed for interobserver reliability, interobservation reliability, and construct validity. Factor analysis was used to identify items with independent information. Interobserver and interobservation reliability was high. On technical items, faculty substantially outperformed fellows, who in turn outperformed residents (P < .0001 for both comparisons). On the "global" scale (overall assessment), residents improved an average of 1 full point (on a 5 point scale) during a 3 month rotation (P = .01). In the subscale of "patient care," results were less clear cut: fellows outperformed residents, who in turn outperformed faculty, but only the fellows to faculty comparison was statistically significant (P = .04), and residents did not clearly improve over time (P = .36). Factor analysis demonstrated that technical items and patient care items factor separately and thus represent separate skill domains in surgery. 
It is possible to objectively measure surgical skill at tonsillectomy with high reliability and good construct validity. Factor analysis demonstrated that patient care is a distinct domain in surgical skill. Although the interobserver reliability for some patient care items reached statistical significance, it was not high enough for "high stakes testing" purposes. Using reliability and factor analysis results, we propose a simplified instrument for use in evaluating trainees in otolaryngologic surgery.
John, Majnu; Lencz, Todd; Malhotra, Anil K; Correll, Christoph U; Zhang, Jian-Ping
2018-06-01
Meta-analysis of genetic association studies is increasingly used to assess phenotypic differences between genotype groups. When the underlying genetic model is assumed to be dominant or recessive, assessing the phenotype differences based on summary statistics, reported for individual studies in a meta-analysis, is a valid strategy. However, when the genetic model is additive, a similar strategy based on summary statistics will lead to biased results. We establish this fact about the additive model in this paper using simulations. The main goal of this paper is to present an alternate strategy for the additive model based on simulating data for the individual studies. We show that the alternate strategy is far superior to the strategy based on summary statistics.
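A minimal sketch of the core idea behind the simulation-based strategy, under our own illustrative assumptions (a single study, a per-allele effect of 0.5, unit-variance phenotype noise; none of these values come from the paper): simulate individual-level data under an additive model and recover the per-allele effect directly.

```python
import random
random.seed(1)

beta = 0.5                   # assumed true per-allele effect (illustrative)
freqs = [0.49, 0.42, 0.09]   # genotype frequencies for 0, 1, 2 risk alleles
n = 20000

genotypes, phenotypes = [], []
for _ in range(n):
    g = random.choices([0, 1, 2], weights=freqs)[0]
    genotypes.append(g)
    # Additive model: phenotype mean rises linearly with allele count
    phenotypes.append(beta * g + random.gauss(0, 1))

# Least-squares slope of phenotype on allele count recovers beta
mg = sum(genotypes) / n
mp = sum(phenotypes) / n
slope = sum((g - mg) * (p - mp) for g, p in zip(genotypes, phenotypes)) / \
        sum((g - mg) ** 2 for g in genotypes)
print(round(slope, 2))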
Validation of spatial variability in downscaling results from the VALUE perfect predictor experiment
NASA Astrophysics Data System (ADS)
Widmann, Martin; Bedia, Joaquin; Gutiérrez, Jose Manuel; Maraun, Douglas; Huth, Radan; Fischer, Andreas; Keller, Denise; Hertig, Elke; Vrac, Mathieu; Wibig, Joanna; Pagé, Christian; Cardoso, Rita M.; Soares, Pedro MM; Bosshard, Thomas; Casado, Maria Jesus; Ramos, Petra
2016-04-01
VALUE is an open European network to validate and compare downscaling methods for climate change research. Within VALUE a systematic validation framework has been developed to enable the assessment and comparison of both dynamical and statistical downscaling methods. In the first validation experiment the downscaling methods are validated in a setup with perfect predictors taken from the ERA-Interim reanalysis for the period 1997-2008. This makes it possible to investigate the isolated skill of downscaling methods without further error contributions from the large-scale predictors. One aspect of the validation is the representation of spatial variability. As part of the VALUE validation we have compared various properties of the spatial variability of downscaled daily temperature and precipitation with the corresponding properties in observations. We have used two validation datasets, one European-wide set of 86 stations, and one higher-density network of 50 stations in Germany. Here we present results based on three approaches, namely the analysis of i) correlation matrices, ii) pairwise joint threshold exceedances, and iii) regions of similar variability. We summarise the information contained in correlation matrices by calculating the dependence of the correlations on distance and deriving decorrelation lengths, as well as by determining the independent degrees of freedom. Probabilities for joint threshold exceedances and (where appropriate) non-exceedances are calculated for various user-relevant thresholds related, for instance, to extreme precipitation or frost and heat days. The dependence of these probabilities on distance is again characterised by calculating typical length scales that separate dependent from independent exceedances. Regionalisation is based on rotated Principal Component Analysis. The results indicate which downscaling methods are preferable if the dependency of variability at different locations is relevant for the user.
CheS-Mapper 2.0 for visual validation of (Q)SAR models
2014-01-01
Background Sound statistical validation is important to evaluate and compare the overall performance of (Q)SAR models. However, classical validation does not support the user in better understanding the properties of the model or the underlying data. Although a number of visualization tools for analyzing (Q)SAR information in small molecule datasets exist, integrated visualization methods that allow the investigation of model validation results are still lacking. Results We propose visual validation as an approach for the graphical inspection of (Q)SAR model validation results. The approach applies the 3D viewer CheS-Mapper, an open-source application for the exploration of small molecules in virtual 3D space. The present work describes the new functionalities in CheS-Mapper 2.0 that facilitate the analysis of (Q)SAR information and allow the visual validation of (Q)SAR models. The tool enables the comparison of model predictions to the actual activity in feature space. The approach is generic: it is model-independent and can handle physico-chemical and structural input features as well as quantitative and qualitative endpoints. Conclusions Visual validation with CheS-Mapper enables analyzing (Q)SAR information in the data and indicates how this information is employed by the (Q)SAR model. It reveals whether the endpoint is modeled too specifically or too generically, and highlights common properties of misclassified compounds. Moreover, the researcher can use CheS-Mapper to inspect how the (Q)SAR model predicts activity cliffs. The CheS-Mapper software is freely available at http://ches-mapper.org. Graphical abstract: Comparing actual and predicted activity values with CheS-Mapper.
ERIC Educational Resources Information Center
Williams, Amanda
2014-01-01
The purpose of the current research was to investigate the relationship between preference for numerical information (PNI), math self-concept, and six types of statistics anxiety in an attempt to establish support for the nomological validity of the PNI. Correlations indicate that four types of statistics anxiety were strongly related to PNI, and…
Prediction of Primary Care Depression Outcomes at Six Months: Validation of DOC-6 ©.
Angstman, Kurt B; Garrison, Gregory M; Gonzalez, Cesar A; Cozine, Daniel W; Cozine, Elizabeth W; Katzelnick, David J
2017-01-01
The goal of this study was to develop and validate an assessment tool for adult primary care patients diagnosed with depression to determine the predictive probability of clinical outcomes at 6 months. We retrospectively reviewed 3096 adult patients enrolled in collaborative care management (CCM) for depression. Patients enrolled on or before December 31, 2013, served as the training set (n = 2525), whereas those enrolled after that date served as the preliminary validation set (n = 571). Six variables (2 demographic and 4 clinical) were statistically significant in determining clinical outcomes. Using the validation data set, the remission classifier produced a receiver operating characteristic (ROC) curve with a c-statistic, or area under the curve (AUC), of 0.62, with predicted probabilities that ranged from 14.5% to 79.1%, with a median of 50.6%. The persistent depressive symptoms (PDS) classifier produced an ROC curve with a c-statistic or AUC of 0.67 and predicted probabilities that ranged from 5.5% to 73.1%, with a median of 23.5%. We were able to identify readily available variables and then validated these in the prediction of depression remission and PDS at 6 months. The DOC-6 tool may be used to predict which patients may be at risk for worse outcomes. © Copyright 2017 by the American Board of Family Medicine.
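The c-statistic (AUC) figures above (0.62 for remission, 0.67 for PDS) have a simple interpretation: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. A small self-contained sketch with invented scores, not the study's patient data:

```python
import itertools

def c_statistic(scores, labels):
    """Concordance (AUC): fraction of case/control pairs ranked correctly,
    counting ties as half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    concordant = sum((c > d) + 0.5 * (c == d)
                     for c, d in itertools.product(cases, controls))
    return concordant / (len(cases) * len(controls))

# Illustrative predicted probabilities and observed outcomes (invented)
scores = [0.79, 0.62, 0.55, 0.51, 0.33, 0.15]
labels = [1, 1, 0, 1, 0, 0]
print(round(c_statistic(scores, labels), 3))  # → 0.889 (8 of 9 pairs concordant)
```

A c-statistic of 0.5 means the classifier ranks no better than chance, which is why values such as 0.62 are considered only modestly discriminative.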
Greene, Barry R; Redmond, Stephen J; Caulfield, Brian
2017-05-01
Falls are the leading global cause of accidental death and disability in older adults and are the most common cause of injury and hospitalization. Accurate, early identification of patients at risk of falling could lead to timely intervention and a reduction in the incidence of fall-related injury and associated costs. We report a statistical method for fall risk assessment using standard clinical fall risk factors (N = 748). We also report a means of improving this method by automatically combining it with a fall risk assessment algorithm based on inertial sensor data and the timed-up-and-go test. Furthermore, we provide validation data on the sensor-based fall risk assessment method using a statistically independent dataset. Results obtained using cross-validation on a sample of 292 community dwelling older adults suggest that a combined clinical and sensor-based approach yields a classification accuracy of 76.0%, compared to either 73.6% for sensor-based assessment alone, or 68.8% for clinical risk factors alone. Increasing the cohort size by adding an additional 130 subjects from a separate recruitment wave (N = 422), and applying the same model building and validation method, resulted in a decrease in classification performance (68.5% for the combined classifier, 66.8% for sensor data alone, and 58.5% for clinical data alone). This suggests that heterogeneity between cohorts may be a major challenge when attempting to develop fall risk assessment algorithms that generalize well. Independent validation of the sensor-based fall risk assessment algorithm on an independent cohort of 22 community dwelling older adults yielded a classification accuracy of 72.7%. Results suggest that the present method compares well to previously reported sensor-based methods in assessing fall risk.
Implementation of objective fall risk assessment methods on a large scale has the potential to improve quality of care and lead to a reduction in associated hospital costs, due to fewer admissions and reduced injuries due to falling.
Quantitative Validation of the Integrated Medical Model (IMM) for ISS Missions
NASA Technical Reports Server (NTRS)
Young, Millennia; Arellano, J.; Boley, L.; Garcia, Y.; Saile, L.; Walton, M.; Kerstman, E.; Reyes, D.; Goodenow, D. A.; Myers, J. G.
2016-01-01
Lifetime Surveillance of Astronaut Health (LSAH) provided observed medical event data on 33 ISS and 111 STS person-missions for use in further improving and validating the Integrated Medical Model (IMM). Using only the crew characteristics from these observed missions, the newest development version, IMM v4.0, will simulate these missions to predict medical events and outcomes. Comparing IMM predictions to the actual observed medical event counts will provide external validation and identify areas of possible improvement. In an effort to improve the power of detecting differences in this validation study, the total over each program (ISS and STS) will serve as the main quantitative comparison objective, specifically the following parameters: total medical events (TME), probability of loss of crew life (LOCL), and probability of evacuation (EVAC). Scatter plots of observed versus median predicted TMEs (with error bars reflecting the simulation intervals) will graphically display comparisons, while linear regression will serve as the statistical test of agreement. Two scatter plots will be analyzed: 1) where each point reflects a mission, and 2) where each point reflects a condition-specific total number of occurrences. The coefficient of determination (R2) resulting from a linear regression with no intercept bias (intercept fixed at zero) will serve as an overall metric of agreement between IMM and the real world system (RWS). In an effort to identify as many discrepancies as possible for further inspection, the α-level for all statistical tests comparing IMM predictions to observed data will be set to 0.1. This less stringent criterion, along with the multiple testing being conducted, should detect all perceived differences, including many false positive signals resulting from random variation.
The results of these analyses will reveal areas of the model requiring adjustment to improve overall IMM output, which will thereby provide better decision support for mission critical applications.
Herpers, Matthias; Dintsios, Charalabos-Markos
2018-04-25
The decision matrix applied by the Institute for Quality and Efficiency in Health Care (IQWiG) for the quantification of added benefit within the early benefit assessment of new pharmaceuticals in Germany, with its nine fields, is quite complex and could be simplified. Furthermore, the method used by IQWiG is subject to manifold criticism: (1) it implicitly weights endpoints differently in its assessments, favoring overall survival and, thereby, drug interventions in fatal diseases; (2) it assumes that two pivotal trials are available when assessing the dossiers submitted by the pharmaceutical manufacturers, leading to far-reaching implications with respect to the quantification of added benefit; and (3) it bases the evaluation primarily on dichotomous endpoints, consequently leading to a loss of usable evidence. Our aim was to investigate whether this criticism is justified and to propose methodological adaptations. We analyzed the available dossiers up to the end of 2016 using statistical tests, multinomial logistic regression, and simulations. It was shown that, due to power losses, the method does not ensure that results are statistically valid, and outcomes of the early benefit assessment may be compromised, though evidence on favoring overall survival remains unclear. Modifications of the IQWiG method are, however, possible to address the identified problems. By converging with the approach of approval authorities for confirmatory endpoints, the decision matrix could be simplified and the analysis method improved, to put the results on a more valid statistical basis.
Correcting evaluation bias of relational classifiers with network cross validation
Neville, Jennifer; Gallagher, Brian; Eliassi-Rad, Tina; ...
2011-01-04
Recently, a number of modeling techniques have been developed for data mining and machine learning in relational and network domains where the instances are not independent and identically distributed (i.i.d.). These methods specifically exploit the statistical dependencies among instances in order to improve classification accuracy. However, there has been little focus on how these same dependencies affect our ability to draw accurate conclusions about the performance of the models. More specifically, the complex link structure and attribute dependencies in relational data violate the assumptions of many conventional statistical tests and make it difficult to use these tests to assess the models in an unbiased manner. In this work, we examine the task of within-network classification and the question of whether two algorithms will learn models that will result in significantly different levels of performance. We show that the commonly used form of evaluation (paired t-test on overlapping network samples) can result in an unacceptable level of Type I error. Furthermore, we show that Type I error increases as (1) the correlation among instances increases and (2) the size of the evaluation set increases (i.e., the proportion of labeled nodes in the network decreases). Lastly, we propose a method for network cross-validation that, combined with paired t-tests, produces more acceptable levels of Type I error while still providing reasonable levels of statistical power (i.e., 1 - Type II error).
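A hedged sketch of the inflation mechanism described above, under our own simplified assumptions (fold-to-fold dependence modeled as a shared Gaussian component; this is an illustration of the statistical effect, not the authors' simulation design): when per-fold accuracy differences are positively correlated, the paired t-test's variance estimate is too small and the false-rejection rate rises well above the nominal 5%.

```python
import math
import random
random.seed(0)

def paired_t(diffs):
    """Paired t statistic for per-fold performance differences."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

def type1_rate(rho, trials=2000, n=10, crit=2.262):  # crit: t(9), alpha = 0.05
    """Rejection rate when there is NO true difference between algorithms."""
    rejections = 0
    for _ in range(trials):
        shared = random.gauss(0, 1)  # component common to all folds
        diffs = [math.sqrt(rho) * shared + math.sqrt(1 - rho) * random.gauss(0, 1)
                 for _ in range(n)]
        rejections += abs(paired_t(diffs)) > crit
    return rejections / trials

print(f"independent folds:  {type1_rate(0.0):.3f}")  # near nominal 0.05
print(f"correlated folds:   {type1_rate(0.5):.3f}")  # far above 0.05
```

With independent fold differences the rejection rate sits near the nominal level; sharing even half the variance across folds inflates it several-fold, which is the qualitative behavior the paper reports for overlapping network samples.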
Sepehrband, Farshid; Lynch, Kirsten M; Cabeen, Ryan P; Gonzalez-Zacarias, Clio; Zhao, Lu; D'Arcy, Mike; Kesselman, Carl; Herting, Megan M; Dinov, Ivo D; Toga, Arthur W; Clark, Kristi A
2018-05-15
Exploring neuroanatomical sex differences using a multivariate statistical learning approach can yield insights that cannot be derived with univariate analysis. While gross differences in total brain volume are well-established, uncovering the more subtle, regional sex-related differences in neuroanatomy requires a multivariate approach that can accurately model spatial complexity as well as the interactions between neuroanatomical features. Here, we developed a multivariate statistical learning model using a support vector machine (SVM) classifier to predict sex from MRI-derived regional neuroanatomical features from a single-site study of 967 healthy youth from the Philadelphia Neurodevelopmental Cohort (PNC). Then, we validated the multivariate model on an independent dataset of 682 healthy youth from the multi-site Pediatric Imaging, Neurocognition and Genetics (PING) cohort study. The trained model exhibited an 83% cross-validated prediction accuracy, and correctly predicted the sex of 77% of the subjects from the independent multi-site dataset. Results showed that cortical thickness of the middle occipital lobes and the angular gyri are major predictors of sex. Results also demonstrated the inferential benefits of going beyond classical regression approaches to capture the interactions among brain features in order to better characterize sex differences in male and female youths. We also identified specific cortical morphological measures and parcellation techniques, such as cortical thickness as derived from the Destrieux atlas, that are better able to discriminate between males and females in comparison to other brain atlases (Desikan-Killiany, Brodmann and subcortical atlases). Copyright © 2018 Elsevier Inc. All rights reserved.
Interpolative modeling of GaAs FET S-parameter data bases for use in Monte Carlo simulations
NASA Technical Reports Server (NTRS)
Campbell, L.; Purviance, J.
1992-01-01
A statistical interpolation technique is presented for modeling GaAs FET S-parameter measurements for use in the statistical analysis and design of circuits. This is accomplished by interpolating among the measurements in a GaAs FET S-parameter data base in a statistically valid manner.
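The abstract does not specify the interpolation scheme, but the basic idea of drawing a measured device at random from the data base and interpolating its S-parameters across frequency can be sketched as follows; the frequencies and |S21| values here are entirely hypothetical, not measurements from the paper's data base.

```python
import bisect
import random
random.seed(2)

# Hypothetical mini data base: |S21| (dB) measured at a few frequencies (GHz)
# for several sample devices; values are illustrative only.
freqs = [1.0, 2.0, 4.0, 8.0]
devices = [
    [12.1, 11.4, 9.8, 7.2],
    [12.6, 11.9, 10.1, 7.8],
    [11.8, 11.0, 9.5, 6.9],
]

def interp_s21(curve, f):
    """Linear interpolation of one device's |S21| at frequency f
    (f must lie within the measured band)."""
    i = bisect.bisect_left(freqs, f)
    if freqs[i] == f:
        return curve[i]
    f0, f1 = freqs[i - 1], freqs[i]
    return curve[i - 1] + (curve[i] - curve[i - 1]) * (f - f0) / (f1 - f0)

# One Monte Carlo draw: pick a measured device at random, then interpolate,
# so each draw stays consistent with an actual measured unit.
device = random.choice(devices)
print(round(interp_s21(device, 3.0), 3))
```

Drawing whole measured devices rather than mixing values across devices is one way such a scheme can remain "statistically valid": the correlations between S-parameters of a single unit are preserved.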
ERIC Educational Resources Information Center
Harrell-Williams, Leigh M.; Sorto, M. Alejandra; Pierce, Rebecca L.; Lesser, Lawrence M.; Murphy, Teri J.
2014-01-01
The influential "Common Core State Standards for Mathematics" (CCSSM) expect students to start statistics learning during middle grades. Thus teacher education and professional development programs are advised to help preservice and in-service teachers increase their knowledge and confidence to teach statistics. Although existing…
Biomarker development in the precision medicine era: lung cancer as a case study.
Vargas, Ashley J; Harris, Curtis C
2016-08-01
Precision medicine relies on validated biomarkers with which to better classify patients by their probable disease risk, prognosis and/or response to treatment. Although affordable 'omics'-based technology has enabled faster identification of putative biomarkers, the validation of biomarkers is still stymied by low statistical power and poor reproducibility of results. This Review summarizes the successes and challenges of using different types of molecule as biomarkers, using lung cancer as a key illustrative example. Efforts at the national level of several countries to tie molecular measurement of samples to patient data via electronic medical records are the future of precision medicine research.
Vespa, Anna; Giulietti, Maria Velia; Spatuzzi, Roberta; Fabbietti, Paolo; Meloni, Cristina; Gattafoni, Pisana; Ottaviani, Marica
2017-06-01
This study aimed at assessing the reliability and construct validity of the Brief Multidimensional Measure of Religiousness/Spirituality (BMMRS) in an Italian sample of 353 participants: 58.9% affected by different diseases and 41.1% healthy subjects. Descriptive statistics of the internal consistency reliabilities (Cronbach's coefficient) of the BMMRS revealed a remarkable consistency and reliability of the different scales (DSE, SpC, SC, CSC, VB, SPY-WELL) and good inter-class correlations (≥.70), indicating good stability of the measures over time. The BMMRS is a useful inventory for the evaluation of the principal spiritual dimensions.
Adaptation of clinical prediction models for application in local settings.
Kappen, Teus H; Vergouwe, Yvonne; van Klei, Wilton A; van Wolfswinkel, Leo; Kalkman, Cor J; Moons, Karel G M
2012-01-01
When planning to use a validated prediction model in new patients, adequate performance is not guaranteed. For example, changes in clinical practice over time or a different case mix than the original validation population may result in inaccurate risk predictions. To demonstrate how clinical information can direct updating a prediction model and development of a strategy for handling missing predictor values in clinical practice. A previously derived and validated prediction model for postoperative nausea and vomiting was updated using a data set of 1847 patients. The update consisted of 1) changing the definition of an existing predictor, 2) reestimating the regression coefficient of a predictor, and 3) adding a new predictor to the model. The updated model was then validated in a new series of 3822 patients. Furthermore, several imputation models were considered to handle real-time missing values, so that possible missing predictor values could be anticipated during actual model use. Differences in clinical practice between our local population and the original derivation population guided the update strategy of the prediction model. The predictive accuracy of the updated model was better (c statistic, 0.68; calibration slope, 1.0) than the original model (c statistic, 0.62; calibration slope, 0.57). Inclusion of logistical variables in the imputation models, besides observed patient characteristics, contributed to a strategy to deal with missing predictor values at the time of risk calculation. Extensive knowledge of local, clinical processes provides crucial information to guide the process of adapting a prediction model to new clinical practices.
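The calibration slope cited above (1.0 after updating vs 0.57 for the original model) is conventionally estimated by refitting a logistic model of the observed outcomes on the original model's linear predictor. A sketch under our own simulated data, not the study's patients (the true slope of 0.6 and sample size are our illustrative assumptions):

```python
import math
import random
random.seed(3)

# Simulate a validation set where the original model's predictions are too
# extreme: outcomes follow logit(p) = 0.6 * LP, i.e. true slope < 1.
n = 2000
lp = [random.gauss(0.0, 1.5) for _ in range(n)]   # original linear predictor
true_b = 0.6
y = [1 if random.random() < 1 / (1 + math.exp(-true_b * x)) else 0 for x in lp]

# Newton-Raphson fit of logit(p) = a + b * LP; b is the calibration slope.
a, b = 0.0, 1.0
for _ in range(25):
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, yi in zip(lp, y):
        p = 1 / (1 + math.exp(-(a + b * x)))
        w = p * (1 - p)
        g0 += yi - p
        g1 += (yi - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    det = h00 * h11 - h01 * h01
    a += (h11 * g0 - h01 * g1) / det
    b += (-h01 * g0 + h00 * g1) / det

print(f"calibration slope ~ {b:.2f}")
```

A slope below 1 signals predictions that are too extreme in the new setting; re-estimating coefficients, as the authors did, is the corresponding model update.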
Berry, Christopher M; Zhao, Peng
2015-01-01
Predictive bias studies have generally suggested that cognitive ability test scores overpredict job performance of African Americans, meaning these tests are not predictively biased against African Americans. However, at least 2 issues call into question existing over-/underprediction evidence: (a) a bias identified by Aguinis, Culpepper, and Pierce (2010) in the intercept test typically used to assess over-/underprediction and (b) a focus on the level of observed validity instead of operational validity. The present study developed and utilized a method of assessing over-/underprediction that draws on the math of subgroup regression intercept differences, does not rely on the biased intercept test, allows for analysis at the level of operational validity, and can use meta-analytic estimates as input values. Therefore, existing meta-analytic estimates of key parameters, corrected for relevant statistical artifacts, were used to determine whether African American job performance remains overpredicted at the level of operational validity. African American job performance was typically overpredicted by cognitive ability tests across levels of job complexity and across conditions wherein African American and White regression slopes did and did not differ. Because the present study does not rely on the biased intercept test and because appropriate statistical artifact corrections were carried out, the present study's results are not affected by the 2 issues mentioned above. The present study represents strong evidence that cognitive ability tests generally overpredict job performance of African Americans. (c) 2015 APA, all rights reserved.
Approach for Uncertainty Propagation and Robust Design in CFD Using Sensitivity Derivatives
NASA Technical Reports Server (NTRS)
Putko, Michele M.; Newman, Perry A.; Taylor, Arthur C., III; Green, Lawrence L.
2001-01-01
This paper presents an implementation of the approximate statistical moment method for uncertainty propagation and robust optimization for a quasi 1-D Euler CFD (computational fluid dynamics) code. Given uncertainties in statistically independent, random, normally distributed input variables, a first- and second-order statistical moment matching procedure is performed to approximate the uncertainty in the CFD output. Efficient calculation of both first- and second-order sensitivity derivatives is required. In order to assess the validity of the approximations, the moments are compared with statistical moments generated through Monte Carlo simulations. The uncertainties in the CFD input variables are also incorporated into a robust optimization procedure. For this optimization, statistical moments involving first-order sensitivity derivatives appear in the objective function and system constraints. Second-order sensitivity derivatives are used in a gradient-based search to successfully execute a robust optimization. The approximate methods used throughout the analyses are found to be valid when considering robustness about input parameter mean values.
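The first- and second-order moment matching described above can be illustrated with a toy output function standing in for the Euler CFD code (the function, input means, and standard deviations below are invented for illustration): propagate input variances through the squared first-order sensitivity derivatives and check against Monte Carlo.

```python
import numpy as np

def f(x, y):
    # Toy surrogate for a CFD output quantity (not the paper's code)
    return x**2 + 3.0 * y

def grad_f(x, y):
    # Analytic first-order sensitivity derivatives of f
    return np.array([2.0 * x, 3.0])

mu = np.array([1.0, 2.0])      # means of independent normal inputs
sigma = np.array([0.05, 0.1])  # input standard deviations

# First-order moment matching: mean evaluated at the input means,
# output variance from the squared sensitivities times input variances
mean_1st = f(*mu)
var_1st = np.sum(grad_f(*mu) ** 2 * sigma**2)

# Monte Carlo check of the approximation
rng = np.random.default_rng(0)
samples = rng.normal(mu, sigma, size=(200_000, 2))
mc = f(samples[:, 0], samples[:, 1])
```

For this mildly nonlinear toy function the first-order moments agree closely with the Monte Carlo estimates, mirroring the paper's finding that the approximation is valid for robustness about input mean values.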
Eaton, John E; Vesterhus, Mette; McCauley, Bryan M; Atkinson, Elizabeth J; Schlicht, Erik M; Juran, Brian D; Gossard, Andrea A; LaRusso, Nicholas F; Gores, Gregory J; Karlsen, Tom H; Lazaridis, Konstantinos N
2018-05-09
Improved methods are needed to risk stratify and predict outcomes in patients with primary sclerosing cholangitis (PSC). Therefore, we sought to derive and validate a new prediction model and compare its performance to existing surrogate markers. The model was derived using 509 subjects from a multicenter North American cohort and validated in an international multicenter cohort (n=278). Gradient boosting, a machine-learning technique, was used to create the model. The endpoint was hepatic decompensation (ascites, variceal hemorrhage or encephalopathy). Subjects with advanced PSC or cholangiocarcinoma at baseline were excluded. The PSC risk estimate tool (PREsTo) consists of 9 variables: bilirubin, albumin, serum alkaline phosphatase (SAP) times the upper limit of normal (ULN), platelets, AST, hemoglobin, sodium, patient age and the number of years since PSC was diagnosed. Validation in an independent cohort confirmed that PREsTo accurately predicts decompensation (C statistic 0.90, 95% confidence interval (CI) 0.84-0.95) and performs well compared to the MELD score (C statistic 0.72, 95% CI 0.57-0.84), the Mayo PSC risk score (C statistic 0.85, 95% CI 0.77-0.92) and SAP < 1.5x ULN (C statistic 0.65, 95% CI 0.55-0.73). PREsTo remained accurate among individuals with a bilirubin < 2.0 mg/dL (C statistic 0.90, 95% CI 0.82-0.96) and when the score was re-applied later in the disease course (C statistic 0.82, 95% CI 0.64-0.95). PREsTo accurately predicts hepatic decompensation in PSC and exceeds the performance of other widely available, noninvasive prognostic scoring systems. This article is protected by copyright. All rights reserved. © 2018 by the American Association for the Study of Liver Diseases.
Xu, Stanley; Clarke, Christina L; Newcomer, Sophia R; Daley, Matthew F; Glanz, Jason M
2018-05-16
Vaccine safety studies are often electronic health record (EHR)-based observational studies. These studies face significant methodological challenges, including confounding and misclassification of adverse events. Vaccine safety researchers use the self-controlled case series (SCCS) study design to handle confounding and employ medical chart review to ascertain cases that are identified using EHR data. However, for common adverse events, limited resources often make it impossible to adjudicate all adverse events observed in electronic data. In this paper, we considered four approaches for analyzing SCCS data with confirmation rates estimated from an internal validation sample: (1) observed cases, (2) confirmed cases only, (3) known confirmation rate, and (4) multiple imputation (MI). We conducted a simulation study to evaluate these four approaches using type I error rates, percent bias, and empirical power. Our simulation results suggest that when misclassification of adverse events is present, the observed-cases, confirmed-cases-only, and known-confirmation-rate approaches may inflate the type I error, yield biased point estimates, and affect statistical power. The multiple imputation approach accounts for the uncertainty of confirmation rates estimated from an internal validation sample and yields a proper type I error rate, a largely unbiased point estimate, a proper variance estimate, and adequate statistical power. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
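The multiple-imputation idea can be sketched on toy summary counts (all numbers invented; the sketch uses a simple rate ratio in place of the full conditional Poisson SCCS likelihood): draw the confirmation rate from its posterior given the validation sample, impute true case counts, and pool across imputations.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy SCCS summary: EHR-identified events in risk vs. control windows
obs_risk, obs_control = 60, 200
risk_time, control_time = 1.0, 4.0   # person-time in each window

# Internal validation sample: 30 of 40 chart-reviewed events confirmed
n_reviewed, n_confirmed = 40, 30

M = 1000
log_rr = np.empty(M)
for m in range(M):
    # Draw a confirmation rate from a Beta posterior (uniform prior)
    # to propagate the uncertainty of the validation sample
    p = rng.beta(n_confirmed + 1, n_reviewed - n_confirmed + 1)
    # Impute the number of true events in each window
    true_risk = rng.binomial(obs_risk, p)
    true_control = rng.binomial(obs_control, p)
    log_rr[m] = np.log((true_risk / risk_time) / (true_control / control_time))

# Pooled point estimate; the spread of log_rr reflects imputation uncertainty
rr_hat = np.exp(log_rr.mean())
```

Because the same confirmation rate applies to both windows here, the pooled rate ratio stays near the naive estimate while its variance properly reflects the validation-sample uncertainty.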
Naeem, Naghma; Muijtjens, Arno
2015-04-01
The psychological construct of emotional intelligence (EI), its theoretical models, measurement instruments and applications have been the subject of several research studies in health professions education. The objective of the current study was to investigate the factorial validity and reliability of a bilingual version of the Schutte Self Report Emotional Intelligence Scale (SSREIS) in an undergraduate Arab medical student population. The study was conducted during April-May 2012. A cross-sectional survey design was employed. A sample (n = 467) was obtained from undergraduate medical students belonging to the male and female medical colleges of King Saud University, Riyadh, Saudi Arabia. Exploratory and confirmatory factor analyses were performed using SPSS 16.0 and AMOS 4.0 statistical software to determine the factor structure. Reliability was determined using Cronbach's alpha. The results obtained using an undergraduate Arab medical student sample supported a multidimensional, three-factor structure of the SSREIS. The three factors are Optimism, Awareness-of-Emotions and Use-of-Emotions. The reliability (Cronbach's alpha) for the three subscales was 0.76, 0.72 and 0.55, respectively. Emotional intelligence is a multifactorial construct (three factors). The bilingual version of the SSREIS is a valid and reliable measure of trait emotional intelligence in an undergraduate Arab medical student population.
Validating the food behavior questions from the elementary school SPAN questionnaire.
Thiagarajah, Krisha; Fly, Alyce D; Hoelscher, Deanna M; Bai, Yeon; Lo, Kaman; Leone, Angela; Shertzer, Julie A
2008-01-01
The School Physical Activity and Nutrition (SPAN) questionnaire was developed as a surveillance instrument to measure physical activity, nutrition attitudes, and dietary and physical activity behaviors in children and adolescents. The SPAN questionnaire has 2 versions. This study was conducted to evaluate the validity of food consumption items from the elementary school version of the SPAN questionnaire. Validity was assessed by comparing food items selected on the questionnaire with food items reported from a single 24-hour recall covering the same reference period. 5 elementary schools in Indiana. Fourth-grade student volunteers (N = 121) from 5 elementary schools. Agreement between responses to SPAN questionnaire items and reference values obtained through 24-hour dietary recall. The agreement between the questionnaire and the 24-hour recall was measured using Spearman correlation, percentage agreement, and the kappa statistic. Correlation between SPAN item responses and recall data ranged from .25 (bread and related products) to .67 (gravy). The percentage agreement ranged from 26% (bread and related products) to 90% (gravy). The kappa statistic varied from .06 (chocolate candy) to .60 (beans). Results from this study indicate that the SPAN questionnaire can be administered in the classroom quickly and easily to measure many previous-day dietary behaviors of fourth graders. However, questions addressing consumption of "vegetables," "candy," and "snacks" need further investigation.
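The agreement statistics used in such validation studies are straightforward to compute. A minimal sketch with invented paired binary responses (not the study's data), showing percentage agreement and Cohen's chance-corrected kappa:

```python
import numpy as np

# Hypothetical paired responses for one food item: questionnaire vs.
# 24-hour recall (1 = reported consumed, 0 = not), one pair per student
questionnaire = np.array([1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0])
recall        = np.array([1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0])

# Percentage agreement: fraction of students where the two sources match
agreement = np.mean(questionnaire == recall)

# Cohen's kappa: observed agreement corrected for chance agreement
p_o = agreement
p_e = (questionnaire.mean() * recall.mean()
       + (1 - questionnaire.mean()) * (1 - recall.mean()))
kappa = (p_o - p_e) / (1 - p_e)
```

Here agreement is 10/12 but kappa is lower (2/3), illustrating why the paper reports both: high raw agreement can partly reflect chance when item prevalence is skewed.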
Limberg, Brian J; Johnstone, Kevin; Filloon, Thomas; Catrenich, Carl
2016-09-01
Using United States Pharmacopeia-National Formulary (USP-NF) general method <1223> guidance, the Soleris(®) automated system and reagents (Nonfermenting Total Viable Count for bacteria and Direct Yeast and Mold for yeast and mold) were validated, using a performance equivalence approach, as an alternative to plate counting for total microbial content analysis using five representative microbes: Staphylococcus aureus, Bacillus subtilis, Pseudomonas aeruginosa, Candida albicans, and Aspergillus brasiliensis. Detection times (DTs) in the alternative automated system were linearly correlated to CFU/sample (R(2) = 0.94-0.97) with ≥70% accuracy per USP General Chapter <1223> guidance. The LOD and LOQ of the automated system were statistically similar to the traditional plate count method. This system was significantly more precise than plate counting (RSD 1.2-2.9% for DT, 7.8-40.6% for plate counts), was statistically comparable to plate counting with respect to variations in analyst, vial lots, and instruments, and was robust when variations in the operating detection thresholds (dTs; ±2 units) were used. The automated system produced accurate results, was more precise and less labor-intensive, and met or exceeded criteria for a valid alternative quantitative method, consistent with USP-NF general method <1223> guidance.
Lambert, Carole; Gagnon, Robert; Nguyen, David; Charlin, Bernard
2009-01-01
Background The Script Concordance Test (SCT) is a reliable and valid tool to evaluate clinical reasoning in complex situations where experts' opinions may be divided. Scores reflect the degree of concordance between the performance of examinees and that of a reference panel of experienced physicians. The purpose of this study is to demonstrate the SCT's usefulness in radiation oncology. Methods A 90-item radiation oncology SCT was administered to 155 participants. Three levels of experience were tested: medical students (n = 70), radiation oncology residents (n = 38) and radiation oncologists (n = 47). Statistical tests were performed to assess reliability and to document validity. Results After item optimization, the test comprised 30 cases and 70 questions. Cronbach alpha was 0.90. Mean scores were 51.62 (± 8.19) for students, 71.20 (± 9.45) for residents and 76.67 (± 6.14) for radiation oncologists. The difference between the three groups was statistically significant when compared by the Kruskal-Wallis test (p < 0.001). Conclusion The SCT is reliable and useful to discriminate among participants according to their level of experience in radiation oncology. It appears to be a useful tool to document the progression of reasoning during residency training. PMID:19203358
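SCT scores are typically computed with the aggregate scoring method: full credit for the reference panel's modal response and partial credit in proportion to how many panelists chose the examinee's response. A minimal sketch with an invented panel tally:

```python
# Invented tally: how many of 15 panelists chose each Likert response (-2..+2)
# for one SCT question
panel = {-2: 0, -1: 2, 0: 3, 1: 8, 2: 2}

def sct_credit(answer, panel_counts):
    # Aggregate scoring: credit = votes for the examinee's answer
    # divided by votes for the modal (most frequent) panel answer
    modal_votes = max(panel_counts.values())
    return panel_counts[answer] / modal_votes

credit_modal = sct_credit(1, panel)    # matches the panel mode
credit_partial = sct_credit(0, panel)  # minority panel view
credit_none = sct_credit(-2, panel)    # no panelist chose this
```

An examinee's test score is then the sum of these per-question credits, which is what makes the instrument sensitive to concordance with expert judgment rather than to a single keyed answer.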
Fractional viscoelasticity of soft elastomers and auxetic foams
NASA Astrophysics Data System (ADS)
Solheim, Hannah; Stanisauskis, Eugenia; Miles, Paul; Oates, William
2018-03-01
Dielectric elastomers are commonly implemented in adaptive structures due to their unique capabilities for real-time control of a structure's shape, stiffness, and damping. These active polymers are often used in applications where actuator control or dynamic tunability are important, making an accurate understanding of the viscoelastic behavior critical. This challenge is complicated because these elastomers often operate over a broad range of deformation rates. Whereas research has demonstrated success in applying a nonlinear viscoelastic constitutive model to characterize the behavior of Very High Bond (VHB) 4910, robust predictions of the viscoelastic response over the entire range of time scales remain a significant challenge. An alternative formulation for viscoelastic modeling using fractional-order calculus has shown significant improvement in predictive capabilities. While fractional calculus has been explored theoretically in the field of linear viscoelasticity, limited experimental validation and statistical evaluation of the underlying phenomena have been considered. In the present study, predictions across several orders of magnitude in deformation rates are validated against data using a single set of model parameters. Moreover, we illustrate that the fractional order is material dependent by running complementary experiments and parameter estimation on the elastomer VHB 4949 as well as an auxetic foam. All results are statistically validated using Bayesian uncertainty methods to obtain posterior densities for the fractional order as well as the hyperelastic parameters.
Schröder, Christian; Steinbrück, Arnd; Müller, Tatjana; Woiczinski, Matthias; Chevalier, Yan; Müller, Peter E.; Jansson, Volkmar
2015-01-01
Retropatellar complications after total knee arthroplasty (TKA) such as anterior knee pain and subluxations might be related to altered patellofemoral biomechanics, in particular to trochlear design and femorotibial joint positioning. A method was developed to test femorotibial and patellofemoral joint modifications separately with 3D rapid-prototyped components for in vitro tests, but material differences may further influence results. This pilot study aims to validate the use of prostheses made of photopolymerized rapid prototype material (RPM) by measuring the sliding friction with a ring-on-disc setup as well as knee kinematics and retropatellar pressure on a knee rig. Cobalt-chromium alloy (standard prosthesis material, SPM) prostheses served as the validation standard. Friction coefficients between these materials and polytetrafluoroethylene (PTFE) were additionally tested, as this latter material is commonly used to protect pressure sensors in experiments. No statistical differences were found between the friction coefficients of the two materials against PTFE. UHMWPE showed a higher friction coefficient against RPM at low axial loads, a difference that disappeared at higher loads. No measurable statistical differences were found in knee kinematics or retropatellar pressure distribution. This suggests that using polymer prototypes may be a valid alternative to original components for in vitro TKA studies and future investigations on knee biomechanics. PMID:25879019
Cost model validation: a technical and cultural approach
NASA Technical Reports Server (NTRS)
Hihn, J.; Rosenberg, L.; Roust, K.; Warfield, K.
2001-01-01
This paper summarizes how JPL's parametric mission cost model (PMCM) has been validated using both formal statistical methods and a variety of peer and management reviews in order to establish organizational acceptance of the cost model estimates.
Imputation of missing data in time series for air pollutants
NASA Astrophysics Data System (ADS)
Junger, W. L.; Ponce de Leon, A.
2015-02-01
Missing data are a major concern in epidemiological studies of the health effects of environmental air pollutants. This article presents an imputation-based method that is suitable for multivariate time series data, which uses the EM algorithm under the assumption of a normal distribution. Different approaches are considered for filtering the temporal component. A simulation study was performed to assess the validity and performance of the proposed method in comparison with some frequently used methods. Simulations showed that when the amount of missing data was as low as 5%, the complete data analysis yielded satisfactory results regardless of the generating mechanism of the missing data, whereas the validity began to degenerate when the proportion of missing values exceeded 10%. The proposed imputation method exhibited good accuracy and precision in different settings with respect to the patterns of missing observations. Most of the imputations yielded valid results, even under missing not at random. The methods proposed in this study are implemented as the package mtsdi for the statistical software system R.
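The EM idea behind such imputation can be sketched in a few lines (the paper's mtsdi implementation is in R; below is a simplified Python illustration on simulated data, with a single covariate series rather than the full multivariate model with temporal filtering): alternate between filling missing values with their conditional means and re-estimating the model from the completed data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate two correlated pollutant series and delete ~10% of one at random
n = 500
x = rng.normal(50, 10, n)             # fully observed series
y = 0.8 * x + rng.normal(0, 4, n)     # correlated series with gaps
miss = rng.random(n) < 0.10
y_obs = np.where(miss, np.nan, y)

# EM-style iteration under a bivariate normal model:
# E-step: fill missing y with its conditional mean given x
# M-step: re-estimate the regression from the completed data
y_fill = np.where(miss, np.nanmean(y_obs), y_obs)
for _ in range(20):
    slope, intercept = np.polyfit(x, y_fill, 1)
    y_fill = np.where(miss, intercept + slope * x, y_fill)

# Accuracy of the imputations against the withheld true values
rmse = np.sqrt(np.mean((y_fill[miss] - y[miss]) ** 2))
```

Because the imputed values converge to the conditional mean given the observed covariate, the imputation error approaches the residual noise level, well below what mean imputation would give.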
Scaling field data to calibrate and validate moderate spatial resolution remote sensing models
Baccini, A.; Friedl, M.A.; Woodcock, C.E.; Zhu, Z.
2007-01-01
Validation and calibration are essential components of nearly all remote sensing-based studies. In both cases, ground measurements are collected and then related to the remote sensing observations or model results. In many situations, and particularly in studies that use moderate resolution remote sensing, a mismatch exists between the sensor's field of view and the scale at which in situ measurements are collected. The use of in situ measurements for model calibration and validation, therefore, requires a robust and defensible method to spatially aggregate ground measurements to the scale at which the remotely sensed data are acquired. This paper examines this challenge and specifically considers two different approaches for aggregating field measurements to match the spatial resolution of moderate spatial resolution remote sensing data: (a) landscape stratification; and (b) averaging of fine spatial resolution maps. The results show that an empirically estimated stratification based on a regression tree method provides a statistically defensible and operational basis for performing this type of procedure.
Sotardi, Valerie A
2018-05-01
Educational measures of anxiety focus heavily on students' experiences with tests yet overlook other assessment contexts. In this research, two brief multiscale questionnaires were developed and validated to measure trait evaluation anxiety (MTEA-12) and state evaluation anxiety (MSEA-12) for use in various assessment contexts in non-clinical, educational settings. The research included a cross-sectional analysis of self-report data using authentic assessment settings in which evaluation anxiety was measured. Instruments were tested using a validation sample of 241 first-year university students in New Zealand. Scale development included component structures for state and trait scales based on existing theoretical frameworks. Analyses using confirmatory factor analysis and descriptive statistics indicate that the scales are reliable and structurally valid. Multivariate general linear modeling using subscales from the MTEA-12, MSEA-12, and student grades suggests adequate criterion-related validity. Initial predictive validity was also observed: one relevant MTEA-12 factor explained between 21% and 54% of the variance in three MSEA-12 factors. Results document the MTEA-12 and MSEA-12 as reliable measures of trait and state dimensions of evaluation anxiety for test and writing contexts. Initial estimates suggest the scales have promising validity, and recommendations for further validation are outlined.
Kim, Jeong-Eon; Park, Eun-Jun
2015-04-01
The purpose of this study was to validate the Korean version of the Ethical Leadership at Work questionnaire (K-ELW), which measures RNs' perceived ethical leadership of their nurse managers. The strong validation process suggested by Benson (1998), including a translation and cultural adaptation stage, a structural stage, and an external stage, was used. Participants were 241 RNs who reported their perceived ethical leadership using both the pre-version of the K-ELW and a previously known Ethical Leadership Scale, the interactional justice of their managers, as well as their own demographics, organizational commitment and organizational citizenship behavior. Data analyses included descriptive statistics, Pearson correlation coefficients, reliability coefficients, exploratory factor analysis, and confirmatory factor analysis. SPSS 19.0 and Amos 18.0 were used. A modified K-ELW was developed from construct validity evidence and included 31 items in 7 domains: people orientation, task responsibility fairness, relationship fairness, power sharing, concern for sustainability, ethical guidance, and integrity. Convergent validity, discriminant validity, and concurrent validity were supported according to the correlation coefficients of the 7 domains with other measures. The results of this study provide preliminary evidence that the modified K-ELW can be adopted in Korean nursing organizations, and reliable and valid ethical leadership scores can be expected.
Greaves, Paul; Clear, Andrew; Coutinho, Rita; Wilson, Andrew; Matthews, Janet; Owen, Andrew; Shanyinde, Milensu; Lister, T. Andrew; Calaminici, Maria; Gribben, John G.
2013-01-01
Purpose The immune microenvironment is key to the pathophysiology of classical Hodgkin lymphoma (CHL). Twenty percent of patients experience failure of their initial treatment, and others receive excessively toxic treatment. Prognostic scores and biomarkers have yet to influence outcomes significantly. Previous biomarker studies have been limited by the extent of tissue analyzed, statistical inconsistencies, and failure to validate findings. We aimed to overcome these limitations by validating recently identified microenvironment biomarkers (CD68, FOXP3, and CD20) in a new patient cohort with a greater extent of tissue and by using rigorous statistical methodology. Patients and Methods Diagnostic tissue from 122 patients with CHL was microarrayed and stained, and positive cells were counted across 10 to 20 high-powered fields per patient by using an automated system. Two statistical analyses were performed: a categorical analysis with test/validation set-defined cut points and Kaplan-Meier estimated outcome measures of 5-year overall survival (OS), disease-specific survival (DSS), and freedom from first-line treatment failure (FFTF) and an independent multivariate analysis of absolute uncategorized counts. Results Increased CD20 expression confers superior OS. Increased FOXP3 expression confers superior OS, and increased CD68 confers inferior FFTF and OS. FOXP3 varies independently of CD68 expression and retains significance when analyzed as a continuous variable in multivariate analysis. A simple score combining FOXP3 and CD68 discriminates three groups: FFTF 93%, 62%, and 47% (P < .001), DSS 93%, 82%, and 63% (P = .03), and OS 93%, 82%, and 59% (P = .002). Conclusion We have independently validated CD68, FOXP3, and CD20 as prognostic biomarkers in CHL, and we demonstrate, to the best of our knowledge for the first time, that combining FOXP3 and CD68 may further improve prognostic stratification. PMID:23045593
Marzulli, F; Maguire, H C
1982-02-01
Several guinea-pig predictive test methods were evaluated by comparison of results with those obtained with human predictive tests, using ten compounds that have been used in cosmetics. The method involves the statistical analysis of the frequency with which guinea-pig tests agree with the findings of tests in humans. In addition, the frequencies of false positive and false negative predictive findings are considered and statistically analysed. The results clearly demonstrate the superiority of adjuvant tests (complete Freund's adjuvant) in determining skin sensitizers and the overall superiority of the guinea-pig maximization test in providing results similar to those obtained by human testing. A procedure is suggested for utilizing adjuvant and non-adjuvant test methods for characterizing compounds as of weak, moderate or strong sensitizing potential.
Nateghi, Roshanak; Guikema, Seth D; Quiring, Steven M
2011-12-01
This article compares statistical methods for modeling power outage durations during hurricanes and examines the predictive accuracy of these methods. Being able to make accurate predictions of power outage durations is valuable because the information can be used by utility companies to plan their restoration efforts more efficiently. This information can also help inform customers and public agencies of the expected outage times, enabling better collective response planning, and coordination of restoration efforts for other critical infrastructures that depend on electricity. In the long run, outage duration estimates for future storm scenarios may help utilities and public agencies better allocate risk management resources to balance the disruption from hurricanes with the cost of hardening power systems. We compare the out-of-sample predictive accuracy of five distinct statistical models for estimating power outage duration times caused by Hurricane Ivan in 2004. The methods compared include both regression models (accelerated failure time (AFT) and Cox proportional hazard models (Cox PH)) and data mining techniques (regression trees, Bayesian additive regression trees (BART), and multivariate additive regression splines). We then validate our models against two other hurricanes. Our results indicate that BART yields the best prediction accuracy and that it is possible to predict outage durations with reasonable accuracy. © 2011 Society for Risk Analysis.
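The holdout comparison logic used for such model evaluations is simple to sketch. The data below are simulated and the comparison is between an ordinary linear fit and a naive mean baseline (the study's models, BART included, are far more elaborate):

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated outage data: duration (hours) driven by wind speed plus noise
n = 400
wind = rng.uniform(20, 60, n)
duration = 2.0 * wind + rng.normal(0, 10, n)

# Out-of-sample evaluation: fit on the first 300 observations,
# assess predictive accuracy on the held-out 100
train, test = slice(0, 300), slice(300, None)
slope, intercept = np.polyfit(wind[train], duration[train], 1)

pred_model = intercept + slope * wind[test]
pred_baseline = np.full(n - 300, duration[train].mean())  # naive baseline

mae_model = np.mean(np.abs(pred_model - duration[test]))
mae_baseline = np.mean(np.abs(pred_baseline - duration[test]))
```

Ranking candidate models by held-out error like this, rather than by in-sample fit, is the core of the predictive-accuracy comparison the abstract describes.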
Mieth, Bettina; Kloft, Marius; Rodríguez, Juan Antonio; Sonnenburg, Sören; Vobruba, Robin; Morcillo-Suárez, Carlos; Farré, Xavier; Marigorta, Urko M.; Fehr, Ernst; Dickhaus, Thorsten; Blanchard, Gilles; Schunk, Daniel; Navarro, Arcadi; Müller, Klaus-Robert
2016-01-01
The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008–2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0. PMID:27892471
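The two-step structure of COMBI can be sketched on simulated genotypes. Note that the screen below substitutes a ridge-regularized linear fit for the support vector machine of the published method, and all data, sizes, and thresholds are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated genotypes: 500 subjects x 200 SNPs coded 0/1/2,
# with SNP 0 truly associated with the binary phenotype
n, p = 500, 200
X = rng.integers(0, 3, size=(n, p)).astype(float)
logit = 0.8 * (X[:, 0] - 1.0)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

# Step 1 (screen): rank SNPs by |weight| of a regularized linear fit
# (the published method trains an SVM here; ridge is a stand-in)
Xc = X - X.mean(axis=0)
w = np.linalg.solve(Xc.T @ Xc + 10.0 * np.eye(p), Xc.T @ (y - y.mean()))
k = 20
candidates = np.argsort(-np.abs(w))[:k]

# Step 2 (test): a simple trend-style z statistic on the k candidates only,
# with the significance threshold corrected for k tests instead of p
def trend_z(g, y):
    r = np.corrcoef(g, y)[0, 1]
    return abs(r) * np.sqrt(len(y))

z = np.array([trend_z(X[:, j], y) for j in candidates])
z_crit = 3.02  # approx. two-sided normal quantile for alpha = 0.05 / k
hits = candidates[z > z_crit]
```

The gain over raw p-value thresholding comes from step 2 correcting for only k candidate tests rather than all p positions, which is what lets the method retain power while controlling false discoveries.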
NASA Astrophysics Data System (ADS)
Lutz, Norbert W.; Bernard, Monique
2018-02-01
We recently suggested a new paradigm for statistical analysis of thermal heterogeneity in (semi-)aqueous materials by 1H NMR spectroscopy, using water as a temperature probe. Here, we present a comprehensive in silico and in vitro validation that demonstrates the ability of this new technique to provide accurate quantitative parameters characterizing the statistical distribution of temperature values in a volume of (semi-)aqueous matter. First, line shape parameters of numerically simulated water 1H NMR spectra are systematically varied to study a range of mathematically well-defined temperature distributions. Then, corresponding models based on measured 1H NMR spectra of agarose gel are analyzed. In addition, dedicated samples based on hydrogels or biological tissue are designed to produce temperature gradients changing over time, and dynamic NMR spectroscopy is employed to analyze the resulting temperature profiles at sub-second temporal resolution. Accuracy and consistency of the previously introduced statistical descriptors of temperature heterogeneity are determined: weighted median and mean temperature, standard deviation, temperature range, temperature mode(s), kurtosis, skewness, entropy, and relative areas under temperature curves. Potential and limitations of this method for quantitative analysis of thermal heterogeneity in (semi-)aqueous materials are discussed in view of prospective applications in materials science as well as biology and medicine.
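Each of the statistical descriptors listed can be computed directly from a weighted temperature distribution. A sketch on an invented, roughly Gaussian temperature histogram standing in for one recovered from a water-proton line shape:

```python
import numpy as np

# Invented temperature histogram: temperatures (deg C) and spectral weights
T = np.linspace(35.0, 43.0, 81)
w = np.exp(-0.5 * ((T - 39.0) / 1.2) ** 2)
w /= w.sum()  # normalize to a probability distribution

# Weighted statistical descriptors of the temperature distribution
mean_T = np.sum(w * T)
std_T = np.sqrt(np.sum(w * (T - mean_T) ** 2))
skew = np.sum(w * ((T - mean_T) / std_T) ** 3)      # asymmetry
kurt = np.sum(w * ((T - mean_T) / std_T) ** 4) - 3.0  # excess kurtosis
mode_T = T[np.argmax(w)]
median_T = T[np.searchsorted(np.cumsum(w), 0.5)]
entropy = -np.sum(w * np.log(w))                    # heterogeneity measure
```

For this symmetric example the weighted mean, median, and mode coincide and the skewness vanishes; applying the same descriptors to an asymmetric or multimodal line shape is what characterizes genuine thermal heterogeneity.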
Guiné, R P F; Duarte, J; Ferreira, M; Correia, P; Leal, M; Rumbak, I; Barić, I C; Komes, D; Satalić, Z; Sarić, M M; Tarcea, M; Fazakas, Z; Jovanoska, D; Vanevski, D; Vittadini, E; Pellegrini, N; Szűcs, V; Harangozó, J; El-Kenawy, A; El-Shenawy, O; Yalçın, E; Kösemeci, C; Klava, D; Straumite, E
2016-12-01
Dietary fibre (DF) is one of the components of diet that strongly contributes to health improvements, particularly on the gastrointestinal system. Hence, this work intended to evaluate the relations between some sociodemographic variables such as age, gender, level of education, living environment or country and the levels of knowledge about dietary fibre (KADF), its sources and its effects on human health, using a validated scale. The present study was a cross-sectional study. A methodological study was conducted with 6010 participants, residing in 10 countries from different continents (Europe, America, Africa). The instrument was a self-response questionnaire aimed at collecting information on knowledge about food fibres. The instrument had been used to validate a scale (KADF), whose model was used in the present work to identify the best predictors of knowledge. The statistical tools used were as follows: basic descriptive statistics, decision trees, and inferential analysis (t-test for independent samples with Levene test and one-way ANOVA with multiple comparisons post hoc tests). The results showed that the best predictor for the three types of knowledge evaluated (about DF, about its sources and about its effects on human health) was always the country, meaning that the social, cultural and/or political conditions greatly determine the level of knowledge. On the other hand, the tests also showed that statistically significant differences were encountered regarding the three types of knowledge for all sociodemographic variables evaluated: age, gender, level of education, living environment and country. The results showed that actions to improve the level of knowledge should not be planned generically to reach all sectors of the population; rather, in addressing different people, different methodologies must be designed so as to provide effective health education. Copyright © 2016 The Royal Society for Public Health. Published by Elsevier Ltd. All rights reserved.
Cronin, Katherine A; Jacobson, Sarah L; Bonnie, Kristin E; Hopper, Lydia M
2017-01-01
Studying animal cognition in a social setting is associated with practical and statistical challenges. However, conducting cognitive research without disturbing species-typical social groups can increase ecological validity, minimize distress, and improve animal welfare. Here, we review the existing literature on cognitive research run with primates in a social setting in order to determine how widespread such testing is and highlight approaches that may guide future research planning. Using Google Scholar to search the terms "primate" "cognition" "experiment" and "social group," we conducted a systematic literature search covering 16 years (2000-2015 inclusive). We then conducted two supplemental searches within each journal that contained a publication meeting our criteria in the original search, using the terms "primate" and "playback" in one search and the terms "primate" "cognition" and "social group" in the second. The results were used to assess how frequently nonhuman primate cognition has been studied in a social setting (>3 individuals), to gain perspective on the species and topics that have been studied, and to extract successful approaches for social testing. Our search revealed 248 unique publications in 43 journals encompassing 71 species. The absolute number of publications has increased over years, suggesting viable strategies for studying cognition in social settings. While a wide range of species were studied they were not equally represented, with 19% of the publications reporting data for chimpanzees. Field sites were the most common environment for experiments run in social groups of primates, accounting for more than half of the results. Approaches to mitigating the practical and statistical challenges were identified. This analysis has revealed that the study of primate cognition in a social setting is increasing and taking place across a range of environments. 
This literature review calls attention to examples that may provide valuable models for researchers wishing to overcome potential practical and statistical challenges to studying cognition in a social setting, ultimately increasing validity and improving the welfare of the primates we study.
Ren, Anna N; Neher, Robert E; Bell, Tyler; Grimm, James
2018-06-01
Preoperative planning is important for achieving successful implantation in primary total knee arthroplasty (TKA). However, traditional TKA templating techniques are not accurate enough to predict the component size within a narrow range. With the goal of developing a general predictive statistical model using patient demographic information, ordinal logistic regression was applied to build a proportional odds model to predict the tibia component size. The study retrospectively collected the data of 1992 primary Persona Knee System TKA procedures. Of these, 199 procedures were randomly selected as testing data, and the remaining data were randomly partitioned between model training data and model evaluation data at a ratio of 7:3. Different models were trained and evaluated on the training and validation data sets after data exploration. The final model had patient gender, age, weight, and height as independent variables and predicted the tibia size within 1 size 96% of the time on the validation data, 94% of the time on the testing data, and 92% of the time on a prospective cadaver data set. The study results indicated that the statistical model built by ordinal logistic regression can increase the accuracy of tibia sizing information for Persona Knee preoperative templating. This research shows statistical modeling may be used with radiographs to dramatically enhance templating accuracy, efficiency, and quality. In general, this methodology can be applied to other TKA products when the data are applicable. Copyright © 2018 Elsevier Inc. All rights reserved.
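The proportional odds model described above can be sketched in a few lines: the cumulative probability of each ordered size class is a logistic function of a class-specific threshold minus a shared linear predictor, and class probabilities are differences of adjacent cumulative probabilities. This is a minimal illustration only; the coefficients, thresholds, and predictor scaling below are made up, not the fitted Persona Knee model.

```python
import numpy as np

def proportional_odds_probs(x, thresholds, beta):
    """P(size = k) under a cumulative-logit (proportional odds) model.

    P(Y <= k | x) = sigmoid(theta_k - beta . x); category probabilities
    are differences of adjacent cumulative probabilities.
    """
    eta = np.asarray(thresholds) - np.dot(beta, x)
    cum = 1.0 / (1.0 + np.exp(-eta))           # P(Y <= k), increasing in k
    cum = np.concatenate(([0.0], cum, [1.0]))
    return np.diff(cum)                         # P(Y = k) for each size class

# Illustrative (made-up) coefficients for [female, age, weight_kg, height_cm]
beta = np.array([-0.8, 0.0, 0.02, 0.05])
thresholds = np.array([8.0, 9.5, 11.0, 12.5])   # 5 ordered size classes

probs = proportional_odds_probs(np.array([1.0, 65.0, 70.0, 160.0]), thresholds, beta)
predicted = int(np.argmax(probs))
```

Because the thresholds are shared across one linear predictor, the model needs far fewer parameters than a multinomial fit, which suits the modest effective sample size per implant size.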
Using statistical text classification to identify health information technology incidents
Chai, Kevin E K; Anthony, Stephen; Coiera, Enrico; Magrabi, Farah
2013-01-01
Objective To examine the feasibility of using statistical text classification to automatically identify health information technology (HIT) incidents in the USA Food and Drug Administration (FDA) Manufacturer and User Facility Device Experience (MAUDE) database. Design We used a subset of 570 272 incidents including 1534 HIT incidents reported to MAUDE between 1 January 2008 and 1 July 2010. Text classifiers using regularized logistic regression were evaluated with both ‘balanced’ (50% HIT) and ‘stratified’ (0.297% HIT) datasets for training, validation, and testing. Dataset preparation, feature extraction, feature selection, cross-validation, classification, performance evaluation, and error analysis were performed iteratively to further improve the classifiers. Feature-selection techniques such as removing short words and stop words, stemming, lemmatization, and principal component analysis were examined. Measurements κ statistic, F1 score, precision and recall. Results Classification performance was similar on both the stratified (0.954 F1 score) and balanced (0.995 F1 score) datasets. Stemming was the most effective technique, reducing the feature set size to 79% while maintaining comparable performance. Training with balanced datasets improved recall (0.989) but reduced precision (0.165). Conclusions Statistical text classification appears to be a feasible method for identifying HIT reports within large databases of incidents. Automated identification should enable more HIT problems to be detected, analyzed, and addressed in a timely manner. Semi-supervised learning may be necessary when applying machine learning to big data analysis of patient safety incidents and requires further investigation. PMID:23666777
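The pipeline the abstract describes (text features plus regularized logistic regression) can be sketched on a toy corpus. The reports and labels below are invented stand-ins for MAUDE narratives, not real data, and the hyperparameters are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for MAUDE incident narratives (label 1 = HIT incident)
reports = [
    "software interface froze and the order entry system crashed",
    "network outage prevented access to electronic records",
    "catheter balloon ruptured during inflation",
    "infusion pump tubing leaked at the connector",
]
labels = [1, 1, 0, 0]

# L2-regularized logistic regression over TF-IDF features; C controls
# regularization strength (smaller C = stronger penalty)
clf = make_pipeline(TfidfVectorizer(stop_words="english"),
                    LogisticRegression(C=1.0, penalty="l2"))
clf.fit(reports, labels)
pred = clf.predict(["system software crashed during order entry"])[0]
```

In practice the class imbalance reported above (0.297% HIT) is the hard part: training on a balanced subset raises recall at the cost of precision, exactly the trade-off the study measured.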
De Spiegelaere, Ward; Malatinkova, Eva; Lynch, Lindsay; Van Nieuwerburgh, Filip; Messiaen, Peter; O'Doherty, Una; Vandekerckhove, Linos
2014-06-01
Quantification of integrated proviral HIV DNA by repetitive-sampling Alu-HIV PCR is a candidate virological tool to monitor the HIV reservoir in patients. However, the experimental procedures and data analysis of the assay are complex and hinder its widespread use. Here, we provide an improved and simplified data analysis method by adopting binomial and Poisson statistics. A modified analysis method on the basis of Poisson statistics was used to analyze the binomial data of positive and negative reactions from a 42-replicate Alu-HIV PCR by use of dilutions of an integration standard and on samples of 57 HIV-infected patients. Results were compared with the quantitative output of the previously described Alu-HIV PCR method. Poisson-based quantification of the Alu-HIV PCR was linearly correlated with the standard dilution series, indicating that absolute quantification with the Poisson method is a valid alternative for data analysis of repetitive-sampling Alu-HIV PCR data. Quantitative outputs of patient samples assessed by the Poisson method correlated with the previously described Alu-HIV PCR analysis, indicating that this method is a valid alternative for quantifying integrated HIV DNA. Poisson-based analysis of the Alu-HIV PCR data enables absolute quantification without the need of a standard dilution curve. Implementation of the CI estimation permits improved qualitative analysis of the data and provides a statistical basis for the required minimal number of technical replicates. © 2014 The American Association for Clinical Chemistry.
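The Poisson logic behind this analysis is compact: if targets are distributed randomly across replicate reactions, the probability of a negative reaction is exp(-λ), so λ can be recovered from the fraction of negative wells alone, with a CI propagated from the binomial uncertainty on that fraction. The sketch below assumes a simple normal-approximation CI; the well counts are illustrative, not patient data.

```python
import math

def poisson_quantify(n_positive, n_total):
    """Estimate mean targets per reaction from all-or-none PCR replicates.

    Under Poisson statistics, P(negative) = exp(-lambda), so
    lambda = -ln(fraction negative). A normal-approximation CI on the
    negative fraction is propagated through the log transform.
    """
    n_neg = n_total - n_positive
    if n_neg == 0 or n_positive == 0:
        raise ValueError("all-positive or all-negative plates cannot be quantified")
    p_neg = n_neg / n_total
    lam = -math.log(p_neg)
    se = math.sqrt(p_neg * (1 - p_neg) / n_total)   # binomial standard error
    lo = -math.log(min(p_neg + 1.96 * se, 1.0))     # more negatives -> lower lambda
    hi = -math.log(max(p_neg - 1.96 * se, 1e-12))
    return lam, lo, hi

# e.g. 24 positive wells out of 42 replicates
lam, lo, hi = poisson_quantify(24, 42)
```

The CI width also shows why the number of technical replicates matters: the standard error shrinks with the square root of the replicate count, which is the statistical basis the abstract mentions for choosing a minimal replicate number.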
Validating Smoking Data From the Veteran’s Affairs Health Factors Dataset, an Electronic Data Source
Brandt, Cynthia A.; Skanderson, Melissa; Justice, Amy C.; Shahrir, Shahida; Butt, Adeel A.; Brown, Sheldon T.; Freiberg, Matthew S.; Gibert, Cynthia L.; Goetz, Matthew Bidwell; Kim, Joon Woo; Pisani, Margaret A.; Rimland, David; Rodriguez-Barradas, Maria C.; Sico, Jason J.; Tindle, Hilary A.; Crothers, Kristina
2011-01-01
Introduction: We assessed smoking data from the Veterans Health Administration (VHA) electronic medical record (EMR) Health Factors dataset. Methods: To assess the validity of the EMR Health Factors smoking data, we first created an algorithm to convert text entries into a 3-category smoking variable (never, former, and current). We compared this EMR smoking variable to 2 different sources of patient self-reported smoking survey data: (a) 6,816 HIV-infected and -uninfected participants in the 8-site Veterans Aging Cohort Study (VACS-8) and (b) a subset of 13,689 participants from the national VACS Virtual Cohort (VACS-VC), who also completed the 1999 Large Health Study (LHS) survey. Sensitivity, specificity, and kappa statistics were used to evaluate agreement of EMR Health Factors smoking data with self-report smoking data. Results: For the EMR Health Factors and VACS-8 comparison of current, former, and never smoking categories, the kappa statistic was .66. For EMR Health Factors and VACS-VC/LHS comparison of smoking, the kappa statistic was .61. Conclusions: Based on kappa statistics, agreement between the EMR Health Factors and survey sources is substantial. Identification of current smokers nationally within the VHA can be used in future studies to track smoking status over time, to evaluate smoking interventions, and to adjust for smoking status in research. Our methodology may provide insights for other organizations seeking to use EMR data for accurate determination of smoking status. PMID:21911825
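The agreement statistic used above is Cohen's kappa, which discounts chance agreement between two raters. A minimal sketch follows; the 3x3 counts are hypothetical, not the study's actual EMR-versus-survey cross-tabulation.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: rater A,
    columns: rater B): kappa = (p_o - p_e) / (1 - p_e), where p_o is
    observed agreement and p_e is agreement expected by chance."""
    n = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(col) for col in zip(*confusion)]
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical EMR-vs-survey smoking status counts
# (never / former / current)
m = [[70, 5, 3],
     [6, 50, 8],
     [4, 7, 60]]
kappa = cohens_kappa(m)
```

Values between .61 and .80 are conventionally read as "substantial" agreement, which is how the abstract characterizes its kappas of .61 and .66.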
Kurtosis Approach Nonlinear Blind Source Separation
NASA Technical Reports Server (NTRS)
Duong, Vu A.; Stubberud, Allen R.
2005-01-01
In this paper, we introduce a new algorithm for blind source signal separation for post-nonlinear mixtures. The mixtures are assumed to be linearly mixed from unknown sources first and then distorted by memoryless nonlinear functions. The nonlinear functions are assumed to be smooth and can be approximated by polynomials. Both the coefficients of the unknown mixing matrix and the coefficients of the approximated polynomials are estimated by the gradient descent method conditional on the higher-order statistical requirements. The results of simulation experiments presented in this paper demonstrate the validity and usefulness of our approach for nonlinear blind source signal separation. Keywords: Independent Component Analysis, Kurtosis, Higher-order statistics.
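The higher-order statistic driving this kind of separation is excess kurtosis: it is zero for a Gaussian, so maximizing its magnitude pushes an unmixing solution toward statistically independent, non-Gaussian sources. A minimal sketch of the contrast function, computed on an illustrative sub-Gaussian (flat) source:

```python
def excess_kurtosis(xs):
    """Fourth standardized moment minus 3; zero for a Gaussian. This is
    the non-Gaussianity contrast that kurtosis-based ICA optimizes."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / var ** 2 - 3.0

# A flat (sub-Gaussian) source: evenly spaced samples, excess kurtosis ~ -1.2
uniform = [i / 100.0 for i in range(101)]
k_uniform = excess_kurtosis(uniform)
```

In the full algorithm this statistic would be evaluated on the outputs of the candidate unmixing transform, with gradient descent adjusting both the linear unmixing coefficients and the polynomial compensators of the post-nonlinearities; that machinery is beyond this sketch.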
NASA Astrophysics Data System (ADS)
Little, David L., II
Ongoing changes in values, pedagogy, and curriculum concerning sustainability education necessitate that strong curricular elements be identified in sustainability education. However, quantitative research in sustainability education is largely undeveloped or relies on outdated instruments. In part, this is because no widespread quantitative instrument for measuring related educational outcomes has been developed for the field, though such development is pivotal for future efforts in sustainability education related to STEM majors. This research study details the creation, evaluation, and validation of an instrument -- the STEM Sustainability Engagement Instrument (STEMSEI) -- designed to measure sustainability engagement in post-secondary STEM majors. The study was conducted in three phases, using qualitative methods in phase 1, a concurrent mixed methods design in phase 2, and a sequential mixed methods design in phase 3. The STEMSEI was able to successfully predict statistically significant differences in the sample (n = 1017) that were predicted by prior research in environmental education. The STEMSEI also revealed statistically significant differences between STEM majors' sustainability engagement with a large effect size (0.203 ≤ η² ≤ 0.211). As hypothesized, statistically significant differences were found on the environmental scales across gender and present religion. With respect to gender, self-perceived measures of emotional engagement with environmental sustainability were higher among females, while males had higher measures of cognitive engagement with respect to knowing information related to environmental sustainability. With respect to present religion, self-perceived measures of general engagement and emotional engagement in environmental sustainability were higher for non-Christians as compared to Christians. On the economic scales, statistically significant differences were found across gender.
Specifically, measures of males' self-perceived cognitive engagement in knowing information related to economic sustainability were greater than those of females. Future research should establish the generalizability of these results and further test the validity of the STEMSEI.
An Easy Tool to Predict Survival in Patients Receiving Radiation Therapy for Painful Bone Metastases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Westhoff, Paulien G., E-mail: p.g.westhoff@umcutrecht.nl; Graeff, Alexander de; Monninkhof, Evelyn M.
2014-11-15
Purpose: Patients with bone metastases have a widely varying survival. A reliable estimation of survival is needed for appropriate treatment strategies. Our goal was to assess the value of simple prognostic factors, namely, patient and tumor characteristics, Karnofsky performance status (KPS), and patient-reported scores of pain and quality of life, to predict survival in patients with painful bone metastases. Methods and Materials: In the Dutch Bone Metastasis Study, 1157 patients were treated with radiation therapy for painful bone metastases. At randomization, physicians determined the KPS; patients rated general health on a visual analogue scale (VAS-gh), valuation of life on a verbal rating scale (VRS-vl) and pain intensity. To assess the predictive value of the variables, we used multivariate Cox proportional hazard analyses and C-statistics for discriminative value. Of the final model, calibration was assessed. External validation was performed on a dataset of 934 patients who were treated with radiation therapy for vertebral metastases. Results: Patients had mainly breast (39%), prostate (23%), or lung cancer (25%). After a maximum of 142 weeks' follow-up, 74% of patients had died. The best predictive model included sex, primary tumor, visceral metastases, KPS, VAS-gh, and VRS-vl (C-statistic = 0.72, 95% CI = 0.70-0.74). A reduced model, with only KPS and primary tumor, showed comparable discriminative capacity (C-statistic = 0.71, 95% CI = 0.69-0.72). External validation showed a C-statistic of 0.72 (95% CI = 0.70-0.73). Calibration of the derivation and the validation dataset showed underestimation of survival. Conclusion: In predicting survival in patients with painful bone metastases, KPS combined with primary tumor was comparable to a more complex model. Considering the amount of variables in complex models and the additional burden on patients, the simple model is preferred for daily use. In addition, a risk table for survival is provided.
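The C-statistic used to compare these survival models is Harrell's concordance index: among patient pairs where one patient observably died before the other's follow-up time, the fraction in which the model gave that patient the higher risk. A minimal sketch on invented data (times, event flags, and risk scores are all hypothetical):

```python
def c_statistic(times, events, risks):
    """Harrell's concordance index. A pair (i, j) is usable when patient i
    died and did so before time j; it is concordant when the model assigned
    patient i the higher risk. Ties in risk count as 1/2."""
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / usable

# Hypothetical follow-up (weeks), death indicator, and model risk scores
times = [10, 25, 40, 60, 80]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.7, 0.8, 0.4, 0.2]
c = c_statistic(times, events, risks)
```

A value of 0.5 means the risk scores rank patients no better than chance; the 0.71-0.72 reported above indicates moderate discrimination for both the full and reduced models.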
Wolf, Pedro S. A.; Figueredo, Aurelio J.; Jacobs, W. Jake
2013-01-01
The purpose of this paper is to examine the convergent and nomological validity of a GPS-based measure of daily activity, operationalized as Number of Places Visited (NPV). Relations among the GPS-based measure and two self-report measures of NPV, as well as relations among NPV and two factors made up of self-reported individual differences, were examined. The first factor was composed of variables related to an Active Lifestyle (AL) (e.g., positive affect, extraversion) and the second factor was composed of variables related to a Sedentary Lifestyle (SL) (e.g., depression, neuroticism). NPV was measured over 4 days, comprising two weekdays and two weekend days. A bivariate analysis established one level of convergent validity, and a Split-Plot GLM examined convergent validity, nomological validity, and alternative hypotheses related to constraints on activity throughout the week simultaneously. The first analysis revealed significant correlations among NPV measures (weekday, weekend, and the entire 4-day time period), supporting the convergent validity of the Diary-, Google Maps-, and GPS-NPV measures. Results from the second analysis, indicating non-significant mean differences in NPV regardless of method, also support this conclusion. We also found that AL is a statistically significant predictor of NPV no matter how NPV was measured. We did not find a statistically significant relation among NPV and SL. These results permit us to infer that the GPS-based NPV measure has convergent and nomological validity. PMID:23761772
Statistical Engineering in Air Traffic Management Research
NASA Technical Reports Server (NTRS)
Wilson, Sara R.
2015-01-01
NASA is working to develop an integrated set of advanced technologies to enable efficient arrival operations in high-density terminal airspace for the Next Generation Air Transportation System. This integrated arrival solution is being validated and verified in laboratories and transitioned to a field prototype for an operational demonstration at a major U.S. airport. Within NASA, this is a collaborative effort between Ames and Langley Research Centers involving a multi-year iterative experimentation process. Designing and analyzing a series of sequential batch computer simulations and human-in-the-loop experiments across multiple facilities and simulation environments involves a number of statistical challenges. Experiments conducted in separate laboratories typically have different limitations and constraints, and can take different approaches with respect to the fundamental principles of statistical design of experiments. This often makes it difficult to compare results from multiple experiments and incorporate findings into the next experiment in the series. A statistical engineering approach is being employed within this project to support risk-informed decision making and maximize the knowledge gained within the available resources. This presentation describes a statistical engineering case study from NASA, highlights statistical challenges, and discusses areas where existing statistical methodology is adapted and extended.
Statistical-Dynamical Seasonal Forecasts of Central-Southwest Asian Winter Precipitation.
NASA Astrophysics Data System (ADS)
Tippett, Michael K.; Goddard, Lisa; Barnston, Anthony G.
2005-06-01
Interannual precipitation variability in central-southwest (CSW) Asia has been associated with East Asian jet stream variability and western Pacific tropical convection. However, atmospheric general circulation models (AGCMs) forced by observed sea surface temperature (SST) poorly simulate the region's interannual precipitation variability. The statistical-dynamical approach uses statistical methods to correct systematic deficiencies in the response of AGCMs to SST forcing. Statistical correction methods linking model-simulated Indo-west Pacific precipitation and observed CSW Asia precipitation result in modest, but statistically significant, cross-validated simulation skill in the northeast part of the domain for the period from 1951 to 1998. The statistical-dynamical method is also applied to recent (winter 1998/99 to 2002/03) multimodel, two-tier December-March precipitation forecasts initiated in October. This period includes 4 yr (winter of 1998/99 to 2001/02) of severe drought. Tercile probability forecasts are produced using ensemble-mean forecasts and forecast error estimates. The statistical-dynamical forecasts show enhanced probability of below-normal precipitation for the four drought years and capture the return to normal conditions in part of the region during the winter of 2002/03. "May Kabul be without gold, but not without snow." (Traditional Afghan proverb)
Modelling 1-minute directional observations of the global irradiance.
NASA Astrophysics Data System (ADS)
Thejll, Peter; Pagh Nielsen, Kristian; Andersen, Elsa; Furbo, Simon
2016-04-01
Direct and diffuse irradiances from the sky have been collected at 1-minute intervals for about a year from the experimental station at the Technical University of Denmark for the IEA project "Solar Resource Assessment and Forecasting". These data were gathered by pyrheliometers tracking the Sun, as well as with apertured pyranometers gathering 1/8th and 1/16th of the light from the sky in 45 degree azimuthal ranges pointed around the compass. The data are gathered in order to develop detailed models of the potentially available solar energy and its variations at high temporal resolution, and thereby to gain a more detailed understanding of the solar resource. This is important for a better understanding of the sub-grid scale cloud variation that cannot be resolved with climate and weather models. It is also important for optimizing the operation of active solar energy systems such as photovoltaic plants and thermal solar collector arrays, and for passive solar energy and lighting in buildings. We present regression-based modelling of the observed data and focus here on the statistical properties of the model fits. Using models based, on the one hand, on the literature and physical expectations and, on the other hand, on purely statistical formulations, we find solutions that can explain up to 90% of the variance in global radiation. The models leaning on physical insights include terms for the direct solar radiation, a term for the circum-solar radiation, a diffuse term and a term for the horizon brightening/darkening. The purely statistical model is found using data- and formula-validation approaches picking model expressions from a general catalogue of possible formulae. The method allows nesting of expressions, and the results found are dependent on and heavily constrained by the cross-validation carried out on statistically independent testing and training data-sets.
Slightly better fits -- in terms of variance explained -- are found using the purely statistical fitting/searching approach. We describe the methods applied and the results found, and discuss the different potentials of the physics-based and purely statistical model searches.
Often Asked but Rarely Answered: Can Asians Meet DSM-5/ICD-10 Autism Spectrum Disorder Criteria?
Kim, So Hyun; Koh, Yun-Joo; Lim, Eun-Chung; Kim, Soo-Jeong; Leventhal, Bennett L.
2016-01-01
Abstract Objectives: To evaluate whether Asian (Korean children) populations can be validly diagnosed with autism spectrum disorder (ASD) using Western-based diagnostic instruments and criteria based on Diagnostic and Statistical Manual on Mental Disorders, 5th edition (DSM-5). Methods: Participants included an epidemiologically ascertained 7–14-year-old (N = 292) South Korean cohort from a larger prevalence study (N = 55,266). Main outcomes were based on Western-based diagnostic methods for Korean children using gold standard instruments, Autism Diagnostic Interview-Revised, and Autism Diagnostic Observation Schedule. Factor analysis and ANOVAs were performed to examine factor structure of autism symptoms and identify phenotypic differences between Korean children with ASD and non-ASD diagnoses. Results: Using Western-based diagnostic methods, Korean children with ASD were successfully identified with moderate-to-high diagnostic validity (sensitivities/specificities ranging 64%–93%), strong internal consistency, and convergent/concurrent validity. The patterns of autism phenotypes in a Korean population were similar to those observed in a Western population with two symptom domains (social communication and restricted and repetitive behavior factors). Statistically significant differences in the use of socially acceptable communicative behaviors (e.g., direct gaze, range of facial expressions) emerged between ASD versus non-ASD cases (mostly p < 0.001), ensuring that these can be a similarly valid part of the ASD phenotype in both Asian and Western populations. Conclusions: Despite myths, biases, and stereotypes about Asian social behavior, Asians (at least Korean children) typically use elements of reciprocal social interactions similar to those in the West. Therefore, standardized diagnostic methods widely used for ASD in Western culture can be validly used as part of the assessment process and research with Koreans and, possibly, other Asians. PMID:27315155
Engoren, Milo; Habib, Robert H; Dooner, John J; Schwann, Thomas A
2013-08-01
As many as 14 % of patients undergoing coronary artery bypass surgery are readmitted within 30 days. Readmission is usually the result of morbidity and may lead to death. The purpose of this study was to develop and compare statistical and genetic programming models to predict readmission. Patients were divided into separate Construction and Validation populations. Using 88 variables, logistic regression, genetic programs, and artificial neural nets were used to develop predictive models. Models were first constructed and tested on the Construction population, then validated on the Validation population. Areas under the receiver operating characteristic curves (AU ROC) were used to compare the models. Two hundred and two patients (7.6 %) in the 2,644-patient Construction group and 216 (8.0 %) of the 2,711-patient Validation group were readmitted within 30 days of CABG surgery. Logistic regression predicted readmission with AU ROC = .675 ± .021 in the Construction group. Genetic programs significantly improved the accuracy (AU ROC = .767 ± .001, p < .001). Artificial neural nets were less accurate, with AU ROC = 0.597 ± .001 in the Construction group. The predictive accuracy of all three techniques fell in the Validation group. However, the accuracy of genetic programming (AU ROC = .654 ± .001) remained marginally, though not statistically significantly, better than that of logistic regression (AU ROC = .644 ± .020, p = .61). Genetic programming and logistic regression provide alternative methods to predict readmission that are similarly accurate.
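The AU ROC used to compare these models has a simple rank interpretation: it is the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen non-readmitted patient. A minimal sketch on invented labels and scores (the two "models" below are hypothetical, not the study's):

```python
def auc_roc(labels, scores):
    """Area under the ROC curve computed as the probability that a
    randomly chosen positive outranks a randomly chosen negative
    (ties count as 1/2)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical readmission labels and two models' predicted risks
labels = [1, 0, 1, 0, 0, 1]
model_a = [0.8, 0.3, 0.6, 0.4, 0.2, 0.7]   # ranks all positives above negatives
model_b = [0.6, 0.5, 0.4, 0.7, 0.2, 0.9]
auc_a = auc_roc(labels, model_a)
auc_b = auc_roc(labels, model_b)
```

Because AU ROC depends only on ranking, it lets the study compare logistic regression, genetic programs, and neural nets on a common scale regardless of how each model calibrates its raw output.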
Measurement issues in research on social support and health.
Dean, K; Holst, E; Kreiner, S; Schoenborn, C; Wilson, R
1994-01-01
STUDY OBJECTIVE--The aims were: (1) to identify methodological problems that may explain the inconsistencies and contradictions in the research evidence on social support and health, and (2) to validate a frequently used measure of social support in order to determine whether or not it could be used in multivariate analyses of population data in research on social support and health. DESIGN AND METHODS--Secondary analysis of data collected in a cross sectional survey of a multistage cluster sample of the population of the United States, designed to study relationships in behavioural, social support and health variables. Statistical models based on item response theory and graph theory were used to validate the measure of social support to be used in subsequent analyses. PARTICIPANTS--Data on 1755 men and women aged 20 to 64 years were available for the scale validation. RESULTS--Massive evidence of item bias was found for all items of a group membership subscale. The most serious problems were found in relationship to an item measuring membership in work related groups. Using that item in the social network scale in multivariate analyses would distort findings on the statistical effects of education, employment status, and household income. Evidence of item bias was also found for a sociability subscale. When marital status was included to create what is called an intimate contacts subscale, the confounding grew worse. CONCLUSIONS--The composite measure of social network is not valid and would seriously distort the findings of analyses attempting to study relationships between the index and other variables. The findings show that valid measurement is a methodological issue that must be addressed in scientific research on population health. PMID:8189179