Gaskin, Cadeyrn J; Happell, Brenda
2014-05-01
To (a) assess the statistical power of nursing research to detect small, medium, and large effect sizes; (b) estimate the experiment-wise Type I error rate in these studies; and (c) assess the extent to which (i) a priori power analyses, (ii) effect sizes (and interpretations thereof), and (iii) confidence intervals were reported. Statistical review. Papers published in the 2011 volumes of the 10 highest ranked nursing journals, based on their 5-year impact factors. Papers were assessed for statistical power, control of experiment-wise Type I error, reporting of a priori power analyses, reporting and interpretation of effect sizes, and reporting of confidence intervals. The analyses were based on 333 papers, from which 10,337 inferential statistics were identified. The median power to detect small, medium, and large effect sizes was .40 (interquartile range [IQR]=.24-.71), .98 (IQR=.85-1.00), and 1.00 (IQR=1.00-1.00), respectively. The median experiment-wise Type I error rate was .54 (IQR=.26-.80). A priori power analyses were reported in 28% of papers. Effect sizes were routinely reported for Spearman's rank correlations (100% of papers in which this test was used), Poisson regressions (100%), odds ratios (100%), Kendall's tau correlations (100%), Pearson's correlations (99%), logistic regressions (98%), structural equation modelling/confirmatory factor analyses/path analyses (97%), and linear regressions (83%), but were reported less often for two-proportion z tests (50%), analyses of variance/analyses of covariance/multivariate analyses of variance (18%), t tests (8%), Wilcoxon's tests (8%), Chi-squared tests (8%), and Fisher's exact tests (7%), and not reported for sign tests, Friedman's tests, McNemar's tests, multi-level models, and Kruskal-Wallis tests. Effect sizes were infrequently interpreted. Confidence intervals were reported in 28% of papers. The use, reporting, and interpretation of inferential statistics in nursing research need substantial improvement. Most importantly, researchers should abandon the misleading practice of interpreting the results from inferential tests based solely on whether they are statistically significant (or not) and, instead, focus on reporting and interpreting effect sizes, confidence intervals, and significance levels. Nursing researchers also need to conduct and report a priori power analyses, and to address the issue of Type I experiment-wise error inflation in their studies. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.
Chung, Sang M; Lee, David J; Hand, Austin; Young, Philip; Vaidyanathan, Jayabharathi; Sahajwalla, Chandrahas
2015-12-01
The study evaluated whether the renal function decline rate per year with age in adults varies based on two primary statistical analyses: cross-section (CS), using one observation per subject, and longitudinal (LT), using multiple observations per subject over time. A total of 16628 records (3946 subjects; age range 30-92 years) of creatinine clearance and relevant demographic data were used. On average, four samples per subject were collected for up to 2364 days (mean: 793 days). A simple linear regression and random coefficient models were selected for CS and LT analyses, respectively. The renal function decline rates per year were 1.33 and 0.95 ml/min/year for CS and LT analyses, respectively, and were slower when the repeated individual measurements were considered. The study confirms that rates are different based on statistical analyses, and that a statistically robust longitudinal model with a proper sampling design provides reliable individual as well as population estimates of the renal function decline rates per year with age in adults. In conclusion, our findings indicated that one should be cautious in interpreting the renal function decline rate with aging information because its estimation was highly dependent on the statistical analyses. From our analyses, a population longitudinal analysis (e.g. random coefficient model) is recommended if individualization is critical, such as a dose adjustment based on renal function during a chronic therapy. Copyright © 2015 John Wiley & Sons, Ltd.
Comparability of a Paper-Based Language Test and a Computer-Based Language Test.
ERIC Educational Resources Information Center
Choi, Inn-Chull; Kim, Kyoung Sung; Boo, Jaeyool
2003-01-01
Utilizing the Test of English Proficiency, developed by Seoul National University (TEPS), examined comparability between the paper-based language test and the computer-based language test based on content and construct validation employing content analyses based on corpus linguistic techniques in addition to such statistical analyses as…
Coordinate based random effect size meta-analysis of neuroimaging studies.
Tench, C R; Tanasescu, Radu; Constantinescu, C S; Auer, D P; Cottam, W J
2017-06-01
Low power in neuroimaging studies can make them difficult to interpret, and Coordinate based meta-analysis (CBMA) may go some way to mitigating this issue. CBMA has been used in many analyses to detect where published functional MRI or voxel-based morphometry studies testing similar hypotheses report significant summary results (coordinates) consistently. Only the reported coordinates and possibly t statistics are analysed, and statistical significance of clusters is determined by coordinate density. Here a method of performing coordinate based random effect size meta-analysis and meta-regression is introduced. The algorithm (ClusterZ) analyses both coordinates and reported t statistic or Z score, standardised by the number of subjects. Statistical significance is determined not by coordinate density, but by a random effects meta-analyses of reported effects performed cluster-wise using standard statistical methods and taking account of censoring inherent in the published summary results. Type 1 error control is achieved using the false cluster discovery rate (FCDR), which is based on the false discovery rate. This controls both the family wise error rate under the null hypothesis that coordinates are randomly drawn from a standard stereotaxic space, and the proportion of significant clusters that are expected under the null. Such control is necessary to avoid propagating and even amplifying the very issues motivating the meta-analysis in the first place. ClusterZ is demonstrated on both numerically simulated data and on real data from reports of grey matter loss in multiple sclerosis (MS) and syndromes suggestive of MS, and of painful stimulus in healthy controls. The software implementation is available to download and use freely. Copyright © 2017 Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Morrissey, L. A.; Weinstock, K. J.; Mouat, D. A.; Card, D. H.
1984-01-01
An evaluation of Thematic Mapper Simulator (TMS) data for the geobotanical discrimination of rock types based on vegetative cover characteristics is addressed in this research. A methodology for accomplishing this evaluation utilizing univariate and multivariate techniques is presented. TMS data acquired with a Daedalus DEI-1260 multispectral scanner were integrated with vegetation and geologic information for subsequent statistical analyses, which included a chi-square test, an analysis of variance, stepwise discriminant analysis, and Duncan's multiple range test. Results indicate that ultramafic rock types are spectrally separable from nonultramafics based on vegetative cover through the use of statistical analyses.
USDA-ARS?s Scientific Manuscript database
Agronomic and Environmental research experiments result in data that are analyzed using statistical methods. These data are unavoidably accompanied by uncertainty. Decisions about hypotheses, based on statistical analyses of these data are therefore subject to error. This error is of three types,...
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
The Problem of Auto-Correlation in Parasitology
Pollitt, Laura C.; Reece, Sarah E.; Mideo, Nicole; Nussey, Daniel H.; Colegrave, Nick
2012-01-01
Explaining the contribution of host and pathogen factors in driving infection dynamics is a major ambition in parasitology. There is increasing recognition that analyses based on single summary measures of an infection (e.g., peak parasitaemia) do not adequately capture infection dynamics and so, the appropriate use of statistical techniques to analyse dynamics is necessary to understand infections and, ultimately, control parasites. However, the complexities of within-host environments mean that tracking and analysing pathogen dynamics within infections and among hosts poses considerable statistical challenges. Simple statistical models make assumptions that will rarely be satisfied in data collected on host and parasite parameters. In particular, model residuals (unexplained variance in the data) should not be correlated in time or space. Here we demonstrate how failure to account for such correlations can result in incorrect biological inference from statistical analysis. We then show how mixed effects models can be used as a powerful tool to analyse such repeated measures data in the hope that this will encourage better statistical practices in parasitology. PMID:22511865
Wu, Robert; Glen, Peter; Ramsay, Tim; Martel, Guillaume
2014-06-28
Observational studies dominate the surgical literature. Statistical adjustment is an important strategy to account for confounders in observational studies. Research has shown that published articles are often poor in statistical quality, which may jeopardize their conclusions. The Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines have been published to help establish standards for statistical reporting.This study will seek to determine whether the quality of statistical adjustment and the reporting of these methods are adequate in surgical observational studies. We hypothesize that incomplete reporting will be found in all surgical observational studies, and that the quality and reporting of these methods will be of lower quality in surgical journals when compared with medical journals. Finally, this work will seek to identify predictors of high-quality reporting. This work will examine the top five general surgical and medical journals, based on a 5-year impact factor (2007-2012). All observational studies investigating an intervention related to an essential component area of general surgery (defined by the American Board of Surgery), with an exposure, outcome, and comparator, will be included in this systematic review. Essential elements related to statistical reporting and quality were extracted from the SAMPL guidelines and include domains such as intent of analysis, primary analysis, multiple comparisons, numbers and descriptive statistics, association and correlation analyses, linear regression, logistic regression, Cox proportional hazard analysis, analysis of variance, survival analysis, propensity analysis, and independent and correlated analyses. Each article will be scored as a proportion based on fulfilling criteria in relevant analyses used in the study. A logistic regression model will be built to identify variables associated with high-quality reporting. A comparison will be made between the scores of surgical observational studies published in medical versus surgical journals. Secondary outcomes will pertain to individual domains of analysis. Sensitivity analyses will be conducted. This study will explore the reporting and quality of statistical analyses in surgical observational studies published in the most referenced surgical and medical journals in 2013 and examine whether variables (including the type of journal) can predict high-quality reporting.
ISSUES IN THE STATISTICAL ANALYSIS OF SMALL-AREA HEALTH DATA. (R825173)
The availability of geographically indexed health and population data, with advances in computing, geographical information systems and statistical methodology, have opened the way for serious exploration of small area health statistics based on routine data. Such analyses may be...
Reframing Serial Murder Within Empirical Research.
Gurian, Elizabeth A
2017-04-01
Empirical research on serial murder is limited due to the lack of consensus on a definition, the continued use of primarily descriptive statistics, and linkage to popular culture depictions. These limitations also inhibit our understanding of these offenders and affect credibility in the field of research. Therefore, this comprehensive overview of a sample of 508 cases (738 total offenders, including partnered groups of two or more offenders) provides analyses of solo male, solo female, and partnered serial killers to elucidate statistical differences and similarities in offending and adjudication patterns among the three groups. This analysis of serial homicide offenders not only supports previous research on offending patterns present in the serial homicide literature but also reveals that empirically based analyses can enhance our understanding beyond traditional case studies and descriptive statistics. Further research based on these empirical analyses can aid in the development of more accurate classifications and definitions of serial murderers.
Weir, Christopher J; Butcher, Isabella; Assi, Valentina; Lewis, Stephanie C; Murray, Gordon D; Langhorne, Peter; Brady, Marian C
2018-03-07
Rigorous, informative meta-analyses rely on availability of appropriate summary statistics or individual participant data. For continuous outcomes, especially those with naturally skewed distributions, summary information on the mean or variability often goes unreported. While full reporting of original trial data is the ideal, we sought to identify methods for handling unreported mean or variability summary statistics in meta-analysis. We undertook two systematic literature reviews to identify methodological approaches used to deal with missing mean or variability summary statistics. Five electronic databases were searched, in addition to the Cochrane Colloquium abstract books and the Cochrane Statistics Methods Group mailing list archive. We also conducted cited reference searching and emailed topic experts to identify recent methodological developments. Details recorded included the description of the method, the information required to implement the method, any underlying assumptions and whether the method could be readily applied in standard statistical software. We provided a summary description of the methods identified, illustrating selected methods in example meta-analysis scenarios. For missing standard deviations (SDs), following screening of 503 articles, fifteen methods were identified in addition to those reported in a previous review. These included Bayesian hierarchical modelling at the meta-analysis level; summary statistic level imputation based on observed SD values from other trials in the meta-analysis; a practical approximation based on the range; and algebraic estimation of the SD based on other summary statistics. Following screening of 1124 articles for methods estimating the mean, one approximate Bayesian computation approach and three papers based on alternative summary statistics were identified. Illustrative meta-analyses showed that when replacing a missing SD the approximation using the range minimised loss of precision and generally performed better than omitting trials. When estimating missing means, a formula using the median, lower quartile and upper quartile performed best in preserving the precision of the meta-analysis findings, although in some scenarios, omitting trials gave superior results. Methods based on summary statistics (minimum, maximum, lower quartile, upper quartile, median) reported in the literature facilitate more comprehensive inclusion of randomised controlled trials with missing mean or variability summary statistics within meta-analyses.
SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.
Chu, Annie; Cui, Jenny; Dinov, Ivo D
2009-03-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test.The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website.In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models.
ERIC Educational Resources Information Center
Merrill, Ray M.; Chatterley, Amanda; Shields, Eric C.
2005-01-01
This study explored the effectiveness of selected statistical measures at motivating or maintaining regular exercise among college students. The study also considered whether ease in understanding these statistical measures was associated with perceived effectiveness at motivating or maintaining regular exercise. Analyses were based on a…
Schulz, Marcus; Neumann, Daniel; Fleet, David M; Matthies, Michael
2013-12-01
During the last decades, marine pollution with anthropogenic litter has become a worldwide major environmental concern. Standardized monitoring of litter since 2001 on 78 beaches selected within the framework of the Convention for the Protection of the Marine Environment of the North-East Atlantic (OSPAR) has been used to identify temporal trends of marine litter. Based on statistical analyses of this dataset a two-part multi-criteria evaluation system for beach litter pollution of the North-East Atlantic and the North Sea is proposed. Canonical correlation analyses, linear regression analyses, and non-parametric analyses of variance were used to identify different temporal trends. A classification of beaches was derived from cluster analyses and served to define different states of beach quality according to abundances of 17 input variables. The evaluation system is easily applicable and relies on the above-mentioned classification and on significant temporal trends implied by significant rank correlations. Copyright © 2013 Elsevier Ltd. All rights reserved.
2011-01-01
Background Clinical researchers have often preferred to use a fixed effects model for the primary interpretation of a meta-analysis. Heterogeneity is usually assessed via the well known Q and I2 statistics, along with the random effects estimate they imply. In recent years, alternative methods for quantifying heterogeneity have been proposed, that are based on a 'generalised' Q statistic. Methods We review 18 IPD meta-analyses of RCTs into treatments for cancer, in order to quantify the amount of heterogeneity present and also to discuss practical methods for explaining heterogeneity. Results Differing results were obtained when the standard Q and I2 statistics were used to test for the presence of heterogeneity. The two meta-analyses with the largest amount of heterogeneity were investigated further, and on inspection the straightforward application of a random effects model was not deemed appropriate. Compared to the standard Q statistic, the generalised Q statistic provided a more accurate platform for estimating the amount of heterogeneity in the 18 meta-analyses. Conclusions Explaining heterogeneity via the pre-specification of trial subgroups, graphical diagnostic tools and sensitivity analyses produced a more desirable outcome than an automatic application of the random effects model. Generalised Q statistic methods for quantifying and adjusting for heterogeneity should be incorporated as standard into statistical software. Software is provided to help achieve this aim. PMID:21473747
P-MartCancer-Interactive Online Software to Enable Analysis of Shotgun Cancer Proteomic Datasets.
Webb-Robertson, Bobbie-Jo M; Bramer, Lisa M; Jensen, Jeffrey L; Kobold, Markus A; Stratton, Kelly G; White, Amanda M; Rodland, Karin D
2017-11-01
P-MartCancer is an interactive web-based software environment that enables statistical analyses of peptide or protein data, quantitated from mass spectrometry-based global proteomics experiments, without requiring in-depth knowledge of statistical programming. P-MartCancer offers a series of statistical modules associated with quality assessment, peptide and protein statistics, protein quantification, and exploratory data analyses driven by the user via customized workflows and interactive visualization. Currently, P-MartCancer offers access and the capability to analyze multiple cancer proteomic datasets generated through the Clinical Proteomics Tumor Analysis Consortium at the peptide, gene, and protein levels. P-MartCancer is deployed as a web service (https://pmart.labworks.org/cptac.html), alternatively available via Docker Hub (https://hub.docker.com/r/pnnl/pmart-web/). Cancer Res; 77(21); e47-50. ©2017 AACR . ©2017 American Association for Cancer Research.
Thompson, Ronald E.; Hoffman, Scott A.
2006-01-01
A suite of 28 streamflow statistics, ranging from extreme low to high flows, was computed for 17 continuous-record streamflow-gaging stations and predicted for 20 partial-record stations in Monroe County and contiguous counties in north-eastern Pennsylvania. The predicted statistics for the partial-record stations were based on regression analyses relating inter-mittent flow measurements made at the partial-record stations indexed to concurrent daily mean flows at continuous-record stations during base-flow conditions. The same statistics also were predicted for 134 ungaged stream locations in Monroe County on the basis of regression analyses relating the statistics to GIS-determined basin characteristics for the continuous-record station drainage areas. The prediction methodology for developing the regression equations used to estimate statistics was developed for estimating low-flow frequencies. This study and a companion study found that the methodology also has application potential for predicting intermediate- and high-flow statistics. The statistics included mean monthly flows, mean annual flow, 7-day low flows for three recurrence intervals, nine flow durations, mean annual base flow, and annual mean base flows for two recurrence intervals. Low standard errors of prediction and high coefficients of determination (R2) indicated good results in using the regression equations to predict the statistics. Regression equations for the larger flow statistics tended to have lower standard errors of prediction and higher coefficients of determination (R2) than equations for the smaller flow statistics. The report discusses the methodologies used in determining the statistics and the limitations of the statistics and the equations used to predict the statistics. Caution is indicated in using the predicted statistics for small drainage area situations. Study results constitute input needed by water-resource managers in Monroe County for planning purposes and evaluation of water-resources availability.
NASA Technical Reports Server (NTRS)
Foster, Winfred A., Jr.; Crowder, Winston; Steadman, Todd E.
2014-01-01
This paper presents the results of statistical analyses performed to predict the thrust imbalance between two solid rocket motor boosters to be used on the Space Launch System (SLS) vehicle. Two legacy internal ballistics codes developed for the Space Shuttle program were coupled with a Monte Carlo analysis code to determine a thrust imbalance envelope for the SLS vehicle based on the performance of 1000 motor pairs. Thirty three variables which could impact the performance of the motors during the ignition transient and thirty eight variables which could impact the performance of the motors during steady state operation of the motor were identified and treated as statistical variables for the analyses. The effects of motor to motor variation as well as variations between motors of a single pair were included in the analyses. The statistical variations of the variables were defined based on data provided by NASA's Marshall Space Flight Center for the upgraded five segment booster and from the Space Shuttle booster when appropriate. The results obtained for the statistical envelope are compared with the design specification thrust imbalance limits for the SLS launch vehicle
NASA Technical Reports Server (NTRS)
Foster, Winfred A., Jr.; Crowder, Winston; Steadman, Todd E.
2014-01-01
This paper presents the results of statistical analyses performed to predict the thrust imbalance between two solid rocket motor boosters to be used on the Space Launch System (SLS) vehicle. Two legacy internal ballistics codes developed for the Space Shuttle program were coupled with a Monte Carlo analysis code to determine a thrust imbalance envelope for the SLS vehicle based on the performance of 1000 motor pairs. Thirty three variables which could impact the performance of the motors during the ignition transient and thirty eight variables which could impact the performance of the motors during steady state operation of the motor were identified and treated as statistical variables for the analyses. The effects of motor to motor variation as well as variations between motors of a single pair were included in the analyses. The statistical variations of the variables were defined based on data provided by NASA's Marshall Space Flight Center for the upgraded five segment booster and from the Space Shuttle booster when appropriate. The results obtained for the statistical envelope are compared with the design specification thrust imbalance limits for the SLS launch vehicle.
SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit
Chu, Annie; Cui, Jenny; Dinov, Ivo D.
2011-01-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models. PMID:21546994
Coronado-Montoya, Stephanie; Levis, Alexander W; Kwakkenbos, Linda; Steele, Russell J; Turner, Erick H; Thombs, Brett D
2016-01-01
A large proportion of mindfulness-based therapy trials report statistically significant results, even in the context of very low statistical power. The objective of the present study was to characterize the reporting of "positive" results in randomized controlled trials of mindfulness-based therapy. We also assessed mindfulness-based therapy trial registrations for indications of possible reporting bias and reviewed recent systematic reviews and meta-analyses to determine whether reporting biases were identified. CINAHL, Cochrane CENTRAL, EMBASE, ISI, MEDLINE, PsycInfo, and SCOPUS databases were searched for randomized controlled trials of mindfulness-based therapy. The number of positive trials was described and compared to the number that might be expected if mindfulness-based therapy were similarly effective compared to individual therapy for depression. Trial registries were searched for mindfulness-based therapy registrations. CINAHL, Cochrane CENTRAL, EMBASE, ISI, MEDLINE, PsycInfo, and SCOPUS were also searched for mindfulness-based therapy systematic reviews and meta-analyses. 108 (87%) of 124 published trials reported ≥1 positive outcome in the abstract, and 109 (88%) concluded that mindfulness-based therapy was effective, 1.6 times greater than the expected number of positive trials based on effect size d = 0.55 (expected number positive trials = 65.7). Of 21 trial registrations, 13 (62%) remained unpublished 30 months post-trial completion. No trial registrations adequately specified a single primary outcome measure with time of assessment. None of 36 systematic reviews and meta-analyses concluded that effect estimates were overestimated due to reporting biases. The proportion of mindfulness-based therapy trials with statistically significant results may overstate what would occur in practice.
Statistics for Learning Genetics
ERIC Educational Resources Information Center
Charles, Abigail Sheena
2012-01-01
This study investigated the knowledge and skills that biology students may need to help them understand statistics/mathematics as it applies to genetics. The data are based on analyses of current representative genetics texts, practicing genetics professors' perspectives, and more directly, students' perceptions of, and performance in, doing…
The analysis of morphometric data on rocky mountain wolves and artic wolves using statistical method
NASA Astrophysics Data System (ADS)
Ammar Shafi, Muhammad; Saifullah Rusiman, Mohd; Hamzah, Nor Shamsidah Amir; Nor, Maria Elena; Ahmad, Noor’ani; Azia Hazida Mohamad Azmi, Nur; Latip, Muhammad Faez Ab; Hilmi Azman, Ahmad
2018-04-01
Morphometrics is a quantitative analysis depending on the shape and size of several specimens. Morphometric quantitative analyses are commonly used to analyse fossil record, shape and size of specimens and others. The aim of the study is to find the differences between rocky mountain wolves and arctic wolves based on gender. The sample utilised secondary data which included seven variables as independent variables and two dependent variables. Statistical modelling was used in the analysis such was the analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA). The results showed there exist differentiating results between arctic wolves and rocky mountain wolves based on independent factors and gender.
NASA Astrophysics Data System (ADS)
Emoto, K.; Saito, T.; Shiomi, K.
2017-12-01
Short-period (<1 s) seismograms are strongly affected by small-scale (<10 km) heterogeneities in the lithosphere. In general, short-period seismograms are analysed based on the statistical method by considering the interaction between seismic waves and randomly distributed small-scale heterogeneities. Statistical properties of the random heterogeneities have been estimated by analysing short-period seismograms. However, generally, the small-scale random heterogeneity is not taken into account for the modelling of long-period (>2 s) seismograms. We found that the energy of the coda of long-period seismograms shows a spatially flat distribution. This phenomenon is well known in short-period seismograms and results from the scattering by small-scale heterogeneities. We estimate the statistical parameters that characterize the small-scale random heterogeneity by modelling the spatiotemporal energy distribution of long-period seismograms. We analyse three moderate-size earthquakes that occurred in southwest Japan. We calculate the spatial distribution of the energy density recorded by a dense seismograph network in Japan at the period bands of 8-16 s, 4-8 s and 2-4 s and model them by using 3-D finite difference (FD) simulations. Compared to conventional methods based on statistical theories, we can calculate more realistic synthetics by using the FD simulation. It is not necessary to assume a uniform background velocity, body or surface waves and scattering properties considered in general scattering theories. By taking the ratio of the energy of the coda area to that of the entire area, we can separately estimate the scattering and the intrinsic absorption effects. Our result reveals the spectrum of the random inhomogeneity in a wide wavenumber range including the intensity around the corner wavenumber as P(m) = 8πε2a3/(1 + a2m2)2, where ε = 0.05 and a = 3.1 km, even though past studies analysing higher-frequency records could not detect the corner. Finally, we estimate the intrinsic attenuation by modelling the decay rate of the energy. The method proposed in this study is suitable for quantifying the statistical properties of long-wavelength subsurface random inhomogeneity, which leads the way to characterizing a wider wavenumber range of spectra, including the corner wavenumber.
Spurious correlations and inference in landscape genetics
Samuel A. Cushman; Erin L. Landguth
2010-01-01
Reliable interpretation of landscape genetic analyses depends on statistical methods that have high power to identify the correct process driving gene flow while rejecting incorrect alternative hypotheses. Little is known about statistical power and inference in individual-based landscape genetics. Our objective was to evaluate the power of causalmodelling with partial...
The Poor in 1970: A Chartbook.
ERIC Educational Resources Information Center
Ryscavage, Paul M.
The analyses in this presentation book, prepared by the Policy Research Division of the Office of Planning, Research, and Evaluation, Office of Economic Opportunity, reflect poverty statistics based on data from the Current Population Survey of the Bureau of the Census. These statistics reflect incomes of families and individuals which fall below…
Unconscious analyses of visual scenes based on feature conjunctions.
Tachibana, Ryosuke; Noguchi, Yasuki
2015-06-01
To efficiently process a cluttered scene, the visual system analyzes statistical properties or regularities of visual elements embedded in the scene. It is controversial, however, whether those scene analyses could also work for stimuli unconsciously perceived. Here we show that our brain performs the unconscious scene analyses not only using a single featural cue (e.g., orientation) but also based on conjunctions of multiple visual features (e.g., combinations of color and orientation information). Subjects foveally viewed a stimulus array (duration: 50 ms) where 4 types of bars (red-horizontal, red-vertical, green-horizontal, and green-vertical) were intermixed. Although a conscious perception of those bars was inhibited by a subsequent mask stimulus, the brain correctly analyzed the information about color, orientation, and color-orientation conjunctions of those invisible bars. The information of those features was then used for the unconscious configuration analysis (statistical processing) of the central bars, which induced a perceptual bias and illusory feature binding in visible stimuli at peripheral locations. While statistical analyses and feature binding are normally 2 key functions of the visual system to construct coherent percepts of visual scenes, our results show that a high-level analysis combining those 2 functions is correctly performed by unconscious computations in the brain. (c) 2015 APA, all rights reserved).
Muko, Soyoka; Shimatani, Ichiro K; Nozawa, Yoko
2014-07-01
Spatial distributions of individuals are conventionally analysed by representing objects as dimensionless points, in which spatial statistics are based on centre-to-centre distances. However, if organisms expand without overlapping and show size variations, such as is the case for encrusting corals, interobject spacing is crucial for spatial associations where interactions occur. We introduced new pairwise statistics using minimum distances between objects and demonstrated their utility when examining encrusting coral community data. We also calculated the conventional point process statistics and the grid-based statistics to clarify the advantages and limitations of each spatial statistical method. For simplicity, coral colonies were approximated by disks in these demonstrations. Focusing on short-distance effects, the use of minimum distances revealed that almost all coral genera were aggregated at a scale of 1-25 cm. However, when fragmented colonies (ramets) were treated as a genet, a genet-level analysis indicated weak or no aggregation, suggesting that most corals were randomly distributed and that fragmentation was the primary cause of colony aggregations. In contrast, point process statistics showed larger aggregation scales, presumably because centre-to-centre distances included both intercolony spacing and colony sizes (radius). The grid-based statistics were able to quantify the patch (aggregation) scale of colonies, but the scale was strongly affected by the colony size. Our approach quantitatively showed repulsive effects between an aggressive genus and a competitively weak genus, while the grid-based statistics (covariance function) also showed repulsion although the spatial scale indicated from the statistics was not directly interpretable in terms of ecological meaning. The use of minimum distances together with previously proposed spatial statistics helped us to extend our understanding of the spatial patterns of nonoverlapping objects that vary in size and the associated specific scales. © 2013 The Authors. Journal of Animal Ecology © 2013 British Ecological Society.
NASA Astrophysics Data System (ADS)
ten Veldhuis, Marie-Claire; Schleiss, Marc
2017-04-01
In this study, we introduced an alternative approach for analysis of hydrological flow time series, using an adaptive sampling framework based on inter-amount times (IATs). The main difference with conventional flow time series is the rate at which low and high flows are sampled: the unit of analysis for IATs is a fixed flow amount, instead of a fixed time window. We analysed statistical distributions of flows and IATs across a wide range of sampling scales to investigate sensitivity of statistical properties such as quantiles, variance, skewness, scaling parameters and flashiness indicators to the sampling scale. We did this based on streamflow time series for 17 (semi)urbanised basins in North Carolina, US, ranging from 13 km2 to 238 km2 in size. Results showed that adaptive sampling of flow time series based on inter-amounts leads to a more balanced representation of low flow and peak flow values in the statistical distribution. While conventional sampling gives a lot of weight to low flows, as these are most ubiquitous in flow time series, IAT sampling gives relatively more weight to high flow values, when given flow amounts are accumulated in shorter time. As a consequence, IAT sampling gives more information about the tail of the distribution associated with high flows, while conventional sampling gives relatively more information about low flow periods. We will present results of statistical analyses across a range of subdaily to seasonal scales and will highlight some interesting insights that can be derived from IAT statistics with respect to basin flashiness and impact urbanisation on hydrological response.
2016-11-15
participants who were followed for the development of back pain for an average of 3.9 years. Methods. Descriptive statistics and longitudinal...health, military personnel, occupational health, outcome assessment, statistics, survey methodology . Level of Evidence: 3 Spine 2016;41:1754–1763ack...based on the National Health and Nutrition Examination Survey.21 Statistical Analysis Descriptive and univariate analyses compared character- istics
Shek, Daniel T L; Ma, Cecilia M S
2011-01-05
Although different methods are available for the analyses of longitudinal data, analyses based on generalized linear models (GLM) are criticized as violating the assumption of independence of observations. Alternatively, linear mixed models (LMM) are commonly used to understand changes in human behavior over time. In this paper, the basic concepts surrounding LMM (or hierarchical linear models) are outlined. Although SPSS is a statistical analyses package commonly used by researchers, documentation on LMM procedures in SPSS is not thorough or user friendly. With reference to this limitation, the related procedures for performing analyses based on LMM in SPSS are described. To demonstrate the application of LMM analyses in SPSS, findings based on six waves of data collected in the Project P.A.T.H.S. (Positive Adolescent Training through Holistic Social Programmes) in Hong Kong are presented.
Longitudinal Data Analyses Using Linear Mixed Models in SPSS: Concepts, Procedures and Illustrations
Shek, Daniel T. L.; Ma, Cecilia M. S.
2011-01-01
Although different methods are available for the analyses of longitudinal data, analyses based on generalized linear models (GLM) are criticized as violating the assumption of independence of observations. Alternatively, linear mixed models (LMM) are commonly used to understand changes in human behavior over time. In this paper, the basic concepts surrounding LMM (or hierarchical linear models) are outlined. Although SPSS is a statistical analyses package commonly used by researchers, documentation on LMM procedures in SPSS is not thorough or user friendly. With reference to this limitation, the related procedures for performing analyses based on LMM in SPSS are described. To demonstrate the application of LMM analyses in SPSS, findings based on six waves of data collected in the Project P.A.T.H.S. (Positive Adolescent Training through Holistic Social Programmes) in Hong Kong are presented. PMID:21218263
Measuring the Impacts of ICT Using Official Statistics. OECD Digital Economy Papers, No. 136
ERIC Educational Resources Information Center
Roberts, Sheridan
2008-01-01
This paper describes the findings of an OECD project examining ICT impact measurement and analyses based on official statistics. Both economic and social impacts are covered and some results are presented. It attempts to place ICT impacts measurement into an Information Society conceptual framework, provides some suggestions for standardising…
Interpretation of correlations in clinical research.
Hung, Man; Bounsanga, Jerry; Voss, Maren Wright
2017-11-01
Critically analyzing research is a key skill in evidence-based practice and requires knowledge of research methods, results interpretation, and applications, all of which rely on a foundation based in statistics. Evidence-based practice makes high demands on trained medical professionals to interpret an ever-expanding array of research evidence. As clinical training emphasizes medical care rather than statistics, it is useful to review the basics of statistical methods and what they mean for interpreting clinical studies. We reviewed the basic concepts of correlational associations, violations of normality, unobserved variable bias, sample size, and alpha inflation. The foundations of causal inference were discussed and sound statistical analyses were examined. We discuss four ways in which correlational analysis is misused, including causal inference overreach, over-reliance on significance, alpha inflation, and sample size bias. Recent published studies in the medical field provide evidence of causal assertion overreach drawn from correlational findings. The findings present a primer on the assumptions and nature of correlational methods of analysis and urge clinicians to exercise appropriate caution as they critically analyze the evidence before them and evaluate evidence that supports practice. Critically analyzing new evidence requires statistical knowledge in addition to clinical knowledge. Studies can overstate relationships, expressing causal assertions when only correlational evidence is available. Failure to account for the effect of sample size in the analyses tends to overstate the importance of predictive variables. It is important not to overemphasize the statistical significance without consideration of effect size and whether differences could be considered clinically meaningful.
Valid randomization-based p-values for partially post hoc subgroup analyses.
Lee, Joseph J; Rubin, Donald B
2015-10-30
By 'partially post-hoc' subgroup analyses, we mean analyses that compare existing data from a randomized experiment-from which a subgroup specification is derived-to new, subgroup-only experimental data. We describe a motivating example in which partially post hoc subgroup analyses instigated statistical debate about a medical device's efficacy. We clarify the source of such analyses' invalidity and then propose a randomization-based approach for generating valid posterior predictive p-values for such partially post hoc subgroups. Lastly, we investigate the approach's operating characteristics in a simple illustrative setting through a series of simulations, showing that it can have desirable properties under both null and alternative hypotheses. Copyright © 2015 John Wiley & Sons, Ltd.
Statistical analysis of iron geochemical data suggests limited late Proterozoic oxygenation
NASA Astrophysics Data System (ADS)
Sperling, Erik A.; Wolock, Charles J.; Morgan, Alex S.; Gill, Benjamin C.; Kunzmann, Marcus; Halverson, Galen P.; MacDonald, Francis A.; Knoll, Andrew H.; Johnston, David T.
2015-07-01
Sedimentary rocks deposited across the Proterozoic-Phanerozoic transition record extreme climate fluctuations, a potential rise in atmospheric oxygen or re-organization of the seafloor redox landscape, and the initial diversification of animals. It is widely assumed that the inferred redox change facilitated the observed trends in biodiversity. Establishing this palaeoenvironmental context, however, requires that changes in marine redox structure be tracked by means of geochemical proxies and translated into estimates of atmospheric oxygen. Iron-based proxies are among the most effective tools for tracking the redox chemistry of ancient oceans. These proxies are inherently local, but have global implications when analysed collectively and statistically. Here we analyse about 4,700 iron-speciation measurements from shales 2,300 to 360 million years old. Our statistical analyses suggest that subsurface water masses in mid-Proterozoic oceans were predominantly anoxic and ferruginous (depleted in dissolved oxygen and iron-bearing), but with a tendency towards euxinia (sulfide-bearing) that is not observed in the Neoproterozoic era. Analyses further indicate that early animals did not experience appreciable benthic sulfide stress. Finally, unlike proxies based on redox-sensitive trace-metal abundances, iron geochemical data do not show a statistically significant change in oxygen content through the Ediacaran and Cambrian periods, sharply constraining the magnitude of the end-Proterozoic oxygen increase. Indeed, this re-analysis of trace-metal data is consistent with oxygenation continuing well into the Palaeozoic era. Therefore, if changing redox conditions facilitated animal diversification, it did so through a limited rise in oxygen past critical functional and ecological thresholds, as is seen in modern oxygen minimum zone benthic animal communities.
Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi
2014-09-18
Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would take only with the results of ordinary analyses like BLAST and conservation analyses. However, in addition to these analyses, with the analysis based on the inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in its structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.
Lewis, Gregory F.; Furman, Senta A.; McCool, Martha F.; Porges, Stephen W.
2011-01-01
Three frequently used RSA metrics are investigated to document violations of assumptions for parametric analyses, moderation by respiration, influences of nonstationarity, and sensitivity to vagal blockade. Although all metrics are highly correlated, new findings illustrate that the metrics are noticeably different on the above dimensions. Only one method conforms to the assumptions for parametric analyses, is not moderated by respiration, is not influenced by nonstationarity, and reliably generates stronger effect sizes. Moreover, this method is also the most sensitive to vagal blockade. Specific features of this method may provide insights into improving the statistical characteristics of other commonly used RSA metrics. These data provide the evidence to question, based on statistical grounds, published reports using particular metrics of RSA. PMID:22138367
Analyses of the 1981-82 Illinois Public Library Statistics.
ERIC Educational Resources Information Center
Wallace, Danny P.
Using data provided by the annual reports of Illinois public libraries and by the Illinois state library, this publication is a companion to the November 1982 issue of "Illinois Libraries," which enumerated the 16 data elements upon which the analyses are based. Three additional types of information are provided for each of six…
The disagreeable behaviour of the kappa statistic.
Flight, Laura; Julious, Steven A
2015-01-01
It is often of interest to measure the agreement between a number of raters when an outcome is nominal or ordinal. The kappa statistic is used as a measure of agreement. The statistic is highly sensitive to the distribution of the marginal totals and can produce unreliable results. Other statistics such as the proportion of concordance, maximum attainable kappa and prevalence and bias adjusted kappa should be considered to indicate how well the kappa statistic represents agreement in the data. Each kappa should be considered and interpreted based on the context of the data being analysed. Copyright © 2014 John Wiley & Sons, Ltd.
A Framework for Assessing High School Students' Statistical Reasoning.
Chan, Shiau Wei; Ismail, Zaleha; Sumintono, Bambang
2016-01-01
Based on a synthesis of literature, earlier studies, analyses and observations on high school students, this study developed an initial framework for assessing students' statistical reasoning about descriptive statistics. Framework descriptors were established across five levels of statistical reasoning and four key constructs. The former consisted of idiosyncratic reasoning, verbal reasoning, transitional reasoning, procedural reasoning, and integrated process reasoning. The latter include describing data, organizing and reducing data, representing data, and analyzing and interpreting data. In contrast to earlier studies, this initial framework formulated a complete and coherent statistical reasoning framework. A statistical reasoning assessment tool was then constructed from this initial framework. The tool was administered to 10 tenth-grade students in a task-based interview. The initial framework was refined, and the statistical reasoning assessment tool was revised. The ten students then participated in the second task-based interview, and the data obtained were used to validate the framework. The findings showed that the students' statistical reasoning levels were consistent across the four constructs, and this result confirmed the framework's cohesion. Developed to contribute to statistics education, this newly developed statistical reasoning framework provides a guide for planning learning goals and designing instruction and assessments.
A Framework for Assessing High School Students' Statistical Reasoning
2016-01-01
Based on a synthesis of literature, earlier studies, analyses and observations on high school students, this study developed an initial framework for assessing students’ statistical reasoning about descriptive statistics. Framework descriptors were established across five levels of statistical reasoning and four key constructs. The former consisted of idiosyncratic reasoning, verbal reasoning, transitional reasoning, procedural reasoning, and integrated process reasoning. The latter include describing data, organizing and reducing data, representing data, and analyzing and interpreting data. In contrast to earlier studies, this initial framework formulated a complete and coherent statistical reasoning framework. A statistical reasoning assessment tool was then constructed from this initial framework. The tool was administered to 10 tenth-grade students in a task-based interview. The initial framework was refined, and the statistical reasoning assessment tool was revised. The ten students then participated in the second task-based interview, and the data obtained were used to validate the framework. The findings showed that the students’ statistical reasoning levels were consistent across the four constructs, and this result confirmed the framework’s cohesion. Developed to contribute to statistics education, this newly developed statistical reasoning framework provides a guide for planning learning goals and designing instruction and assessments. PMID:27812091
An Exploratory Data Analysis System for Support in Medical Decision-Making
Copeland, J. A.; Hamel, B.; Bourne, J. R.
1979-01-01
An experimental system was developed to allow retrieval and analysis of data collected during a study of neurobehavioral correlates of renal disease. After retrieving data organized in a relational data base, simple bivariate statistics of parametric and nonparametric nature could be conducted. An “exploratory” mode in which the system provided guidance in selection of appropriate statistical analyses was also available to the user. The system traversed a decision tree using the inherent qualities of the data (e.g., the identity and number of patients, tests, and time epochs) to search for the appropriate analyses to employ.
Khan, Asaduzzaman; Chien, Chi-Wen; Bagraith, Karl S
2015-04-01
To investigate whether using a parametric statistic in comparing groups leads to different conclusions when using summative scores from rating scales compared with using their corresponding Rasch-based measures. A Monte Carlo simulation study was designed to examine between-group differences in the change scores derived from summative scores from rating scales, and those derived from their corresponding Rasch-based measures, using 1-way analysis of variance. The degree of inconsistency between the 2 scoring approaches (i.e. summative and Rasch-based) was examined, using varying sample sizes, scale difficulties and person ability conditions. This simulation study revealed scaling artefacts that could arise from using summative scores rather than Rasch-based measures for determining the changes between groups. The group differences in the change scores were statistically significant for summative scores under all test conditions and sample size scenarios. However, none of the group differences in the change scores were significant when using the corresponding Rasch-based measures. This study raises questions about the validity of the inference on group differences of summative score changes in parametric analyses. Moreover, it provides a rationale for the use of Rasch-based measures, which can allow valid parametric analyses of rating scale data.
Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses
Bayzid, Md Shamsuzzoha; Mirarab, Siavash; Boussau, Bastien; Warnow, Tandy
2015-01-01
Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning. PMID:26086579
Barbie, Dana L.; Wehmeyer, Loren L.
2012-01-01
Trends in selected streamflow statistics during 1922-2009 were evaluated at 19 long-term streamflow-gaging stations considered indicative of outflows from Texas to Arkansas, Louisiana, Galveston Bay, and the Gulf of Mexico. The U.S. Geological Survey, in cooperation with the Texas Water Development Board, evaluated streamflow data from streamflow-gaging stations with more than 50 years of record that were active as of 2009. The outflows into Arkansas and Louisiana were represented by 3 streamflow-gaging stations, and outflows into the Gulf of Mexico, including Galveston Bay, were represented by 16 streamflow-gaging stations. Monotonic trend analyses were done using the following three streamflow statistics generated from daily mean values of streamflow: (1) annual mean daily discharge, (2) annual maximum daily discharge, and (3) annual minimum daily discharge. The trend analyses were based on the nonparametric Kendall's Tau test, which is useful for the detection of monotonic upward or downward trends with time. A total of 69 trend analyses by Kendall's Tau were computed - 19 periods of streamflow multiplied by the 3 streamflow statistics plus 12 additional trend analyses because the periods of record for 2 streamflow-gaging stations were divided into periods representing pre- and post-reservoir impoundment. Unless otherwise described, each trend analysis used the entire period of record for each streamflow-gaging station. The monotonic trend analysis detected 11 statistically significant downward trends, 37 instances of no trend, and 21 statistically significant upward trends. One general region studied, which seemingly has relatively more upward trends for many of the streamflow statistics analyzed, includes the rivers and associated creeks and bayous to Galveston Bay in the Houston metropolitan area. Lastly, the most western river basins considered (the Nueces and Rio Grande) had statistically significant downward trends for many of the streamflow statistics analyzed.
Post Hoc Analyses of ApoE Genotype-Defined Subgroups in Clinical Trials.
Kennedy, Richard E; Cutter, Gary R; Wang, Guoqiao; Schneider, Lon S
2016-01-01
Many post hoc analyses of clinical trials in Alzheimer's disease (AD) and mild cognitive impairment (MCI) are in small Phase 2 trials. Subject heterogeneity may lead to statistically significant post hoc results that cannot be replicated in larger follow-up studies. We investigated the extent of this problem using simulation studies mimicking current trial methods with post hoc analyses based on ApoE4 carrier status. We used a meta-database of 24 studies, including 3,574 subjects with mild AD and 1,171 subjects with MCI/prodromal AD, to simulate clinical trial scenarios. Post hoc analyses examined if rates of progression on the Alzheimer's Disease Assessment Scale-cognitive (ADAS-cog) differed between ApoE4 carriers and non-carriers. Across studies, ApoE4 carriers were younger and had lower baseline scores, greater rates of progression, and greater variability on the ADAS-cog. Up to 18% of post hoc analyses for 18-month trials in AD showed greater rates of progression for ApoE4 non-carriers that were statistically significant but unlikely to be confirmed in follow-up studies. The frequency of erroneous conclusions dropped below 3% with trials of 100 subjects per arm. In MCI, rates of statistically significant differences with greater progression in ApoE4 non-carriers remained below 3% unless sample sizes were below 25 subjects per arm. Statistically significant differences for ApoE4 in post hoc analyses often reflect heterogeneity among small samples rather than true differential effect among ApoE4 subtypes. Such analyses must be viewed cautiously. ApoE genotype should be incorporated into the design stage to minimize erroneous conclusions.
Rao, Goutham; Lopez-Jimenez, Francisco; Boyd, Jack; D'Amico, Frank; Durant, Nefertiti H; Hlatky, Mark A; Howard, George; Kirley, Katherine; Masi, Christopher; Powell-Wiley, Tiffany M; Solomonides, Anthony E; West, Colin P; Wessel, Jennifer
2017-09-05
Meta-analyses are becoming increasingly popular, especially in the fields of cardiovascular disease prevention and treatment. They are often considered to be a reliable source of evidence for making healthcare decisions. Unfortunately, problems among meta-analyses such as the misapplication and misinterpretation of statistical methods and tests are long-standing and widespread. The purposes of this statement are to review key steps in the development of a meta-analysis and to provide recommendations that will be useful for carrying out meta-analyses and for readers and journal editors, who must interpret the findings and gauge methodological quality. To make the statement practical and accessible, detailed descriptions of statistical methods have been omitted. Based on a survey of cardiovascular meta-analyses, published literature on methodology, expert consultation, and consensus among the writing group, key recommendations are provided. Recommendations reinforce several current practices, including protocol registration; comprehensive search strategies; methods for data extraction and abstraction; methods for identifying, measuring, and dealing with heterogeneity; and statistical methods for pooling results. Other practices should be discontinued, including the use of levels of evidence and evidence hierarchies to gauge the value and impact of different study designs (including meta-analyses) and the use of structured tools to assess the quality of studies to be included in a meta-analysis. We also recommend choosing a pooling model for conventional meta-analyses (fixed effect or random effects) on the basis of clinical and methodological similarities among studies to be included, rather than the results of a test for statistical heterogeneity. © 2017 American Heart Association, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Metoyer, Candace N.; Walsh, Stephen J.; Tardiff, Mark F.
2008-10-30
The detection and identification of weak gaseous plumes using thermal imaging data is complicated by many factors. These include variability due to atmosphere, ground and plume temperature, and background clutter. This paper presents an analysis of one formulation of the physics-based model that describes the at-sensor observed radiance. The motivating question for the analyses performed in this paper is as follows. Given a set of backgrounds, is there a way to predict the background over which the probability of detecting a given chemical will be the highest? Two statistics were developed to address this question. These statistics incorporate data frommore » the long-wave infrared band to predict the background over which chemical detectability will be the highest. These statistics can be computed prior to data collection. As a preliminary exploration into the predictive ability of these statistics, analyses were performed on synthetic hyperspectral images. Each image contained one chemical (either carbon tetrachloride or ammonia) spread across six distinct background types. The statistics were used to generate predictions for the background ranks. Then, the predicted ranks were compared to the empirical ranks obtained from the analyses of the synthetic images. For the simplified images under consideration, the predicted and empirical ranks showed a promising amount of agreement. One statistic accurately predicted the best and worst background for detection in all of the images. Future work may include explorations of more complicated plume ingredients, background types, and noise structures.« less
Quasi-Static Probabilistic Structural Analyses Process and Criteria
NASA Technical Reports Server (NTRS)
Goldberg, B.; Verderaime, V.
1999-01-01
Current deterministic structural methods are easily applied to substructures and components, and analysts have built great design insights and confidence in them over the years. However, deterministic methods cannot support systems risk analyses, and it was recently reported that deterministic treatment of statistical data is inconsistent with error propagation laws that can result in unevenly conservative structural predictions. Assuming non-nal distributions and using statistical data formats throughout prevailing stress deterministic processes lead to a safety factor in statistical format, which integrated into the safety index, provides a safety factor and first order reliability relationship. The embedded safety factor in the safety index expression allows a historically based risk to be determined and verified over a variety of quasi-static metallic substructures consistent with the traditional safety factor methods and NASA Std. 5001 criteria.
Mass univariate analysis of event-related brain potentials/fields I: a critical tutorial review.
Groppe, David M; Urbach, Thomas P; Kutas, Marta
2011-12-01
Event-related potentials (ERPs) and magnetic fields (ERFs) are typically analyzed via ANOVAs on mean activity in a priori windows. Advances in computing power and statistics have produced an alternative, mass univariate analyses consisting of thousands of statistical tests and powerful corrections for multiple comparisons. Such analyses are most useful when one has little a priori knowledge of effect locations or latencies, and for delineating effect boundaries. Mass univariate analyses complement and, at times, obviate traditional analyses. Here we review this approach as applied to ERP/ERF data and four methods for multiple comparison correction: strong control of the familywise error rate (FWER) via permutation tests, weak control of FWER via cluster-based permutation tests, false discovery rate control, and control of the generalized FWER. We end with recommendations for their use and introduce free MATLAB software for their implementation. Copyright © 2011 Society for Psychophysiological Research.
Impact of ontology evolution on functional analyses.
Groß, Anika; Hartung, Michael; Prüfer, Kay; Kelso, Janet; Rahm, Erhard
2012-10-15
Ontologies are used in the annotation and analysis of biological data. As knowledge accumulates, ontologies and annotation undergo constant modifications to reflect this new knowledge. These modifications may influence the results of statistical applications such as functional enrichment analyses that describe experimental data in terms of ontological groupings. Here, we investigate to what degree modifications of the Gene Ontology (GO) impact these statistical analyses for both experimental and simulated data. The analysis is based on new measures for the stability of result sets and considers different ontology and annotation changes. Our results show that past changes in the GO are non-uniformly distributed over different branches of the ontology. Considering the semantic relatedness of significant categories in analysis results allows a more realistic stability assessment for functional enrichment studies. We observe that the results of term-enrichment analyses tend to be surprisingly stable despite changes in ontology and annotation.
Mass spectrometry-based protein identification with accurate statistical significance assignment.
Alves, Gelio; Yu, Yi-Kuo
2015-03-01
Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of scientific conclusions thus drawn. From the perspective of mass spectrometry-based proteomics, even though accurate statistics for peptide identification can now be achieved, accurate protein level statistics remain challenging. We have constructed a protein ID method that combines peptide evidences of a candidate protein based on a rigorous formula derived earlier; in this formula the database P-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein level E-value, eliminating the need of using empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Sorić formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with other methods tested. The source code, implemented in C++ on a linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
ERIC Educational Resources Information Center
Chen, Xianglei; Lauff, Erich; Arbeit, Caren A.; Henke, Robin; Skomsvold, Paul; Hufford, Justine
2017-01-01
This Statistical Analysis Report tracks a cohort of 2002 high school sophomores over 10 years, examining the extent to which cohort members had reached such life course milestones as finishing school, starting a job, leaving home, getting married, and having children. The analyses in this report are based on data from the Education Longitudinal…
An evaluation of GTAW-P versus GTA welding of alloy 718
NASA Technical Reports Server (NTRS)
Gamwell, W. R.; Kurgan, C.; Malone, T. W.
1991-01-01
Mechanical properties were evaluated to determine statistically whether the pulsed current gas tungsten arc welding (GTAW-P) process produces welds in alloy 718 with room temperature structural performance equivalent to current Space Shuttle Main Engine (SSME) welds manufactured by the constant current GTAW-P process. Evaluations were conducted on two base metal lots, two filler metal lots, two heat input levels, and two welding processes. The material form was 0.125-inch (3.175-mm) alloy 718 sheet. Prior to welding, sheets were treated to either the ST or STA-1 condition. After welding, panels were left as welded or heat treated to the STA-1 condition, and weld beads were left intact or machined flush. Statistical analyses were performed on yield strength, ultimate tensile strength (UTS), and high cycle fatigue (HCF) properties for all the post welded material conditions. Analyses of variance were performed on the data to determine if there were any significant effects on UTS or HCF life due to variations in base metal, filler metal, heat input level, or welding process. Statistical analyses showed that the GTAW-P process does produce welds with room temperature structural performance equivalent to current SSME welds manufactured by the GTAW process, regardless of prior material condition or post welding condition.
Statistical Analyses of Scatterplots to Identify Important Factors in Large-Scale Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kleijnen, J.P.C.; Helton, J.C.
1999-04-01
The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (1) linear relationships with correlation coefficients, (2) monotonic relationships with rank correlation coefficients, (3) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (4) trends in variability as defined by variances and interquartile ranges, and (5) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are consideredmore » for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from analysis include: (1) Type I errors are unavoidable, (2) Type II errors can occur when inappropriate analysis procedures are used, (3) physical explanations should always be sought for why statistical procedures identify variables as being important, and (4) the identification of important variables tends to be stable for independent Latin hypercube samples.« less
Peng, Fei; Li, Jiao-ting; Long, Min
2015-03-01
To discriminate the acquisition pipelines of digital images, a novel scheme for the identification of natural images and computer-generated graphics is proposed based on statistical and textural features. First, the differences between them are investigated from the view of statistics and texture, and 31 dimensions of feature are acquired for identification. Then, LIBSVM is used for the classification. Finally, the experimental results are presented. The results show that it can achieve an identification accuracy of 97.89% for computer-generated graphics, and an identification accuracy of 97.75% for natural images. The analyses also demonstrate the proposed method has excellent performance, compared with some existing methods based only on statistical features or other features. The method has a great potential to be implemented for the identification of natural images and computer-generated graphics. © 2014 American Academy of Forensic Sciences.
Electric Field Magnitude and Radar Reflectivity as a Function of Distance from Cloud Edge
NASA Technical Reports Server (NTRS)
Ward, Jennifer G.; Merceret, Francis J.
2004-01-01
The results of analyses of data collected during a field investigation of thunderstorm anvil and debris clouds are reported. Statistics of the magnitude of the electric field are determined as a function of distance from cloud edge. Statistics of radar reflectivity near cloud edge are also determined. Both analyses use in-situ airborne field mill and cloud physics data coupled with ground-based radar measurements obtained in east-central Florida during the summer convective season. Electric fields outside of anvil and debris clouds averaged less than 3 kV/m. The average radar reflectivity at the cloud edge ranged between 0 and 5 dBZ.
The November 1, 2017 issue of Cancer Research is dedicated to a collection of computational resource papers in genomics, proteomics, animal models, imaging, and clinical subjects for non-bioinformaticists looking to incorporate computing tools into their work. Scientists at Pacific Northwest National Laboratory have developed P-MartCancer, an open, web-based interactive software tool that enables statistical analyses of peptide or protein data generated from mass-spectrometry (MS)-based global proteomics experiments.
Progressive statistics for studies in sports medicine and exercise science.
Hopkins, William G; Marshall, Stephen W; Batterham, Alan M; Hanin, Juri
2009-01-01
Statistical guidelines and expert statements are now available to assist in the analysis and reporting of studies in some biomedical disciplines. We present here a more progressive resource for sample-based studies, meta-analyses, and case studies in sports medicine and exercise science. We offer forthright advice on the following controversial or novel issues: using precision of estimation for inferences about population effects in preference to null-hypothesis testing, which is inadequate for assessing clinical or practical importance; justifying sample size via acceptable precision or confidence for clinical decisions rather than via adequate power for statistical significance; showing SD rather than SEM, to better communicate the magnitude of differences in means and nonuniformity of error; avoiding purely nonparametric analyses, which cannot provide inferences about magnitude and are unnecessary; using regression statistics in validity studies, in preference to the impractical and biased limits of agreement; making greater use of qualitative methods to enrich sample-based quantitative projects; and seeking ethics approval for public access to the depersonalized raw data of a study, to address the need for more scrutiny of research and better meta-analyses. Advice on less contentious issues includes the following: using covariates in linear models to adjust for confounders, to account for individual differences, and to identify potential mechanisms of an effect; using log transformation to deal with nonuniformity of effects and error; identifying and deleting outliers; presenting descriptive, effect, and inferential statistics in appropriate formats; and contending with bias arising from problems with sampling, assignment, blinding, measurement error, and researchers' prejudices. This article should advance the field by stimulating debate, promoting innovative approaches, and serving as a useful checklist for authors, reviewers, and editors.
The concept and use of elasticity in population viability models [Exercise 13
Carolyn Hull Sieg; Rudy M. King; Fred Van Dyke
2003-01-01
As you have seen in exercise 12, plants, such as the western prairie fringed orchid, typically have distinct life stages and complex life cycles that require the matrix analyses associated with a stage-based population model. Some statistics that can be generated from such matrix analyses can be very informative in determining which variables in the model have the...
Statistics for NAEG: past efforts, new results, and future plans
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gilbert, R.O.; Simpson, J.C.; Kinnison, R.R.
A brief review of Nevada Applied Ecology Group (NAEG) objectives is followed by a summary of past statistical analyses conducted by Pacific Northwest Laboratory for the NAEG. Estimates of spatial pattern of radionuclides and other statistical analyses at NS's 201, 219 and 221 are reviewed as background for new analyses presented in this paper. Suggested NAEG activities and statistical analyses needed for the projected termination date of NAEG studies in March 1986 are given.
Review of Statistical Methods for Analysing Healthcare Resources and Costs
Mihaylova, Borislava; Briggs, Andrew; O'Hagan, Anthony; Thompson, Simon G
2011-01-01
We review statistical methods for analysing healthcare resource use and costs, their ability to address skewness, excess zeros, multimodality and heavy right tails, and their ease for general use. We aim to provide guidance on analysing resource use and costs focusing on randomised trials, although methods often have wider applicability. Twelve broad categories of methods were identified: (I) methods based on the normal distribution, (II) methods following transformation of data, (III) single-distribution generalized linear models (GLMs), (IV) parametric models based on skewed distributions outside the GLM family, (V) models based on mixtures of parametric distributions, (VI) two (or multi)-part and Tobit models, (VII) survival methods, (VIII) non-parametric methods, (IX) methods based on truncation or trimming of data, (X) data components models, (XI) methods based on averaging across models, and (XII) Markov chain methods. Based on this review, our recommendations are that, first, simple methods are preferred in large samples where the near-normality of sample means is assured. Second, in somewhat smaller samples, relatively simple methods, able to deal with one or two of above data characteristics, may be preferable but checking sensitivity to assumptions is necessary. Finally, some more complex methods hold promise, but are relatively untried; their implementation requires substantial expertise and they are not currently recommended for wider applied work. Copyright © 2010 John Wiley & Sons, Ltd. PMID:20799344
The effect of noise-induced variance on parameter recovery from reaction times.
Vadillo, Miguel A; Garaizar, Pablo
2016-03-31
Technical noise can compromise the precision and accuracy of the reaction times collected in psychological experiments, especially in the case of Internet-based studies. Although this noise seems to have only a small impact on traditional statistical analyses, its effects on model fit to reaction-time distributions remains unexplored. Across four simulations we study the impact of technical noise on parameter recovery from data generated from an ex-Gaussian distribution and from a Ratcliff Diffusion Model. Our results suggest that the impact of noise-induced variance tends to be limited to specific parameters and conditions. Although we encourage researchers to adopt all measures to reduce the impact of noise on reaction-time experiments, we conclude that the typical amount of noise-induced variance found in these experiments does not pose substantial problems for statistical analyses based on model fitting.
Impact of animal health programmes on poverty reduction and sustainable livestock development.
Pradere, J P
2017-04-01
Based on data from publications and field observations, this study analyses the interactions between animal health, rural poverty and the performance and environmental impact of livestock farming in low-income countries and middle-income countries. There are strong statistical correlations between the quality of Veterinary Services, livestock productivity and poverty rates. In countries with effective Veterinary Services, livestock growth stems mainly from productivity gains and poverty rates are the lowest. Conversely, these analyses identify no statistical link between the quality of Veterinary Services and increased livestock production volumes. However, where animal diseases are poorly controlled, productivity is low and livestock growth is extensive, based mainly on a steady increase in animal numbers. Extensive growth is less effective than intensive growth in reducing poverty and aggravates the pressure of livestock production on natural resources and the climate.
Walking through the statistical black boxes of plant breeding.
Xavier, Alencar; Muir, William M; Craig, Bruce; Rainey, Katy Martin
2016-10-01
The main statistical procedures in plant breeding are based on Gaussian process and can be computed through mixed linear models. Intelligent decision making relies on our ability to extract useful information from data to help us achieve our goals more efficiently. Many plant breeders and geneticists perform statistical analyses without understanding the underlying assumptions of the methods or their strengths and pitfalls. In other words, they treat these statistical methods (software and programs) like black boxes. Black boxes represent complex pieces of machinery with contents that are not fully understood by the user. The user sees the inputs and outputs without knowing how the outputs are generated. By providing a general background on statistical methodologies, this review aims (1) to introduce basic concepts of machine learning and its applications to plant breeding; (2) to link classical selection theory to current statistical approaches; (3) to show how to solve mixed models and extend their application to pedigree-based and genomic-based prediction; and (4) to clarify how the algorithms of genome-wide association studies work, including their assumptions and limitations.
Dark matter constraints from a joint analysis of dwarf Spheroidal galaxy observations with VERITAS
Archambault, S.; Archer, A.; Benbow, W.; ...
2017-04-05
We present constraints on the annihilation cross section of weakly interacting massive particles dark matter based on the joint statistical analysis of four dwarf galaxies with VERITAS. These results are derived from an optimized photon weighting statistical technique that improves on standard imaging atmospheric Cherenkov telescope (IACT) analyses by utilizing the spectral and spatial properties of individual photon events.
Parkhurst, R.S.; Christenson, S.C.
1987-01-01
Hydrochemical data were compiled into a data base as part of the Central Midwest Regional Aquifer System Analysis project. The data consist of chemical analyses of water samples collected from wells that are completed in formations of Mesozoic and Paleozoic age. The data base includes data from the National Water Data Storage and Retrieval System, the Petroleum Data System, the National Uranium Resource Evaluation, and selected publications. Chemical analyses were selected for inclusion within the hydrochemical data base if the total concentration of the cations differed from the total 10 percent or less of the total concentration of all ions. Those analyses which lacked the necessary data for an ionic balance were included if the ratios of dissolved-solids concentration to specific conductance were between 0.55 and 0.75. The tabulated chemical analyses, grouped by county, and a statistical summary of the analyses, listed by geologic unit, are presented.
Hu, Ting; Pan, Qinxin; Andrew, Angeline S; Langer, Jillian M; Cole, Michael D; Tomlinson, Craig R; Karagas, Margaret R; Moore, Jason H
2014-04-11
Several different genetic and environmental factors have been identified as independent risk factors for bladder cancer in population-based studies. Recent studies have turned to understanding the role of gene-gene and gene-environment interactions in determining risk. We previously developed the bioinformatics framework of statistical epistasis networks (SEN) to characterize the global structure of interacting genetic factors associated with a particular disease or clinical outcome. By applying SEN to a population-based study of bladder cancer among Caucasians in New Hampshire, we were able to identify a set of connected genetic factors with strong and significant interaction effects on bladder cancer susceptibility. To support our statistical findings using networks, in the present study, we performed pathway enrichment analyses on the set of genes identified using SEN, and found that they are associated with the carcinogen benzo[a]pyrene, a component of tobacco smoke. We further carried out an mRNA expression microarray experiment to validate statistical genetic interactions, and to determine if the set of genes identified in the SEN were differentially expressed in a normal bladder cell line and a bladder cancer cell line in the presence or absence of benzo[a]pyrene. Significant nonrandom sets of genes from the SEN were found to be differentially expressed in response to benzo[a]pyrene in both the normal bladder cells and the bladder cancer cells. In addition, the patterns of gene expression were significantly different between these two cell types. The enrichment analyses and the gene expression microarray results support the idea that SEN analysis of bladder in population-based studies is able to identify biologically meaningful statistical patterns. These results bring us a step closer to a systems genetic approach to understanding cancer susceptibility that integrates population and laboratory-based studies.
The sumLINK statistic for genetic linkage analysis in the presence of heterogeneity.
Christensen, G B; Knight, S; Camp, N J
2009-11-01
We present the "sumLINK" statistic--the sum of multipoint LOD scores for the subset of pedigrees with nominally significant linkage evidence at a given locus--as an alternative to common methods to identify susceptibility loci in the presence of heterogeneity. We also suggest the "sumLOD" statistic (the sum of positive multipoint LOD scores) as a companion to the sumLINK. sumLINK analysis identifies genetic regions of extreme consistency across pedigrees without regard to negative evidence from unlinked or uninformative pedigrees. Significance is determined by an innovative permutation procedure based on genome shuffling that randomizes linkage information across pedigrees. This procedure for generating the empirical null distribution may be useful for other linkage-based statistics as well. Using 500 genome-wide analyses of simulated null data, we show that the genome shuffling procedure results in the correct type 1 error rates for both the sumLINK and sumLOD. The power of the statistics was tested using 100 sets of simulated genome-wide data from the alternative hypothesis from GAW13. Finally, we illustrate the statistics in an analysis of 190 aggressive prostate cancer pedigrees from the International Consortium for Prostate Cancer Genetics, where we identified a new susceptibility locus. We propose that the sumLINK and sumLOD are ideal for collaborative projects and meta-analyses, as they do not require any sharing of identifiable data between contributing institutions. Further, loci identified with the sumLINK have good potential for gene localization via statistical recombinant mapping, as, by definition, several linked pedigrees contribute to each peak.
ERIC Educational Resources Information Center
Jensen, Matilde Bisballe; Utriainen, Tuuli Maria; Steinert, Martin
2018-01-01
This paper presents the experienced difficulties of students participating in the multidisciplinary, remote collaborating engineering design course challenge-based innovation at CERN. This is with the aim to identify learning barriers and improve future learning experiences. We statistically analyse the rated differences between distinct design…
Some Psychometric and Design Implications of Game-Based Learning Analytics
ERIC Educational Resources Information Center
Gibson, David; Clarke-Midura, Jody
2013-01-01
The rise of digital game and simulation-based learning applications has led to new approaches in educational measurement that take account of patterns in time, high resolution paths of action, and clusters of virtual performance artifacts. The new approaches, which depart from traditional statistical analyses, include data mining, machine…
P-MartCancer–Interactive Online Software to Enable Analysis of Shotgun Cancer Proteomic Datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Webb-Robertson, Bobbie-Jo M.; Bramer, Lisa M.; Jensen, Jeffrey L.
P-MartCancer is a new interactive web-based software environment that enables biomedical and biological scientists to perform in-depth analyses of global proteomics data without requiring direct interaction with the data or with statistical software. P-MartCancer offers a series of statistical modules associated with quality assessment, peptide and protein statistics, protein quantification and exploratory data analyses driven by the user via customized workflows and interactive visualization. Currently, P-MartCancer offers access to multiple cancer proteomic datasets generated through the Clinical Proteomics Tumor Analysis Consortium (CPTAC) at the peptide, gene and protein levels. P-MartCancer is deployed using Azure technologies (http://pmart.labworks.org/cptac.html), the web-service is alternativelymore » available via Docker Hub (https://hub.docker.com/r/pnnl/pmart-web/) and many statistical functions can be utilized directly from an R package available on GitHub (https://github.com/pmartR).« less
Quinn, Michael C J; Wilson, Daniel J; Young, Fiona; Dempsey, Adam A; Arcand, Suzanna L; Birch, Ashley H; Wojnarowicz, Paulina M; Provencher, Diane; Mes-Masson, Anne-Marie; Englert, David; Tonin, Patricia N
2009-07-06
As gene expression signatures may serve as biomarkers, there is a need to develop technologies based on mRNA expression patterns that are adaptable for translational research. Xceed Molecular has recently developed a Ziplex technology, that can assay for gene expression of a discrete number of genes as a focused array. The present study has evaluated the reproducibility of the Ziplex system as applied to ovarian cancer research of genes shown to exhibit distinct expression profiles initially assessed by Affymetrix GeneChip analyses. The new chemiluminescence-based Ziplex gene expression array technology was evaluated for the expression of 93 genes selected based on their Affymetrix GeneChip profiles as applied to ovarian cancer research. Probe design was based on the Affymetrix target sequence that favors the 3' UTR of transcripts in order to maximize reproducibility across platforms. Gene expression analysis was performed using the Ziplex Automated Workstation. Statistical analyses were performed to evaluate reproducibility of both the magnitude of expression and differences between normal and tumor samples by correlation analyses, fold change differences and statistical significance testing. Expressions of 82 of 93 (88.2%) genes were highly correlated (p < 0.01) in a comparison of the two platforms. Overall, 75 of 93 (80.6%) genes exhibited consistent results in normal versus tumor tissue comparisons for both platforms (p < 0.001). The fold change differences were concordant for 87 of 93 (94%) genes, where there was agreement between the platforms regarding statistical significance for 71 (76%) of 87 genes. There was a strong agreement between the two platforms as shown by comparisons of log2 fold differences of gene expression between tumor versus normal samples (R = 0.93) and by Bland-Altman analysis, where greater than 90% of expression values fell within the 95% limits of agreement. Overall concordance of gene expression patterns based on correlations, statistical significance between tumor and normal ovary data, and fold changes was consistent between the Ziplex and Affymetrix platforms. The reproducibility and ease-of-use of the technology suggests that the Ziplex array is a suitable platform for translational research.
Huh, Iksoo; Wu, Xin; Park, Taesung; Yi, Soojin V
2017-07-21
DNA methylation is one of the most extensively studied epigenetic modifications of genomic DNA. In recent years, sequencing of bisulfite-converted DNA, particularly via next-generation sequencing technologies, has become a widely popular method to study DNA methylation. This method can be readily applied to a variety of species, dramatically expanding the scope of DNA methylation studies beyond the traditionally studied human and mouse systems. In parallel to the increasing wealth of genomic methylation profiles, many statistical tools have been developed to detect differentially methylated loci (DMLs) or differentially methylated regions (DMRs) between biological conditions. We discuss and summarize several key properties of currently available tools to detect DMLs and DMRs from sequencing of bisulfite-converted DNA. However, the majority of the statistical tools developed for DML/DMR analyses have been validated using only mammalian data sets, and less priority has been placed on the analyses of invertebrate or plant DNA methylation data. We demonstrate that genomic methylation profiles of non-mammalian species are often highly distinct from those of mammalian species using examples of honey bees and humans. We then discuss how such differences in data properties may affect statistical analyses. Based on these differences, we provide three specific recommendations to improve the power and accuracy of DML and DMR analyses of invertebrate data when using currently available statistical tools. These considerations should facilitate systematic and robust analyses of DNA methylation from diverse species, thus advancing our understanding of DNA methylation. © The Author 2017. Published by Oxford University Press.
Formalizing the definition of meta-analysis in Molecular Ecology.
ArchMiller, Althea A; Bauer, Eric F; Koch, Rebecca E; Wijayawardena, Bhagya K; Anil, Ammu; Kottwitz, Jack J; Munsterman, Amelia S; Wilson, Alan E
2015-08-01
Meta-analysis, the statistical synthesis of pertinent literature to develop evidence-based conclusions, is relatively new to the field of molecular ecology, with the first meta-analysis published in the journal Molecular Ecology in 2003 (Slate & Phua 2003). The goal of this article is to formalize the definition of meta-analysis for the authors, editors, reviewers and readers of Molecular Ecology by completing a review of the meta-analyses previously published in this journal. We also provide a brief overview of the many components required for meta-analysis with a more specific discussion of the issues related to the field of molecular ecology, including the use and statistical considerations of Wright's FST and its related analogues as effect sizes in meta-analysis. We performed a literature review to identify articles published as 'meta-analyses' in Molecular Ecology, which were then evaluated by at least two reviewers. We specifically targeted Molecular Ecology publications because as a flagship journal in this field, meta-analyses published in Molecular Ecology have the potential to set the standard for meta-analyses in other journals. We found that while many of these reviewed articles were strong meta-analyses, others failed to follow standard meta-analytical techniques. One of these unsatisfactory meta-analyses was in fact a secondary analysis. Other studies attempted meta-analyses but lacked the fundamental statistics that are considered necessary for an effective and powerful meta-analysis. By drawing attention to the inconsistency of studies labelled as meta-analyses, we emphasize the importance of understanding the components of traditional meta-analyses to fully embrace the strengths of quantitative data synthesis in the field of molecular ecology. © 2015 John Wiley & Sons Ltd.
ERIC Educational Resources Information Center
Brandon, Paul R.; Harrison, George M.; Lawton, Brian E.
2013-01-01
When evaluators plan site-randomized experiments, they must conduct the appropriate statistical power analyses. These analyses are most likely to be valid when they are based on data from the jurisdictions in which the studies are to be conducted. In this method note, we provide software code, in the form of a SAS macro, for producing statistical…
Guyot, Patricia; Ades, A E; Ouwens, Mario J N M; Welton, Nicky J
2012-02-01
The results of Randomized Controlled Trials (RCTs) on time-to-event outcomes that are usually reported are median time to events and Cox Hazard Ratio. These do not constitute the sufficient statistics required for meta-analysis or cost-effectiveness analysis, and their use in secondary analyses requires strong assumptions that may not have been adequately tested. In order to enhance the quality of secondary data analyses, we propose a method which derives from the published Kaplan Meier survival curves a close approximation to the original individual patient time-to-event data from which they were generated. We develop an algorithm that maps from digitised curves back to KM data by finding numerical solutions to the inverted KM equations, using where available information on number of events and numbers at risk. The reproducibility and accuracy of survival probabilities, median survival times and hazard ratios based on reconstructed KM data was assessed by comparing published statistics (survival probabilities, medians and hazard ratios) with statistics based on repeated reconstructions by multiple observers. The validation exercise established there was no material systematic error and that there was a high degree of reproducibility for all statistics. Accuracy was excellent for survival probabilities and medians, for hazard ratios reasonable accuracy can only be obtained if at least numbers at risk or total number of events are reported. The algorithm is a reliable tool for meta-analysis and cost-effectiveness analyses of RCTs reporting time-to-event data. It is recommended that all RCTs should report information on numbers at risk and total number of events alongside KM curves.
SPARC Intercomparison of Middle Atmosphere Climatologies
NASA Technical Reports Server (NTRS)
Randel, William; Fleming, Eric; Geller, Marvin; Hamilton, Kevin; Karoly, David; Ortland, Dave; Pawson, Steve; Swinbank, Richard; Udelhofen, Petra
2002-01-01
This atlas presents detailed incomparisons of several climatological wind and temperature data sets which cover the middle atmosphere (over altitudes approx. 10-80 km). A number of middle atmosphere climatologies have been developed in the research community based on a variety of meteorological analyses and satellite data sets. Here we present comparisons between these climatological data sets for a number of basic circulation statistics, such as zonal mean temperature, winds and eddy flux statistics. Special attention is focused on tropical winds and temperatures, where large differences exist among separate analyses. We also include comparisons between the global climatologies and historical rocketsonde wind and temperature measurements, and also with more recent lidar temperature data. These comparisons highlight differences and uncertainties in contemporary middle atmosphere data sets, and allow biases in particular analyses to be isolated. In addition, a brief atlas of zonal mean temperature and wind statistics is provided to highlight data availability and as a quick-look reference. This technical report is intended as a companion to the climatological data sets held in archive at the SPARC Data Center (http://www.sparc.sunysb.edu).
permGPU: Using graphics processing units in RNA microarray association studies.
Shterev, Ivo D; Jung, Sin-Ho; George, Stephen L; Owzar, Kouros
2010-06-16
Many analyses of microarray association studies involve permutation, bootstrap resampling and cross-validation, that are ideally formulated as embarrassingly parallel computing problems. Given that these analyses are computationally intensive, scalable approaches that can take advantage of multi-core processor systems need to be developed. We have developed a CUDA based implementation, permGPU, that employs graphics processing units in microarray association studies. We illustrate the performance and applicability of permGPU within the context of permutation resampling for a number of test statistics. An extensive simulation study demonstrates a dramatic increase in performance when using permGPU on an NVIDIA GTX 280 card compared to an optimized C/C++ solution running on a conventional Linux server. permGPU is available as an open-source stand-alone application and as an extension package for the R statistical environment. It provides a dramatic increase in performance for permutation resampling analysis in the context of microarray association studies. The current version offers six test statistics for carrying out permutation resampling analyses for binary, quantitative and censored time-to-event traits.
Estimating population diversity with CatchAll
Bunge, John; Woodard, Linda; Böhning, Dankmar; Foster, James A.; Connolly, Sean; Allen, Heather K.
2012-01-01
Motivation: The massive data produced by next-generation sequencing require advanced statistical tools. We address estimating the total diversity or species richness in a population. To date, only relatively simple methods have been implemented in available software. There is a need for software employing modern, computationally intensive statistical analyses including error, goodness-of-fit and robustness assessments. Results: We present CatchAll, a fast, easy-to-use, platform-independent program that computes maximum likelihood estimates for finite-mixture models, weighted linear regression-based analyses and coverage-based non-parametric methods, along with outlier diagnostics. Given sample ‘frequency count’ data, CatchAll computes 12 different diversity estimates and applies a model-selection algorithm. CatchAll also derives discounted diversity estimates to adjust for possibly uncertain low-frequency counts. It is accompanied by an Excel-based graphics program. Availability: Free executable downloads for Linux, Windows and Mac OS, with manual and source code, at www.northeastern.edu/catchall. Contact: jab18@cornell.edu PMID:22333246
Point-by-point compositional analysis for atom probe tomography.
Stephenson, Leigh T; Ceguerra, Anna V; Li, Tong; Rojhirunsakool, Tanaporn; Nag, Soumya; Banerjee, Rajarshi; Cairney, Julie M; Ringer, Simon P
2014-01-01
This new alternate approach to data processing for analyses that traditionally employed grid-based counting methods is necessary because it removes a user-imposed coordinate system that not only limits an analysis but also may introduce errors. We have modified the widely used "binomial" analysis for APT data by replacing grid-based counting with coordinate-independent nearest neighbour identification, improving the measurements and the statistics obtained, allowing quantitative analysis of smaller datasets, and datasets from non-dilute solid solutions. It also allows better visualisation of compositional fluctuations in the data. Our modifications include:.•using spherical k-atom blocks identified by each detected atom's first k nearest neighbours.•3D data visualisation of block composition and nearest neighbour anisotropy.•using z-statistics to directly compare experimental and expected composition curves. Similar modifications may be made to other grid-based counting analyses (contingency table, Langer-Bar-on-Miller, sinusoidal model) and could be instrumental in developing novel data visualisation options.
Psychophysiological investigations of the biomedical problems of manned spaceflight
NASA Technical Reports Server (NTRS)
1993-01-01
The Final Report on psychophysiological investigations of the biomedical problems of manned spaceflight is presented. In the first project, statistical analyses of human autonomic data were performed. The objectives were the following: to establish a relational data base containing human psychophysiological data obtained from Shuttle flight experiments and ground-based research over a 20 year period; to enable multi-user access and retrieval of these data for subsequent analyses and for possible inclusion in the proposed Life Sciences Data Archive; and to enable/conduct statistical analyses across several experiments on large subject populations which can thereby provide definitive answers to questions on human autonomic and behavioral responses and adaptation to environmental stressors on Earth and in space. The second project studied motion sickness. The objectives were: to test/develop hardware and procedures to be incorporated into preflight training of crewmembers of Spacelab-J; and to examine spin-off applications of AFT. The third project studied orthostatic intolerance. The objective was to test the feasibility of applying autogenic-feedback training as a potential treatment for postflight orthostatic intolerance.
A weighted U-statistic for genetic association analyses of sequencing data.
Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J; Lu, Qing
2014-12-01
With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL 4 and very low density lipoprotein cholesterol. © 2014 WILEY PERIODICALS, INC.
Metamodels for Computer-Based Engineering Design: Survey and Recommendations
NASA Technical Reports Server (NTRS)
Simpson, Timothy W.; Peplinski, Jesse; Koch, Patrick N.; Allen, Janet K.
1997-01-01
The use of statistical techniques to build approximations of expensive computer analysis codes pervades much of todays engineering design. These statistical approximations, or metamodels, are used to replace the actual expensive computer analyses, facilitating multidisciplinary, multiobjective optimization and concept exploration. In this paper we review several of these techniques including design of experiments, response surface methodology, Taguchi methods, neural networks, inductive learning, and kriging. We survey their existing application in engineering design and then address the dangers of applying traditional statistical techniques to approximate deterministic computer analysis codes. We conclude with recommendations for the appropriate use of statistical approximation techniques in given situations and how common pitfalls can be avoided.
Parent and Friend Social Support and Adolescent Hope.
Mahon, Noreen E; Yarcheski, Adela
2017-04-01
The purpose of this study was to conduct two meta-analyses. The first examined social support from parents in relation to adolescent hope, and the second examined social support from friends in relation to adolescent hope. Using Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines for the literature reviewed, nine published studies or doctoral dissertations completed between 1990 and 2014 met the inclusion criteria. Using meta-analytic techniques and the mean weighted r statistic, the results indicated that social support from friends had a stronger mean effect size (ES = .31) than social support from parents (ES = .21); there was a statistically significant difference between the two ESs. Two of the four moderators for the parent social support-adolescent hope relationship were statistically significant. They were quality score and health status. Implications for school nurses and nurses in all settings are addressed, and conclusions are drawn based on the findings.
Performance of Between-Study Heterogeneity Measures in the Cochrane Library.
Ma, Xiaoyue; Lin, Lifeng; Qu, Zhiyong; Zhu, Motao; Chu, Haitao
2018-05-29
The growth in comparative effectiveness research and evidence-based medicine has increased attention to systematic reviews and meta-analyses. Meta-analysis synthesizes and contrasts evidence from multiple independent studies to improve statistical efficiency and reduce bias. Assessing heterogeneity is critical for performing a meta-analysis and interpreting results. As a widely used heterogeneity measure, the I statistic quantifies the proportion of total variation across studies that is due to real differences in effect size. The presence of outlying studies can seriously exaggerate the I statistic. Two alternative heterogeneity measures, the Ir and Im, have been recently proposed to reduce the impact of outlying studies. To evaluate these measures' performance empirically, we applied them to 20,599 meta-analyses in the Cochrane Library. We found that the Ir and Im have strong agreement with the I, while they are more robust than the I when outlying studies appear.
Meta-analysis in applied ecology.
Stewart, Gavin
2010-02-23
This overview examines research synthesis in applied ecology and conservation. Vote counting and pooling unweighted averages are widespread despite the superiority of syntheses based on weighted combination of effects. Such analyses allow exploration of methodological uncertainty in addition to consistency of effects across species, space and time, but exploring heterogeneity remains controversial. Meta-analyses are required to generalize in ecology, and to inform evidence-based decision-making, but the more sophisticated statistical techniques and registers of research used in other disciplines must be employed in ecology to fully realize their benefits.
Trends in study design and the statistical methods employed in a leading general medicine journal.
Gosho, M; Sato, Y; Nagashima, K; Takahashi, S
2018-02-01
Study design and statistical methods have become core components of medical research, and the methodology has become more multifaceted and complicated over time. The study of the comprehensive details and current trends of study design and statistical methods is required to support the future implementation of well-planned clinical studies providing information about evidence-based medicine. Our purpose was to illustrate study design and statistical methods employed in recent medical literature. This was an extension study of Sato et al. (N Engl J Med 2017; 376: 1086-1087), which reviewed 238 articles published in 2015 in the New England Journal of Medicine (NEJM) and briefly summarized the statistical methods employed in NEJM. Using the same database, we performed a new investigation of the detailed trends in study design and individual statistical methods that were not reported in the Sato study. Due to the CONSORT statement, prespecification and justification of sample size are obligatory in planning intervention studies. Although standard survival methods (eg Kaplan-Meier estimator and Cox regression model) were most frequently applied, the Gray test and Fine-Gray proportional hazard model for considering competing risks were sometimes used for a more valid statistical inference. With respect to handling missing data, model-based methods, which are valid for missing-at-random data, were more frequently used than single imputation methods. These methods are not recommended as a primary analysis, but they have been applied in many clinical trials. Group sequential design with interim analyses was one of the standard designs, and novel design, such as adaptive dose selection and sample size re-estimation, was sometimes employed in NEJM. Model-based approaches for handling missing data should replace single imputation methods for primary analysis in the light of the information found in some publications. Use of adaptive design with interim analyses is increasing after the presentation of the FDA guidance for adaptive design. © 2017 John Wiley & Sons Ltd.
Grey literature in meta-analyses.
Conn, Vicki S; Valentine, Jeffrey C; Cooper, Harris M; Rantz, Marilyn J
2003-01-01
In meta-analysis, researchers combine the results of individual studies to arrive at cumulative conclusions. Meta-analysts sometimes include "grey literature" in their evidential base, which includes unpublished studies and studies published outside widely available journals. Because grey literature is a source of data that might not employ peer review, critics have questioned the validity of its data and the results of meta-analyses that include it. To examine evidence regarding whether grey literature should be included in meta-analyses and strategies to manage grey literature in quantitative synthesis. This article reviews evidence on whether the results of studies published in peer-reviewed journals are representative of results from broader samplings of research on a topic as a rationale for inclusion of grey literature. Strategies to enhance access to grey literature are addressed. The most consistent and robust difference between published and grey literature is that published research is more likely to contain results that are statistically significant. Effect size estimates of published research are about one-third larger than those of unpublished studies. Unfunded and small sample studies are less likely to be published. Yet, importantly, methodological rigor does not differ between published and grey literature. Meta-analyses that exclude grey literature likely (a) over-represent studies with statistically significant findings, (b) inflate effect size estimates, and (c) provide less precise effect size estimates than meta-analyses including grey literature. Meta-analyses should include grey literature to fully reflect the existing evidential base and should assess the impact of methodological variations through moderator analysis.
Fast and accurate imputation of summary statistics enhances evidence of functional enrichment
Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P.; Patterson, Nick; Price, Alkes L.
2014-01-01
Motivation: Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. Results: In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1–5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case–control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of χ2 association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Availability and implementation: Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. Contact: bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:24990607
Olimpo, Jeffrey T.; Pevey, Ryan S.; McCabe, Thomas M.
2018-01-01
Course-based undergraduate research experiences (CUREs) provide an avenue for student participation in authentic scientific opportunities. Within the context of such coursework, students are often expected to collect, analyze, and evaluate data obtained from their own investigations. Yet, limited research has been conducted that examines mechanisms for supporting students in these endeavors. In this article, we discuss the development and evaluation of an interactive statistics workshop that was expressly designed to provide students with an open platform for graduate teaching assistant (GTA)-mentored data processing, statistical testing, and synthesis of their own research findings. Mixed methods analyses of pre/post-intervention survey data indicated a statistically significant increase in students’ reasoning and quantitative literacy abilities in the domain, as well as enhancement of student self-reported confidence in and knowledge of the application of various statistical metrics to real-world contexts. Collectively, these data reify an important role for scaffolded instruction in statistics in preparing emergent scientists to be data-savvy researchers in a globally expansive STEM workforce. PMID:29904549
Olimpo, Jeffrey T; Pevey, Ryan S; McCabe, Thomas M
2018-01-01
Course-based undergraduate research experiences (CUREs) provide an avenue for student participation in authentic scientific opportunities. Within the context of such coursework, students are often expected to collect, analyze, and evaluate data obtained from their own investigations. Yet, limited research has been conducted that examines mechanisms for supporting students in these endeavors. In this article, we discuss the development and evaluation of an interactive statistics workshop that was expressly designed to provide students with an open platform for graduate teaching assistant (GTA)-mentored data processing, statistical testing, and synthesis of their own research findings. Mixed methods analyses of pre/post-intervention survey data indicated a statistically significant increase in students' reasoning and quantitative literacy abilities in the domain, as well as enhancement of student self-reported confidence in and knowledge of the application of various statistical metrics to real-world contexts. Collectively, these data reify an important role for scaffolded instruction in statistics in preparing emergent scientists to be data-savvy researchers in a globally expansive STEM workforce.
Empirical performance of interpolation techniques in risk-neutral density (RND) estimation
NASA Astrophysics Data System (ADS)
Bahaludin, H.; Abdullah, M. H.
2017-03-01
The objective of this study is to evaluate the empirical performance of interpolation techniques in risk-neutral density (RND) estimation. Firstly, the empirical performance is evaluated by using statistical analysis based on the implied mean and the implied variance of RND. Secondly, the interpolation performance is measured based on pricing error. We propose using the leave-one-out cross-validation (LOOCV) pricing error for interpolation selection purposes. The statistical analyses indicate that there are statistical differences between the interpolation techniques:second-order polynomial, fourth-order polynomial and smoothing spline. The results of LOOCV pricing error shows that interpolation by using fourth-order polynomial provides the best fitting to option prices in which it has the lowest value error.
ParallABEL: an R library for generalized parallelization of genome-wide association studies.
Sangket, Unitsa; Mahasirimongkol, Surakameth; Chantratita, Wasun; Tandayya, Pichaya; Aulchenko, Yurii S
2010-04-29
Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those involving thousands of individuals and consuming hours to months, will benefit from parallel computation. It is arduous acquiring the necessary programming skills to correctly partition and distribute data, control and monitor tasks on clustered computers, and merge output files. Most components of GWA analysis can be divided into four groups based on the types of input data and statistical outputs. The first group contains statistics computed for a particular Single Nucleotide Polymorphism (SNP), or trait, such as SNP characterization statistics or association test statistics. The input data of this group includes the SNPs/traits. The second group concerns statistics characterizing an individual in a study, for example, the summary statistics of genotype quality for each sample. The input data of this group includes individuals. The third group consists of pair-wise statistics derived from analyses between each pair of individuals in the study, for example genome-wide identity-by-state or genomic kinship analyses. The input data of this group includes pairs of SNPs/traits. The final group concerns pair-wise statistics derived for pairs of SNPs, such as the linkage disequilibrium characterisation. The input data of this group includes pairs of individuals. We developed the ParallABEL library, which utilizes the Rmpi library, to parallelize these four types of computations. ParallABEL library is not only aimed at GenABEL, but may also be employed to parallelize various GWA packages in R. The data set from the North American Rheumatoid Arthritis Consortium (NARAC) includes 2,062 individuals with 545,080, SNPs' genotyping, was used to measure ParallABEL performance. Almost perfect speed-up was achieved for many types of analyses. For example, the computing time for the identity-by-state matrix was linearly reduced from approximately eight hours to one hour when ParallABEL employed eight processors. Executing genome-wide association analysis using the ParallABEL library on a computer cluster is an effective way to boost performance, and simplify the parallelization of GWA studies. ParallABEL is a user-friendly parallelization of GenABEL.
ERIC Educational Resources Information Center
Scarr, Sandra
1995-01-01
Argues that Gottlieb rejects population sampling and statistical analyses of distributions as he proposes that his experimental brand of mechanistic science is the only legitimate approach to developmental research. Maintains that Gottlieb exaggerates developmental uncertainty, based on his own research with extreme environmental manipulations.…
The Development of Statistics Textbook Supported with ICT and Portfolio-Based Assessment
NASA Astrophysics Data System (ADS)
Hendikawati, Putriaji; Yuni Arini, Florentina
2016-02-01
This research was development research that aimed to develop and produce a Statistics textbook model that supported with information and communication technology (ICT) and Portfolio-Based Assessment. This book was designed for students of mathematics at the college to improve students’ ability in mathematical connection and communication. There were three stages in this research i.e. define, design, and develop. The textbooks consisted of 10 chapters which each chapter contains introduction, core materials and include examples and exercises. The textbook developed phase begins with the early stages of designed the book (draft 1) which then validated by experts. Revision of draft 1 produced draft 2 which then limited test for readability test book. Furthermore, revision of draft 2 produced textbook draft 3 which simulated on a small sample to produce a valid model textbook. The data were analysed with descriptive statistics. The analysis showed that the Statistics textbook model that supported with ICT and Portfolio-Based Assessment valid and fill up the criteria of practicality.
Brusniak, Mi-Youn; Bodenmiller, Bernd; Campbell, David; Cooke, Kelly; Eddes, James; Garbutt, Andrew; Lau, Hollis; Letarte, Simon; Mueller, Lukas N; Sharma, Vagisha; Vitek, Olga; Zhang, Ning; Aebersold, Ruedi; Watts, Julian D
2008-01-01
Background Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, they are generally not comparable to each other in terms of functionality, user interfaces, information input/output, and do not readily facilitate appropriate statistical data analysis. These limitations, along with the array of choices, present a daunting prospect for biologists, and other researchers not trained in bioinformatics, who wish to use LC-MS-based quantitative proteomics. Results We have developed Corra, a computational framework and tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms used for LC-MS-based proteomics, and statistical algorithms, originally developed for microarray data analyses, appropriate for LC-MS data analysis. Corra also adapts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intense data processing and statistical analyses can run on a remote server, while the user controls and manages the process from their own computer via a simple web interface. Corra also allows the user to output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification via tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling. Conclusion The Corra computational framework leverages computational innovation to enable biologists or other researchers to process, analyze and visualize LC-MS data with what would otherwise be a complex and not user-friendly suite of tools. Corra enables appropriate statistical analyses, with controlled false-discovery rates, ultimately to inform subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open source computational platform enabling LC-MS-based proteomic workflows, and as such, addresses an unmet need in the LC-MS proteomics field. PMID:19087345
Assessing the significance of pedobarographic signals using random field theory.
Pataky, Todd C
2008-08-07
Traditional pedobarographic statistical analyses are conducted over discrete regions. Recent studies have demonstrated that regionalization can corrupt pedobarographic field data through conflation when arbitrary dividing lines inappropriately delineate smooth field processes. An alternative is to register images such that homologous structures optimally overlap and then conduct statistical tests at each pixel to generate statistical parametric maps (SPMs). The significance of SPM processes may be assessed within the framework of random field theory (RFT). RFT is ideally suited to pedobarographic image analysis because its fundamental data unit is a lattice sampling of a smooth and continuous spatial field. To correct for the vast number of multiple comparisons inherent in such data, recent pedobarographic studies have employed a Bonferroni correction to retain a constant family-wise error rate. This approach unfortunately neglects the spatial correlation of neighbouring pixels, so provides an overly conservative (albeit valid) statistical threshold. RFT generally relaxes the threshold depending on field smoothness and on the geometry of the search area, but it also provides a framework for assigning p values to suprathreshold clusters based on their spatial extent. The current paper provides an overview of basic RFT concepts and uses simulated and experimental data to validate both RFT-relevant field smoothness estimations and RFT predictions regarding the topological characteristics of random pedobarographic fields. Finally, previously published experimental data are re-analysed using RFT inference procedures to demonstrate how RFT yields easily understandable statistical results that may be incorporated into routine clinical and laboratory analyses.
2015-01-01
Changes in glycosylation have been shown to have a profound correlation with development/malignancy in many cancer types. Currently, two major enrichment techniques have been widely applied in glycoproteomics, namely, lectin affinity chromatography (LAC)-based and hydrazide chemistry (HC)-based enrichments. Here we report the LC–MS/MS quantitative analyses of human blood serum glycoproteins and glycopeptides associated with esophageal diseases by LAC- and HC-based enrichment. The separate and complementary qualitative and quantitative data analyses of protein glycosylation were performed using both enrichment techniques. Chemometric and statistical evaluations, PCA plots, or ANOVA test, respectively, were employed to determine and confirm candidate cancer-associated glycoprotein/glycopeptide biomarkers. Out of 139, 59 common glycoproteins (42% overlap) were observed in both enrichment techniques. This overlap is very similar to previously published studies. The quantitation and evaluation of significantly changed glycoproteins/glycopeptides are complementary between LAC and HC enrichments. LC–ESI–MS/MS analyses indicated that 7 glycoproteins enriched by LAC and 11 glycoproteins enriched by HC showed significantly different abundances between disease-free and disease cohorts. Multiple reaction monitoring quantitation resulted in 13 glycopeptides by LAC enrichment and 10 glycosylation sites by HC enrichment to be statistically different among disease cohorts. PMID:25134008
Do regional methods really help reduce uncertainties in flood frequency analyses?
NASA Astrophysics Data System (ADS)
Cong Nguyen, Chi; Payrastre, Olivier; Gaume, Eric
2013-04-01
Flood frequency analyses are often based on continuous measured series at gauge sites. However, the length of the available data sets is usually too short to provide reliable estimates of extreme design floods. To reduce the estimation uncertainties, the analyzed data sets have to be extended either in time, making use of historical and paleoflood data, or in space, merging data sets considered as statistically homogeneous to build large regional data samples. Nevertheless, the advantage of the regional analyses, the important increase of the size of the studied data sets, may be counterbalanced by the possible heterogeneities of the merged sets. The application and comparison of four different flood frequency analysis methods to two regions affected by flash floods in the south of France (Ardèche and Var) illustrates how this balance between the number of records and possible heterogeneities plays in real-world applications. The four tested methods are: (1) a local statistical analysis based on the existing series of measured discharges, (2) a local analysis valuating the existing information on historical floods, (3) a standard regional flood frequency analysis based on existing measured series at gauged sites and (4) a modified regional analysis including estimated extreme peak discharges at ungauged sites. Monte Carlo simulations are conducted to simulate a large number of discharge series with characteristics similar to the observed ones (type of statistical distributions, number of sites and records) to evaluate to which extent the results obtained on these case studies can be generalized. These two case studies indicate that even small statistical heterogeneities, which are not detected by the standard homogeneity tests implemented in regional flood frequency studies, may drastically limit the usefulness of such approaches. On the other hand, these result show that the valuation of information on extreme events, either historical flood events at gauged sites or estimated extremes at ungauged sites in the considered region, is an efficient way to reduce uncertainties in flood frequency studies.
McLawhorn, Alexander S; Steinhaus, Michael E; Southren, Daniel L; Lee, Yuo-Yu; Dodwell, Emily R; Figgie, Mark P
2017-01-01
The purpose of this study was to compare the health-related quality of life (HRQoL) of patients across World Health Organization (WHO) body mass index (BMI) classes before and after total hip arthroplasty (THA). Patients with end-stage hip osteoarthritis who received elective primary unilateral THA were identified through an institutional registry and categorized based on the World Health Organization BMI classification. Age, sex, laterality, year of surgery, and Charlson-Deyo comorbidity index were recorded. The primary outcome was the EQ-5D-3L index and visual analog scale (EQ-VAS) scores at 2 years postoperatively. Inferential statistics and regression analyses were performed to determine associations between BMI classes and HRQoL. EQ-5D-3L scores at baseline and at 2 years were statistically different across BMI classes, with higher EQ-VAS and index scores in patients with lower BMI. There was no difference observed for the 2-year change in EQ-VAS scores, but there was a statistically greater increase in index scores for more obese patients. In the regression analyses, there were statistically significant negative effect estimates for EQ-VAS and index scores associated with increasing BMI class. BMI class is independently associated with lower HRQoL scores 2 years after primary THA. While absolute scores in obese patients were lower than in nonobese patients, obese patients enjoyed more positive changes in EQ-5D index scores after THA. These results may provide the most detailed information on how BMI influences HRQoL before and after THA, and they are relevant to future economic decision analyses on the topic. Copyright © 2016 Elsevier Inc. All rights reserved.
Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J
2008-01-01
ABSTRACT The ideal toxicity biomarker is composed of the properties of prediction (is detected prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and mechanistic relationships to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses: t-statistics, fold-change, B-statistics, and RankProd, and their combination and overlap for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, Hotelling T-square test, and, finally out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps-sets of genes coordinately involved in key biological processes-with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or Hotelling T-square had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity. Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of predictive genomic investigations.
Statistical methods for convergence detection of multi-objective evolutionary algorithms.
Trautmann, H; Wagner, T; Naujoks, B; Preuss, M; Mehnen, J
2009-01-01
In this paper, two approaches for estimating the generation in which a multi-objective evolutionary algorithm (MOEA) shows statistically significant signs of convergence are introduced. A set-based perspective is taken where convergence is measured by performance indicators. The proposed techniques fulfill the requirements of proper statistical assessment on the one hand and efficient optimisation for real-world problems on the other hand. The first approach accounts for the stochastic nature of the MOEA by repeating the optimisation runs for increasing generation numbers and analysing the performance indicators using statistical tools. This technique results in a very robust offline procedure. Moreover, an online convergence detection method is introduced as well. This method automatically stops the MOEA when either the variance of the performance indicators falls below a specified threshold or a stagnation of their overall trend is detected. Both methods are analysed and compared for two MOEA and on different classes of benchmark functions. It is shown that the methods successfully operate on all stated problems needing less function evaluations while preserving good approximation quality at the same time.
Mali, Matilda; Dell'Anna, Maria Michela; Mastrorilli, Piero; Damiani, Leonardo; Ungaro, Nicola; Belviso, Claudia; Fiore, Saverio
2015-11-01
Sediment contamination by metals poses significant risks to coastal ecosystems and is considered to be problematic for dredging operations. The determination of the background values of metal and metalloid distribution based on site-specific variability is fundamental in assessing pollution levels in harbour sediments. The novelty of the present work consists of addressing the scope and limitation of analysing port sediments through the use of conventional statistical techniques (such as: linear regression analysis, construction of cumulative frequency curves and the iterative 2σ technique), that are commonly employed for assessing Regional Geochemical Background (RGB) values in coastal sediments. This study ascertained that although the tout court use of such techniques in determining the RGB values in harbour sediments seems appropriate (the chemical-physical parameters of port sediments fit well with statistical equations), it should nevertheless be avoided because it may be misleading and can mask key aspects of the study area that can only be revealed by further investigations, such as mineralogical and multivariate statistical analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.
Wess, Bernard P.; Jacobson, Gary
1987-01-01
In the process of forming a new medical malpractice reinsurance company, the authors analyzed thousands of medical malpractice cases, settlements, and verdicts. The evidence of those analyses indicated that the medical malpractice crisis is (1)emerging nation- and world-wide, (2)exacerbated by but not primarily a result of “predatory” legal action, (3)statistically determined by a small percentage of physicians and procedures, (4)overburdened with data but poor on information, (5)subject to classic forms of quality control and automation. The management information system developed to address this problem features a tiered data base architecture to accommodate medical, administrative, procedural, statistical, and actuarial analyses necessary to predict claims from untoward events, not merely to report them.
"What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"
ERIC Educational Resources Information Center
Ozturk, Elif
2012-01-01
The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…
Lyksborg, Mark; Siebner, Hartwig R.; Sørensen, Per S.; Blinkenberg, Morten; Parker, Geoff J. M.; Dogonowski, Anne-Marie; Garde, Ellen; Larsen, Rasmus; Dyrby, Tim B.
2014-01-01
Multiple sclerosis (MS) damages central white matter pathways which has considerable impact on disease-related disability. To identify disease-related alterations in anatomical connectivity, 34 patients (19 with relapsing remitting MS (RR-MS), 15 with secondary progressive MS (SP-MS) and 20 healthy subjects underwent diffusion magnetic resonance imaging (dMRI) of the brain. Based on the dMRI, anatomical connectivity mapping (ACM) yielded a voxel-based metric reflecting the connectivity shared between each individual voxel and all other brain voxels. To avoid biases caused by inter-individual brain-shape differences, they were estimated in a spatially normalized space. Voxel-based statistical analyses using ACM were compared with analyses based on the localized microstructural indices of fractional anisotropy (FA). In both RR-MS and SP-MS patients, considerable portions of the motor-related white matter revealed decreases in ACM and FA when compared with healthy subjects. Patients with SP-MS exhibited reduced ACM values relative to RR-MS in the motor-related tracts, whereas there were no consistent decreases in FA between SP-MS and RR-MS patients. Regional ACM statistics exhibited moderate correlation with clinical disability as reflected by the expanded disability status scale (EDSS). The correlation between these statistics and EDSS was either similar to or stronger than the correlation between FA statistics and the EDSS. Together, the results reveal an improved relationship between ACM, the clinical phenotype, and impairment. This highlights the potential of the ACM connectivity indices to be used as a marker which can identify disease related-alterations due to MS which may not be seen using localized microstructural indices. PMID:24748023
Kanda, Junya
2016-01-01
The Transplant Registry Unified Management Program (TRUMP) made it possible for members of the Japan Society for Hematopoietic Cell Transplantation (JSHCT) to analyze large sets of national registry data on autologous and allogeneic hematopoietic stem cell transplantation. However, as the processes used to collect transplantation information are complex and differed over time, the background of these processes should be understood when using TRUMP data. Previously, information on the HLA locus of patients and donors had been collected using a questionnaire-based free-description method, resulting in some input errors. To correct minor but significant errors and provide accurate HLA matching data, the use of a Stata or EZR/R script offered by the JSHCT is strongly recommended when analyzing HLA data in the TRUMP dataset. The HLA mismatch direction, mismatch counting method, and different impacts of HLA mismatches by stem cell source are other important factors in the analysis of HLA data. Additionally, researchers should understand the statistical analyses specific for hematopoietic stem cell transplantation, such as competing risk, landmark analysis, and time-dependent analysis, to correctly analyze transplant data. The data center of the JSHCT can be contacted if statistical assistance is required.
Shitara, Kohei; Matsuo, Keitaro; Oze, Isao; Mizota, Ayako; Kondo, Chihiro; Nomura, Motoo; Yokota, Tomoya; Takahari, Daisuke; Ura, Takashi; Muro, Kei
2011-08-01
We performed a systematic review and meta-analysis to determine the impact of neutropenia or leukopenia experienced during chemotherapy on survival. Eligible studies included prospective or retrospective analyses that evaluated neutropenia or leukopenia as a prognostic factor for overall survival or disease-free survival. Statistical analyses were conducted to calculate a summary hazard ratio and 95% confidence interval (CI) using random-effects or fixed-effects models based on the heterogeneity of the included studies. Thirteen trials were selected for the meta-analysis, with a total of 9,528 patients. The hazard ratio of death was 0.69 (95% CI, 0.64-0.75) for patients with higher-grade neutropenia or leukopenia compared to patients with lower-grade or lack of cytopenia. Our analysis was also stratified by statistical method (any statistical method to decrease lead-time bias; time-varying analysis or landmark analysis), but no differences were observed. Our results indicate that neutropenia or leukopenia experienced during chemotherapy is associated with improved survival in patients with advanced cancer or hematological malignancies undergoing chemotherapy. Future prospective analyses designed to investigate the potential impact of chemotherapy dose adjustment coupled with monitoring of neutropenia or leukopenia on survival are warranted.
NASA Astrophysics Data System (ADS)
Pignalosa, Antonio; Di Crescenzo, Giuseppe; Marino, Ermanno; Terracciano, Rosario; Santo, Antonio
2015-04-01
The work here presented concerns a case study in which a complete multidisciplinary workflow has been applied for an extensive assessment of the rockslide susceptibility and hazard in a common scenario such as a vertical and fractured rocky cliffs. The studied area is located in a high-relief zone in Southern Italy (Sacco, Salerno, Campania), characterized by wide vertical rocky cliffs formed by tectonized thick successions of shallow-water limestones. The study concerned the following phases: a) topographic surveying integrating of 3d laser scanning, photogrammetry and GNSS; b) gelogical surveying, characterization of single instabilities and geomecanichal surveying, conducted by geologists rock climbers; c) processing of 3d data and reconstruction of high resolution geometrical models; d) structural and geomechanical analyses; e) data filing in a GIS-based spatial database; f) geo-statistical and spatial analyses and mapping of the whole set of data; g) 3D rockfall analysis; The main goals of the study have been a) to set-up an investigation method to achieve a complete and thorough characterization of the slope stability conditions and b) to provide a detailed base for an accurate definition of the reinforcement and mitigation systems. For this purposes the most up-to-date methods of field surveying, remote sensing, 3d modelling and geospatial data analysis have been integrated in a systematic workflow, accounting of the economic sustainability of the whole project. A novel integrated approach have been applied both fusing deterministic and statistical surveying methods. This approach enabled to deal with the wide extension of the studied area (near to 200.000 m2), without compromising an high accuracy of the results. The deterministic phase, based on a field characterization of single instabilities and their further analyses on 3d models, has been applied for delineating the peculiarity of each single feature. The statistical approach, based on geostructural field mapping and on punctual geomechanical data from scan-line surveying, allowed the rock mass partitioning in homogeneous geomechanical sectors and data interpolation through bounded geostatistical analyses on 3d models. All data, resulting from both approaches, have been referenced and filed in a single spatial database and considered in global geo-statistical analyses for deriving a fully modelled and comprehensive evaluation of the rockslide susceptibility. The described workflow yielded the following innovative results: a) a detailed census of single potential instabilities, through a spatial database recording the geometrical, geological and mechanical features, along with the expected failure modes; b) an high resolution characterization of the whole slope rockslide susceptibility, based on the partitioning of the area according to the stability and mechanical conditions which can be directly related to specific hazard mitigation systems; c) the exact extension of the area exposed to the rockslide hazard, along with the dynamic parameters of expected phenomena; d) an intervention design for hazard mitigation.
Does RAIM with Correct Exclusion Produce Unbiased Positions?
Teunissen, Peter J. G.; Imparato, Davide; Tiberius, Christian C. J. M.
2017-01-01
As the navigation solution of exclusion-based RAIM follows from a combination of least-squares estimation and a statistically based exclusion-process, the computation of the integrity of the navigation solution has to take the propagated uncertainty of the combined estimation-testing procedure into account. In this contribution, we analyse, theoretically as well as empirically, the effect that this combination has on the first statistical moment, i.e., the mean, of the computed navigation solution. It will be shown, although statistical testing is intended to remove biases from the data, that biases will always remain under the alternative hypothesis, even when the correct alternative hypothesis is properly identified. The a posteriori exclusion of a biased satellite range from the position solution will therefore never remove the bias in the position solution completely. PMID:28672862
Maternal Involvement and Academic Achievement.
ERIC Educational Resources Information Center
Lopez, Linda C.; Holmes, William M.
The potential impact of several maternal involvement behaviors on teachers' ratings of children's academic skills was examined through statistical analyses. Data, based on mothers' responses to selected questions concerning maternal involvement and on teachers' ratings on the Classroom Behavior Inventory, were obtained for 115 kindergarten…
Nieuwenhuys, Angela; Papageorgiou, Eirini; Desloovere, Kaat; Molenaers, Guy; De Laet, Tinne
2017-01-01
Experts recently identified 49 joint motion patterns in children with cerebral palsy during a Delphi consensus study. Pattern definitions were therefore the result of subjective expert opinion. The present study aims to provide objective, quantitative data supporting the identification of these consensus-based patterns. To do so, statistical parametric mapping was used to compare the mean kinematic waveforms of 154 trials of typically developing children (n = 56) to the mean kinematic waveforms of 1719 trials of children with cerebral palsy (n = 356), which were classified following the classification rules of the Delphi study. Three hypotheses stated that: (a) joint motion patterns with 'no or minor gait deviations' (n = 11 patterns) do not differ significantly from the gait pattern of typically developing children; (b) all other pathological joint motion patterns (n = 38 patterns) differ from typically developing gait and the locations of difference within the gait cycle, highlighted by statistical parametric mapping, concur with the consensus-based classification rules. (c) all joint motion patterns at the level of each joint (n = 49 patterns) differ from each other during at least one phase of the gait cycle. Results showed that: (a) ten patterns with 'no or minor gait deviations' differed somewhat unexpectedly from typically developing gait, but these differences were generally small (≤3°); (b) all other joint motion patterns (n = 38) differed from typically developing gait and the significant locations within the gait cycle that were indicated by the statistical analyses, coincided well with the classification rules; (c) joint motion patterns at the level of each joint significantly differed from each other, apart from two sagittal plane pelvic patterns. In addition to these results, for several joints, statistical analyses indicated other significant areas during the gait cycle that were not included in the pattern definitions of the consensus study. Based on these findings, suggestions to improve pattern definitions were made.
Papageorgiou, Eirini; Desloovere, Kaat; Molenaers, Guy; De Laet, Tinne
2017-01-01
Experts recently identified 49 joint motion patterns in children with cerebral palsy during a Delphi consensus study. Pattern definitions were therefore the result of subjective expert opinion. The present study aims to provide objective, quantitative data supporting the identification of these consensus-based patterns. To do so, statistical parametric mapping was used to compare the mean kinematic waveforms of 154 trials of typically developing children (n = 56) to the mean kinematic waveforms of 1719 trials of children with cerebral palsy (n = 356), which were classified following the classification rules of the Delphi study. Three hypotheses stated that: (a) joint motion patterns with ‘no or minor gait deviations’ (n = 11 patterns) do not differ significantly from the gait pattern of typically developing children; (b) all other pathological joint motion patterns (n = 38 patterns) differ from typically developing gait and the locations of difference within the gait cycle, highlighted by statistical parametric mapping, concur with the consensus-based classification rules. (c) all joint motion patterns at the level of each joint (n = 49 patterns) differ from each other during at least one phase of the gait cycle. Results showed that: (a) ten patterns with ‘no or minor gait deviations’ differed somewhat unexpectedly from typically developing gait, but these differences were generally small (≤3°); (b) all other joint motion patterns (n = 38) differed from typically developing gait and the significant locations within the gait cycle that were indicated by the statistical analyses, coincided well with the classification rules; (c) joint motion patterns at the level of each joint significantly differed from each other, apart from two sagittal plane pelvic patterns. In addition to these results, for several joints, statistical analyses indicated other significant areas during the gait cycle that were not included in the pattern definitions of the consensus study. Based on these findings, suggestions to improve pattern definitions were made. PMID:28081229
Samanta, Brajogopal; Bhadury, Punyasloke
2016-01-01
Marine chromophytes are taxonomically diverse group of algae and contribute approximately half of the total oceanic primary production. To understand the global patterns of functional diversity of chromophytic phytoplankton, robust bioinformatics and statistical analyses including deep phylogeny based on 2476 form ID rbcL gene sequences representing seven ecologically significant oceanographic ecoregions were undertaken. In addition, 12 form ID rbcL clone libraries were generated and analyzed (148 sequences) from Sundarbans Biosphere Reserve representing the world’s largest mangrove ecosystem as part of this study. Global phylogenetic analyses recovered 11 major clades of chromophytic phytoplankton in varying proportions with several novel rbcL sequences in each of the seven targeted ecoregions. Majority of OTUs was found to be exclusive to each ecoregion, whereas some were shared by two or more ecoregions based on beta-diversity analysis. Present phylogenetic and bioinformatics analyses provide a strong statistical support for the hypothesis that different oceanographic regimes harbor distinct and coherent groups of chromophytic phytoplankton. It has been also shown as part of this study that varying natural selection pressure on form ID rbcL gene under different environmental conditions could lead to functional differences and overall fitness of chromophytic phytoplankton populations. PMID:26861415
FARVATX: FAmily-based Rare Variant Association Test for X-linked genes
Choi, Sungkyoung; Lee, Sungyoung; Qiao, Dandi; Hardin, Megan; Cho, Michael H.; Silverman, Edwin K; Park, Taesung; Won, Sungho
2016-01-01
Although the X chromosome has many genes that are functionally related to human diseases, the complicated biological properties of the X chromosome have prevented efficient genetic association analyses, and only a few significantly associated X-linked variants have been reported for complex traits. For instance, dosage compensation of X-linked genes is often achieved via the inactivation of one allele in each X-linked variant in females; however, some X-linked variants can escape this X chromosome inactivation. Efficient genetic analyses cannot be conducted without prior knowledge about the gene expression process of X-linked variants, and misspecified information can lead to power loss. In this report, we propose new statistical methods for rare X-linked variant genetic association analysis of dichotomous phenotypes with family-based samples. The proposed methods are computationally efficient and can complete X-linked analyses within a few hours. Simulation studies demonstrate the statistical efficiency of the proposed methods, which were then applied to rare-variant association analysis of the X chromosome in chronic obstructive pulmonary disease (COPD). Some promising significant X-linked genes were identified, illustrating the practical importance of the proposed methods. PMID:27325607
FARVATX: Family-Based Rare Variant Association Test for X-Linked Genes.
Choi, Sungkyoung; Lee, Sungyoung; Qiao, Dandi; Hardin, Megan; Cho, Michael H; Silverman, Edwin K; Park, Taesung; Won, Sungho
2016-09-01
Although the X chromosome has many genes that are functionally related to human diseases, the complicated biological properties of the X chromosome have prevented efficient genetic association analyses, and only a few significantly associated X-linked variants have been reported for complex traits. For instance, dosage compensation of X-linked genes is often achieved via the inactivation of one allele in each X-linked variant in females; however, some X-linked variants can escape this X chromosome inactivation. Efficient genetic analyses cannot be conducted without prior knowledge about the gene expression process of X-linked variants, and misspecified information can lead to power loss. In this report, we propose new statistical methods for rare X-linked variant genetic association analysis of dichotomous phenotypes with family-based samples. The proposed methods are computationally efficient and can complete X-linked analyses within a few hours. Simulation studies demonstrate the statistical efficiency of the proposed methods, which were then applied to rare-variant association analysis of the X chromosome in chronic obstructive pulmonary disease. Some promising significant X-linked genes were identified, illustrating the practical importance of the proposed methods. © 2016 WILEY PERIODICALS, INC.
Blueprint XAS: a Matlab-based toolbox for the fitting and analysis of XAS spectra.
Delgado-Jaime, Mario Ulises; Mewis, Craig Philip; Kennepohl, Pierre
2010-01-01
Blueprint XAS is a new Matlab-based program developed to fit and analyse X-ray absorption spectroscopy (XAS) data, most specifically in the near-edge region of the spectrum. The program is based on a methodology that introduces a novel background model into the complete fit model and that is capable of generating any number of independent fits with minimal introduction of user bias [Delgado-Jaime & Kennepohl (2010), J. Synchrotron Rad. 17, 119-128]. The functions and settings on the five panels of its graphical user interface are designed to suit the needs of near-edge XAS data analyzers. A batch function allows for the setting of multiple jobs to be run with Matlab in the background. A unique statistics panel allows the user to analyse a family of independent fits, to evaluate fit models and to draw statistically supported conclusions. The version introduced here (v0.2) is currently a toolbox for Matlab. Future stand-alone versions of the program will also incorporate several other new features to create a full package of tools for XAS data processing.
Statistical Analyses of Femur Parameters for Designing Anatomical Plates.
Wang, Lin; He, Kunjin; Chen, Zhengming
2016-01-01
Femur parameters are key prerequisites for scientifically designing anatomical plates. Meanwhile, individual differences in femurs present a challenge to design well-fitting anatomical plates. Therefore, to design anatomical plates more scientifically, analyses of femur parameters with statistical methods were performed in this study. The specific steps were as follows. First, taking eight anatomical femur parameters as variables, 100 femur samples were classified into three classes with factor analysis and Q-type cluster analysis. Second, based on the mean parameter values of the three classes of femurs, three sizes of average anatomical plates corresponding to the three classes of femurs were designed. Finally, based on Bayes discriminant analysis, a new femur could be assigned to the proper class. Thereafter, the average anatomical plate suitable for that new femur was selected from the three available sizes of plates. Experimental results showed that the classification of femurs was quite reasonable based on the anatomical aspects of the femurs. For instance, three sizes of condylar buttress plates were designed. Meanwhile, 20 new femurs are judged to which classes the femurs belong. Thereafter, suitable condylar buttress plates were determined and selected.
Plant selection for ethnobotanical uses on the Amalfi Coast (Southern Italy).
Savo, V; Joy, R; Caneva, G; McClatchey, W C
2015-07-15
Many ethnobotanical studies have investigated selection criteria for medicinal and non-medicinal plants. In this paper we test several statistical methods using different ethnobotanical datasets in order to 1) define to which extent the nature of the datasets can affect the interpretation of results; 2) determine if the selection for different plant uses is based on phylogeny, or other selection criteria. We considered three different ethnobotanical datasets: two datasets of medicinal plants and a dataset of non-medicinal plants (handicraft production, domestic and agro-pastoral practices) and two floras of the Amalfi Coast. We performed residual analysis from linear regression, the binomial test and the Bayesian approach for calculating under-used and over-used plant families within ethnobotanical datasets. Percentages of agreement were calculated to compare the results of the analyses. We also analyzed the relationship between plant selection and phylogeny, chorology, life form and habitat using the chi-square test. Pearson's residuals for each of the significant chi-square analyses were examined for investigating alternative hypotheses of plant selection criteria. The three statistical analysis methods differed within the same dataset, and between different datasets and floras, but with some similarities. In the two medicinal datasets, only Lamiaceae was identified in both floras as an over-used family by all three statistical methods. All statistical methods in one flora agreed that Malvaceae was over-used and Poaceae under-used, but this was not found to be consistent with results of the second flora in which one statistical result was non-significant. All other families had some discrepancy in significance across methods, or floras. Significant over- or under-use was observed in only a minority of cases. The chi-square analyses were significant for phylogeny, life form and habitat. Pearson's residuals indicated a non-random selection of woody species for non-medicinal uses and an under-use of plants of temperate forests for medicinal uses. Our study showed that selection criteria for plant uses (including medicinal) are not always based on phylogeny. The comparison of different statistical methods (regression, binomial and Bayesian) under different conditions led to the conclusion that the most conservative results are obtained using regression analysis.
MGAS: a powerful tool for multivariate gene-based genome-wide association analysis.
Van der Sluis, Sophie; Dolan, Conor V; Li, Jiang; Song, Youqiang; Sham, Pak; Posthuma, Danielle; Li, Miao-Xin
2015-04-01
Standard genome-wide association studies, testing the association between one phenotype and a large number of single nucleotide polymorphisms (SNPs), are limited in two ways: (i) traits are often multivariate, and analysis of composite scores entails loss in statistical power and (ii) gene-based analyses may be preferred, e.g. to decrease the multiple testing problem. Here we present a new method, multivariate gene-based association test by extended Simes procedure (MGAS), that allows gene-based testing of multivariate phenotypes in unrelated individuals. Through extensive simulation, we show that under most trait-generating genotype-phenotype models MGAS has superior statistical power to detect associated genes compared with gene-based analyses of univariate phenotypic composite scores (i.e. GATES, multiple regression), and multivariate analysis of variance (MANOVA). Re-analysis of metabolic data revealed 32 False Discovery Rate controlled genome-wide significant genes, and 12 regions harboring multiple genes; of these 44 regions, 30 were not reported in the original analysis. MGAS allows researchers to conduct their multivariate gene-based analyses efficiently, and without the loss of power that is often associated with an incorrectly specified genotype-phenotype models. MGAS is freely available in KGG v3.0 (http://statgenpro.psychiatry.hku.hk/limx/kgg/download.php). Access to the metabolic dataset can be requested at dbGaP (https://dbgap.ncbi.nlm.nih.gov/). The R-simulation code is available from http://ctglab.nl/people/sophie_van_der_sluis. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
Research Design and Statistical Methods in Indian Medical Journals: A Retrospective Survey
Hassan, Shabbeer; Yellur, Rajashree; Subramani, Pooventhan; Adiga, Poornima; Gokhale, Manoj; Iyer, Manasa S.; Mayya, Shreemathi S.
2015-01-01
Good quality medical research generally requires not only an expertise in the chosen medical field of interest but also a sound knowledge of statistical methodology. The number of medical research articles which have been published in Indian medical journals has increased quite substantially in the past decade. The aim of this study was to collate all evidence on study design quality and statistical analyses used in selected leading Indian medical journals. Ten (10) leading Indian medical journals were selected based on impact factors and all original research articles published in 2003 (N = 588) and 2013 (N = 774) were categorized and reviewed. A validated checklist on study design, statistical analyses, results presentation, and interpretation was used for review and evaluation of the articles. Main outcomes considered in the present study were – study design types and their frequencies, error/defects proportion in study design, statistical analyses, and implementation of CONSORT checklist in RCT (randomized clinical trials). From 2003 to 2013: The proportion of erroneous statistical analyses did not decrease (χ2=0.592, Φ=0.027, p=0.4418), 25% (80/320) in 2003 compared to 22.6% (111/490) in 2013. Compared with 2003, significant improvement was seen in 2013; the proportion of papers using statistical tests increased significantly (χ2=26.96, Φ=0.16, p<0.0001) from 42.5% (250/588) to 56.7 % (439/774). The overall proportion of errors in study design decreased significantly (χ2=16.783, Φ=0.12 p<0.0001), 41.3% (243/588) compared to 30.6% (237/774). In 2013, randomized clinical trials designs has remained very low (7.3%, 43/588) with majority showing some errors (41 papers, 95.3%). Majority of the published studies were retrospective in nature both in 2003 [79.1% (465/588)] and in 2013 [78.2% (605/774)]. Major decreases in error proportions were observed in both results presentation (χ2=24.477, Φ=0.17, p<0.0001), 82.2% (263/320) compared to 66.3% (325/490) and interpretation (χ2=25.616, Φ=0.173, p<0.0001), 32.5% (104/320) compared to 17.1% (84/490), though some serious ones were still present. Indian medical research seems to have made no major progress regarding using correct statistical analyses, but error/defects in study designs have decreased significantly. Randomized clinical trials are quite rarely published and have high proportion of methodological problems. PMID:25856194
Research design and statistical methods in Indian medical journals: a retrospective survey.
Hassan, Shabbeer; Yellur, Rajashree; Subramani, Pooventhan; Adiga, Poornima; Gokhale, Manoj; Iyer, Manasa S; Mayya, Shreemathi S
2015-01-01
Good quality medical research generally requires not only an expertise in the chosen medical field of interest but also a sound knowledge of statistical methodology. The number of medical research articles which have been published in Indian medical journals has increased quite substantially in the past decade. The aim of this study was to collate all evidence on study design quality and statistical analyses used in selected leading Indian medical journals. Ten (10) leading Indian medical journals were selected based on impact factors and all original research articles published in 2003 (N = 588) and 2013 (N = 774) were categorized and reviewed. A validated checklist on study design, statistical analyses, results presentation, and interpretation was used for review and evaluation of the articles. Main outcomes considered in the present study were - study design types and their frequencies, error/defects proportion in study design, statistical analyses, and implementation of CONSORT checklist in RCT (randomized clinical trials). From 2003 to 2013: The proportion of erroneous statistical analyses did not decrease (χ2=0.592, Φ=0.027, p=0.4418), 25% (80/320) in 2003 compared to 22.6% (111/490) in 2013. Compared with 2003, significant improvement was seen in 2013; the proportion of papers using statistical tests increased significantly (χ2=26.96, Φ=0.16, p<0.0001) from 42.5% (250/588) to 56.7 % (439/774). The overall proportion of errors in study design decreased significantly (χ2=16.783, Φ=0.12 p<0.0001), 41.3% (243/588) compared to 30.6% (237/774). In 2013, randomized clinical trials designs has remained very low (7.3%, 43/588) with majority showing some errors (41 papers, 95.3%). Majority of the published studies were retrospective in nature both in 2003 [79.1% (465/588)] and in 2013 [78.2% (605/774)]. Major decreases in error proportions were observed in both results presentation (χ2=24.477, Φ=0.17, p<0.0001), 82.2% (263/320) compared to 66.3% (325/490) and interpretation (χ2=25.616, Φ=0.173, p<0.0001), 32.5% (104/320) compared to 17.1% (84/490), though some serious ones were still present. Indian medical research seems to have made no major progress regarding using correct statistical analyses, but error/defects in study designs have decreased significantly. Randomized clinical trials are quite rarely published and have high proportion of methodological problems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Josse, Florent; Lefebvre, Yannick; Todeschini, Patrick
2006-07-01
Assessing the structural integrity of a nuclear Reactor Pressure Vessel (RPV) subjected to pressurized-thermal-shock (PTS) transients is extremely important to safety. In addition to conventional deterministic calculations to confirm RPV integrity, Electricite de France (EDF) carries out probabilistic analyses. Probabilistic analyses are interesting because some key variables, albeit conventionally taken at conservative values, can be modeled more accurately through statistical variability. One variable which significantly affects RPV structural integrity assessment is cleavage fracture initiation toughness. The reference fracture toughness method currently in use at EDF is the RCCM and ASME Code lower-bound K{sub IC} based on the indexing parameter RT{submore » NDT}. However, in order to quantify the toughness scatter for probabilistic analyses, the master curve method is being analyzed at present. Furthermore, the master curve method is a direct means of evaluating fracture toughness based on K{sub JC} data. In the framework of the master curve investigation undertaken by EDF, this article deals with the following two statistical items: building a master curve from an extract of a fracture toughness dataset (from the European project 'Unified Reference Fracture Toughness Design curves for RPV Steels') and controlling statistical uncertainty for both mono-temperature and multi-temperature tests. Concerning the first point, master curve temperature dependence is empirical in nature. To determine the 'original' master curve, Wallin postulated that a unified description of fracture toughness temperature dependence for ferritic steels is possible, and used a large number of data corresponding to nuclear-grade pressure vessel steels and welds. Our working hypothesis is that some ferritic steels may behave in slightly different ways. Therefore we focused exclusively on the basic french reactor vessel metal of types A508 Class 3 and A 533 grade B Class 1, taking the sampling level and direction into account as well as the test specimen type. As for the second point, the emphasis is placed on the uncertainties in applying the master curve approach. For a toughness dataset based on different specimens of a single product, application of the master curve methodology requires the statistical estimation of one parameter: the reference temperature T{sub 0}. Because of the limited number of specimens, estimation of this temperature is uncertain. The ASTM standard provides a rough evaluation of this statistical uncertainty through an approximate confidence interval. In this paper, a thorough study is carried out to build more meaningful confidence intervals (for both mono-temperature and multi-temperature tests). These results ensure better control over uncertainty, and allow rigorous analysis of the impact of its influencing factors: the number of specimens and the temperatures at which they have been tested. (authors)« less
BRepertoire: a user-friendly web server for analysing antibody repertoire data.
Margreitter, Christian; Lu, Hui-Chun; Townsend, Catherine; Stewart, Alexander; Dunn-Walters, Deborah K; Fraternali, Franca
2018-04-14
Antibody repertoire analysis by high throughput sequencing is now widely used, but a persisting challenge is enabling immunologists to explore their data to discover discriminating repertoire features for their own particular investigations. Computational methods are necessary for large-scale evaluation of antibody properties. We have developed BRepertoire, a suite of user-friendly web-based software tools for large-scale statistical analyses of repertoire data. The software is able to use data preprocessed by IMGT, and performs statistical and comparative analyses with versatile plotting options. BRepertoire has been designed to operate in various modes, for example analysing sequence-specific V(D)J gene usage, discerning physico-chemical properties of the CDR regions and clustering of clonotypes. Those analyses are performed on the fly by a number of R packages and are deployed by a shiny web platform. The user can download the analysed data in different table formats and save the generated plots as image files ready for publication. We believe BRepertoire to be a versatile analytical tool that complements experimental studies of immune repertoires. To illustrate the server's functionality, we show use cases including differential gene usage in a vaccination dataset and analysis of CDR3H properties in old and young individuals. The server is accessible under http://mabra.biomed.kcl.ac.uk/BRepertoire.
Niclasen, Janni; Keilow, Maria; Obel, Carsten
2018-05-01
Well-being is considered a prerequisite for learning. The Danish Ministry of Education initiated the development of a new 40-item student well-being questionnaire in 2014 to monitor well-being among all Danish public school students on a yearly basis. The aim of this study was to investigate the basic psychometric properties of this questionnaire. We used the data from the 2015 Danish student well-being survey for 268,357 students in grades 4-9 (about 85% of the study population). Descriptive statistics, exploratory factor analyses, confirmatory factor analyses and Cronbach's α reliability measures were used in the analyses. The factor analyses did not unambiguously support one particular factor structure. However, based on the basic descriptive statistics, exploratory factor analyses, confirmatory factor analyses, the semantics of the individual items and Cronbach's α, we propose a four-factor structure including 27 of the 40 items originally proposed. The four scales measure school connectedness, learning self-efficacy, learning environment and classroom management. Two bullying items and two psychosomatic items should be considered separately, leaving 31 items in the questionnaire. The proposed four-factor structure addresses central aspects of well-being, which, if used constructively, may support public schools' work to increase levels of student well-being.
ERIC Educational Resources Information Center
Klatt, Malgorzata; Clarke, Kira; Dulfer, Nicky
2017-01-01
This paper highlights troubling patterns within the Australian School-based Apprenticeships and Traineeships (SBATs) by analysing statistical data of 21,000 of 15-19 year old apprenticeship/traineeship learners engaged in Vocational Education and Training in School (VETiS). It confirms the alignment of social groups to certain qualification fields…
[Clinical research XXIII. From clinical judgment to meta-analyses].
Rivas-Ruiz, Rodolfo; Castelán-Martínez, Osvaldo D; Pérez-Rodríguez, Marcela; Palacios-Cruz, Lino; Noyola-Castillo, Maura E; Talavera, Juan O
2014-01-01
Systematic reviews (SR) are studies made in order to ask clinical questions based on original articles. Meta-analysis (MTA) is the mathematical analysis of SR. These analyses are divided in two groups, those which evaluate the measured results of quantitative variables (for example, the body mass index -BMI-) and those which evaluate qualitative variables (for example, if a patient is alive or dead, or if he is healing or not). Quantitative variables generally use the mean difference analysis and qualitative variables can be performed using several calculations: odds ratio (OR), relative risk (RR), absolute risk reduction (ARR) and hazard ratio (HR). These analyses are represented through forest plots which allow the evaluation of each individual study, as well as the heterogeneity between studies and the overall effect of the intervention. These analyses are mainly based on Student's t test and chi-squared. To take appropriate decisions based on the MTA, it is important to understand the characteristics of statistical methods in order to avoid misinterpretations.
Fatality Reduction by Air Bags: Analyses of Accident Data through Early 1996
DOT National Transportation Integrated Search
1996-08-01
The fatality risk of front-seat occupants of passenger cars and light trucks equipped with air bags is compared to the corresponding risk in similar vehicles without air bags, based on statistical analysis of Fatal Accident Reporting System (FARS)dat...
USDA-ARS?s Scientific Manuscript database
Phylogenetic analyses of species of Streptomyces based on 16S rRNA gene sequences resulted in a statistically well-supported clade (100% bootstrap value) containing 8 species having very similar gross morphology. These species, including Streptomyces bambergiensis, Streptomyces chlorus, Streptomyces...
Jia, Erik; Chen, Tianlu
2018-01-01
Left-censored missing values commonly exist in targeted metabolomics datasets and can be considered as missing not at random (MNAR). Improper data processing procedures for missing values will cause adverse impacts on subsequent statistical analyses. However, few imputation methods have been developed and applied to the situation of MNAR in the field of metabolomics. Thus, a practical left-censored missing value imputation method is urgently needed. We developed an iterative Gibbs sampler based left-censored missing value imputation approach (GSimp). We compared GSimp with other three imputation methods on two real-world targeted metabolomics datasets and one simulation dataset using our imputation evaluation pipeline. The results show that GSimp outperforms other imputation methods in terms of imputation accuracy, observation distribution, univariate and multivariate analyses, and statistical sensitivity. Additionally, a parallel version of GSimp was developed for dealing with large scale metabolomics datasets. The R code for GSimp, evaluation pipeline, tutorial, real-world and simulated targeted metabolomics datasets are available at: https://github.com/WandeRum/GSimp. PMID:29385130
Bianchi, Anna Rita; Ferreri, Carla; Ruggiero, Simona; Deplano, Simone; Sunda, Valentina; Galloro, Giuseppe; Formisano, Cesare; Mennella, Maria Rosaria Faraone
2016-01-01
Establishing by statistical analyses whether the analyses of auto-modified poly(ADP-ribose)polymerase and erythrocyte membrane fatty acid composition (Fat Profile(®)), separately or in tandem, help monitoring the physio-pathology of the cell, and correlate with diseases, if present. Ninety five subjects were interviewed and analyzed blindly. Blood lymphocytes and erythrocytes were prepared to assay poly(ADP-ribose)polymerase automodification and fatty acid based membrane lipidome, respectively. Poly(ADP-ribose)polymerase automodification levels confirmed their correlation with DNA damage extent, and allowed monitoring disease activity, upon surgical/therapeutic treatment. Membrane lipidome profiles showed lipid unbalance mainly linked to inflammatory states. Statistically both tests were separately significant, and correlated each other within some pathologies. In the laboratory routine, both tests, separately or in tandem, might be a preliminary and helpful step to investigate the occurrence of a given disease. Their combination represents a promising integrated panel for sensible, noninvasive and routine health monitoring.
Statistical analysis of the determinations of the Sun's Galactocentric distance
NASA Astrophysics Data System (ADS)
Malkin, Zinovy
2013-02-01
Based on several tens of R0 measurements made during the past two decades, several studies have been performed to derive the best estimate of R0. Some used just simple averaging to derive a result, whereas others provided comprehensive analyses of possible errors in published results. In either case, detailed statistical analyses of data used were not performed. However, a computation of the best estimates of the Galactic rotation constants is not only an astronomical but also a metrological task. Here we perform an analysis of 53 R0 measurements (published in the past 20 years) to assess the consistency of the data. Our analysis shows that they are internally consistent. It is also shown that any trend in the R0 estimates from the last 20 years is statistically negligible, which renders the presence of a bandwagon effect doubtful. On the other hand, the formal errors in the published R0 estimates improve significantly with time.
Statistical thermodynamics of strain hardening in polycrystalline solids
Langer, James S.
2015-09-18
This paper starts with a systematic rederivation of the statistical thermodynamic equations of motion for dislocation-mediated plasticity proposed in 2010 by Langer, Bouchbinder, and Lookman. The paper then uses that theory to explain the anomalous rate-hardening behavior reported in 1988 by Follansbee and Kocks and to explore the relation between hardening rate and grain size reported in 1995 by Meyers et al. A central theme is the need for physics-based, nonequilibrium analyses in developing predictive theories of the strength of polycrystalline materials.
Statistical Performances of Resistive Active Power Splitter
NASA Astrophysics Data System (ADS)
Lalléchère, Sébastien; Ravelo, Blaise; Thakur, Atul
2016-03-01
In this paper, the synthesis and sensitivity analysis of an active power splitter (PWS) is proposed. It is based on the active cell composed of a Field Effect Transistor in cascade with shunted resistor at the input and the output (resistive amplifier topology). The PWS uncertainty versus resistance tolerances is suggested by using stochastic method. Furthermore, with the proposed topology, we can control easily the device gain while varying a resistance. This provides useful tool to analyse the statistical sensitivity of the system in uncertain environment.
Fast and accurate imputation of summary statistics enhances evidence of functional enrichment.
Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P; Patterson, Nick; Price, Alkes L
2014-10-15
Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of [Formula: see text] association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary materials are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
How Big of a Problem is Analytic Error in Secondary Analyses of Survey Data?
West, Brady T; Sakshaug, Joseph W; Aurelien, Guy Alain S
2016-01-01
Secondary analyses of survey data collected from large probability samples of persons or establishments further scientific progress in many fields. The complex design features of these samples improve data collection efficiency, but also require analysts to account for these features when conducting analysis. Unfortunately, many secondary analysts from fields outside of statistics, biostatistics, and survey methodology do not have adequate training in this area, and as a result may apply incorrect statistical methods when analyzing these survey data sets. This in turn could lead to the publication of incorrect inferences based on the survey data that effectively negate the resources dedicated to these surveys. In this article, we build on the results of a preliminary meta-analysis of 100 peer-reviewed journal articles presenting analyses of data from a variety of national health surveys, which suggested that analytic errors may be extremely prevalent in these types of investigations. We first perform a meta-analysis of a stratified random sample of 145 additional research products analyzing survey data from the Scientists and Engineers Statistical Data System (SESTAT), which describes features of the U.S. Science and Engineering workforce, and examine trends in the prevalence of analytic error across the decades used to stratify the sample. We once again find that analytic errors appear to be quite prevalent in these studies. Next, we present several example analyses of real SESTAT data, and demonstrate that a failure to perform these analyses correctly can result in substantially biased estimates with standard errors that do not adequately reflect complex sample design features. Collectively, the results of this investigation suggest that reviewers of this type of research need to pay much closer attention to the analytic methods employed by researchers attempting to publish or present secondary analyses of survey data.
How Big of a Problem is Analytic Error in Secondary Analyses of Survey Data?
West, Brady T.; Sakshaug, Joseph W.; Aurelien, Guy Alain S.
2016-01-01
Secondary analyses of survey data collected from large probability samples of persons or establishments further scientific progress in many fields. The complex design features of these samples improve data collection efficiency, but also require analysts to account for these features when conducting analysis. Unfortunately, many secondary analysts from fields outside of statistics, biostatistics, and survey methodology do not have adequate training in this area, and as a result may apply incorrect statistical methods when analyzing these survey data sets. This in turn could lead to the publication of incorrect inferences based on the survey data that effectively negate the resources dedicated to these surveys. In this article, we build on the results of a preliminary meta-analysis of 100 peer-reviewed journal articles presenting analyses of data from a variety of national health surveys, which suggested that analytic errors may be extremely prevalent in these types of investigations. We first perform a meta-analysis of a stratified random sample of 145 additional research products analyzing survey data from the Scientists and Engineers Statistical Data System (SESTAT), which describes features of the U.S. Science and Engineering workforce, and examine trends in the prevalence of analytic error across the decades used to stratify the sample. We once again find that analytic errors appear to be quite prevalent in these studies. Next, we present several example analyses of real SESTAT data, and demonstrate that a failure to perform these analyses correctly can result in substantially biased estimates with standard errors that do not adequately reflect complex sample design features. Collectively, the results of this investigation suggest that reviewers of this type of research need to pay much closer attention to the analytic methods employed by researchers attempting to publish or present secondary analyses of survey data. PMID:27355817
Hydrometeorological and statistical analyses of heavy rainfall in Midwestern USA
NASA Astrophysics Data System (ADS)
Thorndahl, S.; Smith, J. A.; Krajewski, W. F.
2012-04-01
During the last two decades the mid-western states of the United States of America has been largely afflicted by heavy flood producing rainfall. Several of these storms seem to have similar hydrometeorological properties in terms of pattern, track, evolution, life cycle, clustering, etc. which raise the question if it is possible to derive general characteristics of the space-time structures of these heavy storms. This is important in order to understand hydrometeorological features, e.g. how storms evolve and with what frequency we can expect extreme storms to occur. In the literature, most studies of extreme rainfall are based on point measurements (rain gauges). However, with high resolution and quality radar observation periods exceeding more than two decades, it is possible to do long-term spatio-temporal statistical analyses of extremes. This makes it possible to link return periods to distributed rainfall estimates and to study precipitation structures which cause floods. However, doing these statistical frequency analyses of rainfall based on radar observations introduces some different challenges, converting radar reflectivity observations to "true" rainfall, which are not problematic doing traditional analyses on rain gauge data. It is for example difficult to distinguish reflectivity from high intensity rain from reflectivity from other hydrometeors such as hail, especially using single polarization radars which are used in this study. Furthermore, reflectivity from bright band (melting layer) should be discarded and anomalous propagation should be corrected in order to produce valid statistics of extreme radar rainfall. Other challenges include combining observations from several radars to one mosaic, bias correction against rain gauges, range correction, ZR-relationships, etc. The present study analyzes radar rainfall observations from 1996 to 2011 based the American NEXRAD network of radars over an area covering parts of Iowa, Wisconsin, Illinois, and Lake Michigan. The radar observations are processed using Hydro-NEXRAD algorithms in order to produce rainfall estimates with a spatial resolution of 1 km and a temporal resolution of 15 min. The rainfall estimates are bias-corrected on a daily basis using a network of rain gauges. Besides a thorough evaluation of the different challenges in investigating heavy rain as described above the study includes suggestions for frequency analysis methods as well as studies of hydrometeorological features of single events.
Vieira, Rute; McDonald, Suzanne; Araújo-Soares, Vera; Sniehotta, Falko F; Henderson, Robin
2017-09-01
N-of-1 studies are based on repeated observations within an individual or unit over time and are acknowledged as an important research method for generating scientific evidence about the health or behaviour of an individual. Statistical analyses of n-of-1 data require accurate modelling of the outcome while accounting for its distribution, time-related trend and error structures (e.g., autocorrelation) as well as reporting readily usable contextualised effect sizes for decision-making. A number of statistical approaches have been documented but no consensus exists on which method is most appropriate for which type of n-of-1 design. We discuss the statistical considerations for analysing n-of-1 studies and briefly review some currently used methodologies. We describe dynamic regression modelling as a flexible and powerful approach, adaptable to different types of outcomes and capable of dealing with the different challenges inherent to n-of-1 statistical modelling. Dynamic modelling borrows ideas from longitudinal and event history methodologies which explicitly incorporate the role of time and the influence of past on future. We also present an illustrative example of the use of dynamic regression on monitoring physical activity during the retirement transition. Dynamic modelling has the potential to expand researchers' access to robust and user-friendly statistical methods for individualised studies.
ParallABEL: an R library for generalized parallelization of genome-wide association studies
2010-01-01
Background Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those involving thousands of individuals and consuming hours to months, will benefit from parallel computation. It is arduous acquiring the necessary programming skills to correctly partition and distribute data, control and monitor tasks on clustered computers, and merge output files. Results Most components of GWA analysis can be divided into four groups based on the types of input data and statistical outputs. The first group contains statistics computed for a particular Single Nucleotide Polymorphism (SNP), or trait, such as SNP characterization statistics or association test statistics. The input data of this group includes the SNPs/traits. The second group concerns statistics characterizing an individual in a study, for example, the summary statistics of genotype quality for each sample. The input data of this group includes individuals. The third group consists of pair-wise statistics derived from analyses between each pair of individuals in the study, for example genome-wide identity-by-state or genomic kinship analyses. The input data of this group includes pairs of SNPs/traits. The final group concerns pair-wise statistics derived for pairs of SNPs, such as the linkage disequilibrium characterisation. The input data of this group includes pairs of individuals. We developed the ParallABEL library, which utilizes the Rmpi library, to parallelize these four types of computations. ParallABEL library is not only aimed at GenABEL, but may also be employed to parallelize various GWA packages in R. The data set from the North American Rheumatoid Arthritis Consortium (NARAC) includes 2,062 individuals with 545,080, SNPs' genotyping, was used to measure ParallABEL performance. Almost perfect speed-up was achieved for many types of analyses. For example, the computing time for the identity-by-state matrix was linearly reduced from approximately eight hours to one hour when ParallABEL employed eight processors. Conclusions Executing genome-wide association analysis using the ParallABEL library on a computer cluster is an effective way to boost performance, and simplify the parallelization of GWA studies. ParallABEL is a user-friendly parallelization of GenABEL. PMID:20429914
Statistical Data Analyses of Trace Chemical, Biochemical, and Physical Analytical Signatures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Udey, Ruth Norma
Analytical and bioanalytical chemistry measurement results are most meaningful when interpreted using rigorous statistical treatments of the data. The same data set may provide many dimensions of information depending on the questions asked through the applied statistical methods. Three principal projects illustrated the wealth of information gained through the application of statistical data analyses to diverse problems.
Borrowing of strength and study weights in multivariate and network meta-analysis.
Jackson, Dan; White, Ian R; Price, Malcolm; Copas, John; Riley, Richard D
2017-12-01
Multivariate and network meta-analysis have the potential for the estimated mean of one effect to borrow strength from the data on other effects of interest. The extent of this borrowing of strength is usually assessed informally. We present new mathematical definitions of 'borrowing of strength'. Our main proposal is based on a decomposition of the score statistic, which we show can be interpreted as comparing the precision of estimates from the multivariate and univariate models. Our definition of borrowing of strength therefore emulates the usual informal assessment. We also derive a method for calculating study weights, which we embed into the same framework as our borrowing of strength statistics, so that percentage study weights can accompany the results from multivariate and network meta-analyses as they do in conventional univariate meta-analyses. Our proposals are illustrated using three meta-analyses involving correlated effects for multiple outcomes, multiple risk factor associations and multiple treatments (network meta-analysis).
Borrowing of strength and study weights in multivariate and network meta-analysis
Jackson, Dan; White, Ian R; Price, Malcolm; Copas, John; Riley, Richard D
2016-01-01
Multivariate and network meta-analysis have the potential for the estimated mean of one effect to borrow strength from the data on other effects of interest. The extent of this borrowing of strength is usually assessed informally. We present new mathematical definitions of ‘borrowing of strength’. Our main proposal is based on a decomposition of the score statistic, which we show can be interpreted as comparing the precision of estimates from the multivariate and univariate models. Our definition of borrowing of strength therefore emulates the usual informal assessment. We also derive a method for calculating study weights, which we embed into the same framework as our borrowing of strength statistics, so that percentage study weights can accompany the results from multivariate and network meta-analyses as they do in conventional univariate meta-analyses. Our proposals are illustrated using three meta-analyses involving correlated effects for multiple outcomes, multiple risk factor associations and multiple treatments (network meta-analysis). PMID:26546254
Redmond, Tony; O'Leary, Neil; Hutchison, Donna M; Nicolela, Marcelo T; Artes, Paul H; Chauhan, Balwantray C
2013-12-01
A new analysis method called permutation of pointwise linear regression measures the significance of deterioration over time at each visual field location, combines the significance values into an overall statistic, and then determines the likelihood of change in the visual field. Because the outcome is a single P value, individualized to that specific visual field and independent of the scale of the original measurement, the method is well suited for comparing techniques with different stimuli and scales. To test the hypothesis that frequency-doubling matrix perimetry (FDT2) is more sensitive than standard automated perimetry (SAP) in identifying visual field progression in glaucoma. Patients with open-angle glaucoma and healthy controls were examined by FDT2 and SAP, both with the 24-2 test pattern, on the same day at 6-month intervals in a longitudinal prospective study conducted in a hospital-based setting. Only participants with at least 5 examinations were included. Data were analyzed with permutation of pointwise linear regression. Permutation of pointwise linear regression is individualized to each participant, in contrast to current analyses in which the statistical significance is inferred from population-based approaches. Analyses were performed with both total deviation and pattern deviation. Sixty-four patients and 36 controls were included in the study. The median age, SAP mean deviation, and follow-up period were 65 years, -2.6 dB, and 5.4 years, respectively, in patients and 62 years, +0.4 dB, and 5.2 years, respectively, in controls. Using total deviation analyses, statistically significant deterioration was identified in 17% of patients with FDT2, in 34% of patients with SAP, and in 14% of patients with both techniques; in controls these percentages were 8% with FDT2, 31% with SAP, and 8% with both. Using pattern deviation analyses, statistically significant deterioration was identified in 16% of patients with FDT2, in 17% of patients with SAP, and in 3% of patients with both techniques; in controls these values were 3% with FDT2 and none with SAP. No evidence was found that FDT2 is more sensitive than SAP in identifying visual field deterioration. In about one-third of healthy controls, age-related deterioration with SAP reached statistical significance.
Metz, Anneke M
2008-01-01
There is an increasing need for students in the biological sciences to build a strong foundation in quantitative approaches to data analyses. Although most science, engineering, and math field majors are required to take at least one statistics course, statistical analysis is poorly integrated into undergraduate biology course work, particularly at the lower-division level. Elements of statistics were incorporated into an introductory biology course, including a review of statistics concepts and opportunity for students to perform statistical analysis in a biological context. Learning gains were measured with an 11-item statistics learning survey instrument developed for the course. Students showed a statistically significant 25% (p < 0.005) increase in statistics knowledge after completing introductory biology. Students improved their scores on the survey after completing introductory biology, even if they had previously completed an introductory statistics course (9%, improvement p < 0.005). Students retested 1 yr after completing introductory biology showed no loss of their statistics knowledge as measured by this instrument, suggesting that the use of statistics in biology course work may aid long-term retention of statistics knowledge. No statistically significant differences in learning were detected between male and female students in the study.
Assessment and evaluations of I-80 truck loads and their load effects : final report.
DOT National Transportation Integrated Search
2016-12-01
The research objective is to examine the safety of Wyoming bridges on the I-80 corridor considering the actual truck traffic on the : interstate based upon weigh in motion (WIM) data. This was accomplished by performing statistical analyses to determ...
Ekins, Sean; Olechno, Joe; Williams, Antony J.
2013-01-01
Dispensing and dilution processes may profoundly influence estimates of biological activity of compounds. Published data show Ephrin type-B receptor 4 IC50 values obtained via tip-based serial dilution and dispensing versus acoustic dispensing with direct dilution differ by orders of magnitude with no correlation or ranking of datasets. We generated computational 3D pharmacophores based on data derived by both acoustic and tip-based transfer. The computed pharmacophores differ significantly depending upon dispensing and dilution methods. The acoustic dispensing-derived pharmacophore correctly identified active compounds in a subsequent test set where the tip-based method failed. Data from acoustic dispensing generates a pharmacophore containing two hydrophobic features, one hydrogen bond donor and one hydrogen bond acceptor. This is consistent with X-ray crystallography studies of ligand-protein interactions and automatically generated pharmacophores derived from this structural data. In contrast, the tip-based data suggest a pharmacophore with two hydrogen bond acceptors, one hydrogen bond donor and no hydrophobic features. This pharmacophore is inconsistent with the X-ray crystallographic studies and automatically generated pharmacophores. In short, traditional dispensing processes are another important source of error in high-throughput screening that impacts computational and statistical analyses. These findings have far-reaching implications in biological research. PMID:23658723
Multi-country health surveys: are the analyses misleading?
Masood, Mohd; Reidpath, Daniel D
2014-05-01
The aim of this paper was to review the types of approaches currently utilized in the analysis of multi-country survey data, specifically focusing on design and modeling issues with a focus on analyses of significant multi-country surveys published in 2010. A systematic search strategy was used to identify the 10 multi-country surveys and the articles published from them in 2010. The surveys were selected to reflect diverse topics and foci; and provide an insight into analytic approaches across research themes. The search identified 159 articles appropriate for full text review and data extraction. The analyses adopted in the multi-country surveys can be broadly classified as: univariate/bivariate analyses, and multivariate/multivariable analyses. Multivariate/multivariable analyses may be further divided into design- and model-based analyses. Of the 159 articles reviewed, 129 articles used model-based analysis, 30 articles used design-based analyses. Similar patterns could be seen in all the individual surveys. While there is general agreement among survey statisticians that complex surveys are most appropriately analyzed using design-based analyses, most researchers continued to use the more common model-based approaches. Recent developments in design-based multi-level analysis may be one approach to include all the survey design characteristics. This is a relatively new area, however, and there remains statistical, as well as applied analytic research required. An important limitation of this study relates to the selection of the surveys used and the choice of year for the analysis, i.e., year 2010 only. There is, however, no strong reason to believe that analytic strategies have changed radically in the past few years, and 2010 provides a credible snapshot of current practice.
Kent, David M; Dahabreh, Issa J; Ruthazer, Robin; Furlan, Anthony J; Weimar, Christian; Serena, Joaquín; Meier, Bernhard; Mattle, Heinrich P; Di Angelantonio, Emanuele; Paciaroni, Maurizio; Schuchlenz, Herwig; Homma, Shunichi; Lutz, Jennifer S; Thaler, David E
2015-09-14
The preferred antithrombotic strategy for secondary prevention in patients with cryptogenic stroke (CS) and patent foramen ovale (PFO) is unknown. We pooled multiple observational studies and used propensity score-based methods to estimate the comparative effectiveness of oral anticoagulation (OAC) compared with antiplatelet therapy (APT). Individual participant data from 12 databases of medically treated patients with CS and PFO were analysed with Cox regression models, to estimate database-specific hazard ratios (HRs) comparing OAC with APT, for both the primary composite outcome [recurrent stroke, transient ischaemic attack (TIA), or death] and stroke alone. Propensity scores were applied via inverse probability of treatment weighting to control for confounding. We synthesized database-specific HRs using random-effects meta-analysis models. This analysis included 2385 (OAC = 804 and APT = 1581) patients with 227 composite endpoints (stroke/TIA/death). The difference between OAC and APT was not statistically significant for the primary composite outcome [adjusted HR = 0.76, 95% confidence interval (CI) 0.52-1.12] or for the secondary outcome of stroke alone (adjusted HR = 0.75, 95% CI 0.44-1.27). Results were consistent in analyses applying alternative weighting schemes, with the exception that OAC had a statistically significant beneficial effect on the composite outcome in analyses standardized to the patient population who actually received APT (adjusted HR = 0.64, 95% CI 0.42-0.99). Subgroup analyses did not detect statistically significant heterogeneity of treatment effects across clinically important patient groups. We did not find a statistically significant difference comparing OAC with APT; our results justify randomized trials comparing different antithrombotic approaches in these patients. Published on behalf of the European Society of Cardiology. All rights reserved. © The Author 2015. For permissions please email: journals.permissions@oup.com.
Biological Parametric Mapping: A Statistical Toolbox for Multi-Modality Brain Image Analysis
Casanova, Ramon; Ryali, Srikanth; Baer, Aaron; Laurienti, Paul J.; Burdette, Jonathan H.; Hayasaka, Satoru; Flowers, Lynn; Wood, Frank; Maldjian, Joseph A.
2006-01-01
In recent years multiple brain MR imaging modalities have emerged; however, analysis methodologies have mainly remained modality specific. In addition, when comparing across imaging modalities, most researchers have been forced to rely on simple region-of-interest type analyses, which do not allow the voxel-by-voxel comparisons necessary to answer more sophisticated neuroscience questions. To overcome these limitations, we developed a toolbox for multimodal image analysis called biological parametric mapping (BPM), based on a voxel-wise use of the general linear model. The BPM toolbox incorporates information obtained from other modalities as regressors in a voxel-wise analysis, thereby permitting investigation of more sophisticated hypotheses. The BPM toolbox has been developed in MATLAB with a user friendly interface for performing analyses, including voxel-wise multimodal correlation, ANCOVA, and multiple regression. It has a high degree of integration with the SPM (statistical parametric mapping) software relying on it for visualization and statistical inference. Furthermore, statistical inference for a correlation field, rather than a widely-used T-field, has been implemented in the correlation analysis for more accurate results. An example with in-vivo data is presented demonstrating the potential of the BPM methodology as a tool for multimodal image analysis. PMID:17070709
Lee, Geunho; Lee, Hyun Beom; Jung, Byung Hwa; Nam, Hojung
2017-07-01
Mass spectrometry (MS) data are used to analyze biological phenomena based on chemical species. However, these data often contain unexpected duplicate records and missing values due to technical or biological factors. These 'dirty data' problems increase the difficulty of performing MS analyses because they lead to performance degradation when statistical or machine-learning tests are applied to the data. Thus, we have developed missing values preprocessor (mvp), an open-source software for preprocessing data that might include duplicate records and missing values. mvp uses the property of MS data in which identical chemical species present the same or similar values for key identifiers, such as the mass-to-charge ratio and intensity signal, and forms cliques via graph theory to process dirty data. We evaluated the validity of the mvp process via quantitative and qualitative analyses and compared the results from a statistical test that analyzed the original and mvp-applied data. This analysis showed that using mvp reduces problems associated with duplicate records and missing values. We also examined the effects of using unprocessed data in statistical tests and examined the improved statistical test results obtained with data preprocessed using mvp.
An Assessment of the Impact of the Department of Defense Very-High-Speed Integrated Circuit Program.
1982-01-01
analysis, statistical inference, device physics and other such products of basic research. Examples of such information would be: analyses of properties of...TB , for a n-p-n silicon transitor with 1018 cm- 3 base-doping, TB = Wb 2/2Dw becomes 0.4 ps in this limit so that the base contributes little to delay
CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets
Nowicka, Malgorzata; Krieg, Carsten; Weber, Lukas M.; Hartmann, Felix J.; Guglietta, Silvia; Becher, Burkhard; Levesque, Mitchell P.; Robinson, Mark D.
2017-01-01
High dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high throughput interrogation and characterization of cell populations.Here, we present an R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signaling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data is the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g. multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g. plots of aggregated signals). PMID:28663787
Regional analyses of labor markets and demography: a model based Norwegian example.
Stambol, L S; Stolen, N M; Avitsland, T
1998-01-01
The authors discuss the regional REGARD model, developed by Statistics Norway to analyze the regional implications of macroeconomic development of employment, labor force, and unemployment. "In building the model, empirical analyses of regional producer behavior in manufacturing industries have been performed, and the relation between labor market development and regional migration has been investigated. Apart from providing a short description of the REGARD model, this article demonstrates the functioning of the model, and presents some results of an application." excerpt
Anatomy of the Higgs fits: A first guide to statistical treatments of the theoretical uncertainties
NASA Astrophysics Data System (ADS)
Fichet, Sylvain; Moreau, Grégory
2016-04-01
The studies of the Higgs boson couplings based on the recent and upcoming LHC data open up a new window on physics beyond the Standard Model. In this paper, we propose a statistical guide to the consistent treatment of the theoretical uncertainties entering the Higgs rate fits. Both the Bayesian and frequentist approaches are systematically analysed in a unified formalism. We present analytical expressions for the marginal likelihoods, useful to implement simultaneously the experimental and theoretical uncertainties. We review the various origins of the theoretical errors (QCD, EFT, PDF, production mode contamination…). All these individual uncertainties are thoroughly combined with the help of moment-based considerations. The theoretical correlations among Higgs detection channels appear to affect the location and size of the best-fit regions in the space of Higgs couplings. We discuss the recurrent question of the shape of the prior distributions for the individual theoretical errors and find that a nearly Gaussian prior arises from the error combinations. We also develop the bias approach, which is an alternative to marginalisation providing more conservative results. The statistical framework to apply the bias principle is introduced and two realisations of the bias are proposed. Finally, depending on the statistical treatment, the Standard Model prediction for the Higgs signal strengths is found to lie within either the 68% or 95% confidence level region obtained from the latest analyses of the 7 and 8 TeV LHC datasets.
Turgeon, Ricky D; Wilby, Kyle J; Ensom, Mary H H
2015-06-01
We conducted a systematic review with meta-analysis to evaluate the efficacy of antiviral agents on complete recovery of Bell's palsy. We searched CENTRAL, Embase, MEDLINE, International Pharmaceutical Abstracts, and sources of unpublished literature to November 1, 2014. Primary and secondary outcomes were complete and satisfactory recovery, respectively. To evaluate statistical heterogeneity, we performed subgroup analysis of baseline severity of Bell's palsy and between-study sensitivity analyses based on risk of allocation and detection bias. The 10 included randomized controlled trials (2419 patients; 807 with severe Bell's palsy at onset) had variable risk of bias, with 9 trials having a high risk of bias in at least 1 domain. Complete recovery was not statistically significantly greater with antiviral use versus no antiviral use in the random-effects meta-analysis of 6 trials (relative risk, 1.06; 95% confidence interval, 0.97-1.16; I(2) = 65%). Conversely, random-effects meta-analysis of 9 trials showed a statistically significant difference in satisfactory recovery (relative risk, 1.10; 95% confidence interval, 1.02-1.18; I(2) = 63%). Response to antiviral agents did not differ visually or statistically between patients with severe symptoms at baseline and those with milder disease (test for interaction, P = .11). Sensitivity analyses did not show a clear effect of bias on outcomes. Antiviral agents are not efficacious in increasing the proportion of patients with Bell's palsy who achieved complete recovery, regardless of baseline symptom severity. Copyright © 2015 Elsevier Inc. All rights reserved.
Early Warning Signs of Suicide in Service Members Who Engage in Unauthorized Acts of Violence
2016-06-01
observable to military law enforcement personnel. Statistical analyses tested for differences in warning signs between cases of suicide, violence, or...indicators, (2) Behavioral Change indicators, (3) Social indicators, and (4) Occupational indicators. Statistical analyses were conducted to test for...6 Coding _________________________________________________________________ 7 Statistical
[Statistical analysis using freely-available "EZR (Easy R)" software].
Kanda, Yoshinobu
2015-10-01
Clinicians must often perform statistical analyses for purposes such evaluating preexisting evidence and designing or executing clinical studies. R is a free software environment for statistical computing. R supports many statistical analysis functions, but does not incorporate a statistical graphical user interface (GUI). The R commander provides an easy-to-use basic-statistics GUI for R. However, the statistical function of the R commander is limited, especially in the field of biostatistics. Therefore, the author added several important statistical functions to the R commander and named it "EZR (Easy R)", which is now being distributed on the following website: http://www.jichi.ac.jp/saitama-sct/. EZR allows the application of statistical functions that are frequently used in clinical studies, such as survival analyses, including competing risk analyses and the use of time-dependent covariates and so on, by point-and-click access. In addition, by saving the script automatically created by EZR, users can learn R script writing, maintain the traceability of the analysis, and assure that the statistical process is overseen by a supervisor.
Predation and fragmentation portrayed in the statistical structure of prey time series
Hendrichsen, Ditte K; Topping, Chris J; Forchhammer, Mads C
2009-01-01
Background Statistical autoregressive analyses of direct and delayed density dependence are widespread in ecological research. The models suggest that changes in ecological factors affecting density dependence, like predation and landscape heterogeneity are directly portrayed in the first and second order autoregressive parameters, and the models are therefore used to decipher complex biological patterns. However, independent tests of model predictions are complicated by the inherent variability of natural populations, where differences in landscape structure, climate or species composition prevent controlled repeated analyses. To circumvent this problem, we applied second-order autoregressive time series analyses to data generated by a realistic agent-based computer model. The model simulated life history decisions of individual field voles under controlled variations in predator pressure and landscape fragmentation. Analyses were made on three levels: comparisons between predated and non-predated populations, between populations exposed to different types of predators and between populations experiencing different degrees of habitat fragmentation. Results The results are unambiguous: Changes in landscape fragmentation and the numerical response of predators are clearly portrayed in the statistical time series structure as predicted by the autoregressive model. Populations without predators displayed significantly stronger negative direct density dependence than did those exposed to predators, where direct density dependence was only moderately negative. The effects of predation versus no predation had an even stronger effect on the delayed density dependence of the simulated prey populations. In non-predated prey populations, the coefficients of delayed density dependence were distinctly positive, whereas they were negative in predated populations. Similarly, increasing the degree of fragmentation of optimal habitat available to the prey was accompanied with a shift in the delayed density dependence, from strongly negative to gradually becoming less negative. Conclusion We conclude that statistical second-order autoregressive time series analyses are capable of deciphering interactions within and across trophic levels and their effect on direct and delayed density dependence. PMID:19419539
Schaid, Daniel J
2010-01-01
Measures of genomic similarity are the basis of many statistical analytic methods. We review the mathematical and statistical basis of similarity methods, particularly based on kernel methods. A kernel function converts information for a pair of subjects to a quantitative value representing either similarity (larger values meaning more similar) or distance (smaller values meaning more similar), with the requirement that it must create a positive semidefinite matrix when applied to all pairs of subjects. This review emphasizes the wide range of statistical methods and software that can be used when similarity is based on kernel methods, such as nonparametric regression, linear mixed models and generalized linear mixed models, hierarchical models, score statistics, and support vector machines. The mathematical rigor for these methods is summarized, as is the mathematical framework for making kernels. This review provides a framework to move from intuitive and heuristic approaches to define genomic similarities to more rigorous methods that can take advantage of powerful statistical modeling and existing software. A companion paper reviews novel approaches to creating kernels that might be useful for genomic analyses, providing insights with examples [1]. Copyright © 2010 S. Karger AG, Basel.
How Historical Information Can Improve Extreme Value Analysis of Coastal Water Levels
NASA Astrophysics Data System (ADS)
Le Cozannet, G.; Bulteau, T.; Idier, D.; Lambert, J.; Garcin, M.
2016-12-01
The knowledge of extreme coastal water levels is useful for coastal flooding studies or the design of coastal defences. While deriving such extremes with standard analyses using tide gauge measurements, one often needs to deal with limited effective duration of observation which can result in large statistical uncertainties. This is even truer when one faces outliers, those particularly extreme values distant from the others. In a recent work (Bulteau et al., 2015), we investigated how historical information of past events reported in archives can reduce statistical uncertainties and relativize such outlying observations. We adapted a Bayesian Markov Chain Monte Carlo method, initially developed in the hydrology field (Reis and Stedinger, 2005), to the specific case of coastal water levels. We applied this method to the site of La Rochelle (France), where the storm Xynthia in 2010 generated a water level considered so far as an outlier. Based on 30 years of tide gauge measurements and 8 historical events since 1890, the results showed a significant decrease in statistical uncertainties on return levels when historical information is used. Also, Xynthia's water level no longer appeared as an outlier and we could have reasonably predicted the annual exceedance probability of that level beforehand (predictive probability for 2010 based on data until the end of 2009 of the same order of magnitude as the standard estimative probability using data until the end of 2010). Such results illustrate the usefulness of historical information in extreme value analyses of coastal water levels, as well as the relevance of the proposed method to integrate heterogeneous data in such analyses.
Barry, Samantha J; Pham, Tran N; Borman, Phil J; Edwards, Andrew J; Watson, Simon A
2012-01-27
The DMAIC (Define, Measure, Analyse, Improve and Control) framework and associated statistical tools have been applied to both identify and reduce variability observed in a quantitative (19)F solid-state NMR (SSNMR) analytical method. The method had been developed to quantify levels of an additional polymorph (Form 3) in batches of an active pharmaceutical ingredient (API), where Form 1 is the predominant polymorph. In order to validate analyses of the polymorphic form, a single batch of API was used as a standard each time the method was used. The level of Form 3 in this standard was observed to gradually increase over time, the effect not being immediately apparent due to method variability. In order to determine the cause of this unexpected increase and to reduce method variability, a risk-based statistical investigation was performed to identify potential factors which could be responsible for these effects. Factors identified by the risk assessment were investigated using a series of designed experiments to gain a greater understanding of the method. The increase of the level of Form 3 in the standard was primarily found to correlate with the number of repeat analyses, an effect not previously reported in SSNMR literature. Differences in data processing (phasing and linewidth) were found to be responsible for the variability in the method. After implementing corrective actions the variability was reduced such that the level of Form 3 was within an acceptable range of ±1% ww(-1) in fresh samples of API. Copyright © 2011. Published by Elsevier B.V.
Homeopathy: meta-analyses of pooled clinical data.
Hahn, Robert G
2013-01-01
In the first decade of the evidence-based era, which began in the mid-1990s, meta-analyses were used to scrutinize homeopathy for evidence of beneficial effects in medical conditions. In this review, meta-analyses including pooled data from placebo-controlled clinical trials of homeopathy and the aftermath in the form of debate articles were analyzed. In 1997 Klaus Linde and co-workers identified 89 clinical trials that showed an overall odds ratio of 2.45 in favor of homeopathy over placebo. There was a trend toward smaller benefit from studies of the highest quality, but the 10 trials with the highest Jadad score still showed homeopathy had a statistically significant effect. These results challenged academics to perform alternative analyses that, to demonstrate the lack of effect, relied on extensive exclusion of studies, often to the degree that conclusions were based on only 5-10% of the material, or on virtual data. The ultimate argument against homeopathy is the 'funnel plot' published by Aijing Shang's research group in 2005. However, the funnel plot is flawed when applied to a mixture of diseases, because studies with expected strong treatments effects are, for ethical reasons, powered lower than studies with expected weak or unclear treatment effects. To conclude that homeopathy lacks clinical effect, more than 90% of the available clinical trials had to be disregarded. Alternatively, flawed statistical methods had to be applied. Future meta-analyses should focus on the use of homeopathy in specific diseases or groups of diseases instead of pooling data from all clinical trials. © 2013 S. Karger GmbH, Freiburg.
DOT National Transportation Integrated Search
2011-03-01
"NHTSA selected the vehicle footprint (the measure of a vehicles wheelbase multiplied by its average track width) as the attribute upon which to base the CAFE standards for model year 2012-2016 passenger cars and light trucks. These standards are ...
Ramón, M; Martínez-Pastor, F
2018-04-23
Computer-aided sperm analysis (CASA) produces a wealth of data that is frequently ignored. The use of multiparametric statistical methods can help explore these datasets, unveiling the subpopulation structure of sperm samples. In this review we analyse the significance of the internal heterogeneity of sperm samples and its relevance. We also provide a brief description of the statistical tools used for extracting sperm subpopulations from the datasets, namely unsupervised clustering (with non-hierarchical, hierarchical and two-step methods) and the most advanced supervised methods, based on machine learning. The former method has allowed exploration of subpopulation patterns in many species, whereas the latter offering further possibilities, especially considering functional studies and the practical use of subpopulation analysis. We also consider novel approaches, such as the use of geometric morphometrics or imaging flow cytometry. Finally, although the data provided by CASA systems provides valuable information on sperm samples by applying clustering analyses, there are several caveats. Protocols for capturing and analysing motility or morphometry should be standardised and adapted to each experiment, and the algorithms should be open in order to allow comparison of results between laboratories. Moreover, we must be aware of new technology that could change the paradigm for studying sperm motility and morphology.
Jackson, Dan; Bowden, Jack
2016-09-07
Confidence intervals for the between study variance are useful in random-effects meta-analyses because they quantify the uncertainty in the corresponding point estimates. Methods for calculating these confidence intervals have been developed that are based on inverting hypothesis tests using generalised heterogeneity statistics. Whilst, under the random effects model, these new methods furnish confidence intervals with the correct coverage, the resulting intervals are usually very wide, making them uninformative. We discuss a simple strategy for obtaining 95 % confidence intervals for the between-study variance with a markedly reduced width, whilst retaining the nominal coverage probability. Specifically, we consider the possibility of using methods based on generalised heterogeneity statistics with unequal tail probabilities, where the tail probability used to compute the upper bound is greater than 2.5 %. This idea is assessed using four real examples and a variety of simulation studies. Supporting analytical results are also obtained. Our results provide evidence that using unequal tail probabilities can result in shorter 95 % confidence intervals for the between-study variance. We also show some further results for a real example that illustrates how shorter confidence intervals for the between-study variance can be useful when performing sensitivity analyses for the average effect, which is usually the parameter of primary interest. We conclude that using unequal tail probabilities when computing 95 % confidence intervals for the between-study variance, when using methods based on generalised heterogeneity statistics, can result in shorter confidence intervals. We suggest that those who find the case for using unequal tail probabilities convincing should use the '1-4 % split', where greater tail probability is allocated to the upper confidence bound. The 'width-optimal' interval that we present deserves further investigation.
Crimes against the elderly in Italy, 2007-2014.
Terranova, Claudio; Bevilacqua, Greta; Zen, Margherita; Montisci, Massimo
2017-08-01
Crimes against the elderly have physical, psychological, and economic consequences. Approaches for mitigating them must be based on comprehensive knowledge of the phenomenon. This study analyses crimes against the elderly in Italy during the period 2007-2014 from an epidemiological viewpoint. Data on violent and non-violent crimes derived from the Italian Institute of Statistics were analysed in relation to trends, gender and age by linear regression, T-test, and calculation of the odds ratio with a 95% confidence interval. Results show that the elderly are at higher risk of being victimized in two types of crime, violent (residential robbery) and non-violent (pick-pocketing and purse-snatching) compared with other age groups during the period considered. A statistically significant increase in residential robbery and pick-pocketing was also observed. The rate of homicide against the elderly was stable during the study period, in contrast with reduced rates in other age groups. These results may be explained by risk factors increasing the profiles of elderly individuals as potential victims, such as frailty, cognitive impairment, and social isolation. Further studies analysing the characteristics of victims are required. Based on the results presented here, appropriate preventive strategies should be planned to reduce crimes against the elderly. Copyright © 2017 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
Tang, An; Chen, Joshua; Le, Thuy-Anh; Changchien, Christopher; Hamilton, Gavin; Middleton, Michael S.; Loomba, Rohit; Sirlin, Claude B.
2014-01-01
Purpose To explore the cross-sectional and longitudinal relationships between fractional liver fat content, liver volume, and total liver fat burden. Methods In 43 adults with non-alcoholic steatohepatitis participating in a clinical trial, liver volume was estimated by segmentation of magnitude-based low-flip-angle multiecho GRE images. The liver mean proton density fat fraction (PDFF) was calculated. The total liver fat index (TLFI) was estimated as the product of liver mean PDFF and liver volume. Linear regression analyses were performed. Results Cross-sectional analyses revealed statistically significant relationships between TLFI and liver mean PDFF (R2 = 0.740 baseline/0.791 follow-up, P < 0.001 baseline/P < 0.001 follow-up), and between TLFI and liver volume (R2 = 0.352/0.452, P < 0.001/< 0.001). Longitudinal analyses revealed statistically significant relationships between liver volume change and liver mean PDFF change (R2 = 0.556, P < 0.001), between TLFI change and liver mean PDFF change (R2 = 0.920, P < 0.001), and between TLFI change and liver volume change (R2 = 0.735, P < 0.001). Conclusion Liver segmentation in combination with MRI-based PDFF estimation may be used to monitor liver volume, liver mean PDFF, and TLFI in a clinical trial. PMID:25015398
Su, Junhu; Ji, Weihong; Wei, Yanming; Zhang, Yanping; Gleeson, Dianne M; Lou, Zhongyu; Ren, Jing
2014-08-01
The endangered schizothoracine fish Gymnodiptychus pachycheilus is endemic to the Qinghai-Tibetan Plateau (QTP), but very little genetic information is available for this species. Here, we accessed the current genetic divergence of G. pachycheilus population to evaluate their distributions modulated by contemporary and historical processes. Population structure and demographic history were assessed by analyzing 1811-base pairs of mitochondrial DNA from 61 individuals across a large proportion of its geographic range. Our results revealed low nucleotide diversity, suggesting severe historical bottleneck events. Analyses of molecular variance and the conventional population statistic FST (0.0435, P = 0.0215) confirmed weak genetic structure. The monophyly of G. pachycheilus was statistically well-supported, while two divergent evolutionary clusters were identified by phylogenetic analyses, suggesting a microgeographic population structure. The consistent scenario of recent population expansion of two clusters was identified based on several complementary analyses of demographic history (0.096 Ma and 0.15 Ma). This genetic divergence and evolutionary process are likely to have resulted from a series of drainage arrangements triggered by the historical tectonic events of the region. The results obtained here provide the first insights into the evolutionary history and genetic status of this little-known fish.
Interim analyses in 2 x 2 crossover trials.
Cook, R J
1995-09-01
A method is presented for performing interim analyses in long term 2 x 2 crossover trials with serial patient entry. The analyses are based on a linear statistic that combines data from individuals observed for one treatment period with data from individuals observed for both periods. The coefficients in this linear combination can be chosen quite arbitrarily, but we focus on variance-based weights to maximize power for tests regarding direct treatment effects. The type I error rate of this procedure is controlled by utilizing the joint distribution of the linear statistics over analysis stages. Methods for performing power and sample size calculations are indicated. A two-stage sequential design involving simultaneous patient entry and a single between-period interim analysis is considered in detail. The power and average number of measurements required for this design are compared to those of the usual crossover trial. The results indicate that, while there is minimal loss in power relative to the usual crossover design in the absence of differential carry-over effects, the proposed design can have substantially greater power when differential carry-over effects are present. The two-stage crossover design can also lead to more economical studies in terms of the expected number of measurements required, due to the potential for early stopping. Attention is directed toward normally distributed responses.
NASA Astrophysics Data System (ADS)
Poulos, M. J.; Pierce, J. L.; McNamara, J. P.; Flores, A. N.; Benner, S. G.
2015-12-01
Terrain aspect alters the spatial distribution of insolation across topography, driving eco-pedo-hydro-geomorphic feedbacks that can alter landform evolution and result in valley asymmetries for a suite of land surface characteristics (e.g. slope length and steepness, vegetation, soil properties, and drainage development). Asymmetric valleys serve as natural laboratories for studying how landscapes respond to climate perturbation. In the semi-arid montane granodioritic terrain of the Idaho batholith, Northern Rocky Mountains, USA, prior works indicate that reduced insolation on northern (pole-facing) aspects prolongs snow pack persistence, and is associated with thicker, finer-grained soils, that retain more water, prolong the growing season, support coniferous forest rather than sagebrush steppe ecosystems, stabilize slopes at steeper angles, and produce sparser drainage networks. We hypothesize that the primary drivers of valley asymmetry development are changes in the pedon-scale water-balance that coalesce to alter catchment-scale runoff and drainage development, and ultimately cause the divide between north and south-facing land surfaces to migrate northward. We explore this conceptual framework by coupling land surface analyses with statistical modeling to assess relationships and the relative importance of land surface characteristics. Throughout the Idaho batholith, we systematically mapped and tabulated various statistical measures of landforms, land cover, and hydroclimate within discrete valley segments (n=~10,000). We developed a random forest based statistical model to predict valley slope asymmetry based upon numerous measures (n>300) of landscape asymmetries. Preliminary results suggest that drainages are tightly coupled with hillslopes throughout the region, with drainage-network slope being one of the strongest predictors of land-surface-averaged slope asymmetry. When slope-related statistics are excluded, due to possible autocorrelation, valley slope asymmetry is most strongly predicted by asymmetries of insolation and drainage density, which generally supports a water-balance based conceptual model of valley asymmetry development. Surprisingly, vegetation asymmetries had relatively low predictive importance.
NASA Astrophysics Data System (ADS)
Bierstedt, Svenja E.; Hünicke, Birgit; Zorita, Eduardo; Ludwig, Juliane
2017-07-01
We statistically analyse the relationship between the structure of migrating dunes in the southern Baltic and the driving wind conditions over the past 26 years, with the long-term aim of using migrating dunes as a proxy for past wind conditions at an interannual resolution. The present analysis is based on the dune record derived from geo-radar measurements by Ludwig et al. (2017). The dune system is located at the Baltic Sea coast of Poland and is migrating from west to east along the coast. The dunes present layers with different thicknesses that can be assigned to absolute dates at interannual timescales and put in relation to seasonal wind conditions. To statistically analyse this record and calibrate it as a wind proxy, we used a gridded regional meteorological reanalysis data set (coastDat2) covering recent decades. The identified link between the dune annual layers and wind conditions was additionally supported by the co-variability between dune layers and observed sea level variations in the southern Baltic Sea. We include precipitation and temperature into our analysis, in addition to wind, to learn more about the dependency between these three atmospheric factors and their common influence on the dune system. We set up a statistical linear model based on the correlation between the frequency of days with specific wind conditions in a given season and dune migration velocities derived for that season. To some extent, the dune records can be seen as analogous to tree-ring width records, and hence we use a proxy validation method usually applied in dendrochronology, cross-validation with the leave-one-out method, when the observational record is short. The revealed correlations between the wind record from the reanalysis and the wind record derived from the dune structure is in the range between 0.28 and 0.63, yielding similar statistical validation skill as dendroclimatological records.
Zhang, Harrison G; Ying, Gui-Shuang
2018-02-09
The aim of this study is to evaluate the current practice of statistical analysis of eye data in clinical science papers published in British Journal of Ophthalmology ( BJO ) and to determine whether the practice of statistical analysis has improved in the past two decades. All clinical science papers (n=125) published in BJO in January-June 2017 were reviewed for their statistical analysis approaches for analysing primary ocular measure. We compared our findings to the results from a previous paper that reviewed BJO papers in 1995. Of 112 papers eligible for analysis, half of the studies analysed the data at an individual level because of the nature of observation, 16 (14%) studies analysed data from one eye only, 36 (32%) studies analysed data from both eyes at ocular level, one study (1%) analysed the overall summary of ocular finding per individual and three (3%) studies used the paired comparison. Among studies with data available from both eyes, 50 (89%) of 56 papers in 2017 did not analyse data from both eyes or ignored the intereye correlation, as compared with in 60 (90%) of 67 papers in 1995 (P=0.96). Among studies that analysed data from both eyes at an ocular level, 33 (92%) of 36 studies completely ignored the intereye correlation in 2017, as compared with in 16 (89%) of 18 studies in 1995 (P=0.40). A majority of studies did not analyse the data properly when data from both eyes were available. The practice of statistical analysis did not improve in the past two decades. Collaborative efforts should be made in the vision research community to improve the practice of statistical analysis for ocular data. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Adapt-Mix: learning local genetic correlation structure improves summary statistics-based analyses
Park, Danny S.; Brown, Brielin; Eng, Celeste; Huntsman, Scott; Hu, Donglei; Torgerson, Dara G.; Burchard, Esteban G.; Zaitlen, Noah
2015-01-01
Motivation: Approaches to identifying new risk loci, training risk prediction models, imputing untyped variants and fine-mapping causal variants from summary statistics of genome-wide association studies are playing an increasingly important role in the human genetics community. Current summary statistics-based methods rely on global ‘best guess’ reference panels to model the genetic correlation structure of the dataset being studied. This approach, especially in admixed populations, has the potential to produce misleading results, ignores variation in local structure and is not feasible when appropriate reference panels are missing or small. Here, we develop a method, Adapt-Mix, that combines information across all available reference panels to produce estimates of local genetic correlation structure for summary statistics-based methods in arbitrary populations. Results: We applied Adapt-Mix to estimate the genetic correlation structure of both admixed and non-admixed individuals using simulated and real data. We evaluated our method by measuring the performance of two summary statistics-based methods: imputation and joint-testing. When using our method as opposed to the current standard of ‘best guess’ reference panels, we observed a 28% decrease in mean-squared error for imputation and a 73.7% decrease in mean-squared error for joint-testing. Availability and implementation: Our method is publicly available in a software package called ADAPT-Mix available at https://github.com/dpark27/adapt_mix. Contact: noah.zaitlen@ucsf.edu PMID:26072481
Using R-Project for Free Statistical Analysis in Extension Research
ERIC Educational Resources Information Center
Mangiafico, Salvatore S.
2013-01-01
One option for Extension professionals wishing to use free statistical software is to use online calculators, which are useful for common, simple analyses. A second option is to use a free computing environment capable of performing statistical analyses, like R-project. R-project is free, cross-platform, powerful, and respected, but may be…
NASA Astrophysics Data System (ADS)
Azila Che Musa, Nor; Mahmud, Zamalia; Baharun, Norhayati
2017-09-01
One of the important skills that is required from any student who are learning statistics is knowing how to solve statistical problems correctly using appropriate statistical methods. This will enable them to arrive at a conclusion and make a significant contribution and decision for the society. In this study, a group of 22 students majoring in statistics at UiTM Shah Alam were given problems relating to topics on testing of hypothesis which require them to solve the problems using confidence interval, traditional and p-value approach. Hypothesis testing is one of the techniques used in solving real problems and it is listed as one of the difficult concepts for students to grasp. The objectives of this study is to explore students’ perceived and actual ability in solving statistical problems and to determine which item in statistical problem solving that students find difficult to grasp. Students’ perceived and actual ability were measured based on the instruments developed from the respective topics. Rasch measurement tools such as Wright map and item measures for fit statistics were used to accomplish the objectives. Data were collected and analysed using Winsteps 3.90 software which is developed based on the Rasch measurement model. The results showed that students’ perceived themselves as moderately competent in solving the statistical problems using confidence interval and p-value approach even though their actual performance showed otherwise. Item measures for fit statistics also showed that the maximum estimated measures were found on two problems. These measures indicate that none of the students have attempted these problems correctly due to reasons which include their lack of understanding in confidence interval and probability values.
Liang, Li-Jung; Weiss, Robert E; Redelings, Benjamin; Suchard, Marc A
2009-10-01
Statistical analyses of phylogenetic data culminate in uncertain estimates of underlying model parameters. Lack of additional data hinders the ability to reduce this uncertainty, as the original phylogenetic dataset is often complete, containing the entire gene or genome information available for the given set of taxa. Informative priors in a Bayesian analysis can reduce posterior uncertainty; however, publicly available phylogenetic software specifies vague priors for model parameters by default. We build objective and informative priors using hierarchical random effect models that combine additional datasets whose parameters are not of direct interest but are similar to the analysis of interest. We propose principled statistical methods that permit more precise parameter estimates in phylogenetic analyses by creating informative priors for parameters of interest. Using additional sequence datasets from our lab or public databases, we construct a fully Bayesian semiparametric hierarchical model to combine datasets. A dynamic iteratively reweighted Markov chain Monte Carlo algorithm conveniently recycles posterior samples from the individual analyses. We demonstrate the value of our approach by examining the insertion-deletion (indel) process in the enolase gene across the Tree of Life using the phylogenetic software BALI-PHY; we incorporate prior information about indels from 82 curated alignments downloaded from the BAliBASE database.
Four modes of optical parametric operation for squeezed state generation
NASA Astrophysics Data System (ADS)
Andersen, U. L.; Buchler, B. C.; Lam, P. K.; Wu, J. W.; Gao, J. R.; Bachor, H.-A.
2003-11-01
We report a versatile instrument, based on a monolithic optical parametric amplifier, which reliably generates four different types of squeezed light. We obtained vacuum squeezing, low power amplitude squeezing, phase squeezing and bright amplitude squeezing. We show a complete analysis of this light, including a full quantum state tomography. In addition we demonstrate the direct detection of the squeezed state statistics without the aid of a spectrum analyser. This technique makes the nonclassical properties directly visible and allows complete measurement of the statistical moments of the squeezed quadrature.
Berlage, Silvia; Wenzlaff, Paul; Damm, Gabriele; Sens, Brigitte
2010-01-01
The concept of the "ZQ In-house Seminars" provided by external trainers/experts pursues the specific aim to enable all healthcare staff members of hospital departments to analyse statistical data--especially from external quality measurements--and to initiate in-hospital measures of quality improvement based on structured team work. The results of an evaluation in Lower Saxony for the period between 2004 and 2008 demonstrate a sustainable increase in outcome quality of care and a strengthening of team and process orientation in clinical care.
Statistical Emulation of Climate Model Projections Based on Precomputed GCM Runs*
Castruccio, Stefano; McInerney, David J.; Stein, Michael L.; ...
2014-02-24
The authors describe a new approach for emulating the output of a fully coupled climate model under arbitrary forcing scenarios that is based on a small set of precomputed runs from the model. Temperature and precipitation are expressed as simple functions of the past trajectory of atmospheric CO 2 concentrations, and a statistical model is fit using a limited set of training runs. The approach is demonstrated to be a useful and computationally efficient alternative to pattern scaling and captures the nonlinear evolution of spatial patterns of climate anomalies inherent in transient climates. The approach does as well as patternmore » scaling in all circumstances and substantially better in many; it is not computationally demanding; and, once the statistical model is fit, it produces emulated climate output effectively instantaneously. In conclusion, it may therefore find wide application in climate impacts assessments and other policy analyses requiring rapid climate projections.« less
When mechanism matters: Bayesian forecasting using models of ecological diffusion
Hefley, Trevor J.; Hooten, Mevin B.; Russell, Robin E.; Walsh, Daniel P.; Powell, James A.
2017-01-01
Ecological diffusion is a theory that can be used to understand and forecast spatio-temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white-tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression-based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.
2011-01-01
Background The International Multi-centre ADHD Genetics (IMAGE) project with 11 participating centres from 7 European countries and Israel has collected a large behavioural and genetic database for present and future research. Behavioural data were collected from 1068 probands with the combined type of attention deficit/hyperactivity disorder (ADHD-CT) and 1446 'unselected' siblings. The aim was to analyse the IMAGE sample with respect to demographic features (gender, age, family status, and recruiting centres) and psychopathological characteristics (diagnostic subtype, symptom frequencies, age at symptom detection, and comorbidities). A particular focus was on the effects of the study design and the diagnostic procedure on the homogeneity of the sample in terms of symptom-based behavioural data, and potential consequences for further analyses based on these data. Methods Diagnosis was based on the Parental Account of Childhood Symptoms (PACS) interview and the DSM-IV items of the Conners' teacher questionnaire. Demographics of the full sample and the homogeneity of a subsample (all probands) were analysed by using robust statistical procedures which were adjusted for unequal sample sizes and skewed distributions. These procedures included multi-way analyses based on trimmed means and winsorised variances as well as bootstrapping. Results Age and proband/sibling ratios differed between participating centres. There was no significant difference in the distribution of gender between centres. There was a significant interaction between age and centre for number of inattentive, but not number of hyperactive symptoms. Higher ADHD symptom frequencies were reported by parents than teachers. The diagnostic symptoms differed from each other in their frequencies. The face-to-face interview was more sensitive than the questionnaire. The differentiation between ADHD-CT probands and unaffected siblings was mainly due to differences in hyperactive/impulsive symptoms. Conclusions Despite a symptom-based standardized inclusion procedure according to DSM-IV criteria with defined symptom thresholds, centres may differ markedly in probands' ADHD symptom frequencies. Both the diagnostic procedure and the multi-centre design influence the behavioural characteristics of a sample and, thus, may bias statistical analyses, particularly in genetic or neurobehavioral studies. PMID:21473745
Computer-Based Model Calibration and Uncertainty Analysis: Terms and Concepts
2015-07-01
uncertainty analyses throughout the lifecycle of planning, designing, and operating of Civil Works flood risk management projects as described in...value 95% of the time. In the frequentist approach to PE, model parameters area regarded as having true values, and their estimate is based on the...in catchment models. 1. Evaluating parameter uncertainty. Water Resources Research 19(5):1151–1172. Lee, P. M. 2012. Bayesian statistics: An
Agent-Based Simulation and Analysis of a Defensive UAV Swarm Against an Enemy UAV Swarm
2011-06-01
de Investigacion, Programas y Desarrollo de la Armada Armada de Chile CHILE 10. CAPT Jeffrey Kline, USN(ret.) Naval Postgraduate School Monterey, California 91 ...this de - fensive swarm system, an agent-based simulation model is developed, and appropriate designs of experiments and statistical analyses are... de - velopment and implementation of counter UAV technology from readily-available commercial products. The organization leverages the “largest
Leontjevas, Ruslan; Gerritsen, Debby L; Koopmans, Raymond T C M; Smalbrugge, Martin; Vernooij-Dassen, Myrra J F J
2012-06-01
A multidisciplinary, evidence-based care program to improve the management of depression in nursing home residents was implemented and tested using a stepped-wedge design in 23 nursing homes (NHs): "Act in case of Depression" (AiD). Before effect analyses, to evaluate AiD process data on sampling quality (recruitment and randomization, reach) and intervention quality (relevance and feasibility, extent to which AiD was performed), which can be used for understanding internal and external validity. In this article, a model is presented that divides process evaluation data into first- and second-order process data. Qualitative and quantitative data based on personal files of residents, interviews of nursing home professionals, and a research database were analyzed according to the following process evaluation components: sampling quality and intervention quality. Nursing home. The pattern of residents' informed consent rates differed for dementia special care units and somatic units during the study. The nursing home staff was satisfied with the AiD program and reported that the program was feasible and relevant. With the exception of the first screening step (nursing staff members using a short observer-based depression scale), AiD components were not performed fully by NH staff as prescribed in the AiD protocol. Although NH staff found the program relevant and feasible and was satisfied with the program content, individual AiD components may have different feasibility. The results on sampling quality implied that statistical analyses of AiD effectiveness should account for the type of unit, whereas the findings on intervention quality implied that, next to the type of unit, analyses should account for the extent to which individual AiD program components were performed. In general, our first-order process data evaluation confirmed internal and external validity of the AiD trial, and this evaluation enabled further statistical fine tuning. The importance of evaluating the first-order process data before executing statistical effect analyses is thus underlined. Copyright © 2012 American Medical Directors Association, Inc. Published by Elsevier Inc. All rights reserved.
What do results from coordinate-based meta-analyses tell us?
Albajes-Eizagirre, Anton; Radua, Joaquim
2018-08-01
Coordinate-based meta-analyses (CBMA) methods, such as Activation Likelihood Estimation (ALE) and Seed-based d Mapping (SDM), have become an invaluable tool for summarizing the findings of voxel-based neuroimaging studies. However, the progressive sophistication of these methods may have concealed two particularities of their statistical tests. Common univariate voxelwise tests (such as the t/z-tests used in SPM and FSL) detect voxels that activate, or voxels that show differences between groups. Conversely, the tests conducted in CBMA test for "spatial convergence" of findings, i.e., they detect regions where studies report "more peaks than in most regions", regions that activate "more than most regions do", or regions that show "larger differences between groups than most regions do". The first particularity is that these tests rely on two spatial assumptions (voxels are independent and have the same probability to have a "false" peak), whose violation may make their results either conservative or liberal, though fortunately current versions of ALE, SDM and some other methods consider these assumptions. The second particularity is that the use of these tests involves an important paradox: the statistical power to detect a given effect is higher if there are no other effects in the brain, whereas lower in presence of multiple effects. Copyright © 2018 Elsevier Inc. All rights reserved.
Approach for Input Uncertainty Propagation and Robust Design in CFD Using Sensitivity Derivatives
NASA Technical Reports Server (NTRS)
Putko, Michele M.; Taylor, Arthur C., III; Newman, Perry A.; Green, Lawrence L.
2002-01-01
An implementation of the approximate statistical moment method for uncertainty propagation and robust optimization for quasi 3-D Euler CFD code is presented. Given uncertainties in statistically independent, random, normally distributed input variables, first- and second-order statistical moment procedures are performed to approximate the uncertainty in the CFD output. Efficient calculation of both first- and second-order sensitivity derivatives is required. In order to assess the validity of the approximations, these moments are compared with statistical moments generated through Monte Carlo simulations. The uncertainties in the CFD input variables are also incorporated into a robust optimization procedure. For this optimization, statistical moments involving first-order sensitivity derivatives appear in the objective function and system constraints. Second-order sensitivity derivatives are used in a gradient-based search to successfully execute a robust optimization. The approximate methods used throughout the analyses are found to be valid when considering robustness about input parameter mean values.
NASA Astrophysics Data System (ADS)
Tsutsumi, Morito; Seya, Hajime
2009-12-01
This study discusses the theoretical foundation of the application of spatial hedonic approaches—the hedonic approach employing spatial econometrics or/and spatial statistics—to benefits evaluation. The study highlights the limitations of the spatial econometrics approach since it uses a spatial weight matrix that is not employed by the spatial statistics approach. Further, the study presents empirical analyses by applying the Spatial Autoregressive Error Model (SAEM), which is based on the spatial econometrics approach, and the Spatial Process Model (SPM), which is based on the spatial statistics approach. SPMs are conducted based on both isotropy and anisotropy and applied to different mesh sizes. The empirical analysis reveals that the estimated benefits are quite different, especially between isotropic and anisotropic SPM and between isotropic SPM and SAEM; the estimated benefits are similar for SAEM and anisotropic SPM. The study demonstrates that the mesh size does not affect the estimated amount of benefits. Finally, the study provides a confidence interval for the estimated benefits and raises an issue with regard to benefit evaluation.
Across-cohort QC analyses of GWAS summary statistics from complex traits.
Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M
2016-01-01
Genome-wide association studies (GWASs) have been successful in discovering SNP trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information, and summary statistics and methods to investigate sample overlap. (I) We use the population genetics F st statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy.
Analysis of the dependence of extreme rainfalls
NASA Astrophysics Data System (ADS)
Padoan, Simone; Ancey, Christophe; Parlange, Marc
2010-05-01
The aim of spatial analysis is to quantitatively describe the behavior of environmental phenomena such as precipitation levels, wind speed or daily temperatures. A number of generic approaches to spatial modeling have been developed[1], but these are not necessarily ideal for handling extremal aspects given their focus on mean process levels. The areal modelling of the extremes of a natural process observed at points in space is important in environmental statistics; for example, understanding extremal spatial rainfall is crucial in flood protection. In light of recent concerns over climate change, the use of robust mathematical and statistical methods for such analyses has grown in importance. Multivariate extreme value models and the class of maxstable processes [2] have a similar asymptotic motivation to the univariate Generalized Extreme Value (GEV) distribution , but providing a general approach to modeling extreme processes incorporating temporal or spatial dependence. Statistical methods for max-stable processes and data analyses of practical problems are discussed by [3] and [4]. This work illustrates methods to the statistical modelling of spatial extremes and gives examples of their use by means of a real extremal data analysis of Switzerland precipitation levels. [1] Cressie, N. A. C. (1993). Statistics for Spatial Data. Wiley, New York. [2] de Haan, L and Ferreria A. (2006). Extreme Value Theory An Introduction. Springer, USA. [3] Padoan, S. A., Ribatet, M and Sisson, S. A. (2009). Likelihood-Based Inference for Max-Stable Processes. Journal of the American Statistical Association, Theory & Methods. In press. [4] Davison, A. C. and Gholamrezaee, M. (2009), Geostatistics of extremes. Journal of the Royal Statistical Society, Series B. To appear.
Across-cohort QC analyses of GWAS summary statistics from complex traits
Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M
2017-01-01
Genome-wide association studies (GWASs) have been successful in discovering SNP trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information, and summary statistics and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy. PMID:27552965
2013-01-01
Background The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely implemented for such analyses are vulnerable to several key demographic factors and deliver a poor statistical power for detecting genuine associations and also a high false positive rate. Here, we present a likelihood-based statistical approach that accounts properly for non-random nature of case–control samples in regard of genotypic distribution at the loci in populations under study and confers flexibility to test for genetic association in presence of different confounding factors such as population structure, non-randomness of samples etc. Results We implemented this novel method together with several popular methods in the literature of GWAS, to re-analyze recently published Parkinson’s disease (PD) case–control samples. The real data analysis and computer simulation show that the new method confers not only significantly improved statistical power for detecting the associations but also robustness to the difficulties stemmed from non-randomly sampling and genetic structures when compared to its rivals. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions of size < 1 Mb but only 6 SNPs in two of these regions were previously detected by the trend test based methods. It discovered two SNPs located 1.18 Mb and 0.18 Mb from the PD candidates, FGF20 and PARK8, without invoking false positive risk. Conclusions We developed a novel likelihood-based method which provides adequate estimation of LD and other population model parameters by using case and control samples, the ease in integration of these samples from multiple genetically divergent populations and thus confers statistically robust and powerful analyses of GWAS. On basis of simulation studies and analysis of real datasets, we demonstrated significant improvement of the new method over the non-parametric trend test, which is the most popularly implemented in the literature of GWAS. PMID:23394771
2008-01-01
There is an increasing need for students in the biological sciences to build a strong foundation in quantitative approaches to data analyses. Although most science, engineering, and math field majors are required to take at least one statistics course, statistical analysis is poorly integrated into undergraduate biology course work, particularly at the lower-division level. Elements of statistics were incorporated into an introductory biology course, including a review of statistics concepts and opportunity for students to perform statistical analysis in a biological context. Learning gains were measured with an 11-item statistics learning survey instrument developed for the course. Students showed a statistically significant 25% (p < 0.005) increase in statistics knowledge after completing introductory biology. Students improved their scores on the survey after completing introductory biology, even if they had previously completed an introductory statistics course (9%, improvement p < 0.005). Students retested 1 yr after completing introductory biology showed no loss of their statistics knowledge as measured by this instrument, suggesting that the use of statistics in biology course work may aid long-term retention of statistics knowledge. No statistically significant differences in learning were detected between male and female students in the study. PMID:18765754
Statistical analysis of fNIRS data: a comprehensive review.
Tak, Sungho; Ye, Jong Chul
2014-01-15
Functional near-infrared spectroscopy (fNIRS) is a non-invasive method to measure brain activities using the changes of optical absorption in the brain through the intact skull. fNIRS has many advantages over other neuroimaging modalities such as positron emission tomography (PET), functional magnetic resonance imaging (fMRI), or magnetoencephalography (MEG), since it can directly measure blood oxygenation level changes related to neural activation with high temporal resolution. However, fNIRS signals are highly corrupted by measurement noises and physiology-based systemic interference. Careful statistical analyses are therefore required to extract neuronal activity-related signals from fNIRS data. In this paper, we provide an extensive review of historical developments of statistical analyses of fNIRS signal, which include motion artifact correction, short source-detector separation correction, principal component analysis (PCA)/independent component analysis (ICA), false discovery rate (FDR), serially-correlated errors, as well as inference techniques such as the standard t-test, F-test, analysis of variance (ANOVA), and statistical parameter mapping (SPM) framework. In addition, to provide a unified view of various existing inference techniques, we explain a linear mixed effect model with restricted maximum likelihood (ReML) variance estimation, and show that most of the existing inference methods for fNIRS analysis can be derived as special cases. Some of the open issues in statistical analysis are also described. Copyright © 2013 Elsevier Inc. All rights reserved.
Statistical power analysis in wildlife research
Steidl, R.J.; Hayes, J.P.
1997-01-01
Statistical power analysis can be used to increase the efficiency of research efforts and to clarify research results. Power analysis is most valuable in the design or planning phases of research efforts. Such prospective (a priori) power analyses can be used to guide research design and to estimate the number of samples necessary to achieve a high probability of detecting biologically significant effects. Retrospective (a posteriori) power analysis has been advocated as a method to increase information about hypothesis tests that were not rejected. However, estimating power for tests of null hypotheses that were not rejected with the effect size observed in the study is incorrect; these power estimates will always be a??0.50 when bias adjusted and have no relation to true power. Therefore, retrospective power estimates based on the observed effect size for hypothesis tests that were not rejected are misleading; retrospective power estimates are only meaningful when based on effect sizes other than the observed effect size, such as those effect sizes hypothesized to be biologically significant. Retrospective power analysis can be used effectively to estimate the number of samples or effect size that would have been necessary for a completed study to have rejected a specific null hypothesis. Simply presenting confidence intervals can provide additional information about null hypotheses that were not rejected, including information about the size of the true effect and whether or not there is adequate evidence to 'accept' a null hypothesis as true. We suggest that (1) statistical power analyses be routinely incorporated into research planning efforts to increase their efficiency, (2) confidence intervals be used in lieu of retrospective power analyses for null hypotheses that were not rejected to assess the likely size of the true effect, (3) minimum biologically significant effect sizes be used for all power analyses, and (4) if retrospective power estimates are to be reported, then the I?-level, effect sizes, and sample sizes used in calculations must also be reported.
Application of multivariate statistical techniques in microbial ecology.
Paliy, O; Shankar, V
2016-03-01
Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. In particular, noticeable effect has been attained in the field of microbial ecology, where new experimental approaches provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure. © 2016 John Wiley & Sons Ltd.
Buttigieg, Pier Luigi; Ramette, Alban
2014-12-01
The application of multivariate statistical analyses has become a consistent feature in microbial ecology. However, many microbial ecologists are still in the process of developing a deep understanding of these methods and appreciating their limitations. As a consequence, staying abreast of progress and debate in this arena poses an additional challenge to many microbial ecologists. To address these issues, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME): a dynamic, web-based resource providing accessible descriptions of numerous multivariate techniques relevant to microbial ecologists. A combination of interactive elements allows users to discover and navigate between methods relevant to their needs and examine how they have been used by others in the field. We have designed GUSTA ME to become a community-led and -curated service, which we hope will provide a common reference and forum to discuss and disseminate analytical techniques relevant to the microbial ecology community. © 2014 The Authors. FEMS Microbiology Ecology published by John Wiley & Sons Ltd on behalf of Federation of European Microbiological Societies.
Maxwell, M; Howie, J G; Pryde, C J
1998-01-01
BACKGROUND: Prescribing matters (particularly budget setting and research into prescribing variation between doctors) have been handicapped by the absence of credible measures of the volume of drugs prescribed. AIM: To use the defined daily dose (DDD) method to study variation in the volume and cost of drugs prescribed across the seven main British National Formulary (BNF) chapters with a view to comparing different methods of setting prescribing budgets. METHOD: Study of one year of prescribing statistics from all 129 general practices in Lothian, covering 808,059 patients: analyses of prescribing statistics for 1995 to define volume and cost/volume of prescribing for one year for 10 groups of practices defined by the age and deprivation status of their patients, for seven BNF chapters; creation of prescribing budgets for 1996 for each individual practice based on the use of target volume and cost statistics; comparison of 1996 DDD-based budgets with those set using the conventional historical approach; and comparison of DDD-based budgets with budgets set using a capitation-based formula derived from local cost/patient information. RESULTS: The volume of drugs prescribed was affected by the age structure of the practices in BNF Chapters 1 (gastrointestinal), 2 (cardiovascular), and 6 (endocrine), and by deprivation structure for BNF Chapters 3 (respiratory) and 4 (central nervous system). Costs per DDD in the major BNF chapters were largely independent of age, deprivation structure, or fundholding status. Capitation and DDD-based budgets were similar to each other, but both differed substantially from historic budgets. One practice in seven gained or lost more than 100,000 Pounds per annum using DDD or capitation budgets compared with historic budgets. The DDD-based budget, but not the capitation-based budget, can be used to set volume-specific prescribing targets. CONCLUSIONS: DDD-based and capitation-based prescribing budgets can be set using a simple explanatory model and generalizable methods. In this study, both differed substantially from historic budgets. DDD budgets could be created to accommodate new prescribing strategies and raised or lowered to reflect local intentions to alter overall prescribing volume or cost targets. We recommend that future work on setting budgets and researching prescribing variations should be based on DDD statistics. PMID:10024703
A Comparison of Readability in Science-Based Texts: Implications for Elementary Teachers
ERIC Educational Resources Information Center
Gallagher, Tiffany; Fazio, Xavier; Ciampa, Katia
2017-01-01
Science curriculum standards were mapped onto various texts (literacy readers, trade books, online articles). Statistical analyses highlighted the inconsistencies among readability formulae for Grades 2-6 levels of the standards. There was a lack of correlation among the readability measures, and also when comparing different text sources. Online…
The Marketing of Canadian University Rankings: A Misadventure Now 24 Years Old
ERIC Educational Resources Information Center
Cramer, Kenneth M.; Page, Stewart; Burrows, Vanessa; Lamoureux, Chastine; Mackay, Sarah; Pedri, Victoria; Pschibul, Rebecca
2016-01-01
Based on analyses of Maclean's ranking data pertaining to Canadian universities published over the last 24 years, we present a summary of statistical findings of annual ranking exercises, as well as discussion about their current status and the effects upon student welfare. Some illustrative tables are also presented. Using correlational and…
An evaluation of the Regional Mercury Cycling Model (R-MCM, a steady-state fate and transport model used to simulate mercury concentrations in lakes) is presented based on its application to a series of 91 lakes in Vermont and New Hampshire. Visual and statistical analyses are pr...
Children's Health, Access to Services and Quality of Care. Revised Executive Summary.
ERIC Educational Resources Information Center
Dutton, Diana B.
This research investigated factors affecting children's health, based on empirical analyses of data from Washington, D.C. and national data. By most measures, poor children experience disproportionate morbidity and mortality. Yet certain ear and vision problems exhibit a U-shaped relation to family income in both national statistics and the…
A Comparison of Imputation Methods for Bayesian Factor Analysis Models
ERIC Educational Resources Information Center
Merkle, Edgar C.
2011-01-01
Imputation methods are popular for the handling of missing data in psychology. The methods generally consist of predicting missing data based on observed data, yielding a complete data set that is amiable to standard statistical analyses. In the context of Bayesian factor analysis, this article compares imputation under an unrestricted…
Technologies for Teaching and Learning about Box Plots and Statistical Analysis
ERIC Educational Resources Information Center
Forster, Patricia A.
2007-01-01
This paper analyses technology-based instruction on data-analysis with box plots. Examples of instruction taken from the research literature inform a study of two classes of 17 year-old students (upper secondary) in which the mathematical relationships that their teachers targeted are distinguished as being, or not being, relevant to statistical…
Based on preliminary statistical analyses of 29 reanalyzed (quantitative TEM) diverse elongated particle (EP) test samples from the well known and often cited study of Stanton et al. 1981, total surface area (TSA) of biodurable EPs was reported at the 2008 Johnson Conference to b...
Multilevel Motivation and Engagement: Assessing Construct Validity across Students and Schools
ERIC Educational Resources Information Center
Martin, Andrew J.; Malmberg, Lars-Erik; Liem, Gregory Arief D.
2010-01-01
Statistical biases associated with single-level analyses underscore the importance of partitioning variance/covariance matrices into individual and group levels. From a multilevel perspective based on data from 21,579 students in 58 high schools, the present study assesses the multilevel factor structure of motivation and engagement with a…
An earlier paper (Hattis et al., 2003) developed a quantitative likelihood-based statistical analysis of the differences in apparent sensitivity of rodents to mutagenic carcinogens across three life stages (fetal, birth-weaning, and weaning-60 days) relative to exposures in adult...
Perkins, Thomas John; Stokes, Mark Andrew; McGillivray, Jane Anne; Mussap, Alexander Julien; Cox, Ivanna Anne; Maller, Jerome Joseph; Bittar, Richard Garth
2014-11-30
There is evidence emerging from Diffusion Tensor Imaging (DTI) research that autism spectrum disorders (ASD) are associated with greater impairment in the left hemisphere. Although this has been quantified with volumetric region of interest analyses, it has yet to be tested with white matter integrity analysis. In the present study, tract based spatial statistics was used to contrast white matter integrity of 12 participants with high-functioning autism or Aspergers syndrome (HFA/AS) with 12 typically developing individuals. Fractional Anisotropy (FA) was examined, in addition to axial, radial and mean diffusivity (AD, RD and MD). In the left hemisphere, participants with HFA/AS demonstrated significantly reduced FA in predominantly thalamic and fronto-parietal pathways and increased RD. Symmetry analyses confirmed that in the HFA/AS group, WM disturbance was significantly greater in the left compared to right hemisphere. These findings contribute to a growing body of literature suggestive of reduced FA in ASD, and provide preliminary evidence for RD impairments in the left hemisphere. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Practice-based evidence study design for comparative effectiveness research.
Horn, Susan D; Gassaway, Julie
2007-10-01
To describe a new, rigorous, comprehensive practice-based evidence for clinical practice improvement (PBE-CPI) study methodology, and compare its features, advantages, and disadvantages to those of randomized controlled trials and sophisticated statistical methods for comparative effectiveness research. PBE-CPI incorporates natural variation within data from routine clinical practice to determine what works, for whom, when, and at what cost. It uses the knowledge of front-line caregivers, who develop study questions and define variables as part of a transdisciplinary team. Its comprehensive measurement framework provides a basis for analyses of significant bivariate and multivariate associations between treatments and outcomes, controlling for patient differences, such as severity of illness. PBE-CPI studies can uncover better practices more quickly than randomized controlled trials or sophisticated statistical methods, while achieving many of the same advantages. We present examples of actionable findings from PBE-CPI studies in postacute care settings related to comparative effectiveness of medications, nutritional support approaches, incontinence products, physical therapy activities, and other services. Outcomes improved when practices associated with better outcomes in PBE-CPI analyses were adopted in practice.
Meta-analysis of magnitudes, differences and variation in evolutionary parameters.
Morrissey, M B
2016-10-01
Meta-analysis is increasingly used to synthesize major patterns in the large literatures within ecology and evolution. Meta-analytic methods that do not account for the process of observing data, which we may refer to as 'informal meta-analyses', may have undesirable properties. In some cases, informal meta-analyses may produce results that are unbiased, but do not necessarily make the best possible use of available data. In other cases, unbiased statistical noise in individual reports in the literature can potentially be converted into severe systematic biases in informal meta-analyses. I first present a general description of how failure to account for noise in individual inferences should be expected to lead to biases in some kinds of meta-analysis. In particular, informal meta-analyses of quantities that reflect the dispersion of parameters in nature, for example, the mean absolute value of a quantity, are likely to be generally highly misleading. I then re-analyse three previously published informal meta-analyses, where key inferences were of aspects of the dispersion of values in nature, for example, the mean absolute value of selection gradients. Major biological conclusions in each original informal meta-analysis closely match those that could arise as artefacts due to statistical noise. I present alternative mixed-model-based analyses that are specifically tailored to each situation, but where all analyses may be implemented with widely available open-source software. In each example meta-re-analysis, major conclusions change substantially. © 2016 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2016 European Society For Evolutionary Biology.
1992-10-01
N=8) and Results of 44 Statistical Analyses for Impact Test Performed on Forefoot of Unworn Footwear A-2. Summary Statistics (N=8) and Results of...on Forefoot of Worn Footwear Vlll Tables (continued) Table Page B-2. Summary Statistics (N=4) and Results of 76 Statistical Analyses for Impact...used tests to assess heel and forefoot shock absorption, upper and sole durability, and flexibility (Cavanagh, 1978). Later, the number of tests was
Page, Matthew J; McKenzie, Joanne E; Kirkham, Jamie; Dwan, Kerry; Kramer, Sharon; Green, Sally; Forbes, Andrew
2014-10-01
Systematic reviews may be compromised by selective inclusion and reporting of outcomes and analyses. Selective inclusion occurs when there are multiple effect estimates in a trial report that could be included in a particular meta-analysis (e.g. from multiple measurement scales and time points) and the choice of effect estimate to include in the meta-analysis is based on the results (e.g. statistical significance, magnitude or direction of effect). Selective reporting occurs when the reporting of a subset of outcomes and analyses in the systematic review is based on the results (e.g. a protocol-defined outcome is omitted from the published systematic review). To summarise the characteristics and synthesise the results of empirical studies that have investigated the prevalence of selective inclusion or reporting in systematic reviews of randomised controlled trials (RCTs), investigated the factors (e.g. statistical significance or direction of effect) associated with the prevalence and quantified the bias. We searched the Cochrane Methodology Register (to July 2012), Ovid MEDLINE, Ovid EMBASE, Ovid PsycINFO and ISI Web of Science (each up to May 2013), and the US Agency for Healthcare Research and Quality (AHRQ) Effective Healthcare Program's Scientific Resource Center (SRC) Methods Library (to June 2013). We also searched the abstract books of the 2011 and 2012 Cochrane Colloquia and the article alerts for methodological work in research synthesis published from 2009 to 2011 and compiled in Research Synthesis Methods. We included both published and unpublished empirical studies that investigated the prevalence and factors associated with selective inclusion or reporting, or both, in systematic reviews of RCTs of healthcare interventions. We included empirical studies assessing any type of selective inclusion or reporting, such as investigations of how frequently RCT outcome data is selectively included in systematic reviews based on the results, outcomes and analyses are discrepant between protocol and published review or non-significant outcomes are partially reported in the full text or summary within systematic reviews. Two review authors independently selected empirical studies for inclusion, extracted the data and performed a risk of bias assessment. A third review author resolved any disagreements about inclusion or exclusion of empirical studies, data extraction and risk of bias. We contacted authors of included studies for additional unpublished data. Primary outcomes included overall prevalence of selective inclusion or reporting, association between selective inclusion or reporting and the statistical significance of the effect estimate, and association between selective inclusion or reporting and the direction of the effect estimate. We combined prevalence estimates and risk ratios (RRs) using a random-effects meta-analysis model. Seven studies met the inclusion criteria. No studies had investigated selective inclusion of results in systematic reviews, or discrepancies in outcomes and analyses between systematic review registry entries and published systematic reviews. Based on a meta-analysis of four studies (including 485 Cochrane Reviews), 38% (95% confidence interval (CI) 23% to 54%) of systematic reviews added, omitted, upgraded or downgraded at least one outcome between the protocol and published systematic review. The association between statistical significance and discrepant outcome reporting between protocol and published systematic review was uncertain. The meta-analytic estimate suggested an increased risk of adding or upgrading (i.e. changing a secondary outcome to primary) when the outcome was statistically significant, although the 95% CI included no association and a decreased risk as plausible estimates (RR 1.43, 95% CI 0.71 to 2.85; two studies, n = 552 meta-analyses). Also, the meta-analytic estimate suggested an increased risk of downgrading (i.e. changing a primary outcome to secondary) when the outcome was statistically significant, although the 95% CI included no association and a decreased risk as plausible estimates (RR 1.26, 95% CI 0.60 to 2.62; two studies, n = 484 meta-analyses). None of the included studies had investigated whether the association between statistical significance and adding, upgrading or downgrading of outcomes was modified by the type of comparison, direction of effect or type of outcome; or whether there is an association between direction of the effect estimate and discrepant outcome reporting.Several secondary outcomes were reported in the included studies. Two studies found that reasons for discrepant outcome reporting were infrequently reported in published systematic reviews (6% in one study and 22% in the other). One study (including 62 Cochrane Reviews) found that 32% (95% CI 21% to 45%) of systematic reviews did not report all primary outcomes in the abstract. Another study (including 64 Cochrane and 118 non-Cochrane reviews) found that statistically significant primary outcomes were more likely to be completely reported in the systematic review abstract than non-significant primary outcomes (RR 2.66, 95% CI 1.81 to 3.90). None of the studies included systematic reviews published after 2009 when reporting standards for systematic reviews (Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) Statement, and Methodological Expectations of Cochrane Intervention Reviews (MECIR)) were disseminated, so the results might not be generalisable to more recent systematic reviews. Discrepant outcome reporting between the protocol and published systematic review is fairly common, although the association between statistical significance and discrepant outcome reporting is uncertain. Complete reporting of outcomes in systematic review abstracts is associated with statistical significance of the results for those outcomes. Systematic review outcomes and analysis plans should be specified prior to seeing the results of included studies to minimise post-hoc decisions that may be based on the observed results. Modifications that occur once the review has commenced, along with their justification, should be clearly reported. Effect estimates and CIs should be reported for all systematic review outcomes regardless of the results. The lack of research on selective inclusion of results in systematic reviews needs to be addressed and studies that avoid the methodological weaknesses of existing research are also needed.
Total mercury in infant food, occurrence and exposure assessment in Portugal.
Martins, Carla; Vasco, Elsa; Paixão, Eleonora; Alvito, Paula
2013-01-01
Commercial infant food labelled as from organic and conventional origin (n = 87) was analysed for total mercury content using a direct mercury analyser (DMA). Median contents of mercury were 0.50, 0.50 and 0.40 μg kg⁻¹ for processed cereal-based food, infant formulae and baby foods, respectively, with a maximum value of 19.56 μg kg⁻¹ in a baby food containing fish. Processed cereal-based food samples showed statistically significant differences for mercury content between organic and conventional analysed products. Consumption of commercial infant food analysed did not pose a risk to infants when the provisionally tolerable weekly intake (PTWI) for food other than fish and shellfish was considered. By the contrary, a risk to some infants could not be excluded when using the PTWI for fish and shellfish. This is the first study reporting contents of total mercury in commercial infant food from both farming systems and the first on exposure assessment of children to mercury in Portugal.
First Monte Carlo analysis of fragmentation functions from single-inclusive e + e - annihilation
Sato, Nobuo; Ethier, J. J.; Melnitchouk, W.; ...
2016-12-02
Here, we perform the first iterative Monte Carlo (IMC) analysis of fragmentation functions constrained by all available data from single-inclusive $e^+ e^-$ annihilation into pions and kaons. The IMC method eliminates potential bias in traditional analyses based on single fits introduced by fixing parameters not well contrained by the data, and provides a statistically rigorous determination of uncertainties. Our analysis reveals specific features of fragmentation functions using the new IMC methodology and those obtained from previous analyses, especially for light quarks and for strange quark fragmentation to kaons.
Periodicity in marine extinction events
NASA Technical Reports Server (NTRS)
Sepkoski, J. John, Jr.; Raup, David M.
1986-01-01
The periodicity of extinction events is examined in detail. In particular, the temporal distribution of specific, identifiable extinction events is analyzed. The nature and limitations of the data base on the global fossil record is discussed in order to establish limits of resolution in statistical analyses. Peaks in extinction intensity which appear to differ significantly from background levels are considered, and new analyses of the temporal distribution of these peaks are presented. Finally, some possible causes of periodicity and of interdependence among extinction events over the last quarter billion years of earth history are examined.
Predicting clinical trial results based on announcements of interim analyses
2014-01-01
Background Announcements of interim analyses of a clinical trial convey information about the results beyond the trial’s Data Safety Monitoring Board (DSMB). The amount of information conveyed may be minimal, but the fact that none of the trial’s stopping boundaries has been crossed implies that the experimental therapy is neither extremely effective nor hopeless. Predicting success of the ongoing trial is of interest to the trial’s sponsor, the medical community, pharmaceutical companies, and investors. We determine the probability of trial success by quantifying only the publicly available information from interim analyses of an ongoing trial. We illustrate our method in the context of the National Surgical Adjuvant Breast and Bowel (NSABP) trial, C-08. Methods We simulated trials based on the specifics of the NSABP C-08 protocol that were publicly available. We quantified the uncertainty around the treatment effect using prior weights for the various possibilities in light of other colon cancer studies and other studies of the investigational agent, bevacizumab. We considered alternative prior distributions. Results Subsequent to the trial’s third interim analysis, our predictive probabilities were: that the trial would eventually be successful, 48.0%; would stop for futility, 7.4%; and would continue to completion without statistical significance, 44.5%. The actual trial continued to completion without statistical significance. Conclusions Announcements of interim analyses provide information outside the DSMB’s sphere of confidentiality. This information is potentially helpful to clinical trial prognosticators. ‘Information leakage’ from standard interim analyses such as in NSABP C-08 is conventionally viewed as acceptable even though it may be quite revealing. Whether leakage from more aggressive types of adaptations is acceptable should be assessed at the design stage. PMID:24607270
NASA Astrophysics Data System (ADS)
Lørup, Jens Kristian; Refsgaard, Jens Christian; Mazvimavi, Dominic
1998-03-01
The purpose of this study was to identify and assess long-term impacts of land use change on catchment runoff in semi-arid Zimbabwe, based on analyses of long hydrological time series (25-50 years) from six medium-sized (200-1000 km 2) non-experimental rural catchments. A methodology combining common statistical methods with hydrological modelling was adopted in order to distinguish between the effects of climate variability and the effects of land use change. The hydrological model (NAM) was in general able to simulate the observed hydrographs very well during the reference period, thus providing a means to account for the effects of climate variability and hence strengthening the power of the subsequent statistical tests. In the test period the validated model was used to provide the runoff record which would have occurred in the absence of land use change. The analyses indicated a decrease in the annual runoff for most of the six catchments, with the largest changes occurring for catchments located within communal land, where large increases in population and agricultural intensity have taken place. However, the decrease was only statistically significant at the 5% level for one of the catchments.
Evidence-based orthodontics. Current statistical trends in published articles in one journal.
Law, Scott V; Chudasama, Dipak N; Rinchuse, Donald J
2010-09-01
To ascertain the number, type, and overall usage of statistics in American Journal of Orthodontics and Dentofacial (AJODO) articles for 2008. These data were then compared to data from three previous years: 1975, 1985, and 2003. The frequency and distribution of statistics used in the AJODO original articles for 2008 were dichotomized into those using statistics and those not using statistics. Statistical procedures were then broadly divided into descriptive statistics (mean, standard deviation, range, percentage) and inferential statistics (t-test, analysis of variance). Descriptive statistics were used to make comparisons. In 1975, 1985, 2003, and 2008, AJODO published 72, 87, 134, and 141 original articles, respectively. The percentage of original articles using statistics was 43.1% in 1975, 75.9% in 1985, 94.0% in 2003, and 92.9% in 2008; original articles using statistics stayed relatively the same from 2003 to 2008, with only a small 1.1% decrease. The percentage of articles using inferential statistical analyses was 23.7% in 1975, 74.2% in 1985, 92.9% in 2003, and 84.4% in 2008. Comparing AJODO publications in 2003 and 2008, there was an 8.5% increase in the use of descriptive articles (from 7.1% to 15.6%), and there was an 8.5% decrease in articles using inferential statistics (from 92.9% to 84.4%).
NASA Astrophysics Data System (ADS)
El Naqa, I.; Suneja, G.; Lindsay, P. E.; Hope, A. J.; Alaly, J. R.; Vicic, M.; Bradley, J. D.; Apte, A.; Deasy, J. O.
2006-11-01
Radiotherapy treatment outcome models are a complicated function of treatment, clinical and biological factors. Our objective is to provide clinicians and scientists with an accurate, flexible and user-friendly software tool to explore radiotherapy outcomes data and build statistical tumour control or normal tissue complications models. The software tool, called the dose response explorer system (DREES), is based on Matlab, and uses a named-field structure array data type. DREES/Matlab in combination with another open-source tool (CERR) provides an environment for analysing treatment outcomes. DREES provides many radiotherapy outcome modelling features, including (1) fitting of analytical normal tissue complication probability (NTCP) and tumour control probability (TCP) models, (2) combined modelling of multiple dose-volume variables (e.g., mean dose, max dose, etc) and clinical factors (age, gender, stage, etc) using multi-term regression modelling, (3) manual or automated selection of logistic or actuarial model variables using bootstrap statistical resampling, (4) estimation of uncertainty in model parameters, (5) performance assessment of univariate and multivariate analyses using Spearman's rank correlation and chi-square statistics, boxplots, nomograms, Kaplan-Meier survival plots, and receiver operating characteristics curves, and (6) graphical capabilities to visualize NTCP or TCP prediction versus selected variable models using various plots. DREES provides clinical researchers with a tool customized for radiotherapy outcome modelling. DREES is freely distributed. We expect to continue developing DREES based on user feedback.
Krystkowiak, Izabella; Manguy, Jean; Davey, Norman E
2018-06-05
There is a pressing need for in silico tools that can aid in the identification of the complete repertoire of protein binding (SLiMs, MoRFs, miniMotifs) and modification (moiety attachment/removal, isomerization, cleavage) motifs. We have created PSSMSearch, an interactive web-based tool for rapid statistical modeling, visualization, discovery and annotation of protein motif specificity determinants to discover novel motifs in a proteome-wide manner. PSSMSearch analyses proteomes for regions with significant similarity to a motif specificity determinant model built from a set of aligned motif-containing peptides. Multiple scoring methods are available to build a position-specific scoring matrix (PSSM) describing the motif specificity determinant model. This model can then be modified by a user to add prior knowledge of specificity determinants through an interactive PSSM heatmap. PSSMSearch includes a statistical framework to calculate the significance of specificity determinant model matches against a proteome of interest. PSSMSearch also includes the SLiMSearch framework's annotation, motif functional analysis and filtering tools to highlight relevant discriminatory information. Additional tools to annotate statistically significant shared keywords and GO terms, or experimental evidence of interaction with a motif-recognizing protein have been added. Finally, PSSM-based conservation metrics have been created for taxonomic range analyses. The PSSMSearch web server is available at http://slim.ucd.ie/pssmsearch/.
Slob, Wout
2006-07-01
Probabilistic dietary exposure assessments that are fully based on Monte Carlo sampling from the raw intake data may not be appropriate. This paper shows that the data should first be analysed by using a statistical model that is able to take the various dimensions of food consumption patterns into account. A (parametric) model is discussed that takes into account the interindividual variation in (daily) consumption frequencies, as well as in amounts consumed. Further, the model can be used to include covariates, such as age, sex, or other individual attributes. Some illustrative examples show how this model may be used to estimate the probability of exceeding an (acute or chronic) exposure limit. These results are compared with the results based on directly counting the fraction of observed intakes exceeding the limit value. This comparison shows that the latter method is not adequate, in particular for the acute exposure situation. A two-step approach for probabilistic (acute) exposure assessment is proposed: first analyse the consumption data by a (parametric) statistical model as discussed in this paper, and then use Monte Carlo techniques for combining the variation in concentrations with the variation in consumption (by sampling from the statistical model). This approach results in an estimate of the fraction of the population as a function of the fraction of days at which the exposure limit is exceeded by the individual.
NASA Astrophysics Data System (ADS)
ten Veldhuis, Marie-Claire; Schleiss, Marc
2017-04-01
Urban catchments are typically characterised by a more flashy nature of the hydrological response compared to natural catchments. Predicting flow changes associated with urbanisation is not straightforward, as they are influenced by interactions between impervious cover, basin size, drainage connectivity and stormwater management infrastructure. In this study, we present an alternative approach to statistical analysis of hydrological response variability and basin flashiness, based on the distribution of inter-amount times. We analyse inter-amount time distributions of high-resolution streamflow time series for 17 (semi-)urbanised basins in North Carolina, USA, ranging from 13 to 238 km2 in size. We show that in the inter-amount-time framework, sampling frequency is tuned to the local variability of the flow pattern, resulting in a different representation and weighting of high and low flow periods in the statistical distribution. This leads to important differences in the way the distribution quantiles, mean, coefficient of variation and skewness vary across scales and results in lower mean intermittency and improved scaling. Moreover, we show that inter-amount-time distributions can be used to detect regulation effects on flow patterns, identify critical sampling scales and characterise flashiness of hydrological response. The possibility to use both the classical approach and the inter-amount-time framework to identify minimum observable scales and analyse flow data opens up interesting areas for future research.
Ge, Long; Tian, Jin-hui; Li, Xiu-xia; Song, Fujian; Li, Lun; Zhang, Jun; Li, Ge; Pei, Gai-qin; Qiu, Xia; Yang, Ke-hu
2016-01-01
Because of the methodological complexity of network meta-analyses (NMAs), NMAs may be more vulnerable to methodological risks than conventional pair-wise meta-analysis. Our study aims to investigate epidemiology characteristics, conduction of literature search, methodological quality and reporting of statistical analysis process in the field of cancer based on PRISMA extension statement and modified AMSTAR checklist. We identified and included 102 NMAs in the field of cancer. 61 NMAs were conducted using a Bayesian framework. Of them, more than half of NMAs did not report assessment of convergence (60.66%). Inconsistency was assessed in 27.87% of NMAs. Assessment of heterogeneity in traditional meta-analyses was more common (42.62%) than in NMAs (6.56%). Most of NMAs did not report assessment of similarity (86.89%) and did not used GRADE tool to assess quality of evidence (95.08%). 43 NMAs were adjusted indirect comparisons, the methods used were described in 53.49% NMAs. Only 4.65% NMAs described the details of handling of multi group trials and 6.98% described the methods of similarity assessment. The median total AMSTAR-score was 8.00 (IQR: 6.00–8.25). Methodological quality and reporting of statistical analysis did not substantially differ by selected general characteristics. Overall, the quality of NMAs in the field of cancer was generally acceptable. PMID:27848997
Szyda, Joanna; Liu, Zengting; Zatoń-Dobrowolska, Magdalena; Wierzbicki, Heliodor; Rzasa, Anna
2008-01-01
We analysed data from a selective DNA pooling experiment with 130 individuals of the arctic fox (Alopex lagopus), which originated from 2 different types regarding body size. The association between alleles of 6 selected unlinked molecular markers and body size was tested by using univariate and multinomial logistic regression models, applying odds ratio and test statistics from the power divergence family. Due to the small sample size and the resulting sparseness of the data table, in hypothesis testing we could not rely on the asymptotic distributions of the tests. Instead, we tried to account for data sparseness by (i) modifying confidence intervals of odds ratio; (ii) using a normal approximation of the asymptotic distribution of the power divergence tests with different approaches for calculating moments of the statistics; and (iii) assessing P values empirically, based on bootstrap samples. As a result, a significant association was observed for 3 markers. Furthermore, we used simulations to assess the validity of the normal approximation of the asymptotic distribution of the test statistics under the conditions of small and sparse samples.
40 CFR 91.512 - Request for public hearing.
Code of Federal Regulations, 2010 CFR
2010-07-01
... plans and statistical analyses have been properly applied (specifically, whether sampling procedures and statistical analyses specified in this subpart were followed and whether there exists a basis for... will be made available to the public during Agency business hours. ...
Jin, Zhichao; Yu, Danghui; Zhang, Luoman; Meng, Hong; Lu, Jian; Gao, Qingbin; Cao, Yang; Ma, Xiuqiang; Wu, Cheng; He, Qian; Wang, Rui; He, Jia
2010-05-25
High quality clinical research not only requires advanced professional knowledge, but also needs sound study design and correct statistical analyses. The number of clinical research articles published in Chinese medical journals has increased immensely in the past decade, but study design quality and statistical analyses have remained suboptimal. The aim of this investigation was to gather evidence on the quality of study design and statistical analyses in clinical researches conducted in China for the first decade of the new millennium. Ten (10) leading Chinese medical journals were selected and all original articles published in 1998 (N = 1,335) and 2008 (N = 1,578) were thoroughly categorized and reviewed. A well-defined and validated checklist on study design, statistical analyses, results presentation, and interpretation was used for review and evaluation. Main outcomes were the frequencies of different types of study design, error/defect proportion in design and statistical analyses, and implementation of CONSORT in randomized clinical trials. From 1998 to 2008: The error/defect proportion in statistical analyses decreased significantly ( = 12.03, p<0.001), 59.8% (545/1,335) in 1998 compared to 52.2% (664/1,578) in 2008. The overall error/defect proportion of study design also decreased ( = 21.22, p<0.001), 50.9% (680/1,335) compared to 42.40% (669/1,578). In 2008, design with randomized clinical trials remained low in single digit (3.8%, 60/1,578) with two-third showed poor results reporting (defects in 44 papers, 73.3%). Nearly half of the published studies were retrospective in nature, 49.3% (658/1,335) in 1998 compared to 48.2% (761/1,578) in 2008. Decreases in defect proportions were observed in both results presentation ( = 93.26, p<0.001), 92.7% (945/1,019) compared to 78.2% (1023/1,309) and interpretation ( = 27.26, p<0.001), 9.7% (99/1,019) compared to 4.3% (56/1,309), some serious ones persisted. Chinese medical research seems to have made significant progress regarding statistical analyses, but there remains ample room for improvement regarding study designs. Retrospective clinical studies are the most often used design, whereas randomized clinical trials are rare and often show methodological weaknesses. Urgent implementation of the CONSORT statement is imperative.
NASA Astrophysics Data System (ADS)
Mills, Jada Jamerson
There is a need for STEM (science, technology, engineering, and mathematics) education to be taught effectively in elementary schools. In order to achieve this, teacher preparation programs should graduate confident, content strong teachers to convey knowledge to elementary students. This study used interdisciplinary collaboration between the School of Education and the College of Liberal Arts through a Learning-by-Teaching method (LdL): Lernen durch Lernen in German. Pre-service teacher (PST) achievement levels of understanding science concepts based on pretest and posttest data, quality of lesson plans developed, and enjoyment of the class based on the collaboration with science students. The PSTs enrolled in two treatment sections of EDEL 404: Science in the Elementary Classroom collaborated with science students enrolled in BISC 327: Introductory Neuroscience to enhance their science skills and create case-based lesson plans on neurothology topics: echolocation, electrosensory reception, steroid hormones, and vocal learning. The PSTs enrolled in the single control section of EDEL 404 collaborated with fellow elementary education majors to develop lesson plans also based on the same selected topics. Qualitative interviews of education faculty, science faculty, and PSTs provided depth to the quantitative findings. Upon lesson plan completion, in-service teachers also graded the two best and two worst plans for the treatment and control sections and a science reviewer graded the plans for scientific accuracy. Statistical analyses were conducted for hypotheses, and one significant hypothesis found that PSTs who collaborated with science students had more positive science lesson plan writing attitudes than those who did not. Despite overall insignificant statistical analyses, all PSTs responded as more confident after collaboration. Additionally, interviews provided meaning and understanding to the insignificant statistical results as well as scientific accuracy of the lesson plans.
ERIC Educational Resources Information Center
Cafri, Guy; Kromrey, Jeffrey D.; Brannick, Michael T.
2010-01-01
This article uses meta-analyses published in "Psychological Bulletin" from 1995 to 2005 to describe meta-analyses in psychology, including examination of statistical power, Type I errors resulting from multiple comparisons, and model choice. Retrospective power estimates indicated that univariate categorical and continuous moderators, individual…
Akiki, Teddy J; Averill, Christopher L; Wrocklage, Kristen M; Scott, J Cobb; Averill, Lynnette A; Schweinsburg, Brian; Alexander-Bloch, Aaron; Martini, Brenda; Southwick, Steven M; Krystal, John H; Abdallah, Chadi G
2018-08-01
Disruption in the default mode network (DMN) has been implicated in numerous neuropsychiatric disorders, including posttraumatic stress disorder (PTSD). However, studies have largely been limited to seed-based methods and involved inconsistent definitions of the DMN. Recent advances in neuroimaging and graph theory now permit the systematic exploration of intrinsic brain networks. In this study, we used resting-state functional magnetic resonance imaging (fMRI), diffusion MRI, and graph theoretical analyses to systematically examine the DMN connectivity and its relationship with PTSD symptom severity in a cohort of 65 combat-exposed US Veterans. We employed metrics that index overall connectivity strength, network integration (global efficiency), and network segregation (clustering coefficient). Then, we conducted a modularity and network-based statistical analysis to identify DMN regions of particular importance in PTSD. Finally, structural connectivity analyses were used to probe whether white matter abnormalities are associated with the identified functional DMN changes. We found decreased DMN functional connectivity strength to be associated with increased PTSD symptom severity. Further topological characterization suggests decreased functional integration and increased segregation in subjects with severe PTSD. Modularity analyses suggest a spared connectivity in the posterior DMN community (posterior cingulate, precuneus, angular gyrus) despite overall DMN weakened connections with increasing PTSD severity. Edge-wise network-based statistical analyses revealed a prefrontal dysconnectivity. Analysis of the diffusion networks revealed no alterations in overall strength or prefrontal structural connectivity. DMN abnormalities in patients with severe PTSD symptoms are characterized by decreased overall interconnections. On a finer scale, we found a pattern of prefrontal dysconnectivity, but increased cohesiveness in the posterior DMN community and relative sparing of connectivity in this region. The DMN measures established in this study may serve as a biomarker of disease severity and could have potential utility in developing circuit-based therapeutics. Published by Elsevier Inc.
A Fast Framework for Abrupt Change Detection Based on Binary Search Trees and Kolmogorov Statistic
Qi, Jin-Peng; Qi, Jie; Zhang, Qing
2016-01-01
Change-Point (CP) detection has attracted considerable attention in the fields of data mining and statistics; it is very meaningful to discuss how to quickly and efficiently detect abrupt change from large-scale bioelectric signals. Currently, most of the existing methods, like Kolmogorov-Smirnov (KS) statistic and so forth, are time-consuming, especially for large-scale datasets. In this paper, we propose a fast framework for abrupt change detection based on binary search trees (BSTs) and a modified KS statistic, named BSTKS (binary search trees and Kolmogorov statistic). In this method, first, two binary search trees, termed as BSTcA and BSTcD, are constructed by multilevel Haar Wavelet Transform (HWT); second, three search criteria are introduced in terms of the statistic and variance fluctuations in the diagnosed time series; last, an optimal search path is detected from the root to leaf nodes of two BSTs. The studies on both the synthetic time series samples and the real electroencephalograph (EEG) recordings indicate that the proposed BSTKS can detect abrupt change more quickly and efficiently than KS, t-statistic (t), and Singular-Spectrum Analyses (SSA) methods, with the shortest computation time, the highest hit rate, the smallest error, and the highest accuracy out of four methods. This study suggests that the proposed BSTKS is very helpful for useful information inspection on all kinds of bioelectric time series signals. PMID:27413364
A Fast Framework for Abrupt Change Detection Based on Binary Search Trees and Kolmogorov Statistic.
Qi, Jin-Peng; Qi, Jie; Zhang, Qing
2016-01-01
Change-Point (CP) detection has attracted considerable attention in the fields of data mining and statistics; it is very meaningful to discuss how to quickly and efficiently detect abrupt change from large-scale bioelectric signals. Currently, most of the existing methods, like Kolmogorov-Smirnov (KS) statistic and so forth, are time-consuming, especially for large-scale datasets. In this paper, we propose a fast framework for abrupt change detection based on binary search trees (BSTs) and a modified KS statistic, named BSTKS (binary search trees and Kolmogorov statistic). In this method, first, two binary search trees, termed as BSTcA and BSTcD, are constructed by multilevel Haar Wavelet Transform (HWT); second, three search criteria are introduced in terms of the statistic and variance fluctuations in the diagnosed time series; last, an optimal search path is detected from the root to leaf nodes of two BSTs. The studies on both the synthetic time series samples and the real electroencephalograph (EEG) recordings indicate that the proposed BSTKS can detect abrupt change more quickly and efficiently than KS, t-statistic (t), and Singular-Spectrum Analyses (SSA) methods, with the shortest computation time, the highest hit rate, the smallest error, and the highest accuracy out of four methods. This study suggests that the proposed BSTKS is very helpful for useful information inspection on all kinds of bioelectric time series signals.
de Haan, Bianca; Karnath, Hans-Otto
2017-12-01
Nowadays, different anatomical atlases exist for the anatomical interpretation of the results from neuroimaging and lesion analysis studies that investigate the contribution of white matter fiber tract integrity to cognitive (dys)function. A major problem with the use of different atlases in different studies, however, is that the anatomical interpretation of neuroimaging and lesion analysis results might vary as a function of the atlas used. This issue might be particularly prominent in studies that investigate the contribution of white matter fiber tract integrity to cognitive (dys)function. We used a single large-sample dataset of right brain damaged stroke patients with and without cognitive deficit (here: spatial neglect) to systematically compare the influence of three different, widely-used white matter fiber tract atlases (1 histology-based atlas and 2 DTI tractography-based atlases) on conclusions concerning the involvement of white matter fiber tracts in the pathogenesis of cognitive dysfunction. We both calculated the overlap between the statistical lesion analysis results and each long association fiber tract (topological analyses) and performed logistic regressions on the extent of fiber tract damage in each individual for each long association white matter fiber tract (hodological analyses). For the topological analyses, our results suggest that studies that use tractography-based atlases are more likely to conclude that white matter integrity is critical for a cognitive (dys)function than studies that use a histology-based atlas. The DTI tractography-based atlases classified approximately 10 times as many voxels of the statistical map as being located in a long association white matter fiber tract than the histology-based atlas. For hodological analyses on the other hand, we observed that the conclusions concerning the overall importance of long association fiber tract integrity to cognitive function do not necessarily depend on the white matter atlas used, but conclusions may vary as a function of atlas used at the level of individual fiber tracts. Moreover, these analyses revealed that hodological studies that express the individual extent of injury to each fiber tract as a binomial variable are more likely to conclude that white matter integrity is critical for a cognitive function than studies that express the individual extent of injury to each fiber tract as a continuous variable. Copyright © 2017 Elsevier Inc. All rights reserved.
Algorithm for Identifying Erroneous Rain-Gauge Readings
NASA Technical Reports Server (NTRS)
Rickman, Doug
2005-01-01
An algorithm analyzes rain-gauge data to identify statistical outliers that could be deemed to be erroneous readings. Heretofore, analyses of this type have been performed in burdensome manual procedures that have involved subjective judgements. Sometimes, the analyses have included computational assistance for detecting values falling outside of arbitrary limits. The analyses have been performed without statistically valid knowledge of the spatial and temporal variations of precipitation within rain events. In contrast, the present algorithm makes it possible to automate such an analysis, makes the analysis objective, takes account of the spatial distribution of rain gauges in conjunction with the statistical nature of spatial variations in rainfall readings, and minimizes the use of arbitrary criteria. The algorithm implements an iterative process that involves nonparametric statistics.
Citation of previous meta-analyses on the same topic: a clue to perpetuation of incorrect methods?
Li, Tianjing; Dickersin, Kay
2013-06-01
Systematic reviews and meta-analyses serve as a basis for decision-making and clinical practice guidelines and should be carried out using appropriate methodology to avoid incorrect inferences. We describe the characteristics, statistical methods used for meta-analyses, and citation patterns of all 21 glaucoma systematic reviews we identified pertaining to the effectiveness of prostaglandin analog eye drops in treating primary open-angle glaucoma, published between December 2000 and February 2012. We abstracted data, assessed whether appropriate statistical methods were applied in meta-analyses, and examined citation patterns of included reviews. We identified two forms of problematic statistical analyses in 9 of the 21 systematic reviews examined. Except in 1 case, none of the 9 reviews that used incorrect statistical methods cited a previously published review that used appropriate methods. Reviews that used incorrect methods were cited 2.6 times more often than reviews that used appropriate statistical methods. We speculate that by emulating the statistical methodology of previous systematic reviews, systematic review authors may have perpetuated incorrect approaches to meta-analysis. The use of incorrect statistical methods, perhaps through emulating methods described in previous research, calls conclusions of systematic reviews into question and may lead to inappropriate patient care. We urge systematic review authors and journal editors to seek the advice of experienced statisticians before undertaking or accepting for publication a systematic review and meta-analysis. The author(s) have no proprietary or commercial interest in any materials discussed in this article. Copyright © 2013 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression.
Chen, Yanguang
2016-01-01
In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of Durbin-Watson's statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran's index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 China's regions. These results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test.
Fluid Intake and Decreased Risk for Hospitalization for Dengue Fever, Nicaragua
Pérez, Leonel; Phares, Christina R.; Pérez, Maria de los Angeles; Idiaquez, Wendy; Rocha, Julio; Cuadra, Ricardo; Hernandez, Emelina; Campos, Luisa Amanda; Gonzalez, Alcides; Amador, Juan Jose; Balmaseda, Angel
2003-01-01
In a hospital and health center-based study in Nicaragua, fluid intake during the 24 hours before being seen by a clinician was statistically associated with decreased risk for hospitalization of dengue fever patients. Similar results were obtained for children <15 years of age and older adolescents and adults in independent analyses. PMID:12967502
ERIC Educational Resources Information Center
Young, Tabitha L.; Gutierrez, Daniel; Hagedorn, W. Bryce
2013-01-01
This study investigated the relationships between motivational interviewing (MI) and client symptoms, attendance, and satisfaction. Seventy-nine clients attending a university-based counseling center were purposefully assigned to treatment or control conditions. Statistical analyses revealed client symptoms in both groups improved. However,…
ERIC Educational Resources Information Center
Polly, Drew; Wang, Chuang; Martin, Christie; Lambert, Richard; Pugalee, David; Middleton, Catherina
2018-01-01
This study examined the influence of a professional development project about an internet-based mathematics formative assessment tool and related pedagogies on primary teachers' instruction and student achievement. Teachers participated in 72 h of professional development during the year. Descriptive statistics and multivariate analyses of…
The Development of Rhythm at the Age of 6-11 Years: Non-Pitch Rhythmic Improvisation
ERIC Educational Resources Information Center
Paananen, Pirkko
2006-01-01
In the statistical and transcriptional analyses reported in this exploratory study, original rhythms of 6-11-year-old children (N=36) were examined. The hypotheses were based on a new model of musical development, and tested empirically using non-pitch rhythmic improvisation in a MIDI-environment. Several representational types were found in…
Three Studies on the Leadership Behaviors of Academic Deans in Higher Education
ERIC Educational Resources Information Center
Brower, Rebecca
2013-01-01
This three article mixed methods dissertation is titled "Three Studies on the Leadership Behaviors of Academic Deans in Higher Education." Each article is based on a sample of 51 academic deans from a three state region in the Southeastern United States. In the first study, the results of the statistical analyses reinforce the gender…
Denis Valle; Benjamin Baiser; Christopher W. Woodall; Robin Chazdon; Jerome Chave
2014-01-01
We propose a novel multivariate method to analyse biodiversity data based on the Latent Dirichlet Allocation (LDA) model. LDA, a probabilistic model, reduces assemblages to sets of distinct component communities. It produces easily interpretable results, can represent abrupt and gradual changes in composition, accommodates missing data and allows for coherent estimates...
ERIC Educational Resources Information Center
Shukla, Dinesh K.; Keehn, Brandon; Muller, Ralph-Axel
2011-01-01
Background: Previous diffusion tensor imaging (DTI) studies have shown white matter compromise in children and adults with autism spectrum disorder (ASD), which may relate to reduced connectivity and impaired function of distributed networks. However, tract-specific evidence remains limited in ASD. We applied tract-based spatial statistics (TBSS)…
Robert E. Keane; Matthew G. Rollins; Cecilia H. McNicoll; Russell A. Parsons
2002-01-01
Presented is a prototype of the Landscape Ecosystem Inventory System (LEIS), a system for creating maps of important landscape characteristics for natural resource planning. This system uses gradient-based field inventories coupled with gradient modeling remote sensing, ecosystem simulation, and statistical analyses to derive spatial data layers required for ecosystem...
Squared Euclidean distance: a statistical test to evaluate plant community change
Raymond D. Ratliff; Sylvia R. Mori
1993-01-01
The concepts and a procedure for evaluating plant community change using the squared Euclidean distance (SED) resemblance function are described. Analyses are based on the concept that Euclidean distances constitute a sample from a population of distances between sampling units (SUs) for a specific number of times and SUs. With different times, the distances will be...
USDA-ARS?s Scientific Manuscript database
Salmonella enterica is the leading etiologic agent of bacterial foodborne outbreaks worldwide. Methods. Laboratory-based statistical surveillance, molecular and genomics analyses were applied to characterize Salmonella outbreaks pattern in Israel. 65,087 Salmonella isolates reported to the National ...
Testing Structural Models of DSM-IV Symptoms of Common Forms of Child and Adolescent Psychopathology
ERIC Educational Resources Information Center
Lahey, Benjamin B.; Rathouz, Paul J.; Van Hulle, Carol; Urbano, Richard C.; Krueger, Robert F.; Applegate, Brooks; Garriock, Holly A.; Chapman, Derek A.; Waldman, Irwin D.
2008-01-01
Confirmatory factor analyses were conducted of "Diagnostic and Statistical Manual of Mental Disorders", Fourth Edition (DSM-IV) symptoms of common mental disorders derived from structured interviews of a representative sample of 4,049 twin children and adolescents and their adult caretakers. A dimensional model based on the assignment of symptoms…
USDA-ARS?s Scientific Manuscript database
Previous phylogenetic analyses of species of Streptomyces based on 16S rRNA gene sequences resulted in a statistically well-supported clade (100% bootstrap value) containing 8 species that exhibited very similar gross morphology in producing open looped (Retinaculum-Apertum) to spiral (Spira) chains...
Statistical analyses of commercial vehicle accident factors. Volume 1 Part 1
DOT National Transportation Integrated Search
1978-02-01
Procedures for conducting statistical analyses of commercial vehicle accidents have been established and initially applied. A file of some 3,000 California Highway Patrol accident reports from two areas of California during a period of about one year...
40 CFR 90.712 - Request for public hearing.
Code of Federal Regulations, 2010 CFR
2010-07-01
... sampling plans and statistical analyses have been properly applied (specifically, whether sampling procedures and statistical analyses specified in this subpart were followed and whether there exists a basis... Clerk and will be made available to the public during Agency business hours. ...
NASA Astrophysics Data System (ADS)
Kahveci, Ajda; Kahveci, Murat; Mansour, Nasser; Alarfaj, Maher Mohammed
2017-06-01
Teachers play a key role in moving reform-based science education practices into the classroom. Based on research that emphasizes the importance of teachers' affective states, this study aimed to explore the constructs pedagogical discontentment, science teaching self-efficacy, intentions to reform, and their correlations. Also, it aimed to provide empirical evidence in light of a previously proposed theoretical model while focusing on an entirely new context in Middle East. Data were collected in Saudi Arabia with a total of randomly selected 994 science teachers, 656 of whom were females and 338 were males. To collect the data, the Arabic versions of the Science Teachers' Pedagogical Discontentment scale, the Science Teaching Efficacy Beliefs Instrument and the Intentions to Reform Science Teaching scale were developed. For assuring the validity of the instruments in a non-Western context, rigorous cross-cultural validations procedures were followed. Factor analyses were conducted for construct validation and descriptive statistical analyses were performed including frequency distributions and normality checks. Univariate analyses of variance were run to explore statistically significant differences between groups of teachers. Cross-tabulation and correlation analyses were conducted to explore relationships. The findings suggest effect of teacher characteristics such as age and professional development program attendance on the affective states. The results demonstrate that teachers who attended a relatively higher number of programs had lower level of intentions to reform raising issues regarding the conduct and outcomes of professional development. Some of the findings concerning interrelationships among the three constructs challenge and serve to expand the previously proposed theoretical model.
Implementation errors in the GingerALE Software: Description and recommendations.
Eickhoff, Simon B; Laird, Angela R; Fox, P Mickle; Lancaster, Jack L; Fox, Peter T
2017-01-01
Neuroscience imaging is a burgeoning, highly sophisticated field the growth of which has been fostered by grant-funded, freely distributed software libraries that perform voxel-wise analyses in anatomically standardized three-dimensional space on multi-subject, whole-brain, primary datasets. Despite the ongoing advances made using these non-commercial computational tools, the replicability of individual studies is an acknowledged limitation. Coordinate-based meta-analysis offers a practical solution to this limitation and, consequently, plays an important role in filtering and consolidating the enormous corpus of functional and structural neuroimaging results reported in the peer-reviewed literature. In both primary data and meta-analytic neuroimaging analyses, correction for multiple comparisons is a complex but critical step for ensuring statistical rigor. Reports of errors in multiple-comparison corrections in primary-data analyses have recently appeared. Here, we report two such errors in GingerALE, a widely used, US National Institutes of Health (NIH)-funded, freely distributed software package for coordinate-based meta-analysis. These errors have given rise to published reports with more liberal statistical inferences than were specified by the authors. The intent of this technical report is threefold. First, we inform authors who used GingerALE of these errors so that they can take appropriate actions including re-analyses and corrective publications. Second, we seek to exemplify and promote an open approach to error management. Third, we discuss the implications of these and similar errors in a scientific environment dependent on third-party software. Hum Brain Mapp 38:7-11, 2017. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
van der Krieke, Lian; Emerencia, Ando C; Bos, Elisabeth H; Rosmalen, Judith Gm; Riese, Harriëtte; Aiello, Marco; Sytema, Sjoerd; de Jonge, Peter
2015-08-07
Health promotion can be tailored by combining ecological momentary assessments (EMA) with time series analysis. This combined method allows for studying the temporal order of dynamic relationships among variables, which may provide concrete indications for intervention. However, application of this method in health care practice is hampered because analyses are conducted manually and advanced statistical expertise is required. This study aims to show how this limitation can be overcome by introducing automated vector autoregressive modeling (VAR) of EMA data and to evaluate its feasibility through comparisons with results of previously published manual analyses. We developed a Web-based open source application, called AutoVAR, which automates time series analyses of EMA data and provides output that is intended to be interpretable by nonexperts. The statistical technique we used was VAR. AutoVAR tests and evaluates all possible VAR models within a given combinatorial search space and summarizes their results, thereby replacing the researcher's tasks of conducting the analysis, making an informed selection of models, and choosing the best model. We compared the output of AutoVAR to the output of a previously published manual analysis (n=4). An illustrative example consisting of 4 analyses was provided. Compared to the manual output, the AutoVAR output presents similar model characteristics and statistical results in terms of the Akaike information criterion, the Bayesian information criterion, and the test statistic of the Granger causality test. Results suggest that automated analysis and interpretation of times series is feasible. Compared to a manual procedure, the automated procedure is more robust and can save days of time. These findings may pave the way for using time series analysis for health promotion on a larger scale. AutoVAR was evaluated using the results of a previously conducted manual analysis. Analysis of additional datasets is needed in order to validate and refine the application for general use.
Jahn, I; Foraita, R
2008-01-01
In Germany gender-sensitive approaches are part of guidelines for good epidemiological practice as well as health reporting. They are increasingly claimed to realize the gender mainstreaming strategy in research funding by the federation and federal states. This paper focuses on methodological aspects of data analysis, as an empirical data example of which serves the health report of Bremen, a population-based cross-sectional study. Health reporting requires analysis and reporting methods that are able to discover sex/gender issues of questions, on the one hand, and consider how results can adequately be communicated, on the other hand. The core question is: Which consequences do a different inclusion of the category sex in different statistical analyses for identification of potential target groups have on the results? As evaluation methods logistic regressions as well as a two-stage procedure were exploratively conducted. This procedure combines graphical models with CHAID decision trees and allows for visualising complex results. Both methods are analysed by stratification as well as adjusted by sex/gender and compared with each other. As a result, only stratified analyses are able to detect differences between the sexes and within the sex/gender groups as long as one cannot resort to previous knowledge. Adjusted analyses can detect sex/gender differences only if interaction terms have been included in the model. Results are discussed from a statistical-epidemiological perspective as well as in the context of health reporting. As a conclusion, the question, if a statistical method is gender-sensitive, can only be answered by having concrete research questions and known conditions. Often, an appropriate statistic procedure can be chosen after conducting a separate analysis for women and men. Future gender studies deserve innovative study designs as well as conceptual distinctiveness with regard to the biological and the sociocultural elements of the category sex/gender.
Emerencia, Ando C; Bos, Elisabeth H; Rosmalen, Judith GM; Riese, Harriëtte; Aiello, Marco; Sytema, Sjoerd; de Jonge, Peter
2015-01-01
Background Health promotion can be tailored by combining ecological momentary assessments (EMA) with time series analysis. This combined method allows for studying the temporal order of dynamic relationships among variables, which may provide concrete indications for intervention. However, application of this method in health care practice is hampered because analyses are conducted manually and advanced statistical expertise is required. Objective This study aims to show how this limitation can be overcome by introducing automated vector autoregressive modeling (VAR) of EMA data and to evaluate its feasibility through comparisons with results of previously published manual analyses. Methods We developed a Web-based open source application, called AutoVAR, which automates time series analyses of EMA data and provides output that is intended to be interpretable by nonexperts. The statistical technique we used was VAR. AutoVAR tests and evaluates all possible VAR models within a given combinatorial search space and summarizes their results, thereby replacing the researcher’s tasks of conducting the analysis, making an informed selection of models, and choosing the best model. We compared the output of AutoVAR to the output of a previously published manual analysis (n=4). Results An illustrative example consisting of 4 analyses was provided. Compared to the manual output, the AutoVAR output presents similar model characteristics and statistical results in terms of the Akaike information criterion, the Bayesian information criterion, and the test statistic of the Granger causality test. Conclusions Results suggest that automated analysis and interpretation of times series is feasible. Compared to a manual procedure, the automated procedure is more robust and can save days of time. These findings may pave the way for using time series analysis for health promotion on a larger scale. AutoVAR was evaluated using the results of a previously conducted manual analysis. Analysis of additional datasets is needed in order to validate and refine the application for general use. PMID:26254160
Valdez-Flores, Ciriaco; Sielken, Robert L; Teta, M Jane
2010-04-01
The most recent epidemiological data on individual workers in the NIOSH and updated UCC occupational studies have been used to characterize the potential excess cancer risks of environmental exposure to ethylene oxide (EO). In addition to refined analyses of the separate cohorts, power has been increased by analyzing the combined cohorts. In previous SMR analyses of the separate studies and the present analyses of the updated and pooled studies of over 19,000 workers, none of the SMRs for any combination of the 12 cancer endpoints and six sub-cohorts analyzed were statistically significantly greater than one including the ones of greatest previous interest: leukemia, lymphohematopoietic tissue, lymphoid tumors, NHL, and breast cancer. In our study, no evidence of a positive cumulative exposure-response relationship was found. Fitted Cox proportional hazards models with cumulative EO exposure do not have statistically significant positive slopes. The lack of increasing trends was corroborated by categorical analyses. Cox model estimates of the concentrations corresponding to a 1-in-a-million extra environmental cancer risk are all greater than approximately 1ppb and are more than 1500-fold greater than the 0.4ppt estimate in the 2006 EPA draft IRIS risk assessment. The reasons for this difference are identified and discussed. Copyright 2009 Elsevier Inc. All rights reserved.
Supply Chain Collaboration: Information Sharing in a Tactical Operating Environment
2013-06-01
architecture, there are four tiers: Client (Web Application Clients ), Presentation (Web-Server), Processing (Application-Server), Data (Database...organization in each period. This data will be collected to analyze. i) Analyses and Validation: We will do a statistics test in this data, Pareto ...notes, outstanding deliveries, and inventory. i) Analyses and Validation: We will do a statistics test in this data, Pareto analyses and confirmation
Controlling false-negative errors in microarray differential expression analysis: a PRIM approach.
Cole, Steve W; Galic, Zoran; Zack, Jerome A
2003-09-22
Theoretical considerations suggest that current microarray screening algorithms may fail to detect many true differences in gene expression (Type II analytic errors). We assessed 'false negative' error rates in differential expression analyses by conventional linear statistical models (e.g. t-test), microarray-adapted variants (e.g. SAM, Cyber-T), and a novel strategy based on hold-out cross-validation. The latter approach employs the machine-learning algorithm Patient Rule Induction Method (PRIM) to infer minimum thresholds for reliable change in gene expression from Boolean conjunctions of fold-induction and raw fluorescence measurements. Monte Carlo analyses based on four empirical data sets show that conventional statistical models and their microarray-adapted variants overlook more than 50% of genes showing significant up-regulation. Conjoint PRIM prediction rules recover approximately twice as many differentially expressed transcripts while maintaining strong control over false-positive (Type I) errors. As a result, experimental replication rates increase and total analytic error rates decline. RT-PCR studies confirm that gene inductions detected by PRIM but overlooked by other methods represent true changes in mRNA levels. PRIM-based conjoint inference rules thus represent an improved strategy for high-sensitivity screening of DNA microarrays. Freestanding JAVA application at http://microarray.crump.ucla.edu/focus
Raupp, Ludimila; Fávaro, Thatiana Regina; Cunha, Geraldo Marcelo; Santos, Ricardo Ventura
2017-01-01
The aims of this study were to analyze and describe the presence and infrastructure of basic sanitation in the urban areas of Brazil, contrasting indigenous with non-indigenous households. Methods: A cross-sectional study based on microdata from the 2010 Census was conducted. The analyses were based on descriptive statistics (prevalence) and the construction of multiple logistic regression models (adjusted by socioeconomic and demographic covariates). The odds ratios were estimated for the association between the explanatory variables (covariates) and the outcome variables (water supply, sewage, garbage collection, and adequate sanitation). The statistical significance level established was 5%. Among the analyzed services, sewage proved to be the most precarious. Regarding race or color, indigenous households presented the lowest rate of sanitary infrastructure in Urban Brazil. The adjusted regression showed that, in general, indigenous households were at a disadvantage when compared to other categories of race or color, especially in terms of the presence of garbage collection services. These inequalities were much more pronounced in the South and Southeastern regions. The analyses of this study not only confirm the profile of poor conditions and infrastructure of the basic sanitation of indigenous households in urban areas, but also demonstrate the persistence of inequalities associated with race or color in the country.
Lucas, Rico; Groeneveld, Jürgen; Harms, Hauke; Johst, Karin; Frank, Karin; Kleinsteuber, Sabine
2017-01-01
In times of global change and intensified resource exploitation, advanced knowledge of ecophysiological processes in natural and engineered systems driven by complex microbial communities is crucial for both safeguarding environmental processes and optimising rational control of biotechnological processes. To gain such knowledge, high-throughput molecular techniques are routinely employed to investigate microbial community composition and dynamics within a wide range of natural or engineered environments. However, for molecular dataset analyses no consensus about a generally applicable alpha diversity concept and no appropriate benchmarking of corresponding statistical indices exist yet. To overcome this, we listed criteria for the appropriateness of an index for such analyses and systematically scrutinised commonly employed ecological indices describing diversity, evenness and richness based on artificial and real molecular datasets. We identified appropriate indices warranting interstudy comparability and intuitive interpretability. The unified diversity concept based on 'effective numbers of types' provides the mathematical framework for describing community composition. Additionally, the Bray-Curtis dissimilarity as a beta-diversity index was found to reflect compositional changes. The employed statistical procedure is presented comprising commented R-scripts and example datasets for user-friendly trial application. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
[Statistics for statistics?--Thoughts about psychological tools].
Berger, Uwe; Stöbel-Richter, Yve
2007-12-01
Statistical methods take a prominent place among psychologists' educational programs. Being known as difficult to understand and heavy to learn, students fear of these contents. Those, who do not aspire after a research carrier at the university, will forget the drilled contents fast. Furthermore, because it does not apply for the work with patients and other target groups at a first glance, the methodological education as a whole was often questioned. For many psychological practitioners the statistical education makes only sense by enforcing respect against other professions, namely physicians. For the own business, statistics is rarely taken seriously as a professional tool. The reason seems to be clear: Statistics treats numbers, while psychotherapy treats subjects. So, does statistics ends in itself? With this article, we try to answer the question, if and how statistical methods were represented within the psychotherapeutical and psychological research. Therefore, we analyzed 46 Originals of a complete volume of the journal Psychotherapy, Psychosomatics, Psychological Medicine (PPmP). Within the volume, 28 different analyse methods were applied, from which 89 per cent were directly based upon statistics. To be able to write and critically read Originals as a backbone of research, presumes a high degree of statistical education. To ignore statistics means to ignore research and at least to reveal the own professional work to arbitrariness.
Content-based VLE designs improve learning efficiency in constructivist statistics education.
Wessa, Patrick; De Rycker, Antoon; Holliday, Ian Edward
2011-01-01
We introduced a series of computer-supported workshops in our undergraduate statistics courses, in the hope that it would help students to gain a deeper understanding of statistical concepts. This raised questions about the appropriate design of the Virtual Learning Environment (VLE) in which such an approach had to be implemented. Therefore, we investigated two competing software design models for VLEs. In the first system, all learning features were a function of the classical VLE. The second system was designed from the perspective that learning features should be a function of the course's core content (statistical analyses), which required us to develop a specific-purpose Statistical Learning Environment (SLE) based on Reproducible Computing and newly developed Peer Review (PR) technology. The main research question is whether the second VLE design improved learning efficiency as compared to the standard type of VLE design that is commonly used in education. As a secondary objective we provide empirical evidence about the usefulness of PR as a constructivist learning activity which supports non-rote learning. Finally, this paper illustrates that it is possible to introduce a constructivist learning approach in large student populations, based on adequately designed educational technology, without subsuming educational content to technological convenience. Both VLE systems were tested within a two-year quasi-experiment based on a Reliable Nonequivalent Group Design. This approach allowed us to draw valid conclusions about the treatment effect of the changed VLE design, even though the systems were implemented in successive years. The methodological aspects about the experiment's internal validity are explained extensively. The effect of the design change is shown to have substantially increased the efficiency of constructivist, computer-assisted learning activities for all cohorts of the student population under investigation. The findings demonstrate that a content-based design outperforms the traditional VLE-based design.
Giordano, Bruno L.; Kayser, Christoph; Rousselet, Guillaume A.; Gross, Joachim; Schyns, Philippe G.
2016-01-01
Abstract We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open‐source Matlab and Python code implementing the new methods accompanies this article. Hum Brain Mapp 38:1541–1573, 2017. © 2016 Wiley Periodicals, Inc. PMID:27860095
Research of Extension of the Life Cycle of Helicopter Rotor Blade in Hungary
2003-02-01
Radiography (DXR), and (iii) Vibration Diagnostics (VD) with Statistical Energy Analysis (SEA) were semi- simultaneously applied [1]. The used three...2.2. Vibration Diagnostics (VD)) Parallel to the NDT measurements the Statistical Energy Analysis (SEA) as a vibration diagnostical tool were...noises were analysed with a dual-channel real time frequency analyser (BK2035). In addition to the Statistical Energy Analysis measurement a small
Hamel, Jean-Francois; Saulnier, Patrick; Pe, Madeline; Zikos, Efstathios; Musoro, Jammbe; Coens, Corneel; Bottomley, Andrew
2017-09-01
Over the last decades, Health-related Quality of Life (HRQoL) end-points have become an important outcome of the randomised controlled trials (RCTs). HRQoL methodology in RCTs has improved following international consensus recommendations. However, no international recommendations exist concerning the statistical analysis of such data. The aim of our study was to identify and characterise the quality of the statistical methods commonly used for analysing HRQoL data in cancer RCTs. Building on our recently published systematic review, we analysed a total of 33 published RCTs studying the HRQoL methods reported in RCTs since 1991. We focussed on the ability of the methods to deal with the three major problems commonly encountered when analysing HRQoL data: their multidimensional and longitudinal structure and the commonly high rate of missing data. All studies reported HRQoL being assessed repeatedly over time for a period ranging from 2 to 36 months. Missing data were common, with compliance rates ranging from 45% to 90%. From the 33 studies considered, 12 different statistical methods were identified. Twenty-nine studies analysed each of the questionnaire sub-dimensions without type I error adjustment. Thirteen studies repeated the HRQoL analysis at each assessment time again without type I error adjustment. Only 8 studies used methods suitable for repeated measurements. Our findings show a lack of consistency in statistical methods for analysing HRQoL data. Problems related to multiple comparisons were rarely considered leading to a high risk of false positive results. It is therefore critical that international recommendations for improving such statistical practices are developed. Copyright © 2017. Published by Elsevier Ltd.
Sunspot activity and influenza pandemics: a statistical assessment of the purported association.
Towers, S
2017-10-01
Since 1978, a series of papers in the literature have claimed to find a significant association between sunspot activity and the timing of influenza pandemics. This paper examines these analyses, and attempts to recreate the three most recent statistical analyses by Ertel (1994), Tapping et al. (2001), and Yeung (2006), which all have purported to find a significant relationship between sunspot numbers and pandemic influenza. As will be discussed, each analysis had errors in the data. In addition, in each analysis arbitrary selections or assumptions were also made, and the authors did not assess the robustness of their analyses to changes in those arbitrary assumptions. Varying the arbitrary assumptions to other, equally valid, assumptions negates the claims of significance. Indeed, an arbitrary selection made in one of the analyses appears to have resulted in almost maximal apparent significance; changing it only slightly yields a null result. This analysis applies statistically rigorous methodology to examine the purported sunspot/pandemic link, using more statistically powerful un-binned analysis methods, rather than relying on arbitrarily binned data. The analyses are repeated using both the Wolf and Group sunspot numbers. In all cases, no statistically significant evidence of any association was found. However, while the focus in this particular analysis was on the purported relationship of influenza pandemics to sunspot activity, the faults found in the past analyses are common pitfalls; inattention to analysis reproducibility and robustness assessment are common problems in the sciences, that are unfortunately not noted often enough in review.
Testing Genetic Pleiotropy with GWAS Summary Statistics for Marginal and Conditional Analyses.
Deng, Yangqing; Pan, Wei
2017-12-01
There is growing interest in testing genetic pleiotropy, which is when a single genetic variant influences multiple traits. Several methods have been proposed; however, these methods have some limitations. First, all the proposed methods are based on the use of individual-level genotype and phenotype data; in contrast, for logistical, and other, reasons, summary statistics of univariate SNP-trait associations are typically only available based on meta- or mega-analyzed large genome-wide association study (GWAS) data. Second, existing tests are based on marginal pleiotropy, which cannot distinguish between direct and indirect associations of a single genetic variant with multiple traits due to correlations among the traits. Hence, it is useful to consider conditional analysis, in which a subset of traits is adjusted for another subset of traits. For example, in spite of substantial lowering of low-density lipoprotein cholesterol (LDL) with statin therapy, some patients still maintain high residual cardiovascular risk, and, for these patients, it might be helpful to reduce their triglyceride (TG) level. For this purpose, in order to identify new therapeutic targets, it would be useful to identify genetic variants with pleiotropic effects on LDL and TG after adjusting the latter for LDL; otherwise, a pleiotropic effect of a genetic variant detected by a marginal model could simply be due to its association with LDL only, given the well-known correlation between the two types of lipids. Here, we develop a new pleiotropy testing procedure based only on GWAS summary statistics that can be applied for both marginal analysis and conditional analysis. Although the main technical development is based on published union-intersection testing methods, care is needed in specifying conditional models to avoid invalid statistical estimation and inference. In addition to the previously used likelihood ratio test, we also propose using generalized estimating equations under the working independence model for robust inference. We provide numerical examples based on both simulated and real data, including two large lipid GWAS summary association datasets based on ∼100,000 and ∼189,000 samples, respectively, to demonstrate the difference between marginal and conditional analyses, as well as the effectiveness of our new approach. Copyright © 2017 by the Genetics Society of America.
Ng'andu, N H
1997-03-30
In the analysis of survival data using the Cox proportional hazard (PH) model, it is important to verify that the explanatory variables analysed satisfy the proportional hazard assumption of the model. This paper presents results of a simulation study that compares five test statistics to check the proportional hazard assumption of Cox's model. The test statistics were evaluated under proportional hazards and the following types of departures from the proportional hazard assumption: increasing relative hazards; decreasing relative hazards; crossing hazards; diverging hazards, and non-monotonic hazards. The test statistics compared include those based on partitioning of failure time and those that do not require partitioning of failure time. The simulation results demonstrate that the time-dependent covariate test, the weighted residuals score test and the linear correlation test have equally good power for detection of non-proportionality in the varieties of non-proportional hazards studied. Using illustrative data from the literature, these test statistics performed similarly.
Approach for Uncertainty Propagation and Robust Design in CFD Using Sensitivity Derivatives
NASA Technical Reports Server (NTRS)
Putko, Michele M.; Newman, Perry A.; Taylor, Arthur C., III; Green, Lawrence L.
2001-01-01
This paper presents an implementation of the approximate statistical moment method for uncertainty propagation and robust optimization for a quasi 1-D Euler CFD (computational fluid dynamics) code. Given uncertainties in statistically independent, random, normally distributed input variables, a first- and second-order statistical moment matching procedure is performed to approximate the uncertainty in the CFD output. Efficient calculation of both first- and second-order sensitivity derivatives is required. In order to assess the validity of the approximations, the moments are compared with statistical moments generated through Monte Carlo simulations. The uncertainties in the CFD input variables are also incorporated into a robust optimization procedure. For this optimization, statistical moments involving first-order sensitivity derivatives appear in the objective function and system constraints. Second-order sensitivity derivatives are used in a gradient-based search to successfully execute a robust optimization. The approximate methods used throughout the analyses are found to be valid when considering robustness about input parameter mean values.
Real-world visual statistics and infants' first-learned object names
Clerkin, Elizabeth M.; Hart, Elizabeth; Rehg, James M.; Yu, Chen
2017-01-01
We offer a new solution to the unsolved problem of how infants break into word learning based on the visual statistics of everyday infant-perspective scenes. Images from head camera video captured by 8 1/2 to 10 1/2 month-old infants at 147 at-home mealtime events were analysed for the objects in view. The images were found to be highly cluttered with many different objects in view. However, the frequency distribution of object categories was extremely right skewed such that a very small set of objects was pervasively present—a fact that may substantially reduce the problem of referential ambiguity. The statistical structure of objects in these infant egocentric scenes differs markedly from that in the training sets used in computational models and in experiments on statistical word-referent learning. Therefore, the results also indicate a need to re-examine current explanations of how infants break into word learning. This article is part of the themed issue ‘New frontiers for statistical learning in the cognitive sciences’. PMID:27872373
Improving surveillance for injuries associated with potential motor vehicle safety defects
Whitfield, R; Whitfield, A
2004-01-01
Objective: To improve surveillance for deaths and injuries associated with potential motor vehicle safety defects. Design: Vehicles in fatal crashes can be studied for indications of potential defects using an "early warning" surveillance statistic previously suggested for screening reports of adverse drug reactions. This statistic is illustrated with time series data for fatal, tire related and fire related crashes. Geographic analyses are used to augment the tire related statistics. Results: A statistical criterion based on the Poisson distribution that tests the likelihood of an expected number of events, given the number of events that actually occurred, is a promising method that can be readily adapted for use in injury surveillance. Conclusions: Use of the demonstrated techniques could have helped to avert a well known injury surveillance failure. This method is adaptable to aid in the direction of engineering and statistical reviews to prevent deaths and injuries associated with potential motor vehicle safety defects using available databases. PMID:15066972
NASA Astrophysics Data System (ADS)
Toth-Tascau, Mirela; Balanean, Flavia; Krepelka, Mircea
2013-10-01
Musculoskeletal impairment of the upper limb can cause difficulties in performing basic daily activities. Three dimensional motion analyses can provide valuable data of arm movement in order to precisely determine arm movement and inter-joint coordination. The purpose of this study was to develop a method to evaluate the degree of impairment based on the influence of shoulder movements in the amplitude of elbow flexion and extension based on the assumption that a lack of motion of the elbow joint will be compensated by an increased shoulder activity. In order to develop and validate a statistical model, one healthy young volunteer has been involved in the study. The activity of choice simulated blowing the nose, starting from a slight flexion of the elbow and raising the hand until the middle finger touches the tip of the nose and return to the start position. Inter-joint coordination between the elbow and shoulder movements showed significant correlation. Statistical regression was used to fit an equation model describing the influence of shoulder movements on the elbow mobility. The study provides a brief description of the kinematic analysis protocol and statistical models that may be useful in describing the relation between inter-joint movements of daily activities.
Affirmative Action Plans, January 1, 1994--December 31, 1994. Revision
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1994-02-16
This document is the Affirmative Action Plan for January 1, 1994 through December 31, 1994 for the Lawrence Berkeley Laboratory, University of California (``LBL`` or ``the Laboratory.``) This is an official document that will be presented upon request to the Office of Federal Contract Compliance Programs, US Department of Labor. The plan is prepared in accordance with the Executive Order 11246 and 41 CFR Section 60-1 et seq. covering equal employment opportunity and will be updated during the year, if appropriate. Analyses included in this volume as required by government regulations are based on statistical comparisons. All statistical comparisons involvemore » the use of geographic areas and various sources of statistics. The geographic areas and sources of statistics used here are in compliance with the government regulations, as interpreted. The use of any geographic area or statistic does not indicate agreement that the geographic area is the most appropriate or that the statistic is the most relevant. The use of such geographic areas and statistics is intended to have no significance outside the context of this Affirmative Action Plan, although, of course, such statistics and geographic areas will be used in good faith with respect to this Affirmative Action Plan.« less
Estimation versus falsification approaches in sport and exercise science.
Wilkinson, Michael; Winter, Edward M
2018-05-22
There has been a recent resurgence in debate about methods for statistical inference in science. The debate addresses statistical concepts and their impact on the value and meaning of analyses' outcomes. In contrast, philosophical underpinnings of approaches and the extent to which analytical tools match philosophical goals of the scientific method have received less attention. This short piece considers application of the scientific method to "what-is-the-influence-of x-on-y" type questions characteristic of sport and exercise science. We consider applications and interpretations of estimation versus falsification based statistical approaches and their value in addressing how much x influences y, and in measurement error and method agreement settings. We compare estimation using magnitude based inference (MBI) with falsification using null hypothesis significance testing (NHST), and highlight the limited value both of falsification and NHST to address problems in sport and exercise science. We recommend adopting an estimation approach, expressing the uncertainty of effects of x on y, and their practical/clinical value against pre-determined effect magnitudes using MBI.
Robust approaches to quantification of margin and uncertainty for sparse data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hund, Lauren; Schroeder, Benjamin B.; Rumsey, Kelin
Characterizing the tails of probability distributions plays a key role in quantification of margins and uncertainties (QMU), where the goal is characterization of low probability, high consequence events based on continuous measures of performance. When data are collected using physical experimentation, probability distributions are typically fit using statistical methods based on the collected data, and these parametric distributional assumptions are often used to extrapolate about the extreme tail behavior of the underlying probability distribution. In this project, we character- ize the risk associated with such tail extrapolation. Specifically, we conducted a scaling study to demonstrate the large magnitude of themore » risk; then, we developed new methods for communicat- ing risk associated with tail extrapolation from unvalidated statistical models; lastly, we proposed a Bayesian data-integration framework to mitigate tail extrapolation risk through integrating ad- ditional information. We conclude that decision-making using QMU is a complex process that cannot be achieved using statistical analyses alone.« less
Sybil--efficient constraint-based modelling in R.
Gelius-Dietrich, Gabriel; Desouki, Abdelmoneim Amer; Fritzemeier, Claus Jonathan; Lercher, Martin J
2013-11-13
Constraint-based analyses of metabolic networks are widely used to simulate the properties of genome-scale metabolic networks. Publicly available implementations tend to be slow, impeding large scale analyses such as the genome-wide computation of pairwise gene knock-outs, or the automated search for model improvements. Furthermore, available implementations cannot easily be extended or adapted by users. Here, we present sybil, an open source software library for constraint-based analyses in R; R is a free, platform-independent environment for statistical computing and graphics that is widely used in bioinformatics. Among other functions, sybil currently provides efficient methods for flux-balance analysis (FBA), MOMA, and ROOM that are about ten times faster than previous implementations when calculating the effect of whole-genome single gene deletions in silico on a complete E. coli metabolic model. Due to the object-oriented architecture of sybil, users can easily build analysis pipelines in R or even implement their own constraint-based algorithms. Based on its highly efficient communication with different mathematical optimisation programs, sybil facilitates the exploration of high-dimensional optimisation problems on small time scales. Sybil and all its dependencies are open source. Sybil and its documentation are available for download from the comprehensive R archive network (CRAN).
Vainik, Uku; Konstabel, Kenn; Lätt, Evelin; Mäestu, Jarek; Purge, Priit; Jürimäe, Jaak
2016-10-01
Subjective energy intake (sEI) is often misreported, providing unreliable estimates of energy consumed. Therefore, relating sEI data to health outcomes is difficult. Recently, Börnhorst et al. compared various methods to correct sEI-based energy intake estimates. They criticised approaches that categorise participants as under-reporters, plausible reporters and over-reporters based on the sEI:total energy expenditure (TEE) ratio, and thereafter use these categories as statistical covariates or exclusion criteria. Instead, they recommended using external predictors of sEI misreporting as statistical covariates. We sought to confirm and extend these findings. Using a sample of 190 adolescent boys (mean age=14), we demonstrated that dual-energy X-ray absorptiometry-measured fat-free mass is strongly associated with objective energy intake data (onsite weighted breakfast), but the association with sEI (previous 3-d dietary interview) is weak. Comparing sEI with TEE revealed that sEI was mostly under-reported (74 %). Interestingly, statistically controlling for dietary reporting groups or restricting samples to plausible reporters created a stronger-than-expected association between fat-free mass and sEI. However, the association was an artifact caused by selection bias - that is, data re-sampling and simulations showed that these methods overestimated the effect size because fat-free mass was related to sEI both directly and indirectly via TEE. A more realistic association between sEI and fat-free mass was obtained when the model included common predictors of misreporting (e.g. BMI, restraint). To conclude, restricting sEI data only to plausible reporters can cause selection bias and inflated associations in later analyses. Therefore, we further support statistically correcting sEI data in nutritional analyses. The script for running simulations is provided.
Detecting Genomic Clustering of Risk Variants from Sequence Data: Cases vs. Controls
Schaid, Daniel J.; Sinnwell, Jason P.; McDonnell, Shannon K.; Thibodeau, Stephen N.
2013-01-01
As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in, and around, a gene would benefit genetic association analyses, and likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method – Tango’s statistic – to genomic sequence data. An advantage of Tango’s method is that it is rapid to compute, and when single test statistic is computed, its distribution is well approximated by a scaled chi-square distribution, making computation of p-values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test (SKAT). Although our version of Tango’s statistic, which we call “Kernel Distance” statistic, took approximately half the time to compute than the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff’s scan statistic had the greatest power over a range of clustering scenarios. PMID:23842950
Ahmad, Iftikhar; Ahmad, Manzoor; Khan, Karim; Ikram, Masroor
2016-06-01
Optical polarimetry was employed for assessment of ex vivo healthy and basal cell carcinoma (BCC) tissue samples from human skin. Polarimetric analyses revealed that depolarization and retardance for healthy tissue group were significantly higher (p<0.001) compared to BCC tissue group. Histopathology indicated that these differences partially arise from BCC-related characteristic changes in tissue morphology. Wilks lambda statistics demonstrated the potential of all investigated polarimetric properties for computer assisted classification of the two tissue groups. Based on differences in polarimetric properties, partial least square (PLS) regression classified the samples with 100% accuracy, sensitivity and specificity. These findings indicate that optical polarimetry together with PLS statistics hold promise for automated pathology classification. Copyright © 2016 Elsevier B.V. All rights reserved.
A statistical anomaly indicates symbiotic origins of eukaryotic membranes
Bansal, Suneyna; Mittal, Aditya
2015-01-01
Compositional analyses of nucleic acids and proteins have shed light on possible origins of living cells. In this work, rigorous compositional analyses of ∼5000 plasma membrane lipid constituents of 273 species in the three life domains (archaea, eubacteria, and eukaryotes) revealed a remarkable statistical paradox, indicating symbiotic origins of eukaryotic cells involving eubacteria. For lipids common to plasma membranes of the three domains, the number of carbon atoms in eubacteria was found to be similar to that in eukaryotes. However, mutually exclusive subsets of same data show exactly the opposite—the number of carbon atoms in lipids of eukaryotes was higher than in eubacteria. This statistical paradox, called Simpson's paradox, was absent for lipids in archaea and for lipids not common to plasma membranes of the three domains. This indicates the presence of interaction(s) and/or association(s) in lipids forming plasma membranes of eubacteria and eukaryotes but not for those in archaea. Further inspection of membrane lipid structures affecting physicochemical properties of plasma membranes provides the first evidence (to our knowledge) on the symbiotic origins of eukaryotic cells based on the “third front” (i.e., lipids) in addition to the growing compositional data from nucleic acids and proteins. PMID:25631820
[Comorbidity of different forms of anxiety disorders and depression].
Małyszczak, Krzysztof; Szechiński, Marcin
2004-01-01
Comorbidity of some anxiety disorders and depression were examined in order to compare their statistical closeness. Patients treated in an out-patient care center for psychiatric disorders and/or family medicine were recruited. Persons that have anxiety and depressive symptoms as a consequence of somatic illnesses or consequence of other psychiatric disorders were excluded. Disorders were diagnosed a with diagnostic questionnaire based on Schedule for Assessment in Neuropsychiatry (SCAN), version 2.0, according to ICD-10 criteria. Analyses include selected disorders: generalized anxiety disorder, panic disorder, agoraphobia, specific phobias, social phobia and depression. 104 patients were included. 35 of them (33.7%) had anxiety disorders, 13 persons (12.5%) have depression. Analyses show that in patients with generalized anxiety disorder, depression occurred at least twice as often as in the remaining patients (odds ratio = 7.1), while in patients with agoraphobia the occurrence of panic disorder increased at least by 2.88 times (odds ratio = 11.9). In other disorders the odds ratios was greater than 1, but the differences were not statistically significant. Depression/generalized anxiety disorder and agoraphobia/panic disorder were shown to be statistically closer than other disorders.
Considerations in the statistical analysis of clinical trials in periodontitis.
Imrey, P B
1986-05-01
Adult periodontitis has been described as a chronic infectious process exhibiting sporadic, acute exacerbations which cause quantal, localized losses of dental attachment. Many analytic problems of periodontal trials are similar to those of other chronic diseases. However, the episodic, localized, infrequent, and relatively unpredictable behavior of exacerbations, coupled with measurement error difficulties, cause some specific problems. Considerable controversy exists as to the proper selection and treatment of multiple site data from the same patient for group comparisons for epidemiologic or therapeutic evaluative purposes. This paper comments, with varying degrees of emphasis, on several issues pertinent to the analysis of periodontal trials. Considerable attention is given to the ways in which measurement variability may distort analytic results. Statistical treatments of multiple site data for descriptive summaries are distinguished from treatments for formal statistical inference to validate therapeutic effects. Evidence suggesting that sites behave independently is contested. For inferential analyses directed at therapeutic or preventive effects, analytic models based on site independence are deemed unsatisfactory. Methods of summarization that may yield more powerful analyses than all-site mean scores, while retaining appropriate treatment of inter-site associations, are suggested. Brief comments and opinions on an assortment of other issues in clinical trial analysis are preferred.
NASA Technical Reports Server (NTRS)
Myers, R. H.
1976-01-01
The depletion of ozone in the stratosphere is examined, and causes for the depletion are cited. Ground station and satellite measurements of ozone, which are taken on a worldwide basis, are discussed. Instruments used in ozone measurement are discussed, such as the Dobson spectrophotometer, which is credited with providing the longest and most extensive series of observations for ground based observation of stratospheric ozone. Other ground based instruments used to measure ozone are also discussed. The statistical differences of ground based measurements of ozone from these different instruments are compared to each other, and to satellite measurements. Mathematical methods (i.e., trend analysis or linear regression analysis) of analyzing the variability of ozone concentration with respect to time and lattitude are described. Various time series models which can be employed in accounting for ozone concentration variability are examined.
Jacob, Benjamin J; Krapp, Fiorella; Ponce, Mario; Gottuzzo, Eduardo; Griffith, Daniel A; Novak, Robert J
2010-05-01
Spatial autocorrelation is problematic for classical hierarchical cluster detection tests commonly used in multi-drug resistant tuberculosis (MDR-TB) analyses as considerable random error can occur. Therefore, when MDRTB clusters are spatially autocorrelated the assumption that the clusters are independently random is invalid. In this research, a product moment correlation coefficient (i.e., the Moran's coefficient) was used to quantify local spatial variation in multiple clinical and environmental predictor variables sampled in San Juan de Lurigancho, Lima, Peru. Initially, QuickBird 0.61 m data, encompassing visible bands and the near infra-red bands, were selected to synthesize images of land cover attributes of the study site. Data of residential addresses of individual patients with smear-positive MDR-TB were geocoded, prevalence rates calculated and then digitally overlaid onto the satellite data within a 2 km buffer of 31 georeferenced health centers, using a 10 m2 grid-based algorithm. Geographical information system (GIS)-gridded measurements of each health center were generated based on preliminary base maps of the georeferenced data aggregated to block groups and census tracts within each buffered area. A three-dimensional model of the study site was constructed based on a digital elevation model (DEM) to determine terrain covariates associated with the sampled MDR-TB covariates. Pearson's correlation was used to evaluate the linear relationship between the DEM and the sampled MDR-TB data. A SAS/GIS(R) module was then used to calculate univariate statistics and to perform linear and non-linear regression analyses using the sampled predictor variables. The estimates generated from a global autocorrelation analyses were then spatially decomposed into empirical orthogonal bases using a negative binomial regression with a non-homogeneous mean. Results of the DEM analyses indicated a statistically non-significant, linear relationship between georeferenced health centers and the sampled covariate elevation. The data exhibited positive spatial autocorrelation and the decomposition of Moran's coefficient into uncorrelated, orthogonal map pattern components revealed global spatial heterogeneities necessary to capture latent autocorrelation in the MDR-TB model. It was thus shown that Poisson regression analyses and spatial eigenvector mapping can elucidate the mechanics of MDR-TB transmission by prioritizing clinical and environmental-sampled predictor variables for identifying high risk populations.
Webster, R J; Williams, A; Marchetti, F; Yauk, C L
2018-07-01
Mutations in germ cells pose potential genetic risks to offspring. However, de novo mutations are rare events that are spread across the genome and are difficult to detect. Thus, studies in this area have generally been under-powered, and no human germ cell mutagen has been identified. Whole Genome Sequencing (WGS) of human pedigrees has been proposed as an approach to overcome these technical and statistical challenges. WGS enables analysis of a much wider breadth of the genome than traditional approaches. Here, we performed power analyses to determine the feasibility of using WGS in human families to identify germ cell mutagens. Different statistical models were compared in the power analyses (ANOVA and multiple regression for one-child families, and mixed effect model sampling between two to four siblings per family). Assumptions were made based on parameters from the existing literature, such as the mutation-by-paternal age effect. We explored two scenarios: a constant effect due to an exposure that occurred in the past, and an accumulating effect where the exposure is continuing. Our analysis revealed the importance of modeling inter-family variability of the mutation-by-paternal age effect. Statistical power was improved by models accounting for the family-to-family variability. Our power analyses suggest that sufficient statistical power can be attained with 4-28 four-sibling families per treatment group, when the increase in mutations ranges from 40 to 10% respectively. Modeling family variability using mixed effect models provided a reduction in sample size compared to a multiple regression approach. Much larger sample sizes were required to detect an interaction effect between environmental exposures and paternal age. These findings inform study design and statistical modeling approaches to improve power and reduce sequencing costs for future studies in this area. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
Jin, Zhichao; Yu, Danghui; Zhang, Luoman; Meng, Hong; Lu, Jian; Gao, Qingbin; Cao, Yang; Ma, Xiuqiang; Wu, Cheng; He, Qian; Wang, Rui; He, Jia
2010-01-01
Background High quality clinical research not only requires advanced professional knowledge, but also needs sound study design and correct statistical analyses. The number of clinical research articles published in Chinese medical journals has increased immensely in the past decade, but study design quality and statistical analyses have remained suboptimal. The aim of this investigation was to gather evidence on the quality of study design and statistical analyses in clinical researches conducted in China for the first decade of the new millennium. Methodology/Principal Findings Ten (10) leading Chinese medical journals were selected and all original articles published in 1998 (N = 1,335) and 2008 (N = 1,578) were thoroughly categorized and reviewed. A well-defined and validated checklist on study design, statistical analyses, results presentation, and interpretation was used for review and evaluation. Main outcomes were the frequencies of different types of study design, error/defect proportion in design and statistical analyses, and implementation of CONSORT in randomized clinical trials. From 1998 to 2008: The error/defect proportion in statistical analyses decreased significantly ( = 12.03, p<0.001), 59.8% (545/1,335) in 1998 compared to 52.2% (664/1,578) in 2008. The overall error/defect proportion of study design also decreased ( = 21.22, p<0.001), 50.9% (680/1,335) compared to 42.40% (669/1,578). In 2008, design with randomized clinical trials remained low in single digit (3.8%, 60/1,578) with two-third showed poor results reporting (defects in 44 papers, 73.3%). Nearly half of the published studies were retrospective in nature, 49.3% (658/1,335) in 1998 compared to 48.2% (761/1,578) in 2008. Decreases in defect proportions were observed in both results presentation ( = 93.26, p<0.001), 92.7% (945/1,019) compared to 78.2% (1023/1,309) and interpretation ( = 27.26, p<0.001), 9.7% (99/1,019) compared to 4.3% (56/1,309), some serious ones persisted. Conclusions/Significance Chinese medical research seems to have made significant progress regarding statistical analyses, but there remains ample room for improvement regarding study designs. Retrospective clinical studies are the most often used design, whereas randomized clinical trials are rare and often show methodological weaknesses. Urgent implementation of the CONSORT statement is imperative. PMID:20520824
ERIC Educational Resources Information Center
Baxter, G. P.; Ahmed, S.; Sikali, E.; Waits, T.; Sloan, M.; Salvucci, S.
2007-01-01
In 2003, a trial National Assessment of Educational Progress (NAEP) mathematics assessment was administered in Spanish to public school students at grades 4 and 8 in Puerto Rico. Based on preliminary analyses of the 2003 data, changes were made in administration and translation procedures for the 2005 NAEP administration in Puerto Rico. This…
ERIC Educational Resources Information Center
Mitchell, James K.; Carter, William E.
2000-01-01
Describes using a computer statistical software package called Minitab to model the sensitivity of several microbes to the disinfectant NaOCl (Clorox') using the Kirby-Bauer technique. Each group of students collects data from one microbe, conducts regression analyses, then chooses the best-fit model based on the highest r-values obtained.…
ERIC Educational Resources Information Center
Black, Stephen; Yasukawa, Keiko
2016-01-01
This paper analyses research that has impacted on Australia's most recent national policy document on adult literacy and numeracy, the National Foundation Skills Strategy (NFSS). The paper draws in part on Lingard's 2013 paper, "The impact of research on education policy in an era of evidence-based policy", in which he outlines the…
USDA-ARS?s Scientific Manuscript database
To investigate the natural variability of leaf metabolism and enzymatic activity in a maize inbred population, statistical and network analyses were employed on metabolite and enzyme profiles. The test of coefficient of variation showed that sugars and amino acids displayed opposite trends in their ...
ERIC Educational Resources Information Center
Leow, Christine; Wen, Xiaoli; Korfmacher, Jon
2015-01-01
This article compares regression modeling and propensity score analysis as different types of statistical techniques used in addressing selection bias when estimating the impact of two-year versus one-year Head Start on children's school readiness. The analyses were based on the national Head Start secondary dataset. After controlling for…
S.C. Hagen; B.H. Braswell; E. Linder; S. Frolking; A.D. Richardson; David Hollinger. D.Y; Hollinger. D.Y
2006-01-01
We present an uncertainty analysis of gross ecosystem carbon exchange (GEE) estimates derived from 7 years of continuous eddy covariance measurements of forest atmosphere CO2 fluxes at Howland Forest, Maine, USA. These data, which have high temporal resolution, can be used to validate process modeling analyses, remote sensing assessments, and field surveys. However,...
ERIC Educational Resources Information Center
Bernard, Robert M.; Borokhovski, Eugene; Schmid, Richard F.; Tamim, Rana M.
2014-01-01
This article contains a second-order meta-analysis and an exploration of bias in the technology integration literature in higher education. Thirteen meta-analyses, dated from 2000 to 2014 were selected to be included based on the questions asked and the presence of adequate statistical information to conduct a quantitative synthesis. The weighted…
Use of Statistical Analyses in the Ophthalmic Literature
Lisboa, Renato; Meira-Freitas, Daniel; Tatham, Andrew J.; Marvasti, Amir H.; Sharpsten, Lucie; Medeiros, Felipe A.
2014-01-01
Purpose To identify the most commonly used statistical analyses in the ophthalmic literature and to determine the likely gain in comprehension of the literature that readers could expect if they were to sequentially add knowledge of more advanced techniques to their statistical repertoire. Design Cross-sectional study Methods All articles published from January 2012 to December 2012 in Ophthalmology, American Journal of Ophthalmology and Archives of Ophthalmology were reviewed. A total of 780 peer-reviewed articles were included. Two reviewers examined each article and assigned categories to each one depending on the type of statistical analyses used. Discrepancies between reviewers were resolved by consensus. Main Outcome Measures Total number and percentage of articles containing each category of statistical analysis were obtained. Additionally we estimated the accumulated number and percentage of articles that a reader would be expected to be able to interpret depending on their statistical repertoire. Results Readers with little or no statistical knowledge would be expected to be able to interpret the statistical methods presented in only 20.8% of articles. In order to understand more than half (51.4%) of the articles published, readers were expected to be familiar with at least 15 different statistical methods. Knowledge of 21 categories of statistical methods was necessary to comprehend 70.9% of articles, while knowledge of more than 29 categories was necessary to comprehend more than 90% of articles. Articles in retina and glaucoma subspecialties showed a tendency for using more complex analysis when compared to cornea. Conclusions Readers of clinical journals in ophthalmology need to have substantial knowledge of statistical methodology to understand the results of published studies in the literature. The frequency of use of complex statistical analyses also indicates that those involved in the editorial peer-review process must have sound statistical knowledge in order to critically appraise articles submitted for publication. The results of this study could provide guidance to direct the statistical learning of clinical ophthalmologists, researchers and educators involved in the design of courses for residents and medical students. PMID:24612977
Characterizing Uncertainty and Variability in PBPK Models ...
Mode-of-action based risk and safety assessments can rely upon tissue dosimetry estimates in animals and humans obtained from physiologically-based pharmacokinetic (PBPK) modeling. However, risk assessment also increasingly requires characterization of uncertainty and variability; such characterization for PBPK model predictions represents a continuing challenge to both modelers and users. Current practices show significant progress in specifying deterministic biological models and the non-deterministic (often statistical) models, estimating their parameters using diverse data sets from multiple sources, and using them to make predictions and characterize uncertainty and variability. The International Workshop on Uncertainty and Variability in PBPK Models, held Oct 31-Nov 2, 2006, sought to identify the state-of-the-science in this area and recommend priorities for research and changes in practice and implementation. For the short term, these include: (1) multidisciplinary teams to integrate deterministic and non-deterministic/statistical models; (2) broader use of sensitivity analyses, including for structural and global (rather than local) parameter changes; and (3) enhanced transparency and reproducibility through more complete documentation of the model structure(s) and parameter values, the results of sensitivity and other analyses, and supporting, discrepant, or excluded data. Longer-term needs include: (1) theoretic and practical methodological impro
Statistics of fully turbulent impinging jets
NASA Astrophysics Data System (ADS)
Wilke, Robert; Sesterhenn, Jörn
2017-08-01
Direct numerical simulations of sub- and supersonic impinging jets with Reynolds numbers of 3300 and 8000 are carried out to analyse their statistical properties. The influence of the parameters Mach number, Reynolds number and ambient temperature on the mean velocity and temperature fields are studied. For the compressible subsonic cold impinging jets into a heated environment, different Reynolds analogies are assesses. It is shown, that the (original) Reynolds analogy as well as the Chilton Colburn analogy are in good agreement with the DNS data outside the impinging area. The generalised Reynolds analogy (GRA) and the Crocco-Busemann relation are not suited for the estimation of the mean temperature field based on the mean velocity field of impinging jets. Furthermore, the prediction of fluctuating temperatures according to the GRA fails. On the contrary, the linear relation between thermodynamic fluctuations of entropy, density and temperature as suggested by Lechner et al. (2001) can be confirmed for the entire wall jet. The turbulent heat flux and Reynolds stress tensor are analysed and brought into coherence with the primary and secondary ring vortices of the wall jet. Budget terms of the Reynolds stress tensor are given as data base for the improvement of turbulence models.
Prognostic value of inflammation-based scores in patients with osteosarcoma
Liu, Bangjian; Huang, Yujing; Sun, Yuanjue; Zhang, Jianjun; Yao, Yang; Shen, Zan; Xiang, Dongxi; He, Aina
2016-01-01
Systemic inflammation responses have been associated with cancer development and progression. C-reactive protein (CRP), Glasgow prognostic score (GPS), neutrophil-lymphocyte ratio (NLR), platelet-lymphocyte ratio (PLR), lymphocyte-monocyte ratio (LMR), and neutrophil-platelet score (NPS) have been shown to be independent risk factors in various types of malignant tumors. This retrospective analysis of 162 osteosarcoma cases was performed to estimate their predictive value of survival in osteosarcoma. All statistical analyses were performed by SPSS statistical software. Receiver operating characteristic (ROC) analysis was generated to set optimal thresholds; area under the curve (AUC) was used to show the discriminatory abilities of inflammation-based scores; Kaplan-Meier analysis was performed to plot the survival curve; cox regression models were employed to determine the independent prognostic factors. The optimal cut-off points of NLR, PLR, and LMR were 2.57, 123.5 and 4.73, respectively. GPS and NLR had a markedly larger AUC than CRP, PLR and LMR. High levels of CRP, GPS, NLR, PLR, and low level of LMR were significantly associated with adverse prognosis (P < 0.05). Multivariate Cox regression analyses revealed that GPS, NLR, and occurrence of metastasis were top risk factors associated with death of osteosarcoma patients. PMID:28008988
Fukuda, Haruhisa; Kuroki, Manabu
2016-03-01
To develop and internally validate a surgical site infection (SSI) prediction model for Japan. Retrospective observational cohort study. We analyzed surveillance data submitted to the Japan Nosocomial Infections Surveillance system for patients who had undergone target surgical procedures from January 1, 2010, through December 31, 2012. Logistic regression analyses were used to develop statistical models for predicting SSIs. An SSI prediction model was constructed for each of the procedure categories by statistically selecting the appropriate risk factors from among the collected surveillance data and determining their optimal categorization. Standard bootstrapping techniques were applied to assess potential overfitting. The C-index was used to compare the predictive performances of the new statistical models with those of models based on conventional risk index variables. The study sample comprised 349,987 cases from 428 participant hospitals throughout Japan, and the overall SSI incidence was 7.0%. The C-indices of the new statistical models were significantly higher than those of the conventional risk index models in 21 (67.7%) of the 31 procedure categories (P<.05). No significant overfitting was detected. Japan-specific SSI prediction models were shown to generally have higher accuracy than conventional risk index models. These new models may have applications in assessing hospital performance and identifying high-risk patients in specific procedure categories.
Testosterone replacement therapy and the heart: friend, foe or bystander?
Canfield, Steven; Wang, Run
2016-01-01
The role of testosterone therapy (TTh) in cardiovascular disease (CVD) outcomes is still controversial, and it seems will remain inconclusive for the moment. An extensive body of literature has investigated the association of endogenous testosterone and use of TTh with CVD events including several meta-analyses. In some instances, a number of studies reported beneficial effects of TTh on CVD events and in other instances the body of literature reported detrimental effects or no effects at all. Yet, no review article has scrutinized this body of literature using the magnitude of associations and statistical significance reported from this relationship. We critically reviewed the previous and emerging body of literature that investigated the association of endogenous testosterone and use of TTh with CVD events (only fatal and nonfatal). These studies were divided into three groups, “beneficial (friendly use)”, “detrimental (foe)” and “no effects at all (bystander)”, based on their magnitude of associations and statistical significance from original research studies and meta-analyses of epidemiological studies and of randomized controlled trials (RCT’s). In this review article, the studies reporting a significant association of high levels of testosterone with a reduced risk of CVD events in original prospective studies and meta-analyses of cross-sectional and prospective studies seems to be more consistent. However, the number of meta-analyses of RCT’s does not provide a clear picture after we divided it into the beneficial, detrimental or no effects all groups using their magnitudes of association and statistical significance. From this review, we suggest that we need a study or number of studies that have the adequate power, epidemiological, and clinical data to provide a definitive conclusion on whether the effect of TTh on the natural history of CVD is real or not. PMID:28078222
Testosterone replacement therapy and the heart: friend, foe or bystander?
Lopez, David S; Canfield, Steven; Wang, Run
2016-12-01
The role of testosterone therapy (TTh) in cardiovascular disease (CVD) outcomes is still controversial, and it seems will remain inconclusive for the moment. An extensive body of literature has investigated the association of endogenous testosterone and use of TTh with CVD events including several meta-analyses. In some instances, a number of studies reported beneficial effects of TTh on CVD events and in other instances the body of literature reported detrimental effects or no effects at all. Yet, no review article has scrutinized this body of literature using the magnitude of associations and statistical significance reported from this relationship. We critically reviewed the previous and emerging body of literature that investigated the association of endogenous testosterone and use of TTh with CVD events (only fatal and nonfatal). These studies were divided into three groups, "beneficial (friendly use)", "detrimental (foe)" and "no effects at all (bystander)", based on their magnitude of associations and statistical significance from original research studies and meta-analyses of epidemiological studies and of randomized controlled trials (RCT's). In this review article, the studies reporting a significant association of high levels of testosterone with a reduced risk of CVD events in original prospective studies and meta-analyses of cross-sectional and prospective studies seems to be more consistent. However, the number of meta-analyses of RCT's does not provide a clear picture after we divided it into the beneficial, detrimental or no effects all groups using their magnitudes of association and statistical significance. From this review, we suggest that we need a study or number of studies that have the adequate power, epidemiological, and clinical data to provide a definitive conclusion on whether the effect of TTh on the natural history of CVD is real or not.
Ganger, Michael T; Dietz, Geoffrey D; Ewing, Sarah J
2017-12-01
qPCR has established itself as the technique of choice for the quantification of gene expression. Procedures for conducting qPCR have received significant attention; however, more rigorous approaches to the statistical analysis of qPCR data are needed. Here we develop a mathematical model, termed the Common Base Method, for analysis of qPCR data based on threshold cycle values (C q ) and efficiencies of reactions (E). The Common Base Method keeps all calculations in the logscale as long as possible by working with log 10 (E) ∙ C q , which we call the efficiency-weighted C q value; subsequent statistical analyses are then applied in the logscale. We show how efficiency-weighted C q values may be analyzed using a simple paired or unpaired experimental design and develop blocking methods to help reduce unexplained variation. The Common Base Method has several advantages. It allows for the incorporation of well-specific efficiencies and multiple reference genes. The method does not necessitate the pairing of samples that must be performed using traditional analysis methods in order to calculate relative expression ratios. Our method is also simple enough to be implemented in any spreadsheet or statistical software without additional scripts or proprietary components.
Confounding in statistical mediation analysis: What it is and how to address it.
Valente, Matthew J; Pelham, William E; Smyth, Heather; MacKinnon, David P
2017-11-01
Psychology researchers are often interested in mechanisms underlying how randomized interventions affect outcomes such as substance use and mental health. Mediation analysis is a common statistical method for investigating psychological mechanisms that has benefited from exciting new methodological improvements over the last 2 decades. One of the most important new developments is methodology for estimating causal mediated effects using the potential outcomes framework for causal inference. Potential outcomes-based methods developed in epidemiology and statistics have important implications for understanding psychological mechanisms. We aim to provide a concise introduction to and illustration of these new methods and emphasize the importance of confounder adjustment. First, we review the traditional regression approach for estimating mediated effects. Second, we describe the potential outcomes framework. Third, we define what a confounder is and how the presence of a confounder can provide misleading evidence regarding mechanisms of interventions. Fourth, we describe experimental designs that can help rule out confounder bias. Fifth, we describe new statistical approaches to adjust for measured confounders of the mediator-outcome relation and sensitivity analyses to probe effects of unmeasured confounders on the mediated effect. All approaches are illustrated with application to a real counseling intervention dataset. Counseling psychologists interested in understanding the causal mechanisms of their interventions can benefit from incorporating the most up-to-date techniques into their mediation analyses. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Global atmospheric circulation statistics, 1000-1 mb
NASA Technical Reports Server (NTRS)
Randel, William J.
1992-01-01
The atlas presents atmospheric general circulation statistics derived from twelve years (1979-90) of daily National Meteorological Center (NMC) operational geopotential height analyses; it is an update of a prior atlas using data over 1979-1986. These global analyses are available on pressure levels covering 1000-1 mb (approximately 0-50 km). The geopotential grids are a combined product of the Climate Analysis Center (which produces analyses over 70-1 mb) and operational NMC analyses (over 1000-100 mb). Balance horizontal winds and hydrostatic temperatures are derived from the geopotential fields.
Gai, Liping; Liu, Hui; Cui, Jing-Hui; Yu, Weijian; Ding, Xiao-Dong
2017-03-20
The purpose of this study was to examine the specific allele combinations of three loci connected with the liver cancers, stomach cancers, hematencephalon and patients with chronic obstructive pulmonary disease (COPD) and to explore the feasibility of the research methods. We explored different mathematical methods for statistical analyses to assess the association between the genotype and phenotype. At the same time we still analyses the statistical results of allele combinations of three loci by difference value method and ratio method. All the DNA blood samples were collected from patients with 50 liver cancers, 75 stomach cancers, 50 hematencephalon, 72 COPD and 200 normal populations. All the samples were from Chinese. Alleles from short tandem repeat (STR) loci were determined using the STR Profiler plus PCR amplification kit (15 STR loci). Previous research was based on combinations of single-locus alleles, and combinations of cross-loci (two loci) alleles. Allele combinations of three loci were obtained by computer counting and stronger genetic signal was obtained. The methods of allele combinations of three loci can help to identify the statistically significant differences of allele combinations between liver cancers, stomach cancers, patients with hematencephalon, COPD and the normal population. The probability of illness followed different rules and had apparent specificity. This method can be extended to other diseases and provide reference for early clinical diagnosis. Copyright © 2016. Published by Elsevier B.V.
Image encryption based on a delayed fractional-order chaotic logistic system
NASA Astrophysics Data System (ADS)
Wang, Zhen; Huang, Xia; Li, Ning; Song, Xiao-Na
2012-05-01
A new image encryption scheme is proposed based on a delayed fractional-order chaotic logistic system. In the process of generating a key stream, the time-varying delay and fractional derivative are embedded in the proposed scheme to improve the security. Such a scheme is described in detail with security analyses including correlation analysis, information entropy analysis, run statistic analysis, mean-variance gray value analysis, and key sensitivity analysis. Experimental results show that the newly proposed image encryption scheme possesses high security.
Development of the Statistical Reasoning in Biology Concept Inventory (SRBCI)
Deane, Thomas; Nomme, Kathy; Jeffery, Erica; Pollock, Carol; Birol, Gülnur
2016-01-01
We followed established best practices in concept inventory design and developed a 12-item inventory to assess student ability in statistical reasoning in biology (Statistical Reasoning in Biology Concept Inventory [SRBCI]). It is important to assess student thinking in this conceptual area, because it is a fundamental requirement of being statistically literate and associated skills are needed in almost all walks of life. Despite this, previous work shows that non–expert-like thinking in statistical reasoning is common, even after instruction. As science educators, our goal should be to move students along a novice-to-expert spectrum, which could be achieved with growing experience in statistical reasoning. We used item response theory analyses (the one-parameter Rasch model and associated analyses) to assess responses gathered from biology students in two populations at a large research university in Canada in order to test SRBCI’s robustness and sensitivity in capturing useful data relating to the students’ conceptual ability in statistical reasoning. Our analyses indicated that SRBCI is a unidimensional construct, with items that vary widely in difficulty and provide useful information about such student ability. SRBCI should be useful as a diagnostic tool in a variety of biology settings and as a means of measuring the success of teaching interventions designed to improve statistical reasoning skills. PMID:26903497
Angstman, Nicholas B; Frank, Hans-Georg; Schmitz, Christoph
2016-01-01
As a widely used and studied model organism, Caenorhabditis elegans worms offer the ability to investigate implications of behavioral change. Although, investigation of C. elegans behavioral traits has been shown, analysis is often narrowed down to measurements based off a single point, and thus cannot pick up on subtle behavioral and morphological changes. In the present study videos were captured of four different C. elegans strains grown in liquid cultures and transferred to NGM-agar plates with an E. coli lawn or with no lawn. Using an advanced software, WormLab, the full skeleton and outline of worms were tracked to determine whether the presence of food affects behavioral traits. In all seven investigated parameters, statistically significant differences were found in worm behavior between those moving on NGM-agar plates with an E. coli lawn and NGM-agar plates with no lawn. Furthermore, multiple test groups showed differences in interaction between variables as the parameters that significantly correlated statistically with speed of locomotion varied. In the present study, we demonstrate the validity of a model to analyze C. elegans behavior beyond simple speed of locomotion. The need to account for a nested design while performing statistical analyses in similar studies is also demonstrated. With extended analyses, C. elegans behavioral change can be investigated with greater sensitivity, which could have wide utility in fields such as, but not limited to, toxicology, drug discovery, and RNAi screening.
NASA Astrophysics Data System (ADS)
El Sharawy, Mohamed S.; Gaafar, Gamal R.
2016-12-01
Both reservoir engineers and petrophysicists have been concerned about dividing a reservoir into zones for engineering and petrophysics purposes. Through decades, several techniques and approaches were introduced. Out of them, statistical reservoir zonation, stratigraphic modified Lorenz (SML) plot and the principal component and clustering analyses techniques were chosen to apply on the Nubian sandstone reservoir of Palaeozoic - Lower Cretaceous age, Gulf of Suez, Egypt, by using five adjacent wells. The studied reservoir consists mainly of sandstone with some intercalation of shale layers with varying thickness from one well to another. The permeability ranged from less than 1 md to more than 1000 md. The statistical reservoir zonation technique, depending on core permeability, indicated that the cored interval of the studied reservoir can be divided into two zones. Using reservoir properties such as porosity, bulk density, acoustic impedance and interval transit time indicated also two zones with an obvious variation in separation depth and zones continuity. The stratigraphic modified Lorenz (SML) plot indicated the presence of more than 9 flow units in the cored interval as well as a high degree of microscopic heterogeneity. On the other hand, principal component and cluster analyses, depending on well logging data (gamma ray, sonic, density and neutron), indicated that the whole reservoir can be divided at least into four electrofacies having a noticeable variation in reservoir quality, as correlated with the measured permeability. Furthermore, continuity or discontinuity of the reservoir zones can be determined using this analysis.
Secondary Analysis of National Longitudinal Transition Study 2 Data
ERIC Educational Resources Information Center
Hicks, Tyler A.; Knollman, Greg A.
2015-01-01
This review examines published secondary analyses of National Longitudinal Transition Study 2 (NLTS2) data, with a primary focus upon statistical objectives, paradigms, inferences, and methods. Its primary purpose was to determine which statistical techniques have been common in secondary analyses of NLTS2 data. The review begins with an…
A Nonparametric Geostatistical Method For Estimating Species Importance
Andrew J. Lister; Rachel Riemann; Michael Hoppus
2001-01-01
Parametric statistical methods are not always appropriate for conducting spatial analyses of forest inventory data. Parametric geostatistical methods such as variography and kriging are essentially averaging procedures, and thus can be affected by extreme values. Furthermore, non normal distributions violate the assumptions of analyses in which test statistics are...
ERIC Educational Resources Information Center
Ellis, Barbara G.; Dick, Steven J.
1996-01-01
Employs the statistics-documentation portion of a word-processing program's grammar-check feature together with qualitative analyses to determine that Henry Watterson, long-time editor of the "Louisville Courier-Journal," was probably the South's famed Civil War correspondent "Shadow." (TB)
Wang, Xiaolong; Li, Lin; Zhao, Jiaxin; Li, Fangliang; Guo, Wei; Chen, Xia
2017-04-01
To evaluate the effects of different preservation methods (stored in a -20°C ice chest, preserved in liquid nitrogen and dried in silica gel) on inter simple sequence repeat (ISSR) or random amplified polymorphic DNA (RAPD) analyses in various botanical specimens (including broad-leaved plants, needle-leaved plants and succulent plants) for different times (three weeks and three years), we used a statistical analysis based on the number of bands, genetic index and cluster analysis. The results demonstrate that methods used to preserve samples can provide sufficient amounts of genomic DNA for ISSR and RAPD analyses; however, the effect of different preservation methods on these analyses vary significantly, and the preservation time has little effect on these analyses. Our results provide a reference for researchers to select the most suitable preservation method depending on their study subject for the analysis of molecular markers based on genomic DNA. Copyright © 2017 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved.
Enarson, C; Cariaga-Lo, L
2001-11-01
The results of the United States Medical Licensing Examination Step 1 and 2 examinations are reported for students enrolled in a problem-based and traditional lecture-based curricula over a seven-year period at a single institution. There were no statistically significant differences in mean scores on either examination over the seven year period as a whole. There were statistically significant main effects noted by cohort year and curricular track for both the Step 1 and 2 examinations. These results support the general, long-term effectiveness of problem-based learning with respect to basic and clinical science knowledge acquisition. This paper reports the United States Medical Licensing Examination Step 1 and Step 2 results for students enrolled in a problem-based and traditional lecture-based learning curricula over the seven-year period (1992-98) in order to evaluate the adequacy of each curriculum in supporting students learning of the basic and clinical sciences. Six hundred and eighty-nine students who took the United States Medical Licensing Examination Step 1 and 540 students who took Step 2 for the first time over the seven-year period were included in the analyses. T-test analyses were utilized to compare students' Step 1 and Step 2 performance by curriculum groups. United States Medical Licensing Examination Step 1 scores over the seven-year period were 214 for Traditional Curriculum students and 208 for Parallel Curriculum students (t-value = 1.32, P=0.21). Mean Step 2 scores over the seven-year period were 208 for Traditional Curriculum students and 206 for Parallel Curriculum students (t-value=1.08, P=0.30). Statistically significant main effects were noted by cohort year and curricular track for both the Step 1 and Step 2 examinations. The totality of experience in both groups, although differing by curricular type, may be similar enough that the comparable scores are what should be expected. These results should be reassuring to curricular planners and faculty that problem-based learning can provide students with the knowledge needed for the subsequent phases of their medical education.
1993-08-01
subtitled "Simulation Data," consists of detailed infonrnation on the design parmneter variations tested, subsequent statistical analyses conducted...used with confidence during the design process. The data quality can be examined in various forms such as statistical analyses of measure of merit data...merit, such as time to capture or nmaximurn pitch rate, can be calculated from the simulation time history data. Statistical techniques are then used
Statistical issues on the analysis of change in follow-up studies in dental research.
Blance, Andrew; Tu, Yu-Kang; Baelum, Vibeke; Gilthorpe, Mark S
2007-12-01
To provide an overview to the problems in study design and associated analyses of follow-up studies in dental research, particularly addressing three issues: treatment-baselineinteractions; statistical power; and nonrandomization. Our previous work has shown that many studies purport an interacion between change (from baseline) and baseline values, which is often based on inappropriate statistical analyses. A priori power calculations are essential for randomized controlled trials (RCTs), but in the pre-test/post-test RCT design it is not well known to dental researchers that the choice of statistical method affects power, and that power is affected by treatment-baseline interactions. A common (good) practice in the analysis of RCT data is to adjust for baseline outcome values using ancova, thereby increasing statistical power. However, an important requirement for ancova is there to be no interaction between the groups and baseline outcome (i.e. effective randomization); the patient-selection process should not cause differences in mean baseline values across groups. This assumption is often violated for nonrandomized (observational) studies and the use of ancova is thus problematic, potentially giving biased estimates, invoking Lord's paradox and leading to difficulties in the interpretation of results. Baseline interaction issues can be overcome by use of statistical methods; not widely practiced in dental research: Oldham's method and multilevel modelling; the latter is preferred for its greater flexibility to deal with more than one follow-up occasion as well as additional covariates To illustrate these three key issues, hypothetical examples are considered from the fields of periodontology, orthodontics, and oral implantology. Caution needs to be exercised when considering the design and analysis of follow-up studies. ancova is generally inappropriate for nonrandomized studies and causal inferences from observational data should be avoided.
Evidence-based pathology in its second decade: toward probabilistic cognitive computing.
Marchevsky, Alberto M; Walts, Ann E; Wick, Mark R
2017-03-01
Evidence-based pathology advocates using a combination of best available data ("evidence") from the literature and personal experience for the diagnosis, estimation of prognosis, and assessment of other variables that impact individual patient care. Evidence-based pathology relies on systematic reviews of the literature, evaluation of the quality of evidence as categorized by evidence levels and statistical tools such as meta-analyses, estimates of probabilities and odds, and others. However, it is well known that previously "statistically significant" information usually does not accurately forecast the future for individual patients. There is great interest in "cognitive computing" in which "data mining" is combined with "predictive analytics" designed to forecast future events and estimate the strength of those predictions. This study demonstrates the use of IBM Watson Analytics software to evaluate and predict the prognosis of 101 patients with typical and atypical pulmonary carcinoid tumors in which Ki-67 indices have been determined. The results obtained with this system are compared with those previously reported using "routine" statistical software and the help of a professional statistician. IBM Watson Analytics interactively provides statistical results that are comparable to those obtained with routine statistical tools but much more rapidly, with considerably less effort and with interactive graphics that are intuitively easy to apply. It also enables analysis of natural language variables and yields detailed survival predictions for patient subgroups selected by the user. Potential applications of this tool and basic concepts of cognitive computing are discussed. Copyright © 2016 Elsevier Inc. All rights reserved.
Systems and methods for detection of blowout precursors in combustors
Lieuwen, Tim C.; Nair, Suraj
2006-08-15
The present invention comprises systems and methods for detecting flame blowout precursors in combustors. The blowout precursor detection system comprises a combustor, a pressure measuring device, and blowout precursor detection unit. A combustion controller may also be used to control combustor parameters. The methods of the present invention comprise receiving pressure data measured by an acoustic pressure measuring device, performing one or a combination of spectral analysis, statistical analysis, and wavelet analysis on received pressure data, and determining the existence of a blowout precursor based on such analyses. The spectral analysis, statistical analysis, and wavelet analysis further comprise their respective sub-methods to determine the existence of blowout precursors.
Egorov, A D; Stepantsov, V I; Nosovskiĭ, A M; Shipov, A A
2009-01-01
Cluster analysis was applied to evaluate locomotion training (running and running intermingled with walking) of 13 cosmonauts on long-term ISS missions by the parameters of duration (min), distance (m) and intensity (km/h). Based on the results of analyses, the cosmonauts were distributed into three steady groups of 2, 5 and 6 persons. Distance and speed showed a statistical rise (p < 0.03) from group 1 to group 3. Duration of physical locomotion training was not statistically different in the groups (p = 0.125). Therefore, cluster analysis is an adequate method of evaluating fitness of cosmonauts on long-term missions.
Bluhmki, Tobias; Bramlage, Peter; Volk, Michael; Kaltheuner, Matthias; Danne, Thomas; Rathmann, Wolfgang; Beyersmann, Jan
2017-02-01
Complex longitudinal sampling and the observational structure of patient registers in health services research are associated with methodological challenges regarding data management and statistical evaluation. We exemplify common pitfalls and want to stimulate discussions on the design, development, and deployment of future longitudinal patient registers and register-based studies. For illustrative purposes, we use data from the prospective, observational, German DIabetes Versorgungs-Evaluation register. One aim was to explore predictors for the initiation of a basal insulin supported therapy in patients with type 2 diabetes initially prescribed to glucose-lowering drugs alone. Major challenges are missing mortality information, time-dependent outcomes, delayed study entries, different follow-up times, and competing events. We show that time-to-event methodology is a valuable tool for improved statistical evaluation of register data and should be preferred to simple case-control approaches. Patient registers provide rich data sources for health services research. Analyses are accompanied with the trade-off between data availability, clinical plausibility, and statistical feasibility. Cox' proportional hazards model allows for the evaluation of the outcome-specific hazards, but prediction of outcome probabilities is compromised by missing mortality information. Copyright © 2016 Elsevier Inc. All rights reserved.
Huang, Shi; MacKinnon, David P.; Perrino, Tatiana; Gallo, Carlos; Cruden, Gracelyn; Brown, C Hendricks
2016-01-01
Mediation analysis often requires larger sample sizes than main effect analysis to achieve the same statistical power. Combining results across similar trials may be the only practical option for increasing statistical power for mediation analysis in some situations. In this paper, we propose a method to estimate: 1) marginal means for mediation path a, the relation of the independent variable to the mediator; 2) marginal means for path b, the relation of the mediator to the outcome, across multiple trials; and 3) the between-trial level variance-covariance matrix based on a bivariate normal distribution. We present the statistical theory and an R computer program to combine regression coefficients from multiple trials to estimate a combined mediated effect and confidence interval under a random effects model. Values of coefficients a and b, along with their standard errors from each trial are the input for the method. This marginal likelihood based approach with Monte Carlo confidence intervals provides more accurate inference than the standard meta-analytic approach. We discuss computational issues, apply the method to two real-data examples and make recommendations for the use of the method in different settings. PMID:28239330
Cognitive predictors of balance in Parkinson's disease.
Fernandes, Ângela; Mendes, Andreia; Rocha, Nuno; Tavares, João Manuel R S
2016-06-01
Postural instability is one of the most incapacitating symptoms of Parkinson's disease (PD) and appears to be related to cognitive deficits. This study aims to determine the cognitive factors that can predict deficits in static and dynamic balance in individuals with PD. A sociodemographic questionnaire characterized 52 individuals with PD for this work. The Trail Making Test, Rule Shift Cards Test, and Digit Span Test assessed the executive functions. The static balance was assessed using a plantar pressure platform, and dynamic balance was based on the Timed Up and Go Test. The results were statistically analysed using SPSS Statistics software through linear regression analysis. The results show that a statistically significant model based on cognitive outcomes was able to explain the variance of motor variables. Also, the explanatory value of the model tended to increase with the addition of individual and clinical variables, although the resulting model was not statistically significant The model explained 25-29% of the variability of the Timed Up and Go Test, while for the anteroposterior displacement it was 23-34%, and for the mediolateral displacement it was 24-39%. From the findings, we conclude that the cognitive performance, especially the executive functions, is a predictor of balance deficit in individuals with PD.
Lee, Jeeyeon
2015-01-01
Background An acellular dermal matrix (ADM) is applied to release the surrounding muscles and prevent dislocation or rippling of the implant. We compared implant-based breast reconstruction using the latissimus dorsi (LD) muscle, referred to as an “LD muscle onlay patch,” with using an ADM. Method A total of 56 patients (60 breasts) underwent nipple sparing mastectomy with implant-based breast reconstruction using an ADM or LD muscle onlay patch. Cosmetic outcomes were assessed 4 weeks after chemotherapy or radiotherapy, and statistical analyses were performed. Results Mean surgical time and hospital stay were significantly longer in the LD muscle onlay patch group than the ADM group. However, there were no statistically significant differences between groups in postoperative complications. Cosmetic outcomes for breast symmetry and shape were higher in the LD muscle onlay patch group. Conclusions Implant-based breast reconstruction with an LD muscle onlay patch would be a feasible alternative to using an ADM. PMID:26161312
Coastal and Marine Bird Data Base
Anderson, S.H.; Geissler, P.H.; Dawson, D.K.
1980-01-01
Summary: This report discusses the development of a coastal and marine bird data base at the Migratory Bird and Habitat Research Laboratory. The system is compared with other data bases, and suggestions for future development, such as possible adaptations for other taxonomic groups, are included. The data base is based on the Statistical Analysis System but includes extensions programmed in PL/I. The Appendix shows how the system evolved. Output examples are given for heron data and pelagic bird data which indicate the types of analyses that can be conducted and output figures. The Appendixes include a retrieval language user's guide and description of the retrieval process and listing of translator program.
Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression
Chen, Yanguang
2016-01-01
In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of Durbin-Watson’s statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran’s index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 China’s regions. These results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test. PMID:26800271
Alcohol Control Policies and Alcohol Consumption by Youth: A Multi-National Study
Paschall, Mallie J.; Grube, Joel W.; Kypri, Kypros
2009-01-01
Aims The study examined relationships between alcohol control policies and adolescent alcohol use in 26 countries. Design Cross-sectional analyses of alcohol policy ratings based on the Alcohol Policy Index (API), per capita consumption, and national adolescent survey data. Setting Data are from 26 countries. Participants Adolescents (15-17 years old) who participated in the 2003 ESPAD (European countries) or national secondary school surveys in Spain, Canada, Australia, New Zealand and the USA. Measurements Alcohol control policy ratings based on the API; prevalence of alcohol use, heavy drinking, and first drink by age 13 based on national secondary school surveys; per capita alcohol consumption for each country in 2003. Analysis Correlational and linear regression analyses were conducted to examine relationships between alcohol control policy ratings and past-30-day prevalence of adolescent alcohol use, heavy drinking, and having first drink by age 13. Per capita consumption of alcohol was included as a covariate in regression analyses. Findings More comprehensive API ratings and alcohol availability and advertising control ratings were inversely related to the past-30-day prevalence of alcohol use and prevalence rates for drinking 3-5 times and 6 or more times in the past 30 days. Alcohol advertising control was also inversely related to the prevalence of past-30-day heavy drinking and having first drink by age 13. Most of the relationships between API, alcohol availability and advertising control and drinking prevalence rates were attenuated and no longer statistically significant when controlling for per capita consumption in regression analyses, suggesting that alcohol use in the general population may confound or mediate observed relationships between alcohol control policies and youth alcohol consumption. Several of the inverse relationships remained statistically significant when controlling for per capita consumption. Conclusions More comprehensive and stringent alcohol control policies, particularly policies affecting alcohol availability and marketing, are associated with lower prevalence and frequency of adolescent alcohol consumption and age of first alcohol use. PMID:19832785
Lin, Shuai-Chun; Pasquale, Louis R; Singh, Kuldev; Lin, Shan C
2018-03-01
The purpose of this article is to investigate the association between body mass index (BMI) and open-angle glaucoma (OAG) in a sample of the South Korean population. The sample consisted of a cross-sectional, population-based sample of 10,978 participants, 40 years of age and older, enrolled in the 2008 to 2011 Korean National Health and Nutrition Examination Survey. All participants had measured intraocular pressure <22 mm Hg and open anterior chamber angles. OAG was defined using disc and visual field criteria established by the International Society for Geographical and Epidemiological Ophthalmology. Multivariable analyses were performed to determine the association between BMI and OAG. These analyses were also performed in a sex-stratified and age-stratified manner. After adjusting for potential confounding variables, lower BMI (<19 kg/m) was associated with greater risk of OAG compared with normal BMI (19 to 24.9 kg/m) [odds ratio (OR), 2.28; 95% confidence interval (CI), 1.22-4.26]. In sex-stratified analyses, low BMI remained adversely related to glaucoma in women (OR, 3.45; 95% CI, 1.42-8.38) but not in men (OR, 1.72; 95% CI, 0.71-4.20). In age-stratified analyses, lower BMI was adversely related to glaucoma among subjects 40- to 49-year old (OR, 5.16; 95% CI, 1.86-14.36) but differences in glaucoma prevalence were not statistically significant between those with low versus normal BMI in other age strata. Lower BMI was associated with increased odds of OAG in a sample of the South Korean population. Multivariate analysis revealed the association to be statistically significant in women and those in the youngest age stratum.
A statistical study of magnetopause structures: Tangential versus rotational discontinuities
NASA Astrophysics Data System (ADS)
Chou, Y.-C.; Hau, L.-N.
2012-08-01
A statistical study of the structure of Earth's magnetopause is carried out by analyzing two-year AMPTE/IRM plasma and magnetic field data. The analyses are based on the minimum variance analysis (MVA), the deHoffmann-Teller (HT) frame analysis and the Walén relation. A total of 328 magnetopause crossings are identified and error estimates associated with MVA and HT frame analyses are performed for each case. In 142 out of 328 events both MVA and HT frame analyses yield high quality results which are classified as either tangential-discontinuity (TD) or rotational-discontinuity (RD) structures based only on the Walén relation: Events withSWA ≤ 0.4 (SWA ≥ 0.5) are classified as TD (RD), and rest (with 0.4 < SWA < 0.5) is classified as "uncertain," where SWA refers to the Walén slope. With this criterion, 84% of 142 events are TDs, 12% are RDs, and 4% are uncertain events. There are a large portion of TD events which exhibit a finite normal magnetic field component Bnbut have insignificant flow as compared to the Alfvén velocity in the HT frame. Two-dimensional Grad-Shafranov reconstruction of forty selected TD and RD events show that single or multiple X-line accompanied with magnetic islands are common feature of magnetopause current. A survey plot of the HT velocity associated with TD structures projected onto the magnetopause shows that the flow is diverted at the subsolar point and accelerated toward the dawn and dusk flanks.
Full in-vitro analyses of new-generation bulk fill dental composites cured by halogen light.
Tekin, Tuçe Hazal; Kantürk Figen, Aysel; Yılmaz Atalı, Pınar; Coşkuner Filiz, Bilge; Pişkin, Mehmet Burçin
2017-08-01
The objective of this study was to investigate the full in-vitro analyses of new-generation bulk-fill dental composites cured by halogen light (HLG). Two types' four composites were studied: Surefill SDR (SDR) and Xtra Base (XB) as bulk-fill flowable materials; QuixFill (QF) and XtraFill (XF) as packable bulk-fill materials. Samples were prepared for each analysis and test by applying the same procedure, but with different diameters and thicknesses appropriate to the analysis and test requirements. Thermal properties were determined by thermogravimetric analysis (TG/DTG) and differential scanning calorimetry (DSC) analysis; the Vickers microhardness (VHN) was measured after 1, 7, 15 and 30days of storage in water. The degree of conversion values for the materials (DC, %) were immediately measured using near-infrared spectroscopy (FT-IR). The surface morphology of the composites was investigated by scanning electron microscopes (SEM) and atomic-force microscopy (AFM) analyses. The sorption and solubility measurements were also performed after 1, 7, 15 and 30days of storage in water. In addition to his, the data were statistically analyzed using one-way analysis of variance, and both the Newman Keuls and Tukey multiple comparison tests. The statistical significance level was established at p<0.05. According to the ISO 4049 standards, all the tested materials showed acceptable water sorption and solubility, and a halogen light source was an option to polymerize bulk-fill, resin-based dental composites. Copyright © 2017 Elsevier B.V. All rights reserved.
Visual Exploration of Genetic Association with Voxel-based Imaging Phenotypes in an MCI/AD Study
Kim, Sungeun; Shen, Li; Saykin, Andrew J.; West, John D.
2010-01-01
Neuroimaging genomics is a new transdisciplinary research field, which aims to examine genetic effects on brain via integrated analyses of high throughput neuroimaging and genomic data. We report our recent work on (1) developing an imaging genomic browsing system that allows for whole genome and entire brain analyses based on visual exploration and (2) applying the system to the imaging genomic analysis of an existing MCI/AD cohort. Voxel-based morphometry is used to define imaging phenotypes. ANCOVA is employed to evaluate the effect of the interaction of genotypes and diagnosis in relation to imaging phenotypes while controlling for relevant covariates. Encouraging experimental results suggest that the proposed system has substantial potential for enabling discovery of imaging genomic associations through visual evaluation and for localizing candidate imaging regions and genomic regions for refined statistical modeling. PMID:19963597
Quantitative Analysis of Venus Radar Backscatter Data in ArcGIS
NASA Technical Reports Server (NTRS)
Long, S. M.; Grosfils, E. B.
2005-01-01
Ongoing mapping of the Ganiki Planitia (V14) quadrangle of Venus and definition of material units has involved an integrated but qualitative analysis of Magellan radar backscatter images and topography using standard geomorphological mapping techniques. However, such analyses do not take full advantage of the quantitative information contained within the images. Analysis of the backscatter coefficient allows a much more rigorous statistical comparison between mapped units, permitting first order selfsimilarity tests of geographically separated materials assigned identical geomorphological labels. Such analyses cannot be performed directly on pixel (DN) values from Magellan backscatter images, because the pixels are scaled to the Muhleman law for radar echoes on Venus and are not corrected for latitudinal variations in incidence angle. Therefore, DN values must be converted based on pixel latitude back to their backscatter coefficient values before accurate statistical analysis can occur. Here we present a method for performing the conversions and analysis of Magellan backscatter data using commonly available ArcGIS software and illustrate the advantages of the process for geological mapping.
2012-01-01
Background A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on and justification of many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex and not easily implemented and have several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. Methods We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). Results The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. Conclusions The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint. PMID:22962944
Adrion, Christine; Mansmann, Ulrich
2012-09-10
A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on and justification of many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex and not easily implemented and have several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint.
Ince, Robin A A; Giordano, Bruno L; Kayser, Christoph; Rousselet, Guillaume A; Gross, Joachim; Schyns, Philippe G
2017-03-01
We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open-source Matlab and Python code implementing the new methods accompanies this article. Hum Brain Mapp 38:1541-1573, 2017. © 2016 Wiley Periodicals, Inc. 2016 The Authors Human Brain Mapping Published by Wiley Periodicals, Inc.
Inferential Statistics in "Language Teaching Research": A Review and Ways Forward
ERIC Educational Resources Information Center
Lindstromberg, Seth
2016-01-01
This article reviews all (quasi)experimental studies appearing in the first 19 volumes (1997-2015) of "Language Teaching Research" (LTR). Specifically, it provides an overview of how statistical analyses were conducted in these studies and of how the analyses were reported. The overall conclusion is that there has been a tight adherence…
Tipping point analysis of atmospheric oxygen concentration
DOE Office of Scientific and Technical Information (OSTI.GOV)
Livina, V. N.; Forbes, A. B.; Vaz Martins, T. M.
2015-03-15
We apply tipping point analysis to nine observational oxygen concentration records around the globe, analyse their dynamics and perform projections under possible future scenarios, leading to oxygen deficiency in the atmosphere. The analysis is based on statistical physics framework with stochastic modelling, where we represent the observed data as a composition of deterministic and stochastic components estimated from the observed data using Bayesian and wavelet techniques.
ERIC Educational Resources Information Center
Baxter, G. P.; Ahmed, S.; Sikali, E.; Waits, T.; Sloan, M.; Salvucci, S.
2007-01-01
The Nation's Report Card[TM] informs the public about the academic achievement of elementary and secondary students in the United States and its jurisdictions, including Puerto Rico. In 2003, a trial NAEP mathematics assessment was administered in Spanish to public school students at grades 4 and 8 in Puerto Rico. Based on preliminary analyses of…
ERIC Educational Resources Information Center
Ozel, Murat; Caglak, Serdar; Erdogan, Mehmet
2013-01-01
This study investigated how affective factors like attitude and motivation contribute to science achievement in PISA 2006 using linear structural modeling. The data set of PISA 2006 collected from 4942 fifteen-year-old Turkish students (2290 females, 2652 males) was used for the statistical analyses. A total of 42 selected items on a four point…
ERIC Educational Resources Information Center
Graafland-Essers, Irma; Cremonini, Leon; Ettedgui, Emile; Botterman, Maarten
2003-01-01
This report presents the current understanding of the advancement of the Information Society within the European Union and countries that are up for accession in 2004, and is based on the SIBIS (Statistical Indicators Benchmarking the Information Society) surveys and analyses per SIBIS theme and country. The report is unique in its coherent and…
ERIC Educational Resources Information Center
Graafland-Essers, Irma; Cremonini, Leon; Ettedgui, Emile; Botterman, Maarten
2003-01-01
This report presents the current understanding of the advancement of the Information Society within the European Union and countries that are up for accession in 2004, and is based on the SIBIS (Statistical Indicators Benchmarking the Information Society) surveys and analyses per SIBIS theme and country. The report is unique in its coherent and…
A Fatigue Management System for Sustained Military Operations
2008-03-31
Bradford, Brenda Jones, Margaret Campbell, Heather McCrory, Amy Campbell, Linda Mendez, Juan Cardenas, Beckie Moise, Samuel Cardenas, Fernando...always under the direct observation of research personnel or knowingly monitored from a central control station by closed circuit television, excluding...blocks 1, 5, 9, and 13). Statistical Analyses To determine the appropriate sample size for this study, a power analysis was based on the post
ERIC Educational Resources Information Center
Mattern, Krista D.; Patterson, Brian F.
2011-01-01
This report presents the findings from a replication of the analyses from the report, "Is Performance on the SAT Related to College Retention?" (Mattern & Patterson, 2009). The tables presented herein are based on the 2007 sample and the findings are largely the same as those presented in the original report, and show SAT scores are…
Zhang, Lin; Vranckx, Katleen; Janssens, Koen; Sandrin, Todd R.
2015-01-01
MALDI-TOF mass spectrometry has been shown to be a rapid and reliable tool for identification of bacteria at the genus and species, and in some cases, strain levels. Commercially available and open source software tools have been developed to facilitate identification; however, no universal/standardized data analysis pipeline has been described in the literature. Here, we provide a comprehensive and detailed demonstration of bacterial identification procedures using a MALDI-TOF mass spectrometer. Mass spectra were collected from 15 diverse bacteria isolated from Kartchner Caverns, AZ, USA, and identified by 16S rDNA sequencing. Databases were constructed in BioNumerics 7.1. Follow-up analyses of mass spectra were performed, including cluster analyses, peak matching, and statistical analyses. Identification was performed using blind-coded samples randomly selected from these 15 bacteria. Two identification methods are presented: similarity coefficient-based and biomarker-based methods. Results show that both identification methods can identify the bacteria to the species level. PMID:25590854
Zhang, Lin; Vranckx, Katleen; Janssens, Koen; Sandrin, Todd R
2015-01-02
MALDI-TOF mass spectrometry has been shown to be a rapid and reliable tool for identification of bacteria at the genus and species, and in some cases, strain levels. Commercially available and open source software tools have been developed to facilitate identification; however, no universal/standardized data analysis pipeline has been described in the literature. Here, we provide a comprehensive and detailed demonstration of bacterial identification procedures using a MALDI-TOF mass spectrometer. Mass spectra were collected from 15 diverse bacteria isolated from Kartchner Caverns, AZ, USA, and identified by 16S rDNA sequencing. Databases were constructed in BioNumerics 7.1. Follow-up analyses of mass spectra were performed, including cluster analyses, peak matching, and statistical analyses. Identification was performed using blind-coded samples randomly selected from these 15 bacteria. Two identification methods are presented: similarity coefficient-based and biomarker-based methods. Results show that both identification methods can identify the bacteria to the species level.
Orsi, Rebecca
2017-02-01
Concept mapping is now a commonly-used technique for articulating and evaluating programmatic outcomes. However, research regarding validity of knowledge and outcomes produced with concept mapping is sparse. The current study describes quantitative validity analyses using a concept mapping dataset. We sought to increase the validity of concept mapping evaluation results by running multiple cluster analysis methods and then using several metrics to choose from among solutions. We present four different clustering methods based on analyses using the R statistical software package: partitioning around medoids (PAM), fuzzy analysis (FANNY), agglomerative nesting (AGNES) and divisive analysis (DIANA). We then used the Dunn and Davies-Bouldin indices to assist in choosing a valid cluster solution for a concept mapping outcomes evaluation. We conclude that the validity of the outcomes map is high, based on the analyses described. Finally, we discuss areas for further concept mapping methods research. Copyright © 2016 Elsevier Ltd. All rights reserved.
Masiak, Marek; Loza, Bartosz
2004-01-01
A lot of inconsistencies across dimensional studies of schizophrenia(s) are being unveiled. These problems are strongly related to the methodological aspects of collecting data and specific statistical analyses. Psychiatrists have developed lots of psychopathological models derived from analytic studies based on SAPS/SANS (the Scale for the Assessment of Positive Symptoms/the Scale for the Assessment of Negative Symptoms) and PANSS (The Positive and Negative Syndrome Scale). The unique validation of parallel two independent factor models was performed--ascribed to the same illness and based on different diagnostic scales--to investigate indirect methodological causes of clinical discrepancies. 100 newly admitted patients (mean age--33.5, 18-45, males--64, females--36, hospitalised on average 5.15 times) with paranoid schizophrenia (according to ICD-10) were scored and analysed using PANSS and SAPS/SANS during psychotic exacerbation. All patients were treated with neuroleptics of various kinds with 410mg equivalents of chlorpromazine (atypicals:typicals --> 41:59). Factor analyses were applied to basic results (with principal component analysis, normalised varimax rotation). Investing the cross-model validity, canonical analysis was applied. Models of schizophrenia varied from 3 to 5 factors. PANSS model included: positive, negative, disorganisation, cognitive and depressive components and SAPS/SANS model was dominated by positive, negative and disorganisation factors. The SAPS/SANS accounted for merely 48% of the PANSS common variances. The SAPS/SANS combined measurement preferentially (67% of canonical variance) targeted positive-negative dichotomy. Respectively, PANSS shared positive-negative phenomenology in 35% of its own variance. The general concept of five-dimensionality in paranoid schizophrenia looks clinically more heuristic and statistically more stabilised.
Kessel, K A; Habermehl, D; Bohn, C; Jäger, A; Floca, R O; Zhang, L; Bougatf, N; Bendl, R; Debus, J; Combs, S E
2012-12-01
Especially in the field of radiation oncology, handling a large variety of voluminous datasets from various information systems in different documentation styles efficiently is crucial for patient care and research. To date, conducting retrospective clinical analyses is rather difficult and time consuming. With the example of patients with pancreatic cancer treated with radio-chemotherapy, we performed a therapy evaluation by using an analysis system connected with a documentation system. A total number of 783 patients have been documented into a professional, database-based documentation system. Information about radiation therapy, diagnostic images and dose distributions have been imported into the web-based system. For 36 patients with disease progression after neoadjuvant chemoradiation, we designed and established an analysis workflow. After an automatic registration of the radiation plans with the follow-up images, the recurrence volumes are segmented manually. Based on these volumes the DVH (dose volume histogram) statistic is calculated, followed by the determination of the dose applied to the region of recurrence. All results are saved in the database and included in statistical calculations. The main goal of using an automatic analysis tool is to reduce time and effort conducting clinical analyses, especially with large patient groups. We showed a first approach and use of some existing tools, however manual interaction is still necessary. Further steps need to be taken to enhance automation. Already, it has become apparent that the benefits of digital data management and analysis lie in the central storage of data and reusability of the results. Therefore, we intend to adapt the analysis system to other types of tumors in radiation oncology.
The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update
Afgan, Enis; Baker, Dannon; van den Beek, Marius; Blankenberg, Daniel; Bouvier, Dave; Čech, Martin; Chilton, John; Clements, Dave; Coraor, Nate; Eberhard, Carl; Grüning, Björn; Guerler, Aysam; Hillman-Jackson, Jennifer; Von Kuster, Greg; Rasche, Eric; Soranzo, Nicola; Turaga, Nitesh; Taylor, James; Nekrutenko, Anton; Goecks, Jeremy
2016-01-01
High-throughput data production technologies, particularly ‘next-generation’ DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non experts. Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report we highlight recently added features enabling biomedical analyses on a large scale. PMID:27137889
De Moor, G J E; Claerhout, B; De Meyer, F
2003-01-01
To introduce some of the privacy protection problems related to genomics based medicine and to highlight the relevance of Trusted Third Parties (TTPs) and of Privacy Enhancing Techniques (PETs) in the restricted context of clinical research and statistics. Practical approaches based on two different pseudonymisation models, both for batch and interactive data collection and exchange, are described and analysed. The growing need of managing both clinical and genetic data raises important legal and ethical challenges. Protecting human rights in the realm of privacy, while optimising research potential and other statistical activities is a challenge that can easily be overcome with the assistance of a trust service provider offering advanced privacy enabling/enhancing solutions. As such, the use of pseudonymisation and other innovative Privacy Enhancing Techniques can unlock valuable data sources.
A new in silico classification model for ready biodegradability, based on molecular fragments.
Lombardo, Anna; Pizzo, Fabiola; Benfenati, Emilio; Manganaro, Alberto; Ferrari, Thomas; Gini, Giuseppina
2014-08-01
Regulations such as the European REACH (Registration, Evaluation, Authorization and restriction of Chemicals) often require chemicals to be evaluated for ready biodegradability, to assess the potential risk for environmental and human health. Because not all chemicals can be tested, there is an increasing demand for tools for quick and inexpensive biodegradability screening, such as computer-based (in silico) theoretical models. We developed an in silico model starting from a dataset of 728 chemicals with ready biodegradability data (MITI-test Ministry of International Trade and Industry). We used the novel software SARpy to automatically extract, through a structural fragmentation process, a set of substructures statistically related to ready biodegradability. Then, we analysed these substructures in order to build some general rules. The model consists of a rule-set made up of the combination of the statistically relevant fragments and of the expert-based rules. The model gives good statistical performance with 92%, 82% and 76% accuracy on the training, test and external set respectively. These results are comparable with other in silico models like BIOWIN developed by the United States Environmental Protection Agency (EPA); moreover this new model includes an easily understandable explanation. Copyright © 2014 Elsevier Ltd. All rights reserved.
Forster, H.-J.; Davis, J.C.; Tischendorf, G.; Seltmann, R.
1999-01-01
High-precision major, minor and trace element analyses for 44 elements have been made of 329 Late Variscan granitic and rhyolitic rocks from the Erzgebirge metallogenic province of Germany. The intrusive histories of some of these granites are not completely understood and exposures of rock are not adequate to resolve relationships between what apparently are different plutons. Therefore, it is necessary to turn to chemical analyses to decipher the evolution of the plutons and their relationships. A new classification of Erzgebirge plutons into five major groups of granites, based on petrologic interpretations of geochemical and mineralogical relationships (low-F biotite granites; low-F two-mica granites; high-F, high-P2O5 Li-mica granites; high-F, low-P2O5 Li-mica granites; high-F, low-P2O5 biotite granites) was tested by multivariate techniques. Canonical analyses of major elements, minor elements, trace elements and ratio variables all distinguish the groups with differing amounts of success. Univariate ANOVA's, in combination with forward-stepwise and backward-elimination canonical analyses, were used to select ten variables which were most effective in distinguishing groups. In a biplot, groups form distinct clusters roughly arranged along a quadratic path. Within groups, individual plutons tend to be arranged in patterns possibly reflecting granitic evolution. Canonical functions were used to classify samples of rhyolites of unknown association into the five groups. Another canonical analysis was based on ten elements traditionally used in petrology and which were important in the new classification of granites. Their biplot pattern is similar to that from statistically chosen variables but less effective at distinguishing the five groups of granites. This study shows that multivariate statistical techniques can provide significant insight into problems of granitic petrogenesis and may be superior to conventional procedures for petrological interpretation.
NASA Astrophysics Data System (ADS)
Stück, H. L.; Siegesmund, S.
2012-04-01
Sandstones are a popular natural stone due to their wide occurrence and availability. The different applications for these stones have led to an increase in demand. From the viewpoint of conservation and the natural stone industry, an understanding of the material behaviour of this construction material is very important. Sandstones are a highly heterogeneous material. Based on statistical analyses with a sufficiently large dataset, a systematic approach to predicting the material behaviour should be possible. Since the literature already contains a large volume of data concerning the petrographical and petrophysical properties of sandstones, a large dataset could be compiled for the statistical analyses. The aim of this study is to develop constraints on the material behaviour and especially on the weathering behaviour of sandstones. Approximately 300 samples from historical and presently mined natural sandstones in Germany and ones described worldwide were included in the statistical approach. The mineralogical composition and fabric characteristics were determined from detailed thin section analyses and descriptions in the literature. Particular attention was paid to evaluating the compositional and textural maturity, grain contact respectively contact thickness, type of cement, degree of alteration and the intergranular volume. Statistical methods were used to test for normal distributions and calculating the linear regression of the basic petrophysical properties of density, porosity, water uptake as well as the strength. The sandstones were classified into three different pore size distributions and evaluated with the other petrophysical properties. Weathering behavior like hygric swelling and salt loading tests were also included. To identify similarities between individual sandstones or to define groups of specific sandstone types, principle component analysis, cluster analysis and factor analysis were applied. Our results show that composition and porosity evolution during diagenesis is a very important control on the petrophysical properties of a building stone. The relationship between intergranular volume, cementation and grain contact, can also provide valuable information to predict the strength properties. Since the samples investigated mainly originate from the Triassic German epicontinental basin, arkoses and feldspar-arenites are underrepresented. In general, the sandstones can be grouped as follows: i) quartzites, highly mature with a primary porosity of about 40%, ii) quartzites, highly mature, showing a primary porosity of 40% but with early clay infiltration, iii) sublitharenites-lithic arenites exhibiting a lower primary porosity, higher cementation with quartz and Fe-oxides ferritic and iv) sublitharenites-lithic arenites with a higher content of pseudomatrix. However, in the last two groups the feldspar and lithoclasts can also show considerable alteration. All sandstone groups differ with respect to the pore space and strength data, as well as water uptake properties, which were obtained by linear regression analysis. Similar petrophysical properties are discernible for each type when using principle component analysis. Furthermore, strength as well as the porosity of sandstones shows distinct differences considering their stratigraphic ages and the compositions. The relationship between porosity, strength as well as salt resistance could also be verified. Hygric swelling shows an interrelation to pore size type, porosity and strength but also to the degree of alteration (e.g. lithoclasts, pseudomatrix). To summarize, the different regression analyses and the calculated confidence regions provide a significant tool to classify the petrographical and petrophysical parameters of sandstones. Based on this, the durability and the weathering behavior of the sandstone groups can be constrained. Keywords: sandstones, petrographical & petrophysical properties, predictive approach, statistical investigation
Determination of quality parameters from statistical analysis of routine TLD dosimetry data.
German, U; Weinstein, M; Pelled, O
2006-01-01
Following the as low as reasonably achievable (ALARA) practice, there is a need to measure very low doses, of the same order of magnitude as the natural background, and the limits of detection of the dosimetry systems. The different contributions of the background signals to the total zero dose reading of thermoluminescence dosemeter (TLD) cards were analysed by using the common basic definitions of statistical indicators: the critical level (L(C)), the detection limit (L(D)) and the determination limit (L(Q)). These key statistical parameters for the system operated at NRC-Negev were quantified, based on the history of readings of the calibration cards in use. The electronic noise seems to play a minor role, but the reading of the Teflon coating (without the presence of a TLD crystal) gave a significant contribution.
Lu, Z. Q. J.; Lowhorn, N. D.; Wong-Ng, W.; Zhang, W.; Thomas, E. L.; Otani, M.; Green, M. L.; Tran, T. N.; Caylor, C.; Dilley, N. R.; Downey, A.; Edwards, B.; Elsner, N.; Ghamaty, S.; Hogan, T.; Jie, Q.; Li, Q.; Martin, J.; Nolas, G.; Obara, H.; Sharp, J.; Venkatasubramanian, R.; Willigan, R.; Yang, J.; Tritt, T.
2009-01-01
In an effort to develop a Standard Reference Material (SRM™) for Seebeck coefficient, we have conducted a round-robin measurement survey of two candidate materials—undoped Bi2Te3 and Constantan (55 % Cu and 45 % Ni alloy). Measurements were performed in two rounds by twelve laboratories involved in active thermoelectric research using a number of different commercial and custom-built measurement systems and techniques. In this paper we report the detailed statistical analyses on the interlaboratory measurement results and the statistical methodology for analysis of irregularly sampled measurement curves in the interlaboratory study setting. Based on these results, we have selected Bi2Te3 as the prototype standard material. Once available, this SRM will be useful for future interlaboratory data comparison and instrument calibrations. PMID:27504212
Jeyapalan, Karthigeyan; Kumar, Jaya Krishna; Azhagarasan, N. S.
2015-01-01
Aims: The aim was to evaluate and compare the effects of three chemically different commercially available denture cleansing agents on the surface topography of two different denture base materials. Materials and Methods: Three chemically different denture cleansers (sodium perborate, 1% sodium hypochlorite, 0.2% chlorhexidine gluconate) were used on two denture base materials (acrylic resin and chrome cobalt alloy) and the changes were evaluated at 3 times intervals (56 h, 120 h, 240 h). Changes from baseline for surface roughness were recorded using a surface profilometer and standard error of the mean (SEM) both quantitatively and qualitatively, respectively. Qualitative surface analyses for all groups were done by SEM. Statistical Analysis Used: The values obtained were analyzed statistically using one-way ANOVA and paired t-test. Results: All three denture cleanser solutions showed no statistically significant surface changes on the acrylic resin portions at 56 h, 120 h, and 240 h of immersion. However, on the alloy portion changes were significant at the end of 120 h and 240 h. Conclusion: Of the three denture cleansers used in the study, none produced significant changes on the two denture base materials for the short duration of immersion, whereas changes were seen as the immersion periods were increased. PMID:26538915
A Cyber-Attack Detection Model Based on Multivariate Analyses
NASA Astrophysics Data System (ADS)
Sakai, Yuto; Rinsaka, Koichiro; Dohi, Tadashi
In the present paper, we propose a novel cyber-attack detection model based on two multivariate-analysis methods to the audit data observed on a host machine. The statistical techniques used here are the well-known Hayashi's quantification method IV and cluster analysis method. We quantify the observed qualitative audit event sequence via the quantification method IV, and collect similar audit event sequence in the same groups based on the cluster analysis. It is shown in simulation experiments that our model can improve the cyber-attack detection accuracy in some realistic cases where both normal and attack activities are intermingled.
A d-statistic for single-case designs that is equivalent to the usual between-groups d-statistic.
Shadish, William R; Hedges, Larry V; Pustejovsky, James E; Boyajian, Jonathan G; Sullivan, Kristynn J; Andrade, Alma; Barrientos, Jeannette L
2014-01-01
We describe a standardised mean difference statistic (d) for single-case designs that is equivalent to the usual d in between-groups experiments. We show how it can be used to summarise treatment effects over cases within a study, to do power analyses in planning new studies and grant proposals, and to meta-analyse effects across studies of the same question. We discuss limitations of this d-statistic, and possible remedies to them. Even so, this d-statistic is better founded statistically than other effect size measures for single-case design, and unlike many general linear model approaches such as multilevel modelling or generalised additive models, it produces a standardised effect size that can be integrated over studies with different outcome measures. SPSS macros for both effect size computation and power analysis are available.
Reiss, Philip T
2015-08-01
The "ten ironic rules for statistical reviewers" presented by Friston (2012) prompted a rebuttal by Lindquist et al. (2013), which was followed by a rejoinder by Friston (2013). A key issue left unresolved in this discussion is the use of cross-validation to test the significance of predictive analyses. This note discusses the role that cross-validation-based and related hypothesis tests have come to play in modern data analyses, in neuroimaging and other fields. It is shown that such tests need not be suboptimal and can fill otherwise-unmet inferential needs. Copyright © 2015 Elsevier Inc. All rights reserved.
Southard, Rodney E.
2013-01-01
The weather and precipitation patterns in Missouri vary considerably from year to year. In 2008, the statewide average rainfall was 57.34 inches and in 2012, the statewide average rainfall was 30.64 inches. This variability in precipitation and resulting streamflow in Missouri underlies the necessity for water managers and users to have reliable streamflow statistics and a means to compute select statistics at ungaged locations for a better understanding of water availability. Knowledge of surface-water availability is dependent on the streamflow data that have been collected and analyzed by the U.S. Geological Survey for more than 100 years at approximately 350 streamgages throughout Missouri. The U.S. Geological Survey, in cooperation with the Missouri Department of Natural Resources, computed streamflow statistics at streamgages through the 2010 water year, defined periods of drought and defined methods to estimate streamflow statistics at ungaged locations, and developed regional regression equations to compute selected streamflow statistics at ungaged locations. Streamflow statistics and flow durations were computed for 532 streamgages in Missouri and in neighboring States of Missouri. For streamgages with more than 10 years of record, Kendall’s tau was computed to evaluate for trends in streamflow data. If trends were detected, the variable length method was used to define the period of no trend. Water years were removed from the dataset from the beginning of the record for a streamgage until no trend was detected. Low-flow frequency statistics were then computed for the entire period of record and for the period of no trend if 10 or more years of record were available for each analysis. Three methods are presented for computing selected streamflow statistics at ungaged locations. The first method uses power curve equations developed for 28 selected streams in Missouri and neighboring States that have multiple streamgages on the same streams. Statistical estimates on one of these streams can be calculated at an ungaged location that has a drainage area that is between 40 percent of the drainage area of the farthest upstream streamgage and within 150 percent of the drainage area of the farthest downstream streamgage along the stream of interest. The second method may be used on any stream with a streamgage that has operated for 10 years or longer and for which anthropogenic effects have not changed the low-flow characteristics at the ungaged location since collection of the streamflow data. A ratio of drainage area of the stream at the ungaged location to the drainage area of the stream at the streamgage was computed to estimate the statistic at the ungaged location. The range of applicability is between 40- and 150-percent of the drainage area of the streamgage, and the ungaged location must be located on the same stream as the streamgage. The third method uses regional regression equations to estimate selected low-flow frequency statistics for unregulated streams in Missouri. This report presents regression equations to estimate frequency statistics for the 10-year recurrence interval and for the N-day durations of 1, 2, 3, 7, 10, 30, and 60 days. Basin and climatic characteristics were computed using geographic information system software and digital geospatial data. A total of 35 characteristics were computed for use in preliminary statewide and regional regression analyses based on existing digital geospatial data and previous studies. Spatial analyses for geographical bias in the predictive accuracy of the regional regression equations defined three low-flow regions with the State representing the three major physiographic provinces in Missouri. Region 1 includes the Central Lowlands, Region 2 includes the Ozark Plateaus, and Region 3 includes the Mississippi Alluvial Plain. A total of 207 streamgages were used in the regression analyses for the regional equations. Of the 207 U.S. Geological Survey streamgages, 77 were located in Region 1, 120 were located in Region 2, and 10 were located in Region 3. Streamgages located outside of Missouri were selected to extend the range of data used for the independent variables in the regression analyses. Streamgages included in the regression analyses had 10 or more years of record and were considered to be affected minimally by anthropogenic activities or trends. Regional regression analyses identified three characteristics as statistically significant for the development of regional equations. For Region 1, drainage area, longest flow path, and streamflow-variability index were statistically significant. The range in the standard error of estimate for Region 1 is 79.6 to 94.2 percent. For Region 2, drainage area and streamflow variability index were statistically significant, and the range in the standard error of estimate is 48.2 to 72.1 percent. For Region 3, drainage area and streamflow-variability index also were statistically significant with a range in the standard error of estimate of 48.1 to 96.2 percent. Limitations on the use of estimating low-flow frequency statistics at ungaged locations are dependent on the method used. The first method outlined for use in Missouri, power curve equations, were developed to estimate the selected statistics for ungaged locations on 28 selected streams with multiple streamgages located on the same stream. A second method uses a drainage-area ratio to compute statistics at an ungaged location using data from a single streamgage on the same stream with 10 or more years of record. Ungaged locations on these streams may use the ratio of the drainage area at an ungaged location to the drainage area at a streamgage location to scale the selected statistic value from the streamgage location to the ungaged location. This method can be used if the drainage area of the ungaged location is within 40 to 150 percent of the streamgage drainage area. The third method is the use of the regional regression equations. The limits for the use of these equations are based on the ranges of the characteristics used as independent variables and that streams must be affected minimally by anthropogenic activities.
Smith, Joseph M.; Mather, Martha E.
2012-01-01
Ecological indicators are science-based tools used to assess how human activities have impacted environmental resources. For monitoring and environmental assessment, existing species assemblage data can be used to make these comparisons through time or across sites. An impediment to using assemblage data, however, is that these data are complex and need to be simplified in an ecologically meaningful way. Because multivariate statistics are mathematical relationships, statistical groupings may not make ecological sense and will not have utility as indicators. Our goal was to define a process to select defensible and ecologically interpretable statistical simplifications of assemblage data in which researchers and managers can have confidence. For this, we chose a suite of statistical methods, compared the groupings that resulted from these analyses, identified convergence among groupings, then we interpreted the groupings using species and ecological guilds. When we tested this approach using a statewide stream fish dataset, not all statistical methods worked equally well. For our dataset, logistic regression (Log), detrended correspondence analysis (DCA), cluster analysis (CL), and non-metric multidimensional scaling (NMDS) provided consistent, simplified output. Specifically, the Log, DCA, CL-1, and NMDS-1 groupings were ≥60% similar to each other, overlapped with the fluvial-specialist ecological guild, and contained a common subset of species. Groupings based on number of species (e.g., Log, DCA, CL and NMDS) outperformed groupings based on abundance [e.g., principal components analysis (PCA) and Poisson regression]. Although the specific methods that worked on our test dataset have generality, here we are advocating a process (e.g., identifying convergent groupings with redundant species composition that are ecologically interpretable) rather than the automatic use of any single statistical tool. We summarize this process in step-by-step guidance for the future use of these commonly available ecological and statistical methods in preparing assemblage data for use in ecological indicators.
Furlan, Leonardo; Sterr, Annette
2018-01-01
Motor learning studies face the challenge of differentiating between real changes in performance and random measurement error. While the traditional p -value-based analyses of difference (e.g., t -tests, ANOVAs) provide information on the statistical significance of a reported change in performance scores, they do not inform as to the likely cause or origin of that change, that is, the contribution of both real modifications in performance and random measurement error to the reported change. One way of differentiating between real change and random measurement error is through the utilization of the statistics of standard error of measurement (SEM) and minimal detectable change (MDC). SEM is estimated from the standard deviation of a sample of scores at baseline and a test-retest reliability index of the measurement instrument or test employed. MDC, in turn, is estimated from SEM and a degree of confidence, usually 95%. The MDC value might be regarded as the minimum amount of change that needs to be observed for it to be considered a real change, or a change to which the contribution of real modifications in performance is likely to be greater than that of random measurement error. A computer-based motor task was designed to illustrate the applicability of SEM and MDC to motor learning research. Two studies were conducted with healthy participants. Study 1 assessed the test-retest reliability of the task and Study 2 consisted in a typical motor learning study, where participants practiced the task for five consecutive days. In Study 2, the data were analyzed with a traditional p -value-based analysis of difference (ANOVA) and also with SEM and MDC. The findings showed good test-retest reliability for the task and that the p -value-based analysis alone identified statistically significant improvements in performance over time even when the observed changes could in fact have been smaller than the MDC and thereby caused mostly by random measurement error, as opposed to by learning. We suggest therefore that motor learning studies could complement their p -value-based analyses of difference with statistics such as SEM and MDC in order to inform as to the likely cause or origin of any reported changes in performance.
ERIC Educational Resources Information Center
Thompson, Bruce; Melancon, Janet G.
Effect sizes have been increasingly emphasized in research as more researchers have recognized that: (1) all parametric analyses (t-tests, analyses of variance, etc.) are correlational; (2) effect sizes have played an important role in meta-analytic work; and (3) statistical significance testing is limited in its capacity to inform scientific…
Comments on `A Cautionary Note on the Interpretation of EOFs'.
NASA Astrophysics Data System (ADS)
Behera, Swadhin K.; Rao, Suryachandra A.; Saji, Hameed N.; Yamagata, Toshio
2003-04-01
The misleading aspect of the statistical analyses used in Dommenget and Latif, which raises concerns on some of the reported climate modes, is demonstrated. Adopting simple statistical techniques, the physical existence of the Indian Ocean dipole mode is shown and then the limitations of varimax and regression analyses in capturing the climate mode are discussed.
An order statistics approach to the halo model for galaxies
NASA Astrophysics Data System (ADS)
Paul, Niladri; Paranjape, Aseem; Sheth, Ravi K.
2017-04-01
We use the halo model to explore the implications of assuming that galaxy luminosities in groups are randomly drawn from an underlying luminosity function. We show that even the simplest of such order statistics models - one in which this luminosity function p(L) is universal - naturally produces a number of features associated with previous analyses based on the 'central plus Poisson satellites' hypothesis. These include the monotonic relation of mean central luminosity with halo mass, the lognormal distribution around this mean and the tight relation between the central and satellite mass scales. In stark contrast to observations of galaxy clustering; however, this model predicts no luminosity dependence of large-scale clustering. We then show that an extended version of this model, based on the order statistics of a halo mass dependent luminosity function p(L|m), is in much better agreement with the clustering data as well as satellite luminosities, but systematically underpredicts central luminosities. This brings into focus the idea that central galaxies constitute a distinct population that is affected by different physical processes than are the satellites. We model this physical difference as a statistical brightening of the central luminosities, over and above the order statistics prediction. The magnitude gap between the brightest and second brightest group galaxy is predicted as a by-product, and is also in good agreement with observations. We propose that this order statistics framework provides a useful language in which to compare the halo model for galaxies with more physically motivated galaxy formation models.
Real-world visual statistics and infants' first-learned object names.
Clerkin, Elizabeth M; Hart, Elizabeth; Rehg, James M; Yu, Chen; Smith, Linda B
2017-01-05
We offer a new solution to the unsolved problem of how infants break into word learning based on the visual statistics of everyday infant-perspective scenes. Images from head camera video captured by 8 1/2 to 10 1/2 month-old infants at 147 at-home mealtime events were analysed for the objects in view. The images were found to be highly cluttered with many different objects in view. However, the frequency distribution of object categories was extremely right skewed such that a very small set of objects was pervasively present-a fact that may substantially reduce the problem of referential ambiguity. The statistical structure of objects in these infant egocentric scenes differs markedly from that in the training sets used in computational models and in experiments on statistical word-referent learning. Therefore, the results also indicate a need to re-examine current explanations of how infants break into word learning.This article is part of the themed issue 'New frontiers for statistical learning in the cognitive sciences'. © 2016 The Author(s).
Leyrat, Clémence; Caille, Agnès; Foucher, Yohann; Giraudeau, Bruno
2016-01-22
Despite randomization, baseline imbalance and confounding bias may occur in cluster randomized trials (CRTs). Covariate imbalance may jeopardize the validity of statistical inferences if they occur on prognostic factors. Thus, the diagnosis of a such imbalance is essential to adjust statistical analysis if required. We developed a tool based on the c-statistic of the propensity score (PS) model to detect global baseline covariate imbalance in CRTs and assess the risk of confounding bias. We performed a simulation study to assess the performance of the proposed tool and applied this method to analyze the data from 2 published CRTs. The proposed method had good performance for large sample sizes (n =500 per arm) and when the number of unbalanced covariates was not too small as compared with the total number of baseline covariates (≥40% of unbalanced covariates). We also provide a strategy for pre selection of the covariates needed to be included in the PS model to enhance imbalance detection. The proposed tool could be useful in deciding whether covariate adjustment is required before performing statistical analyses of CRTs.
Cluster mass inference via random field theory.
Zhang, Hui; Nichols, Thomas E; Johnson, Timothy D
2009-01-01
Cluster extent and voxel intensity are two widely used statistics in neuroimaging inference. Cluster extent is sensitive to spatially extended signals while voxel intensity is better for intense but focal signals. In order to leverage strength from both statistics, several nonparametric permutation methods have been proposed to combine the two methods. Simulation studies have shown that of the different cluster permutation methods, the cluster mass statistic is generally the best. However, to date, there is no parametric cluster mass inference available. In this paper, we propose a cluster mass inference method based on random field theory (RFT). We develop this method for Gaussian images, evaluate it on Gaussian and Gaussianized t-statistic images and investigate its statistical properties via simulation studies and real data. Simulation results show that the method is valid under the null hypothesis and demonstrate that it can be more powerful than the cluster extent inference method. Further, analyses with a single subject and a group fMRI dataset demonstrate better power than traditional cluster size inference, and good accuracy relative to a gold-standard permutation test.
Survival Regression Modeling Strategies in CVD Prediction.
Barkhordari, Mahnaz; Padyab, Mojgan; Sardarinia, Mahsa; Hadaegh, Farzad; Azizi, Fereidoun; Bozorgmanesh, Mohammadreza
2016-04-01
A fundamental part of prevention is prediction. Potential predictors are the sine qua non of prediction models. However, whether incorporating novel predictors to prediction models could be directly translated to added predictive value remains an area of dispute. The difference between the predictive power of a predictive model with (enhanced model) and without (baseline model) a certain predictor is generally regarded as an indicator of the predictive value added by that predictor. Indices such as discrimination and calibration have long been used in this regard. Recently, the use of added predictive value has been suggested while comparing the predictive performances of the predictive models with and without novel biomarkers. User-friendly statistical software capable of implementing novel statistical procedures is conspicuously lacking. This shortcoming has restricted implementation of such novel model assessment methods. We aimed to construct Stata commands to help researchers obtain the aforementioned statistical indices. We have written Stata commands that are intended to help researchers obtain the following. 1, Nam-D'Agostino X 2 goodness of fit test; 2, Cut point-free and cut point-based net reclassification improvement index (NRI), relative absolute integrated discriminatory improvement index (IDI), and survival-based regression analyses. We applied the commands to real data on women participating in the Tehran lipid and glucose study (TLGS) to examine if information relating to a family history of premature cardiovascular disease (CVD), waist circumference, and fasting plasma glucose can improve predictive performance of Framingham's general CVD risk algorithm. The command is adpredsurv for survival models. Herein we have described the Stata package "adpredsurv" for calculation of the Nam-D'Agostino X 2 goodness of fit test as well as cut point-free and cut point-based NRI, relative and absolute IDI, and survival-based regression analyses. We hope this work encourages the use of novel methods in examining predictive capacity of the emerging plethora of novel biomarkers.
Antal, Péter; Kiszel, Petra Sz.; Gézsi, András; Hadadi, Éva; Virág, Viktor; Hajós, Gergely; Millinghoffer, András; Nagy, Adrienne; Kiss, András; Semsei, Ágnes F.; Temesi, Gergely; Melegh, Béla; Kisfali, Péter; Széll, Márta; Bikov, András; Gálffy, Gabriella; Tamási, Lilla; Falus, András; Szalai, Csaba
2012-01-01
Genetic studies indicate high number of potential factors related to asthma. Based on earlier linkage analyses we selected the 11q13 and 14q22 asthma susceptibility regions, for which we designed a partial genome screening study using 145 SNPs in 1201 individuals (436 asthmatic children and 765 controls). The results were evaluated with traditional frequentist methods and we applied a new statistical method, called Bayesian network based Bayesian multilevel analysis of relevance (BN-BMLA). This method uses Bayesian network representation to provide detailed characterization of the relevance of factors, such as joint significance, the type of dependency, and multi-target aspects. We estimated posteriors for these relations within the Bayesian statistical framework, in order to estimate the posteriors whether a variable is directly relevant or its association is only mediated. With frequentist methods one SNP (rs3751464 in the FRMD6 gene) provided evidence for an association with asthma (OR = 1.43(1.2–1.8); p = 3×10−4). The possible role of the FRMD6 gene in asthma was also confirmed in an animal model and human asthmatics. In the BN-BMLA analysis altogether 5 SNPs in 4 genes were found relevant in connection with asthma phenotype: PRPF19 on chromosome 11, and FRMD6, PTGER2 and PTGDR on chromosome 14. In a subsequent step a partial dataset containing rhinitis and further clinical parameters was used, which allowed the analysis of relevance of SNPs for asthma and multiple targets. These analyses suggested that SNPs in the AHNAK and MS4A2 genes were indirectly associated with asthma. This paper indicates that BN-BMLA explores the relevant factors more comprehensively than traditional statistical methods and extends the scope of strong relevance based methods to include partial relevance, global characterization of relevance and multi-target relevance. PMID:22432035
Agriculture, population growth, and statistical analysis of the radiocarbon record.
Zahid, H Jabran; Robinson, Erick; Kelly, Robert L
2016-01-26
The human population has grown significantly since the onset of the Holocene about 12,000 y ago. Despite decades of research, the factors determining prehistoric population growth remain uncertain. Here, we examine measurements of the rate of growth of the prehistoric human population based on statistical analysis of the radiocarbon record. We find that, during most of the Holocene, human populations worldwide grew at a long-term annual rate of 0.04%. Statistical analysis of the radiocarbon record shows that transitioning farming societies experienced the same rate of growth as contemporaneous foraging societies. The same rate of growth measured for populations dwelling in a range of environments and practicing a variety of subsistence strategies suggests that the global climate and/or endogenous biological factors, not adaptability to local environment or subsistence practices, regulated the long-term growth of the human population during most of the Holocene. Our results demonstrate that statistical analyses of large ensembles of radiocarbon dates are robust and valuable for quantitatively investigating the demography of prehistoric human populations worldwide.
Gorgolewski, Krzysztof J; Varoquaux, Gael; Rivera, Gabriel; Schwartz, Yannick; Sochat, Vanessa V; Ghosh, Satrajit S; Maumet, Camille; Nichols, Thomas E; Poline, Jean-Baptiste; Yarkoni, Tal; Margulies, Daniel S; Poldrack, Russell A
2016-01-01
NeuroVault.org is dedicated to storing outputs of analyses in the form of statistical maps, parcellations and atlases, a unique strategy that contrasts with most neuroimaging repositories that store raw acquisition data or stereotaxic coordinates. Such maps are indispensable for performing meta-analyses, validating novel methodology, and deciding on precise outlines for regions of interest (ROIs). NeuroVault is open to maps derived from both healthy and clinical populations, as well as from various imaging modalities (sMRI, fMRI, EEG, MEG, PET, etc.). The repository uses modern web technologies such as interactive web-based visualization, cognitive decoding, and comparison with other maps to provide researchers with efficient, intuitive tools to improve the understanding of their results. Each dataset and map is assigned a permanent Universal Resource Locator (URL), and all of the data is accessible through a REST Application Programming Interface (API). Additionally, the repository supports the NIDM-Results standard and has the ability to parse outputs from popular FSL and SPM software packages to automatically extract relevant metadata. This ease of use, modern web-integration, and pioneering functionality holds promise to improve the workflow for making inferences about and sharing whole-brain statistical maps. Copyright © 2015 Elsevier Inc. All rights reserved.
Traeger, Adrian C; Skinner, Ian W; Hübscher, Markus; Lee, Hopin; Moseley, G Lorimer; Nicholas, Michael K; Henschke, Nicholas; Refshauge, Kathryn M; Blyth, Fiona M; Main, Chris J; Hush, Julia M; Pearce, Garry; Lo, Serigne; McAuley, James H
Statistical analysis plans increase the transparency of decisions made in the analysis of clinical trial results. The purpose of this paper is to detail the planned analyses for the PREVENT trial, a randomized, placebo-controlled trial of patient education for acute low back pain. We report the pre-specified principles, methods, and procedures to be adhered to in the main analysis of the PREVENT trial data. The primary outcome analysis will be based on Mixed Models for Repeated Measures (MMRM), which can test treatment effects at specific time points, and the assumptions of this analysis are outlined. We also outline the treatment of secondary outcomes and planned sensitivity analyses. We provide decisions regarding the treatment of missing data, handling of descriptive and process measure data, and blinded review procedures. Making public the pre-specified statistical analysis plan for the PREVENT trial minimizes the potential for bias in the analysis of trial data, and in the interpretation and reporting of trial results. ACTRN12612001180808 (https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?ACTRN=12612001180808). Copyright © 2017 Associação Brasileira de Pesquisa e Pós-Graduação em Fisioterapia. Publicado por Elsevier Editora Ltda. All rights reserved.
The influence of control group reproduction on the statistical ...
Because of various Congressional mandates to protect the environment from endocrine disrupting chemicals (EDCs), the United States Environmental Protection Agency (USEPA) initiated the Endocrine Disruptor Screening Program. In the context of this framework, the Office of Research and Development within the USEPA developed the Medaka Extended One Generation Reproduction Test (MEOGRT) to characterize the endocrine action of a suspected EDC. One important endpoint of the MEOGRT is fecundity of breeding pairs of medaka. Power analyses were conducted to determine the number of replicates needed in proposed test designs and to determine the effects that varying reproductive parameters (e.g. mean fecundity, variance, and days with no egg production) will have on the statistical power of the test. A software tool, the MEOGRT Reproduction Power Analysis Tool, was developed to expedite these power analyses by both calculating estimates of the needed reproductive parameters (e.g. population mean and variance) and performing the power analysis under user specified scenarios. The manuscript illustrates how the reproductive performance of the control medaka that are used in a MEOGRT influence statistical power, and therefore the successful implementation of the protocol. Example scenarios, based upon medaka reproduction data collected at MED, are discussed that bolster the recommendation that facilities planning to implement the MEOGRT should have a culture of medaka with hi
System for analysing sickness absenteeism in Poland.
Indulski, J A; Szubert, Z
1997-01-01
The National System of Sickness Absenteeism Statistics has been functioning in Poland since 1977, as the part of the national health statistics. The system is based on a 15-percent random sample of copies of certificates of temporary incapacity for work issued by all health care units and authorised private medical practitioners. A certificate of temporary incapacity for work is received by every insured employee who is compelled to stop working due to sickness, accident, or due to the necessity to care for a sick member of his/her family. The certificate is required on the first day of sickness. Analyses of disease- and accident-related sickness absenteeism carried out each year in Poland within the statistical system lead to the main conclusions: 1. Diseases of the musculoskeletal and peripheral nervous systems accounting, when combined, for 1/3 of the total sickness absenteeism, are a major health problem of the working population in Poland. During the past five years, incapacity for work caused by these diseases in males increased 2.5 times. 2. Circulatory diseases, and arterial hypertension and ischaemic heart disease in particular (41% and 27% of sickness days, respectively), create an essential health problem among males at productive age, especially, in the 40 and older age group. Absenteeism due to these diseases has increased in males more than two times.
Nurses' readiness for evidence-based practice at Finnish university hospitals: a national survey.
Saunders, Hannele; Stevens, Kathleen R; Vehviläinen-Julkunen, Katri
2016-08-01
The aim of this study was to determine nurses' readiness for evidence-based practice at Finnish university hospitals. Although systematic implementation of evidence-based practice is essential to effectively improving patient outcomes and value of care, nurses do not consistently use evidence in practice. Uptake is hampered by lack of nurses' individual and organizational readiness for evidence-based practice. Although nurses' evidence-based practice competencies have been widely studied in countries leading the evidence-based practice movement, less is known about nurses' readiness for evidence-based practice in the non-English-speaking world. A cross-sectional descriptive survey design. The study was conducted in November-December 2014 in every university hospital in Finland with a convenience sample (n = 943) of practicing nurses. The electronic survey data were collected using the Stevens' Evidence-Based Practice Readiness Inventory, which was translated into Finnish according to standardized guidelines for translation of research instruments. The data were analysed using descriptive and inferential statistics. Nurses reported low to moderate levels of self-efficacy and low levels of evidence-based practice knowledge. A statistically significant, direct correlation was found between nurses' self-efficacy in employing evidence-based practice and their actual evidence-based practice knowledge level. Several statistically significant differences were found between nurses' socio-demographic variables and nurses' self-efficacy in employing evidence-based practice, and actual and perceived evidence-based practice knowledge. Finnish nurses at university hospitals are not ready for evidence-based practice. Although nurses are familiar with the concept of evidence-based practice, they lack the evidence-based practice knowledge and self-efficacy in employing evidence-based practice required for integrating best evidence into clinical care delivery. © 2016 John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Scudder, Rachel P.; Murray, Richard W.; Schindlbeck, Julie C.; Kutterolf, Steffen; Hauff, Folkmar; McKinley, Claire C.
2014-11-01
We have geochemically and statistically characterized bulk marine sediment and ash layers at Ocean Drilling Program Site 1149 (Izu-Bonin Arc) and Deep Sea Drilling Project Site 52 (Mariana Arc), and have quantified that multiple dispersed ash sources collectively comprise ˜30-35% of the hemipelagic sediment mass entering the Izu-Bonin-Mariana subduction system. Multivariate statistical analyses indicate that the bulk sediment at Site 1149 is a mixture of Chinese Loess, a second compositionally distinct eolian source, a dispersed mafic ash, and a dispersed felsic ash. We interpret the source of these ashes as, respectively, being basalt from the Izu-Bonin Front Arc (IBFA) and rhyolite from the Honshu Arc. Sr-, Nd-, and Pb isotopic analyses of the bulk sediment are consistent with the chemical/statistical-based interpretations. Comparison of the mass accumulation rate of the dispersed ash component to discrete ash layer parameters (thickness, sedimentation rate, and number of layers) suggests that eruption frequency, rather than eruption size, drives the dispersed ash record. At Site 52, the geochemistry and statistical modeling indicates that Chinese Loess, IBFA, dispersed BNN (boninite from Izu-Bonin), and a dispersed felsic ash of unknown origin are the sources. At Site 1149, the ash layers and the dispersed ash are compositionally coupled, whereas at Site 52 they are decoupled in that there are no boninite layers, yet boninite is dispersed within the sediment. Changes in the volcanic and eolian inputs through time indicate strong arc-related and climate-related controls.
Rotman, B. L.; Sullivan, A. N.; McDonald, T.; DeSmedt, P.; Goodnature, D.; Higgins, M.; Suermondt, H. J.; Young, C. Y.; Owens, D. K.
1995-01-01
We are performing a randomized, controlled trial of a Physician's Workstation (PWS), an ambulatory care information system, developed for use in the General Medical Clinic (GMC) of the Palo Alto VA. Goals for the project include selecting appropriate outcome variables and developing a statistically powerful experimental design with a limited number of subjects. As PWS provides real-time drug-ordering advice, we retrospectively examined drug costs and drug-drug interactions in order to select outcome variables sensitive to our short-term intervention as well as to estimate the statistical efficiency of alternative design possibilities. Drug cost data revealed the mean daily cost per physician per patient was 99.3 cents +/- 13.4 cents, with a range from 0.77 cent to 1.37 cents. The rate of major interactions per prescription for each physician was 2.9% +/- 1%, with a range from 1.5% to 4.8%. Based on these baseline analyses, we selected a two-period parallel design for the evaluation, which maximized statistical power while minimizing sources of bias. PMID:8563376
NASA DOE POD NDE Capabilities Data Book
NASA Technical Reports Server (NTRS)
Generazio, Edward R.
2015-01-01
This data book contains the Directed Design of Experiments for Validating Probability of Detection (POD) Capability of NDE Systems (DOEPOD) analyses of the nondestructive inspection data presented in the NTIAC, Nondestructive Evaluation (NDE) Capabilities Data Book, 3rd ed., NTIAC DB-97-02. DOEPOD is designed as a decision support system to validate inspection system, personnel, and protocol demonstrating 0.90 POD with 95% confidence at critical flaw sizes, a90/95. The test methodology used in DOEPOD is based on the field of statistical sequential analysis founded by Abraham Wald. Sequential analysis is a method of statistical inference whose characteristic feature is that the number of observations required by the procedure is not determined in advance of the experiment. The decision to terminate the experiment depends, at each stage, on the results of the observations previously made. A merit of the sequential method, as applied to testing statistical hypotheses, is that test procedures can be constructed which require, on average, a substantially smaller number of observations than equally reliable test procedures based on a predetermined number of observations.
Drury, J P; Grether, G F; Garland, T; Morlon, H
2018-05-01
Much ecological and evolutionary theory predicts that interspecific interactions often drive phenotypic diversification and that species phenotypes in turn influence species interactions. Several phylogenetic comparative methods have been developed to assess the importance of such processes in nature; however, the statistical properties of these methods have gone largely untested. Focusing mainly on scenarios of competition between closely-related species, we assess the performance of available comparative approaches for analyzing the interplay between interspecific interactions and species phenotypes. We find that many currently used statistical methods often fail to detect the impact of interspecific interactions on trait evolution, that sister-taxa analyses are particularly unreliable in general, and that recently developed process-based models have more satisfactory statistical properties. Methods for detecting predictors of species interactions are generally more reliable than methods for detecting character displacement. In weighing the strengths and weaknesses of different approaches, we hope to provide a clear guide for empiricists testing hypotheses about the reciprocal effect of interspecific interactions and species phenotypes and to inspire further development of process-based models.
Gray, Alastair
2017-01-01
Increasing numbers of economic evaluations are conducted alongside randomised controlled trials. Such studies include factorial trials, which randomise patients to different levels of two or more factors and can therefore evaluate the effect of multiple treatments alone and in combination. Factorial trials can provide increased statistical power or assess interactions between treatments, but raise additional challenges for trial‐based economic evaluations: interactions may occur more commonly for costs and quality‐adjusted life‐years (QALYs) than for clinical endpoints; economic endpoints raise challenges for transformation and regression analysis; and both factors must be considered simultaneously to assess which treatment combination represents best value for money. This article aims to examine issues associated with factorial trials that include assessment of costs and/or cost‐effectiveness, describe the methods that can be used to analyse such studies and make recommendations for health economists, statisticians and trialists. A hypothetical worked example is used to illustrate the challenges and demonstrate ways in which economic evaluations of factorial trials may be conducted, and how these methods affect the results and conclusions. Ignoring interactions introduces bias that could result in adopting a treatment that does not make best use of healthcare resources, while considering all interactions avoids bias but reduces statistical power. We also introduce the concept of the opportunity cost of ignoring interactions as a measure of the bias introduced by not taking account of all interactions. We conclude by offering recommendations for planning, analysing and reporting economic evaluations based on factorial trials, taking increased analysis costs into account. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28470760
Wangia, Victoria; Shireman, Theresa I
2013-01-01
While understanding geography's role in healthcare has been an area of research for over 40 years, the application of geography-based analyses to prescription medication use is limited. The body of literature was reviewed to assess the current state of such studies to demonstrate the scale and scope of projects in order to highlight potential research opportunities. To review systematically how researchers have applied geography-based analyses to medication use data. Empiric, English language research articles were identified through PubMed and bibliographies. Original research articles were independently reviewed as to the medications or classes studied, data sources, measures of medication exposure, geographic units of analysis, geospatial measures, and statistical approaches. From 145 publications matching key search terms, forty publications met the inclusion criteria. Cardiovascular and psychotropic classes accounted for the largest proportion of studies. Prescription drug claims were the primary source, and medication exposure was frequently captured as period prevalence. Medication exposure was documented across a variety of geopolitical units such as countries, provinces, regions, states, and postal codes. Most results were descriptive and formal statistical modeling capitalizing on geospatial techniques was rare. Despite the extensive research on small area variation analysis in healthcare, there are a limited number of studies that have examined geographic variation in medication use. Clearly, there is opportunity to collaborate with geographers and GIS professionals to harness the power of GIS technologies and to strengthen future medication studies by applying more robust geospatial statistical methods. Copyright © 2013 Elsevier Inc. All rights reserved.
Papageorgiou, Spyridon N; Kloukos, Dimitrios; Petridis, Haralampos; Pandis, Nikolaos
2015-10-01
To assess the hypothesis that there is excessive reporting of statistically significant studies published in prosthodontic and implantology journals, which could indicate selective publication. The last 30 issues of 9 journals in prosthodontics and implant dentistry were hand-searched for articles with statistical analyses. The percentages of significant and non-significant results were tabulated by parameter of interest. Univariable/multivariable logistic regression analyses were applied to identify possible predictors of reporting statistically significance findings. The results of this study were compared with similar studies in dentistry with random-effects meta-analyses. From the 2323 included studies 71% of them reported statistically significant results, with the significant results ranging from 47% to 86%. Multivariable modeling identified that geographical area and involvement of statistician were predictors of statistically significant results. Compared to interventional studies, the odds that in vitro and observational studies would report statistically significant results was increased by 1.20 times (OR: 2.20, 95% CI: 1.66-2.92) and 0.35 times (OR: 1.35, 95% CI: 1.05-1.73), respectively. The probability of statistically significant results from randomized controlled trials was significantly lower compared to various study designs (difference: 30%, 95% CI: 11-49%). Likewise the probability of statistically significant results in prosthodontics and implant dentistry was lower compared to other dental specialties, but this result did not reach statistical significant (P>0.05). The majority of studies identified in the fields of prosthodontics and implant dentistry presented statistically significant results. The same trend existed in publications of other specialties in dentistry. Copyright © 2015 Elsevier Ltd. All rights reserved.
Content-Based VLE Designs Improve Learning Efficiency in Constructivist Statistics Education
Wessa, Patrick; De Rycker, Antoon; Holliday, Ian Edward
2011-01-01
Background We introduced a series of computer-supported workshops in our undergraduate statistics courses, in the hope that it would help students to gain a deeper understanding of statistical concepts. This raised questions about the appropriate design of the Virtual Learning Environment (VLE) in which such an approach had to be implemented. Therefore, we investigated two competing software design models for VLEs. In the first system, all learning features were a function of the classical VLE. The second system was designed from the perspective that learning features should be a function of the course's core content (statistical analyses), which required us to develop a specific–purpose Statistical Learning Environment (SLE) based on Reproducible Computing and newly developed Peer Review (PR) technology. Objectives The main research question is whether the second VLE design improved learning efficiency as compared to the standard type of VLE design that is commonly used in education. As a secondary objective we provide empirical evidence about the usefulness of PR as a constructivist learning activity which supports non-rote learning. Finally, this paper illustrates that it is possible to introduce a constructivist learning approach in large student populations, based on adequately designed educational technology, without subsuming educational content to technological convenience. Methods Both VLE systems were tested within a two-year quasi-experiment based on a Reliable Nonequivalent Group Design. This approach allowed us to draw valid conclusions about the treatment effect of the changed VLE design, even though the systems were implemented in successive years. The methodological aspects about the experiment's internal validity are explained extensively. Results The effect of the design change is shown to have substantially increased the efficiency of constructivist, computer-assisted learning activities for all cohorts of the student population under investigation. The findings demonstrate that a content–based design outperforms the traditional VLE–based design. PMID:21998652
Metaprop: a Stata command to perform meta-analysis of binomial data.
Nyaga, Victoria N; Arbyn, Marc; Aerts, Marc
2014-01-01
Meta-analyses have become an essential tool in synthesizing evidence on clinical and epidemiological questions derived from a multitude of similar studies assessing the particular issue. Appropriate and accessible statistical software is needed to produce the summary statistic of interest. Metaprop is a statistical program implemented to perform meta-analyses of proportions in Stata. It builds further on the existing Stata procedure metan which is typically used to pool effects (risk ratios, odds ratios, differences of risks or means) but which is also used to pool proportions. Metaprop implements procedures which are specific to binomial data and allows computation of exact binomial and score test-based confidence intervals. It provides appropriate methods for dealing with proportions close to or at the margins where the normal approximation procedures often break down, by use of the binomial distribution to model the within-study variability or by allowing Freeman-Tukey double arcsine transformation to stabilize the variances. Metaprop was applied on two published meta-analyses: 1) prevalence of HPV-infection in women with a Pap smear showing ASC-US; 2) cure rate after treatment for cervical precancer using cold coagulation. The first meta-analysis showed a pooled HPV-prevalence of 43% (95% CI: 38%-48%). In the second meta-analysis, the pooled percentage of cured women was 94% (95% CI: 86%-97%). By using metaprop, no studies with 0% or 100% proportions were excluded from the meta-analysis. Furthermore, study specific and pooled confidence intervals always were within admissible values, contrary to the original publication, where metan was used.
Rzucidlo, Justyna K; Roseman, Paige L; Laurienti, Paul J; Dagenbach, Dale
2013-01-01
Graph-theory based analyses of resting state functional Magnetic Resonance Imaging (fMRI) data have been used to map the network organization of the brain. While numerous analyses of resting state brain organization exist, many questions remain unexplored. The present study examines the stability of findings based on this approach over repeated resting state and working memory state sessions within the same individuals. This allows assessment of stability of network topology within the same state for both rest and working memory, and between rest and working memory as well. fMRI scans were performed on five participants while at rest and while performing the 2-back working memory task five times each, with task state alternating while they were in the scanner. Voxel-based whole brain network analyses were performed on the resulting data along with analyses of functional connectivity in regions associated with resting state and working memory. Network topology was fairly stable across repeated sessions of the same task, but varied significantly between rest and working memory. In the whole brain analysis, local efficiency, Eloc, differed significantly between rest and working memory. Analyses of network statistics for the precuneus and dorsolateral prefrontal cortex revealed significant differences in degree as a function of task state for both regions and in local efficiency for the precuneus. Conversely, no significant differences were observed across repeated sessions of the same state. These findings suggest that network topology is fairly stable within individuals across time for the same state, but also fluid between states. Whole brain voxel-based network analyses may prove to be a valuable tool for exploring how functional connectivity changes in response to task demands.
Eickhoff, Simon B; Nichols, Thomas E; Laird, Angela R; Hoffstaedter, Felix; Amunts, Katrin; Fox, Peter T; Bzdok, Danilo; Eickhoff, Claudia R
2016-08-15
Given the increasing number of neuroimaging publications, the automated knowledge extraction on brain-behavior associations by quantitative meta-analyses has become a highly important and rapidly growing field of research. Among several methods to perform coordinate-based neuroimaging meta-analyses, Activation Likelihood Estimation (ALE) has been widely adopted. In this paper, we addressed two pressing questions related to ALE meta-analysis: i) Which thresholding method is most appropriate to perform statistical inference? ii) Which sample size, i.e., number of experiments, is needed to perform robust meta-analyses? We provided quantitative answers to these questions by simulating more than 120,000 meta-analysis datasets using empirical parameters (i.e., number of subjects, number of reported foci, distribution of activation foci) derived from the BrainMap database. This allowed to characterize the behavior of ALE analyses, to derive first power estimates for neuroimaging meta-analyses, and to thus formulate recommendations for future ALE studies. We could show as a first consequence that cluster-level family-wise error (FWE) correction represents the most appropriate method for statistical inference, while voxel-level FWE correction is valid but more conservative. In contrast, uncorrected inference and false-discovery rate correction should be avoided. As a second consequence, researchers should aim to include at least 20 experiments into an ALE meta-analysis to achieve sufficient power for moderate effects. We would like to note, though, that these calculations and recommendations are specific to ALE and may not be extrapolated to other approaches for (neuroimaging) meta-analysis. Copyright © 2016 Elsevier Inc. All rights reserved.
Eickhoff, Simon B.; Nichols, Thomas E.; Laird, Angela R.; Hoffstaedter, Felix; Amunts, Katrin; Fox, Peter T.
2016-01-01
Given the increasing number of neuroimaging publications, the automated knowledge extraction on brain-behavior associations by quantitative meta-analyses has become a highly important and rapidly growing field of research. Among several methods to perform coordinate-based neuroimaging meta-analyses, Activation Likelihood Estimation (ALE) has been widely adopted. In this paper, we addressed two pressing questions related to ALE meta-analysis: i) Which thresholding method is most appropriate to perform statistical inference? ii) Which sample size, i.e., number of experiments, is needed to perform robust meta-analyses? We provided quantitative answers to these questions by simulating more than 120,000 meta-analysis datasets using empirical parameters (i.e., number of subjects, number of reported foci, distribution of activation foci) derived from the BrainMap database. This allowed to characterize the behavior of ALE analyses, to derive first power estimates for neuroimaging meta-analyses, and to thus formulate recommendations for future ALE studies. We could show as a first consequence that cluster-level family-wise error (FWE) correction represents the most appropriate method for statistical inference, while voxel-level FWE correction is valid but more conservative. In contrast, uncorrected inference and false-discovery rate correction should be avoided. As a second consequence, researchers should aim to include at least 20 experiments into an ALE meta-analysis to achieve sufficient power for moderate effects. We would like to note, though, that these calculations and recommendations are specific to ALE and may not be extrapolated to other approaches for (neuroimaging) meta-analysis. PMID:27179606
2016-09-01
Orthopedic Research Society Meeting Inventions, Patents and Licenses Provisional patent covering intra-articular delivery of amobarbital to prevent PTOA (A...completed and statistical analyses were finalized. Abstracts describing the results were submitted to the Orthopedic Research Society and Osteoarthritis...52242-1320 REPORT DATE: September 2016 TYPE OF REPORT: Annual PREPARED FOR: U.S. Army Medical Research and Materiel Command Fort Detrick
Applications of spatial statistical network models to stream data
Isaak, Daniel J.; Peterson, Erin E.; Ver Hoef, Jay M.; Wenger, Seth J.; Falke, Jeffrey A.; Torgersen, Christian E.; Sowder, Colin; Steel, E. Ashley; Fortin, Marie-Josée; Jordan, Chris E.; Ruesch, Aaron S.; Som, Nicholas; Monestiez, Pascal
2014-01-01
Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for terrestrial applications and are not optimized for streams. A new class of spatial statistical model, based on valid covariance structures for stream networks, can be used with many common types of stream data (e.g., water quality attributes, habitat conditions, biological surveys) through application of appropriate distributions (e.g., Gaussian, binomial, Poisson). The spatial statistical network models account for spatial autocorrelation (i.e., nonindependence) among measurements, which allows their application to databases with clustered measurement locations. Large amounts of stream data exist in many areas where spatial statistical analyses could be used to develop novel insights, improve predictions at unsampled sites, and aid in the design of efficient monitoring strategies at relatively low cost. We review the topic of spatial autocorrelation and its effects on statistical inference, demonstrate the use of spatial statistics with stream datasets relevant to common research and management questions, and discuss additional applications and development potential for spatial statistics on stream networks. Free software for implementing the spatial statistical network models has been developed that enables custom applications with many stream databases.
Cyber Risk Management for Critical Infrastructure: A Risk Analysis Model and Three Case Studies.
Paté-Cornell, M-Elisabeth; Kuypers, Marshall; Smith, Matthew; Keller, Philip
2018-02-01
Managing cyber security in an organization involves allocating the protection budget across a spectrum of possible options. This requires assessing the benefits and the costs of these options. The risk analyses presented here are statistical when relevant data are available, and system-based for high-consequence events that have not happened yet. This article presents, first, a general probabilistic risk analysis framework for cyber security in an organization to be specified. It then describes three examples of forward-looking analyses motivated by recent cyber attacks. The first one is the statistical analysis of an actual database, extended at the upper end of the loss distribution by a Bayesian analysis of possible, high-consequence attack scenarios that may happen in the future. The second is a systems analysis of cyber risks for a smart, connected electric grid, showing that there is an optimal level of connectivity. The third is an analysis of sequential decisions to upgrade the software of an existing cyber security system or to adopt a new one to stay ahead of adversaries trying to find their way in. The results are distributions of losses to cyber attacks, with and without some considered countermeasures in support of risk management decisions based both on past data and anticipated incidents. © 2017 Society for Risk Analysis.
Finke, Erinn H; Hickerson, Benjamin; McLaughlin, Eileen
2015-04-01
The purpose of this study was to determine parental attitudes regarding engagement with video games by their children with autism spectrum disorder (ASD) and whether attitudes vary based on ASD symptom severity. Online survey methodology was used to gather information from parents of children with ASD between the ages of 8 and 12 years. The finalized data set included 152 cases. Descriptive statistics and frequency analyses were used to examine participant demographics and video game play. Descriptive and inferential statistics were used to evaluate questions on the theory of planned behavior. Regression analyses determined the predictive ability of the theory of planned behavior constructs, and t tests provided additional descriptive information about between-group differences. Children with ASD play video games. There are no significant differences in the time, intensity, or types of games played based on severity of ASD symptoms (mild vs. moderate). Parents of children with ASD had positive attitudes about video game play. Parents of children with ASD appear to support video game play. On average, parents indicated video game play was positive for their children with ASD, particularly if they believed the games were having a positive impact on their child's development.
NASA Technical Reports Server (NTRS)
Bhattacharyya, S.
1982-01-01
Four wrought alloys (A-286, IN 800H, N-155, and 19-9DL) and two cast alloys (CRM-6D and XF-818) were tested to determine their creep-rupture behavior. The wrought alloys were used in the form of sheets of 0.89 mm (0.035 in.) average thickness. The cast alloy specimens were investment cast and machined to 6.35 mm (0.250 in.) gage diameter. All specimens were tested to rupture in air at different times up to 3000 h over the temperature range of 650 C to 925 C (1200 F to 1700 F). Rupture life, minimum creep rate, and time to 1% creep strain were statistically analyzed as a function of stress at different temperatures. Temperature-compensated analysis was also performed to obtain the activation energies for rupture life, time to 1% creep strain, and the minimum creep rate. Microstructural and fracture analyses were also performed. Based on statistical analyses, estimates were made for stress levels at different temperatures to obtain 3500 h rupture life and time to 1% creep strain. Test results are to be compared with similar data being obtained for these alloys under 15 MPa (2175 psi) hydrogen.
Zhang, Han; Wheeler, William; Song, Lei; Yu, Kai
2017-07-07
As meta-analysis results published by consortia of genome-wide association studies (GWASs) become increasingly available, many association summary statistics-based multi-locus tests have been developed to jointly evaluate multiple single-nucleotide polymorphisms (SNPs) to reveal novel genetic architectures of various complex traits. The validity of these approaches relies on the accurate estimate of z-score correlations at considered SNPs, which in turn requires knowledge on the set of SNPs assessed by each study participating in the meta-analysis. However, this exact SNP coverage information is usually unavailable from the meta-analysis results published by GWAS consortia. In the absence of the coverage information, researchers typically estimate the z-score correlations by making oversimplified coverage assumptions. We show through real studies that such a practice can generate highly inflated type I errors, and we demonstrate the proper way to incorporate correct coverage information into multi-locus analyses. We advocate that consortia should make SNP coverage information available when posting their meta-analysis results, and that investigators who develop analytic tools for joint analyses based on summary data should pay attention to the variation in SNP coverage and adjust for it appropriately. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
Effect of non-normality on test statistics for one-way independent groups designs.
Cribbie, Robert A; Fiksenbaum, Lisa; Keselman, H J; Wilcox, Rand R
2012-02-01
The data obtained from one-way independent groups designs is typically non-normal in form and rarely equally variable across treatment populations (i.e., population variances are heterogeneous). Consequently, the classical test statistic that is used to assess statistical significance (i.e., the analysis of variance F test) typically provides invalid results (e.g., too many Type I errors, reduced power). For this reason, there has been considerable interest in finding a test statistic that is appropriate under conditions of non-normality and variance heterogeneity. Previously recommended procedures for analysing such data include the James test, the Welch test applied either to the usual least squares estimators of central tendency and variability, or the Welch test with robust estimators (i.e., trimmed means and Winsorized variances). A new statistic proposed by Krishnamoorthy, Lu, and Mathew, intended to deal with heterogeneous variances, though not non-normality, uses a parametric bootstrap procedure. In their investigation of the parametric bootstrap test, the authors examined its operating characteristics under limited conditions and did not compare it to the Welch test based on robust estimators. Thus, we investigated how the parametric bootstrap procedure and a modified parametric bootstrap procedure based on trimmed means perform relative to previously recommended procedures when data are non-normal and heterogeneous. The results indicated that the tests based on trimmed means offer the best Type I error control and power when variances are unequal and at least some of the distribution shapes are non-normal. © 2011 The British Psychological Society.
Global aesthetic surgery statistics: a closer look.
Heidekrueger, Paul I; Juran, S; Ehrl, D; Aung, T; Tanna, N; Broer, P Niclas
2017-08-01
Obtaining quality global statistics about surgical procedures remains an important yet challenging task. The International Society of Aesthetic Plastic Surgery (ISAPS) reports the total number of surgical and non-surgical procedures performed worldwide on a yearly basis. While providing valuable insight, ISAPS' statistics leave two important factors unaccounted for: (1) the underlying base population, and (2) the number of surgeons performing the procedures. Statistics of the published ISAPS' 'International Survey on Aesthetic/Cosmetic Surgery' were analysed by country, taking into account the underlying national base population according to the official United Nations population estimates. Further, the number of surgeons per country was used to calculate the number of surgeries performed per surgeon. In 2014, based on ISAPS statistics, national surgical procedures ranked in the following order: 1st USA, 2nd Brazil, 3rd South Korea, 4th Mexico, 5th Japan, 6th Germany, 7th Colombia, and 8th France. When considering the size of the underlying national populations, the demand for surgical procedures per 100,000 people changes the overall ranking substantially. It was also found that the rate of surgical procedures per surgeon shows great variation between the responding countries. While the US and Brazil are often quoted as the countries with the highest demand for plastic surgery, according to the presented analysis, other countries surpass these countries in surgical procedures per capita. While data acquisition and quality should be improved in the future, valuable insight regarding the demand for surgical procedures can be gained by taking specific demographic and geographic factors into consideration.
An Evaluation of the Impact of E-Learning Media Formats on Student Perception and Performance
NASA Astrophysics Data System (ADS)
Kurbel, Karl; Stankov, Ivo; Datsenka, Rastsislau
Factors influencing student evaluation of web-based courses are analyzed, based on student feedback from an online distance-learning graduate program. The impact of different media formats on the perception of the courses by the students as well as on their performance in these courses are examined. In particular, we studied conventional hypertext-based courses, video-based courses and audio-based courses, and tried to find out whether the media format has an effect on how students assess courses and how good or bad their grades are. Statistical analyses were performed to answer several research questions related to the topic and to properly evaluate the factors influencing student evaluation.
Childhood Leukemia and 50 Hz Magnetic Fields: Findings from the Italian SETIL Case-Control Study
Salvan, Alberto; Ranucci, Alessandra; Lagorio, Susanna; Magnani, Corrado
2015-01-01
We report on an Italian case-control study on childhood leukemia and exposure to extremely low frequency magnetic fields (ELF-MF). Eligible for inclusion were 745 leukemia cases, aged 0–10 years at diagnosis in 1998–2001, and 1475 sex- and age-matched population controls. Parents of 683 cases and 1044 controls (92% vs. 71%) were interviewed. ELF-MF measurements (24–48 h), in the child’s bedroom of the dwelling inhabited one year before diagnosis, were available for 412 cases and 587 controls included in the main conditional regression analyses. The magnetic field induction was 0.04 μT on average (geometric mean), with 0.6% of cases and 1.6% of controls exposed to >0.3 μT. The impact of changes in the statistical model, exposure metric, and data-set restriction criteria was explored via sensitivity analyses. No exposure-disease association was observed in analyses based on continuous exposure, while analyses based on categorical variables were characterized by incoherent exposure-outcome relationships. In conclusion, our results may be affected by several sources of bias and they are noninformative at exposure levels >0.3 μT. Nonetheless, the study may contribute to future meta- or pooled analyses. Furthermore, exposure levels among population controls are useful to estimate attributable risk. PMID:25689995
DMINDA: an integrated web server for DNA motif identification and analyses
Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying
2014-01-01
DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. PMID:24753419
DESIGNING ENVIRONMENTAL MONITORING DATABASES FOR STATISTIC ASSESSMENT
Databases designed for statistical analyses have characteristics that distinguish them from databases intended for general use. EMAP uses a probabilistic sampling design to collect data to produce statistical assessments of environmental conditions. In addition to supporting the ...
Comparing Visual and Statistical Analysis of Multiple Baseline Design Graphs.
Wolfe, Katie; Dickenson, Tammiee S; Miller, Bridget; McGrath, Kathleen V
2018-04-01
A growing number of statistical analyses are being developed for single-case research. One important factor in evaluating these methods is the extent to which each corresponds to visual analysis. Few studies have compared statistical and visual analysis, and information about more recently developed statistics is scarce. Therefore, our purpose was to evaluate the agreement between visual analysis and four statistical analyses: improvement rate difference (IRD); Tau-U; Hedges, Pustejovsky, Shadish (HPS) effect size; and between-case standardized mean difference (BC-SMD). Results indicate that IRD and BC-SMD had the strongest overall agreement with visual analysis. Although Tau-U had strong agreement with visual analysis on raw values, it had poorer agreement when those values were dichotomized to represent the presence or absence of a functional relation. Overall, visual analysis appeared to be more conservative than statistical analysis, but further research is needed to evaluate the nature of these disagreements.
Data harmonization and federated analysis of population-based studies: the BioSHaRE project
2013-01-01
Abstracts Background Individual-level data pooling of large population-based studies across research centres in international research projects faces many hurdles. The BioSHaRE (Biobank Standardisation and Harmonisation for Research Excellence in the European Union) project aims to address these issues by building a collaborative group of investigators and developing tools for data harmonization, database integration and federated data analyses. Methods Eight population-based studies in six European countries were recruited to participate in the BioSHaRE project. Through workshops, teleconferences and electronic communications, participating investigators identified a set of 96 variables targeted for harmonization to answer research questions of interest. Using each study’s questionnaires, standard operating procedures, and data dictionaries, harmonization potential was assessed. Whenever harmonization was deemed possible, processing algorithms were developed and implemented in an open-source software infrastructure to transform study-specific data into the target (i.e. harmonized) format. Harmonized datasets located on server in each research centres across Europe were interconnected through a federated database system to perform statistical analysis. Results Retrospective harmonization led to the generation of common format variables for 73% of matches considered (96 targeted variables across 8 studies). Authenticated investigators can now perform complex statistical analyses of harmonized datasets stored on distributed servers without actually sharing individual-level data using the DataSHIELD method. Conclusion New Internet-based networking technologies and database management systems are providing the means to support collaborative, multi-center research in an efficient and secure manner. The results from this pilot project show that, given a strong collaborative relationship between participating studies, it is possible to seamlessly co-analyse internationally harmonized research databases while allowing each study to retain full control over individual-level data. We encourage additional collaborative research networks in epidemiology, public health, and the social sciences to make use of the open source tools presented herein. PMID:24257327
Yu, Feiqiao Brian; Blainey, Paul C; Schulz, Frederik; Woyke, Tanja; Horowitz, Mark A; Quake, Stephen R
2017-07-05
Metagenomics and single-cell genomics have enabled genome discovery from unknown branches of life. However, extracting novel genomes from complex mixtures of metagenomic data can still be challenging and represents an ill-posed problem which is generally approached with ad hoc methods. Here we present a microfluidic-based mini-metagenomic method which offers a statistically rigorous approach to extract novel microbial genomes while preserving single-cell resolution. We used this approach to analyze two hot spring samples from Yellowstone National Park and extracted 29 new genomes, including three deeply branching lineages. The single-cell resolution enabled accurate quantification of genome function and abundance, down to 1% in relative abundance. Our analyses of genome level SNP distributions also revealed low to moderate environmental selection. The scale, resolution, and statistical power of microfluidic-based mini-metagenomics make it a powerful tool to dissect the genomic structure of microbial communities while effectively preserving the fundamental unit of biology, the single cell.
The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth
ERIC Educational Resources Information Center
Steyvers, Mark; Tenenbaum, Joshua B.
2005-01-01
We present statistical analyses of the large-scale structure of 3 types of semantic networks: word associations, WordNet, and Roget's Thesaurus. We show that they have a small-world structure, characterized by sparse connectivity, short average path lengths between words, and strong local clustering. In addition, the distributions of the number of…
Gingerich, Stephen B.
2005-01-01
Flow-duration statistics under natural (undiverted) and diverted flow conditions were estimated for gaged and ungaged sites on 21 streams in northeast Maui, Hawaii. The estimates were made using the optimal combination of continuous-record gaging-station data, low-flow measurements, and values determined from regression equations developed as part of this study. Estimated 50- and 95-percent flow duration statistics for streams are presented and the analyses done to develop and evaluate the methods used in estimating the statistics are described. Estimated streamflow statistics are presented for sites where various amounts of streamflow data are available as well as for locations where no data are available. Daily mean flows were used to determine flow-duration statistics for continuous-record stream-gaging stations in the study area following U.S. Geological Survey established standard methods. Duration discharges of 50- and 95-percent were determined from total flow and base flow for each continuous-record station. The index-station method was used to adjust all of the streamflow records to a common, long-term period. The gaging station on West Wailuaiki Stream (16518000) was chosen as the index station because of its record length (1914-2003) and favorable geographic location. Adjustments based on the index-station method resulted in decreases to the 50-percent duration total flow, 50-percent duration base flow, 95-percent duration total flow, and 95-percent duration base flow computed on the basis of short-term records that averaged 7, 3, 4, and 1 percent, respectively. For the drainage basin of each continuous-record gaged site and selected ungaged sites, morphometric, geologic, soil, and rainfall characteristics were quantified using Geographic Information System techniques. Regression equations relating the non-diverted streamflow statistics to basin characteristics of the gaged basins were developed using ordinary-least-squares regression analyses. Rainfall rate, maximum basin elevation, and the elongation ratio of the basin were the basin characteristics used in the final regression equations for 50-percent duration total flow and base flow. Rainfall rate and maximum basin elevation were used in the final regression equations for the 95-percent duration total flow and base flow. The relative errors between observed and estimated flows ranged from 10 to 20 percent for the 50-percent duration total flow and base flow, and from 29 to 56 percent for the 95-percent duration total flow and base flow. The regression equations developed for this study were used to determine the 50-percent duration total flow, 50-percent duration base flow, 95-percent duration total flow, and 95-percent duration base flow at selected ungaged diverted and undiverted sites. Estimated streamflow, prediction intervals, and standard errors were determined for 48 ungaged sites in the study area and for three gaged sites west of the study area. Relative errors were determined for sites where measured values of 95-percent duration discharge of total flow were available. East of Keanae Valley, the 95-percent duration discharge equation generally underestimated flow, and within and west of Keanae Valley, the equation generally overestimated flow. Reduction in 50- and 95-percent flow-duration values in stream reaches affected by diversions throughout the study area average 58 to 60 percent.
Villa-Uriol, M. C.; Berti, G.; Hose, D. R.; Marzo, A.; Chiarini, A.; Penrose, J.; Pozo, J.; Schmidt, J. G.; Singh, P.; Lycett, R.; Larrabide, I.; Frangi, A. F.
2011-01-01
Cerebral aneurysms are a multi-factorial disease with severe consequences. A core part of the European project @neurIST was the physical characterization of aneurysms to find candidate risk factors associated with aneurysm rupture. The project investigated measures based on morphological, haemodynamic and aneurysm wall structure analyses for more than 300 cases of ruptured and unruptured aneurysms, extracting descriptors suitable for statistical studies. This paper deals with the unique challenges associated with this task, and the implemented solutions. The consistency of results required by the subsequent statistical analyses, given the heterogeneous image data sources and multiple human operators, was met by a highly automated toolchain combined with training. A testimonial of the successful automation is the positive evaluation of the toolchain by over 260 clinicians during various hands-on workshops. The specification of the analyses required thorough investigations of modelling and processing choices, discussed in a detailed analysis protocol. Finally, an abstract data model governing the management of the simulation-related data provides a framework for data provenance and supports future use of data and toolchain. This is achieved by enabling the easy modification of the modelling approaches and solution details through abstract problem descriptions, removing the need of repetition of manual processing work. PMID:22670202
Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra
2015-01-01
Microarray and beadchip are two most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such way that significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special type of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big dataset. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using David database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data-matrix. Finally, we have also included the integrated analysis of gene expression and methylation for determining epigenetic effect (viz., effect of methylation) on gene expression level. PMID:25830807
Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra
2015-01-01
Microarray and beadchip are two most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such way that significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special type of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big dataset. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using David database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data-matrix. Finally, we have also included the integrated analysis of gene expression and methylation for determining epigenetic effect (viz., effect of methylation) on gene expression level.
Differences in Performance Among Test Statistics for Assessing Phylogenomic Model Adequacy.
Duchêne, David A; Duchêne, Sebastian; Ho, Simon Y W
2018-05-18
Statistical phylogenetic analyses of genomic data depend on models of nucleotide or amino acid substitution. The adequacy of these substitution models can be assessed using a number of test statistics, allowing the model to be rejected when it is found to provide a poor description of the evolutionary process. A potentially valuable use of model-adequacy test statistics is to identify when data sets are likely to produce unreliable phylogenetic estimates, but their differences in performance are rarely explored. We performed a comprehensive simulation study to identify test statistics that are sensitive to some of the most commonly cited sources of phylogenetic estimation error. Our results show that, for many test statistics, traditional thresholds for assessing model adequacy can fail to reject the model when the phylogenetic inferences are inaccurate and imprecise. This is particularly problematic when analysing loci that have few variable informative sites. We propose new thresholds for assessing substitution model adequacy and demonstrate their effectiveness in analyses of three phylogenomic data sets. These thresholds lead to frequent rejection of the model for loci that yield topological inferences that are imprecise and are likely to be inaccurate. We also propose the use of a summary statistic that provides a practical assessment of overall model adequacy. Our approach offers a promising means of enhancing model choice in genome-scale data sets, potentially leading to improvements in the reliability of phylogenomic inference.
NASA Astrophysics Data System (ADS)
Kaleva Oikarinen, Juho; Järvelä, Sanna; Kaasila, Raimo
2014-04-01
This design-based research project focuses on documenting statistical learning among 16-17-year-old Finnish upper secondary school students (N = 78) in a computer-supported collaborative learning (CSCL) environment. One novel value of this study is in reporting the shift from teacher-led mathematical teaching to autonomous small-group learning in statistics. The main aim of this study is to examine how student collaboration occurs in learning statistics in a CSCL environment. The data include material from videotaped classroom observations and the researcher's notes. In this paper, the inter-subjective phenomena of students' interactions in a CSCL environment are analysed by using a contact summary sheet (CSS). The development of the multi-dimensional coding procedure of the CSS instrument is presented. Aptly selected video episodes were transcribed and coded in terms of conversational acts, which were divided into non-task-related and task-related categories to depict students' levels of collaboration. The results show that collaborative learning (CL) can facilitate cohesion and responsibility and reduce students' feelings of detachment in our classless, periodic school system. The interactive .pdf material and collaboration in small groups enable statistical learning. It is concluded that CSCL is one possible method of promoting statistical teaching. CL using interactive materials seems to foster and facilitate statistical learning processes.
Sul, Jae Hoon; Bilow, Michael; Yang, Wen-Yun; Kostem, Emrah; Furlotte, Nick; He, Dan; Eskin, Eleazar
2016-03-01
Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those genetic variants typically explain only a small fraction of phenotypic variance. Factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control effects of population structure that may cause spurious associations. Many studies have analyzed how population structure influences statistics of genetic variants and developed several statistical approaches to correct for population structure. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied and nor have there been methods designed to correct for population structure on GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs and use both simulation and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for population structure on GEI statistics. We find that our approach effectively controls population structure on statistics for GEIs as well as for genetic variants.
Frey, N; Hügle, T; Jick, S S; Meier, C R; Spoendlin, J
2016-09-01
Emerging evidence suggests that diabetes may be a risk factor for osteoarthritis (OA). However, previous results on the association between diabetes and all OA were conflicting. We aimed to comprehensively analyse the association between type II diabetes mellitus (T2DM) and osteoarthritis of the hand (HOA) specifically. We conducted a matched (1:1) case-control study using the UK-based Clinical Practice Research Datalink (CPRD) of cases aged 30-90 years with an incident diagnosis of HOA from 1995 to 2013. In multivariable conditional logistic regression analyses, we calculated odds ratios (OR) for incident HOA in patients with T2DM, categorized by T2DM severity (HbA1C), duration, and pharmacological treatment. We further performed sensitivity analyses in patients with and without other metabolic diseases (hypertension (HT), hyperlipidaemia (HL), obesity). Among 13,500 cases and 13,500 controls, we observed no statistically significant association between T2DM and HOA (OR 0.95, 95% confidence interval (CI) 0.87-1.04), regardless of T2DM severity, duration, or pharmacological treatment. Having HT did not change the OR. Although we observed slightly increased ORs in overweight T2DM patients with co-occurring HL with or without coexisting HT, none of these ORs were statistically significant. Our results provide evidence that T2DM is not an independent risk factor for HOA. Concurrence of T2DM with HT, HL, and/or obesity did not change this association significantly. Copyright © 2016 Osteoarthritis Research Society International. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Kessel, Kerstin A.; Jäger, Andreas; Bohn, Christian; Habermehl, Daniel; Zhang, Lanlan; Engelmann, Uwe; Bougatf, Nina; Bendl, Rolf; Debus, Jürgen; Combs, Stephanie E.
2013-03-01
To date, conducting retrospective clinical analyses is rather difficult and time consuming. Especially in radiation oncology, handling voluminous datasets from various information systems and different documentation styles efficiently is crucial for patient care and research. With the example of patients with pancreatic cancer treated with radio-chemotherapy, we performed a therapy evaluation by using analysis tools connected with a documentation system. A total number of 783 patients have been documented into a professional, web-based documentation system. Information about radiation therapy, diagnostic images and dose distributions have been imported. For patients with disease progression after neoadjuvant chemoradiation, we designed and established an analysis workflow. After automatic registration of the radiation plans with the follow-up images, the recurrence volumes are segmented manually. Based on these volumes the DVH (dose-volume histogram) statistic is calculated, followed by the determination of the dose applied to the region of recurrence. All results are stored in the database and included in statistical calculations. The main goal of using an automatic evaluation system is to reduce time and effort conducting clinical analyses, especially with large patient groups. We showed a first approach and use of some existing tools, however manual interaction is still necessary. Further steps need to be taken to enhance automation. Already, it has become apparent that the benefits of digital data management and analysis lie in the central storage of data and reusability of the results. Therefore, we intend to adapt the evaluation system to other types of tumors in radiation oncology.
Park, Ji Eun; Han, Kyunghwa; Sung, Yu Sub; Chung, Mi Sun; Koo, Hyun Jung; Yoon, Hee Mang; Choi, Young Jun; Lee, Seung Soo; Kim, Kyung Won; Shin, Youngbin; An, Suah; Cho, Hyo-Min
2017-01-01
Objective To evaluate the frequency and adequacy of statistical analyses in a general radiology journal when reporting a reliability analysis for a diagnostic test. Materials and Methods Sixty-three studies of diagnostic test accuracy (DTA) and 36 studies reporting reliability analyses published in the Korean Journal of Radiology between 2012 and 2016 were analyzed. Studies were judged using the methodological guidelines of the Radiological Society of North America-Quantitative Imaging Biomarkers Alliance (RSNA-QIBA), and COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative. DTA studies were evaluated by nine editorial board members of the journal. Reliability studies were evaluated by study reviewers experienced with reliability analysis. Results Thirty-one (49.2%) of the 63 DTA studies did not include a reliability analysis when deemed necessary. Among the 36 reliability studies, proper statistical methods were used in all (5/5) studies dealing with dichotomous/nominal data, 46.7% (7/15) of studies dealing with ordinal data, and 95.2% (20/21) of studies dealing with continuous data. Statistical methods were described in sufficient detail regarding weighted kappa in 28.6% (2/7) of studies and regarding the model and assumptions of intraclass correlation coefficient in 35.3% (6/17) and 29.4% (5/17) of studies, respectively. Reliability parameters were used as if they were agreement parameters in 23.1% (3/13) of studies. Reproducibility and repeatability were used incorrectly in 20% (3/15) of studies. Conclusion Greater attention to the importance of reporting reliability, thorough description of the related statistical methods, efforts not to neglect agreement parameters, and better use of relevant terminology is necessary. PMID:29089821
Ceppi, Marcello; Gallo, Fabio; Bonassi, Stefano
2011-01-01
The most common study design performed in population studies based on the micronucleus (MN) assay, is the cross-sectional study, which is largely performed to evaluate the DNA damaging effects of exposure to genotoxic agents in the workplace, in the environment, as well as from diet or lifestyle factors. Sample size is still a critical issue in the design of MN studies since most recent studies considering gene-environment interaction, often require a sample size of several hundred subjects, which is in many cases difficult to achieve. The control of confounding is another major threat to the validity of causal inference. The most popular confounders considered in population studies using MN are age, gender and smoking habit. Extensive attention is given to the assessment of effect modification, given the increasing inclusion of biomarkers of genetic susceptibility in the study design. Selected issues concerning the statistical treatment of data have been addressed in this mini-review, starting from data description, which is a critical step of statistical analysis, since it allows to detect possible errors in the dataset to be analysed and to check the validity of assumptions required for more complex analyses. Basic issues dealing with statistical analysis of biomarkers are extensively evaluated, including methods to explore the dose-response relationship among two continuous variables and inferential analysis. A critical approach to the use of parametric and non-parametric methods is presented, before addressing the issue of most suitable multivariate models to fit MN data. In the last decade, the quality of statistical analysis of MN data has certainly evolved, although even nowadays only a small number of studies apply the Poisson model, which is the most suitable method for the analysis of MN data.
NASA Astrophysics Data System (ADS)
González Gómez, Dulce I.; Moreno Barbosa, E.; Martínez Hernández, Mario Iván; Ramos Méndez, José; Hidalgo Tobón, Silvia; Dies Suarez, Pilar; Barragán Pérez, Eduardo; De Celis Alonso, Benito
2014-11-01
The main goal of this project was to create a computer algorithm based on wavelet analysis of region of homogeneity images obtained during resting state studies. Ideally it would automatically diagnose ADHD. Because the cerebellum is an area known to be affected by ADHD, this study specifically analysed this region. Male right handed volunteers (infants with ages between 7 and 11 years old) were studied and compared with age matched controls. Statistical differences between the values of the absolute integrated wavelet spectrum were found and showed significant differences (p<0.0015) between groups. This difference might help in the future to distinguish healthy from ADHD patients and therefore diagnose ADHD. Even if results were statistically significant, the small size of the sample limits the applicability of this methods as it is presented here, and further work with larger samples and using freely available datasets must be done.
Gel, Bernat; Díez-Villanueva, Anna; Serra, Eduard; Buschbeck, Marcus; Peinado, Miguel A; Malinverni, Roberto
2016-01-15
Statistically assessing the relation between a set of genomic regions and other genomic features is a common challenging task in genomic and epigenomic analyses. Randomization based approaches implicitly take into account the complexity of the genome without the need of assuming an underlying statistical model. regioneR is an R package that implements a permutation test framework specifically designed to work with genomic regions. In addition to the predefined randomization and evaluation strategies, regioneR is fully customizable allowing the use of custom strategies to adapt it to specific questions. Finally, it also implements a novel function to evaluate the local specificity of the detected association. regioneR is an R package released under Artistic-2.0 License. The source code and documents are freely available through Bioconductor (http://www.bioconductor.org/packages/regioneR). rmalinverni@carrerasresearch.org. © The Author 2015. Published by Oxford University Press.
Wiuf, Carsten; Schaumburg-Müller Pallesen, Jonatan; Foldager, Leslie; Grove, Jakob
2016-08-01
In many areas of science it is custom to perform many, potentially millions, of tests simultaneously. To gain statistical power it is common to group tests based on a priori criteria such as predefined regions or by sliding windows. However, it is not straightforward to choose grouping criteria and the results might depend on the chosen criteria. Methods that summarize, or aggregate, test statistics or p-values, without relying on a priori criteria, are therefore desirable. We present a simple method to aggregate a sequence of stochastic variables, such as test statistics or p-values, into fewer variables without assuming a priori defined groups. We provide different ways to evaluate the significance of the aggregated variables based on theoretical considerations and resampling techniques, and show that under certain assumptions the FWER is controlled in the strong sense. Validity of the method was demonstrated using simulations and real data analyses. Our method may be a useful supplement to standard procedures relying on evaluation of test statistics individually. Moreover, by being agnostic and not relying on predefined selected regions, it might be a practical alternative to conventionally used methods of aggregation of p-values over regions. The method is implemented in Python and freely available online (through GitHub, see the Supplementary information).
Statistical Analyses of Raw Material Data for MTM45-1/CF7442A-36% RW: CMH Cure Cycle
NASA Technical Reports Server (NTRS)
Coroneos, Rula; Pai, Shantaram, S.; Murthy, Pappu
2013-01-01
This report describes statistical characterization of physical properties of the composite material system MTM45-1/CF7442A, which has been tested and is currently being considered for use on spacecraft structures. This composite system is made of 6K plain weave graphite fibers in a highly toughened resin system. This report summarizes the distribution types and statistical details of the tests and the conditions for the experimental data generated. These distributions will be used in multivariate regression analyses to help determine material and design allowables for similar material systems and to establish a procedure for other material systems. Additionally, these distributions will be used in future probabilistic analyses of spacecraft structures. The specific properties that are characterized are the ultimate strength, modulus, and Poisson??s ratio by using a commercially available statistical package. Results are displayed using graphical and semigraphical methods and are included in the accompanying appendixes.
Wu, S.-S.; Wang, L.; Qiu, X.
2008-01-01
This article presents a deterministic model for sub-block-level population estimation based on the total building volumes derived from geographic information system (GIS) building data and three census block-level housing statistics. To assess the model, we generated artificial blocks by aggregating census block areas and calculating the respective housing statistics. We then applied the model to estimate populations for sub-artificial-block areas and assessed the estimates with census populations of the areas. Our analyses indicate that the average percent error of population estimation for sub-artificial-block areas is comparable to those for sub-census-block areas of the same size relative to associated blocks. The smaller the sub-block-level areas, the higher the population estimation errors. For example, the average percent error for residential areas is approximately 0.11 percent for 100 percent block areas and 35 percent for 5 percent block areas.
Gribova, N P; Iudel'son, Ia B; Golubev, V L; Abramenkova, I V
2003-01-01
To carry out a differential diagnosis of two facial dyskinesia (FD) models--facial hemispasm (FH) and facial paraspasm (FP), a combined program of electroneuromyographic (ENMG) examination has been created, using statistical analyses, including that for objects identification based on hybrid neural network with the application of adaptive fuzzy logic method and standard statistics programs (Wilcoxon, Student statistics). In FH, a lesion of peripheral facial neuromotor apparatus with augmentation of functions of inter-neurons in segmental and upper segmental stem levels predominated. In FP, primary afferent strengthening in mimic muscles was accompanied by increased motor neurons activity and reciprocal augmentation of inter-neurons, inhibiting motor portion of V pair. Mathematical algorithm for ENMG results recognition worked out in the study provides a precise differentiation of two FD models and opens possibilities for differential diagnosis of other facial motor disorders.
Shrout, Patrick E; Rodgers, Joseph L
2018-01-04
Psychology advances knowledge by testing statistical hypotheses using empirical observations and data. The expectation is that most statistically significant findings can be replicated in new data and in new laboratories, but in practice many findings have replicated less often than expected, leading to claims of a replication crisis. We review recent methodological literature on questionable research practices, meta-analysis, and power analysis to explain the apparently high rates of failure to replicate. Psychologists can improve research practices to advance knowledge in ways that improve replicability. We recommend that researchers adopt open science conventions of preregi-stration and full disclosure and that replication efforts be based on multiple studies rather than on a single replication attempt. We call for more sophisticated power analyses, careful consideration of the various influences on effect sizes, and more complete disclosure of nonsignificant as well as statistically significant findings.
Statistical analysis of QC data and estimation of fuel rod behaviour
NASA Astrophysics Data System (ADS)
Heins, L.; Groβ, H.; Nissen, K.; Wunderlich, F.
1991-02-01
The behaviour of fuel rods while in reactor is influenced by many parameters. As far as fabrication is concerned, fuel pellet diameter and density, and inner cladding diameter are important examples. Statistical analyses of quality control data show a scatter of these parameters within the specified tolerances. At present it is common practice to use a combination of superimposed unfavorable tolerance limits (worst case dataset) in fuel rod design calculations. Distributions are not considered. The results obtained in this way are very conservative but the degree of conservatism is difficult to quantify. Probabilistic calculations based on distributions allow the replacement of the worst case dataset by a dataset leading to results with known, defined conservatism. This is achieved by response surface methods and Monte Carlo calculations on the basis of statistical distributions of the important input parameters. The procedure is illustrated by means of two examples.
OTD Observations of Continental US Ground and Cloud Flashes
NASA Technical Reports Server (NTRS)
Koshak, William
2007-01-01
Lightning optical flash parameters (e.g., radiance, area, duration, number of optical groups, and number of optical events) derived from almost five years of Optical Transient Detector (OTD) data are analyzed. Hundreds of thousands of OTD flashes occurring over the continental US are categorized according to flash type (ground or cloud flash) using US National Lightning Detection Network TM (NLDN) data. The statistics of the optical characteristics of the ground and cloud flashes are inter-compared on an overall basis, and as a function of ground flash polarity. A standard two-distribution hypothesis test is used to inter-compare the population means of a given lightning parameter for the two flash types. Given the differences in the statistics of the optical characteristics, it is suggested that statistical analyses (e.g., Bayesian Inference) of the space-based optical measurements might make it possible to successfully discriminate ground and cloud flashes a reasonable percentage of the time.
Zheng, Jie; Harris, Marcelline R; Masci, Anna Maria; Lin, Yu; Hero, Alfred; Smith, Barry; He, Yongqun
2016-09-14
Statistics play a critical role in biological and clinical research. However, most reports of scientific results in the published literature make it difficult for the reader to reproduce the statistical analyses performed in achieving those results because they provide inadequate documentation of the statistical tests and algorithms applied. The Ontology of Biological and Clinical Statistics (OBCS) is put forward here as a step towards solving this problem. The terms in OBCS including 'data collection', 'data transformation in statistics', 'data visualization', 'statistical data analysis', and 'drawing a conclusion based on data', cover the major types of statistical processes used in basic biological research and clinical outcome studies. OBCS is aligned with the Basic Formal Ontology (BFO) and extends the Ontology of Biomedical Investigations (OBI), an OBO (Open Biological and Biomedical Ontologies) Foundry ontology supported by over 20 research communities. Currently, OBCS comprehends 878 terms, representing 20 BFO classes, 403 OBI classes, 229 OBCS specific classes, and 122 classes imported from ten other OBO ontologies. We discuss two examples illustrating how the ontology is being applied. In the first (biological) use case, we describe how OBCS was applied to represent the high throughput microarray data analysis of immunological transcriptional profiles in human subjects vaccinated with an influenza vaccine. In the second (clinical outcomes) use case, we applied OBCS to represent the processing of electronic health care data to determine the associations between hospital staffing levels and patient mortality. Our case studies were designed to show how OBCS can be used for the consistent representation of statistical analysis pipelines under two different research paradigms. Other ongoing projects using OBCS for statistical data processing are also discussed. The OBCS source code and documentation are available at: https://github.com/obcs/obcs . The Ontology of Biological and Clinical Statistics (OBCS) is a community-based open source ontology in the domain of biological and clinical statistics. OBCS is a timely ontology that represents statistics-related terms and their relations in a rigorous fashion, facilitates standard data analysis and integration, and supports reproducible biological and clinical research.
Reporting and methodological quality of meta-analyses in urological literature.
Xia, Leilei; Xu, Jing; Guzzo, Thomas J
2017-01-01
To assess the overall quality of published urological meta-analyses and identify predictive factors for high quality. We systematically searched PubMed to identify meta-analyses published from January 1st, 2011 to December 31st, 2015 in 10 predetermined major paper-based urology journals. The characteristics of the included meta-analyses were collected, and their reporting and methodological qualities were assessed by the PRISMA checklist (27 items) and AMSTAR tool (11 items), respectively. Descriptive statistics were used for individual items as a measure of overall compliance, and PRISMA and AMSTAR scores were calculated as the sum of adequately reported domains. Logistic regression was used to identify predictive factors for high qualities. A total of 183 meta-analyses were included. The mean PRISMA and AMSTAR scores were 22.74 ± 2.04 and 7.57 ± 1.41, respectively. PRISMA item 5, protocol and registration, items 15 and 22, risk of bias across studies, items 16 and 23, additional analysis had less than 50% adherence. AMSTAR item 1, " a priori " design, item 5, list of studies and item 10, publication bias had less than 50% adherence. Logistic regression analyses showed that funding support and " a priori " design were associated with superior reporting quality, following PRISMA guideline and " a priori " design were associated with superior methodological quality. Reporting and methodological qualities of recently published meta-analyses in major paper-based urology journals are generally good. Further improvement could potentially be achieved by strictly adhering to PRISMA guideline and having " a priori " protocol.
Nour-Eldein, Hebatallah
2016-01-01
With limited statistical knowledge of most physicians it is not uncommon to find statistical errors in research articles. To determine the statistical methods and to assess the statistical errors in family medicine (FM) research articles that were published between 2010 and 2014. This was a cross-sectional study. All 66 FM research articles that were published over 5 years by FM authors with affiliation to Suez Canal University were screened by the researcher between May and August 2015. Types and frequencies of statistical methods were reviewed in all 66 FM articles. All 60 articles with identified inferential statistics were examined for statistical errors and deficiencies. A comprehensive 58-item checklist based on statistical guidelines was used to evaluate the statistical quality of FM articles. Inferential methods were recorded in 62/66 (93.9%) of FM articles. Advanced analyses were used in 29/66 (43.9%). Contingency tables 38/66 (57.6%), regression (logistic, linear) 26/66 (39.4%), and t-test 17/66 (25.8%) were the most commonly used inferential tests. Within 60 FM articles with identified inferential statistics, no prior sample size 19/60 (31.7%), application of wrong statistical tests 17/60 (28.3%), incomplete documentation of statistics 59/60 (98.3%), reporting P value without test statistics 32/60 (53.3%), no reporting confidence interval with effect size measures 12/60 (20.0%), use of mean (standard deviation) to describe ordinal/nonnormal data 8/60 (13.3%), and errors related to interpretation were mainly for conclusions without support by the study data 5/60 (8.3%). Inferential statistics were used in the majority of FM articles. Data analysis and reporting statistics are areas for improvement in FM research articles.
Nour-Eldein, Hebatallah
2016-01-01
Background: With limited statistical knowledge of most physicians it is not uncommon to find statistical errors in research articles. Objectives: To determine the statistical methods and to assess the statistical errors in family medicine (FM) research articles that were published between 2010 and 2014. Methods: This was a cross-sectional study. All 66 FM research articles that were published over 5 years by FM authors with affiliation to Suez Canal University were screened by the researcher between May and August 2015. Types and frequencies of statistical methods were reviewed in all 66 FM articles. All 60 articles with identified inferential statistics were examined for statistical errors and deficiencies. A comprehensive 58-item checklist based on statistical guidelines was used to evaluate the statistical quality of FM articles. Results: Inferential methods were recorded in 62/66 (93.9%) of FM articles. Advanced analyses were used in 29/66 (43.9%). Contingency tables 38/66 (57.6%), regression (logistic, linear) 26/66 (39.4%), and t-test 17/66 (25.8%) were the most commonly used inferential tests. Within 60 FM articles with identified inferential statistics, no prior sample size 19/60 (31.7%), application of wrong statistical tests 17/60 (28.3%), incomplete documentation of statistics 59/60 (98.3%), reporting P value without test statistics 32/60 (53.3%), no reporting confidence interval with effect size measures 12/60 (20.0%), use of mean (standard deviation) to describe ordinal/nonnormal data 8/60 (13.3%), and errors related to interpretation were mainly for conclusions without support by the study data 5/60 (8.3%). Conclusion: Inferential statistics were used in the majority of FM articles. Data analysis and reporting statistics are areas for improvement in FM research articles. PMID:27453839
Lightfoot, Emma; O’Connell, Tamsin C.
2016-01-01
Oxygen isotope analysis of archaeological skeletal remains is an increasingly popular tool to study past human migrations. It is based on the assumption that human body chemistry preserves the δ18O of precipitation in such a way as to be a useful technique for identifying migrants and, potentially, their homelands. In this study, the first such global survey, we draw on published human tooth enamel and bone bioapatite data to explore the validity of using oxygen isotope analyses to identify migrants in the archaeological record. We use human δ18O results to show that there are large variations in human oxygen isotope values within a population sample. This may relate to physiological factors influencing the preservation of the primary isotope signal, or due to human activities (such as brewing, boiling, stewing, differential access to water sources and so on) causing variation in ingested water and food isotope values. We compare the number of outliers identified using various statistical methods. We determine that the most appropriate method for identifying migrants is dependent on the data but is likely to be the IQR or median absolute deviation from the median under most archaeological circumstances. Finally, through a spatial assessment of the dataset, we show that the degree of overlap in human isotope values from different locations across Europe is such that identifying individuals’ homelands on the basis of oxygen isotope analysis alone is not possible for the regions analysed to date. Oxygen isotope analysis is a valid method for identifying first-generation migrants from an archaeological site when used appropriately, however it is difficult to identify migrants using statistical methods for a sample size of less than c. 25 individuals. In the absence of local previous analyses, each sample should be treated as an individual dataset and statistical techniques can be used to identify migrants, but in most cases pinpointing a specific homeland should not be attempted. PMID:27124001
NASA Astrophysics Data System (ADS)
Alshipli, Marwan; Kabir, Norlaili A.
2017-05-01
Computed tomography (CT) employs X-ray radiation to create cross-sectional images. Dual-energy CT acquisition includes the images acquired from an alternating voltage of X-ray tube: a low- and a high-peak kilovoltage. The main objective of this study is to determine the best slice thickness that reduces image noise with adequate diagnostic information using dual energy CT head protocol. The study used the ImageJ software and statistical analyses to aid the medical image analysis of dual-energy CT. In this study, ImageJ software and F-test were utilised as the combination methods to analyse DICOM CT images. They were used to investigate the effect of slice thickness on noise and visibility in dual-energy CT head protocol images. Catphan-600 phantom was scanned at different slice thickness values;.6, 1, 2, 3, 4, 5 and 6 mm, then quantitative analyses were carried out. The DECT operated in helical mode with another fixed scan parameter values. Based on F-test statistical analyses, image noise at 0.6, 1, and 2 mm were significantly different compared to the other images acquired at slice thickness of 3, 4, 5, and 6 mm. However, no significant differences of image noise were observed at 3, 4, 5, and 6 mm. As a result, better diagnostic image value, image visibility, and lower image noise in dual-energy CT head protocol was observed at a slice thickness of 3 mm.
GenoMatrix: A Software Package for Pedigree-Based and Genomic Prediction Analyses on Complex Traits.
Nazarian, Alireza; Gezan, Salvador Alejandro
2016-07-01
Genomic and pedigree-based best linear unbiased prediction methodologies (G-BLUP and P-BLUP) have proven themselves efficient for partitioning the phenotypic variance of complex traits into its components, estimating the individuals' genetic merits, and predicting unobserved (or yet-to-be observed) phenotypes in many species and fields of study. The GenoMatrix software, presented here, is a user-friendly package to facilitate the process of using genome-wide marker data and parentage information for G-BLUP and P-BLUP analyses on complex traits. It provides users with a collection of applications which help them on a set of tasks from performing quality control on data to constructing and manipulating the genomic and pedigree-based relationship matrices and obtaining their inverses. Such matrices will be then used in downstream analyses by other statistical packages. The package also enables users to obtain predicted values for unobserved individuals based on the genetic values of observed related individuals. GenoMatrix is available to the research community as a Windows 64bit executable and can be downloaded free of charge at: http://compbio.ufl.edu/software/genomatrix/. © The American Genetic Association. 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
2014-01-01
Objective To offer a practical demonstration of receiver operating characteristic (ROC) analyses, diagnostic efficiency statistics, and their application to clinical decision making using a popular parent checklist to assess for potential mood disorder. Method Secondary analyses of data from 589 families seeking outpatient mental health services, completing the Child Behavior Checklist and semi-structured diagnostic interviews. Results Internalizing Problems raw scores discriminated mood disorders significantly better than did age- and gender-normed T scores, or an Affective Problems score. Internalizing scores <8 had a diagnostic likelihood ratio <0.3, and scores >30 had a diagnostic likelihood ratio of 7.4. Conclusions This study illustrates a series of steps in defining a clinical problem, operationalizing it, selecting a valid study design, and using ROC analyses to generate statistics that support clinical decisions. The ROC framework offers important advantages for clinical interpretation. Appendices include sample scripts using SPSS and R to check assumptions and conduct ROC analyses. PMID:23965298
Youngstrom, Eric A
2014-03-01
To offer a practical demonstration of receiver operating characteristic (ROC) analyses, diagnostic efficiency statistics, and their application to clinical decision making using a popular parent checklist to assess for potential mood disorder. Secondary analyses of data from 589 families seeking outpatient mental health services, completing the Child Behavior Checklist and semi-structured diagnostic interviews. Internalizing Problems raw scores discriminated mood disorders significantly better than did age- and gender-normed T scores, or an Affective Problems score. Internalizing scores <8 had a diagnostic likelihood ratio <0.3, and scores >30 had a diagnostic likelihood ratio of 7.4. This study illustrates a series of steps in defining a clinical problem, operationalizing it, selecting a valid study design, and using ROC analyses to generate statistics that support clinical decisions. The ROC framework offers important advantages for clinical interpretation. Appendices include sample scripts using SPSS and R to check assumptions and conduct ROC analyses.
Winer, E Samuel; Cervone, Daniel; Bryant, Jessica; McKinney, Cliff; Liu, Richard T; Nadorff, Michael R
2016-09-01
A popular way to attempt to discern causality in clinical psychology is through mediation analysis. However, mediation analysis is sometimes applied to research questions in clinical psychology when inferring causality is impossible. This practice may soon increase with new, readily available, and easy-to-use statistical advances. Thus, we here provide a heuristic to remind clinical psychological scientists of the assumptions of mediation analyses. We describe recent statistical advances and unpack assumptions of causality in mediation, underscoring the importance of time in understanding mediational hypotheses and analyses in clinical psychology. Example analyses demonstrate that statistical mediation can occur despite theoretical mediation being improbable. We propose a delineation of mediational effects derived from cross-sectional designs into the terms temporal and atemporal associations to emphasize time in conceptualizing process models in clinical psychology. The general implications for mediational hypotheses and the temporal frameworks from within which they may be drawn are discussed. © 2016 Wiley Periodicals, Inc.
This tool allows users to animate cancer trends over time by cancer site and cause of death, race, and sex. Provides access to incidence, mortality, and survival. Select the type of statistic, variables, format, and then extract the statistics in a delimited format for further analyses.
Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F
2013-01-01
Abstract To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTO's). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTO's. This review examines existing practices in GM plant field testing such as the way of randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTO's are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed to decide on the selection of the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches – for example, analysis of variance (ANOVA) – are appropriate, for discontinuous data (counts) only generalized linear models (GLM) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and to assess whether such analyses were correctly applied. We offer generic advice to risk assessors and applicants that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in field testing. PMID:24567836
Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F
2013-08-01
To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTO's). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTO's. This review examines existing practices in GM plant field testing such as the way of randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTO's are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed to decide on the selection of the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches - for example, analysis of variance (ANOVA) - are appropriate, for discontinuous data (counts) only generalized linear models (GLM) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and to assess whether such analyses were correctly applied. We offer generic advice to risk assessors and applicants that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in field testing.
Boudreault, David J; Li, Chin-Shang; Wong, Michael S
2016-01-01
To evaluate the effect of web-based education on (1) patient satisfaction, (2) consultation times, and (3) conversion to surgery. A retrospective review of 767 new patient consultations seen by 4 university-based plastic surgeons was conducted between May 2012 and August 2013 to determine the effect a web-based education program has on patient satisfaction and consultation time. A standard 5-point Likert scale survey completed at the end of the consultation was used to assess satisfaction with their experience. Consult times were obtained from the electronic medical record. All analyses were done with Statistical Analysis Software version 9.2 (SAS Inc., Cary, NC). A P value less than 0.05 was considered statistically significant. Those who viewed the program before their consultation were more satisfied with their consultation compared to those who did not (satisfaction scores, mean ± SD: 1.13 ± 0.44 vs 1.36 ± 0.74; P = 0.02) and more likely to rate their experience as excellent (92% vs 75%; P = 0.02). Contrary to the claims of Emmi Solutions, patients who viewed the educational program before consultation trended toward longer visits compared to those who did not (mean time ± SD: 54 ± 26 vs 50 ± 35 minutes; P = 0.10). More patients who completed the program went on to undergo a procedure (44% vs 37%; P = 0.16), but this difference was not statistically significant. Viewing web-based educational programs significantly improved plastic surgery patients' satisfaction with their consultation, but patients who viewed the program also trended toward longer consultation times. Although there was an increase in converting to surgical procedures, this did not reach statistical significance.
A data base of geologic field spectra
NASA Technical Reports Server (NTRS)
Kahle, A. B.; Goetz, A. F. H.; Paley, H. N.; Alley, R. E.; Abbott, E. A.
1981-01-01
It is noted that field samples measured in the laboratory do not always present an accurate picture of the ground surface sensed by airborne or spaceborne instruments because of the heterogeneous nature of most surfaces and because samples are disturbed and surface characteristics changed by collection and handling. The development of new remote sensing instruments relies on the analysis of surface materials in their natural state. The existence of thousands of Portable Field Reflectance Spectrometer (PFRS) spectra has necessitated a single, all-inclusive data base that permits greatly simplified searching and sorting procedures and facilitates further statistical analyses. The data base developed at JPL for cataloging geologic field spectra is discussed.
NASA Astrophysics Data System (ADS)
Vereshchaka, A. L.
1997-11-01
Four populations (a total of 677 specimens) of the hydrothermal shrimp species Rimicaris exoculata from three Mid-Atlantic Ridge vent fields were studied: Broken Spur (29°N), TAG (26°N), and "14-45" (14°N). Five morphological characters were analysed: number of dorsolateral spines on telson, telative carapace width, relative abdominal length, presence of "abnormal telson", and fat content. Dependences of each character upon shrimp size were analysed. Division of the shrimp ontogenesis on the basis of general morphology is proposed. Phenotypic analysis based upon five selected characters revealed statistically significant divergence between two populations within the same vent field TAG. Probable causes of observed divergence are discussed.
A catalogue of /Fe/H/ determinations
NASA Astrophysics Data System (ADS)
Cayrel de Strobel, G.; Bentolila, C.; Hauck, B.; Curchod, A.
1980-09-01
A catalog of iron/hydrogen abundance ratios for 628 stars is compiled based on 1109 published values. The catalog consists of (1) a table of absolute iron abundance determinations in the solar photosphere as compiled by Blackwell (1974); (2) the iron/hydrogen abundances of 628 stars in the form of logarithmic differences between iron abundances in the given star and a standard star, obtained from analyses of high-dispersion spectra as well as useful stellar spectroscopic and photometric parameters; and (3) indications of the mean dispersion and wavelength interval used in the analyses. In addition, statistics on the distributions of the number of determinations per star and the apparent magnitudes and spectral types of the stars are presented.
A review of approaches to identifying patient phenotype cohorts using electronic health records
Shivade, Chaitanya; Raghavan, Preethi; Fosler-Lussier, Eric; Embi, Peter J; Elhadad, Noemie; Johnson, Stephen B; Lai, Albert M
2014-01-01
Objective To summarize literature describing approaches aimed at automatically identifying patients with a common phenotype. Materials and methods We performed a review of studies describing systems or reporting techniques developed for identifying cohorts of patients with specific phenotypes. Every full text article published in (1) Journal of American Medical Informatics Association, (2) Journal of Biomedical Informatics, (3) Proceedings of the Annual American Medical Informatics Association Symposium, and (4) Proceedings of Clinical Research Informatics Conference within the past 3 years was assessed for inclusion in the review. Only articles using automated techniques were included. Results Ninety-seven articles met our inclusion criteria. Forty-six used natural language processing (NLP)-based techniques, 24 described rule-based systems, 41 used statistical analyses, data mining, or machine learning techniques, while 22 described hybrid systems. Nine articles described the architecture of large-scale systems developed for determining cohort eligibility of patients. Discussion We observe that there is a rise in the number of studies associated with cohort identification using electronic medical records. Statistical analyses or machine learning, followed by NLP techniques, are gaining popularity over the years in comparison with rule-based systems. Conclusions There are a variety of approaches for classifying patients into a particular phenotype. Different techniques and data sources are used, and good performance is reported on datasets at respective institutions. However, no system makes comprehensive use of electronic medical records addressing all of their known weaknesses. PMID:24201027
Statistical definition of relapse: case of family drug court.
Alemi, Farrokh; Haack, Mary; Nemes, Susanna
2004-06-01
At any point in time, a patient's return to drug use can be seen either as a temporary event or as a return to persistent use. There is no formal standard for distinguishing persistent drug use from an occasional relapse. This lack of standardization persists although the consequences of either interpretation can be life altering. In a drug court or regulatory situation, for example, misinterpreting relapse as return to drug use could lead to incarceration, loss of child custody, or loss of employment. A clinician who mistakes a client's relapse for persistent drug use may fail to adjust treatment intensity to client's needs. An empirical and standardized method for distinguishing relapse from persistent drug use is needed. This paper provides a tool for clinicians and judges to distinguish relapse from persistent use based on statistical analyses of patterns of client's drug use. To accomplish this, a control chart is created for time-in-between relapses. This paper shows how a statistical limit can be calculated by examining either the client's history or other clients in the same program. If client's time-in-between relapse exceeds the statistical limit, then the client has returned to persistent use. Otherwise, the drug use is temporary. To illustrate the method, it is applied to data from three family drug courts. The approach allows the estimation of control limits based on the client's as well as the court's historical patterns. The approach also allows comparison of courts based on recovery rates.
Biosurveillance applying scan statistics with multiple, disparate data sources.
Burkom, Howard S
2003-06-01
Researchers working on the Department of Defense Global Emerging Infections System (DoD-GEIS) pilot system, the Electronic Surveillance System for the Early Notification of Community-Based Epidemics (ESSENCE), have applied scan statistics for early outbreak detection using both traditional and nontraditional data sources. These sources include medical data indexed by International Classification of Disease, 9th Revision (ICD-9) diagnosis codes, as well as less-specific, but potentially timelier, indicators such as records of over-the-counter remedy sales and of school absenteeism. Early efforts employed the Kulldorff scan statistic as implemented in the SaTScan software of the National Cancer Institute. A key obstacle to this application is that the input data streams are typically based on time-varying factors, such as consumer behavior, rather than simply on the populations of the component subregions. We have used both modeling and recent historical data distributions to obtain background spatial distributions. Data analyses have provided guidance on how to condition and model input data to avoid excessive clustering. We have used this methodology in combining data sources for both retrospective studies of known outbreaks and surveillance of high-profile events of concern to local public health authorities. We have integrated the scan statistic capability into a Microsoft Access-based system in which we may include or exclude data sources, vary time windows separately for different data sources, censor data from subsets of individual providers or subregions, adjust the background computation method, and run retrospective or simulated studies.
Kierepka, E M; Latch, E K
2016-01-01
Landscape genetics is a powerful tool for conservation because it identifies landscape features that are important for maintaining genetic connectivity between populations within heterogeneous landscapes. However, using landscape genetics in poorly understood species presents a number of challenges, namely, limited life history information for the focal population and spatially biased sampling. Both obstacles can reduce power in statistics, particularly in individual-based studies. In this study, we genotyped 233 American badgers in Wisconsin at 12 microsatellite loci to identify alternative statistical approaches that can be applied to poorly understood species in an individual-based framework. Badgers are protected in Wisconsin owing to an overall lack in life history information, so our study utilized partial redundancy analysis (RDA) and spatially lagged regressions to quantify how three landscape factors (Wisconsin River, Ecoregions and land cover) impacted gene flow. We also performed simulations to quantify errors created by spatially biased sampling. Statistical analyses first found that geographic distance was an important influence on gene flow, mainly driven by fine-scale positive spatial autocorrelations. After controlling for geographic distance, both RDA and regressions found that Wisconsin River and Agriculture were correlated with genetic differentiation. However, only Agriculture had an acceptable type I error rate (3–5%) to be considered biologically relevant. Collectively, this study highlights the benefits of combining robust statistics and error assessment via simulations and provides a method for hypothesis testing in individual-based landscape genetics. PMID:26243136
Dissecting the genetics of complex traits using summary association statistics.
Pasaniuc, Bogdan; Price, Alkes L
2017-02-01
During the past decade, genome-wide association studies (GWAS) have been used to successfully identify tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced extensive repositories of genetic variation and trait measurements across large numbers of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyse summary association statistics. Here, we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases.
Statistical innovations in diagnostic device evaluation.
Yu, Tinghui; Li, Qin; Gray, Gerry; Yue, Lilly Q
2016-01-01
Due to rapid technological development, innovations in diagnostic devices are proceeding at an extremely fast pace. Accordingly, the needs for adopting innovative statistical methods have emerged in the evaluation of diagnostic devices. Statisticians in the Center for Devices and Radiological Health at the Food and Drug Administration have provided leadership in implementing statistical innovations. The innovations discussed in this article include: the adoption of bootstrap and Jackknife methods, the implementation of appropriate multiple reader multiple case study design, the application of robustness analyses for missing data, and the development of study designs and data analyses for companion diagnostics.
Causes of Death Data in the Global Burden of Disease Estimates for Ischemic and Hemorrhagic Stroke.
Truelsen, Thomas; Krarup, Lars-Henrik; Iversen, Helle K; Mensah, George A; Feigin, Valery L; Sposato, Luciano A; Naghavi, Mohsen
2015-01-01
Stroke mortality estimates in the Global Burden of Disease (GBD) study are based on routine mortality statistics and redistribution of ill-defined codes that cannot be a cause of death, the so-called 'garbage codes' (GCs). This study describes the contribution of these codes to stroke mortality estimates. All available mortality data were compiled and non-specific cause codes were redistributed based on literature review and statistical methods. Ill-defined codes were redistributed to their specific cause of disease by age, sex, country and year. The reassignment was done based on the International Classification of Diseases and the pathology behind each code by checking multiple causes of death and literature review. Unspecified stroke and primary and secondary hypertension are leading contributing 'GCs' to stroke mortality estimates for hemorrhagic stroke (HS) and ischemic stroke (IS). There were marked differences in the fraction of death assigned to IS and HS for unspecified stroke and hypertension between GBD regions and between age groups. A large proportion of stroke fatalities are derived from the redistribution of 'unspecified stroke' and 'hypertension' with marked regional differences. Future advancements in stroke certification, data collections and statistical analyses may improve the estimation of the global stroke burden. © 2015 S. Karger AG, Basel.
Causes of Death Data in the Global Burden of Disease Estimates for Ischemic and Hemorrhagic Stroke
Truelsen, Thomas; Krarup, Lars-Henrik; Iversen, Helle; Mensah, George A.; Feigin, Valery; Sposato, Luciano; Naghavi, Mohsen
2015-01-01
Background Stroke mortality estimates in the Global Burden of Disease (GBD) study are based on routine mortality statistics and redistribution of ill-defined codes that cannot be a cause of death, the so-called “garbage codes”. This study describes the contribution of these codes to stroke mortality estimates. Methods All available mortality data were compiled and non-specific cause codes were redistributed based on literature review and statistical methods. Ill-defined codes were redistributed to their specific cause of disease by age, sex, country, and year. The reassignment was done based on the international classification of diseases and the pathology behind each code by checking multiple causes of death and literature review. Results Unspecified stroke, and primary and secondary hypertension are leading contributing “garbage codes” to stroke mortality estimates for intracranial hemorrhagic stroke and ischemic stroke. There were marked differences in the fraction of death assigned to ischemic stroke and hemorrhagic stroke for unspecified stroke and hypertension between GBD regions and between age groups. Conclusions A large proportion of stroke fatalities is derived from the redistribution of “unspecified stroke” and “hypertension” with marked regional differences. Future advancements in stroke certification, data collections, and statistical analyses may improve the estimation of the global stroke burden. PMID:26505189
Martin, Lisa; Watanabe, Sharon; Fainsinger, Robin; Lau, Francis; Ghosh, Sunita; Quan, Hue; Atkins, Marlis; Fassbender, Konrad; Downing, G Michael; Baracos, Vickie
2010-10-01
To determine whether elements of a standard nutritional screening assessment are independently prognostic of survival in patients with advanced cancer. A prospective nested cohort of patients with metastatic cancer were accrued from different units of a Regional Palliative Care Program. Patients completed a nutritional screen on admission. Data included age, sex, cancer site, height, weight history, dietary intake, 13 nutrition impact symptoms, and patient- and physician-reported performance status (PS). Univariate and multivariate survival analyses were conducted. Concordance statistics (c-statistics) were used to test the predictive accuracy of models based on training and validation sets; a c-statistic of 0.5 indicates the model predicts the outcome as well as chance; perfect prediction has a c-statistic of 1.0. A training set of patients in palliative home care (n = 1,164) was used to identify prognostic variables. Primary disease site, PS, short-term weight change (either gain or loss), dietary intake, and dysphagia predicted survival in multivariate analysis (P < .05). A model including only patients separated by disease site and PS with high c-statistics between predicted and observed responses for survival in the training set (0.90) and validation set (0.88; n = 603). The addition of weight change, dietary intake, and dysphagia did not further improve the c-statistic of the model. The c-statistic was also not altered by substituting physician-rated palliative PS for patient-reported PS. We demonstrate a high probability of concordance between predicted and observed survival for patients in distinct palliative care settings (home care, tertiary inpatient, ambulatory outpatient) based on patient-reported information.
Use of the Analysis of the Volatile Faecal Metabolome in Screening for Colorectal Cancer
2015-01-01
Diagnosis of colorectal cancer is an invasive and expensive colonoscopy, which is usually carried out after a positive screening test. Unfortunately, existing screening tests lack specificity and sensitivity, hence many unnecessary colonoscopies are performed. Here we report on a potential new screening test for colorectal cancer based on the analysis of volatile organic compounds (VOCs) in the headspace of faecal samples. Faecal samples were obtained from subjects who had a positive faecal occult blood sample (FOBT). Subjects subsequently had colonoscopies performed to classify them into low risk (non-cancer) and high risk (colorectal cancer) groups. Volatile organic compounds were analysed by selected ion flow tube mass spectrometry (SIFT-MS) and then data were analysed using both univariate and multivariate statistical methods. Ions most likely from hydrogen sulphide, dimethyl sulphide and dimethyl disulphide are statistically significantly higher in samples from high risk rather than low risk subjects. Results using multivariate methods show that the test gives a correct classification of 75% with 78% specificity and 72% sensitivity on FOBT positive samples, offering a potentially effective alternative to FOBT. PMID:26086914
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilson, William; Krakowiak, Konrad J.; Ulm, Franz-Josef, E-mail: ulm@mit.edu
2014-01-15
According to recent developments in cement clinker engineering, the optimization of chemical substitutions in the main clinker phases offers a promising approach to improve both reactivity and grindability of clinkers. Thus, monitoring the chemistry of the phases may become part of the quality control at the cement plants, along with the usual measurements of the abundance of the mineralogical phases (quantitative X-ray diffraction) and the bulk chemistry (X-ray fluorescence). This paper presents a new method to assess these three complementary quantities with a single experiment. The method is based on electron microprobe spot analyses, performed over a grid located onmore » a representative surface of the sample and interpreted with advanced statistical tools. This paper describes the method and the experimental program performed on industrial clinkers to establish the accuracy in comparison to conventional methods. -- Highlights: •A new method of clinker characterization •Combination of electron probe technique with cluster analysis •Simultaneous assessment of phase abundance, composition and bulk chemistry •Experimental validation performed on industrial clinkers.« less
[Basic concepts for network meta-analysis].
Catalá-López, Ferrán; Tobías, Aurelio; Roqué, Marta
2014-12-01
Systematic reviews and meta-analyses have long been fundamental tools for evidence-based clinical practice. Initially, meta-analyses were proposed as a technique that could improve the accuracy and the statistical power of previous research from individual studies with small sample size. However, one of its main limitations has been the fact of being able to compare no more than two treatments in an analysis, even when the clinical research question necessitates that we compare multiple interventions. Network meta-analysis (NMA) uses novel statistical methods that incorporate information from both direct and indirect treatment comparisons in a network of studies examining the effects of various competing treatments, estimating comparisons between many treatments in a single analysis. Despite its potential limitations, NMA applications in clinical epidemiology can be of great value in situations where there are several treatments that have been compared against a common comparator. Also, NMA can be relevant to a research or clinical question when many treatments must be considered or when there is a mix of both direct and indirect information in the body of evidence. Copyright © 2013 Elsevier España, S.L.U. All rights reserved.
Huvane, Jacqueline; Komarow, Lauren; Hill, Carol; Tran, Thuy Tien T.; Pereira, Carol; Rosenkranz, Susan L.; Finnemeyer, Matt; Earley, Michelle; Jiang, Hongyu (Jeanne); Wang, Rui; Lok, Judith
2017-01-01
Abstract The Statistical and Data Management Center (SDMC) provides the Antibacterial Resistance Leadership Group (ARLG) with statistical and data management expertise to advance the ARLG research agenda. The SDMC is active at all stages of a study, including design; data collection and monitoring; data analyses and archival; and publication of study results. The SDMC enhances the scientific integrity of ARLG studies through the development and implementation of innovative and practical statistical methodologies and by educating research colleagues regarding the application of clinical trial fundamentals. This article summarizes the challenges and roles, as well as the innovative contributions in the design, monitoring, and analyses of clinical trials and diagnostic studies, of the ARLG SDMC. PMID:28350899
A Custom Made Intrinsic Silicone Shade Guide for Indian Population
Behanam, Mohammed; Ahila, S.C.; Jei, J. Brintha
2016-01-01
Introduction Replication of natural skin colour in maxillofacial prosthesis has been traditionally done using trial and error method, as concrete shade guides are unavailable till date. Hence a novel custom made intrinsic silicone shade guide has been attempted for Indian population. Aim Reconstruction of maxillofacial defects is challenging, as achieving an aesthetic result is not always easy. A concoction of a novel intrinsic silicone shade guide was contemplated for the study and its reproducibility in clinical practice was analysed. Materials and Methods Medical grade room temperature vulcanising silicone was used for the fabrication of shade tabs. The shade guide consisted of three main groups I, II and III which were divided based upon the hues yellow, red and blue respectively. Five distinct intrinsic pigments were added in definite proportions to subdivide each group of different values from lighter to darker shades. A total number of 15 circular shade tabs comprised the guide. To validate the usage of the guide, visual assessment of colour matching was done by four investigators to investigate the consent of perfect colour correspondence. Data was statistically analysed using kappa coefficients. Results The kappa values were found to be 0.47 to 0.78 for yellow based group I, 0.13 to 0.65 for red based group II, and 0.07 to 0.36 for blue based group III. This revealed that the shade tabs of yellow and red based hues matched well and showed a statistically good colour matching. Conclusion This intrinsic silicone shade guide can be effectively utilised for fabrication of maxillofacial prosthesis with silicone in Indian population. A transparent colour formula with definite proportioning of intrinsic pigments is provided for obtaining an aesthetic match to skin tone. PMID:27190946
Yamamoto, F; Yamamoto, M
2004-07-01
We previously developed a PCR-based DNA fingerprinting technique named the Methylation Sensitive (MS)-AFLP method, which permits comparative genome-wide scanning of methylation status with a manageable number of fingerprinting experiments. The technique uses the methylation sensitive restriction enzyme NotI in the context of the existing Amplified Fragment Length Polymorphism (AFLP) method. Here we report the successful conversion of this gel electrophoresis-based DNA fingerprinting technique into a DNA microarray hybridization technique (DNA Microarray MS-AFLP). By performing a total of 30 (15 x 2 reciprocal labeling) DNA Microarray MS-AFLP hybridization experiments on genomic DNA from two breast and three prostate cancer cell lines in all pairwise combinations, and Southern hybridization experiments using more than 100 different probes, we have demonstrated that the DNA Microarray MS-AFLP is a reliable method for genetic and epigenetic analyses. No statistically significant differences were observed in the number of differences between the breast-prostate hybridization experiments and the breast-breast or prostate-prostate comparisons.
Developing web-based data analysis tools for precision farming using R and Shiny
NASA Astrophysics Data System (ADS)
Jahanshiri, Ebrahim; Mohd Shariff, Abdul Rashid
2014-06-01
Technologies that are set to increase the productivity of agricultural practices require more and more data. Nevertheless, farming data is also being increasingly cheap to collect and maintain. Bulk of data that are collected by the sensors and samples need to be analysed in an efficient and transparent manner. Web technologies have long being used to develop applications that can assist the farmers and managers. However until recently, analysing the data in an online environment has not been an easy task especially in the eyes of data analysts. This barrier is now overcome by the availability of new application programming interfaces that can provide real-time web based data analysis. In this paper developing a prototype web based application for data analysis using new facilities in R statistical package and its web development facility, Shiny is explored. The pros and cons of this type of data analysis environment for precision farming are enumerated and future directions in web application development for agricultural data are discussed.
Analyses of global sea surface temperature 1856-1991
NASA Astrophysics Data System (ADS)
Kaplan, Alexey; Cane, Mark A.; Kushnir, Yochanan; Clement, Amy C.; Blumenthal, M. Benno; Rajagopalan, Balaji
1998-08-01
Global analyses of monthly sea surface temperature (SST) anomalies from 1856 to 1991 are produced using three statistically based methods: optimal smoothing (OS), the Kaiman filter (KF) and optimal interpolation (OI). Each of these is accompanied by estimates of the error covariance of the analyzed fields. The spatial covariance function these methods require is estimated from the available data; the timemarching model is a first-order autoregressive model again estimated from data. The data input for the analyses are monthly anomalies from the United Kingdom Meteorological Office historical sea surface temperature data set (MOHSST5) [Parker et al., 1994] of the Global Ocean Surface Temperature Atlas (GOSTA) [Bottomley et al., 1990]. These analyses are compared with each other, with GOSTA, and with an analysis generated by projection (P) onto a set of empirical orthogonal functions (as in Smith et al. [1996]). In theory, the quality of the analyses should rank in the order OS, KF, OI, P, and GOSTA. It is found that the first four give comparable results in the data-rich periods (1951-1991), but at times when data is sparse the first three differ significantly from P and GOSTA. At these times the latter two often have extreme and fluctuating values, prima facie evidence of error. The statistical schemes are also verified against data not used in any of the analyses (proxy records derived from corals and air temperature records from coastal and island stations). We also present evidence that the analysis error estimates are indeed indicative of the quality of the products. At most times the OS and KF products are close to the OI product, but at times of especially poor coverage their use of information from other times is advantageous. The methods appear to reconstruct the major features of the global SST field from very sparse data. Comparison with other indications of the El Niño-Southern Oscillation cycle show that the analyses provide usable information on interannual variability as far back as the 1860s.
Repairability of CAD/CAM high-density PMMA- and composite-based polymers.
Wiegand, Annette; Stucki, Lukas; Hoffmann, Robin; Attin, Thomas; Stawarczyk, Bogna
2015-11-01
The study aimed to analyse the shear bond strength of computer-aided design and computer-aided manufacturing (CAD/CAM) polymethyl methacrylate (PMMA)- and composite-based polymer materials repaired with a conventional methacrylate-based composite after different surface pretreatments. Each 48 specimens was prepared from six different CAD/CAM polymer materials (Ambarino high-class, artBloc Temp, CAD-Temp, Lava Ultimate, Telio CAD, Everest C-Temp) and a conventional dimethacrylate-based composite (Filtek Supreme XTE, control) and aged by thermal cycling (5000 cycles, 5-55 °C). The surfaces were left untreated or were pretreated by mechanical roughening, aluminium oxide air abrasion or silica coating/silanization (each subgroup n = 12). The surfaces were further conditioned with an etch&rinse adhesive (OptiBond FL) before the repair composite (Filtek Supreme XTE) was adhered to the surface. After further thermal cycling, shear bond strength was tested, and failure modes were assessed. Shear bond strength was statistically analysed by two- and one-way ANOVAs and Weibull statistics, failure mode by chi(2) test (p ≤ 0.05). Shear bond strength was highest for silica coating/silanization > aluminium oxide air abrasion = mechanical roughening > no surface pretreatment. Independently of the repair pretreatment, highest bond strength values were observed in the control group and for the composite-based Everest C-Temp and Ambarino high-class, while PMMA-based materials (artBloc Temp, CAD-Temp and Telio CAD) presented significantly lowest values. For all materials, repair without any surface pretreatment resulted in adhesive failures only, which mostly were reduced when surface pretreatment was performed. Repair of CAD/CAM high-density polymers requires surface pretreatment prior to adhesive and composite application. However, four out of six of the tested CAD/CAM materials did not achieve the repair bond strength of a conventional dimethacrylate-based composite. Repair of PMMA- and composite-based polymers can be achieved by surface pretreatment followed by application of an adhesive and a conventional methacrylate-based composite.
Nonindependence and sensitivity analyses in ecological and evolutionary meta-analyses.
Noble, Daniel W A; Lagisz, Malgorzata; O'dea, Rose E; Nakagawa, Shinichi
2017-05-01
Meta-analysis is an important tool for synthesizing research on a variety of topics in ecology and evolution, including molecular ecology, but can be susceptible to nonindependence. Nonindependence can affect two major interrelated components of a meta-analysis: (i) the calculation of effect size statistics and (ii) the estimation of overall meta-analytic estimates and their uncertainty. While some solutions to nonindependence exist at the statistical analysis stages, there is little advice on what to do when complex analyses are not possible, or when studies with nonindependent experimental designs exist in the data. Here we argue that exploring the effects of procedural decisions in a meta-analysis (e.g. inclusion of different quality data, choice of effect size) and statistical assumptions (e.g. assuming no phylogenetic covariance) using sensitivity analyses are extremely important in assessing the impact of nonindependence. Sensitivity analyses can provide greater confidence in results and highlight important limitations of empirical work (e.g. impact of study design on overall effects). Despite their importance, sensitivity analyses are seldom applied to problems of nonindependence. To encourage better practice for dealing with nonindependence in meta-analytic studies, we present accessible examples demonstrating the impact that ignoring nonindependence can have on meta-analytic estimates. We also provide pragmatic solutions for dealing with nonindependent study designs, and for analysing dependent effect sizes. Additionally, we offer reporting guidelines that will facilitate disclosure of the sources of nonindependence in meta-analyses, leading to greater transparency and more robust conclusions. © 2017 John Wiley & Sons Ltd.
Martiniano, Rui; McLaughlin, Russell; Silva, Nuno M.; Manco, Licinio; Pereira, Tania; Coelho, Maria J.; Serra, Miguel; Burger, Joachim; Parreira, Rui; Moran, Elena; Valera, Antonio C.; Silva, Ana M.
2017-01-01
We analyse new genomic data (0.05–2.95x) from 14 ancient individuals from Portugal distributed from the Middle Neolithic (4200–3500 BC) to the Middle Bronze Age (1740–1430 BC) and impute genomewide diploid genotypes in these together with published ancient Eurasians. While discontinuity is evident in the transition to agriculture across the region, sensitive haplotype-based analyses suggest a significant degree of local hunter-gatherer contribution to later Iberian Neolithic populations. A more subtle genetic influx is also apparent in the Bronze Age, detectable from analyses including haplotype sharing with both ancient and modern genomes, D-statistics and Y-chromosome lineages. However, the limited nature of this introgression contrasts with the major Steppe migration turnovers within third Millennium northern Europe and echoes the survival of non-Indo-European language in Iberia. Changes in genomic estimates of individual height across Europe are also associated with these major cultural transitions, and ancestral components continue to correlate with modern differences in stature. PMID:28749934
Sharma, Swarkar; Saha, Anjana; Rai, Ekta; Bhat, Audesh; Bamezai, Ramesh
2005-01-01
We have analysed the hypervariable regions (HVR I and II) of human mitochondrial DNA (mtDNA) in individuals from Uttar Pradesh (UP), Bihar (BI) and Punjab (PUNJ), belonging to the Indo-European linguistic group, and from South India (SI), that have their linguistic roots in Dravidian language. Our analysis revealed the presence of known and novel mutations in both hypervariable regions in the studied population groups. Median joining network analyses based on mtDNA showed extensive overlap in mtDNA lineages despite the extensive cultural and linguistic diversity. MDS plot analysis based on Fst distances suggested increased maternal genetic proximity for the studied population groups compared with other world populations. Mismatch distribution curves, respective neighbour joining trees and other statistical analyses showed that there were significant expansions. The study revealed an ancient common ancestry for the studied population groups, most probably through common founder female lineage(s), and also indicated that human migrations occurred (maybe across and within the Indian subcontinent) even after the initial phase of female migration to India.
Fatigue reliability of deck structures subjected to correlated crack growth
NASA Astrophysics Data System (ADS)
Feng, G. Q.; Garbatov, Y.; Guedes Soares, C.
2013-12-01
The objective of this work is to analyse fatigue reliability of deck structures subjected to correlated crack growth. The stress intensity factors of the correlated cracks are obtained by finite element analysis and based on which the geometry correction functions are derived. The Monte Carlo simulations are applied to predict the statistical descriptors of correlated cracks based on the Paris-Erdogan equation. A probabilistic model of crack growth as a function of time is used to analyse the fatigue reliability of deck structures accounting for the crack propagation correlation. A deck structure is modelled as a series system of stiffened panels, where a stiffened panel is regarded as a parallel system composed of plates and are longitudinal. It has been proven that the method developed here can be conveniently applied to perform the fatigue reliability assessment of structures subjected to correlated crack growth.
Analysis of complex environment effect on near-field emission
NASA Astrophysics Data System (ADS)
Ravelo, B.; Lalléchère, S.; Bonnet, P.; Paladian, F.
2014-10-01
The article is dealing with uncertainty analyses of radiofrequency circuits electromagnetic compatibility emission based on the near-field/near-field (NF/NF) transform combined with stochastic approach. By using 2D data corresponding to electromagnetic (EM) field (X=E or H) scanned in the observation plane placed at the position z0 above the circuit under test (CUT), the X field map was extracted. Then, uncertainty analyses were assessed via the statistical moments from X component. In addition, stochastic collocation based was considered and calculations were applied to planar EM NF radiated by the CUTs as Wilkinson power divider and a microstrip line operating at GHz levels. After Matlab implementation, the mean and standard deviation were assessed. The present study illustrates how the variations of environmental parameters may impact EM fields. The NF uncertainty methodology can be applied to any physical parameter effects in complex environment and useful for printed circuit board (PCBs) design guideline.
Continuous Covariate Imbalance and Conditional Power for Clinical Trial Interim Analyses
Ciolino, Jody D.; Martin, Renee' H.; Zhao, Wenle; Jauch, Edward C.; Hill, Michael D.; Palesch, Yuko Y.
2014-01-01
Oftentimes valid statistical analyses for clinical trials involve adjustment for known influential covariates, regardless of imbalance observed in these covariates at baseline across treatment groups. Thus, it must be the case that valid interim analyses also properly adjust for these covariates. There are situations, however, in which covariate adjustment is not possible, not planned, or simply carries less merit as it makes inferences less generalizable and less intuitive. In this case, covariate imbalance between treatment groups can have a substantial effect on both interim and final primary outcome analyses. This paper illustrates the effect of influential continuous baseline covariate imbalance on unadjusted conditional power (CP), and thus, on trial decisions based on futility stopping bounds. The robustness of the relationship is illustrated for normal, skewed, and bimodal continuous baseline covariates that are related to a normally distributed primary outcome. Results suggest that unadjusted CP calculations in the presence of influential covariate imbalance require careful interpretation and evaluation. PMID:24607294
Tuuli, Methodius G; Odibo, Anthony O
2011-08-01
The objective of this article is to discuss the rationale for common statistical tests used for the analysis and interpretation of prenatal diagnostic imaging studies. Examples from the literature are used to illustrate descriptive and inferential statistics. The uses and limitations of linear and logistic regression analyses are discussed in detail.
Using a Five-Step Procedure for Inferential Statistical Analyses
ERIC Educational Resources Information Center
Kamin, Lawrence F.
2010-01-01
Many statistics texts pose inferential statistical problems in a disjointed way. By using a simple five-step procedure as a template for statistical inference problems, the student can solve problems in an organized fashion. The problem and its solution will thus be a stand-by-itself organic whole and a single unit of thought and effort. The…
Identifying city PV roof resource based on Gabor filter
NASA Astrophysics Data System (ADS)
Ruhang, Xu; Zhilin, Liu; Yong, Huang; Xiaoyu, Zhang
2017-06-01
To identify a city’s PV roof resources, the area and ownership distribution of residential buildings in an urban district should be assessed. To achieve this assessment, remote sensing data analysing is a promising approach. Urban building roof area estimation is a major topic for remote sensing image information extraction. There are normally three ways to solve this problem. The first way is pixel-based analysis, which is based on mathematical morphology or statistical methods; the second way is object-based analysis, which is able to combine semantic information and expert knowledge; the third way is signal-processing view method. This paper presented a Gabor filter based method. This result shows that the method is fast and with proper accuracy.
Imperial College near infrared spectroscopy neuroimaging analysis framework.
Orihuela-Espina, Felipe; Leff, Daniel R; James, David R C; Darzi, Ara W; Yang, Guang-Zhong
2018-01-01
This paper describes the Imperial College near infrared spectroscopy neuroimaging analysis (ICNNA) software tool for functional near infrared spectroscopy neuroimaging data. ICNNA is a MATLAB-based object-oriented framework encompassing an application programming interface and a graphical user interface. ICNNA incorporates reconstruction based on the modified Beer-Lambert law and basic processing and data validation capabilities. Emphasis is placed on the full experiment rather than individual neuroimages as the central element of analysis. The software offers three types of analyses including classical statistical methods based on comparison of changes in relative concentrations of hemoglobin between the task and baseline periods, graph theory-based metrics of connectivity and, distinctively, an analysis approach based on manifold embedding. This paper presents the different capabilities of ICNNA in its current version.
Harris, Michael; Radtke, Arthur S.
1976-01-01
Linear regression and discriminant analyses techniques were applied to gold, mercury, arsenic, antimony, barium, copper, molybdenum, lead, zinc, boron, tellurium, selenium, and tungsten analyses from drill holes into unoxidized gold ore at the Carlin gold mine near Carlin, Nev. The statistical treatments employed were used to judge proposed hypotheses on the origin and geochemical paragenesis of this disseminated gold deposit.
ERIC Educational Resources Information Center
Neumann, David L.; Hood, Michelle
2009-01-01
A wiki was used as part of a blended learning approach to promote collaborative learning among students in a first year university statistics class. One group of students analysed a data set and communicated the results by jointly writing a practice report using a wiki. A second group analysed the same data but communicated the results in a…
Extreme between-study homogeneity in meta-analyses could offer useful insights.
Ioannidis, John P A; Trikalinos, Thomas A; Zintzaras, Elias
2006-10-01
Meta-analyses are routinely evaluated for the presence of large between-study heterogeneity. We examined whether it is also important to probe whether there is extreme between-study homogeneity. We used heterogeneity tests with left-sided statistical significance for inference and developed a Monte Carlo simulation test for testing extreme homogeneity in risk ratios across studies, using the empiric distribution of the summary risk ratio and heterogeneity statistic. A left-sided P=0.01 threshold was set for claiming extreme homogeneity to minimize type I error. Among 11,803 meta-analyses with binary contrasts from the Cochrane Library, 143 (1.21%) had left-sided P-value <0.01 for the asymptotic Q statistic and 1,004 (8.50%) had left-sided P-value <0.10. The frequency of extreme between-study homogeneity did not depend on the number of studies in the meta-analyses. We identified examples where extreme between-study homogeneity (left-sided P-value <0.01) could result from various possibilities beyond chance. These included inappropriate statistical inference (asymptotic vs. Monte Carlo), use of a specific effect metric, correlated data or stratification using strong predictors of outcome, and biases and potential fraud. Extreme between-study homogeneity may provide useful insights about a meta-analysis and its constituent studies.
Cavalcante, Y L; Hauser-Davis, R A; Saraiva, A C F; Brandão, I L S; Oliveira, T F; Silveira, A M
2013-01-01
This paper compared and evaluated seasonal variations in physico-chemical parameters and metals at a hydroelectric power station reservoir by applying Multivariate Analyses and Artificial Neural Networks (ANN) statistical techniques. A Factor Analysis was used to reduce the number of variables: the first factor was composed of elements Ca, K, Mg and Na, and the second by Chemical Oxygen Demand. The ANN showed 100% correct classifications in training and validation samples. Physico-chemical analyses showed that water pH values were not statistically different between the dry and rainy seasons, while temperature, conductivity, alkalinity, ammonia and DO were higher in the dry period. TSS, hardness and COD, on the other hand, were higher during the rainy season. The statistical analyses showed that Ca, K, Mg and Na are directly connected to the Chemical Oxygen Demand, which indicates a possibility of their input into the reservoir system by domestic sewage and agricultural run-offs. These statistical applications, thus, are also relevant in cases of environmental management and policy decision-making processes, to identify which factors should be further studied and/or modified to recover degraded or contaminated water bodies. Copyright © 2012 Elsevier B.V. All rights reserved.
Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T
2016-12-20
Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Identification of Chinese plague foci from long-term epidemiological data
Ben-Ari, Tamara; Neerinckx, Simon; Agier, Lydiane; Cazelles, Bernard; Xu, Lei; Zhang, Zhibin; Fang, Xiye; Wang, Shuchun; Liu, Qiyong; Stenseth, Nils C.
2012-01-01
Carrying out statistical analysis over an extensive dataset of human plague reports in Chinese villages from 1772 to 1964, we identified plague endemic territories in China (i.e., plague foci). Analyses rely on (i) a clustering method that groups time series based on their time-frequency resemblances and (ii) an ecological niche model that helps identify plague suitable territories characterized by value ranges for a set of predefined environmental variables. Results from both statistical tools indicate the existence of two disconnected plague territories corresponding to Northern and Southern China. Altogether, at least four well defined independent foci are identified. Their contours compare favorably with field observations. Potential and limitations of inferring plague foci and dynamics using epidemiological data is discussed. PMID:22570501
Statistical Learning Analysis in Neuroscience: Aiming for Transparency
Hanke, Michael; Halchenko, Yaroslav O.; Haxby, James V.; Pollmann, Stefan
2009-01-01
Encouraged by a rise of reciprocal interest between the machine learning and neuroscience communities, several recent studies have demonstrated the explanatory power of statistical learning techniques for the analysis of neural data. In order to facilitate a wider adoption of these methods, neuroscientific research needs to ensure a maximum of transparency to allow for comprehensive evaluation of the employed procedures. We argue that such transparency requires “neuroscience-aware” technology for the performance of multivariate pattern analyses of neural data that can be documented in a comprehensive, yet comprehensible way. Recently, we introduced PyMVPA, a specialized Python framework for machine learning based data analysis that addresses this demand. Here, we review its features and applicability to various neural data modalities. PMID:20582270
From sexless to sexy: Why it is time for human genetics to consider and report analyses of sex.
Powers, Matthew S; Smith, Phillip H; McKee, Sherry A; Ehringer, Marissa A
2017-01-01
Science has come a long way with regard to the consideration of sex differences in clinical and preclinical research, but one field remains behind the curve: human statistical genetics. The goal of this commentary is to raise awareness and discussion about how to best consider and evaluate possible sex effects in the context of large-scale human genetic studies. Over the course of this commentary, we reinforce the importance of interpreting genetic results in the context of biological sex, establish evidence that sex differences are not being considered in human statistical genetics, and discuss how best to conduct and report such analyses. Our recommendation is to run stratified analyses by sex no matter the sample size or the result and report the findings. Summary statistics from stratified analyses are helpful for meta-analyses, and patterns of sex-dependent associations may be hidden in a combined dataset. In the age of declining sequencing costs, large consortia efforts, and a number of useful control samples, it is now time for the field of human genetics to appropriately include sex in the design, analysis, and reporting of results.
NASA Astrophysics Data System (ADS)
Nguyen, A.; Mueller, C.; Brooks, A. N.; Kislik, E. A.; Baney, O. N.; Ramirez, C.; Schmidt, C.; Torres-Perez, J. L.
2014-12-01
The Sierra Nevada is experiencing changes in hydrologic regimes, such as decreases in snowmelt and peak runoff, which affect forest health and the availability of water resources. Currently, the USDA Forest Service Region 5 is undergoing Forest Plan revisions to include climate change impacts into mitigation and adaptation strategies. However, there are few processes in place to conduct quantitative assessments of forest conditions in relation to mountain hydrology, while easily and effectively delivering that information to forest managers. To assist the USDA Forest Service, this study is the final phase of a three-term project to create a Decision Support System (DSS) to allow ease of access to historical and forecasted hydrologic, climatic, and terrestrial conditions for the entire Sierra Nevada. This data is featured within three components of the DSS: the Mapping Viewer, Statistical Analysis Portal, and Geospatial Data Gateway. Utilizing ArcGIS Online, the Sierra DSS Mapping Viewer enables users to visually analyze and locate areas of interest. Once the areas of interest are targeted, the Statistical Analysis Portal provides subbasin level statistics for each variable over time by utilizing a recently developed web-based data analysis and visualization tool called Plotly. This tool allows users to generate graphs and conduct statistical analyses for the Sierra Nevada without the need to download the dataset of interest. For more comprehensive analysis, users are also able to download datasets via the Geospatial Data Gateway. The third phase of this project focused on Python-based data processing, the adaptation of the multiple capabilities of ArcGIS Online and Plotly, and the integration of the three Sierra DSS components within a website designed specifically for the USDA Forest Service.
Nicolay, C R; Purkayastha, S; Greenhalgh, A; Benn, J; Chaturvedi, S; Phillips, N; Darzi, A
2012-03-01
The demand for the highest-quality patient care coupled with pressure on funding has led to the increasing use of quality improvement (QI) methodologies from the manufacturing industry. The aim of this systematic review was to identify and evaluate the application and effectiveness of these QI methodologies to the field of surgery. MEDLINE, the Cochrane Database, Allied and Complementary Medicine Database, British Nursing Index, Cumulative Index to Nursing and Allied Health Literature, Embase, Health Business(™) Elite, the Health Management Information Consortium and PsycINFO(®) were searched according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement. Empirical studies were included that implemented a described QI methodology to surgical care and analysed a named outcome statistically. Some 34 of 1595 articles identified met the inclusion criteria after consensus from two independent investigators. Nine studies described continuous quality improvement (CQI), five Six Sigma, five total quality management (TQM), five plan-do-study-act (PDSA) or plan-do-check-act (PDCA) cycles, five statistical process control (SPC) or statistical quality control (SQC), four Lean and one Lean Six Sigma; 20 of the studies were undertaken in the USA. The most common aims were to reduce complications or improve outcomes (11), to reduce infection (7), and to reduce theatre delays (7). There was one randomized controlled trial. QI methodologies from industry can have significant effects on improving surgical care, from reducing infection rates to increasing operating room efficiency. The evidence is generally of suboptimal quality, and rigorous randomized multicentre studies are needed to bring evidence-based management into the same league as evidence-based medicine. Copyright © 2011 British Journal of Surgery Society Ltd. Published by John Wiley & Sons, Ltd.
2013-01-01
Background Malnutrition is one of the principal causes of child mortality in developing countries including Bangladesh. According to our knowledge, most of the available studies, that addressed the issue of malnutrition among under-five children, considered the categorical (dichotomous/polychotomous) outcome variables and applied logistic regression (binary/multinomial) to find their predictors. In this study malnutrition variable (i.e. outcome) is defined as the number of under-five malnourished children in a family, which is a non-negative count variable. The purposes of the study are (i) to demonstrate the applicability of the generalized Poisson regression (GPR) model as an alternative of other statistical methods and (ii) to find some predictors of this outcome variable. Methods The data is extracted from the Bangladesh Demographic and Health Survey (BDHS) 2007. Briefly, this survey employs a nationally representative sample which is based on a two-stage stratified sample of households. A total of 4,460 under-five children is analysed using various statistical techniques namely Chi-square test and GPR model. Results The GPR model (as compared to the standard Poisson regression and negative Binomial regression) is found to be justified to study the above-mentioned outcome variable because of its under-dispersion (variance < mean) property. Our study also identify several significant predictors of the outcome variable namely mother’s education, father’s education, wealth index, sanitation status, source of drinking water, and total number of children ever born to a woman. Conclusions Consistencies of our findings in light of many other studies suggest that the GPR model is an ideal alternative of other statistical models to analyse the number of under-five malnourished children in a family. Strategies based on significant predictors may improve the nutritional status of children in Bangladesh. PMID:23297699
Dwivedi, Raghav C; St Rose, Suzanne; Chisholm, Edward J; Bisase, Brian; Amen, Furrat; Nutting, Christopher M; Clarke, Peter M; Kerawala, Cyrus J; Rhys-Evans, Peter H; Harrington, Kevin J; Kazi, Rehan
2012-06-01
The aim of this study was to explore post-treatment speech impairments using English version of Speech Handicap Index (SHI) (first speech-specific questionnaire) in a cohort of oral cavity (OC) and oropharyngeal (OP) cancer patients. Sixty-three consecutive OC and OP cancer patients in follow-up participated in this study. Descriptive analyses have been presented as percentages, while Mann-Whitney U-test and Kruskall-Wallis test have been used for the quantitative variables. Statistical Package for Social Science-15 statistical software (SPSS Inc., Chicago, IL) was used for the statistical analyses. Over a third (36.1%) of patients reported their speech as either average or bad. Speech intelligibility and articulation were the main speech concerns for 58.8% and 52.9% OC and 31.6% and 34.2% OP cancer patients, respectively. While feeling of incompetent and being less outgoing were the speech-related psychosocial concerns for 64.7% and 23.5% OC and 15.8% and 18.4% OP cancer patients, respectively. Worse speech outcomes were noted for oral tongue and base of tongue cancers vs. tonsillar cancers, mean (SD) values were 56.7 (31.3) and 52.0 (38.4) vs. 10.9 (14.8) (P<0.001) and late vs. early T stage cancers 65.0 (29.9) vs. 29.3 (32.7) (P<0.005). The English version of the SHI is a reliable, valid and useful tool for the evaluation of speech in HNC patients. Over one-third of OC and OP cancer patients reported speech problems in their day-do-day life. Advanced T-stage tumors affecting the oral tongue or base of tongue are particularly associated with poor speech outcomes. Copyright © 2012 Elsevier Ltd. All rights reserved.
Nagy, László G; Urban, Alexander; Orstadius, Leif; Papp, Tamás; Larsson, Ellen; Vágvölgyi, Csaba
2010-12-01
Recently developed comparative phylogenetic methods offer a wide spectrum of applications in evolutionary biology, although it is generally accepted that their statistical properties are incompletely known. Here, we examine and compare the statistical power of the ML and Bayesian methods with regard to selection of best-fit models of fruiting-body evolution and hypothesis testing of ancestral states on a real-life data set of a physiological trait (autodigestion) in the family Psathyrellaceae. Our phylogenies are based on the first multigene data set generated for the family. Two different coding regimes (binary and multistate) and two data sets differing in taxon sampling density are examined. The Bayesian method outperformed Maximum Likelihood with regard to statistical power in all analyses. This is particularly evident if the signal in the data is weak, i.e. in cases when the ML approach does not provide support to choose among competing hypotheses. Results based on binary and multistate coding differed only modestly, although it was evident that multistate analyses were less conclusive in all cases. It seems that increased taxon sampling density has favourable effects on inference of ancestral states, while model parameters are influenced to a smaller extent. The model best fitting our data implies that the rate of losses of deliquescence equals zero, although model selection in ML does not provide proper support to reject three of the four candidate models. The results also support the hypothesis that non-deliquescence (lack of autodigestion) has been ancestral in Psathyrellaceae, and that deliquescent fruiting bodies represent the preferred state, having evolved independently several times during evolution. Copyright © 2010 Elsevier Inc. All rights reserved.
2013-01-01
Background Population studies on end-of-life decisions have not been conducted in Cyprus. Our study aim was to evaluate the beliefs and attitudes of Greek Cypriots towards end-of-life issues regarding euthanasia and cremation. Methods A population-based telephone survey was conducted in Cyprus. One thousand randomly selected individuals from the population of Cyprus age 20 years or older were invited to participate. Beliefs and attitudes on end-of-life decisions were collected using an anonymous and validated questionnaire. Statistical analyses included cross-tabulations, Pearson’s chi-square tests and multivariable-adjusted logistic regression models. Results A total of 308 males and 689 females participated in the survey. About 70% of the respondents did not support euthanasia for people with incurable illness and/or elders with dementia when requested by them and 77% did not support euthanasia for people with incurable illness and/or elders with dementia when requested by relatives. Regarding cremation, 78% were against and only 14% reported being in favor. Further statistical analyses showed that male gender, being single and having reached higher educational level were factors positively associated with support for euthanasia in a statistically significant fashion. On the contrary, the more religiosity expressed by study participants, the less support they reported for euthanasia or cremation. Conclusions The vast majority of Greek Cypriots does not support euthanasia for people with incurable illness and/or elders with dementia and also do not support cremation. Certain demographic characteristics such as age and education have a positive influence towards attitudes for euthanasia and cremation, while religiosity exerts a strong negative influence on the above. Family bonding as well as social and cultural traditions may also play a role although not comprehensively evaluated in the current study. PMID:24060291
Televantos, Anastasios; Talias, Michael A; Charalambous, Marianna; Soteriades, Elpidoforos S
2013-09-23
Population studies on end-of-life decisions have not been conducted in Cyprus. Our study aim was to evaluate the beliefs and attitudes of Greek Cypriots towards end-of-life issues regarding euthanasia and cremation. A population-based telephone survey was conducted in Cyprus. One thousand randomly selected individuals from the population of Cyprus age 20 years or older were invited to participate. Beliefs and attitudes on end-of-life decisions were collected using an anonymous and validated questionnaire. Statistical analyses included cross-tabulations, Pearson's chi-square tests and multivariable-adjusted logistic regression models. A total of 308 males and 689 females participated in the survey. About 70% of the respondents did not support euthanasia for people with incurable illness and/or elders with dementia when requested by them and 77% did not support euthanasia for people with incurable illness and/or elders with dementia when requested by relatives. Regarding cremation, 78% were against and only 14% reported being in favor. Further statistical analyses showed that male gender, being single and having reached higher educational level were factors positively associated with support for euthanasia in a statistically significant fashion. On the contrary, the more religiosity expressed by study participants, the less support they reported for euthanasia or cremation. The vast majority of Greek Cypriots does not support euthanasia for people with incurable illness and/or elders with dementia and also do not support cremation. Certain demographic characteristics such as age and education have a positive influence towards attitudes for euthanasia and cremation, while religiosity exerts a strong negative influence on the above. Family bonding as well as social and cultural traditions may also play a role although not comprehensively evaluated in the current study.
The cost of an emergency department visit and its relationship to emergency department volume.
Bamezai, Anil; Melnick, Glenn; Nawathe, Amar
2005-05-01
This article addresses 2 questions: (1) to what extent do emergency departments (EDs) exhibit economies of scale; and (2) to what extent do publicly available accounting data understate the marginal cost of an outpatient ED visit? Understanding the appropriate role for EDs in the overall health care system is crucially dependent on answers to these questions. The literature on these issues is sparse and somewhat dated and fails to differentiate between trauma and nontrauma hospitals. We believe a careful review of these questions is necessary because several changes (greater managed care penetration, increased price competition, cost of compliance with Emergency Medical Treatment and Active Labor Act regulations, and so on) may have significantly altered ED economics in recent years. We use a 2-pronged approach, 1 based on descriptive analyses of publicly available accounting data and 1 based on statistical cost models estimated from a 9-year panel of hospital data, to address the above-mentioned questions. Neither the descriptive analyses nor the statistical models support the existence of significant scale economies. Furthermore, the marginal cost of outpatient ED visits, even without the emergency physician component, appear quite high--in 1998 dollars, US295 dollars and US412 dollars for nontrauma and trauma EDs, respectively. These statistical estimates exceed the accounting estimates of per-visit costs by a factor of roughly 2. Our findings suggest that the marginal cost of an outpatient ED visit is higher than is generally believed. Hospitals thus need to carefully review how EDs fit within their overall operations and cost structure and may need to pay special attention to policies and procedures that guide the delivery of nonurgent care through the ED.
Billot, Laurent; Lindley, Richard I; Harvey, Lisa A; Maulik, Pallab K; Hackett, Maree L; Murthy, Gudlavalleti Vs; Anderson, Craig S; Shamanna, Bindiganavale R; Jan, Stephen; Walker, Marion; Forster, Anne; Langhorne, Peter; Verma, Shweta J; Felix, Cynthia; Alim, Mohammed; Gandhi, Dorcas Bc; Pandian, Jeyaraj Durai
2017-02-01
Background In low- and middle-income countries, few patients receive organized rehabilitation after stroke, yet the burden of chronic diseases such as stroke is increasing in these countries. Affordable models of effective rehabilitation could have a major impact. The ATTEND trial is evaluating a family-led caregiver delivered rehabilitation program after stroke. Objective To publish the detailed statistical analysis plan for the ATTEND trial prior to trial unblinding. Methods Based upon the published registration and protocol, the blinded steering committee and management team, led by the trial statistician, have developed a statistical analysis plan. The plan has been informed by the chosen outcome measures, the data collection forms and knowledge of key baseline data. Results The resulting statistical analysis plan is consistent with best practice and will allow open and transparent reporting. Conclusions Publication of the trial statistical analysis plan reduces potential bias in trial reporting, and clearly outlines pre-specified analyses. Clinical Trial Registrations India CTRI/2013/04/003557; Australian New Zealand Clinical Trials Registry ACTRN1261000078752; Universal Trial Number U1111-1138-6707.
Systematic review and meta-analysis in cardiac surgery: a primer.
Yanagawa, Bobby; Tam, Derrick Y; Mazine, Amine; Tricco, Andrea C
2018-03-01
The purpose of this article is to review the strengths and weaknesses of systematic reviews and meta-analyses to inform our current understanding of cardiac surgery. A systematic review and meta-analysis of a focused topic can provide a quantitative estimate for the effect of a treatment intervention or exposure. In cardiac surgery, observational studies and small, single-center prospective trials provide most of the clinical outcomes that form the evidence base for patient management and guideline recommendations. As such, meta-analyses can be particularly valuable in synthesizing the literature for a particular focused surgical question. Since the year 2000, there are over 800 meta-analysis-related publications in our field. There are some limitations to this technique, including clinical, methodological and statistical heterogeneity, among other challenges. Despite these caveats, results of meta-analyses have been useful in forming treatment recommendations or in providing guidance in the design of future clinical trials. There is a growing number of meta-analyses in the field of cardiac surgery. Knowledge translation via meta-analyses will continue to guide and inform cardiac surgical practice and our practice guidelines.
Pike, Katie; Nash, Rachel L; Murphy, Gavin J; Reeves, Barnaby C; Rogers, Chris A
2015-02-22
The Transfusion Indication Threshold Reduction (TITRe2) trial is the largest randomized controlled trial to date to compare red blood cell transfusion strategies following cardiac surgery. This update presents the statistical analysis plan, detailing how the study will be analyzed and presented. The statistical analysis plan has been written following recommendations from the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, prior to database lock and the final analysis of trial data. Outlined analyses are in line with the Consolidated Standards of Reporting Trials (CONSORT). The study aims to randomize 2000 patients from 17 UK centres. Patients are randomized to either a restrictive (transfuse if haemoglobin concentration <7.5 g/dl) or liberal (transfuse if haemoglobin concentration <9 g/dl) transfusion strategy. The primary outcome is a binary composite outcome of any serious infectious or ischaemic event in the first 3 months following randomization. The statistical analysis plan details how non-adherence with the intervention, withdrawals from the study, and the study population will be derived and dealt with in the analysis. The planned analyses of the trial primary and secondary outcome measures are described in detail, including approaches taken to deal with multiple testing, model assumptions not being met and missing data. Details of planned subgroup and sensitivity analyses and pre-specified ancillary analyses are given, along with potential issues that have been identified with such analyses and possible approaches to overcome such issues. ISRCTN70923932 .
Barton, Hugh A; Chiu, Weihsueh A; Setzer, R Woodrow; Andersen, Melvin E; Bailer, A John; Bois, Frédéric Y; Dewoskin, Robert S; Hays, Sean; Johanson, Gunnar; Jones, Nancy; Loizou, George; Macphail, Robert C; Portier, Christopher J; Spendiff, Martin; Tan, Yu-Mei
2007-10-01
Physiologically based pharmacokinetic (PBPK) models are used in mode-of-action based risk and safety assessments to estimate internal dosimetry in animals and humans. When used in risk assessment, these models can provide a basis for extrapolating between species, doses, and exposure routes or for justifying nondefault values for uncertainty factors. Characterization of uncertainty and variability is increasingly recognized as important for risk assessment; this represents a continuing challenge for both PBPK modelers and users. Current practices show significant progress in specifying deterministic biological models and nondeterministic (often statistical) models, estimating parameters using diverse data sets from multiple sources, using them to make predictions, and characterizing uncertainty and variability of model parameters and predictions. The International Workshop on Uncertainty and Variability in PBPK Models, held 31 Oct-2 Nov 2006, identified the state-of-the-science, needed changes in practice and implementation, and research priorities. For the short term, these include (1) multidisciplinary teams to integrate deterministic and nondeterministic/statistical models; (2) broader use of sensitivity analyses, including for structural and global (rather than local) parameter changes; and (3) enhanced transparency and reproducibility through improved documentation of model structure(s), parameter values, sensitivity and other analyses, and supporting, discrepant, or excluded data. Longer-term needs include (1) theoretical and practical methodological improvements for nondeterministic/statistical modeling; (2) better methods for evaluating alternative model structures; (3) peer-reviewed databases of parameters and covariates, and their distributions; (4) expanded coverage of PBPK models across chemicals with different properties; and (5) training and reference materials, such as cases studies, bibliographies/glossaries, model repositories, and enhanced software. The multidisciplinary dialogue initiated by this Workshop will foster the collaboration, research, data collection, and training necessary to make characterizing uncertainty and variability a standard practice in PBPK modeling and risk assessment.
Hope, Andrew G.; Waltari, Eric; Malaney, Jason L.; Payer, David C.; Cook, J.A.; Talbot, Sandra L.
2015-01-01
As ancestral biodiversity responded dynamically to late-Quaternary climate changes, so are extant organisms responding to the warming trajectory of the Anthropocene. Ecological predictive modeling, statistical hypothesis tests, and genetic signatures of demographic change can provide a powerful integrated toolset for investigating these biodiversity responses to climate change, and relative resiliency across different communities. Within the biotic province of Beringia, we analyzed specimen localities and DNA sequences from 28 mammal species associated with boreal forest and Arctic tundra biomes to assess both historical distributional and evolutionary responses and then forecasted future changes based on statistical assessments of past and present trajectories, and quantified distributional and demographic changes in relation to major management regions within the study area. We addressed three sets of hypotheses associated with aspects of methodological, biological, and socio-political importance by asking (1) what is the consistency among implications of predicted changes based on the results of both ecological and evolutionary analyses; (2) what are the ecological and evolutionary implications of climate change considering either total regional diversity or distinct communities associated with major biomes; and (3) are there differences in management implications across regions? Our results indicate increasing Arctic richness through time that highlights a potential state shift across the Arctic landscape. However, within distinct ecological communities, we found a predicted decline in the range and effective population size of tundra species into several discrete refugial areas. Consistency in results based on a combination of both ecological and evolutionary approaches demonstrates increased statistical confidence by applying cross-discipline comparative analyses to conservation of biodiversity, particularly considering variable management regimes that seek to balance sustainable ecosystems with other anthropogenic values. Refugial areas for cold-adapted taxa appear to be persistent across both warm and cold climate phases and although fragmented, constitute vital regions for persistence of Arctic mammals.
NASA Astrophysics Data System (ADS)
Rich, Grayson Currie
The COHERENT Collaboration has produced the first-ever observation, with a significance of 6.7sigma, of a process consistent with coherent, elastic neutrino-nucleus scattering (CEnuNS) as first predicted and described by D.Z. Freedman in 1974. Physics of the CEnuNS process are presented along with its relationship to future measurements in the arenas of nuclear physics, fundamental particle physics, and astroparticle physics, where the newly-observed interaction presents a viable tool for investigations into numerous outstanding questions about the nature of the universe. To enable the CEnuNS observation with a 14.6-kg CsI[Na] detector, new measurements of the response of CsI[Na] to low-energy nuclear recoils, which is the only mechanism by which CEnuNS is detectable, were carried out at Triangle Universities Nuclear Laboratory; these measurements are detailed and an effective nuclear-recoil quenching factor of 8.78 +/- 1.66% is established for CsI[Na] in the recoil-energy range of 5-30 keV, based on new and literature data. Following separate analyses of the CEnuNS-search data by groups at the University of Chicago and the Moscow Engineering and Physics Institute, information from simulations, calculations, and ancillary measurements were used to inform statistical analyses of the collected data. Based on input from the Chicago analysis, the number of CEnuNS events expected from the Standard Model is 173 +/- 48; interpretation as a simple counting experiment finds 136 +/- 31 CEnuNS counts in the data, while a two-dimensional, profile likelihood fit yields 134 +/- 22 CEnuNS counts. Details of the simulations, calculations, and supporting measurements are discussed, in addition to the statistical procedures. Finally, potential improvements to the CsI[Na]-based CEnuNS measurement are presented along with future possibilities for COHERENT Collaboration, including new CEnuNS detectors and measurement of the neutrino-induced neutron spallation process.
VCSEL-based fiber optic link for avionics: implementation and performance analyses
NASA Astrophysics Data System (ADS)
Shi, Jieqin; Zhang, Chunxi; Duan, Jingyuan; Wen, Huaitao
2006-11-01
A Gb/s fiber optic link with built-in test capability (BIT) basing on vertical-cavity surface-emitting laser (VCSEL) sources for military avionics bus for next generation has been presented in this paper. To accurately predict link performance, statistical methods and Bit Error Rate (BER) measurements have been examined. The results show that the 1Gb/s fiber optic link meets the BER requirement and values for link margin can reach up to 13dB. Analysis shows that the suggested photonic network may provide high performance and low cost interconnections alternative for future military avionics.
Novel image encryption algorithm based on multiple-parameter discrete fractional random transform
NASA Astrophysics Data System (ADS)
Zhou, Nanrun; Dong, Taiji; Wu, Jianhua
2010-08-01
A new method of digital image encryption is presented by utilizing a new multiple-parameter discrete fractional random transform. Image encryption and decryption are performed based on the index additivity and multiple parameters of the multiple-parameter fractional random transform. The plaintext and ciphertext are respectively in the spatial domain and in the fractional domain determined by the encryption keys. The proposed algorithm can resist statistic analyses effectively. The computer simulation results show that the proposed encryption algorithm is sensitive to the multiple keys, and that it has considerable robustness, noise immunity and security.
Evaluation of the ecological relevance of mysid toxicity tests using population modeling techniques
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kuhn-Hines, A.; Munns, W.R. Jr.; Lussier, S.
1995-12-31
A number of acute and chronic bioassay statistics are used to evaluate the toxicity and risks of chemical stressors to the mysid shrimp, Mysidopsis bahia. These include LC{sub 50}S from acute tests, NOECs from 7-day and life-cycle tests, and the US EPA Water Quality Criteria Criterion Continuous Concentrations (CCC). Because these statistics are generated from endpoints which focus upon the responses of individual organisms, their relationships to significant effects at higher levels of ecological organization are unknown. This study was conducted to evaluate the quantitative relationships between toxicity test statistics and a concentration-based statistic derived from exposure-response models describing populationmore » growth rate ({lambda}) to stressor concentration. This statistic, C{sup {sm_bullet}} (concentration where {lambda} = I, zero population growth) describes the concentration above which mysid populations are projected to decline in abundance as determined using population modeling techniques. An analysis of M. bahia responses to 9 metals and 9 organic contaminants indicated the NOEC from life-cycle tests to be the best predictor of C{sup {sm_bullet}}, although the acute LC{sub 50} predicted population-level response surprisingly well. These analyses provide useful information regarding uncertainties of extrapolation among test statistics in assessments of ecological risk.« less
Memory matters: influence from a cognitive map on animal space use.
Gautestad, Arild O
2011-10-21
A vertebrate individual's cognitive map provides a capacity for site fidelity and long-distance returns to favorable patches. Fractal-geometrical analysis of individual space use based on collection of telemetry fixes makes it possible to verify the influence of a cognitive map on the spatial scatter of habitat use and also to what extent space use has been of a scale-specific versus a scale-free kind. This approach rests on a statistical mechanical level of system abstraction, where micro-scale details of behavioral interactions are coarse-grained to macro-scale observables like the fractal dimension of space use. In this manner, the magnitude of the fractal dimension becomes a proxy variable for distinguishing between main classes of habitat exploration and site fidelity, like memory-less (Markovian) Brownian motion and Levy walk and memory-enhanced space use like Multi-scaled Random Walk (MRW). In this paper previous analyses are extended by exploring MRW simulations under three scenarios: (1) central place foraging, (2) behavioral adaptation to resource depletion (avoidance of latest visited locations) and (3) transition from MRW towards Levy walk by narrowing memory capacity to a trailing time window. A generalized statistical-mechanical theory with the power to model cognitive map influence on individual space use will be important for statistical analyses of animal habitat preferences and the mechanics behind site fidelity and home ranges. Copyright © 2011 Elsevier Ltd. All rights reserved.
Using Network Analysis to Characterize Biogeographic Data in a Community Archive
NASA Astrophysics Data System (ADS)
Wellman, T. P.; Bristol, S.
2017-12-01
Informative measures are needed to evaluate and compare data from multiple providers in a community-driven data archive. This study explores insights from network theory and other descriptive and inferential statistics to examine data content and application across an assemblage of publically available biogeographic data sets. The data are archived in ScienceBase, a collaborative catalog of scientific data supported by the U.S Geological Survey to enhance scientific inquiry and acuity. In gaining understanding through this investigation and other scientific venues our goal is to improve scientific insight and data use across a spectrum of scientific applications. Network analysis is a tool to reveal patterns of non-trivial topological features in the data that do not exhibit complete regularity or randomness. In this work, network analyses are used to explore shared events and dependencies between measures of data content and application derived from metadata and catalog information and measures relevant to biogeographic study. Descriptive statistical tools are used to explore relations between network analysis properties, while inferential statistics are used to evaluate the degree of confidence in these assessments. Network analyses have been used successfully in related fields to examine social awareness of scientific issues, taxonomic structures of biological organisms, and ecosystem resilience to environmental change. Use of network analysis also shows promising potential to identify relationships in biogeographic data that inform programmatic goals and scientific interests.
Holocaust exposure and subsequent suicide risk: a population-based study.
Bursztein Lipsicas, Cendrine; Levav, Itzhak; Levine, Stephen Z
2017-03-01
To examine the association between the extent of genocide exposure and subsequent suicide risk among Holocaust survivors. Persons born in Holocaust-exposed European countries during the years 1922-1945 that immigrated to Israel by 1965 were identified in the Population Registry (N = 209,429), and followed up for suicide (1950-2014). They were divided into three groups based on likely exposure to Nazi persecution: those who immigrated before (indirect; n = 20,229; 10%), during (partial direct; n = 17,189; 8%), and after (full direct; n = 172,061; 82%) World War II. Groups were contrasted for suicide risk, accounting for the extent of genocide in their respective countries of origin, high (>70%) or lower levels (<50%). Cox model survival analyses were computed examining calendar year at suicide. Sensitivity analyses were recomputed for two additional suicide-associated variables (age and years since immigration) for each exposure group. All analyses were adjusted for confounders. Survival analysis showed that compared to the indirect exposure group, the partial direct exposure group from countries with high genocide level had a statistically significant (P < .05) increased suicide risk for the main outcome (calendar year: HR 1.78, 95% CI 1.09, 2.90). This effect significantly (P < .05) replicated in two sensitivity analyses for countries with higher relative levels of genocide (age: HR 1.77, 95% CI 1.09, 2.89; years since immigration: HR 1.85, 95% CI 1.14, 3.02). The full direct exposure group was not at significant suicide risk compared to the indirect exposure group. Suicide associations for groups from countries with relative lower level of genocide were not statistically significant. This study partly converges with findings identifying Holocaust survivors (full direct exposure) as a resilient group. A tentative mechanism for higher vulnerability to suicide risk of the partial direct exposure group from countries with higher genocide exposure includes protracted guilt feelings, having directly witnessed atrocities and escaped death.
Meta-Analysis of Inquiry-Based Instruction Research
NASA Astrophysics Data System (ADS)
Hasanah, N.; Prasetyo, A. P. B.; Rudyatmi, E.
2017-04-01
Inquiry-based instruction in biology has been the focus of educational research conducted by Unnes biology department students in collaboration with their university supervisors. This study aimed to describe the methodological aspects, inquiry teaching methods critically, and to analyse the results claims, of the selected four student research reports, grounded in inquiry, based on the database of Unnes biology department 2014. Four experimental quantitative research of 16 were selected as research objects by purposive sampling technique. Data collected through documentation study was qualitatively analysed regarding methods used, quality of inquiry syntax, and finding claims. Findings showed that the student research was still the lack of relevant aspects of research methodology, namely in appropriate sampling procedures, limited validity tests of all research instruments, and the limited parametric statistic (t-test) not supported previously by data normality tests. Their consistent inquiry syntax supported the four mini-thesis claims that inquiry-based teaching influenced their dependent variables significantly. In other words, the findings indicated that positive claims of the research results were not fully supported by good research methods, and well-defined inquiry procedures implementation.
Souza, Isys Mascarenhas; Funch, Ligia Silveira; de Queiroz, Luciano Paganucci
2014-01-01
Abstract Hymenaea is a genus of the Resin-producing Clade of the tribe Detarieae (Leguminosae: Caesalpinioideae) with 14 species. Hymenaea courbaril is the most widespread species of the genus, ranging from southern Mexico to southeastern Brazil. As currently circumscribed, Hymenaea courbaril is a polytypic species with six varieties: var. altissima, var. courbaril, var. longifolia, var. stilbocarpa, var. subsessilis, and var. villosa. These varieties are distinguishable mostly by traits related to leaflet shape and indumentation, and calyx indumentation. We carried out morphometric analyses of 14 quantitative (continuous) leaf characters in order to assess the taxonomy of Hymenaea courbaril under the Unified Species Concept framework. Cluster analysis used the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) based on Bray-Curtis dissimilarity matrices. Principal Component Analyses (PCA) were carried out based on the same morphometric matrix. Two sets of Analyses of Similarity and Non Parametric Multivariate Analysis of Variance were carried out to evaluate statistical support (1) for the major groups recovered using UPGMA and PCA, and (2) for the varieties. All analyses recovered three major groups coincident with (1) var. altissima, (2) var. longifolia, and (3) all other varieties. These results, together with geographical and habitat information, were taken as evidence of three separate metapopulation lineages recognized here as three distinct species. Nomenclatural adjustments, including reclassifying formerly misapplied types, are proposed. PMID:25009440
Souza, Isys Mascarenhas; Funch, Ligia Silveira; de Queiroz, Luciano Paganucci
2014-01-01
Hymenaea is a genus of the Resin-producing Clade of the tribe Detarieae (Leguminosae: Caesalpinioideae) with 14 species. Hymenaea courbaril is the most widespread species of the genus, ranging from southern Mexico to southeastern Brazil. As currently circumscribed, Hymenaea courbaril is a polytypic species with six varieties: var. altissima, var. courbaril, var. longifolia, var. stilbocarpa, var. subsessilis, and var. villosa. These varieties are distinguishable mostly by traits related to leaflet shape and indumentation, and calyx indumentation. We carried out morphometric analyses of 14 quantitative (continuous) leaf characters in order to assess the taxonomy of Hymenaea courbaril under the Unified Species Concept framework. Cluster analysis used the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) based on Bray-Curtis dissimilarity matrices. Principal Component Analyses (PCA) were carried out based on the same morphometric matrix. Two sets of Analyses of Similarity and Non Parametric Multivariate Analysis of Variance were carried out to evaluate statistical support (1) for the major groups recovered using UPGMA and PCA, and (2) for the varieties. All analyses recovered three major groups coincident with (1) var. altissima, (2) var. longifolia, and (3) all other varieties. These results, together with geographical and habitat information, were taken as evidence of three separate metapopulation lineages recognized here as three distinct species. Nomenclatural adjustments, including reclassifying formerly misapplied types, are proposed.