Sample records for multivariate biasing statistics

  1. [Biases in the study of prognostic factors].

    PubMed

    Delgado-Rodríguez, M

    1999-01-01

    The main objective is to detail the principal biases in the study of prognostic factors. Confounding bias is illustrated with social class, a prognostic factor still under discussion. Within selection bias, several cases are discussed: response bias, especially frequent when the patients of a clinical trial are used; shortcomings in the assembly of an inception cohort; the Neyman fallacy (bias due to disease duration) when the study begins with a cross-sectional survey; selection bias in the treatment of survivors, owing to the different treatment opportunities of those who live longer; bias due to the inclusion of heterogeneous diagnostic groups; and selection bias due to differential information losses and the use of multivariate statistical procedures. Among biases arising during follow-up, an empirical rule for assessing the impact of the number of losses is given. Under information bias, the Will Rogers phenomenon and the usefulness of clinical databases are discussed. Lastly, a recommendation is made against using cutoff points derived from bivariate analyses to select the variables to be included in multivariate analysis.

  2. Multivariate bias adjustment of high-dimensional climate simulations: the Rank Resampling for Distributions and Dependences (R2D2) bias correction

    NASA Astrophysics Data System (ADS)

    Vrac, Mathieu

    2018-06-01

    Climate simulations often suffer from statistical biases with respect to observations or reanalyses. It is therefore common to correct (or adjust) those simulations before using them as inputs into impact models. However, most bias correction (BC) methods are univariate and so do not account for the statistical dependences linking the different locations and/or physical variables of interest. In addition, they are often deterministic, and stochasticity is frequently needed to investigate climate uncertainty and to add constrained randomness to climate simulations that do not possess a realistic variability. This study presents a multivariate method of rank resampling for distributions and dependences (R2D2) bias correction allowing one to adjust not only the univariate distributions but also their inter-variable and inter-site dependence structures. Moreover, the proposed R2D2 method provides some stochasticity since it can generate as many multivariate corrected outputs as the number of statistical dimensions (i.e., number of grid cells × number of climate variables) of the simulations to be corrected. It is based on an assumption of stability in time of the dependence structure - making it possible to deal with a high number of statistical dimensions - that lets the climate model drive the temporal properties and their changes in time. R2D2 is applied to temperature and precipitation reanalysis time series with respect to high-resolution reference data over the southeast of France (1506 grid cells). Bivariate, 1506-dimensional and 3012-dimensional versions of R2D2 are tested over a historical period and compared to a univariate BC. How the different BC methods behave in a climate change context is also illustrated with an application to regional climate simulations over the 2071-2100 period.
The results indicate that the 1d-BC basically reproduces the climate model's multivariate properties, 2d-R2D2 is satisfactory only in the inter-variable context, 1506d-R2D2 strongly improves inter-site properties, and 3012d-R2D2 is able to account for both. Applications of the proposed R2D2 method to various climate datasets are relevant for many impact studies. The avenues for improvement are numerous, such as introducing stochasticity into the dependence itself, questioning its stability assumption, and accounting for temporal-properties adjustment while including more physics in the adjustment procedures.
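
    The core idea behind this family of methods — correct each marginal univariately, then restore a reference dependence structure by reordering ranks — can be sketched as follows. This is a simplified, Schaake-shuffle-style illustration, not the authors' R2D2 implementation; the function names are hypothetical.

```python
import numpy as np

def quantile_map(model, obs):
    """Univariate correction: replace each model value by the observed value of equal rank."""
    ranks = np.argsort(np.argsort(model))
    return np.sort(obs)[ranks]

def rank_shuffle(corrected, reference):
    """Reorder each column of `corrected` so its temporal rank sequence matches
    that of `reference`. This restores the reference's inter-variable/inter-site
    dependence structure while leaving each marginal distribution untouched."""
    out = np.empty_like(corrected)
    for j in range(corrected.shape[1]):
        ref_ranks = np.argsort(np.argsort(reference[:, j]))
        out[:, j] = np.sort(corrected[:, j])[ref_ranks]
    return out
```

    In R2D2 proper, the reference rank structure comes from observations and one dimension is left driven by the model; choosing different "pivot" dimensions is what yields multiple stochastic corrected outputs.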

  3. Analysis of Developmental Data: Comparison Among Alternative Methods

    ERIC Educational Resources Information Center

    Wilson, Ronald S.

    1975-01-01

    To examine the ability of the correction factor epsilon to counteract statistical bias in univariate analysis, an analysis of variance (adjusted by epsilon) and a multivariate analysis of variance were performed on the same data. The results indicated that univariate analysis is a fully protected design when used with epsilon. (JMB)
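
    The abstract does not say which epsilon is meant; it is presumably the Greenhouse-Geisser sphericity correction. Under that assumption, a minimal sketch of how the correction factor is computed from the repeated-measures covariance matrix:

```python
import numpy as np

def gg_epsilon(S):
    """Greenhouse-Geisser epsilon from a k x k covariance matrix of repeated measures.
    epsilon = 1 when sphericity holds; its lower bound is 1/(k-1)."""
    k = S.shape[0]
    C = np.eye(k) - np.ones((k, k)) / k   # centering matrix
    Sc = C @ S @ C                        # double-centered covariance
    return np.trace(Sc) ** 2 / ((k - 1) * np.sum(Sc ** 2))
```

    The univariate repeated-measures F test is then evaluated with both degrees of freedom multiplied by this epsilon, which is the "adjustment" the study compares against MANOVA.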

  4. Calibration transfer of a Raman spectroscopic quantification method for the assessment of liquid detergent compositions from at-line laboratory to in-line industrial scale.

    PubMed

    Brouckaert, D; Uyttersprot, J-S; Broeckx, W; De Beer, T

    2018-03-01

    Calibration transfer or standardisation aims at creating a uniform spectral response on different spectroscopic instruments or under varying conditions, without requiring a full recalibration for each situation. In the current study, this strategy is applied to construct at-line multivariate calibration models and subsequently employ them in-line in a continuous industrial production line, using the same spectrometer. Firstly, quantitative multivariate models are constructed at-line at laboratory scale for predicting the concentration of two main ingredients in hard-surface cleaners. By regressing the Raman spectra of a set of small-scale calibration samples against their reference concentration values, partial least squares (PLS) models are developed to quantify the surfactant levels in the liquid detergent compositions under investigation. After evaluating the models' performance with a set of independent validation samples, a univariate slope/bias correction is applied with a view to transporting these at-line calibration models to an in-line manufacturing set-up. This standardisation technique allows a fast and easy transfer of the PLS regression models, by simply correcting the model predictions on the in-line set-up, without adjusting anything in the original multivariate calibration models. An extensive statistical analysis is performed in order to assess the predictive quality of the transferred regression models. Before and after transfer, the R² and RMSEP of both models are compared to evaluate whether their magnitudes are similar. T-tests are then performed to investigate whether the slope and intercept of the transferred regression line differ statistically from 1 and 0, respectively. Furthermore, it is verified that no significant bias is present. F-tests are executed as well, to assess the linearity of the transfer regression line and to investigate the statistical coincidence of the transfer and validation regression lines.
Finally, a paired t-test is performed to compare the original at-line model to the slope/bias-corrected in-line model, using interval hypotheses. It is shown that the calibration models for Surfactant 1 and Surfactant 2 yield satisfactory in-line predictions after slope/bias correction. Surfactant 1 passes seven out of eight statistical tests, while the recommended validation parameters are 100% successful for Surfactant 2. It is hence concluded that the proposed strategy of transferring at-line calibration models to an in-line industrial environment via a univariate slope/bias correction of the predicted values offers a successful standardisation approach.
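
    A univariate slope/bias correction amounts to regressing reference values against model predictions on a set of transfer samples and then applying the fitted line to new predictions. A minimal sketch (illustrative; the function name is hypothetical, not from the paper):

```python
import numpy as np

def fit_slope_bias(pred_transfer, ref_transfer):
    """Fit y_corrected = a * y_predicted + b by least squares on transfer samples."""
    a, b = np.polyfit(pred_transfer, ref_transfer, 1)
    return a, b
```

    New in-line PLS predictions `y_new` are then corrected as `a * y_new + b`, leaving the multivariate calibration model itself untouched — which is precisely why the approach is fast to deploy.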

  5. Simultaneous calibration of ensemble river flow predictions over an entire range of lead times

    NASA Astrophysics Data System (ADS)

    Hemri, S.; Fundel, F.; Zappa, M.

    2013-10-01

    Probabilistic estimates of future water levels and river discharge are usually simulated with hydrologic models using ensemble weather forecasts as main inputs. As hydrologic models are imperfect and the meteorological ensembles tend to be biased and underdispersed, the ensemble forecasts for river runoff typically are biased and underdispersed, too. Thus, in order to achieve both reliable and sharp predictions, statistical postprocessing is required. In this work Bayesian model averaging (BMA) is applied to statistically postprocess raw ensemble runoff forecasts for a catchment in Switzerland, at lead times ranging from 1 to 240 h. The raw forecasts have been obtained using deterministic and ensemble forcing meteorological models with different forecast lead time ranges. First, BMA is applied based on mixtures of univariate normal distributions, subject to the assumption of independence between distinct lead times. Then, the independence assumption is relaxed in order to estimate multivariate runoff forecasts over the entire range of lead times simultaneously, based on a BMA version that uses multivariate normal distributions. Since river runoff is a highly skewed variable, Box-Cox transformations are applied in order to achieve approximate normality. Both univariate and multivariate BMA approaches are able to generate well-calibrated probabilistic forecasts that are considerably sharper than climatological forecasts. Additionally, multivariate BMA provides a promising approach for incorporating temporal dependencies into the postprocessed forecasts. Its major advantage over univariate BMA is an increase in reliability when the forecast system is changing due to model availability.
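
    The Box-Cox step mentioned above is a standard power transform toward approximate normality for skewed, positive variables such as runoff. A minimal sketch of the transform and its inverse (illustrative only; the lambda parameter would be fitted to the data):

```python
import numpy as np

def boxcox(x, lam):
    """Box-Cox transform of positive data toward approximate normality."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def inv_boxcox(y, lam):
    """Back-transform postprocessed values to the original runoff scale."""
    y = np.asarray(y, dtype=float)
    return np.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)
```

    In a BMA workflow of this kind, the normal mixtures are fitted in transformed space and the resulting quantiles are mapped back through the inverse transform.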

  6. Visual classification of very fine-grained sediments: Evaluation through univariate and multivariate statistics

    USGS Publications Warehouse

    Hohn, M. Ed; Nuhfer, E.B.; Vinopal, R.J.; Klanderman, D.S.

    1980-01-01

    Classifying very fine-grained rocks through fabric elements provides information about depositional environments, but is subject to the biases of visual taxonomy. To evaluate the statistical significance of an empirical classification of very fine-grained rocks, samples from Devonian shales in four cored wells in West Virginia and Virginia were measured for 15 variables: quartz, illite, pyrite and expandable clays determined by X-ray diffraction; total sulfur, organic content, inorganic carbon, matrix density, bulk density, porosity, and silt; as well as density, sonic travel time, resistivity, and γ-ray response measured from well logs. The four lithologic types comprised: (1) sharply banded shale, (2) thinly laminated shale, (3) lenticularly laminated shale, and (4) nonbanded shale. Univariate and multivariate analyses of variance showed that the lithologic classification reflects significant differences in the variables measured, differences that can be detected independently of stratigraphic effects. Little-known statistical methods found useful in this work included: the multivariate analysis of variance with more than one effect, simultaneous plotting of samples and variables on canonical variates, and the use of parametric ANOVA and MANOVA on ranked data.
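
    "Parametric ANOVA on ranked data" means replacing the raw values by their pooled ranks and then running the ordinary F test — the rank-transform approach. A minimal one-way sketch (illustrative, not the authors' code; ties are ignored for simplicity):

```python
import numpy as np

def rank_anova_F(groups):
    """One-way ANOVA F statistic computed on the ranks of the pooled data."""
    pooled = np.concatenate(groups)
    ranks = np.empty(len(pooled))
    ranks[np.argsort(pooled)] = np.arange(1, len(pooled) + 1)
    # split the ranks back into their original groups
    out, i = [], 0
    for g in groups:
        out.append(ranks[i:i + len(g)])
        i += len(g)
    grand = ranks.mean()
    ss_between = sum(len(r) * (r.mean() - grand) ** 2 for r in out)
    ss_within = sum(((r - r.mean()) ** 2).sum() for r in out)
    df_b, df_w = len(groups) - 1, len(pooled) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w)
```

    The appeal for lithologic data is robustness: ranking removes the influence of skewed distributions and outliers while keeping the familiar ANOVA machinery.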

  7. Uncertainty Analysis of Instrument Calibration and Application

    NASA Technical Reports Server (NTRS)

    Tripp, John S.; Tcheng, Ping

    1999-01-01

    Experimental aerodynamic researchers require estimated precision and bias uncertainties of measured physical quantities, typically at 95 percent confidence levels. Uncertainties of final computed aerodynamic parameters are obtained by propagation of individual measurement uncertainties through the defining functional expressions. In this paper, rigorous mathematical techniques are extended to determine precision and bias uncertainties of any instrument-sensor system. Through this analysis, instrument uncertainties determined through calibration are now expressed as functions of the corresponding measurement for linear and nonlinear univariate and multivariate processes. A treatment of correlated measurement precision error is developed. During laboratory calibration, calibration standard uncertainties are assumed to be an order of magnitude less than those of the instrument being calibrated. Often calibration standards do not satisfy this assumption. This paper applies rigorous statistical methods for inclusion of calibration standard uncertainty and covariance due to the order of their application. The effects of mathematical modeling error on calibration bias uncertainty are quantified. The effects of experimental design on uncertainty are analyzed. The importance of replication is emphasized, and techniques for estimating both bias and precision uncertainties using replication are developed. Statistical tests for stationarity of calibration parameters over time are obtained.
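
    Propagating individual measurement uncertainties through a defining functional expression is, to first order and for independent inputs, the familiar root-sum-square of partial derivatives: σ_f² = Σᵢ (∂f/∂xᵢ)² σᵢ². A generic numerical sketch (an illustration of the principle, not the paper's method):

```python
import numpy as np

def propagate_uncertainty(f, x, sigmas, h=1e-6):
    """First-order uncertainty propagation for independent inputs:
    sigma_f^2 = sum_i (df/dx_i)^2 * sigma_i^2,
    with partial derivatives estimated by central differences."""
    x = np.asarray(x, dtype=float)
    var = 0.0
    for i, s in enumerate(sigmas):
        dx = np.zeros_like(x)
        dx[i] = h
        dfdx = (f(x + dx) - f(x - dx)) / (2 * h)
        var += (dfdx * s) ** 2
    return np.sqrt(var)
```

    For a product f = x·y with x = 2 ± 0.1 and y = 3 ± 0.2, the analytic result is √((3·0.1)² + (2·0.2)²) = 0.5; correlated errors would add covariance cross-terms, which is part of what the paper treats rigorously.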

  8. Systematic review of the methodological quality of controlled trials evaluating Chinese herbal medicine in patients with rheumatoid arthritis

    PubMed Central

    Pan, Xin; Lopez-Olivo, Maria A; Song, Juhee; Pratt, Gregory; Suarez-Almazor, Maria E

    2017-01-01

    Objectives We appraised the methodological and reporting quality of randomised controlled clinical trials (RCTs) evaluating the efficacy and safety of Chinese herbal medicine (CHM) in patients with rheumatoid arthritis (RA). Design For this systematic review, electronic databases were searched from inception until June 2015. The search was limited to humans and non-case-report studies, but was not limited by language, year of publication or type of publication. Two independent reviewers selected RCTs evaluating CHM in RA (herbals and decoctions). Descriptive statistics were used to report on risk of bias and adherence to reporting standards. Multivariable logistic regression analysis was performed to determine study characteristics associated with high or unclear risk of bias. Results Out of 2342 unique citations, we selected 119 RCTs including 18 919 patients: 10 108 patients received CHM alone and 6550 received one of 11 treatment combinations. A high risk of bias was observed across all domains: 21% had a high risk for selection bias (11% from sequence generation and 30% from allocation concealment), 85% for performance bias, 89% for detection bias, 4% for attrition bias and 40% for reporting bias. In multivariable analysis, fewer authors were associated with selection bias (allocation concealment), performance bias and attrition bias, and earlier year of publication and funding source not reported or disclosed were associated with selection bias (sequence generation). Studies published in a non-English language were associated with reporting bias. Poor adherence to recommended reporting standards (fewer than 60% of the studies providing sufficient information) was observed in 11 of the 23 sections evaluated. Limitations Study quality and data extraction were performed by one reviewer and cross-checked by a second reviewer. Translation to English was performed by one reviewer in 85% of the included studies.
Conclusions Studies evaluating CHM often fail to meet expected methodological criteria, and high-quality evidence is lacking. PMID:28249848

  9. Unconscious race and class bias: its association with decision making by trauma and acute care surgeons.

    PubMed

    Haider, Adil H; Schneider, Eric B; Sriram, N; Dossick, Deborah S; Scott, Valerie K; Swoboda, Sandra M; Losonczy, Lia; Haut, Elliott R; Efron, David T; Pronovost, Peter J; Freischlag, Julie A; Lipsett, Pamela A; Cornwell, Edward E; MacKenzie, Ellen J; Cooper, Lisa A

    2014-09-01

    Recent studies have found that unconscious biases may influence physicians' clinical decision making. The objective of our study was to determine, using clinical vignettes, if unconscious race and class biases exist specifically among trauma/acute care surgeons and, if so, whether those biases impact surgeons' clinical decision making. A prospective Web-based survey was administered to active members of the Eastern Association for the Surgery of Trauma. Participants completed nine clinical vignettes, each with three trauma/acute care surgery management questions. Race Implicit Association Test (IAT) and social class IAT assessments were completed by each participant. Multivariable, ordered logistic regression analysis was then used to determine whether implicit biases reflected on the IAT tests were associated with vignette responses. In total, 248 members of the Eastern Association for the Surgery of Trauma participated. Of these, 79% explicitly stated that they had no race preferences and 55% stated they had no social class preferences. However, 73.5% of the participants had IAT scores demonstrating an unconscious preference toward white persons; 90.7% demonstrated an implicit preference toward upper social class persons. Only 2 of 27 vignette-based clinical decisions were associated with patient race or social class on univariate analyses. Multivariable analyses revealed no relationship between IAT scores and vignette-based clinical assessments. Unconscious preferences for white and upper-class persons are prevalent among trauma and acute care surgeons. In this study, these biases were not statistically significantly associated with clinical decision making. Further study of the factors that may prevent implicit biases from influencing patient management is warranted. Epidemiologic study, level II.

  10. Multivariate Statistics Applied to Seismic Phase Picking

    NASA Astrophysics Data System (ADS)

    Velasco, A. A.; Zeiler, C. P.; Anderson, D.; Pingitore, N. E.

    2008-12-01

    The initial effort of the Seismogram Picking Error from Analyst Review (SPEAR) project has been to establish a common set of seismograms to be picked by the seismological community. Currently we have 13 analysts from 4 institutions who have provided picks on the set of 26 seismograms. In comparing the picks thus far, we have identified consistent biases between picks from different institutions; effects of the experience of analysts; and the impact of signal-to-noise ratio on picks. The institutional bias in picks raises the important concern that picks will not be the same between different catalogs. This difference means less precision and accuracy when combining picks from multiple institutions. We also note that, depending on the experience level of the analyst making picks for a catalog, the error could fluctuate dramatically. However, experience level is based on the number of years spent picking seismograms, and this may not be an appropriate criterion for determining an analyst's precision. The common data set of seismograms provides a means to test an analyst's level of precision and biases. The analyst is also limited by the quality of the signal, and we show that the signal-to-noise ratio and pick error are correlated with the location, size and distance of the event. This makes the standard estimate of picking error based on SNR more complex, because additional constraints are needed to accurately constrain the measurement error. We propose to extend the current measurement of error by adding the additional constraints of institutional bias and event characteristics to the standard SNR measurement. We use multivariate statistics to model the data and provide constraints to accurately assess earthquake location and measurement errors.

  11. Detection and Attribution of Simulated Climatic Extreme Events and Impacts: High Sensitivity to Bias Correction

    NASA Astrophysics Data System (ADS)

    Sippel, S.; Otto, F. E. L.; Forkel, M.; Allen, M. R.; Guillod, B. P.; Heimann, M.; Reichstein, M.; Seneviratne, S. I.; Kirsten, T.; Mahecha, M. D.

    2015-12-01

    Understanding, quantifying and attributing the impacts of climatic extreme events and variability is crucial for societal adaptation in a changing climate. However, climate model simulations generated for this purpose typically exhibit pronounced biases in their output that hinder any straightforward assessment of impacts. To overcome this issue, various bias correction strategies are routinely used to alleviate climate model deficiencies, most of which have been criticized for physical inconsistency and non-preservation of the multivariate correlation structure. We assess how biases and their correction affect the quantification and attribution of simulated extremes and variability in i) climatological variables and ii) impacts on ecosystem functioning as simulated by a terrestrial biosphere model. Our study demonstrates that assessments of simulated climatic extreme events and impacts in the terrestrial biosphere are highly sensitive to bias correction schemes, with major implications for the detection and attribution of these events. We introduce a novel ensemble-based resampling scheme, based on a large regional climate model ensemble generated by the distributed weather@home setup [1], which fully preserves the physical consistency and multivariate correlation structure of the model output. We use extreme value statistics to show that this procedure considerably improves the representation of climatic extremes and variability. Subsequently, biosphere-atmosphere carbon fluxes are simulated using a terrestrial ecosystem model (LPJ-GSI) to further demonstrate the sensitivity of ecosystem impacts to the methodology of bias-correcting climate model output. We find that uncertainties arising from bias correction schemes are comparable in magnitude to model structural and parameter uncertainties.
The present study constitutes a first attempt to alleviate climate model biases in a physically consistent way and demonstrates that this yields improved simulations of climate extremes and associated impacts. [1] http://www.climateprediction.net/weatherathome/

  12. Multivariate statistical model for 3D image segmentation with application to medical images.

    PubMed

    John, Nigel M; Kabuka, Mansur R; Ibrahim, Mohamed O

    2003-12-01

    In this article we describe a statistical model that was developed to segment brain magnetic resonance images. The statistical segmentation algorithm was applied after a pre-processing stage involving the use of a 3D anisotropic filter along with histogram equalization techniques. The segmentation algorithm makes use of prior knowledge and a probability-based multivariate model designed to semi-automate the process of segmentation. The algorithm was applied to images obtained from the Center for Morphometric Analysis at Massachusetts General Hospital as part of the Internet Brain Segmentation Repository (IBSR). The developed algorithm showed improved accuracy over the k-means, adaptive maximum a posteriori probability (MAP), biased MAP, and other algorithms. Experimental results showing the segmentation and the results of comparisons with other algorithms are provided. Results are based on an overlap criterion against expertly segmented images from the IBSR. The algorithm produced average results of approximately 80% overlap with the expertly segmented images (compared with 85% for manual segmentation and 55% for other algorithms).
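
    The abstract does not specify which overlap criterion was used against the IBSR expert segmentations; the Dice coefficient is a common choice for this kind of comparison and illustrates the idea (a sketch, not the paper's metric):

```python
import numpy as np

def dice_overlap(seg, ref):
    """Dice coefficient between two binary segmentations: 2*|A∩B| / (|A|+|B|).
    Returns 1.0 for identical masks and 0.0 for disjoint ones."""
    seg = np.asarray(seg, bool)
    ref = np.asarray(ref, bool)
    inter = np.logical_and(seg, ref).sum()
    return 2.0 * inter / (seg.sum() + ref.sum())
```

    In a multi-class segmentation, this score is typically computed per tissue label (e.g., grey matter, white matter, CSF) and averaged, which is how figures like "approximately 80% overlap" arise.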

  13. Uncertainty Analysis of Inertial Model Attitude Sensor Calibration and Application with a Recommended New Calibration Method

    NASA Technical Reports Server (NTRS)

    Tripp, John S.; Tcheng, Ping

    1999-01-01

    Statistical tools, previously developed for nonlinear least-squares estimation of multivariate sensor calibration parameters and the associated calibration uncertainty analysis, have been applied to single- and multiple-axis inertial model attitude sensors used in wind tunnel testing to measure angle of attack and roll angle. The analysis provides confidence and prediction intervals of calibrated sensor measurement uncertainty as functions of applied input pitch and roll angles. A comparative performance study of various experimental designs for inertial sensor calibration is presented along with corroborating experimental data. The importance of replicated calibrations over extended time periods has been emphasized; replication provides independent estimates of calibration precision and bias uncertainties, statistical tests for calibration or modeling bias uncertainty, and statistical tests for sensor parameter drift over time. A set of recommendations for a new standardized model attitude sensor calibration method and usage procedures is included. The statistical information provided by these procedures is necessary for the uncertainty analysis of aerospace test results now required by users of industrial wind tunnel test facilities.

  14. Systematic review of the methodological quality of controlled trials evaluating Chinese herbal medicine in patients with rheumatoid arthritis.

    PubMed

    Pan, Xin; Lopez-Olivo, Maria A; Song, Juhee; Pratt, Gregory; Suarez-Almazor, Maria E

    2017-03-01

    We appraised the methodological and reporting quality of randomised controlled clinical trials (RCTs) evaluating the efficacy and safety of Chinese herbal medicine (CHM) in patients with rheumatoid arthritis (RA). For this systematic review, electronic databases were searched from inception until June 2015. The search was limited to humans and non-case-report studies, but was not limited by language, year of publication or type of publication. Two independent reviewers selected RCTs evaluating CHM in RA (herbals and decoctions). Descriptive statistics were used to report on risk of bias and adherence to reporting standards. Multivariable logistic regression analysis was performed to determine study characteristics associated with high or unclear risk of bias. Out of 2342 unique citations, we selected 119 RCTs including 18 919 patients: 10 108 patients received CHM alone and 6550 received one of 11 treatment combinations. A high risk of bias was observed across all domains: 21% had a high risk for selection bias (11% from sequence generation and 30% from allocation concealment), 85% for performance bias, 89% for detection bias, 4% for attrition bias and 40% for reporting bias. In multivariable analysis, fewer authors were associated with selection bias (allocation concealment), performance bias and attrition bias, and earlier year of publication and funding source not reported or disclosed were associated with selection bias (sequence generation). Studies published in a non-English language were associated with reporting bias. Poor adherence to recommended reporting standards (fewer than 60% of the studies providing sufficient information) was observed in 11 of the 23 sections evaluated. Study quality and data extraction were performed by one reviewer and cross-checked by a second reviewer. Translation to English was performed by one reviewer in 85% of the included studies.
Studies evaluating CHM often fail to meet expected methodological criteria, and high-quality evidence is lacking.

  15. Post-processing of multi-hydrologic model simulations for improved streamflow projections

    NASA Astrophysics Data System (ADS)

    Khajehei, Sepideh; Ahmadalipour, Ali; Moradkhani, Hamid

    2016-04-01

    Hydrologic model outputs are prone to bias and uncertainty due to knowledge deficiencies in models and data. Uncertainty in hydroclimatic projections arises from uncertainty in the hydrologic model as well as from epistemic or aleatory uncertainties in GCM parameterization and development. This study is conducted to: 1) evaluate the recently developed multivariate post-processing method for historical simulations and 2) assess the effect of post-processing on the uncertainty and reliability of future streamflow projections in both high-flow and low-flow conditions. The first objective is addressed for the historical period of 1970-1999. Future streamflow projections are generated for 10 statistically downscaled GCMs from two widely used downscaling methods: Bias Corrected Statistically Downscaled (BCSD) and Multivariate Adaptive Constructed Analogs (MACA), over the period of 2010-2099 for the two representative concentration pathways RCP4.5 and RCP8.5. Three semi-distributed hydrologic models were employed and calibrated at 1/16-degree latitude-longitude resolution for over 100 points across the Columbia River Basin (CRB) in the Pacific Northwest, USA. Streamflow outputs are post-processed through a Bayesian framework based on copula functions. The post-processing approach relies on a transfer function developed from the bivariate joint distribution between observation and simulation in the historical period. Results show that application of the post-processing technique leads to considerably higher accuracy in historical simulations and also reduces model uncertainty in future streamflow projections.
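
    The transfer-function idea — fit a joint distribution between historical simulation and observation, then map each new simulated value to a conditional estimate of the observation — can be sketched in its simplest form. A bivariate Gaussian stands in here for the paper's copula-based joint; the function names are hypothetical:

```python
import numpy as np

def fit_transfer(sim_hist, obs_hist):
    """Fit a bivariate joint distribution between historical simulations and
    observations (a bivariate Gaussian as a stand-in for a copula model)."""
    mu = np.array([np.mean(sim_hist), np.mean(obs_hist)])
    cov = np.cov(sim_hist, obs_hist)
    return mu, cov

def postprocess(sim_new, mu, cov):
    """Conditional expectation E[obs | sim] under the fitted joint."""
    return mu[1] + cov[0, 1] / cov[0, 0] * (sim_new - mu[0])
```

    A copula formulation generalizes this by modeling the dependence between the marginal CDFs directly, so skewed flow distributions need not be forced into a Gaussian shape.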

  16. The Plant Ionome Revisited by the Nutrient Balance Concept

    PubMed Central

    Parent, Serge-Étienne; Parent, Léon Etienne; Egozcue, Juan José; Rozane, Danilo-Eduardo; Hernandes, Amanda; Lapointe, Line; Hébert-Gentile, Valérie; Naess, Kristine; Marchand, Sébastien; Lafond, Jean; Mattos, Dirceu; Barlow, Philip; Natale, William

    2013-01-01

    Tissue analysis is commonly used in ecology and agronomy to portray plant nutrient signatures. Nutrient concentration data, or ionomes, belong to the compositional data class, i.e., multivariate data that are proportions of some whole, hence carrying important numerical properties. Statistics computed across raw or ordinary log-transformed nutrient data are intrinsically biased, hence possibly leading to wrong inferences. Our objective was to present a sound and robust approach based on a novel nutrient balance concept to classify plant ionomes. We analyzed leaf N, P, K, Ca, and Mg of two wild and six domesticated fruit species from Canada, Brazil, and New Zealand sampled during reproductive stages. Nutrient concentrations were (1) analyzed without transformation, (2) ordinary log-transformed as commonly but incorrectly applied in practice, (3) additive log-ratio (alr) transformed as surrogate to stoichiometric rules, and (4) converted to isometric log-ratios (ilr) arranged as sound nutrient balance variables. Raw concentration and ordinary log transformation both led to biased multivariate analysis due to redundancy between interacting nutrients. The alr- and ilr-transformed data provided unbiased discriminant analyses of plant ionomes, where wild and domesticated species formed distinct groups and the ionomes of species and cultivars were differentiated without numerical bias. The ilr nutrient balance concept is preferable to alr, because the ilr technique projects the most important interactions between nutrients into a convenient Euclidean space. This novel numerical approach allows rectifying historical biases and supervising phenotypic plasticity in plant nutrition studies. PMID:23526060

  17. Industry Bias in Randomized Controlled Trials in General and Abdominal Surgery: An Empirical Study.

    PubMed

    Probst, Pascal; Knebel, Phillip; Grummich, Kathrin; Tenckhoff, Solveig; Ulrich, Alexis; Büchler, Markus W; Diener, Markus K

    2016-07-01

    Industry sponsorship has been identified as a source of bias in several fields of medical science. To date, the influence of industry sponsorship in the field of general and abdominal surgery has not been evaluated. A systematic literature search (1985-2014) was performed in the Cochrane Library, MEDLINE, and EMBASE to identify randomized controlled trials in general and abdominal surgery. Information on funding source, outcome, and methodological quality was extracted. The association of industry sponsorship with positive outcome was expressed as an odds ratio (OR) with 95% confidence interval (CI). A χ² test and a multivariate logistic regression analysis with study characteristics and known sources of bias were performed. A total of 7934 articles were screened and 165 randomized controlled trials were included. No difference in methodological quality was found. Industry-funded trials more often presented statistically significant results for the primary endpoint (OR, 2.44; CI, 1.04-5.71; P = 0.04). Eighty-eight of 115 (76.5%) industry-funded trials and 19 of 50 (38.0%) non-industry-funded trials reported a positive outcome (OR, 5.32; CI, 2.60-10.88; P < 0.001). Industry-funded trials more often reported a positive outcome without statistical justification (OR, 5.79; CI, 2.13-15.68; P < 0.001). In a multivariate analysis, funding source remained significantly associated with the reporting of a positive outcome (P < 0.001). Industry funding of surgical trials leads to exaggerated positive reporting of outcomes. This study emphasizes the necessity of declaring the funding source. Industry involvement in surgical research has to ensure scientific integrity and independence and has to be based on full transparency.
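
    The headline odds ratio is computable directly from the counts in the abstract. A minimal sketch using the standard Woolf (log-OR) confidence interval (the function name is ours, not the paper's):

```python
import math

def odds_ratio_woolf_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI from a 2x2 table [[a, b], [c, d]]:
    a/b = exposed positive/negative, c/d = unexposed positive/negative."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

    Plugging in the abstract's counts (88 of 115 industry-funded trials positive, i.e. a=88, b=27; 19 of 50 non-industry-funded positive, c=19, d=31) reproduces the reported OR of 5.32 with CI 2.60-10.88.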

  18. The development of comparative bias index

    NASA Astrophysics Data System (ADS)

    Aimran, Ahmad Nazim; Ahmad, Sabri; Afthanorhan, Asyraf; Awang, Zainudin

    2017-08-01

    Structural Equation Modeling (SEM) is a second-generation statistical analysis technique developed for analyzing the inter-relationships among multiple variables in a model simultaneously. The two most commonly used methods in SEM are Covariance-Based Structural Equation Modeling (CB-SEM) and Partial Least Squares Path Modeling (PLS-PM). There has been continuous debate among researchers over the use of PLS-PM versus CB-SEM, yet few studies have tested the bias of CB-SEM and PLS-PM in estimating simulated data. This study addresses this gap by a) developing the Comparative Bias Index and b) testing the performance of CB-SEM and PLS-PM using the developed index. Based on a balanced experimental design, two multivariate normal simulated datasets with distinct specifications and sample sizes of 50, 100, 200, and 500 are generated and analyzed using CB-SEM and PLS-PM.

  19. A reply to Zigler and Seitz.

    PubMed

    Neman, R

    1975-03-01

    The Zigler and Seitz (1975) critique was carefully examined with respect to the conclusions of the Neman et al. (1975) study. Particular attention was given to the following questions: (a) did experimenter bias or commitment account for the results, (b) were unreliable and invalid psychometric instruments used, (c) were the statistical analyses insufficient or incorrect, (d) did the results reflect no more than the operation of chance, and (e) were the results biased by artifactually inflated profile scores. Experimenter bias and commitment were shown to be insufficient to account for the results; a further review of Buros (1972) showed that there was no need for apprehension about the testing instruments; the statistical analyses were shown to exceed prevailing standards for research reporting; the results were shown to reflect valid findings at the .05 probability level; and the Neman et al. (1975) results for the profile measure were equally significant using either "raw" neurological scores or "scaled" neurological age scores. Zigler, Seitz, and I agreed on the need for (a) using multivariate analyses, where applicable, in studies having more than one dependent variable; (b) defining the population for which sensorimotor training procedures may be appropriately prescribed; and (c) validating the profile measure as a tool to assess neurological disorganization.

  20. Longitudinal assessment of treatment effects on pulmonary ventilation using 1H/3He MRI multivariate templates

    NASA Astrophysics Data System (ADS)

    Tustison, Nicholas J.; Contrella, Benjamin; Altes, Talissa A.; Avants, Brian B.; de Lange, Eduard E.; Mugler, John P.

    2013-03-01

    The utility of pulmonary functional imaging techniques, such as hyperpolarized 3He MRI, has encouraged their inclusion in research studies for longitudinal assessment of disease progression and the study of treatment effects. We present methodology for performing voxelwise statistical analysis of ventilation maps derived from hyperpolarized 3He MRI which incorporates multivariate template construction using simultaneous acquisition of 1H and 3He images. Additional processing steps include intensity normalization, bias correction, 4-D longitudinal segmentation, and generation of expected ventilation maps prior to voxelwise regression analysis. The analysis is demonstrated on a cohort of eight individuals with diagnosed cystic fibrosis (CF) imaged five times at two-week intervals under a prescribed treatment schedule.

  1. A direct-gradient multivariate index of biotic condition

    USGS Publications Warehouse

    Miranda, Leandro E.; Aycock, J.N.; Killgore, K. J.

    2012-01-01

    Multimetric indexes constructed by summing metric scores have been criticized despite many of their merits. A leading criticism is the potential for investigator bias involved in metric selection and scoring. Often there is a large number of competing metrics equally well correlated with environmental stressors, requiring a judgment call by the investigator to select the most suitable metrics to include in the index and how to score them. Data-driven procedures for multimetric index formulation published during the last decade have reduced this limitation, yet apprehension remains. Multivariate approaches that select metrics with statistical algorithms may reduce the level of investigator bias and alleviate a weakness of multimetric indexes. We investigated the suitability of a direct-gradient multivariate procedure to derive an index of biotic condition for fish assemblages in oxbow lakes in the Lower Mississippi Alluvial Valley. Although this multivariate procedure also requires that the investigator identify a set of suitable metrics potentially associated with a set of environmental stressors, it is different from multimetric procedures because it limits investigator judgment in selecting a subset of biotic metrics to include in the index and because it produces metric weights suitable for computation of index scores. The procedure, applied to a sample of 35 competing biotic metrics measured at 50 oxbow lakes distributed over a wide geographical region in the Lower Mississippi Alluvial Valley, selected 11 metrics that adequately indexed the biotic condition of five test lakes. Because the multivariate index includes only metrics that explain the maximum variability in the stressor variables rather than a balanced set of metrics chosen to reflect various fish assemblage attributes, it is fundamentally different from multimetric indexes of biotic integrity with advantages and disadvantages. As such, it provides an alternative to multimetric procedures.

  2. Comparing interval estimates for small sample ordinal CFA models

    PubMed Central

    Natesan, Prathiba

    2015-01-01

    Robust maximum likelihood (RML) and asymptotically generalized least squares (AGLS) methods have been recommended for fitting ordinal structural equation models. Studies show that some of these methods underestimate standard errors. However, these studies have not investigated the coverage and bias of interval estimates. An estimate with a reasonable standard error could still be severely biased. This can only be known by systematically investigating the interval estimates. The present study compares Bayesian, RML, and AGLS interval estimates of factor correlations in ordinal confirmatory factor analysis (CFA) models for small sample data. Six sample sizes, 3 factor correlations, and 2 factor score distributions (multivariate normal and multivariate mildly skewed) were studied. Two Bayesian prior specifications, informative and relatively less informative, were studied. Undercoverage of confidence intervals and underestimation of standard errors were common in non-Bayesian methods. Underestimated standard errors may lead to inflated Type-I error rates. Non-Bayesian intervals were more often positively biased than negatively biased; that is, most intervals that did not contain the true value were greater than the true value. Some non-Bayesian methods had non-converging and inadmissible solutions for small samples and non-normal data. Bayesian empirical standard error estimates for informative and relatively less informative priors were closer to the average standard errors of the estimates. The coverage of Bayesian credibility intervals was closer to what was expected, with overcoverage in a few cases. Although some Bayesian credibility intervals were wider, they reflected the nature of statistical uncertainty that comes with the data (e.g., small sample). Bayesian point estimates were also more accurate than non-Bayesian estimates.
The results illustrate the importance of analyzing coverage and bias of interval estimates, and how ignoring interval estimates can be misleading. Therefore, editors and policymakers should continue to emphasize the inclusion of interval estimates in research. PMID:26579002

  3. Comparing interval estimates for small sample ordinal CFA models.

    PubMed

    Natesan, Prathiba

    2015-01-01

    Robust maximum likelihood (RML) and asymptotically generalized least squares (AGLS) methods have been recommended for fitting ordinal structural equation models. Studies show that some of these methods underestimate standard errors. However, these studies have not investigated the coverage and bias of interval estimates. An estimate with a reasonable standard error could still be severely biased. This can only be known by systematically investigating the interval estimates. The present study compares Bayesian, RML, and AGLS interval estimates of factor correlations in ordinal confirmatory factor analysis (CFA) models for small sample data. Six sample sizes, 3 factor correlations, and 2 factor score distributions (multivariate normal and multivariate mildly skewed) were studied. Two Bayesian prior specifications, informative and relatively less informative, were studied. Undercoverage of confidence intervals and underestimation of standard errors were common in non-Bayesian methods. Underestimated standard errors may lead to inflated Type-I error rates. Non-Bayesian intervals were more often positively biased than negatively biased; that is, most intervals that did not contain the true value were greater than the true value. Some non-Bayesian methods had non-converging and inadmissible solutions for small samples and non-normal data. Bayesian empirical standard error estimates for informative and relatively less informative priors were closer to the average standard errors of the estimates. The coverage of Bayesian credibility intervals was closer to what was expected, with overcoverage in a few cases. Although some Bayesian credibility intervals were wider, they reflected the nature of statistical uncertainty that comes with the data (e.g., small sample). Bayesian point estimates were also more accurate than non-Bayesian estimates.
The results illustrate the importance of analyzing coverage and bias of interval estimates, and how ignoring interval estimates can be misleading. Therefore, editors and policymakers should continue to emphasize the inclusion of interval estimates in research.

  4. Multiple imputation for handling missing outcome data when estimating the relative risk.

    PubMed

    Sullivan, Thomas R; Lee, Katherine J; Ryan, Philip; Salter, Amy B

    2017-09-06

    Multiple imputation is a popular approach to handling missing data in medical research, yet little is known about its applicability for estimating the relative risk. Standard methods for imputing incomplete binary outcomes involve logistic regression or an assumption of multivariate normality, whereas relative risks are typically estimated using log binomial models. It is unclear whether misspecification of the imputation model in this setting could lead to biased parameter estimates. Using simulated data, we evaluated the performance of multiple imputation for handling missing data prior to estimating adjusted relative risks from a correctly specified multivariable log binomial model. We considered an arbitrary pattern of missing data in both outcome and exposure variables, with missing data induced under missing at random mechanisms. Focusing on standard model-based methods of multiple imputation, missing data were imputed using multivariate normal imputation or fully conditional specification with a logistic imputation model for the outcome. Multivariate normal imputation performed poorly in the simulation study, consistently producing estimates of the relative risk that were biased towards the null. Despite outperforming multivariate normal imputation, fully conditional specification also produced somewhat biased estimates, with greater bias observed for higher outcome prevalences and larger relative risks. Deleting imputed outcomes from analysis datasets did not improve the performance of fully conditional specification. Both multivariate normal imputation and fully conditional specification produced biased estimates of the relative risk, presumably since both use a misspecified imputation model. Based on simulation results, we recommend researchers use fully conditional specification rather than multivariate normal imputation and retain imputed outcomes in the analysis when estimating relative risks. 
However, fully conditional specification is not without shortcomings, so further research is needed to identify optimal approaches for relative risk estimation within the multiple imputation framework.

  5. Measurement bias detection with Kronecker product restricted models for multivariate longitudinal data: an illustration with health-related quality of life data from thirteen measurement occasions

    PubMed Central

    Verdam, Mathilde G. E.; Oort, Frans J.

    2014-01-01

    Highlights: application of the Kronecker product to construct parsimonious structural equation models for multivariate longitudinal data; a method for the investigation of measurement bias with Kronecker product restricted models; application of these methods to health-related quality of life data from bone metastasis patients, collected at 13 consecutive measurement occasions; the use of curves to facilitate substantive interpretation of apparent measurement bias; and assessment of change in common factor means, after accounting for apparent measurement bias. Longitudinal measurement invariance is usually investigated with a longitudinal factor model (LFM). However, with multiple measurement occasions, the number of parameters to be estimated increases with a multiple of the number of measurement occasions. To guard against too low ratios of numbers of subjects and numbers of parameters, we can use Kronecker product restrictions to model the multivariate longitudinal structure of the data. These restrictions can be imposed on all parameter matrices, including measurement invariance restrictions on factor loadings and intercepts. The resulting models are parsimonious and have attractive interpretations, but require different methods for the investigation of measurement bias. Specifically, additional parameter matrices are introduced to accommodate possible violations of measurement invariance. These additional matrices consist of measurement bias parameters that are either fixed at zero or free to be estimated. In cases of measurement bias, it is also possible to model the bias over time, e.g., with linear or non-linear curves. Measurement bias detection with Kronecker product restricted models will be illustrated with multivariate longitudinal data from 682 bone metastasis patients whose health-related quality of life (HRQL) was measured at 13 consecutive weeks. PMID:25295016
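    The parsimony argument is concrete: for t occasions and v variables, a Kronecker-structured covariance needs only the parameters of a t x t and a v x v matrix instead of a full (t*v) x (t*v) matrix. A minimal sketch, assuming an AR(1)-like occasion structure and an invented 4-scale variable block (the paper's 13 occasions are real; everything else here is illustrative):

```python
import numpy as np

t, v = 13, 4   # 13 occasions (from the study); 4 HRQL scales is invented
# occasion block: AR(1)-like decay in correlation over weeks (illustrative)
sigma_t = 0.8 ** np.abs(np.subtract.outer(np.arange(t), np.arange(t)))
# variable block: compound symmetry among scales (illustrative)
sigma_v = np.full((v, v), 0.3) + 0.7 * np.eye(v)
# Kronecker-structured covariance: 13*14/2 + 4*5/2 = 101 free parameters
# instead of the 52*53/2 = 1378 of an unstructured 52 x 52 matrix
sigma = np.kron(sigma_t, sigma_v)
print(sigma.shape)   # (52, 52)
```

    Measurement bias parameters are then additional free entries layered on top of this structured form rather than on a fully unstructured matrix.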

  6. Measurement bias detection with Kronecker product restricted models for multivariate longitudinal data: an illustration with health-related quality of life data from thirteen measurement occasions.

    PubMed

    Verdam, Mathilde G E; Oort, Frans J

    2014-01-01

    Highlights: application of the Kronecker product to construct parsimonious structural equation models for multivariate longitudinal data; a method for the investigation of measurement bias with Kronecker product restricted models; application of these methods to health-related quality of life data from bone metastasis patients, collected at 13 consecutive measurement occasions; the use of curves to facilitate substantive interpretation of apparent measurement bias; and assessment of change in common factor means, after accounting for apparent measurement bias. Longitudinal measurement invariance is usually investigated with a longitudinal factor model (LFM). However, with multiple measurement occasions, the number of parameters to be estimated increases with a multiple of the number of measurement occasions. To guard against too low ratios of numbers of subjects and numbers of parameters, we can use Kronecker product restrictions to model the multivariate longitudinal structure of the data. These restrictions can be imposed on all parameter matrices, including measurement invariance restrictions on factor loadings and intercepts. The resulting models are parsimonious and have attractive interpretations, but require different methods for the investigation of measurement bias. Specifically, additional parameter matrices are introduced to accommodate possible violations of measurement invariance. These additional matrices consist of measurement bias parameters that are either fixed at zero or free to be estimated. In cases of measurement bias, it is also possible to model the bias over time, e.g., with linear or non-linear curves. Measurement bias detection with Kronecker product restricted models will be illustrated with multivariate longitudinal data from 682 bone metastasis patients whose health-related quality of life (HRQL) was measured at 13 consecutive weeks.

  7. Multivariate quantile mapping bias correction: an N-dimensional probability density function transform for climate model simulations of multiple variables

    NASA Astrophysics Data System (ADS)

    Cannon, Alex J.

    2018-01-01

    Most bias correction algorithms used in climatology, for example quantile mapping, are applied to univariate time series. They neglect the dependence between different variables. Those that are multivariate often correct only limited measures of joint dependence, such as Pearson or Spearman rank correlation. Here, an image processing technique designed to transfer colour information from one image to another—the N-dimensional probability density function transform—is adapted for use as a multivariate bias correction algorithm (MBCn) for climate model projections/predictions of multiple climate variables. MBCn is a multivariate generalization of quantile mapping that transfers all aspects of an observed continuous multivariate distribution to the corresponding multivariate distribution of variables from a climate model. When applied to climate model projections, changes in quantiles of each variable between the historical and projection period are also preserved. The MBCn algorithm is demonstrated on three case studies. First, the method is applied to an image processing example with characteristics that mimic a climate projection problem. Second, MBCn is used to correct a suite of 3-hourly surface meteorological variables from the Canadian Centre for Climate Modelling and Analysis Regional Climate Model (CanRCM4) across a North American domain. Components of the Canadian Forest Fire Weather Index (FWI) System, a complicated set of multivariate indices that characterizes the risk of wildfire, are then calculated and verified against observed values. Third, MBCn is used to correct biases in the spatial dependence structure of CanRCM4 precipitation fields. Results are compared against a univariate quantile mapping algorithm, which neglects the dependence between variables, and two multivariate bias correction algorithms, each of which corrects a different form of inter-variable correlation structure. 
MBCn outperforms these alternatives, often by a large margin, particularly for annual maxima of the FWI distribution and spatiotemporal autocorrelation of precipitation fields.
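    For contrast, the univariate quantile mapping baseline that MBCn generalizes can be sketched in a few lines; the gamma-distributed toy series below stand in for precipitation-like data and are purely illustrative:

```python
import numpy as np

def quantile_map(model_hist, obs, model_out):
    # empirical quantile mapping: replace each model value with the observed
    # value at the same quantile of the historical model distribution
    qs = np.linspace(0, 1, 101)
    return np.interp(model_out, np.quantile(model_hist, qs), np.quantile(obs, qs))

rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 2.0, 1000)          # "observed" precipitation-like series
sim = rng.gamma(2.0, 2.0, 1000) + 1.0    # model with a constant wet bias
corrected = quantile_map(sim, obs, sim)
print(abs(sim.mean() - obs.mean()), abs(corrected.mean() - obs.mean()))
```

    This corrects each variable's marginal distribution but, applied variable by variable, says nothing about inter-variable or inter-site dependence, which is exactly the gap MBCn addresses.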

  8. Importance of Preserving Cross-correlation in developing Statistically Downscaled Climate Forcings and in estimating Land-surface Fluxes and States

    NASA Astrophysics Data System (ADS)

    Das Bhowmik, R.; Arumugam, S.

    2015-12-01

    Multivariate downscaling techniques have exhibited superiority over univariate regression schemes in preserving cross-correlations between multiple variables - precipitation and temperature - from GCMs. This study focuses on two aspects: (a) developing an analytical solution for estimating biases in cross-correlations from univariate downscaling approaches and (b) quantifying the uncertainty in land-surface states and fluxes due to biases in cross-correlations in downscaled climate forcings. Both aspects are evaluated using climate forcings available from historical climate simulations and CMIP5 hindcasts over the entire US. The analytical solution relates the univariate regression parameters, the coefficient of determination of the regression, and the covariance ratio between GCM and downscaled values. The analytical solutions are compared with downscaled univariate forcings by choosing the desired p-value (Type-1 error) for preserving the observed cross-correlation. To quantify the impact of biases in cross-correlation on estimates of streamflow and groundwater, we corrupt the downscaled climate forcings with different cross-correlation structures.
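    The core problem, that independently applied univariate corrections leave the inter-variable correlation at the model's biased value, can be demonstrated with a toy example; the correlation values and Gaussian marginals below are assumptions for illustration only:

```python
import numpy as np

def qmap(sim, obs):
    # univariate empirical quantile mapping of sim onto obs
    qs = np.linspace(0, 1, 101)
    return np.interp(sim, np.quantile(sim, qs), np.quantile(obs, qs))

rng = np.random.default_rng(2)
n = 5000
obs = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], n)  # obs T-P corr 0.6
sim = rng.multivariate_normal([1, 2], [[1.0, 0.1], [0.1, 1.0]], n)  # model corr only 0.1
t_corr = qmap(sim[:, 0], obs[:, 0])   # correct each variable separately
p_corr = qmap(sim[:, 1], obs[:, 1])
# marginal biases are removed, but the cross-correlation stays near the
# model's 0.1 rather than the observed 0.6
print(np.corrcoef(t_corr, p_corr)[0, 1])
```

    Because quantile mapping is monotone in each variable, it preserves the model's rank dependence exactly, which is the bias the authors' analytical solution quantifies.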

  9. Bias-Free Chemically Diverse Test Sets from Machine Learning.

    PubMed

    Swann, Ellen T; Fernandez, Michael; Coote, Michelle L; Barnard, Amanda S

    2017-08-14

    Current benchmarking methods in quantum chemistry rely on databases that are built using a chemist's intuition. It is not fully understood how diverse or representative these databases truly are. Multivariate statistical techniques like archetypal analysis and K-means clustering have previously been used to summarize large sets of nanoparticles; however, molecules are more diverse and not as easily characterized by descriptors. In this work, we compare three sets of descriptors based on the one-, two-, and three-dimensional structure of a molecule. Using data from the NIST Computational Chemistry Comparison and Benchmark Database and machine learning techniques, we demonstrate the functional relationship between these structural descriptors and the electronic energy of molecules. Archetypes and prototypes found with topological or Coulomb matrix descriptors can be used to identify smaller, statistically significant test sets that better capture the diversity of chemical space. We apply this same method to find a diverse subset of organic molecules to demonstrate how the methods can easily be reapplied to individual research projects. Finally, we use our bias-free test sets to assess the performance of density functional theory and quantum Monte Carlo methods.
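    A prototype-based subset selection of the kind described can be sketched with a small k-means routine; the random descriptor vectors below are stand-ins for real molecular descriptors, and the routine is a generic illustration rather than the authors' pipeline:

```python
import numpy as np

def kmeans_prototypes(X, k, iters=50, seed=0):
    # plain k-means, then return the index of the sample nearest each
    # centroid: a "prototype" test set intended to span descriptor space
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    d = np.linalg.norm(X[:, None] - centers[None], axis=-1)
    return np.unique(d.argmin(axis=0))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))       # stand-in for molecular descriptor vectors
subset = kmeans_prototypes(X, 10)   # indices of a diverse 10-molecule test set
print(subset)
```

    Archetypal analysis differs in selecting extreme points on the convex hull of the data rather than cluster centers, but the subset-selection idea is the same.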

  10. Medical School Experiences Associated with Change in Implicit Racial Bias Among 3547 Students: A Medical Student CHANGES Study Report.

    PubMed

    van Ryn, Michelle; Hardeman, Rachel; Phelan, Sean M; Burgess, Diana J; Dovidio, John F; Herrin, Jeph; Burke, Sara E; Nelson, David B; Perry, Sylvia; Yeazel, Mark; Przedworski, Julia M

    2015-12-01

    Physician implicit (unconscious, automatic) bias has been shown to contribute to racial disparities in medical care. The impact of medical education on implicit racial bias is unknown. To examine the association between change in student implicit racial bias towards African Americans and student reports on their experiences with 1) formal curricula related to disparities in health and health care, cultural competence, and/or minority health; 2) informal curricula including racial climate and role model behavior; and 3) the amount and favorability of interracial contact during school. Prospective observational study involving Web-based questionnaires administered during first (2010) and last (2014) semesters of medical school. A total of 3547 students from a stratified random sample of 49 U.S. medical schools. Change in implicit racial attitudes as assessed by the Black-White Implicit Association Test administered during the first semester and again during the last semester of medical school. In multivariable modeling, having completed the Black-White Implicit Association Test during medical school remained a statistically significant predictor of decreased implicit racial bias (-5.34, p ≤ 0.001: mixed effects regression with random intercept across schools). Students' self-assessed skills regarding providing care to African American patients had a borderline association with decreased implicit racial bias (-2.18, p = 0.056). Having heard negative comments from attending physicians or residents about African American patients (3.17, p = 0.026) and having had unfavorable vs. very favorable contact with African American physicians (18.79, p = 0.003) were statistically significant predictors of increased implicit racial bias. Medical school experiences in all three domains were independently associated with change in student implicit racial attitudes. 
These findings are notable given that even small differences in implicit racial attitudes have been shown to affect behavior and that implicit attitudes are developed over a long period of repeated exposure and are difficult to change.

  11. Bias and Precision of Measures of Association for a Fixed-Effect Multivariate Analysis of Variance Model

    ERIC Educational Resources Information Center

    Kim, Soyoung; Olejnik, Stephen

    2005-01-01

    The sampling distributions of five popular measures of association with and without two bias adjusting methods were examined for the single factor fixed-effects multivariate analysis of variance model. The number of groups, sample sizes, number of outcomes, and the strength of association were manipulated. The results indicate that all five…

  12. Accounting for Sampling Error in Genetic Eigenvalues Using Random Matrix Theory.

    PubMed

    Sztepanacz, Jacqueline L; Blows, Mark W

    2017-07-01

    The distribution of genetic variance in multivariate phenotypes is characterized by the empirical spectral distribution of the eigenvalues of the genetic covariance matrix. Empirical estimates of genetic eigenvalues from random effects linear models are known to be overdispersed by sampling error, where large eigenvalues are biased upward and small eigenvalues are biased downward. The overdispersion of the leading eigenvalues of sample covariance matrices has been demonstrated to conform to the Tracy-Widom (TW) distribution. Here we show that genetic eigenvalues estimated using restricted maximum likelihood (REML) in a multivariate random effects model with an unconstrained genetic covariance structure will also conform to the TW distribution after empirical scaling and centering. However, where estimation procedures using either REML or MCMC impose boundary constraints, the resulting genetic eigenvalues tend not to be TW distributed. We show how using confidence intervals from sampling distributions of genetic eigenvalues without reference to the TW distribution is insufficient protection against mistaking sampling error for genetic variance, particularly when eigenvalues are small. By scaling such sampling distributions to the appropriate TW distribution, the critical value of the TW statistic can be used to determine whether the magnitude of a genetic eigenvalue exceeds the sampling error for each eigenvalue in the spectral distribution of a given genetic covariance matrix. Copyright © 2017 by the Genetics Society of America.
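    The overdispersion of sample eigenvalues is easy to demonstrate: even when every true eigenvalue is exactly 1, the estimated spectrum spreads. A minimal sketch (ordinary sample covariance, not a REML genetic covariance):

```python
import numpy as np

# true covariance is the identity, so every true eigenvalue is exactly 1;
# sampling error alone spreads the estimated spectrum
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.normal(size=(n, p))
eig = np.linalg.eigvalsh(np.cov(X, rowvar=False))
print(eig.min(), eig.max())   # smallest biased below 1, largest above 1
```

    Here all apparent structure in the spectrum is sampling error, which is exactly what the TW-based test is designed to flag.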

  13. Stability and bias of classification rates in biological applications of discriminant analysis

    USGS Publications Warehouse

    Williams, B.K.; Titus, K.; Hines, J.E.

    1990-01-01

    We assessed the sampling stability of classification rates in discriminant analysis by using a factorial design with factors for multivariate dimensionality, dispersion structure, configuration of group means, and sample size. A total of 32,400 discriminant analyses were conducted, based on data from simulated populations with appropriate underlying statistical distributions. Simulation results indicated strong bias in correct classification rates when group sample sizes were small and when overlap among groups was high. We also found that the stability of the correct classification rates was influenced by these factors, indicating that the number of samples required for a given level of precision increases with the amount of overlap among groups. In a review of 60 published studies, we found that 57% of the articles presented results on classification rates, though few of them mentioned potential biases in their results. Wildlife researchers should choose the total number of samples per group to be at least 2 times the number of variables to be measured when overlap among groups is low. Substantially more samples are required as the overlap among groups increases.
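    The optimism of resubstitution classification rates with small, overlapping groups can be illustrated with a toy experiment; a nearest-centroid classifier is used here as a simple stand-in for discriminant analysis, and all sizes and separations are invented:

```python
import numpy as np

def centroid_acc(train, y_train, test, y_test):
    # nearest-centroid classification accuracy for two groups (0 and 1)
    cents = np.array([train[y_train == g].mean(axis=0) for g in (0, 1)])
    pred = np.linalg.norm(test[:, None] - cents[None], axis=-1).argmin(axis=1)
    return (pred == y_test).mean()

rng = np.random.default_rng(0)
reps, n, p = 200, 10, 5                   # small groups, 5 variables
y = np.repeat([0, 1], n)
resub, holdout = [], []
for _ in range(reps):
    X = rng.normal(size=(2 * n, p)); X[n:] += 0.5      # heavy group overlap
    Xnew = rng.normal(size=(2 * n, p)); Xnew[n:] += 0.5
    resub.append(centroid_acc(X, y, X, y))             # rate on training data
    holdout.append(centroid_acc(X, y, Xnew, y))        # rate on fresh data
print(np.mean(resub), np.mean(holdout))   # resubstitution rate is optimistic
```

    The gap between the two averages is the bias the abstract warns about; it grows as group sizes shrink and overlap increases.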

  14. Empirical performance of the self-controlled case series design: lessons for developing a risk identification and analysis system.

    PubMed

    Suchard, Marc A; Zorych, Ivan; Simpson, Shawn E; Schuemie, Martijn J; Ryan, Patrick B; Madigan, David

    2013-10-01

    The self-controlled case series (SCCS) offers potential as a statistical method for risk identification involving medical products from large-scale observational healthcare data. However, analytic design choices remain in encoding the longitudinal health records into the SCCS framework, and its risk identification performance across real-world databases is unknown. To evaluate the performance of SCCS and its design choices as a tool for risk identification in observational healthcare data, we examined the risk identification performance of SCCS across five design choices using 399 drug-health outcome pairs in five real observational databases (four administrative claims and one electronic health records). In these databases, the pairs involve 165 positive controls and 234 negative controls. We also consider several synthetic databases with known relative risks between drug-outcome pairs. We evaluate risk identification performance by estimating the area under the receiver-operating characteristic curve (AUC), and bias and coverage probability in the synthetic examples. The SCCS achieves strong predictive performance. Twelve of the twenty health outcome-database scenarios return AUCs >0.75 across all drugs. Including all adverse events instead of just the first per patient and applying a multivariate adjustment for concomitant drug use are the most important design choices. However, the SCCS as applied here returns relative risk point estimates biased towards the null value of 1 with low coverage probability. The SCCS, recently extended to apply a multivariate adjustment for concomitant drug use, offers promise as a statistical tool for risk identification in large-scale observational healthcare databases. Poor estimator calibration dampens enthusiasm, but ongoing work should correct this shortcoming.

  15. Measurement issues in research on social support and health.

    PubMed Central

    Dean, K; Holst, E; Kreiner, S; Schoenborn, C; Wilson, R

    1994-01-01

    STUDY OBJECTIVE--The aims were: (1) to identify methodological problems that may explain the inconsistencies and contradictions in the research evidence on social support and health, and (2) to validate a frequently used measure of social support in order to determine whether or not it could be used in multivariate analyses of population data in research on social support and health. DESIGN AND METHODS--Secondary analysis of data collected in a cross sectional survey of a multistage cluster sample of the population of the United States, designed to study relationships in behavioural, social support and health variables. Statistical models based on item response theory and graph theory were used to validate the measure of social support to be used in subsequent analyses. PARTICIPANTS--Data on 1755 men and women aged 20 to 64 years were available for the scale validation. RESULTS--Massive evidence of item bias was found for all items of a group membership subscale. The most serious problems were found in relationship to an item measuring membership in work related groups. Using that item in the social network scale in multivariate analyses would distort findings on the statistical effects of education, employment status, and household income. Evidence of item bias was also found for a sociability subscale. When marital status was included to create what is called an intimate contacts subscale, the confounding grew worse. CONCLUSIONS--The composite measure of social network is not valid and would seriously distort the findings of analyses attempting to study relationships between the index and other variables. The findings show that valid measurement is a methodological issue that must be addressed in scientific research on population health. PMID:8189179

  16. Quantifying the impact of between-study heterogeneity in multivariate meta-analyses

    PubMed Central

    Jackson, Dan; White, Ian R; Riley, Richard D

    2012-01-01

    Measures that quantify the impact of heterogeneity in univariate meta-analysis, including the very popular I2 statistic, are now well established. Multivariate meta-analysis, where studies provide multiple outcomes that are pooled in a single analysis, is also becoming more commonly used. The question of how to quantify heterogeneity in the multivariate setting is therefore raised. It is the univariate R2 statistic, the ratio of the variance of the estimated treatment effect under the random and fixed effects models, that generalises most naturally, so this statistic provides our basis. This statistic is then used to derive a multivariate analogue of I2, which we call . We also provide a multivariate H2 statistic, the ratio of a generalisation of Cochran's heterogeneity statistic and its associated degrees of freedom, with an accompanying generalisation of the usual I2 statistic, . Our proposed heterogeneity statistics can be used alongside all the usual estimates and inferential procedures used in multivariate meta-analysis. We apply our methods to some real datasets and show how our statistics are equally appropriate in the context of multivariate meta-regression, where study level covariate effects are included in the model. Our heterogeneity statistics may be used when applying any procedure for fitting the multivariate random effects model. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22763950
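
The univariate building blocks this paper generalises (Cochran's Q, its degrees-of-freedom ratio H2 = Q/df, and I2) can be sketched in a few lines; the multivariate statistics replace these scalars with matrix generalisations. A minimal univariate sketch with hypothetical effect sizes and within-study variances:

```python
import math

def cochran_q(effects, variances):
    """Cochran's heterogeneity statistic Q for a univariate meta-analysis."""
    w = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))

def i_squared(effects, variances):
    """I2 = max(0, (Q - df)/Q): share of variability attributable to heterogeneity."""
    q = cochran_q(effects, variances)
    df = len(effects) - 1
    return max(0.0, (q - df) / q) if q > 0 else 0.0

effects = [0.30, 0.10, 0.55, 0.40]     # hypothetical study effect sizes
variances = [0.01, 0.02, 0.015, 0.01]  # hypothetical within-study variances
i2 = i_squared(effects, variances)
```

Here roughly half the observed variability would be attributed to between-study heterogeneity.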

  17. Multivariable regression analysis of list experiment data on abortion: results from a large, randomly-selected population based study in Liberia.

    PubMed

    Moseson, Heidi; Gerdts, Caitlin; Dehlendorf, Christine; Hiatt, Robert A; Vittinghoff, Eric

    2017-12-21

    The list experiment is a promising measurement tool for eliciting truthful responses to stigmatized or sensitive health behaviors. However, investigators may be hesitant to adopt the method due to previously untestable assumptions and the perceived inability to conduct multivariable analysis. With a recently developed statistical test that can detect the presence of a design effect - the absence of which is a central assumption of the list experiment method - we sought to test the validity of a list experiment conducted on self-reported abortion in Liberia. We also aim to introduce recently developed multivariable regression estimators for the analysis of list experiment data, to explore relationships between respondent characteristics and having had an abortion - an important component of understanding the experiences of women who have abortions. To test the null hypothesis of no design effect in the Liberian list experiment data, we calculated the percentage of each respondent "type," characterized by response to the control items, and compared these percentages across treatment and control groups with a Bonferroni-adjusted alpha criterion. We then implemented two least squares and two maximum likelihood models (four total), each representing different bias-variance trade-offs, to estimate the association between respondent characteristics and abortion. We find no clear evidence of a design effect in list experiment data from Liberia (p = 0.18), affirming the first key assumption of the method. Multivariable analyses suggest a negative association between education and history of abortion. The retrospective nature of measuring lifetime experience of abortion, however, complicates interpretation of results, as the timing and safety of a respondent's abortion may have influenced her ability to pursue an education. 
Our work demonstrates that multivariable analyses, as well as statistical testing of a key design assumption, are possible with list experiment data, although with important limitations when considering lifetime measures. We outline how to implement this methodology with list experiment data in future research.
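
The basic list-experiment estimator underlying these models is the difference in mean item counts between the treatment arm (control items plus the sensitive item) and the control arm. A minimal simulation sketch; all parameters are hypothetical, not the Liberian data:

```python
import random
import statistics

random.seed(7)

def simulate_list_experiment(n_per_arm, p_sensitive=0.25, n_control_items=3):
    """Control arm counts 'yes' answers to J control items; treatment arm
    counts the same items plus one sensitive item (hypothetical rates)."""
    control = [sum(random.random() < 0.5 for _ in range(n_control_items))
               for _ in range(n_per_arm)]
    treatment = [sum(random.random() < 0.5 for _ in range(n_control_items))
                 + (random.random() < p_sensitive)
                 for _ in range(n_per_arm)]
    return control, treatment

def prevalence_estimate(control, treatment):
    """Difference in mean counts estimates the sensitive-behavior prevalence."""
    return statistics.mean(treatment) - statistics.mean(control)

control, treatment = simulate_list_experiment(5000)
est = prevalence_estimate(control, treatment)
```

With 5,000 respondents per arm the estimate lands close to the simulated 25% prevalence; the multivariable estimators in the paper extend this difference-in-means logic with covariates.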

  18. Small Sample Properties of Bayesian Multivariate Autoregressive Time Series Models

    ERIC Educational Resources Information Center

    Price, Larry R.

    2012-01-01

    The aim of this study was to compare the small-sample (N = 1, 3, 5, 10, 15) performance of a Bayesian multivariate vector autoregressive (BVAR-SEM) time series model relative to frequentist power and parameter estimation bias. A multivariate autoregressive model was developed based on correlated autoregressive time series vectors of varying…

  19. Analysis of Observational Studies in the Presence of Treatment Selection Bias: Effects of Invasive Cardiac Management on AMI Survival Using Propensity Score and Instrumental Variable Methods

    PubMed Central

    Stukel, Thérèse A.; Fisher, Elliott S; Wennberg, David E.; Alter, David A.; Gottlieb, Daniel J.; Vermeulen, Marian J.

    2007-01-01

    Context Comparisons of outcomes between patients treated and untreated in observational studies may be biased due to differences in patient prognosis between groups, often because of unobserved treatment selection biases. Objective To compare 4 analytic methods for removing the effects of selection bias in observational studies: multivariable model risk adjustment, propensity score risk adjustment, propensity-based matching, and instrumental variable analysis. Design, Setting, and Patients A national cohort of 122 124 patients who were elderly (aged 65–84 years), receiving Medicare, and hospitalized with acute myocardial infarction (AMI) in 1994–1995, and who were eligible for cardiac catheterization. Baseline chart reviews were taken from the Cooperative Cardiovascular Project and linked to Medicare health administrative data to provide a rich set of prognostic variables. Patients were followed up for 7 years through December 31, 2001, to assess the association between long-term survival and cardiac catheterization within 30 days of hospital admission. Main Outcome Measure Risk-adjusted relative mortality rate using each of the analytic methods. Results Patients who received cardiac catheterization (n=73 238) were younger and had lower AMI severity than those who did not. After adjustment for prognostic factors by using standard statistical risk-adjustment methods, cardiac catheterization was associated with a 50% relative decrease in mortality (for multivariable model risk adjustment: adjusted relative risk [RR], 0.51; 95% confidence interval [CI], 0.50–0.52; for propensity score risk adjustment: adjusted RR, 0.54; 95% CI, 0.53–0.55; and for propensity-based matching: adjusted RR, 0.54; 95% CI, 0.52–0.56). Using regional catheterization rate as an instrument, instrumental variable analysis showed a 16% relative decrease in mortality (adjusted RR, 0.84; 95% CI, 0.79–0.90). 
The survival benefits of routine invasive care from randomized clinical trials are between 8% and 21%. Conclusions Estimates of the observational association of cardiac catheterization with long-term AMI mortality are highly sensitive to analytic method. All standard risk-adjustment methods have the same limitations regarding removal of unmeasured treatment selection biases. Compared with standard modeling, instrumental variable analysis may produce less biased estimates of treatment effects, but is more suited to answering policy questions than specific clinical questions. PMID:17227979
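
The contrast the study draws between a naive treated-vs-untreated comparison and instrumental variable analysis can be illustrated with a toy simulation (not the CCP data): an unmeasured severity variable confounds the naive contrast, while a Wald-type IV estimator built on a regional-intensity instrument lands near the true effect. All parameters below are hypothetical:

```python
import random

random.seed(3)

# Unmeasured severity u lowers both the chance of treatment t and outcome y;
# region z (the instrument) shifts t but does not enter y directly.
data = []
for _ in range(20000):
    z = random.random() < 0.5                      # high- vs low-intensity region
    u = random.gauss(0, 1)                         # unmeasured severity
    t = (0.3 + 0.2 * z - 0.15 * u + random.gauss(0, 0.3)) > 0.35
    y = 0.1 * t - 0.3 * u + random.gauss(0, 0.5)   # true treatment effect = 0.1
    data.append((z, t, y))

def mean(vals):
    vals = list(vals)
    return sum(vals) / len(vals)

# Naive treated-vs-untreated contrast absorbs the selection bias
naive = mean(y for z, t, y in data if t) - mean(y for z, t, y in data if not t)

# Wald / instrumental-variable estimator: reduced form over first stage
iv = ((mean(y for z, t, y in data if z) - mean(y for z, t, y in data if not z))
      / (mean(t for z, t, y in data if z) - mean(t for z, t, y in data if not z)))
```

The naive contrast overstates the benefit substantially, mirroring the gap the paper reports between risk-adjusted and IV estimates.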

  20. On using summary statistics from an external calibration sample to correct for covariate measurement error.

    PubMed

    Guo, Ying; Little, Roderick J; McConnell, Daniel S

    2012-01-01

    Covariate measurement error is common in epidemiologic studies. Current methods for correcting measurement error with information from external calibration samples are insufficient to provide valid adjusted inferences. We consider the problem of estimating the regression of an outcome Y on covariates X and Z, where Y and Z are observed, X is unobserved, but a variable W that measures X with error is observed. Information about measurement error is provided in an external calibration sample where data on X and W (but not Y and Z) are recorded. We describe a method that uses summary statistics from the calibration sample to create multiple imputations of the missing values of X in the regression sample, so that the regression coefficients of Y on X and Z and associated standard errors can be estimated using simple multiple imputation combining rules, yielding valid statistical inferences under the assumption of a multivariate normal distribution. The proposed method is shown by simulation to provide better inferences than existing methods, namely the naive method, classical calibration, and regression calibration, particularly for correction for bias and achieving nominal confidence levels. We also illustrate our method with an example using linear regression to examine the relation between serum reproductive hormone concentrations and bone mineral density loss in midlife women in the Michigan Bone Health and Metabolism Study. Existing methods fail to adjust appropriately for bias due to measurement error in the regression setting, particularly when measurement error is substantial. The proposed method corrects this deficiency.
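
One of the comparator methods, regression calibration, can be sketched compactly: the external calibration sample supplies the slope of X on W, which rescales the attenuated naive coefficient. This is a toy illustration of that comparator (not the proposed multiple-imputation procedure), with hypothetical simulated data:

```python
import random

random.seed(1)

# Main sample: outcome y depends on true covariate x (slope 2), but only the
# error-prone surrogate w = x + u is recorded. The external calibration
# sample records (x, w) pairs only.
x_main = [random.gauss(0, 1) for _ in range(2000)]
w_main = [x + random.gauss(0, 1) for x in x_main]
y_main = [2 * x + random.gauss(0, 0.5) for x in x_main]
x_cal = [random.gauss(0, 1) for _ in range(500)]
w_cal = [x + random.gauss(0, 1) for x in x_cal]

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))

naive = slope(w_main, y_main)   # attenuated towards zero
lam = slope(w_cal, x_cal)       # calibration slope of X on W, external sample
corrected = naive / lam         # regression-calibration style correction
```

The naive slope sits near half the true value (the reliability ratio here is 0.5), and dividing by the calibration slope restores it, which is the bias the paper's method also targets.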

  1. Majority of systematic reviews published in high-impact journals neglected to register the protocols: a meta-epidemiological study.

    PubMed

    Tsujimoto, Yasushi; Tsujimoto, Hiraku; Kataoka, Yuki; Kimachi, Miho; Shimizu, Sayaka; Ikenoue, Tatsuyoshi; Fukuma, Shingo; Yamamoto, Yosuke; Fukuhara, Shunichi

    2017-04-01

    To describe the registration of systematic review (SR) protocols and examine whether or not registration reduced outcome reporting bias in high-impact journals. We searched MEDLINE via PubMed to identify SRs of randomized controlled trials of interventions. We included SRs published between August 2009 and June 2015 in the 10 general and internal medicine journals with the highest impact factors in 2013. We examined the proportion of SR protocol registration and investigated the relationship between registration and outcome reporting bias using multivariable logistic regression. Among the 284 included reviews, 60 (21%) protocols were registered. The proportion of registration increased from 5.6% in 2009 to 27% in 2015 (P for trend <0.001). Protocol registration was not associated with outcome reporting bias (adjusted odds ratio [OR] 0.85, 95% confidence interval [CI] 0.39-1.86). The association between Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) adherence and protocol registration was not statistically significant (OR 1.09, 95% CI 0.59-2.01). Six years after the launch of the PRISMA statement, the proportion of protocol registration in high-impact journals has increased somewhat but remains low. The present study found no evidence suggesting that protocol registration reduced outcome reporting bias. Copyright © 2017 Elsevier Inc. All rights reserved.

  2. Causal Inference and Omitted Variable Bias in Financial Aid Research: Assessing Solutions

    ERIC Educational Resources Information Center

    Riegg, Stephanie K.

    2008-01-01

    This article highlights the problem of omitted variable bias in research on the causal effect of financial aid on college-going. I first describe the problem of self-selection and the resulting bias from omitted variables. I then assess and explore the strengths and weaknesses of random assignment, multivariate regression, proxy variables, fixed…

  3. A question of separation: disentangling tracer bias and gravitational non-linearity with counts-in-cells statistics

    NASA Astrophysics Data System (ADS)

    Uhlemann, C.; Feix, M.; Codis, S.; Pichon, C.; Bernardeau, F.; L'Huillier, B.; Kim, J.; Hong, S. E.; Laigle, C.; Park, C.; Shin, J.; Pogosyan, D.

    2018-02-01

    Starting from a very accurate model for density-in-cells statistics of dark matter based on large deviation theory, a bias model for the tracer density in spheres is formulated. It adopts a mean bias relation based on a quadratic bias model to relate the log-densities of dark matter to those of mass-weighted dark haloes in real and redshift space. The validity of the parametrized bias model is established using a parametrization-independent extraction of the bias function. This average bias model is then combined with the dark matter PDF, neglecting any scatter around it: it nevertheless yields an excellent model for densities-in-cells statistics of mass tracers that is parametrized in terms of the underlying dark matter variance and three bias parameters. The procedure is validated on measurements of both the one- and two-point statistics of subhalo densities in the state-of-the-art Horizon Run 4 simulation showing excellent agreement for measured dark matter variance and bias parameters. Finally, it is demonstrated that this formalism allows for a joint estimation of the non-linear dark matter variance and the bias parameters using solely the statistics of subhaloes. Having verified that galaxy counts in hydrodynamical simulations sampled on a scale of 10 Mpc h-1 closely resemble those of subhaloes, this work provides important steps towards making theoretical predictions for density-in-cells statistics applicable to upcoming galaxy surveys like Euclid or WFIRST.

  4. Prognostic value of stromal decorin expression in patients with breast cancer: a meta-analysis.

    PubMed

    Li, Shuang-Jiang; Chen, Da-Li; Zhang, Wen-Biao; Shen, Cheng; Che, Guo-Wei

    2015-11-01

    A number of studies have investigated the biological functions of decorin (DCN) in oncogenesis, tumor progression, angiogenesis and metastasis. Although many of them aim to highlight the prognostic value of stromal DCN expression in breast cancer, controversial results persist and no consensus has been reached. Therefore, our meta-analysis aims to determine the prognostic significance of stromal DCN expression in breast cancer patients. PubMed, EMBASE, the Web of Science and China National Knowledge Infrastructure (CNKI) databases were searched for full-text articles that met our inclusion criteria. We applied the hazard ratio (HR) with 95% confidence interval (CI) as the summary statistic. The Q-test and I(2) statistic were employed to estimate the level of heterogeneity across the included studies. Sensitivity analysis was conducted to further identify possible origins of heterogeneity. Publication bias was detected by Begg's test and Egger's test. Three English-language articles (involving 6 studies) were included in our meta-analysis. On the one hand, both the summarized outcomes based on univariate analysis (HR: 0.513; 95% CI: 0.406-0.648; P<0.001) and multivariate analysis (HR: 0.544; 95% CI: 0.388-0.763; P<0.001) indicated that stromal DCN expression predicted high cancer-specific survival (CSS) in breast cancer patients. On the other hand, both the summarized outcomes based on univariate analysis (HR: 0.504; 95% CI: 0.389-0.651; P<0.001) and multivariate analysis (HR: 0.568; 95% CI: 0.400-0.806; P=0.002) also indicated that stromal DCN expression was positively associated with high disease-free survival (DFS) in breast cancer patients. No significant heterogeneity or publication bias was observed within this meta-analysis. The present evidence indicates that high stromal DCN expression significantly predicts good prognosis in patients with breast cancer. 
These findings should be confirmed in updated reviews pooling more relevant investigations in the future.
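
The univariate pooling step behind summaries like these (inverse-variance weighting of log hazard ratios, with standard errors recovered from the reported 95% CIs) can be sketched as follows; the HRs and CIs below are illustrative, not the decorin data:

```python
import math

def pool_log_hr(hrs, cis):
    """Fixed-effect inverse-variance pooling of hazard ratios given 95% CIs.
    SE of log(HR) is recovered as (log upper - log lower) / (2 * 1.96)."""
    logs = [math.log(h) for h in hrs]
    ses = [(math.log(u) - math.log(l)) / (2 * 1.96) for l, u in cis]
    w = [1.0 / s ** 2 for s in ses]
    pooled = sum(wi * li for wi, li in zip(w, logs)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return (math.exp(pooled),
            (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)))

# Hypothetical study-level HRs and 95% CIs
hr, ci = pool_log_hr([0.51, 0.54, 0.60],
                     [(0.40, 0.65), (0.39, 0.76), (0.45, 0.80)])
```

A random-effects version would widen the interval by adding an estimated between-study variance to each weight's denominator.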

  5. Evaluation of direct and indirect ethanol biomarkers using a likelihood ratio approach to identify chronic alcohol abusers for forensic purposes.

    PubMed

    Alladio, Eugenio; Martyna, Agnieszka; Salomone, Alberto; Pirro, Valentina; Vincenti, Marco; Zadora, Grzegorz

    2017-02-01

    The detection of direct ethanol metabolites, such as ethyl glucuronide (EtG) and fatty acid ethyl esters (FAEEs), in scalp hair is considered the optimal strategy to effectively recognize chronic alcohol misuse by means of specific cut-offs suggested by the Society of Hair Testing. However, several factors (e.g. hair treatments) may alter the correlation between alcohol intake and biomarker concentrations, possibly introducing bias into the interpretative process and conclusions. 125 subjects with various drinking habits were subjected to blood and hair sampling to determine indirect (e.g. CDT) and direct alcohol biomarkers. The overall data were investigated using several multivariate statistical methods. A likelihood ratio (LR) approach was used for the first time to provide predictive models for the diagnosis of alcohol abuse, based on different combinations of direct and indirect alcohol biomarkers. LR strategies provide a more robust outcome than the plain comparison with cut-off values, where tiny changes in the analytical results can lead to dramatic divergence in the way they are interpreted. An LR model combining EtG and FAEEs hair concentrations proved to discriminate non-chronic from chronic consumers with ideal correct classification rates, whereas the contribution of indirect biomarkers proved to be negligible. Optimal results were observed using a novel approach that associates LR methods with multivariate statistics. In particular, the combination of the LR approach with either Principal Component Analysis (PCA) or Linear Discriminant Analysis (LDA) proved successful in discriminating chronic from non-chronic alcohol drinkers. These LR models were subsequently tested on an independent dataset of 43 individuals, which confirmed their high efficiency. These models proved to be less prone to bias than EtG and FAEEs independently considered. 
In conclusion, LR models may represent an efficient strategy to sustain the diagnosis of chronic alcohol consumption and provide a suitable gradation to support the judgment. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
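
The core of such an LR approach is the ratio of the likelihoods of a measured biomarker value under the chronic and non-chronic populations. A minimal two-Gaussian sketch on a hypothetical log-concentration scale (the distribution parameters are illustrative, not this study's fitted models):

```python
import math

def gaussian_lr(x, mu_pos, sd_pos, mu_neg, sd_neg):
    """Likelihood ratio for one observation x: chronic (pos) vs non-chronic (neg)
    Gaussian models. LR > 1 supports the chronic-consumption hypothesis."""
    def pdf(v, mu, sd):
        return (math.exp(-((v - mu) ** 2) / (2 * sd ** 2))
                / (sd * math.sqrt(2 * math.pi)))
    return pdf(x, mu_pos, sd_pos) / pdf(x, mu_neg, sd_neg)

# Hypothetical log10 hair-EtG distributions (pg/mg) for the two populations
params = dict(mu_pos=1.9, sd_pos=0.4, mu_neg=0.8, sd_neg=0.4)
lr_high = gaussian_lr(2.0, **params)   # value typical of chronic consumers
lr_low = gaussian_lr(0.7, **params)    # value typical of non-chronic consumers
```

Unlike a hard cut-off, the LR degrades gracefully: values near the overlap of the two distributions yield LRs near 1 rather than flipping the conclusion.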

  6. Can statistical linkage of missing variables reduce bias in treatment effect estimates in comparative effectiveness research studies?

    PubMed

    Crown, William; Chang, Jessica; Olson, Melvin; Kahler, Kristijan; Swindle, Jason; Buzinec, Paul; Shah, Nilay; Borah, Bijan

    2015-09-01

    Missing data, particularly missing variables, can create serious analytic challenges in observational comparative effectiveness research studies. Statistical linkage of datasets is a potential method for incorporating missing variables. Prior studies have focused upon the bias introduced by imperfect linkage. This analysis uses a case study of hepatitis C patients to estimate the net effect of statistical linkage on bias, also accounting for the potential reduction in missing variable bias. The results show that statistical linkage can reduce bias while also enabling parameter estimates to be obtained for the formerly missing variables. The usefulness of statistical linkage will vary depending upon the strength of the correlations of the missing variables with the treatment variable, as well as the outcome variable of interest.

  7. Relative risk estimates from spatial and space-time scan statistics: Are they biased?

    PubMed Central

    Prates, Marcos O.; Kulldorff, Martin; Assunção, Renato M.

    2014-01-01

    The purely spatial and space-time scan statistics have been successfully used by many scientists to detect and evaluate geographical disease clusters. Although the scan statistic has high power in correctly identifying a cluster, no study has considered the estimates of the cluster relative risk in the detected cluster. In this paper we evaluate whether there is any bias in these estimated relative risks. Intuitively, one may expect the estimated relative risks to have an upward bias, since the scan statistic cherry-picks high-rate areas to include in the cluster. We show that this intuition is correct for clusters with low statistical power, but with medium to high power the bias becomes negligible. The same behaviour is not observed for the prospective space-time scan statistic, where there is a conservative downward bias of the relative risk that increases as the power to detect the cluster increases. PMID:24639031
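
Kulldorff's Poisson likelihood-ratio statistic, and the in-cluster relative-risk estimate whose bias the paper studies, can be sketched for a single candidate cluster as follows (the counts are illustrative only; the full scan statistic maximizes this quantity over many candidate windows):

```python
import math

def scan_llr(c_in, n_in, c_tot, n_tot):
    """Poisson log-likelihood ratio for one candidate cluster: c_in cases out
    of n_in population inside, against c_tot cases in n_tot overall."""
    e_in = n_in / n_tot * c_tot          # expected cases inside the window
    c_out, e_out = c_tot - c_in, c_tot - e_in
    if c_in <= e_in:                     # only high-rate windows are scored
        return 0.0
    return c_in * math.log(c_in / e_in) + c_out * math.log(c_out / e_out)

def cluster_relative_risk(c_in, n_in, c_tot, n_tot):
    """Naive in-cluster relative-risk estimate (the quantity whose bias is studied)."""
    rate_in = c_in / n_in
    rate_out = (c_tot - c_in) / (n_tot - n_in)
    return rate_in / rate_out

# 30 cases among 1,000 people inside vs 100 cases among 10,000 overall
llr = scan_llr(30, 1000, 100, 10000)
rr = cluster_relative_risk(30, 1000, 100, 10000)
```

Because only windows with elevated rates get scored, conditioning on detection is exactly what induces the upward bias examined in the paper.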

  8. Sampling bias in an internet treatment trial for depression.

    PubMed

    Donkin, L; Hickie, I B; Christensen, H; Naismith, S L; Neal, B; Cockayne, N L; Glozier, N

    2012-10-23

    Internet psychological interventions are efficacious and may reduce traditional access barriers. No studies have evaluated whether any sampling bias exists in these trials that may limit the translation of the results of these trials into real-world application. We identified 7999 potentially eligible trial participants from a community-based health cohort study and invited them to participate in a randomized controlled trial of an online cognitive behavioural therapy programme for people with depression. We compared those who consented to being assessed for trial inclusion with nonconsenters on demographic, clinical and behavioural indicators captured in the health study. Any potentially biasing factors were then assessed for their association with depression outcome among trial participants to evaluate the existence of sampling bias. Of the 35 health survey variables explored, only 4 were independently associated with a higher likelihood of consenting: female sex (odds ratio (OR) 1.11, 95% confidence interval (CI) 1.05-1.19), speaking English at home (OR 1.48, 95% CI 1.15-1.90), higher education (OR 1.67, 95% CI 1.46-1.92) and a prior diagnosis of depression (OR 1.37, 95% CI 1.22-1.55). The multivariate model accounted for limited variance (C-statistic 0.6) in explaining participation. These four factors were not significantly associated with either the primary trial outcome measure or any differential impact by intervention arm. This demonstrates that, among eligible trial participants, few factors were associated with the consent to participate. There was no indication that such self-selection biased the trial results or would limit the generalizability and translation into a public or clinical setting.
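
The C-statistic quoted for the participation model is the probability that a randomly chosen consenter receives a higher predicted score than a randomly chosen non-consenter (ties count one half). A minimal sketch with hypothetical model scores:

```python
def c_statistic(scores_pos, scores_neg):
    """Concordance (C-statistic / AUC) from predicted scores of the positive
    class (e.g. consenters) and the negative class (non-consenters)."""
    wins = ties = 0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1
            elif sp == sn:
                ties += 1
    return (wins + 0.5 * ties) / (len(scores_pos) * len(scores_neg))

# Hypothetical predicted probabilities of consenting
auc = c_statistic([0.9, 0.8, 0.6, 0.55], [0.7, 0.5, 0.4, 0.3])
```

A value of 0.6, as reported, sits close to the 0.5 of a no-information model, which is why the authors describe the explained variance as limited.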

  9. Validating a Local Failure Risk Stratification for Use in Prospective Studies of Adjuvant Radiation Therapy for Bladder Cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baumann, Brian C.; He, Jiwei; Hwang, Wei-Ting

    Purpose: To inform prospective trials of adjuvant radiation therapy (adj-RT) for bladder cancer after radical cystectomy, a locoregional failure (LF) risk stratification was proposed. This stratification was developed and validated using surgical databases that may not reflect the outcomes expected in prospective trials. Our purpose was to assess sources of bias that may affect the stratification model's validity or alter the LF risk estimates for each subgroup: time bias due to evolving surgical techniques; trial accrual bias due to inclusion of patients who would be ineligible for adj-RT trials because of early disease progression, death, or loss to follow-up shortly after cystectomy; bias due to different statistical methods to estimate LF; and subgrouping bias due to different definitions of the LF subgroups. Methods and Materials: The LF risk stratification was developed using a single-institution cohort (n=442, 1990-2008) and the multi-institutional SWOG 8710 cohort (n=264, 1987-1998) treated with radical cystectomy with or without chemotherapy. We evaluated the sensitivity of the stratification to sources of bias using Fine-Gray regression and Kaplan-Meier analyses. Results: Year of radical cystectomy was not associated with LF risk on univariate or multivariate analysis after controlling for risk group. By use of more stringent inclusion criteria, 26 SWOG patients (10%) and 60 patients from the single-institution cohort (14%) were excluded. Analysis of the remaining patients confirmed 3 subgroups with significantly different LF risks with 3-year rates of 7%, 17%, and 36%, respectively (P<.01), nearly identical to the rates without correcting for trial accrual bias. Kaplan-Meier techniques estimated higher subgroup LF rates than competing risk analysis. The subgroup definitions used in the NRG-GU001 adj-RT trial were validated. 
Conclusions: These sources of bias did not invalidate the LF risk stratification or substantially change the model's LF estimates.

  10. Identification of robust statistical downscaling methods based on a comprehensive suite of performance metrics for South Korea

    NASA Astrophysics Data System (ADS)

    Eum, H. I.; Cannon, A. J.

    2015-12-01

    Climate models are a key tool for investigating the impacts of projected future climate conditions on regional hydrologic systems. However, there is a considerable mismatch in spatial resolution between GCMs and regional applications, in particular for regions characterized by complex terrain such as the Korean Peninsula. A downscaling procedure is therefore essential for assessing regional impacts of climate change. Numerous statistical downscaling methods have been used, mainly due to their computational efficiency and simplicity. In this study, four statistical downscaling methods [Bias-Correction/Spatial Disaggregation (BCSD), Bias-Correction/Constructed Analogue (BCCA), Multivariate Adaptive Constructed Analogs (MACA), and Bias-Correction/Climate Imprint (BCCI)] are applied to downscale the latest Climate Forecast System Reanalysis data to stations for precipitation, maximum temperature, and minimum temperature over South Korea. Using a split-sampling scheme, all methods are calibrated with observational station data for the 19 years from 1973 to 1991 and tested on the recent 19 years from 1992 to 2010. To assess the skill of the downscaling methods, we construct a comprehensive suite of performance metrics that measure the ability to reproduce temporal correlation, distributions, spatial correlation, and extreme events. In addition, we employ the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) to identify robust statistical downscaling methods based on the performance metrics for each season. The results show that downscaling skill is considerably affected by the skill of CFSR, and all methods lead to large improvements in representing all performance metrics. According to the seasonal performance metrics evaluated with TOPSIS, MACA is identified as the most reliable and robust method for all variables and seasons. Note that this result is derived from CFSR output, which is treated as near-perfect climate data in climate studies. 
Therefore, the ranking in this study may change when various GCMs are downscaled and evaluated. Nevertheless, it may help end-users (i.e. modelers or water resources managers) understand and select downscaling methods better suited to the priorities of their regional applications.
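
The TOPSIS step used to rank the methods can be sketched as follows: normalise the decision matrix, weight it, and score each alternative by its relative closeness to the ideal solution. The skill scores and weights below are hypothetical, chosen so that MACA dominates as in the study's conclusion:

```python
import math

def topsis(matrix, weights, benefit):
    """Rank alternatives by closeness to the ideal solution.
    matrix[i][j] = score of alternative i on criterion j;
    benefit[j] is True when higher is better on criterion j."""
    ncol = len(weights)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(ncol)]
    v = [[weights[j] * row[j] / norms[j] for j in range(ncol)] for row in matrix]
    ideal = [max(col) if benefit[j] else min(col)
             for j, col in enumerate(zip(*v))]
    worst = [min(col) if benefit[j] else max(col)
             for j, col in enumerate(zip(*v))]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)   # distance to ideal solution
        d_neg = math.dist(row, worst)   # distance to anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Hypothetical skill scores for the four methods on three aggregate metrics
methods = ["BCSD", "BCCA", "MACA", "BCCI"]
matrix = [[0.80, 0.70, 0.60],
          [0.75, 0.65, 0.70],
          [0.90, 0.85, 0.80],
          [0.70, 0.75, 0.65]]
scores = topsis(matrix, weights=[0.4, 0.3, 0.3], benefit=[True, True, True])
best = methods[scores.index(max(scores))]
```

An alternative that is best on every benefit criterion coincides with the ideal point and receives a closeness score of exactly 1.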

  11. Double-Layer Mediated Electromechanical Response of Amyloid Fibrils in Liquid Environment

    PubMed Central

    Nikiforov, M.P.; Thompson, G.L.; Reukov, V.V.; Jesse, S.; Guo, S.; Rodriguez, B.J.; Seal, K.; Vertegel, A.A.; Kalinin, S.V.

    2010-01-01

    Harnessing electrical bias-induced mechanical motion on the nanometer and molecular scale is a critical step towards understanding the fundamental mechanisms of redox processes and implementation of molecular electromechanical machines. Probing these phenomena in biomolecular systems requires electromechanical measurements be performed in liquid environments. Here we demonstrate the use of band excitation piezoresponse force microscopy for probing electromechanical coupling in amyloid fibrils. The approaches for separating the elastic and electromechanical contributions based on functional fits and multivariate statistical analysis are presented. We demonstrate that in the bulk of the fibril the electromechanical response is dominated by double-layer effects (consistent with shear piezoelectricity of biomolecules), while a number of electromechanically active hot spots possibly related to structural defects are observed. PMID:20088597

  12. A Comparison of Three Multivariate Models for Estimating Test Battery Reliability.

    ERIC Educational Resources Information Center

    Wood, Terry M.; Safrit, Margaret J.

    1987-01-01

    A comparison of three multivariate models (canonical reliability model, maximum generalizability model, canonical correlation model) for estimating test battery reliability indicated that the maximum generalizability model showed the least degree of bias, smallest errors in estimation, and the greatest relative efficiency across all experimental…

  13. Multivariate Meta-Analysis Using Individual Participant Data

    ERIC Educational Resources Information Center

    Riley, R. D.; Price, M. J.; Jackson, D.; Wardle, M.; Gueyffier, F.; Wang, J.; Staessen, J. A.; White, I. R.

    2015-01-01

    When combining results across related studies, a multivariate meta-analysis allows the joint synthesis of correlated effect estimates from multiple outcomes. Joint synthesis can improve efficiency over separate univariate syntheses, may reduce selective outcome reporting biases, and enables joint inferences across the outcomes. A common issue is…

  14. Detecting rater bias using a person-fit statistic: a Monte Carlo simulation study.

    PubMed

    Aubin, André-Sébastien; St-Onge, Christina; Renaud, Jean-Sébastien

    2018-04-01

    With the Standards voicing concern for the appropriateness of response processes, we need to explore strategies that would allow us to identify inappropriate rater response processes. Although certain statistics can be used to help detect rater bias, their use is complicated by either a lack of data about their actual power to detect rater bias or the difficulty of applying them in the context of health professions education. This exploratory study aimed to establish the worthiness of pursuing the use of lz to detect rater bias. We conducted a Monte Carlo simulation study to investigate the power of a specific detection statistic: the standardized likelihood lz person-fit statistic (PFS). Our primary outcome was the detection rate of biased raters, namely raters whom we manipulated into being either stringent (giving lower scores) or lenient (giving higher scores), using the lz statistic while controlling for the number of biased raters in a sample (6 levels) and the rate of bias per rater (6 levels). Overall, stringent raters (M = 0.84, SD = 0.23) were easier to detect than lenient raters (M = 0.31, SD = 0.28). More biased raters were easier to detect than less biased raters (60% bias: M = 0.62, SD = 0.37; 10% bias: M = 0.43, SD = 0.36). The PFS lz seems to offer interesting potential to identify biased raters. We observed detection rates as high as 90% for stringent raters for whom we had manipulated more than half their checklist. Although we observed very interesting results, we cannot generalize them to the use of PFS with estimated item/station parameters or real data. Such studies should be conducted to assess the feasibility of using PFS to identify rater bias.
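
For dichotomous scores with known model probabilities, the standardized log-likelihood person-fit statistic l_z can be sketched directly from its definition: the observed response log-likelihood, centred and scaled by its model-implied mean and variance. The item probabilities and response patterns below are hypothetical, not the simulated stations of this study:

```python
import math

def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic l_z for dichotomous
    scores, given the model probability of a '1' on each item (assumed known).
    Large negative values flag misfitting (e.g. biased) response patterns."""
    l0 = sum(x * math.log(p) + (1 - x) * math.log(1 - p)
             for x, p in zip(responses, probs))
    mean = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    var = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - mean) / math.sqrt(var)

probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
typical = [1, 1, 1, 1, 0, 0, 0, 0]   # matches the model's expectations
aberrant = [0, 0, 0, 0, 1, 1, 1, 1]  # reversed, misfitting pattern
```

The expected pattern scores near zero, while the reversed pattern scores far below it, which is the signal a detection rule thresholds on.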

  15. Nurses' decision making in heart failure management based on heart failure certification status.

    PubMed

    Albert, Nancy M; Bena, James F; Buxbaum, Denise; Martensen, Linda; Morrison, Shannon L; Prasun, Marilyn A; Stamp, Kelly D

    Research findings on the value of nurse certification have been based on subjective perceptions or biased by correlations of certification status with global clinical factors. In heart failure, the value of certification is unknown. To examine the value of certification based on nurses' decision-making. Cross-sectional study of nurses who completed heart failure clinical vignettes reflecting decision-making in clinical heart failure scenarios. Statistical tests included multivariable linear, logistic and proportional odds logistic regression models. Of the nurses (N = 605), 29.1% were heart failure certified, 35.0% were certified in another specialty/job role and 35.9% were not certified. In multivariable modeling, nurses certified in heart failure (versus not heart failure certified) had higher clinical vignette scores (p = 0.002), reflecting more evidence-based decision making; nurses with another specialty/role certification (versus no certification) did not (p = 0.62). Heart failure certification, but not certification in other specialty/job roles, was associated with decisions that reflected delivery of high-quality care. Copyright © 2018 Elsevier Inc. All rights reserved.

  16. Synthesis of linear regression coefficients by recovering the within-study covariance matrix from summary statistics.

    PubMed

    Yoneoka, Daisuke; Henmi, Masayuki

    2017-06-01

    Recently, the number of regression models has dramatically increased in several academic fields. However, within the context of meta-analysis, synthesis methods for such models have not developed at a commensurate pace. One of the difficulties hindering this development is the disparity in covariate sets among models in the literature. If the sets of covariates differ across models, the interpretation of coefficients differs, making it difficult to synthesize them. Moreover, previous synthesis methods for regression models, such as multivariate meta-analysis, often run into problems because the covariance matrix of coefficients (i.e., within-study correlations) or individual patient data are not necessarily available. This study therefore proposes a method to synthesize linear regression models with different covariate sets by using a generalized least squares method involving bias correction terms. In particular, we also propose an approach to recover (at most) three correlations of covariates, which is required for the calculation of the bias term without individual patient data. Copyright © 2016 John Wiley & Sons, Ltd.

  17. A Study of Item Bias for Attitudinal Measurement Using Maximum Likelihood Factor Analysis.

    ERIC Educational Resources Information Center

    Mayberry, Paul W.

    A technique for detecting item bias that is responsive to attitudinal measurement considerations is a maximum likelihood factor analysis procedure comparing multivariate factor structures across various subpopulations, often referred to as SIFASP. The SIFASP technique allows for factorial model comparisons in the testing of various hypotheses…

  18. Influence of shifting cultivation practices on soil-plant-beetle interactions.

    PubMed

    Ibrahim, Kalibulla Syed; Momin, Marcy D; Lalrotluanga, R; Rosangliana, David; Ghatak, Souvik; Zothansanga, R; Kumar, Nachimuthu Senthil; Gurusubramanian, Guruswami

    2016-08-01

    Shifting cultivation (jhum) is a major land use practice in Mizoram. It was considered an eco-friendly and efficient method when the cycle duration was long (15-30 years), but it poses problems of land degradation and ecological threat when the cycle is shortened (4-5 years) due to increased intensification of farming systems. Studying beetle community structure is very helpful for understanding how shifting cultivation affects biodiversity compared to a natural forest system. The present study examines beetle species diversity and estimates the effects of shifting cultivation practices on beetle assemblages in relation to changes in tree species composition and soil nutrients. Scarabaeidae and Carabidae were observed to be the dominant families in the land use systems studied. Shifting cultivation practice significantly (P < 0.05) affected beetle and tree species diversity as well as soil nutrients, as shown by univariate (one-way analysis of variance (ANOVA), correlation and regression, diversity indices) and multivariate (cluster analysis, principal component analysis (PCA), detrended correspondence analysis (DCA), canonical variate analysis (CVA), permutational multivariate analysis of variance (PERMANOVA), permutational multivariate analysis of dispersion (PERMDISP)) statistical analyses. Besides changing tree species composition and affecting soil fertility, shifting cultivation provides less suitable habitat conditions for beetle species. Bioindicator analysis categorized the beetle species into forest specialists, anthropogenic specialists (shifting cultivation habitat specialists), and habitat generalists. Molecular analysis of bioindicator beetle species was done using the mitochondrial cytochrome oxidase subunit I (COI) marker to validate the beetle species and describe genetic variation among them in relation to heterogeneity, transition/transversion bias, codon usage bias, evolutionary distance, and substitution pattern. 
The present study revealed the fact that shifting cultivation practice significantly affects the beetle species in terms of biodiversity pattern as well as evolutionary features. Spatiotemporal assessment of soil-plant-beetle interactions in shifting cultivation system and their influence in land degradation and ecology will be helpful in making biodiversity conservation decisions in the near future.

  19. The basis of orientation decoding in human primary visual cortex: fine- or coarse-scale biases?

    PubMed

    Maloney, Ryan T

    2015-01-01

    Orientation signals in human primary visual cortex (V1) can be reliably decoded from the multivariate pattern of activity as measured with functional magnetic resonance imaging (fMRI). The precise underlying source of these decoded signals (whether by orientation biases at a fine or coarse scale in cortex) remains a matter of some controversy, however. Freeman and colleagues (J Neurosci 33: 19695-19703, 2013) recently showed that the accuracy of decoding of spiral patterns in V1 can be predicted by a voxel's preferred spatial position (the population receptive field) and its coarse orientation preference, suggesting that coarse-scale biases are sufficient for orientation decoding. Whether they are also necessary for decoding remains an open question, and one with implications for the broader interpretation of multivariate decoding results in fMRI studies. Copyright © 2015 the American Physiological Society.

  20. Adaptation and application of multivariate AMBI (M-AMBI) in US coastal waters

    EPA Science Inventory

    The multivariate AMBI (M-AMBI) is an extension of the AZTI Marine Biotic Index (AMBI) that has been used extensively in Europe, but not in the United States. In a previous study, we adapted AMBI for use in US coastal waters (US AMBI), but saw biases in salinity and score distribu...

  1. The impact of covariance misspecification in multivariate Gaussian mixtures on estimation and inference: an application to longitudinal modeling.

    PubMed

    Heggeseth, Brianna C; Jewell, Nicholas P

    2013-07-20

    Multivariate Gaussian mixtures are a class of models that provide a flexible parametric approach for the representation of heterogeneous multivariate outcomes. When the outcome is a vector of repeated measurements taken on the same subject, there is often inherent dependence between observations. However, a common covariance assumption is conditional independence; that is, given the mixture component label, the outcomes for subjects are independent. In this paper, we study, through asymptotic bias calculations and simulation, the impact of covariance misspecification in multivariate Gaussian mixtures. Although maximum likelihood estimators of regression and mixing probability parameters are not consistent under misspecification, they have little asymptotic bias when mixture components are well separated or if the assumed correlation is close to the truth even when the covariance is misspecified. We also present a robust standard error estimator and show that it outperforms conventional estimators in simulations and can indicate that the model is misspecified. Body mass index data from a national longitudinal study are used to demonstrate the effects of misspecification on potential inferences made in practice. Copyright © 2013 John Wiley & Sons, Ltd.

  2. A Primer on Multivariate Analysis of Variance (MANOVA) for Behavioral Scientists

    ERIC Educational Resources Information Center

    Warne, Russell T.

    2014-01-01

    Reviews of statistical procedures (e.g., Bangert & Baumberger, 2005; Kieffer, Reese, & Thompson, 2001; Warne, Lazo, Ramos, & Ritter, 2012) show that one of the most common multivariate statistical methods in psychological research is multivariate analysis of variance (MANOVA). However, MANOVA and its associated procedures are often not…

  3. Multivariate statistical analysis: Principles and applications to coorbital streams of meteorite falls

    NASA Technical Reports Server (NTRS)

    Wolf, S. F.; Lipschutz, M. E.

    1993-01-01

    Multivariate statistical analysis techniques (linear discriminant analysis and logistic regression) can provide powerful discrimination tools which are generally unfamiliar to the planetary science community. Fall parameters were used to identify a group of 17 H chondrites (Cluster 1) that were part of a coorbital stream which intersected Earth's orbit in May, from 1855 to 1895, and can be distinguished from all other H chondrite falls. Using multivariate statistical techniques, it was demonstrated that, by a totally different criterion, the labile trace element contents - hence thermal histories - of 13 Cluster 1 meteorites are distinguishable from those of 45 non-Cluster 1 H chondrites. Here, we focus upon the principles of multivariate statistical techniques and illustrate their application using non-meteoritic and meteoritic examples.
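
    Linear discriminant analysis, one of the two techniques named above, can be sketched for the two-group case in plain NumPy. The data here are synthetic stand-ins, not the H chondrite fall parameters:

```python
import numpy as np

def fisher_lda(X0, X1):
    """Two-class Fisher linear discriminant: the direction w maximizing
    between-class over within-class scatter is w = Sw^{-1} (m1 - m0)."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) \
       + np.cov(X1, rowvar=False) * (len(X1) - 1)
    w = np.linalg.solve(Sw, m1 - m0)
    threshold = w @ (m0 + m1) / 2          # midpoint decision rule
    return w, threshold

rng = np.random.default_rng(0)
X0 = rng.normal([0, 0], 0.5, size=(40, 2))   # synthetic "non-cluster" falls
X1 = rng.normal([2, 1], 0.5, size=(40, 2))   # synthetic "cluster" falls
w, t = fisher_lda(X0, X1)
# Project onto w and classify: scores above the midpoint go to class 1.
acc = ((np.vstack([X0, X1]) @ w > t) == np.repeat([False, True], 40)).mean()
print(f"training accuracy: {acc:.2f}")
```

    With well-separated groups, as the abstract reports for the meteorite clusters, such a linear rule discriminates nearly perfectly.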

  4. De Facto Versus de Jure Political Institutions in the Long-Run: A Multivariate Analysis, 1820-2000.

    PubMed

    Foldvari, Peter

    2017-01-01

    In this paper we use the components of the PolityIV project's polity2 and Vanhanen's Index of Democracy indicators to analyse the relationship between de jure and de facto political institutions from 1820 until 2000 with a canonical correlation method corrected for sample selection bias. We find considerable fluctuation in the relationship between the two measures. After a moderate positive correlation during the first half of the nineteenth century, the two measures become statistically unrelated until the 1940s. The relationship becomes strong and positive only in the second half of the twentieth century. The relationship between de jure and de facto political institutions can hence be described as a U-curve, reminiscent of an inverted Kuznets curve.

  5. Vis-NIR spectrometric determination of Brix and sucrose in sugar production samples using kernel partial least squares with interval selection based on the successive projections algorithm.

    PubMed

    de Almeida, Valber Elias; de Araújo Gomes, Adriano; de Sousa Fernandes, David Douglas; Goicoechea, Héctor Casimiro; Galvão, Roberto Kawakami Harrop; Araújo, Mario Cesar Ugulino

    2018-05-01

    This paper proposes a new variable selection method for nonlinear multivariate calibration, combining the Successive Projections Algorithm for interval selection (iSPA) with the Kernel Partial Least Squares (Kernel-PLS) modelling technique. The proposed iSPA-Kernel-PLS algorithm is employed in a case study involving a Vis-NIR spectrometric dataset with complex nonlinear features. The analytical problem consists of determining Brix and sucrose content in samples from a sugar production system, on the basis of transflectance spectra. As compared to full-spectrum Kernel-PLS, the iSPA-Kernel-PLS models involve a smaller number of variables and display statistically significant superiority in terms of accuracy and/or bias in the predictions. Published by Elsevier B.V.

  6. Full-text publication of abstracts presented at European Orthodontic Society congresses.

    PubMed

    Livas, Christos; Pandis, Nikolaos; Ren, Yijin

    2014-10-01

    Empirical evidence has indicated that only a subsample of studies conducted reach full-text publication and this phenomenon has become known as publication bias. A form of publication bias is the selectively delayed full publication of conference abstracts. The objective of this article was to examine the publication status of oral abstracts and poster-presentation abstracts, included in the scientific program of the 82nd and 83rd European Orthodontic Society (EOS) congresses, held in 2006 and 2007, and to identify factors associated with full-length publication. A systematic search of PubMed and Google Scholar databases was performed in April 2013 using author names and keywords from the abstract title to locate abstract and full-article publications. Information regarding mode of presentation, type of affiliation, geographical origin, statistical results, and publication details were collected and analyzed using univariable and multivariable logistic regression. Approximately 51 per cent of the EOS 2006 and 55 per cent of the EOS 2007 abstracts appeared in print more than 5 years post congress. A mean period of 1.32 years elapsed between conference and publication date. Mode of presentation (oral or poster), use of statistical analysis, and research subject area were significant predictors for publication success. Inherent discrepancies of abstract reporting, mainly related to presentation of preliminary results and incomplete description of methods, may be considered in analogous studies. On average 52.2 per cent of the abstracts presented at the two EOS conferences reached full publication. Abstracts presented orally, including statistical analysis, were more likely to get published. © The Author 2013. Published by Oxford University Press on behalf of the European Orthodontic Society. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  7. Publication bias in situ.

    PubMed

    Phillips, Carl V

    2004-08-05

    Publication bias, as typically defined, refers to the decreased likelihood of studies' results being published when they are near the null, not statistically significant, or otherwise "less interesting." But choices about how to analyze the data and which results to report create a publication bias within the published results, a bias I label "publication bias in situ" (PBIS). PBIS may create much greater bias in the literature than traditionally defined publication bias (the failure to publish any result from a study). The causes of PBIS are well known, consisting of various decisions about reporting that are influenced by the data. But its impact is not generally appreciated, and very little attention is devoted to it. What attention there is consists largely of rules for statistical analysis that are impractical and do not actually reduce the bias in reported estimates. PBIS cannot be reduced by statistical tools because it is not fundamentally a problem of statistics, but rather of non-statistical choices and plain language interpretations. PBIS should be recognized as a phenomenon worthy of study - it is extremely common and probably has a huge impact on results reported in the literature - and there should be greater systematic efforts to identify and reduce it. The paper presents examples, including results of a recent HIV vaccine trial, that show how easily PBIS can have a large impact on reported results, as well as how there can be no simple answer to it. PBIS is a major problem, worthy of substantially more attention than it receives. There are ways to reduce the bias, but they are very seldom employed because they are largely unrecognized.

  8. Revelation of Influencing Factors in Overall Codon Usage Bias of Equine Influenza Viruses

    PubMed Central

    Bhatia, Sandeep; Sood, Richa; Selvaraj, Pavulraj

    2016-01-01

    Equine influenza viruses (EIVs) of H3N8 subtype are culprits of severe acute respiratory infections in horses, and are still responsible for significant outbreaks worldwide. Adaptability of influenza viruses to a particular host is significantly influenced by their codon usage preference, due to an absolute dependence on the host cellular machinery for their replication. In the present study, we analyzed genome-wide codon usage patterns in 92 EIV strains, including both H3N8 and H7N7 subtypes by computing several codon usage indices and applying multivariate statistical methods. Relative synonymous codon usage (RSCU) analysis disclosed bias of preferred synonymous codons towards A/U-ended codons. The overall codon usage bias in EIVs was slightly lower, and mainly affected by the nucleotide compositional constraints as inferred from the RSCU and effective number of codon (ENc) analysis. Our data suggested that codon usage pattern in EIVs is governed by the interplay of mutation pressure, natural selection from its hosts and undefined factors. The H7N7 subtype was found less fit to its host (horse) in comparison to H3N8, by possessing higher codon bias, lower mutation pressure and much less adaptation to tRNA pool of equine cells. To the best of our knowledge, this is the first report describing the codon usage analysis of the complete genomes of EIVs. The outcome of our study is likely to enhance our understanding of factors involved in viral adaptation, evolution, and fitness towards their hosts. PMID:27119730
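
    The RSCU index used in the analysis above has a simple definition: a codon's observed count divided by the mean count of its synonymous family, so RSCU > 1 marks a preferred codon. A minimal sketch with made-up counts and only a few codon families (a full table would cover all synonymous families):

```python
# Synonymous codon families (illustrative subset, RNA alphabet).
SYNONYMS = {
    "Lys": ["AAA", "AAG"],
    "Glu": ["GAA", "GAG"],
    "Gly": ["GGA", "GGC", "GGG", "GGU"],
}

def rscu(codon_counts):
    """Relative synonymous codon usage: observed count divided by the
    mean count within the codon's synonymous family."""
    out = {}
    for aa, codons in SYNONYMS.items():
        total = sum(codon_counts.get(c, 0) for c in codons)
        expected = total / len(codons) if total else 0.0
        for c in codons:
            out[c] = codon_counts.get(c, 0) / expected if expected else 0.0
    return out

counts = {"AAA": 30, "AAG": 10, "GAA": 25, "GAG": 25,
          "GGA": 12, "GGC": 2, "GGG": 2, "GGU": 4}
usage = rscu(counts)
print(usage["AAA"])  # 1.5: the A-ended codon is preferred in its family
print(usage["GGA"])  # 2.4
```

    A bias of RSCU values toward A/U-ended codons, as reported for the EIV genomes, would show up here as values above 1 for codons such as AAA and GGA.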

  9. A non-iterative extension of the multivariate random effects meta-analysis.

    PubMed

    Makambi, Kepher H; Seung, Hyunuk

    2015-01-01

    Multivariate methods in meta-analysis are becoming popular and more accepted in biomedical research despite computational issues in some of the techniques. A number of approaches, both iterative and non-iterative, have been proposed, including the multivariate DerSimonian and Laird method by Jackson et al. (2010), which is non-iterative. In this study, we propose an extension of the method by Hartung and Makambi (2002) and Makambi (2001) to multivariate situations. A comparison of the bias and mean square error from a simulation study indicates that, in some circumstances, the proposed approach performs better than the multivariate DerSimonian-Laird approach. An example is presented to demonstrate the application of the proposed approach.
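
    The multivariate DerSimonian-Laird method mentioned above generalizes the familiar univariate method-of-moments estimator. The univariate version that these extensions build on can be sketched as follows (effect sizes and variances below are assumed for illustration):

```python
import numpy as np

def dersimonian_laird(estimates, variances):
    """Univariate DerSimonian-Laird random-effects meta-analysis:
    method-of-moments between-study variance tau^2 from Cochran's Q,
    then an inverse-variance weighted pooled estimate."""
    y, v = np.asarray(estimates, float), np.asarray(variances, float)
    w = 1.0 / v                              # fixed-effect weights
    mu_fe = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu_fe) ** 2)         # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)  # truncated at zero
    w_re = 1.0 / (v + tau2)                  # random-effects weights
    mu_re = np.sum(w_re * y) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return mu_re, tau2, se

est, var = [0.30, 0.10, 0.55, 0.45], [0.01, 0.02, 0.015, 0.01]
mu, tau2, se = dersimonian_laird(est, var)
print(mu, tau2, se)
```

    The pooled estimate is a convex combination of the study estimates, and tau^2 captures the heterogeneity beyond sampling error; the multivariate versions replace these scalars with vectors and matrices.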

  10. The Effects and Side-Effects of Statistics Education: Psychology Students' (Mis-)Conceptions of Probability

    ERIC Educational Resources Information Center

    Morsanyi, Kinga; Primi, Caterina; Chiesi, Francesca; Handley, Simon

    2009-01-01

    In three studies we looked at two typical misconceptions of probability: the representativeness heuristic, and the equiprobability bias. The literature on statistics education predicts that some typical errors and biases (e.g., the equiprobability bias) increase with education, whereas others decrease. This is in contrast with reasoning theorists'…

  11. Super-delta: a new differential gene expression analysis procedure with robust data normalization.

    PubMed

    Liu, Yuhang; Zhang, Jinfeng; Qiu, Xing

    2017-12-21

    Normalization is an important data preparation step in gene expression analyses, designed to remove various systematic noise. Sample variance is greatly reduced after normalization, hence the power of subsequent statistical analyses is likely to increase. On the other hand, variance reduction is made possible by borrowing information across all genes, including differentially expressed genes (DEGs) and outliers, which will inevitably introduce some bias. This bias typically inflates type I error and can reduce statistical power in certain situations. In this study we propose a new differential expression analysis pipeline, dubbed super-delta, that consists of a multivariate extension of the global normalization and a modified t-test. A robust procedure is designed to minimize the bias introduced by DEGs in the normalization step. The modified t-test is derived based on asymptotic theory for hypothesis testing that suitably pairs with the proposed robust normalization. We first compared super-delta with four commonly used normalization methods: global, median-IQR, quantile, and cyclic loess normalization in simulation studies. Super-delta was shown to have better statistical power with tighter control of type I error rate than its competitors. In many cases, the performance of super-delta is close to that of an oracle test in which datasets without technical noise were used. We then applied all methods to a collection of gene expression datasets on breast cancer patients who received neoadjuvant chemotherapy. While there is a substantial overlap of the DEGs identified by all of them, super-delta was able to identify comparatively more DEGs than its competitors. Downstream gene set enrichment analysis confirmed that all these methods selected largely consistent pathways. Detailed investigations on the relatively small differences showed that pathways identified by super-delta have better connections to breast cancer than other methods. 
As a new pipeline, super-delta provides new insights into the area of differential gene expression analysis. A solid theoretical foundation supports its asymptotic unbiasedness and technical noise-free properties. Implementation on real and simulated datasets demonstrates its decent performance compared with state-of-the-art procedures. It also has the potential to be extended to other data types and/or more general between-group comparison problems.

  12. Problems with Multivariate Normality: Can the Multivariate Bootstrap Help?

    ERIC Educational Resources Information Center

    Thompson, Bruce

    Multivariate normality is required for some statistical tests. This paper explores the implications of violating the assumption of multivariate normality and illustrates a graphical procedure for evaluating multivariate normality. The logic for using the multivariate bootstrap is presented. The multivariate bootstrap can be used when distribution…

  13. Analyzing Faculty Salaries When Statistics Fail.

    ERIC Educational Resources Information Center

    Simpson, William A.

    The role played by nonstatistical procedures, in contrast to multivariate statistical approaches, in analyzing faculty salaries is discussed. Multivariate statistical methods are usually used to establish or defend against prima facie cases of gender and ethnic discrimination with respect to faculty salaries. These techniques are not applicable,…

  14. Common method biases in behavioral research: a critical review of the literature and recommended remedies.

    PubMed

    Podsakoff, Philip M; MacKenzie, Scott B; Lee, Jeong-Yeon; Podsakoff, Nathan P

    2003-10-01

    Interest in the problem of method biases has a long history in the behavioral sciences. Despite this, a comprehensive summary of the potential sources of method biases and how to control for them does not exist. Therefore, the purpose of this article is to examine the extent to which method biases influence behavioral research results, identify potential sources of method biases, discuss the cognitive processes through which method biases influence responses to measures, evaluate the many different procedural and statistical techniques that can be used to control method biases, and provide recommendations for how to select appropriate procedural and statistical remedies for different types of research settings.

  15. Post-processing of multi-model ensemble river discharge forecasts using censored EMOS

    NASA Astrophysics Data System (ADS)

    Hemri, Stephan; Lisniak, Dmytro; Klein, Bastian

    2014-05-01

    When forecasting water levels and river discharge, ensemble weather forecasts are used as meteorological input to hydrologic process models. As hydrologic models are imperfect and the input ensembles tend to be biased and underdispersed, the output ensemble forecasts for river runoff typically are biased and underdispersed, too. Thus, statistical post-processing is required in order to achieve calibrated and sharp predictions. Standard post-processing methods such as Ensemble Model Output Statistics (EMOS) that have their origins in meteorological forecasting are now increasingly being used in hydrologic applications. Here we consider two sub-catchments of River Rhine, for which the forecasting system of the Federal Institute of Hydrology (BfG) uses runoff data that are censored below predefined thresholds. To address this methodological challenge, we develop a censored EMOS method that is tailored to such data. The censored EMOS forecast distribution can be understood as a mixture of a point mass at the censoring threshold and a continuous part based on a truncated normal distribution. Parameter estimates of the censored EMOS model are obtained by minimizing the Continuous Ranked Probability Score (CRPS) over the training dataset. Model fitting on Box-Cox transformed data allows us to take account of the positive skewness of river discharge distributions. In order to achieve realistic forecast scenarios over an entire range of lead-times, there is a need for multivariate extensions. To this end, we smooth the marginal parameter estimates over lead-times. In order to obtain realistic scenarios of discharge evolution over time, the marginal distributions have to be linked with each other. To this end, the multivariate dependence structure can either be adopted from the raw ensemble like in Ensemble Copula Coupling (ECC), or be estimated from observations in a training period. 
The censored EMOS model has been applied to multi-model ensemble forecasts issued on a daily basis over a period of three years. For the two catchments considered, this resulted in well calibrated and sharp forecast distributions over all lead-times from 1 to 114 h. Training observations tended to be better indicators for the dependence structure than the raw ensemble.
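
    The censored forecast distribution described above, a point mass at the censoring threshold plus a normal CDF above it, and the CRPS criterion used for fitting can be sketched numerically. This is a toy illustration with assumed parameters, not the BfG system or its Box-Cox handling:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def censored_cdf(y, mu, sigma, c):
    """Predictive CDF of a normal censored below threshold c: zero below c,
    a point mass Phi((c - mu)/sigma) at c, the normal CDF above c."""
    return 0.0 if y < c else norm_cdf((y - mu) / sigma)

def crps_numeric(mu, sigma, c, obs, lo=-10.0, hi=30.0, n=20000):
    """CRPS(F, obs) = integral of (F(y) - 1{y >= obs})^2 dy, approximated
    by a midpoint Riemann sum (a sketch, not a closed-form evaluation)."""
    dy = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * dy
        step = 1.0 if y >= obs else 0.0
        total += (censored_cdf(y, mu, sigma, c) - step) ** 2 * dy
    return total

# A sharp, well-centered forecast earns a lower (better) CRPS than a wide
# one; the parameters below are made up for illustration.
sharp = crps_numeric(mu=5.0, sigma=1.0, c=0.0, obs=5.2)
wide = crps_numeric(mu=5.0, sigma=4.0, c=0.0, obs=5.2)
print(sharp, wide)
```

    Minimizing the average of such CRPS values over a training set, as the abstract describes, rewards forecasts that are both calibrated and sharp.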

  17. Statistical Downscaling and Bias Correction of Climate Model Outputs for Climate Change Impact Assessment in the U.S. Northeast

    NASA Technical Reports Server (NTRS)

    Ahmed, Kazi Farzan; Wang, Guiling; Silander, John; Wilson, Adam M.; Allen, Jenica M.; Horton, Radley; Anyah, Richard

    2013-01-01

    Statistical downscaling can be used to efficiently downscale a large number of General Circulation Model (GCM) outputs to a fine temporal and spatial scale. To facilitate regional impact assessments, this study statistically downscales (to 1/8° spatial resolution) and corrects the bias of daily maximum and minimum temperature and daily precipitation data from six GCMs and four Regional Climate Models (RCMs) for the northeast United States (US) using the Statistical Downscaling and Bias Correction (SDBC) approach. Based on these downscaled data from multiple models, five extreme indices were analyzed for the future climate to quantify future changes of climate extremes. For a subset of models and indices, results based on raw and bias corrected model outputs for the present-day climate were compared with observations, which demonstrated that bias correction is important not only for GCM outputs, but also for RCM outputs. For future climate, bias correction led to a higher level of agreements among the models in predicting the magnitude and capturing the spatial pattern of the extreme climate indices. We found that the incorporation of dynamical downscaling as an intermediate step does not lead to considerable differences in the results of statistical downscaling for the study domain.
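
    Empirical quantile mapping is a standard building block of bias correction schemes like the SDBC approach named above. A minimal sketch on synthetic data, not the authors' exact implementation:

```python
import numpy as np

def quantile_map(model_hist, obs_hist, model_fut):
    """Empirical quantile mapping: replace each future model value by the
    observed quantile at the cumulative probability that value has in the
    historical model climatology."""
    ranks = np.searchsorted(np.sort(model_hist), model_fut, side="right")
    probs = np.clip(ranks / len(model_hist), 0.0, 1.0)
    return np.quantile(obs_hist, probs)

rng = np.random.default_rng(1)
obs = rng.gamma(2.0, 2.0, size=2000)       # synthetic "observed" precipitation
model = obs * 1.5 + 1.0                    # biased model climatology
future = model + 0.5                       # mildly shifted future run
corrected = quantile_map(model, obs, future)
print(np.mean(future), np.mean(corrected), np.mean(obs))
```

    The corrected series inherits the observed distribution while preserving the relative (quantile) position of each simulated value, which is why such corrections improve agreement on extreme indices across models.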

  18. Learning multivariate distributions by competitive assembly of marginals.

    PubMed

    Sánchez-Vega, Francisco; Younes, Laurent; Geman, Donald

    2013-02-01

    We present a new framework for learning high-dimensional multivariate probability distributions from estimated marginals. The approach is motivated by compositional models and Bayesian networks, and designed to adapt to small sample sizes. We start with a large, overlapping set of elementary statistical building blocks, or "primitives," which are low-dimensional marginal distributions learned from data. Each variable may appear in many primitives. Subsets of primitives are combined in a Lego-like fashion to construct a probabilistic graphical model; only a small fraction of the primitives will participate in any valid construction. Since primitives can be precomputed, parameter estimation and structure search are separated. Model complexity is controlled by strong biases; we adapt the primitives to the amount of training data and impose rules which restrict the merging of them into allowable compositions. The likelihood of the data decomposes into a sum of local gains, one for each primitive in the final structure. We focus on a specific subclass of networks which are binary forests. Structure optimization corresponds to an integer linear program and the maximizing composition can be computed for reasonably large numbers of variables. Performance is evaluated using both synthetic data and real datasets from natural language processing and computational biology.

  19. Using Meteorological Analogues for Reordering Postprocessed Precipitation Ensembles in Hydrological Forecasting

    NASA Astrophysics Data System (ADS)

    Bellier, Joseph; Bontron, Guillaume; Zin, Isabella

    2017-12-01

    Meteorological ensemble forecasts are nowadays widely used as input to hydrological models for probabilistic streamflow forecasting. These forcings are frequently biased and have to be statistically postprocessed, most of the time with univariate techniques that apply independently to individual locations, lead times, and weather variables. Postprocessed ensemble forecasts therefore need to be reordered so as to reconstruct suitable multivariate dependence structures. The Schaake shuffle and ensemble copula coupling are the two most popular methods for this purpose. This paper proposes two adaptations of them that make use of meteorological analogues for reconstructing spatiotemporal dependence structures of precipitation forecasts. Performances of the original and adapted techniques are compared through a multistep verification experiment using real forecasts from the European Centre for Medium-Range Weather Forecasts. This experiment evaluates not only multivariate precipitation forecasts but also the corresponding streamflow forecasts that derive from hydrological modeling. Results show that the relative performances of the different reordering methods vary depending on the verification step. In particular, the standard Schaake shuffle is found to perform poorly when evaluated on streamflow. This emphasizes the crucial role of the precipitation spatiotemporal dependence structure in hydrological ensemble forecasting.
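
    The standard Schaake shuffle reorders each variable's ensemble members so that their ranks reproduce those of a set of historical trajectories, restoring a plausible space-time dependence structure while leaving the marginals untouched. A minimal sketch for one forecast time with two variables, on synthetic data:

```python
import numpy as np

def schaake_shuffle(forecast, historical):
    """Schaake shuffle for arrays of shape (n_members, n_variables):
    sort each forecast column, then place the sorted values in the
    rank order of the corresponding historical column."""
    shuffled = np.empty_like(forecast)
    for j in range(forecast.shape[1]):
        ranks = np.argsort(np.argsort(historical[:, j]))  # 0-based ranks
        shuffled[:, j] = np.sort(forecast[:, j])[ranks]
    return shuffled

rng = np.random.default_rng(2)
hist = rng.multivariate_normal([0, 0], [[1, 0.9], [0.9, 1]], size=20)
fcst = rng.normal(size=(20, 2))            # independent raw members
out = schaake_shuffle(fcst, hist)
# Marginals are unchanged; the rank dependence now mimics the history.
print(np.corrcoef(out[:, 0], out[:, 1])[0, 1])
```

    The analogue-based adaptations in the paper differ in how the historical trajectories are chosen; the reordering mechanism itself is the same.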

  20. Complex codon usage pattern and compositional features of retroviruses.

    PubMed

    RoyChoudhury, Sourav; Mukherjee, Debaprasad

    2013-01-01

    Retroviruses infect a wide range of organisms including humans. Among them, HIV-1, which causes AIDS, has become a major threat to world health. Some of these viruses are also potential gene transfer vectors. In this study, the patterns of synonymous codon usage in retroviruses have been studied through multivariate statistical methods on ORF sequences from the 56 available retroviruses. The principal determinant of the evolution of the codon usage pattern in retroviruses appears to be compositional constraints, while selection for translation of the viral genes plays a secondary role. This was further supported by multivariate analysis of relative synonymous codon usage. Thus, it seems that mutational bias has had a dominant role over translational selection in shaping the codon usage of retroviruses. The codon adaptation index was used to identify translationally optimal codons among genes from retroviruses. A comparative analysis of the preferred and optimal codons among different retroviral groups revealed that four codons, GAA, AAA, AGA, and GGA, were significantly more frequent in most of the retroviral genes in spite of some differences. Cluster analysis also revealed that phylogenetically related groups of retroviruses have probably evolved their codon usage in a concerted manner under the influence of their nucleotide composition.
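    The relative synonymous codon usage (RSCU) statistic behind such analyses divides each codon's observed count by the count expected if all synonymous codons for an amino acid were used equally; values above 1 mark preferred codons. A minimal sketch on an invented count table for the four-codon glycine family:

```python
from collections import Counter

# Synonymous family for one amino acid: the four GGN glycine codons.
FAMILY = ["GGA", "GGC", "GGG", "GGT"]

def rscu(codon_counts, family):
    """Relative synonymous codon usage: observed count divided by the
    count expected under uniform usage of the synonymous family."""
    total = sum(codon_counts.get(c, 0) for c in family)
    if total == 0:
        return {c: 0.0 for c in family}
    expected = total / len(family)
    return {c: codon_counts.get(c, 0) / expected for c in family}

# Invented counts illustrating a strong GGA preference, as reported
# for retroviral genes in the study.
counts = Counter({"GGA": 6, "GGC": 1, "GGG": 1, "GGT": 0})
```

    A genome-wide analysis would compute RSCU per family over all ORFs and feed the resulting vectors into correspondence or cluster analysis.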

  1. Multivariate Relationships between Statistics Anxiety and Motivational Beliefs

    ERIC Educational Resources Information Center

    Baloglu, Mustafa; Abbassi, Amir; Kesici, Sahin

    2017-01-01

    In general, anxiety has been found to be associated with motivational beliefs and the current study investigated multivariate relationships between statistics anxiety and motivational beliefs among 305 college students (60.0% women). The Statistical Anxiety Rating Scale, the Motivated Strategies for Learning Questionnaire, and a set of demographic…

  2. Patterns of Unit and Item Nonresponse in the CAHPS® Hospital Survey

    PubMed Central

    Elliott, Marc N; Edwards, Carol; Angeles, January; Hambarsoomians, Katrin; Hays, Ron D

    2005-01-01

    Objective To examine the predictors of unit and item nonresponse, the magnitude of nonresponse bias, and the need for nonresponse weights in the Consumer Assessment of Health Care Providers and Systems (CAHPS®) Hospital Survey. Methods A common set of 11 administrative variables (41 degrees of freedom) was used to predict unit nonresponse and the rate of item nonresponse in multivariate models. Descriptive statistics were used to examine the impact of nonresponse on CAHPS Hospital Survey ratings and reports. Results Unit nonresponse was highest for younger patients and patients other than non-Hispanic whites (p<.001); item nonresponse increased steadily with age (p<.001). Fourteen of 20 reports of ratings of care had significant (p<.05) but small negative correlations with nonresponse weights (median −0.06; maximum −0.09). Nonresponse weights do not improve overall precision below sample sizes of 300–1,000, and are unlikely to improve the precision of hospital comparisons. In some contexts, case-mix adjustment eliminates most observed nonresponse bias. Conclusions Nonresponse weights should not be used for between-hospital comparisons of the CAHPS Hospital Survey, but may make small contributions to overall estimates or demographic comparisons, especially in the absence of case-mix adjustment. PMID:16316440
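    Nonresponse weighting of the kind evaluated above is, at its core, inverse-probability weighting: each respondent is weighted by the reciprocal of an estimated response propensity, so under-responding groups count more. A minimal sketch with invented propensities and ratings (in the study itself the propensities would come from the 11-variable administrative model):

```python
def nonresponse_weights(response_prob):
    """Inverse-probability-of-response weights, normalized to mean 1."""
    raw = [1.0 / p for p in response_prob]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

ratings = [10.0, 6.0]   # invented ratings from two respondents
props = [0.5, 1.0]      # invented response propensities for their strata
w = nonresponse_weights(props)
```

    Here the unweighted mean is 8.0, while weighting pulls the estimate toward the under-responding stratum; whether that correction is worth the added variance is exactly the trade-off the study examines.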

  3. EMC Global Climate And Weather Modeling Branch Personnel

    Science.gov Websites

    Comparison Statistics, which include: NCEP Raw and Bias-Corrected Ensemble Domain Averaged Bias; NCEP Raw and Bias-Corrected Ensemble Domain Averaged Bias Reduction (Percents); CMC Raw and Bias-Corrected Control Forecast Domain Averaged Bias; CMC Raw and Bias-Corrected Control Forecast Domain Averaged Bias Reduction.

  4. Statistical Clustering and the Contents of the Infant Vocabulary

    ERIC Educational Resources Information Center

    Swingley, Daniel

    2005-01-01

    Infants parse speech into word-sized units according to biases that develop in the first year. One bias, present before the age of 7 months, is to cluster syllables that tend to co-occur. The present computational research demonstrates that this statistical clustering bias could lead to the extraction of speech sequences that are actual words,…

  5. Traits, States, and Attentional Gates: Temperament and Threat Relevance as Predictors of Attentional Bias to Social Threat

    PubMed Central

    Helzer, Erik G.; Connor-Smith, Jennifer K.; Reed, Marjorie A.

    2009-01-01

    This study investigated the influence of situational and dispositional factors on attentional biases toward social threat, and the impact of these attentional biases on distress in a sample of adolescents. Results suggest greater biases for personally-relevant threat cues, as individuals reporting high social stress were vigilant to subliminal social threat cues, but not physical threat cues, and those reporting low social stress showed no attentional biases. Individual differences in fearful temperament and attentional control interacted to influence attentional biases, with fearful temperament predicting biases to supraliminal social threat only for individuals with poor attentional control. Multivariate analyses exploring relations between attentional biases for social threat and symptoms of anxiety and depression revealed that attentional biases alone were rarely related to symptoms. However, biases did interact with social stress, fearful temperament, and attentional control to predict distress. Results are discussed in terms of automatic and effortful cognitive mechanisms underlying threat cue processing. PMID:18791905

  6. Antecedents to Organizational Performance: Theoretical and Practical Implications for Aircraft Maintenance Officer Force Development

    DTIC Science & Technology

    2015-03-26

    …to my reader, Lieutenant Colonel Robert Overstreet, for helping solidify my research, coaching me through the statistical analysis, and positive… …common-method bias requires careful assessment of potential sources of bias and implementing procedural and statistical control methods. Podsakoff…

  7. Fractional Brownian motion and multivariate-t models for longitudinal biomedical data, with application to CD4 counts in HIV-positive patients.

    PubMed

    Stirrup, Oliver T; Babiker, Abdel G; Carpenter, James R; Copas, Andrew J

    2016-04-30

    Longitudinal data are widely analysed using linear mixed models, with 'random slopes' models particularly common. However, when modelling, for example, longitudinal pre-treatment CD4 cell counts in HIV-positive patients, the incorporation of non-stationary stochastic processes such as Brownian motion has been shown to lead to a more biologically plausible model and a substantial improvement in model fit. In this article, we propose two further extensions. Firstly, we propose the addition of a fractional Brownian motion component, and secondly, we generalise the model to follow a multivariate-t distribution. These extensions are biologically plausible, and each demonstrated substantially improved fit on application to example data from the Concerted Action on SeroConversion to AIDS and Death in Europe study. We also propose novel procedures for residual diagnostic plots that allow such models to be assessed. Cohorts of patients were simulated from the previously reported and newly developed models in order to evaluate differences in predictions made for the timing of treatment initiation under different clinical management strategies. A further simulation study was performed to demonstrate the substantial biases in parameter estimates of the mean slope of CD4 decline with time that can occur when random slopes models are applied in the presence of censoring because of treatment initiation, with the degree of bias found to depend strongly on the treatment initiation rule applied. Our findings indicate that researchers should consider more complex and flexible models for the analysis of longitudinal biomarker data, particularly when there are substantial missing data, and that the parameter estimates from random slopes models must be interpreted with caution. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  8. Multi objective climate change impact assessment using multi downscaled climate scenarios

    NASA Astrophysics Data System (ADS)

    Rana, Arun; Moradkhani, Hamid

    2016-04-01

    Outputs from Global Climate Models (GCMs) are often downscaled to obtain climatic parameters at regional scales. In the present study, we have analyzed the changes in precipitation and temperature for the future scenario period of 2070-2099 with respect to the historical period of 1970-2000, using a set of statistically downscaled GCM projections for the Columbia River Basin (CRB). The analysis uses two different statistically downscaled climate products, namely the Bias Correction and Spatial Downscaling (BCSD) technique generated at Portland State University and the Multivariate Adaptive Constructed Analogs (MACA) technique generated at the University of Idaho, totaling 40 different scenarios. Analysis is performed on spatial, temporal, and frequency-based parameters in the future period at a scale of 1/16th of a degree for the entire CRB region. Results indicate varied spatial patterns of change across the Columbia River Basin, especially in the western part of the basin. At temporal scales, winter precipitation shows higher variability than summer precipitation, and vice versa for temperature. Frequency analysis provided insights into possible explanations for the changes in precipitation.

  9. Big Data Analytics for Scanning Transmission Electron Microscopy Ptychography

    NASA Astrophysics Data System (ADS)

    Jesse, S.; Chi, M.; Belianinov, A.; Beekman, C.; Kalinin, S. V.; Borisevich, A. Y.; Lupini, A. R.

    2016-05-01

    Electron microscopy is undergoing a transition; from the model of producing only a few micrographs, through the current state where many images and spectra can be digitally recorded, to a new mode where very large volumes of data (movies, ptychographic and multi-dimensional series) can be rapidly obtained. Here, we discuss the application of so-called “big-data” methods to high dimensional microscopy data, using unsupervised multivariate statistical techniques, in order to explore salient image features in a specific example of BiFeO3 domains. Remarkably, k-means clustering reveals domain differentiation despite the fact that the algorithm is purely statistical in nature and does not require any prior information regarding the material, any coexisting phases, or any differentiating structures. While this is a somewhat trivial case, this example signifies the extraction of useful physical and structural information without any prior bias regarding the sample or the instrumental modality. Further interpretation of these types of results may still require human intervention. However, the open nature of this algorithm and its wide availability, enable broad collaborations and exploratory work necessary to enable efficient data analysis in electron microscopy.
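    The k-means step used for domain differentiation above is Lloyd's algorithm run on per-pixel feature values. A deliberately tiny one-dimensional sketch with invented data (real STEM ptychography data would use high-dimensional descriptors and a library implementation):

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm on 1-D feature values; centroids start at
    the first k points, which is fine for this deterministic toy example."""
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            j = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[j].append(p)
        # Recompute centroids; keep the old one if a cluster empties.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids, clusters

# Two well-separated groups of invented per-pixel feature values.
feats = [0.1, 0.2, 0.15, 5.0, 5.2, 4.9]
centroids, clusters = kmeans(feats, 2)
```

    As in the BiFeO3 example, the algorithm separates the two "domains" using nothing but the statistics of the feature values themselves.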

  10. metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis.

    PubMed

    Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

    2016-07-01

    A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts, limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco. Contact: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
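    At the heart of canonical correlation analysis is the search for linear combinations of two variable blocks whose correlation is maximal. For a two-column block X and a single trait y, the first canonical correlation can be found by brute-force search over projection directions; this is only a didactic stand-in for the eigenvalue computation metaCCA builds on, and all data below are invented.

```python
import math

def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [x - mu for x in u]
    dv = [x - mv for x in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den

def first_canonical_corr(X, y, steps=3600):
    """Grid-search the direction a maximizing |corr(X @ a, y)|: the first
    canonical correlation for a 2-column X block and a 1-column y block."""
    best = 0.0
    for s in range(steps):
        t = math.pi * s / steps   # directions in [0, pi) suffice up to sign
        a = (math.cos(t), math.sin(t))
        proj = [a[0] * row[0] + a[1] * row[1] for row in X]
        best = max(best, abs(pearson(proj, y)))
    return best

# Invented toy data: y is an exact linear combination of the X columns,
# so the first canonical correlation should approach 1.
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
y = [r[0] + 2.0 * r[1] for r in X]
rho = first_canonical_corr(X, y)
```

    metaCCA's contribution is recovering the covariance blocks this computation needs from published summary statistics rather than individual-level data.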

  12. Effects of drain bias on the statistical variation of double-gate tunnel field-effect transistors

    NASA Astrophysics Data System (ADS)

    Choi, Woo Young

    2017-04-01

    The effects of drain bias on the statistical variation of double-gate (DG) tunnel field-effect transistors (TFETs) are discussed in comparison with DG metal-oxide-semiconductor FETs (MOSFETs). Statistical variation here refers to the variation of threshold voltage (Vth), subthreshold swing (SS), and drain-induced barrier thinning (DIBT). The distinctive statistical variation characteristics of DG TFETs and DG MOSFETs as drain bias varies are analyzed using full three-dimensional technology computer-aided design (TCAD) simulation in terms of the three dominant variation sources: line-edge roughness (LER), random dopant fluctuation (RDF), and workfunction variation (WFV). It is observed that DG TFETs suffer from less severe statistical variation as drain voltage increases, unlike DG MOSFETs.

  13. Selective Weighted Least Squares Method for Fourier Transform Infrared Quantitative Analysis.

    PubMed

    Wang, Xin; Li, Yan; Wei, Haoyun; Chen, Xia

    2017-06-01

    Classical least squares (CLS) regression is a popular multivariate statistical method used frequently for quantitative analysis with Fourier transform infrared (FT-IR) spectrometry. Classical least squares provides the best unbiased estimator for uncorrelated residual errors with zero mean and equal variance. However, the noise in FT-IR spectra, which accounts for a large portion of the residual errors, is heteroscedastic. Thus, if this zero-mean noise dominates the residual errors, the weighted least squares (WLS) regression method described in this paper is a better estimator than CLS. However, if bias errors, such as the residual baseline error, are significant, WLS may perform worse than CLS. In this paper, we compare the effects of noise and bias error on CLS and WLS in quantitative analysis. Results indicated that for wavenumbers with low absorbance, the bias error significantly affected the error, such that the performance of CLS is better than that of WLS. However, for wavenumbers with high absorbance, the noise significantly affected the error, and WLS proves to be better than CLS. Thus, we propose a selective weighted least squares (SWLS) regression that processes data at different wavenumbers using either CLS or WLS based on a selection criterion, i.e., lower or higher than an absorbance threshold. The effects of various factors on the optimal threshold value (OTV) for SWLS have been studied through numerical simulations. These studies showed that: (1) the concentration and the analyte type had minimal effect on the OTV; and (2) the major factor that influences the OTV is the ratio between the bias error and the standard deviation of the noise. The last part of this paper is dedicated to quantitative analysis of methane gas spectra and methane/toluene gas mixture spectra measured using FT-IR spectrometry with CLS, WLS, and SWLS. The standard error of prediction (SEP), bias of prediction (bias), and residual sum of squares of the errors (RSS) from the three quantitative analyses were compared. In methane gas analysis, SWLS yielded the lowest SEP and RSS among the three methods. In methane/toluene mixture gas analysis, a modification of SWLS is presented to tackle the bias error from other components. SWLS without the modification gives the lowest SEP in all cases, but not the lowest bias and RSS. The modification of SWLS reduced the bias and showed a lower RSS than CLS, especially for small components.
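    For a single analyte with a known pure-component spectrum, the CLS and WLS estimators compared above reduce to an unweighted versus an inverse-variance-weighted projection of the measured spectrum onto the reference. A minimal sketch (reference spectrum, concentration, and noise variances all invented); SWLS would simply route each wavenumber to one of the two estimators according to an absorbance threshold:

```python
def cls_estimate(s, y):
    """Classical least squares: unweighted projection of the measured
    spectrum y onto the pure-component spectrum s."""
    return sum(si * yi for si, yi in zip(s, y)) / sum(si * si for si in s)

def wls_estimate(s, y, noise_var):
    """Weighted least squares with weights 1/variance, down-weighting
    wavenumbers with noisier measurements."""
    w = [1.0 / v for v in noise_var]
    num = sum(wi * si * yi for wi, si, yi in zip(w, s, y))
    den = sum(wi * si * si for wi, si in zip(w, s))
    return num / den

s = [1.0, 2.0, 4.0]            # invented pure-component spectrum
y = [0.5 * si for si in s]     # noiseless spectrum at true concentration 0.5
c_cls = cls_estimate(s, y)
c_wls = wls_estimate(s, y, [1.0, 1.0, 4.0])
```

    On noiseless data both estimators recover the true concentration; they diverge only in how they average noisy wavenumbers, which is where the heteroscedasticity argument applies.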

  14. Publication Bias ( The "File-Drawer Problem") in Scientific Inference

    NASA Technical Reports Server (NTRS)

    Scargle, Jeffrey D.; DeVincenzi, Donald (Technical Monitor)

    1999-01-01

    Publication bias arises whenever the probability that a study is published depends on the statistical significance of its results. This bias, often called the file-drawer effect since the unpublished results are imagined to be tucked away in researchers' file cabinets, is potentially a severe impediment to combining the statistical results of studies collected from the literature. With almost any reasonable quantitative model for publication bias, only a small number of studies lost in the file drawer will produce a significant bias. This result contradicts the well-known Fail Safe File Drawer (FSFD) method for setting limits on the potential harm of publication bias, widely used in social, medical, and psychic research. This method incorrectly treats the file drawer as unbiased, and almost always misestimates the seriousness of publication bias. A large body of not only psychic research, but also medical and social science studies, has mistakenly relied on this method to validate claimed discoveries. Statistical combination can be trusted only if it is known with certainty that all studies that have been carried out are included. Such certainty is virtually impossible to achieve in literature surveys.
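    The abstract's central claim, that even a modest file drawer biases combined results, is easy to reproduce numerically. In the seeded sketch below, every study measures a true effect of zero, but only studies reaching z > 1.96 are "published"; the average over published studies alone is then strongly positive. All parameter names and values are invented for illustration.

```python
import random

def simulate_file_drawer(n_studies=2000, n_per_study=20, true_effect=0.0,
                         z_cut=1.96, seed=1):
    """Each study measures a zero-mean effect with unit-variance noise;
    only studies whose z-score exceeds z_cut are 'published'. Returns the
    mean effect over published studies (0.0 if none pass)."""
    rng = random.Random(seed)
    published = []
    for _ in range(n_studies):
        xs = [rng.gauss(true_effect, 1.0) for _ in range(n_per_study)]
        mean = sum(xs) / n_per_study
        z = mean * n_per_study ** 0.5   # z-score with known unit variance
        if z > z_cut:
            published.append(mean)
    return sum(published) / len(published) if published else 0.0
```

    Any published study must have a mean above 1.96/sqrt(20), roughly 0.44, so the published-only average cannot help but overstate the true effect of zero; setting z_cut very low (publishing everything) restores an unbiased average.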

  15. Full data acquisition in Kelvin Probe Force Microscopy: Mapping dynamic electric phenomena in real space

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Balke, Nina; Kalinin, Sergei V.; Jesse, Stephen

    Kelvin probe force microscopy (KPFM) has provided deep insights into the role local electronic, ionic and electrochemical processes play on the global functionality of materials and devices, even down to the atomic scale. Conventional KPFM utilizes heterodyne detection and bias feedback to measure the contact potential difference (CPD) between tip and sample. This measurement paradigm, however, permits only partial recovery of the information encoded in bias- and time-dependent electrostatic interactions between the tip and sample and effectively down-samples the cantilever response to a single measurement of CPD per pixel. This level of detail is insufficient for electroactive materials, devices, or solid-liquid interfaces, where non-linear dielectrics are present or spurious electrostatic events are possible. Here, we simulate and experimentally validate a novel approach for spatially resolved KPFM capable of a full information transfer of the dynamic electric processes occurring between tip and sample. General acquisition mode, or G-Mode, adopts a big data approach utilising high speed detection, compression, and storage of the raw cantilever deflection signal in its entirety at high sampling rates (> 4 MHz), providing a permanent record of the tip trajectory. We develop a range of methodologies for analysing the resultant large multidimensional datasets involving classical, physics-based and information-based approaches. Physics-based analysis of G-Mode KPFM data recovers the parabolic bias dependence of the electrostatic force for each cycle of the excitation voltage, leading to a multidimensional dataset containing spatial and temporal dependence of the CPD and capacitance channels. We use multivariate statistical methods to reduce data volume and separate the complex multidimensional data sets into statistically significant components that can then be mapped onto separate physical mechanisms. Overall, G-Mode KPFM offers a new paradigm to study dynamic electric phenomena in electroactive interfaces, as well as a promising approach to extending KPFM to solid-liquid interfaces.

  17. Multivariate assessment of event-related potentials with the t-CWT method.

    PubMed

    Bostanov, Vladimir

    2015-11-05

    Event-related brain potentials (ERPs) are usually assessed with univariate statistical tests although they are essentially multivariate objects. Brain-computer interface applications are a notable exception to this practice, because they are based on multivariate classification of single-trial ERPs. Multivariate ERP assessment can be facilitated by feature extraction methods. One such method is t-CWT, a mathematical-statistical algorithm based on the continuous wavelet transform (CWT) and Student's t-test. This article begins with a geometric primer on some basic concepts of multivariate statistics as applied to ERP assessment in general and to the t-CWT method in particular. Further, it presents for the first time a detailed, step-by-step, formal mathematical description of the t-CWT algorithm. A new multivariate outlier rejection procedure based on principal component analysis in the frequency domain is presented as an important pre-processing step. The MATLAB and GNU Octave implementation of t-CWT is also made publicly available for the first time as free and open source code. The method is demonstrated on some example ERP data obtained in a passive oddball paradigm. Finally, some conceptually novel applications of the multivariate approach in general and of the t-CWT method in particular are suggested and discussed. Hopefully, the publication of both the t-CWT source code and its underlying mathematical algorithm along with a didactic geometric introduction to some basic concepts of multivariate statistics would make t-CWT more accessible to both users and developers in the field of neuroscience research.
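    The elementary score at the heart of t-CWT is Student's t-statistic, computed per wavelet coefficient between two sets of single trials. A self-contained sketch of the pooled-variance two-sample version (toy numbers, not ERP data):

```python
import math

def t_statistic(a, b):
    """Two-sample Student t-statistic with pooled variance: the per-feature
    score that t-CWT computes on continuous-wavelet-transform coefficients."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((x - ma) ** 2 for x in a)   # sum of squared deviations, group a
    ssb = sum((x - mb) ** 2 for x in b)
    sp2 = (ssa + ssb) / (na + nb - 2)     # pooled variance
    return (ma - mb) / math.sqrt(sp2 * (1.0 / na + 1.0 / nb))
```

    In t-CWT this score is evaluated over the whole time-scale plane of the CWT, and the coefficients with extremal t-values are kept as multivariate features.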

  18. Instrumental Neutron Activation Analysis and Multivariate Statistics for Pottery Provenance

    NASA Astrophysics Data System (ADS)

    Glascock, M. D.; Neff, H.; Vaughn, K. J.

    2004-06-01

    The application of instrumental neutron activation analysis and multivariate statistics to archaeological studies of ceramics and clays is described. A small pottery data set from the Nasca culture in southern Peru is presented for illustration.

  19. What's Hot and What's Not: Multivariate Statistical Analysis of Ten Labile Trace Elements in H-Chondrite Population Pairs

    NASA Astrophysics Data System (ADS)

    Wolf, S. F.; Lipschutz, M. E.

    1993-07-01

    Dodd et al. [1] found that, from their circumstances of fall, 17 H chondrites ("H Cluster 1") which fell in May, from 1855 to 1895, are distinguishable from other H chondrite falls and apparently derive from a co-orbital stream of meteoroids. From data for 10 moderately to highly labile trace elements (Rb, Ag, Se, Cs, Te, Zn, Cd, Bi, Tl, In), they used two multivariate statistical techniques, linear discriminant analysis and logistic regression, to demonstrate that: (1) 13 H Cluster 1 chondrites are compositionally distinguishable from 45 other H chondrite falls, probably because of differences in the thermal histories of the meteorites' parent materials; (2) the reality of the compositional differences between the populations of falls is beyond any reasonable statistical doubt; (3) the compositional differences are inconsistent with the notion that the results reflect analytical bias. We have used these techniques to assess analogous data for various H chondrite populations [2-4], with results listed in Table 1. These data indicate that: (1) there is no statistical reason to believe that random populations from Victoria Land, Antarctica, differ compositionally from each other; (2) there is significant statistical reason to believe that the H chondrite population recovered from Victoria Land, Antarctica, differs compositionally from that from Queen Maud Land, Antarctica, and from falls; (3) there is no reason to believe that the H chondrite population recovered from Queen Maud Land, Antarctica, differs compositionally from falls; (4) these observations hold whether the data were obtained by one analyst or several. These results, coupled with earlier ones [5], demonstrate that trivial explanations cannot account for compositional differences involving labile trace elements in pairs of H chondrite populations. These differences must then reflect differences in the preterrestrial thermal histories of the meteorites' parent materials.
    Acceptance of these differences as preterrestrial has led to predictions subsequently verified by others (meteoroid and asteroid stream discoveries, differences in thermoluminescence, or TL). We predict that a TL difference will be seen between the populations of falls defined by Dodd et al. [1]. References: [1] Dodd R. T. et al. (1993) JGR, submitted. [2] Lingner D. W. et al. (1987) GCA, 51, 727-739. [3] Dennison J. E. and Lipschutz M. E. (1987) GCA, 51, 741-754. [4] Wolf S. F. and Lipschutz M. E. (1993) in Advances in Analytical Geochemistry (M. Hyman and M. Rowe, eds.), in press. [5] Wang M.-S. et al. (1992) Meteoritics, 27, 303. [6] Lipschutz M. E. and Samuels S. M. (1991) GCA, 55, 19-47. Table 1, which appears in the hard copy, shows a multivariate statistical analysis of H chondrite population pairs using 10 labile trace elements (number of meteorites in population in parentheses).

  20. Evaluation on the use of cerium in the NBL Titrimetric Method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zebrowski, J.P.; Orlowicz, G.J.; Johnson, K.D.

    An alternative to potassium dichromate as titrant in the New Brunswick Laboratory Titrimetric Method for uranium analysis was sought, since chromium in the waste makes disposal difficult. Substitution of a ceric-based titrant was statistically evaluated. Analysis of the data indicated statistically equivalent precisions for the two methods, but a significant overall bias of +0.035% for the ceric titrant procedure. The cause of the bias was investigated, alterations to the procedure were made, and a second statistical study was performed. This second study revealed no statistically significant bias, nor any analyst-to-analyst variation in the ceric titration procedure. A statistically significant day-to-day variation was detected, but this was physically small (0.015%) and was only detected because of the within-day precision of the method. The added mean and standard deviation of the %RD for a single measurement was found to be 0.031%. A comparison with quality control blind dichromate titration data again indicated similar overall precision. The effects of ten elements (Co, Ti, Cu, Ni, Na, Mg, Gd, Zn, Cd, and Cr) on the ceric titration's performance were determined; in previous work at NBL these impurities did not interfere with the potassium dichromate titrant. This study indicated similar results for the ceric titrant, with the exception of Ti. All the elements (excluding Ti and Cr) caused no statistically significant bias in uranium measurements at levels of 10 mg impurity per 20-40 mg uranium. The presence of Ti was found to cause a bias of −0.05%; this is attributed to the presence of sulfate ions, resulting in precipitation of titanium sulfate and occlusion of uranium. A negative bias of 0.012% was also statistically observed in the samples containing chromium impurities.

  1. Bias, precision and statistical power of analysis of covariance in the analysis of randomized trials with baseline imbalance: a simulation study.

    PubMed

    Egbewale, Bolaji E; Lewis, Martyn; Sim, Julius

    2014-04-09

    Analysis of variance (ANOVA), change-score analysis (CSA) and analysis of covariance (ANCOVA) respond differently to baseline imbalance in randomized controlled trials. However, no empirical studies appear to have quantified the differential bias and precision of estimates derived from these methods of analysis, and their relative statistical power, in relation to combinations of levels of key trial characteristics. This simulation study therefore examined the relative bias, precision and statistical power of these three analyses using simulated trial data. 126 hypothetical trial scenarios were evaluated (126,000 datasets), each with continuous data simulated by using a combination of levels of: treatment effect; pretest-posttest correlation; direction and magnitude of baseline imbalance. The bias, precision and power of each method of analysis were calculated for each scenario. Compared to the unbiased estimates produced by ANCOVA, both ANOVA and CSA are subject to bias, in relation to pretest-posttest correlation and the direction of baseline imbalance. Additionally, ANOVA and CSA are less precise than ANCOVA, especially when pretest-posttest correlation ≥ 0.3. When groups are balanced at baseline, ANCOVA is at least as powerful as the other analyses. Apparently greater power of ANOVA and CSA at certain imbalances is achieved in respect of a biased treatment effect. Across a range of correlations between pre- and post-treatment scores and at varying levels and direction of baseline imbalance, ANCOVA remains the optimum statistical method for the analysis of continuous outcomes in RCTs, in terms of bias, precision and statistical power.
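The baseline-imbalance mechanism described above is easy to reproduce. Below is a minimal sketch (not the authors' simulation code; all parameter values are illustrative assumptions) comparing the bias of ANOVA, CSA, and ANCOVA effect estimates on simulated two-arm trials with an imbalanced baseline:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_trial(n=200, effect=0.5, rho=0.7, imbalance=0.5):
    """One two-arm trial with a baseline imbalance favouring the treatment arm."""
    group = np.repeat([0, 1], n // 2)
    pre = rng.normal(0, 1, n) + imbalance * group            # imbalanced baseline
    post = rho * pre + np.sqrt(1 - rho**2) * rng.normal(0, 1, n) + effect * group
    return group, pre, post

def estimates(group, pre, post):
    anova = post[group == 1].mean() - post[group == 0].mean()
    csa = (post - pre)[group == 1].mean() - (post - pre)[group == 0].mean()
    # ANCOVA: regress post on intercept, group, pre; effect = group coefficient
    X = np.column_stack([np.ones_like(pre), group, pre])
    beta, *_ = np.linalg.lstsq(X, post, rcond=None)
    return anova, csa, beta[1]

sims = np.array([estimates(*simulate_trial()) for _ in range(2000)])
bias = sims.mean(axis=0) - 0.5               # true effect is 0.5
print(dict(zip(["ANOVA", "CSA", "ANCOVA"], np.round(bias, 3))))
```

With a pretest-posttest correlation of 0.7, ANOVA inherits roughly rho × imbalance of bias, CSA roughly (rho − 1) × imbalance, while the ANCOVA coefficient remains approximately unbiased, matching the pattern the abstract reports.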

  2. Bias, precision and statistical power of analysis of covariance in the analysis of randomized trials with baseline imbalance: a simulation study

    PubMed Central

    2014-01-01

    Background Analysis of variance (ANOVA), change-score analysis (CSA) and analysis of covariance (ANCOVA) respond differently to baseline imbalance in randomized controlled trials. However, no empirical studies appear to have quantified the differential bias and precision of estimates derived from these methods of analysis, and their relative statistical power, in relation to combinations of levels of key trial characteristics. This simulation study therefore examined the relative bias, precision and statistical power of these three analyses using simulated trial data. Methods 126 hypothetical trial scenarios were evaluated (126 000 datasets), each with continuous data simulated by using a combination of levels of: treatment effect; pretest-posttest correlation; direction and magnitude of baseline imbalance. The bias, precision and power of each method of analysis were calculated for each scenario. Results Compared to the unbiased estimates produced by ANCOVA, both ANOVA and CSA are subject to bias, in relation to pretest-posttest correlation and the direction of baseline imbalance. Additionally, ANOVA and CSA are less precise than ANCOVA, especially when pretest-posttest correlation ≥ 0.3. When groups are balanced at baseline, ANCOVA is at least as powerful as the other analyses. Apparently greater power of ANOVA and CSA at certain imbalances is achieved in respect of a biased treatment effect. Conclusions Across a range of correlations between pre- and post-treatment scores and at varying levels and direction of baseline imbalance, ANCOVA remains the optimum statistical method for the analysis of continuous outcomes in RCTs, in terms of bias, precision and statistical power. PMID:24712304

  3. Conservative Tests under Satisficing Models of Publication Bias.

    PubMed

    McCrary, Justin; Christensen, Garret; Fanelli, Daniele

    2016-01-01

Publication bias leads consumers of research to observe a selected sample of statistical estimates calculated by producers of research. We calculate critical values for statistical significance that could help to adjust after the fact for the distortions created by this selection effect, assuming that the only source of publication bias is file drawer bias. These adjusted critical values are easy to calculate and differ from unadjusted critical values by approximately 50%: rather than rejecting a null hypothesis when the t-ratio exceeds 2, the analysis suggests rejecting a null hypothesis when the t-ratio exceeds 3. Samples of published social science research indicate that on average, across research fields, approximately 30% of published t-statistics fall between the standard and adjusted cutoffs.
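The size of the file-drawer adjustment can be illustrated with a toy Monte Carlo (an assumption-laden sketch, not the authors' derivation): draw null t-ratios, keep only the "published" ones beyond the conventional cutoff, and count how many fall short of the stricter cutoff of 3.

```python
import numpy as np

# File-drawer selection: only |t| > 1.96 results are "published".
rng = np.random.default_rng(0)
t_all = rng.standard_normal(200_000)             # t-ratios under the null
published = np.abs(t_all) > 1.96

false_pos_at_2 = np.mean(np.abs(t_all) > 1.96)   # nominal ~5%
false_pos_at_3 = np.mean(np.abs(t_all) > 3.0)    # far stricter
share_between = np.mean(np.abs(t_all[published]) <= 3.0)
print(f"P(|t|>1.96) = {false_pos_at_2:.3f}, P(|t|>3) = {false_pos_at_3:.4f}")
print(f"share of 'published' null t-stats between 2 and 3: {share_between:.2f}")
```

Under the null, the vast majority of barely significant "published" results sit between the two cutoffs, which is why moving the rejection threshold from 2 to 3 discards so much selected noise.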

  4. Conservative Tests under Satisficing Models of Publication Bias

    PubMed Central

    McCrary, Justin; Christensen, Garret; Fanelli, Daniele

    2016-01-01

    Publication bias leads consumers of research to observe a selected sample of statistical estimates calculated by producers of research. We calculate critical values for statistical significance that could help to adjust after the fact for the distortions created by this selection effect, assuming that the only source of publication bias is file drawer bias. These adjusted critical values are easy to calculate and differ from unadjusted critical values by approximately 50%—rather than rejecting a null hypothesis when the t-ratio exceeds 2, the analysis suggests rejecting a null hypothesis when the t-ratio exceeds 3. Samples of published social science research indicate that on average, across research fields, approximately 30% of published t-statistics fall between the standard and adjusted cutoffs. PMID:26901834

  5. Velocity bias in the distribution of dark matter halos

    NASA Astrophysics Data System (ADS)

    Baldauf, Tobias; Desjacques, Vincent; Seljak, Uroš

    2015-12-01

    The standard formalism for the coevolution of halos and dark matter predicts that any initial halo velocity bias rapidly decays to zero. We argue that, when the purpose is to compute statistics like power spectra etc., the coupling in the momentum conservation equation for the biased tracers must be modified. Our new formulation predicts the constancy in time of any statistical halo velocity bias present in the initial conditions, in agreement with peak theory. We test this prediction by studying the evolution of a conserved halo population in N -body simulations. We establish that the initial simulated halo density and velocity statistics show distinct features of the peak model and, thus, deviate from the simple local Lagrangian bias. We demonstrate, for the first time, that the time evolution of their velocity is in tension with the rapid decay expected in the standard approach.

  6. Multivariate Bias Correction Procedures for Improving Water Quality Predictions from the SWAT Model

    NASA Astrophysics Data System (ADS)

    Arumugam, S.; Libera, D.

    2017-12-01

Water quality observations are usually not available on a continuous basis for longer than 1-2 years at a time over a decadal period, given the labor requirements, which makes calibrating and validating mechanistic models difficult. Further, any physical model's predictions inherently have bias (i.e., under/over-estimation) and require post-simulation techniques to preserve the long-term mean monthly attributes. This study suggests a multivariate bias-correction technique and compares it to a common technique for improving the performance of the SWAT model in predicting daily streamflow and TN loads across the Southeast based on split-sample validation. The approach is a dimension-reduction technique, canonical correlation analysis (CCA), that regresses the observed multivariate attributes on the SWAT-simulated values. The common approach is a regression-based technique that uses ordinary least squares to adjust model values. The observed cross-correlation between loadings and streamflow is better preserved when using canonical correlation, while individual biases are simultaneously reduced. Additionally, canonical correlation analysis does a better job of preserving the observed joint likelihood of streamflow and loadings. These procedures were applied to 3 watersheds chosen from the Water Quality Network in the Southeast Region; specifically, watersheds with sufficiently large drainage areas and numbers of observed data points. The performance of the two approaches is compared for the observed period and over a multi-decadal period using loading estimates from the USGS LOADEST model. Lastly, the CCA technique is applied in a forecasting sense by using 1-month-ahead forecasts of P & T from ECHAM4.5 as forcings in the SWAT model. Skill in using the SWAT model for forecasting loadings and streamflow at the monthly and seasonal timescales is also discussed.
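The core CCA step can be sketched in numpy (a generic whitening-plus-SVD construction, not the authors' SWAT post-processor; the streamflow/TN series below are simulated stand-ins):

```python
import numpy as np

def cca_first_pair(X, Y):
    """First canonical correlation and weight vectors, via whitening + SVD."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    Ux, Sx, Vtx = np.linalg.svd(Xc, full_matrices=False)
    Uy, Sy, Vty = np.linalg.svd(Yc, full_matrices=False)
    U, S, Vt = np.linalg.svd(Ux.T @ Uy)      # singular values = canonical corrs
    a = Vtx.T @ (U[:, 0] / Sx)               # weights for X columns
    b = Vty.T @ (Vt[0] / Sy)                 # weights for Y columns
    return a, b, S[0]

rng = np.random.default_rng(1)
n = 500
# hypothetical "observed" streamflow and TN load, correlated with each other
obs = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n)
# hypothetical biased, noisy "model" output of the same two attributes
sim = obs @ np.array([[0.8, 0.1], [0.2, 0.7]]) + 0.5 * rng.standard_normal((n, 2))

a, b, r1 = cca_first_pair(sim, obs)
print(f"first canonical correlation between model and observations: {r1:.2f}")
```

By construction the first canonical correlation is the largest correlation achievable by any pair of linear combinations of the two variable sets, so it bounds every single-variable correlation from above; regressing on canonical variates is what lets the adjustment preserve cross-correlations between flow and load rather than correcting each series in isolation.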

  7. Assessment of data pre-processing methods for LC-MS/MS-based metabolomics of uterine cervix cancer.

    PubMed

    Chen, Yanhua; Xu, Jing; Zhang, Ruiping; Shen, Guoqing; Song, Yongmei; Sun, Jianghao; He, Jiuming; Zhan, Qimin; Abliz, Zeper

    2013-05-07

    A metabolomics strategy based on rapid resolution liquid chromatography/tandem mass spectrometry (RRLC-MS/MS) and multivariate statistics has been implemented to identify potential biomarkers in uterine cervix cancer. Due to the importance of the data pre-processing method, three popular software packages have been compared. Then they have been used to acquire respective data matrices from the same LC-MS/MS data. Multivariate statistics was subsequently used to identify significantly changed biomarkers for uterine cervix cancer from the resulting data matrices. The reliabilities of the identified discriminated metabolites have been further validated on the basis of manually extracted data and ROC curves. Nine potential biomarkers have been identified as having a close relationship with uterine cervix cancer. Considering these in combination as a biomarker group, the AUC amounted to 0.997, with a sensitivity of 92.9% and a specificity of 95.6%. The prediction accuracy was 96.6%. Among these potential biomarkers, the amounts of four purine derivatives were greatly decreased, which might be related to a P2 receptor that might lead to a decrease in cell number through apoptosis. Moreover, only two of them were identified simultaneously by all of the pre-processing tools. The results have demonstrated that the data pre-processing method could seriously bias the metabolomics results. Therefore, application of two or more data pre-processing methods would reveal a more comprehensive set of potential biomarkers in non-targeted metabolomics, before a further validation with LC-MS/MS based targeted metabolomics in MRM mode could be conducted.

  8. Radar error statistics for the space shuttle

    NASA Technical Reports Server (NTRS)

    Lear, W. M.

    1979-01-01

Radar error statistics for C-band and S-band radars, recommended for use with the ground-tracking programs that process space shuttle tracking data, are presented. The statistics are divided into two parts: bias error statistics, using the subscript B, and high-frequency error statistics, using the subscript q. Bias errors may be slowly varying to constant. High-frequency random errors (noise) are rapidly varying and may or may not be correlated from sample to sample. Bias errors were mainly due to hardware defects and to errors in the correction for atmospheric refraction effects. High-frequency noise was mainly due to hardware and to atmospheric scintillation. Three types of atmospheric scintillation were identified: horizontal, vertical, and line of sight. This was the first time that horizontal and line-of-sight scintillations were identified.
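The two error classes can be illustrated by decomposing simulated tracking residuals into a constant bias term and a high-frequency noise term (the numbers are illustrative assumptions, not shuttle-era values):

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(1000)
true_range = 1e5 + 50.0 * np.sin(2 * np.pi * t / 1000)   # slowly varying truth (m)
bias = 12.0                                 # slowly varying / constant bias (m)
noise = 3.0 * rng.standard_normal(t.size)   # high-frequency random error (m)
measured = true_range + bias + noise

residual = measured - true_range
bias_hat = residual.mean()                  # estimates the bias component
noise_hat = residual.std(ddof=1)            # estimates the noise sigma
print(f"bias ~ {bias_hat:.1f} m, noise sigma ~ {noise_hat:.1f} m")
```

The sample mean of the residuals recovers the bias component while the sample standard deviation recovers the high-frequency noise level, mirroring the B/q split used in the report.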

  9. A quantitative approach to evolution of music and philosophy

    NASA Astrophysics Data System (ADS)

    Vieira, Vilson; Fabbri, Renato; Travieso, Gonzalo; Oliveira, Osvaldo N., Jr.; da Fontoura Costa, Luciano

    2012-08-01

    The development of new statistical and computational methods is increasingly making it possible to bridge the gap between hard sciences and humanities. In this study, we propose an approach based on a quantitative evaluation of attributes of objects in fields of humanities, from which concepts such as dialectics and opposition are formally defined mathematically. As case studies, we analyzed the temporal evolution of classical music and philosophy by obtaining data for 8 features characterizing the corresponding fields for 7 well-known composers and philosophers, which were treated with multivariate statistics and pattern recognition methods. A bootstrap method was applied to avoid statistical bias caused by the small sample data set, with which hundreds of artificial composers and philosophers were generated, influenced by the 7 names originally chosen. Upon defining indices for opposition, skewness and counter-dialectics, we confirmed the intuitive analysis of historians in that classical music evolved according to a master-apprentice tradition, while in philosophy changes were driven by opposition. Though these case studies were meant only to show the possibility of treating phenomena in humanities quantitatively, including a quantitative measure of concepts such as dialectics and opposition, the results are encouraging for further application of the approach presented here to many other areas, since it is entirely generic.
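The bootstrap step can be sketched as follows, hypothetically using a 7 × 8 matrix of feature scores in place of the real composer/philosopher data:

```python
import numpy as np

rng = np.random.default_rng(3)
# 7 "composers", 8 features each: a deliberately tiny sample, as in the study
sample = rng.normal(size=(7, 8))

# Bootstrap: resample the 7 rows with replacement to build artificial samples,
# each influenced only by the 7 originals
boot_means = np.array([
    sample[rng.integers(0, 7, size=7)].mean(axis=0) for _ in range(1000)
])
se = boot_means.std(axis=0)     # bootstrap standard error of each feature mean
print(np.round(se, 2))
```

Resampling with replacement yields a distribution of statistics whose spread quantifies the uncertainty that a single 7-observation sample would otherwise hide, which is the bias-mitigation role the bootstrap plays in the study.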

  10. Multivariate two-part statistics for analysis of correlated mass spectrometry data from multiple biological specimens.

    PubMed

    Taylor, Sandra L; Ruhaak, L Renee; Weiss, Robert H; Kelly, Karen; Kim, Kyoungmi

    2017-01-01

High-throughput mass spectrometry (MS) is now being used to profile small molecular compounds across multiple biological sample types from the same subjects, with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data, and imputation can impact between-biospecimen correlation and multivariate analysis results. We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, we showed that multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds are differentially regulated in multiple biospecimens, but univariate methods can be more powerful if compounds are differentially regulated in only one biospecimen. We provide R functions to implement and illustrate our method as supplementary information. Contact: sltaylor@ucdavis.edu. Supplementary data are available at Bioinformatics online.

  11. Robust multivariate nonparametric tests for detection of two-sample location shift in clinical trials

    PubMed Central

    Jiang, Xuejun; Guo, Xu; Zhang, Ning; Wang, Bo

    2018-01-01

    This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV. PMID:29672555
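A stripped-down version of a median-based permutation test (a simplified stand-in for the paper's Hodges-Lehmann and extended-U constructions) can be sketched as:

```python
import numpy as np

def perm_test_location(X, Y, n_perm=2000, seed=0):
    """Permutation p-value for a location shift between two multivariate
    samples, using the norm of the difference of coordinate-wise medians."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([X, Y])
    n = len(X)
    stat = np.linalg.norm(np.median(X, 0) - np.median(Y, 0))
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        s = np.linalg.norm(np.median(pooled[idx[:n]], 0)
                           - np.median(pooled[idx[n:]], 0))
        count += s >= stat
    return (count + 1) / (n_perm + 1)   # add-one correction keeps p > 0

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 3))
Y = rng.standard_normal((40, 3)) + 0.8      # shifted by 0.8 in every coordinate
p_shift = perm_test_location(X, Y)
p_null = perm_test_location(X, rng.standard_normal((40, 3)))
print(f"p (shifted) = {p_shift:.4f}, p (no shift) = {p_null:.4f}")
```

Because the null distribution is built by permuting group labels rather than assuming multivariate normality, the test stays valid for heavy-tailed data, and the median-based statistic is insensitive to outliers, the two robustness properties the article emphasizes.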

  12. Bias correction of daily satellite precipitation data using genetic algorithm

    NASA Astrophysics Data System (ADS)

    Pratama, A. W.; Buono, A.; Hidayat, R.; Harsa, H.

    2018-05-01

Climate Hazards Group InfraRed Precipitation with Stations (CHIRPS) is produced by blending the satellite-only Climate Hazards Group InfraRed Precipitation (CHIRP) with station observation data. The blending process aims to reduce the bias of CHIRP. However, the biases of CHIRPS in statistical moments and quantile values were high during the wet season over Java Island. This paper presents a bias-correction scheme that adjusts the statistical moments of CHIRP using observed precipitation data. The scheme combines a genetic algorithm with a nonlinear power transformation, and the results were evaluated by season and by elevation level. The experimental results revealed that the scheme robustly reduced the bias in variance (around 100% reduction) and led to reductions in the first- and second-quantile biases. However, the bias in the third quantile was reduced only during the dry months. Across elevation levels, the performance of the bias-correction process differed significantly only for the skewness indicator.
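A toy version of the scheme, with a hypothetical elitist genetic algorithm fitting the two parameters of a power transformation y = a·x^b so that the corrected series matches the observed mean and variance (an illustrative sketch, not the authors' CHIRPS implementation; all data are simulated):

```python
import numpy as np

rng = np.random.default_rng(5)
obs = rng.gamma(2.0, 5.0, 3000)              # "station" wet-day rainfall (mm)
sat = 0.6 * rng.gamma(2.0, 5.0, 3000) + 2.0  # biased "satellite" estimate

def fitness(params):
    a, b = params
    corrected = a * sat ** b                 # nonlinear power transformation
    # penalise mismatch of mean and standard deviation against observations
    return -((corrected.mean() - obs.mean()) ** 2
             + (corrected.std() - obs.std()) ** 2)

# minimal elitist genetic algorithm over the two transform parameters (a, b)
pop = rng.uniform([0.1, 0.5], [3.0, 2.0], size=(50, 2))
for _ in range(80):
    scores = np.array([fitness(p) for p in pop])
    elite = pop[np.argsort(scores)[-10:]]                    # keep 10 fittest
    children = elite[rng.integers(0, 10, 40)] + rng.normal(0, 0.05, (40, 2))
    pop = np.vstack([elite, np.clip(children, [0.01, 0.1], [5.0, 3.0])])

best = max(pop, key=fitness)
corrected = best[0] * sat ** best[1]
print(f"variance ratio after correction: {corrected.var() / obs.var():.2f}")
```

Selection plus Gaussian mutation drives the population toward parameters whose transformed series reproduces the observed moments, which is the variance-bias reduction the abstract reports.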

  13. Genetic and codon usage bias analyses of polymerase genes of equine influenza virus and its relation to evolution.

    PubMed

    Bera, Bidhan Ch; Virmani, Nitin; Kumar, Naveen; Anand, Taruna; Pavulraj, S; Rash, Adam; Elton, Debra; Rash, Nicola; Bhatia, Sandeep; Sood, Richa; Singh, Raj Kumar; Tripathi, Bhupendra Nath

    2017-08-23

Equine influenza is a major health problem of equines worldwide. The polymerase genes of influenza virus have key roles in virus replication, transcription, transmission between hosts, and pathogenesis. Hence, the genetics and codon usage bias of the polymerase genes of equine influenza virus (EIV) were comprehensively analyzed to elucidate genetic and evolutionary relationships from a novel perspective. Group-specific consensus amino acid substitutions were identified in all polymerase genes of EIVs that led to divergence of EIVs into various clades. Consistent amino acid changes were also detected in the Florida clade 2 EIVs circulating in Europe and Asia since 2007. To study the codon usage patterns, a total of 281,324 codons of polymerase genes of EIV H3N8 isolates from 1963 to 2015 were systematically analyzed. The polymerase genes of EIVs exhibit a weak codon usage bias. The ENc-GC3s and neutrality plots indicated that natural selection is the major factor influencing codon usage bias, and that the impact of mutation pressure is comparatively minor. The methods for estimating host-imposed translation pressure suggested that the polymerase acidic (PA) gene is under less translational pressure than the polymerase basic 1 (PB1) and polymerase basic 2 (PB2) genes. The multivariate statistical analysis of polymerase genes divided EIVs into four evolutionarily diverged clusters: pre-divergent, Eurasian, and Florida sub-lineages 1 and 2. Various lineage-specific amino acid substitutions were observed in all polymerase genes of EIVs; in particular, clade 2 EIVs underwent major variations, which led to the emergence of a phylogenetically distinct group of EIVs originating from Richmond/1/07. The codon usage bias was low in all the polymerase genes of EIVs and was influenced by multiple factors such as nucleotide composition, mutation pressure, aromaticity, and hydropathicity. However, natural selection was the major factor shaping the codon usage patterns and evolution of the polymerase genes of EIVs.

  14. Performance of the disease risk score in a cohort study with policy-induced selection bias.

    PubMed

    Tadrous, Mina; Mamdani, Muhammad M; Juurlink, David N; Krahn, Murray D; Lévesque, Linda E; Cadarette, Suzanne M

    2015-11-01

To examine the performance of the disease risk score (DRS) in a cohort study with evidence of policy-induced selection bias, we examined two cohorts of new users of bisphosphonates. Estimates of 1-year hip fracture rates between agents were compared using DRS, exposure propensity scores, and traditional multivariable analysis. The results for the cohort with no evidence of policy-induced selection bias showed little variation across analyses (-4.1% to 2.0%). Analysis of the cohort with evidence of policy-induced selection bias showed greater variation (-13.5% to 8.1%), with the greatest difference seen in the DRS analyses. Our findings suggest that caution may be warranted when using DRS methods in cohort studies with policy-induced selection bias; further research is needed.

  15. A Hierarchical Multivariate Bayesian Approach to Ensemble Model output Statistics in Atmospheric Prediction

    DTIC Science & Technology

    2017-09-01

This dissertation explores the efficacy of statistical post-processing methods applied downstream of dynamical model components, using a hierarchical multivariate Bayesian approach. Subject terms: Bayesian hierarchical modeling, Markov chain Monte Carlo methods, Metropolis algorithm, machine learning, atmospheric prediction.

  16. The Covariance Adjustment Approaches for Combining Incomparable Cox Regressions Caused by Unbalanced Covariates Adjustment: A Multivariate Meta-Analysis Study.

    PubMed

    Dehesh, Tania; Zare, Najaf; Ayatollahi, Seyyed Mohammad Taghi

    2015-01-01

The univariate meta-analysis (UM) procedure, a technique that provides a single overall result, has become increasingly popular. Neglecting the existence of other concomitant covariates in the models leads to loss of treatment efficiency. Our aim was to propose four new approximation approaches for the covariance matrix of the coefficients, which is not readily available for the multivariate generalized least squares (MGLS) method as a multivariate meta-analysis approach. We evaluated the efficiency of four new approaches, namely zero correlation (ZC), common correlation (CC), estimated correlation (EC), and multivariate multilevel correlation (MMC), in terms of estimation bias, mean square error (MSE), and 95% probability coverage of the confidence interval (CI) in the synthesis of Cox proportional hazards model coefficients in a simulation study. Comparison of the simulation results on the MSE, bias, and CI of the estimated coefficients indicated that the MMC approach was the most accurate procedure, followed by the EC, CC, and ZC procedures. The precision ranking of the four approaches across all the above settings was MMC ≥ EC ≥ CC ≥ ZC. This study highlights the advantages of MGLS meta-analysis over the UM approach. The results suggest using the MMC procedure to overcome the lack of information when a complete covariance matrix of the coefficients is unavailable.

  17. Bias correction in the hierarchical likelihood approach to the analysis of multivariate survival data.

    PubMed

    Jeon, Jihyoun; Hsu, Li; Gorfine, Malka

    2012-07-01

    Frailty models are useful for measuring unobserved heterogeneity in risk of failures across clusters, providing cluster-specific risk prediction. In a frailty model, the latent frailties shared by members within a cluster are assumed to act multiplicatively on the hazard function. In order to obtain parameter and frailty variate estimates, we consider the hierarchical likelihood (H-likelihood) approach (Ha, Lee and Song, 2001. Hierarchical-likelihood approach for frailty models. Biometrika 88, 233-243) in which the latent frailties are treated as "parameters" and estimated jointly with other parameters of interest. We find that the H-likelihood estimators perform well when the censoring rate is low, however, they are substantially biased when the censoring rate is moderate to high. In this paper, we propose a simple and easy-to-implement bias correction method for the H-likelihood estimators under a shared frailty model. We also extend the method to a multivariate frailty model, which incorporates complex dependence structure within clusters. We conduct an extensive simulation study and show that the proposed approach performs very well for censoring rates as high as 80%. We also illustrate the method with a breast cancer data set. Since the H-likelihood is the same as the penalized likelihood function, the proposed bias correction method is also applicable to the penalized likelihood estimators.

  18. Cosmological Constraints from Fourier Phase Statistics

    NASA Astrophysics Data System (ADS)

    Ali, Kamran; Obreschkow, Danail; Howlett, Cullan; Bonvin, Camille; Llinares, Claudio; Oliveira Franco, Felipe; Power, Chris

    2018-06-01

Most statistical inference from cosmic large-scale structure relies on two-point statistics, i.e. on the galaxy-galaxy correlation function (2PCF) or the power spectrum. These statistics capture the full information encoded in the Fourier amplitudes of the galaxy density field but do not describe the Fourier phases of the field. Here, we quantify the information contained in the line correlation function (LCF), a three-point Fourier phase correlation function. Using cosmological simulations, we estimate the Fisher information (at redshift z = 0) of the 2PCF, LCF and their combination, regarding the cosmological parameters of the standard ΛCDM model, as well as a Warm Dark Matter (WDM) model and the f(R) and Symmetron modified gravity models. The galaxy bias is accounted for at the level of a linear bias. The relative information of the 2PCF and the LCF depends on the survey volume, sampling density (shot noise) and the bias uncertainty. For a volume of 1 h^-3 Gpc^3, sampled with points of mean density n̄ = 2 × 10^-3 h^3 Mpc^-3 and a bias uncertainty of 13%, the LCF improves the parameter constraints by about 20% in the ΛCDM cosmology and potentially even more in alternative models. Finally, since a linear bias only affects the Fourier amplitudes (2PCF), but not the phases (LCF), the combination of the 2PCF and the LCF can be used to break the degeneracy between the linear bias and σ_8 that is present in two-point statistics.

  19. Multivariate meta-analysis: potential and promise.

    PubMed

    Jackson, Dan; Riley, Richard; White, Ian R

    2011-09-10

    The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day 'Multivariate meta-analysis' event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various view points and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd.

  20. Multivariate meta-analysis: Potential and promise

    PubMed Central

    Jackson, Dan; Riley, Richard; White, Ian R

    2011-01-01

    The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day ‘Multivariate meta-analysis’ event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various view points and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd. PMID:21268052

  1. A New Look at Bias in Aptitude Tests.

    ERIC Educational Resources Information Center

    Scheuneman, Janice Dowd

    1981-01-01

    Statistical bias in measurement and ethnic-group bias in testing are discussed, reviewing predictive and construct validity studies. Item bias is reconceptualized to include distance of item content from respondent's experience. Differing values of mean and standard deviation for bias parameter are analyzed in a simulation. References are…

  2. Does RAIM with Correct Exclusion Produce Unbiased Positions?

    PubMed Central

    Teunissen, Peter J. G.; Imparato, Davide; Tiberius, Christian C. J. M.

    2017-01-01

    As the navigation solution of exclusion-based RAIM follows from a combination of least-squares estimation and a statistically based exclusion-process, the computation of the integrity of the navigation solution has to take the propagated uncertainty of the combined estimation-testing procedure into account. In this contribution, we analyse, theoretically as well as empirically, the effect that this combination has on the first statistical moment, i.e., the mean, of the computed navigation solution. It will be shown, although statistical testing is intended to remove biases from the data, that biases will always remain under the alternative hypothesis, even when the correct alternative hypothesis is properly identified. The a posteriori exclusion of a biased satellite range from the position solution will therefore never remove the bias in the position solution completely. PMID:28672862

  3. Impact of spurious shear on cosmological parameter estimates from weak lensing observables

    DOE PAGES

    Petri, Andrea; May, Morgan; Haiman, Zoltán; ...

    2014-12-30

    Residual errors in shear measurements, after corrections for instrument systematics and atmospheric effects, can impact cosmological parameters derived from weak lensing observations. Here we combine convergence maps from our suite of ray-tracing simulations with random realizations of spurious shear. This allows us to quantify the errors and biases of the triplet (Ωm, w, σ8) derived from the power spectrum (PS), as well as from three different sets of non-Gaussian statistics of the lensing convergence field: Minkowski functionals (MFs), low-order moments (LMs), and peak counts (PKs). Our main results are as follows: (i) We find an order of magnitude smaller biases from the PS than in previous work. (ii) The PS and LM yield biases much smaller than the morphological statistics (MF, PK). (iii) For strictly Gaussian spurious shear with integrated amplitude as low as its current estimate of σ²sys ≈ 10⁻⁷, biases from the PS and LM would be unimportant even for a survey with the statistical power of the Large Synoptic Survey Telescope. However, we find that for surveys larger than ≈ 100 deg², non-Gaussianity in the noise (not included in our analysis) will likely be important and must be quantified to assess the biases. (iv) The morphological statistics (MF, PK) introduce important biases even for Gaussian noise, which must be corrected in large surveys. The biases are in different directions in (Ωm, w, σ8) parameter space, allowing self-calibration by combining multiple statistics. Our results warrant follow-up studies with more extensive lensing simulations and more accurate spurious shear estimates.

  4. Statistical bias correction method applied on CMIP5 datasets over the Indian region during the summer monsoon season for climate change applications

    NASA Astrophysics Data System (ADS)

    Prasanna, V.

    2018-01-01

    This study makes use of temperature and precipitation from CMIP5 climate model output for climate change application studies over the Indian region during the summer monsoon season (JJAS). Bias correction of temperature and precipitation from CMIP5 GCM simulation results with respect to observations is discussed in detail. The non-linear statistical bias correction is a suitable method for climate change data because it is simple and does not add artificial uncertainties to the impact assessment of climate change scenarios for climate change application studies (agricultural production changes) in the future. The simple statistical bias correction uses observational constraints on the GCM baseline, and the projected results are scaled with respect to the changing magnitude in future scenarios, varying from one model to the other. Two types of bias correction techniques are shown here: (1) a simple bias correction using a percentile-based quantile-mapping algorithm and (2) a simple but improved bias correction method, a cumulative distribution function (CDF; Weibull distribution function)-based quantile-mapping algorithm. This study shows that the percentile-based quantile mapping method gives results similar to the CDF (Weibull)-based quantile mapping method, and the two methods are comparable. The bias correction is applied to temperature and precipitation variables for present climate and future projected data, which are then used in a simple statistical model to understand the future changes in crop production over the Indian region during the summer monsoon season. In total, 12 CMIP5 models are used for Historical (1901-2005), RCP4.5 (2005-2100), and RCP8.5 (2005-2100) scenarios. The climate index from each CMIP5 model and the observed agricultural yield index over the Indian region are used in a regression model to project the changes in the agricultural yield over India from RCP4.5 and RCP8.5 scenarios. 
The results revealed a better convergence of model projections in the bias-corrected data compared to the uncorrected data. The study can be extended to localized regional domains aimed at understanding the changes in agricultural productivity in the future with an agro-economy or a simple statistical model. The statistical model indicated that the total food grain yield is going to increase over the Indian region in the future: the increase is approximately 50 kg/ha for the RCP4.5 scenario from 2001 until the end of 2100, and approximately 90 kg/ha for the RCP8.5 scenario over the same period. Many studies use bias correction techniques, but this study applies the bias correction to future climate scenario data from CMIP5 models and uses it with crop statistics to project future crop yield changes over the Indian region.
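    A minimal sketch of the percentile-based quantile-mapping step described above (empirical quantiles only; the CDF/Weibull variant would fit a parametric distribution instead, and the synthetic gamma data below are purely illustrative):

```python
import numpy as np

def quantile_map(model_hist, obs, model_fut):
    """Percentile-based quantile mapping (empirical CDF version).

    Each future model value is converted to its percentile within the
    historical model distribution, then mapped onto the observed
    distribution at that same percentile.
    """
    # Percentile of each future value within the historical simulation
    pct = np.interp(model_fut,
                    np.sort(model_hist),
                    np.linspace(0, 100, len(model_hist)))
    # Look up the observed value at the same percentile
    return np.percentile(obs, pct)

rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 5.0, 1000)    # "observed" precipitation
hist = rng.gamma(2.0, 7.0, 1000)   # biased historical simulation
fut = rng.gamma(2.0, 7.7, 1000)    # future run (model ~10% wetter)

corrected = quantile_map(hist, obs, fut)
# The corrected future retains the simulated change signal while its
# climatology is pulled toward the observations.
```

    Note how the method lets the model drive the change in magnitude between historical and future runs, while the observational constraint fixes the baseline, matching the scaling behaviour described in the abstract.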

  5. mvMapper: statistical and geographical data exploration and visualization of multivariate analysis of population structure

    USDA-ARS?s Scientific Manuscript database

    Characterizing population genetic structure across geographic space is a fundamental challenge in population genetics. Multivariate statistical analyses are powerful tools for summarizing genetic variability, but geographic information and accompanying metadata is not always easily integrated into t...

  6. Modelling multiple sources of dissemination bias in meta-analysis.

    PubMed

    Bowden, Jack; Jackson, Dan; Thompson, Simon G

    2010-03-30

    Asymmetry in the funnel plot for a meta-analysis suggests the presence of dissemination bias. This may be caused by publication bias through the decisions of journal editors, by selective reporting of research results by authors or by a combination of both. Typically, study results that are statistically significant or have larger estimated effect sizes are more likely to appear in the published literature, hence giving a biased picture of the evidence-base. Previous statistical approaches for addressing dissemination bias have assumed only a single selection mechanism. Here we consider a more realistic scenario in which multiple dissemination processes, involving both the publishing authors and journals, are operating. In practical applications, the methods can be used to provide sensitivity analyses for the potential effects of multiple dissemination biases operating in meta-analysis.
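    A hedged sketch of how even a single selection mechanism distorts the funnel plot, detectable with an Egger-style regression of standardized effect on precision (the paper's models for multiple dissemination processes are more elaborate; the selection rule and parameters below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
k = 20_000
se = rng.uniform(0.05, 0.5, k)    # study standard errors
effect = rng.normal(0.0, se)      # true effect is zero
z = effect / se                   # standardized effects

# Dissemination: significant results always appear; others rarely do.
published = (z > 1.645) | (rng.random(k) < 0.05)

# Egger-style regression: standardized effect on precision; a nonzero
# intercept indicates funnel-plot asymmetry.
precision = 1.0 / se[published]
slope, intercept = np.polyfit(precision, z[published], 1)
# Without selection the intercept is near 0; under this selection rule
# it is clearly positive, signalling dissemination bias.
```

    A sensitivity analysis in the spirit of the paper would vary the assumed selection rules (author-side and journal-side) and examine how the pooled estimate shifts.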

  7. Statistical treatment for the wet bias in tree-ring chronologies: A case study from the Interior West, USA

    Treesearch

    Yan Sun; Matthew F. Bekker; R. Justin DeRose; Roger Kjelgren; S. -Y. Simon Wang

    2017-01-01

    Dendroclimatic research has long assumed a linear relationship between tree-ring increment and climate variables. However, ring width frequently underestimates extremely wet years, a phenomenon we refer to as ‘wet bias’. In this paper, we present statistical evidence for wet bias that is obscured by the assumption of linearity. To improve tree-ring-climate modeling, we...

  8. Free Energy Computations by Minimization of Kullback-Leibler Divergence: An Efficient Adaptive Biasing Potential Method for Sparse Representations

    DTIC Science & Technology

    2011-10-14

    landscapes. It is motivated by statistical learning arguments and unifies the tasks of biasing the molecular dynamics to escape free energy wells and estimating the free energy...experimentally, to characterize global changes as well as investigate relative stabilities. In most applications, a brute-force computation based on

  9. Spatial scale and distribution of neurovascular signals underlying decoding of orientation and eye of origin from fMRI data

    PubMed Central

    Harrison, Charlotte; Jackson, Jade; Oh, Seung-Mock; Zeringyte, Vaida

    2016-01-01

    Multivariate pattern analysis of functional magnetic resonance imaging (fMRI) data is widely used, yet the spatial scales and origin of neurovascular signals underlying such analyses remain unclear. We compared decoding performance for stimulus orientation and eye of origin from fMRI measurements in human visual cortex with predictions based on the columnar organization of each feature and estimated the spatial scales of patterns driving decoding. Both orientation and eye of origin could be decoded significantly above chance in early visual areas (V1–V3). Contrary to predictions based on a columnar origin of response biases, decoding performance for eye of origin in V2 and V3 was not significantly lower than that in V1, nor did decoding performance for orientation and eye of origin differ significantly. Instead, response biases for both features showed large-scale organization, evident as a radial bias for orientation, and a nasotemporal bias for eye preference. To determine whether these patterns could drive classification, we quantified the effect on classification performance of binning voxels according to visual field position. Consistent with large-scale biases driving classification, binning by polar angle yielded significantly better decoding performance for orientation than random binning in V1–V3. Similarly, binning by hemifield significantly improved decoding performance for eye of origin. Patterns of orientation and eye preference bias in V2 and V3 showed a substantial degree of spatial correlation with the corresponding patterns in V1, suggesting that response biases in these areas originate in V1. Together, these findings indicate that multivariate classification results need not reflect the underlying columnar organization of neuronal response selectivities in early visual areas. NEW & NOTEWORTHY Large-scale response biases can account for decoding of orientation and eye of origin in human early visual areas V1–V3. 
For eye of origin this pattern is a nasotemporal bias; for orientation it is a radial bias. Differences in decoding performance across areas and stimulus features are not well predicted by differences in columnar-scale organization of each feature. Large-scale biases in extrastriate areas are spatially correlated with those in V1, suggesting biases originate in primary visual cortex. PMID:27903637

  10. Multivariate Strategies in Functional Magnetic Resonance Imaging

    ERIC Educational Resources Information Center

    Hansen, Lars Kai

    2007-01-01

    We discuss aspects of multivariate fMRI modeling, including the statistical evaluation of multivariate models and means for dimensional reduction. In a case study we analyze linear and non-linear dimensional reduction tools in the context of a "mind reading" predictive multivariate fMRI model.

  11. A procedure for linking psychosocial job characteristics data to health surveys.

    PubMed Central

    Schwartz, J E; Pieper, C F; Karasek, R A

    1988-01-01

    A system is presented for linking information about psychosocial characteristics of job situations to national health surveys. Job information can be imputed to individuals on surveys that contain three-digit US Census occupation codes. Occupational mean scores on psychosocial job characteristics-control over task situation (decision latitude), psychological work load, physical exertion, and other measures-for the linkage system are derived from US national surveys of working conditions (Quality of Employment Surveys 1969, 1972, and 1977). This paper discusses a new method for reducing the biases in multivariate analyses that are likely to arise when utilizing linkage systems based on mean scores. Such biases are reduced by modifying the linkage system to adjust imputed individual scores for demographic factors such as age, education, race, marital status and, implicitly, sex (since men and women have separate linkage data bases). Statistics on the linkage system's efficiency and reliability are reported. All dimensions have high inter-survey reproducibility. Despite their psychosocial nature, decision latitude and physical exertion can be more efficiently imputed with the linkage system than earnings (a non-psychosocial job characteristic). The linkage system presented here is a useful tool for initial epidemiological studies of the consequences of psychosocial job characteristics and constitutes the methodological basis for the subsequent paper. PMID:3389426

  12. Systematic Biases in Parameter Estimation of Binary Black-Hole Mergers

    NASA Technical Reports Server (NTRS)

    Littenberg, Tyson B.; Baker, John G.; Buonanno, Alessandra; Kelly, Bernard J.

    2012-01-01

    Parameter estimation of binary-black-hole merger events in gravitational-wave data relies on matched filtering techniques, which, in turn, depend on accurate model waveforms. Here we characterize the systematic biases introduced in measuring astrophysical parameters of binary black holes by applying the currently most accurate effective-one-body templates to simulated data containing non-spinning numerical-relativity waveforms. For advanced ground-based detectors, we find that the systematic biases are well within the statistical error for realistic signal-to-noise ratios (SNR). These biases grow to be comparable to the statistical errors at high signal-to-noise ratios for ground-based instruments (SNR approximately 50) but never dominate the error budget. At the much larger signal-to-noise ratios expected for space-based detectors, these biases will become large compared to the statistical errors but are small enough (at most a few percent in the black-hole masses) that we expect they should not affect broad astrophysical conclusions that may be drawn from the data.

  13. Short-term Outcomes After Open and Laparoscopic Colostomy Creation.

    PubMed

    Ivatury, Srinivas Joga; Bostock Rosenzweig, Ian C; Holubar, Stefan D

    2016-06-01

    Colostomy creation is a common procedure performed in colon and rectal surgery. Outcomes by technique have not been well studied. This study evaluated outcomes related to open versus laparoscopic colostomy creation. This was a retrospective review of patients undergoing colostomy creation using univariate and multivariate propensity score analyses. Hospitals participating in the American College of Surgeons National Surgical Quality Improvement Program database were included. Data on patients were obtained from the American College of Surgeons National Surgical Quality Improvement Program 2005-2011 Participant Use Data Files. We measured 30-day mortality, 30-day complications, and predictors of 30-day mortality. A total of 2179 subjects were in the open group and 1132 in the laparoscopic group. The open group had increased age (open, 64 years vs laparoscopic, 60 years), admission from facility (17.0% vs 14.9%), and disseminated cancer (26.1% vs 21.4%). All were statistically significant. The open group had a significantly higher percentage of emergency operations (24.9% vs 7.9%). Operative time was statistically different (81 vs 86 minutes). Thirty-day mortality was significantly higher in the open group (8.7% vs 3.5%), as was any 30-day complication (25.4% vs 17.0%). Propensity-matching analysis on elective patients only revealed that postoperative length of stay and rate of any wound complication were statistically higher in the open group. Multivariate analysis for mortality was performed on the full, elective, and propensity-matched cohorts; age >65 years and dependent functional status were associated with an increased risk of mortality in all of the models. This study has the potential for selection bias and limited generalizability. Colostomy creation at American College of Surgeons National Surgical Quality Improvement Program hospitals is more commonly performed open rather than laparoscopically. 
Patient age >65 years and dependent functional status are associated with an increased risk of 30-day mortality.

  14. Association of unconscious race and social class bias with vignette-based clinical assessments by medical students.

    PubMed

    Haider, Adil H; Sexton, Janel; Sriram, N; Cooper, Lisa A; Efron, David T; Swoboda, Sandra; Villegas, Cassandra V; Haut, Elliott R; Bonds, Morgan; Pronovost, Peter J; Lipsett, Pamela A; Freischlag, Julie A; Cornwell, Edward E

    2011-09-07

    Studies involving physicians suggest that unconscious bias may be related to clinical decision making and may predict poor patient-physician interaction. The presence of unconscious race and social class bias and its association with clinical assessments or decision making among medical students is unknown. To estimate unconscious race and social class bias among first-year medical students and investigate its relationship with assessments made during clinical vignettes. A secure Web-based survey was administered to 211 medical students entering classes at Johns Hopkins School of Medicine, Baltimore, Maryland, in August 2009 and August 2010. The survey included the Implicit Association Test (IAT) to assess unconscious preferences, direct questions regarding students' explicit race and social class preferences, and 8 clinical assessment vignettes focused on pain assessment, informed consent, patient reliability, and patient trust. Adjusting for student demographics, multiple logistic regression was used to determine whether responses to the vignettes were associated with unconscious race or social class preferences. Association of scores on an established IAT for race and a novel IAT for social class with vignette responses. Among the 202 students who completed the survey, IAT responses were consistent with an implicit preference toward white persons among 140 students (69%, 95% CI, 61%-75%). Responses were consistent with a preference toward those in the upper class among 174 students (86%, 95% CI, 80%-90%). Assessments generally did not vary by patient race or occupation, and multivariable analyses for all vignettes found no significant relationship between implicit biases and clinical assessments. Regression coefficient for the association between pain assessment and race IAT scores was -0.49 (95% CI, -1.00 to 0.03) and for social class, the coefficient was -0.04 (95% CI, -0.50 to 0.41). 
Adjusted odds ratios for other vignettes ranged from 0.69 to 3.03 per unit change in IAT score, but none were statistically significant. Analysis stratified by vignette patient race or class status yielded similarly negative results. Tests for interactions between patient race or class status and student IAT D scores in predicting clinical assessments were not statistically significant. The majority of first-year medical students at a single school had IAT scores consistent with implicit preference for white persons and possibly for those in the upper class. However, overall vignette-based clinical assessments were not associated with patient race or occupation, and no association existed between implicit preferences and the assessments.

  15. Association of Unconscious Race and Social Class Bias With Vignette-Based Clinical Assessments by Medical Students

    PubMed Central

    Haider, Adil H.; Sexton, Janel; Sriram, N.; Cooper, Lisa A.; Efron, David T.; Swoboda, Sandra; Villegas, Cassandra V.; Haut, Elliott R.; Bonds, Morgan; Pronovost, Peter J.; Lipsett, Pamela A.; Freischlag, Julie A.; Cornwell, Edward E.

    2012-01-01

    Context Studies involving physicians suggest that unconscious bias may be related to clinical decision making and may predict poor patient-physician interaction. The presence of unconscious race and social class bias and its association with clinical assessments or decision making among medical students is unknown. Objective To estimate unconscious race and social class bias among first-year medical students and investigate its relationship with assessments made during clinical vignettes. Design, Setting, and Participants A secure Web-based survey was administered to 211 medical students entering classes at Johns Hopkins School of Medicine, Baltimore, Maryland, in August 2009 and August 2010. The survey included the Implicit Association Test (IAT) to assess unconscious preferences, direct questions regarding students’ explicit race and social class preferences, and 8 clinical assessment vignettes focused on pain assessment, informed consent, patient reliability, and patient trust. Adjusting for student demographics, multiple logistic regression was used to determine whether responses to the vignettes were associated with unconscious race or social class preferences. Main Outcome Measures Association of scores on an established IAT for race and a novel IAT for social class with vignette responses. Results Among the 202 students who completed the survey, IAT responses were consistent with an implicit preference toward white persons among 140 students (69%, 95% CI, 61%–75%). Responses were consistent with a preference toward those in the upper class among 174 students (86%, 95% CI, 80%–90%). Assessments generally did not vary by patient race or occupation, and multivariable analyses for all vignettes found no significant relationship between implicit biases and clinical assessments. 
Regression coefficient for the association between pain assessment and race IAT scores was −0.49 (95% CI, −1.00 to 0.03) and for social class, the coefficient was −0.04 (95% CI, −0.50 to 0.41). Adjusted odds ratios for other vignettes ranged from 0.69 to 3.03 per unit change in IAT score, but none were statistically significant. Analysis stratified by vignette patient race or class status yielded similarly negative results. Tests for interactions between patient race or class status and student IAT D scores in predicting clinical assessments were not statistically significant. Conclusions The majority of first-year medical students at a single school had IAT scores consistent with implicit preference for white persons and possibly for those in the upper class. However, overall vignette-based clinical assessments were not associated with patient race or occupation, and no association existed between implicit preferences and the assessments. PMID:21900134

  16. Comparison of projection skills of deterministic ensemble methods using pseudo-simulation data generated from multivariate Gaussian distribution

    NASA Astrophysics Data System (ADS)

    Oh, Seok-Geun; Suh, Myoung-Seok

    2017-07-01

    The projection skills of five ensemble methods were analyzed according to simulation skills, training period, and ensemble members, using 198 sets of pseudo-simulation data (PSD) produced by random number generation assuming the simulated temperature of regional climate models. The PSD sets were classified into 18 categories according to the relative magnitude of bias, variance ratio, and correlation coefficient, where each category had 11 sets (including 1 truth set) with 50 samples. The ensemble methods used were as follows: equal weighted averaging without bias correction (EWA_NBC), EWA with bias correction (EWA_WBC), weighted ensemble averaging based on root mean square errors and correlation (WEA_RAC), WEA based on the Taylor score (WEA_Tay), and multivariate linear regression (Mul_Reg). The projection skills of the ensemble methods improved generally as compared with the best member for each category. However, their projection skills are significantly affected by the simulation skills of the ensemble member. The weighted ensemble methods showed better projection skills than non-weighted methods, in particular, for the PSD categories having systematic biases and various correlation coefficients. The EWA_NBC showed considerably lower projection skills than the other methods, in particular, for the PSD categories with systematic biases. Although Mul_Reg showed relatively good skills, it showed strong sensitivity to the PSD categories, training periods, and number of members. On the other hand, the WEA_Tay and WEA_RAC showed relatively superior skills in both the accuracy and reliability for all the sensitivity experiments. This indicates that WEA_Tay and WEA_RAC are applicable even for simulation data with systematic biases, a short training period, and a small number of ensemble members.
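    The averaging schemes compared above can be sketched on toy data as follows (the formulas are simplified readings of EWA_NBC, EWA_WBC, and an inverse-RMSE weighting in the spirit of WEA_RAC; the paper's exact definitions, and the WEA_Tay and Mul_Reg methods, are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)
t = 120                      # months in the training period
truth = 15 + 10 * np.sin(np.arange(t) * 2 * np.pi / 12)
members = np.stack([
    truth + 1.5 + 0.5 * rng.standard_normal(t),   # warm bias, good skill
    truth - 3.0 + 1.0 * rng.standard_normal(t),   # cold bias
    truth + 0.0 + 3.0 * rng.standard_normal(t),   # unbiased but noisy
])

# EWA_NBC: equal weights, no bias correction
ewa_nbc = members.mean(axis=0)

# EWA_WBC: remove each member's mean bias first, then average
bias = (members - truth).mean(axis=1, keepdims=True)
ewa_wbc = (members - bias).mean(axis=0)

# WEA_RAC-style: weight bias-corrected members by inverse RMSE
rmse = np.sqrt(((members - bias - truth) ** 2).mean(axis=1))
w = (1 / rmse) / (1 / rmse).sum()
wea = (w[:, None] * (members - bias)).sum(axis=0)

rmse_of = lambda f: np.sqrt(((f - truth) ** 2).mean())
# Bias correction and skill weighting each reduce the ensemble RMSE here.
```

    On this toy data the ranking matches the abstract's finding: equal weighting without bias correction suffers most from systematic member biases, and skill-based weighting improves further on bias-corrected equal weighting.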

  17. A Civilian/Military Trauma Institute: National Trauma Coordinating Center

    DTIC Science & Technology

    2015-12-01

    zip codes was used in “proximity to violence” analysis. Data were analyzed using SPSS (version 20.0, SPSS Inc., Chicago, IL). Multivariable linear...number of adverse events and serious events was not statistically higher in one group, the incidence of deep venous thrombosis (DVT) was statistically ...subjects the lack of statistical difference on multivariate analysis may be related to an underpowered sample size. It was recommended that the

  18. Implicit and explicit weight bias in a national sample of 4,732 medical students: the medical student CHANGES study.

    PubMed

    Phelan, Sean M; Dovidio, John F; Puhl, Rebecca M; Burgess, Diana J; Nelson, David B; Yeazel, Mark W; Hardeman, Rachel; Perry, Sylvia; van Ryn, Michelle

    2014-04-01

    To examine the magnitude of explicit and implicit weight biases compared to biases against other groups; and identify student factors predicting bias in a large national sample of medical students. A web-based survey was completed by 4,732 1st year medical students from 49 medical schools as part of a longitudinal study of medical education. The survey included a validated measure of implicit weight bias, the implicit association test, and 2 measures of explicit bias: a feeling thermometer and the anti-fat attitudes test. A majority of students exhibited implicit (74%) and explicit (67%) weight bias. Implicit weight bias scores were comparable to reported bias against racial minorities. Explicit attitudes were more negative toward obese people than toward racial minorities, gays, lesbians, and poor people. In multivariate regression models, implicit and explicit weight bias was predicted by lower BMI, male sex, and non-Black race. Either implicit or explicit bias was also predicted by age, SES, country of birth, and specialty choice. Implicit and explicit weight bias is common among 1st year medical students, and varies across student factors. Future research should assess implications of biases and test interventions to reduce their impact. Copyright © 2013 The Obesity Society.

  19. A new test of multivariate nonlinear causality

    PubMed Central

    Bai, Zhidong; Jiang, Dandan; Lv, Zhihui; Wong, Wing-Keung; Zheng, Shurong

    2018-01-01

    The multivariate nonlinear Granger causality test developed by Bai et al. (2010) (Mathematics and Computers in Simulation. 2010; 81: 5-17) plays an important role in detecting the dynamic interrelationships between two groups of variables. Following the idea of the Hiemstra-Jones (HJ) test proposed by Hiemstra and Jones (1994) (Journal of Finance. 1994; 49(5): 1639-1664), they attempt to establish a central limit theorem (CLT) for their test statistic by applying the asymptotic properties of multivariate U-statistics. However, Bai et al. (2016) (2016; arXiv: 1701.03992) revisit the HJ test and find that the test statistic given by HJ is NOT a function of U-statistics, which implies that neither the CLT proposed by Hiemstra and Jones (1994) nor the one extended by Bai et al. (2010) is valid for statistical inference. In this paper, we re-estimate the probabilities and re-establish the CLT of the new test statistic. Numerical simulation shows that our new estimates are consistent and that our new test has decent size and power. PMID:29304085

  20. A new test of multivariate nonlinear causality.

    PubMed

    Bai, Zhidong; Hui, Yongchang; Jiang, Dandan; Lv, Zhihui; Wong, Wing-Keung; Zheng, Shurong

    2018-01-01

    The multivariate nonlinear Granger causality test developed by Bai et al. (2010) (Mathematics and Computers in Simulation. 2010; 81: 5-17) plays an important role in detecting the dynamic interrelationships between two groups of variables. Following the idea of the Hiemstra-Jones (HJ) test proposed by Hiemstra and Jones (1994) (Journal of Finance. 1994; 49(5): 1639-1664), they attempt to establish a central limit theorem (CLT) for their test statistic by applying the asymptotic properties of multivariate U-statistics. However, Bai et al. (2016) (2016; arXiv: 1701.03992) revisit the HJ test and find that the test statistic given by HJ is NOT a function of U-statistics, which implies that neither the CLT proposed by Hiemstra and Jones (1994) nor the one extended by Bai et al. (2010) is valid for statistical inference. In this paper, we re-estimate the probabilities and re-establish the CLT of the new test statistic. Numerical simulation shows that our new estimates are consistent and that our new test has decent size and power.

  1. Applied Statistics: From Bivariate through Multivariate Techniques [with CD-ROM

    ERIC Educational Resources Information Center

    Warner, Rebecca M.

    2007-01-01

    This book provides a clear introduction to widely used topics in bivariate and multivariate statistics, including multiple regression, discriminant analysis, MANOVA, factor analysis, and binary logistic regression. The approach is applied and does not require formal mathematics; equations are accompanied by verbal explanations. Students are asked…

  2. Do Orthopaedic Surgeons Acknowledge Uncertainty?

    PubMed

    Teunis, Teun; Janssen, Stein; Guitton, Thierry G; Ring, David; Parisien, Robert

    2016-06-01

    Much of the decision-making in orthopaedics rests on uncertain evidence. Uncertainty is therefore part of our normal daily practice, and yet physician uncertainty regarding treatment could diminish patients' health. It is not known if physician uncertainty is a function of the evidence alone or if other factors are involved. With added experience, uncertainty could be expected to diminish, but perhaps more influential are things like physician confidence, belief in the veracity of what is published, and even one's religious beliefs. In addition, it is plausible that the kind of practice a physician works in can affect the experience of uncertainty. Practicing physicians may not be immediately aware of these effects on how uncertainty is experienced in their clinical decision-making. We asked: (1) Does uncertainty and overconfidence bias decrease with years of practice? (2) What sociodemographic factors are independently associated with less recognition of uncertainty, in particular belief in God or other deity or deities, and how is atheism associated with recognition of uncertainty? (3) Do confidence bias (confidence that one's skill is greater than it actually is), degree of trust in the orthopaedic evidence, and degree of statistical sophistication correlate independently with recognition of uncertainty? We created a survey to establish an overall recognition of uncertainty score (four questions), trust in the orthopaedic evidence base (four questions), confidence bias (three questions), and statistical understanding (six questions). Seven hundred six members of the Science of Variation Group, a collaboration that aims to study variation in the definition and treatment of human illness, were approached to complete our survey. This group represents mainly orthopaedic surgeons specializing in trauma or hand and wrist surgery, practicing in Europe and North America, of whom the majority is involved in teaching. 
Approximately half of the group has more than 10 years of experience. Two hundred forty-two (34%) members completed the survey. We found no differences between responders and nonresponders. Each survey item measured its own trait better than any of the other traits. Recognition of uncertainty (0.70) and confidence bias (0.75) had relatively high Cronbach alpha levels, meaning that the questions making up these traits are closely related and probably measure the same construct. This was lower for statistical understanding (0.48) and trust in the orthopaedic evidence base (0.37). Subsequently, combining each trait's individual questions, we calculated a 0 to 10 score for each trait. The mean recognition of uncertainty score was 3.2 ± 1.4. Recognition of uncertainty in daily practice did not vary by years in practice (0-5 years, 3.2 ± 1.3; 6-10 years, 2.9 ± 1.3; 11-20 years, 3.2 ± 1.4; 21-30 years, 3.3 ± 1.6 years; p = 0.51), but overconfidence bias did correlate with years in practice (0-5 years, 6.2 ± 1.4; 6-10 years, 7.1 ± 1.3; 11-20 years, 7.4 ± 1.4; 21-30 years, 7.1 ± 1.2 years; p < 0.001). Accounting for a potential interaction of variables using multivariable analysis, less recognition of uncertainty was independently but weakly associated with working in a multispecialty group compared with academic practice (β regression coefficient, -0.53; 95% confidence interval [CI], -1.0 to -0.055; partial R(2), 0.021; p = 0.029), belief in God or any other deity/deities (β, -0.57; 95% CI, -1.0 to -0.11; partial R(2), 0.026; p = 0.015), greater confidence bias (β, -0.26; 95% CI, -0.37 to -0.14; partial R(2), 0.084; p < 0.001), and greater trust in the orthopaedic evidence base (β, -0.16; 95% CI, -0.26 to -0.058; partial R(2), 0.040; p = 0.002). Better statistical understanding was independently, and more strongly, associated with greater recognition of uncertainty (β, 0.25; 95% CI, 0.17-0.34; partial R(2), 0.13; p < 0.001). 
Our full model accounted for 29% of the variability in recognition of uncertainty (adjusted R(2), 0.29). The relatively low levels of uncertainty among orthopaedic surgeons and confidence bias seem inconsistent with the paucity of definitive evidence. If patients want to be informed of the areas of uncertainty and surgeon-to-surgeon variation relevant to their care, it seems possible that a low recognition of uncertainty and surgeon confidence bias might hinder adequately informing patients, informed decisions, and consent. Moreover, limited recognition of uncertainty is associated with modifiable factors such as confidence bias, trust in orthopaedic evidence base, and statistical understanding. Perhaps improved statistical teaching in residency, journal clubs to improve the critique of evidence and awareness of bias, and acknowledgment of knowledge gaps at courses and conferences might create awareness about existing uncertainties. Level 1, prognostic study.
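
    The Cronbach alpha values reported above (0.70 for recognition of uncertainty, 0.75 for confidence bias) measure the internal consistency of each trait's items. A minimal sketch of the standard formula, assuming an (n respondents × k items) score matrix; this is the generic statistic, not the authors' code:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)
```

Values near 1 indicate that the items move together; the lower alphas reported for statistical understanding (0.48) and trust in the evidence base (0.37) indicate only weakly related items.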

  3. Multivariate Statistical Postprocessing of Ensemble Forecasts of Precipitation and Temperature over Four River Basins in California

    NASA Astrophysics Data System (ADS)

    Scheuerer, Michael; Hamill, Thomas M.; Whitin, Brett; He, Minxue; Henkel, Arthur

    2017-04-01

    Hydrological forecasts strongly rely on predictions of precipitation amounts and temperature as meteorological inputs to hydrological models. Ensemble weather predictions provide a number of different scenarios that reflect the uncertainty about these meteorological inputs, but are often biased and underdispersive, and therefore require statistical postprocessing. In hydrological applications it is crucial that spatial and temporal (i.e. between different forecast lead times) dependencies, as well as dependence between the two weather variables, are adequately represented by the recalibrated forecasts. We present a study with temperature and precipitation forecasts over four river basins in California that are postprocessed with a variant of the nonhomogeneous Gaussian regression method (Gneiting et al., 2005) and the censored, shifted gamma distribution approach (Scheuerer and Hamill, 2015), respectively. For modelling spatial, temporal and inter-variable dependence we propose a variant of the Schaake Shuffle (Clark et al., 2004) that uses spatio-temporal trajectories of observed temperature and precipitation as a dependence template, and chooses the historic dates in such a way that the divergence between the marginal distributions of these trajectories and the univariate forecast distributions is minimized. For the four river basins considered in our study, this new multivariate modelling technique consistently improves upon the Schaake Shuffle and yields reliable spatio-temporal forecast trajectories of temperature and precipitation that can be used to force hydrological forecast systems. References: Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B., Wilby, R., 2004. The Schaake Shuffle: A method for reconstructing space-time variability in forecasted precipitation and temperature fields. Journal of Hydrometeorology, 5, pp.243-262. Gneiting, T., Raftery, A.E., Westveld, A.H., Goldman, T., 2005. 
Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS. Monthly Weather Review, 133, pp.1098-1118. Scheuerer, M., Hamill, T.M., 2015. Statistical postprocessing of ensemble precipitation forecasts by fitting censored, shifted gamma distributions. Monthly Weather Review, 143, pp.4578-4596. Scheuerer, M., Hamill, T.M., Whitin, B., He, M., and Henkel, A., 2016: A method for preferential selection of dates in the Schaake shuffle approach to constructing spatio-temporal forecast fields of temperature and precipitation. Water Resources Research, submitted.
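
    The core reordering step of the Schaake Shuffle can be sketched as follows: sort each margin of the ensemble forecast, then rearrange it so that its ranks match the ranks of an equally sized set of historical observation trajectories, which imposes the observed dependence structure on the forecast margins. Variable names are illustrative, and the paper's preferential selection of historic dates is not shown:

```python
import numpy as np

def schaake_shuffle(forecast: np.ndarray, historical: np.ndarray) -> np.ndarray:
    """Reorder ensemble forecasts (n_members, n_dims) so that, in each
    dimension, their ranks match those of n_members historical trajectories."""
    shuffled = np.empty_like(forecast)
    for d in range(forecast.shape[1]):
        ranks = historical[:, d].argsort().argsort()   # rank of each historical member
        shuffled[:, d] = np.sort(forecast[:, d])[ranks]
    return shuffled
```

Each column of the output is a permutation of the corresponding forecast column, so the univariate (possibly bias-corrected) distributions are preserved while the inter-site/inter-variable rank correlations come from the historical template.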

  4. Acoustic neuroma: potential risk factors and audiometric surveillance in the aluminium industry

    PubMed Central

    Taiwo, Oyebode; Galusha, Deron; Tessier-Sherman, Baylah; Kirsche, Sharon; Cantley, Linda; Slade, Martin D; Cullen, Mark R; Donoghue, A Michael

    2014-01-01

    Objectives: To look for an association between acoustic neuroma (AN) and participation in a hearing conservation programme (HCP) and also for an association between AN and possible occupational risk factors in the aluminium industry. Methods: We conducted a case–control analysis of a population of US aluminium production workers in 8 smelters and 43 other plants. Using insurance claims data, 97 cases of AN were identified between 1996 and 2009. Each was matched with four controls. Covariates included participation in a HCP, working in an aluminium smelter, working in an electrical job and hearing loss. Results: In the bivariate analyses, covariates associated with AN were participation in the HCP (OR=1.72; 95% CI 1.09 to 2.69) and smelter work (OR=1.88; 95% CI 1.06 to 3.36). Electrical work was not significant (OR=1.60; 95% CI 0.65 to 3.94). Owing to high participation in the HCP in smelters, multivariate subanalyses were required. In the multivariate analyses, participation in the HCP was the only statistically significant risk factor for AN. In the multivariate analysis restricted to employees not working in a smelter, the OR was 1.81 (95% CI 1.04 to 3.17). Hearing loss, an indirect measure of in-ear noise dose, was not predictive of AN. Conclusions: Our results suggest the incidental detection of previously undiagnosed tumours in workers who participated in the company-sponsored HCP. The increased medical surveillance among this population of workers most likely introduced detection bias, leading to the identification of AN cases that would have otherwise remained undetected. PMID:25015928

  5. Application of multivariate Gaussian detection theory to known non-Gaussian probability density functions

    NASA Astrophysics Data System (ADS)

    Schwartz, Craig R.; Thelen, Brian J.; Kenton, Arthur C.

    1995-06-01

    A statistical parametric multispectral sensor performance model was developed by ERIM to support mine field detection studies, multispectral sensor design/performance trade-off studies, and target detection algorithm development. The model assumes target detection algorithms and their performance models which are based on data assumed to obey multivariate Gaussian probability distribution functions (PDFs). The applicability of these algorithms and performance models can be generalized to data having non-Gaussian PDFs through the use of transforms which convert non-Gaussian data to Gaussian (or near-Gaussian) data. An example of one such transform is the Box-Cox power law transform. In practice, such a transform can be applied to non-Gaussian data prior to the introduction of a detection algorithm that is formally based on the assumption of multivariate Gaussian data. This paper presents an extension of these techniques to the case where the joint multivariate probability density function of the non-Gaussian input data is known, and where the joint estimate of the multivariate Gaussian statistics, under the Box-Cox transform, is desired. The jointly estimated multivariate Gaussian statistics can then be used to predict the performance of a target detection algorithm which has an associated Gaussian performance model.
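
    The Box-Cox power law transform mentioned above has a simple closed form, and the power parameter is commonly chosen by maximizing a Gaussian profile log-likelihood. A minimal sketch for positive univariate data (illustrative only, not ERIM's multivariate implementation):

```python
import numpy as np

def box_cox(x: np.ndarray, lam: float) -> np.ndarray:
    """Box-Cox power transform for positive data; lam = 0 is the log."""
    if abs(lam) < 1e-12:
        return np.log(x)
    return (x ** lam - 1.0) / lam

def best_lambda(x: np.ndarray, lams) -> float:
    """Pick the power maximizing the Gaussian profile log-likelihood."""
    def loglik(lam):
        y = box_cox(x, lam)
        # -n/2 log(sigma^2) plus the Jacobian term of the transform
        return -0.5 * len(x) * np.log(y.var()) + (lam - 1.0) * np.log(x).sum()
    return max(lams, key=loglik)
```

For log-normally distributed data the selected power is near zero, i.e. the log transform, after which Gaussian detection statistics apply.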

  6. Census Undercount and the Undercount of the Black Population.

    ERIC Educational Resources Information Center

    Nguyen, Phung

    1996-01-01

    Argues that blacks have been disproportionately undercounted in the census and this undercount has resulted in such biased statistics as proportions of black female-headed households, victimization rates of black males, or crude death rates of blacks. Further, these statistics are biased upward and downward, and together they generate distorted…

  7. Publication Bias: The Achilles' Heel of Systematic Reviews?

    ERIC Educational Resources Information Center

    Torgerson, Carole J.

    2006-01-01

    The term "publication bias" usually refers to the tendency for a greater proportion of statistically significant positive results of experiments to be published and, conversely, a greater proportion of statistically significant negative or null results not to be published. It is widely accepted in the fields of healthcare and psychological…

  8. Constructing diagnostic likelihood: clinical decisions using subjective versus statistical probability.

    PubMed

    Kinnear, John; Jackson, Ruth

    2017-07-01

    Although physicians are highly trained in the application of evidence-based medicine, and are assumed to make rational decisions, there is evidence that their decision making is prone to biases. One of the biases that has been shown to affect accuracy of judgements is that of representativeness and base-rate neglect, where the saliency of a person's features leads to overestimation of their likelihood of belonging to a group. This results in the substitution of 'subjective' probability for statistical probability. This study examines clinicians' propensity to make estimations of subjective probability when presented with clinical information that is considered typical of a medical condition. The strength of the representativeness bias is tested by presenting choices in textual and graphic form. Understanding of statistical probability is also tested by omitting all clinical information. For the questions that included clinical information, 46.7% and 45.5% of clinicians made judgements of statistical probability, respectively. Where the question omitted clinical information, 79.9% of clinicians made a judgement consistent with statistical probability. There was a statistically significant difference in responses to the questions with and without representativeness information (χ2 (1, n=254)=54.45, p<0.0001). Physicians are strongly influenced by a representativeness bias, leading to base-rate neglect, even though they understand the application of statistical probability. One of the causes for this representativeness bias may be the way clinical medicine is taught where stereotypic presentations are emphasised in diagnostic decision making. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.
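
    Base-rate neglect can be made concrete with Bayes' rule. A toy computation with entirely hypothetical numbers (a condition with 1% prevalence; a "typical" presentation seen in 80% of cases but also in 20% of non-cases):

```python
# All numbers are hypothetical, for illustration only.
prevalence = 0.01                 # base rate of the condition
p_typical_given_case = 0.80       # "typical" presentation among cases
p_typical_given_noncase = 0.20    # same presentation among non-cases

# Bayes' rule: P(case | typical presentation)
posterior = (p_typical_given_case * prevalence) / (
    p_typical_given_case * prevalence
    + p_typical_given_noncase * (1.0 - prevalence)
)
# Despite the salient presentation, the posterior stays below 4%:
# the low base rate dominates the likelihood ratio.
```

Judging the diagnosis likely because the presentation is "typical" is exactly the substitution of subjective for statistical probability that the study describes.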

  9. Large-scale galaxy bias

    NASA Astrophysics Data System (ADS)

    Jeong, Donghui; Desjacques, Vincent; Schmidt, Fabian

    2018-01-01

    Here, we briefly introduce the key results of the recent review (arXiv:1611.09787), whose abstract is as following. This review presents a comprehensive overview of galaxy bias, that is, the statistical relation between the distribution of galaxies and matter. We focus on large scales where cosmic density fields are quasi-linear. On these scales, the clustering of galaxies can be described by a perturbative bias expansion, and the complicated physics of galaxy formation is absorbed by a finite set of coefficients of the expansion, called bias parameters. The review begins with a detailed derivation of this very important result, which forms the basis of the rigorous perturbative description of galaxy clustering, under the assumptions of General Relativity and Gaussian, adiabatic initial conditions. Key components of the bias expansion are all leading local gravitational observables, which include the matter density but also tidal fields and their time derivatives. We hence expand the definition of local bias to encompass all these contributions. This derivation is followed by a presentation of the peak-background split in its general form, which elucidates the physical meaning of the bias parameters, and a detailed description of the connection between bias parameters and galaxy (or halo) statistics. We then review the excursion set formalism and peak theory which provide predictions for the values of the bias parameters. In the remainder of the review, we consider the generalizations of galaxy bias required in the presence of various types of cosmological physics that go beyond pressureless matter with adiabatic, Gaussian initial conditions: primordial non-Gaussianity, massive neutrinos, baryon-CDM isocurvature perturbations, dark energy, and modified gravity. Finally, we discuss how the description of galaxy bias in the galaxies' rest frame is related to clustering statistics measured from the observed angular positions and redshifts in actual galaxy catalogs.

  10. Large-scale galaxy bias

    NASA Astrophysics Data System (ADS)

    Desjacques, Vincent; Jeong, Donghui; Schmidt, Fabian

    2018-02-01

    This review presents a comprehensive overview of galaxy bias, that is, the statistical relation between the distribution of galaxies and matter. We focus on large scales where cosmic density fields are quasi-linear. On these scales, the clustering of galaxies can be described by a perturbative bias expansion, and the complicated physics of galaxy formation is absorbed by a finite set of coefficients of the expansion, called bias parameters. The review begins with a detailed derivation of this very important result, which forms the basis of the rigorous perturbative description of galaxy clustering, under the assumptions of General Relativity and Gaussian, adiabatic initial conditions. Key components of the bias expansion are all leading local gravitational observables, which include the matter density but also tidal fields and their time derivatives. We hence expand the definition of local bias to encompass all these contributions. This derivation is followed by a presentation of the peak-background split in its general form, which elucidates the physical meaning of the bias parameters, and a detailed description of the connection between bias parameters and galaxy statistics. We then review the excursion-set formalism and peak theory which provide predictions for the values of the bias parameters. In the remainder of the review, we consider the generalizations of galaxy bias required in the presence of various types of cosmological physics that go beyond pressureless matter with adiabatic, Gaussian initial conditions: primordial non-Gaussianity, massive neutrinos, baryon-CDM isocurvature perturbations, dark energy, and modified gravity. Finally, we discuss how the description of galaxy bias in the galaxies' rest frame is related to clustering statistics measured from the observed angular positions and redshifts in actual galaxy catalogs.

  11. Big Data Analytics for Scanning Transmission Electron Microscopy Ptychography

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jesse, S.; Chi, M.; Belianinov, A.

    Electron microscopy is undergoing a transition; from the model of producing only a few micrographs, through the current state where many images and spectra can be digitally recorded, to a new mode where very large volumes of data (movies, ptychographic and multi-dimensional series) can be rapidly obtained. In this paper, we discuss the application of so-called “big-data” methods to high dimensional microscopy data, using unsupervised multivariate statistical techniques, in order to explore salient image features in a specific example of BiFeO3 domains. Remarkably, k-means clustering reveals domain differentiation despite the fact that the algorithm is purely statistical in nature and does not require any prior information regarding the material, any coexisting phases, or any differentiating structures. While this is a somewhat trivial case, this example signifies the extraction of useful physical and structural information without any prior bias regarding the sample or the instrumental modality. Further interpretation of these types of results may still require human intervention. Finally, however, the open nature of this algorithm and its wide availability enable broad collaborations and exploratory work necessary to enable efficient data analysis in electron microscopy.
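
    The k-means step described here is generic: each diffraction pattern (or image patch) is flattened into a feature vector and clustered. A minimal Lloyd's-algorithm sketch with a simple deterministic initialization (illustrative; the authors' pipeline and any preprocessing such as scaling or PCA are not shown):

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, iters: int = 50) -> np.ndarray:
    """Plain Lloyd's algorithm; returns one cluster label per row of X."""
    centers = X[:k].astype(float).copy()   # first-k init; k-means++ is better in practice
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # squared distance of every point to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):        # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

With well-separated feature vectors, the labels recover the domain structure with no prior information about the material, mirroring the behavior reported in the paper.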

  12. Big Data Analytics for Scanning Transmission Electron Microscopy Ptychography

    DOE PAGES

    Jesse, S.; Chi, M.; Belianinov, A.; ...

    2016-05-23

    Electron microscopy is undergoing a transition; from the model of producing only a few micrographs, through the current state where many images and spectra can be digitally recorded, to a new mode where very large volumes of data (movies, ptychographic and multi-dimensional series) can be rapidly obtained. In this paper, we discuss the application of so-called “big-data” methods to high dimensional microscopy data, using unsupervised multivariate statistical techniques, in order to explore salient image features in a specific example of BiFeO3 domains. Remarkably, k-means clustering reveals domain differentiation despite the fact that the algorithm is purely statistical in nature and does not require any prior information regarding the material, any coexisting phases, or any differentiating structures. While this is a somewhat trivial case, this example signifies the extraction of useful physical and structural information without any prior bias regarding the sample or the instrumental modality. Further interpretation of these types of results may still require human intervention. Finally, however, the open nature of this algorithm and its wide availability enable broad collaborations and exploratory work necessary to enable efficient data analysis in electron microscopy.

  13. Big Data Analytics for Scanning Transmission Electron Microscopy Ptychography

    PubMed Central

    Jesse, S.; Chi, M.; Belianinov, A.; Beekman, C.; Kalinin, S. V.; Borisevich, A. Y.; Lupini, A. R.

    2016-01-01

    Electron microscopy is undergoing a transition; from the model of producing only a few micrographs, through the current state where many images and spectra can be digitally recorded, to a new mode where very large volumes of data (movies, ptychographic and multi-dimensional series) can be rapidly obtained. Here, we discuss the application of so-called “big-data” methods to high dimensional microscopy data, using unsupervised multivariate statistical techniques, in order to explore salient image features in a specific example of BiFeO3 domains. Remarkably, k-means clustering reveals domain differentiation despite the fact that the algorithm is purely statistical in nature and does not require any prior information regarding the material, any coexisting phases, or any differentiating structures. While this is a somewhat trivial case, this example signifies the extraction of useful physical and structural information without any prior bias regarding the sample or the instrumental modality. Further interpretation of these types of results may still require human intervention. However, the open nature of this algorithm and its wide availability, enable broad collaborations and exploratory work necessary to enable efficient data analysis in electron microscopy. PMID:27211523

  14. Mean template for tensor-based morphometry using deformation tensors.

    PubMed

    Leporé, Natasha; Brun, Caroline; Pennec, Xavier; Chou, Yi-Yu; Lopez, Oscar L; Aizenstein, Howard J; Becker, James T; Toga, Arthur W; Thompson, Paul M

    2007-01-01

    Tensor-based morphometry (TBM) studies anatomical differences between brain images statistically, to identify regions that differ between groups, over time, or correlate with cognitive or clinical measures. Using a nonlinear registration algorithm, all images are mapped to a common space, and statistics are most commonly performed on the Jacobian determinant (local expansion factor) of the deformation fields. In earlier work, it was shown that the detection sensitivity of the standard TBM approach could be increased by using the full deformation tensors in a multivariate statistical analysis. Here we set out to improve the common space itself, by choosing the shape that minimizes a natural metric on the deformation tensors from that space to the population of control subjects. This method avoids statistical bias and should ease nonlinear registration of new subjects' data to a template that is 'closest' to all subjects' anatomies. As deformation tensors are symmetric positive-definite matrices and do not form a vector space, all computations are performed in the log-Euclidean framework. The control brain B that is already the closest to 'average' is found. A gradient descent algorithm is then used to perform the minimization that iteratively deforms this template and obtains the mean shape. We apply our method to map the profile of anatomical differences in a dataset of 26 HIV/AIDS patients and 14 controls, via a log-Euclidean Hotelling's T2 test on the deformation tensors. These results are compared to the ones found using the 'best' control, B. Statistics on both shapes are evaluated using cumulative distribution functions of the p-values in maps of inter-group differences.
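
    In the log-Euclidean framework used here, symmetric positive-definite (SPD) tensors are mapped by the matrix logarithm into a flat vector space, averaged there, and mapped back by the matrix exponential. A minimal sketch of that averaging step, under the stated assumption that all inputs are SPD (not the paper's full gradient-descent template estimation):

```python
import numpy as np

def logm_spd(S: np.ndarray) -> np.ndarray:
    """Matrix logarithm of a symmetric positive-definite matrix via eigh."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def expm_sym(L: np.ndarray) -> np.ndarray:
    """Matrix exponential of a symmetric matrix via eigh."""
    w, V = np.linalg.eigh(L)
    return (V * np.exp(w)) @ V.T

def log_euclidean_mean(tensors) -> np.ndarray:
    """Log-Euclidean mean: average the matrix logs, then exponentiate."""
    return expm_sym(np.mean([logm_spd(S) for S in tensors], axis=0))
```

Unlike the arithmetic mean of SPD matrices, this mean respects the multiplicative geometry of the tensors (e.g. the mean of I and e²I is eI, not ½(1 + e²)I).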

  15. The Likelihood of Injury Among Bias Crimes: An Analysis of General and Specific Bias Types.

    PubMed

    Pezzella, Frank S; Fetzer, Matthew D

    2015-06-18

    In 2009, President Barack Obama signed the Matthew Shepard and James Byrd Jr. Hate Crimes Prevention Act and thereby extended the list of previously protected classes of victims from actual or perceived race, color, religion, national origin, disability and sexual orientation to gender and gender identity. Over 45 states, the District of Columbia and the federal government now include hate crime statutes that increase penalties when offenders perpetrate hate crimes against protected classes of victims. Penalty enhancement statutes sanction unlawful bias conduct arguably because it results in more severe injuries relative to non-bias conduct. We contend that physical injuries vary by bias type and that bias types are not equally injurious. Data on bias crimes were analyzed from the National Incident Based Reporting System. Descriptive patterns of bias crimes were identified by offense type, bias motivation and major and minor injuries. Using multivariate analyses, we found an escalating trend of violence against racial minorities. Moreover, relative to non-bias crimes, only anti-White and anti-lesbian bias crimes met our two-prong "animus" criteria of disproportionate prevalence and severity of injury. However, when compared to anti-White bias, anti-Black bias crimes were more prevalent and more likely to involve serious injuries. Implications for hate crime jurisprudence are discussed. © The Author(s) 2015.

  16. Multi-criteria evaluation of CMIP5 GCMs for climate change impact analysis

    NASA Astrophysics Data System (ADS)

    Ahmadalipour, Ali; Rana, Arun; Moradkhani, Hamid; Sharma, Ashish

    2017-04-01

    Climate change is expected to have severe impacts on the global hydrological cycle along with the food-water-energy nexus. Currently, there are many climate models used in predicting important climatic variables. Though there have been advances in the field, there are still many problems to be resolved related to reliability, uncertainty, and computing needs, among many others. In the present work, we have analyzed the performance of 20 different global climate models (GCMs) from the Climate Model Intercomparison Project Phase 5 (CMIP5) dataset over the Columbia River Basin (CRB) in the Pacific Northwest USA. We demonstrate a statistical multicriteria approach, using univariate and multivariate techniques, for selecting suitable GCMs to be used for climate change impact analysis in the region. Univariate methods include the mean, standard deviation, coefficient of variation, relative change (variability), Mann-Kendall test, and Kolmogorov-Smirnov test (KS-test); the multivariate methods used were principal component analysis (PCA), singular value decomposition (SVD), canonical correlation analysis (CCA), and cluster analysis. The analysis is performed on raw GCM data, i.e., before bias correction, for the precipitation and temperature climatic variables for all 20 models, to capture the reliability and nature of each model at the regional scale. The analysis is based on spatially averaged datasets of GCMs and observations for the period 1970 to 2000. A ranking is provided for each GCM based on performance evaluated against gridded observational data on various temporal scales (daily, monthly, and seasonal). Results provide insight into each of the methods employed in ranking the GCMs and the statistical properties each addresses. Further, evaluation was also performed for raw GCM simulations against different sets of gridded observational data in the area.
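
    Among the univariate screening methods listed, the two-sample KS test reduces to the maximum gap between two empirical CDFs, e.g. simulated versus observed precipitation. A dependency-free sketch of the statistic itself (without the p-value, and not the authors' code):

```python
import numpy as np

def ks_statistic(a: np.ndarray, b: np.ndarray) -> float:
    """Two-sample Kolmogorov-Smirnov distance: max gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])                       # evaluate at all data points
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())
```

The statistic lies in [0, 1]: 0 for identical samples, 1 for completely non-overlapping ones, which makes it a convenient distributional distance for ranking GCMs against observations.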

  17. Multivariate Meta-Analysis of Genetic Association Studies: A Simulation Study

    PubMed Central

    Neupane, Binod; Beyene, Joseph

    2015-01-01

    In a meta-analysis with multiple end points of interest that are correlated between or within studies, a multivariate approach to meta-analysis has the potential to produce more precise estimates of effects by exploiting the correlation structure between end points. However, under the random-effects assumption the multivariate estimation is more complex (as it involves estimating more parameters simultaneously) than univariate estimation, and sometimes can produce unrealistic parameter estimates. The usefulness of the multivariate approach to meta-analysis of the effects of a genetic variant on two or more correlated traits is not well understood in the area of genetic association studies. In such studies, genetic variants are expected to roughly maintain Hardy-Weinberg equilibrium within studies, and their effects on complex traits are generally very small to modest and could be heterogeneous across studies for genuine reasons. We carried out extensive simulation to explore the comparative performance of the multivariate approach against the most commonly used univariate inverse-variance weighted approach under the random-effects assumption in various realistic meta-analytic scenarios of genetic association studies of correlated end points. We evaluated the performance with respect to relative mean bias percentage, root mean square error (RMSE) of the estimate, and coverage probability of the corresponding 95% confidence interval of the effect for each end point. Our simulation results suggest that the multivariate approach performs similarly to or better than the univariate method when correlations between end points within or between studies are at least moderate and between-study variation is similar to or larger than average within-study variation for meta-analyses of 10 or more genetic studies. The multivariate approach produces estimates with smaller bias and RMSE, especially for an end point that has randomly or informatively missing summary data in some individual studies, when the missing data in that end point are imputed with null effects and quite large variance. PMID:26196398
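
    The univariate comparator in this simulation, inverse-variance weighting, is compact enough to state directly. A fixed-effect sketch (the random-effects version would add an estimated between-study variance to each within-study variance before weighting):

```python
import numpy as np

def inverse_variance_pool(effects, variances):
    """Fixed-effect pooled estimate and its variance: weight each study
    effect by the reciprocal of its variance."""
    w = 1.0 / np.asarray(variances, dtype=float)
    pooled = (w * np.asarray(effects, dtype=float)).sum() / w.sum()
    return pooled, 1.0 / w.sum()
```

The multivariate approach studied in the paper generalizes this by weighting with the inverse of the full covariance matrix of the correlated end points rather than per-end-point scalars.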

  18. Multivariate Meta-Analysis of Genetic Association Studies: A Simulation Study.

    PubMed

    Neupane, Binod; Beyene, Joseph

    2015-01-01

    In a meta-analysis with multiple end points of interest that are correlated between or within studies, a multivariate approach to meta-analysis has the potential to produce more precise estimates of effects by exploiting the correlation structure between end points. However, under the random-effects assumption the multivariate estimation is more complex (as it involves estimating more parameters simultaneously) than univariate estimation, and sometimes can produce unrealistic parameter estimates. The usefulness of the multivariate approach to meta-analysis of the effects of a genetic variant on two or more correlated traits is not well understood in the area of genetic association studies. In such studies, genetic variants are expected to roughly maintain Hardy-Weinberg equilibrium within studies, and their effects on complex traits are generally very small to modest and could be heterogeneous across studies for genuine reasons. We carried out extensive simulation to explore the comparative performance of the multivariate approach against the most commonly used univariate inverse-variance weighted approach under the random-effects assumption in various realistic meta-analytic scenarios of genetic association studies of correlated end points. We evaluated the performance with respect to relative mean bias percentage, root mean square error (RMSE) of the estimate, and coverage probability of the corresponding 95% confidence interval of the effect for each end point. Our simulation results suggest that the multivariate approach performs similarly to or better than the univariate method when correlations between end points within or between studies are at least moderate and between-study variation is similar to or larger than average within-study variation for meta-analyses of 10 or more genetic studies. The multivariate approach produces estimates with smaller bias and RMSE, especially for an end point that has randomly or informatively missing summary data in some individual studies, when the missing data in that end point are imputed with null effects and quite large variance.

  19. Data analysis techniques

    NASA Technical Reports Server (NTRS)

    Park, Steve

    1990-01-01

    A large and diverse number of computational techniques are routinely used to process and analyze remotely sensed data. These techniques include: univariate statistics; multivariate statistics; principal component analysis; pattern recognition and classification; other multivariate techniques; geometric correction; registration and resampling; radiometric correction; enhancement; restoration; Fourier analysis; and filtering. Each of these techniques will be considered, in order.
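
    Several of the listed techniques reduce to standard linear algebra; principal component analysis in particular is a centered singular value decomposition. A minimal sketch (illustrative, not from the report):

```python
import numpy as np

def pca_scores(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of X onto the top principal components via a
    centered SVD; each row of the result is a sample's component scores."""
    Xc = X - X.mean(axis=0)                    # center each variable (band)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```

For remotely sensed imagery, X would hold one pixel per row and one spectral band per column; the leading components concentrate most of the inter-band variance.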

  20. Integration of ecological indices in the multivariate evaluation of an urban inventory of street trees

    Treesearch

    J. Grabinsky; A. Aldama; A. Chacalo; H. J. Vazquez

    2000-01-01

    Inventory data on Mexico City's street trees were studied using classical arboricultural and ecological statistical approaches. Multivariate techniques were applied to both. Results did not differ substantially and were complementary. It was possible to reduce the inventory data and to group species, boroughs, blocks, and variables.

  1. Adaptable history biases in human perceptual decisions.

    PubMed

    Abrahamyan, Arman; Silva, Laura Luz; Dakin, Steven C; Carandini, Matteo; Gardner, Justin L

    2016-06-21

    When making choices under conditions of perceptual uncertainty, past experience can play a vital role. However, it can also lead to biases that worsen decisions. Consistent with previous observations, we found that human choices are influenced by the success or failure of past choices even in a standard two-alternative detection task, where choice history is irrelevant. The typical bias was one that made the subject switch choices after a failure. These choice history biases led to poorer performance and were similar for observers in different countries. They were well captured by a simple logistic regression model that had been previously applied to describe psychophysical performance in mice. Such irrational biases seem at odds with the principles of reinforcement learning, which would predict exquisite adaptability to choice history. We therefore asked whether subjects could adapt their irrational biases following changes in trial order statistics. Adaptability was strong in the direction that confirmed a subject's default biases, but weaker in the opposite direction, so that existing biases could not be eradicated. We conclude that humans can adapt choice history biases, but cannot easily overcome existing biases even if irrational in the current context: adaptation is more sensitive to confirmatory than contradictory statistics.
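
    A logistic regression model of the general kind described, with stimulus and choice-history regressors, can be sketched as follows. The particular design-matrix columns (e.g. previous choice split by success versus failure) are an assumption about the general form, not the authors' exact specification:

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X: np.ndarray, y: np.ndarray,
                 lr: float = 0.5, iters: int = 5000) -> np.ndarray:
    """Batch gradient descent for logistic regression on choices y in {0, 1}.
    Columns of X might be: stimulus strength, previous choice after a
    success, previous choice after a failure, and a constant bias term;
    the fitted history weights quantify win-stay / lose-switch biases."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w
```

A negative weight on the "previous choice after a failure" column would correspond to the switch-after-failure bias reported above.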

  2. Correction of stream quality trends for the effects of laboratory measurement bias

    USGS Publications Warehouse

    Alexander, Richard B.; Smith, Richard A.; Schwarz, Gregory E.

    1993-01-01

    We present a statistical model relating measurements of water quality to associated errors in laboratory methods. Estimation of the model allows us to correct trends in water quality for long-term and short-term variations in laboratory measurement errors. An illustration of the bias correction method for a large national set of stream water quality and quality assurance data shows that reductions in the bias of estimates of water quality trend slopes are achieved at the expense of increases in the variance of these estimates. Slight improvements occur in the precision of estimates of trend in bias by using correlative information on bias and water quality to estimate random variations in measurement bias. The results of this investigation stress the need for reliable, long-term quality assurance data and efficient statistical methods to assess the effects of measurement errors on the detection of water quality trends.

  3. The Effect of the Multivariate Box-Cox Transformation on the Power of MANOVA.

    ERIC Educational Resources Information Center

    Kirisci, Levent; Hsu, Tse-Chi

    Most of the multivariate statistical techniques rely on the assumption of multivariate normality. The effects of non-normality on multivariate tests are assumed to be negligible when variance-covariance matrices and sample sizes are equal. Therefore, in practice, investigators do not usually attempt to remove non-normality. In this simulation…

  4. Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: a Monte Carlo study.

    PubMed

    Chou, C P; Bentler, P M; Satorra, A

    1991-11-01

    Research studying robustness of maximum likelihood (ML) statistics in covariance structure analysis has concluded that test statistics and standard errors are biased under severe non-normality. An estimation procedure known as asymptotic distribution free (ADF), making no distributional assumption, has been suggested to avoid these biases. Corrections to the normal theory statistics to yield more adequate performance have also been proposed. This study compares the performance of a scaled test statistic and robust standard errors for two models under several non-normal conditions and also compares these with the results from ML and ADF methods. Both ML and ADF test statistics performed rather well in one model and considerably worse in the other. In general, the scaled test statistic seemed to behave better than the ML test statistic and the ADF statistic performed the worst. The robust and ADF standard errors yielded more appropriate estimates of sampling variability than the ML standard errors, which were usually downward biased, in both models under most of the non-normal conditions. ML test statistics and standard errors were found to be quite robust to the violation of the normality assumption when data had either symmetric and platykurtic distributions, or non-symmetric and zero kurtotic distributions.

  5. The saccadic flow baseline: Accounting for image-independent biases in fixation behavior.

    PubMed

    Clarke, Alasdair D F; Stainer, Matthew J; Tatler, Benjamin W; Hunt, Amelia R

    2017-09-01

    Much effort has been made to explain eye guidance during natural scene viewing. However, a substantial component of fixation placement appears to be a set of consistent biases in eye movement behavior. We introduce the concept of saccadic flow, a generalization of the central bias that describes the image-independent conditional probability of making a saccade to (x_{i+1}, y_{i+1}), given a fixation at (x_i, y_i). We suggest that saccadic flow can be a useful prior when carrying out analyses of fixation locations, and can be used as a submodule in models of eye movements during scene viewing. We demonstrate the utility of this idea by presenting bias-weighted gaze landscapes, and show that there is a link between the likelihood of a saccade under the flow model and the salience of the following fixation. We also present a minor improvement to our central bias model (based on using a multivariate truncated Gaussian), and investigate the leftwards and coarse-to-fine biases in scene viewing.
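    A central bias of the kind mentioned above can be modelled as a bivariate Gaussian truncated to the display. A minimal rejection-sampling sketch follows; the mean, covariance, and screen limits are assumed illustrative values, not the paper's fitted parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

def truncated_gaussian_2d(n, mean, cov, xlim, ylim):
    """Rejection-sample a bivariate Gaussian truncated to a rectangle
    (generic sketch of a truncated central-bias model)."""
    out = np.empty((0, 2))
    while len(out) < n:
        s = rng.multivariate_normal(mean, cov, size=2 * n)
        keep = ((s[:, 0] >= xlim[0]) & (s[:, 0] <= xlim[1]) &
                (s[:, 1] >= ylim[0]) & (s[:, 1] <= ylim[1]))
        out = np.vstack([out, s[keep]])
    return out[:n]

# central fixation bias: samples cluster at screen centre, clipped to the display
fix = truncated_gaussian_2d(5000, mean=[0.5, 0.5],
                            cov=[[0.04, 0.0], [0.0, 0.02]],
                            xlim=(0.0, 1.0), ylim=(0.0, 1.0))
```

    Truncation (rather than simple clipping) keeps the density zero outside the screen while preserving the Gaussian shape inside it.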

  6. On the linearity of tracer bias around voids

    NASA Astrophysics Data System (ADS)

    Pollina, Giorgia; Hamaus, Nico; Dolag, Klaus; Weller, Jochen; Baldi, Marco; Moscardini, Lauro

    2017-07-01

    The large-scale structure of the Universe can be observed only via luminous tracers of the dark matter. However, the clustering statistics of tracers are biased and depend on various properties, such as their host-halo mass and assembly history. On very large scales, this tracer bias results in a constant offset in the clustering amplitude, known as linear bias. Towards smaller non-linear scales, this is no longer the case and tracer bias becomes a complicated function of scale and time. We focus on tracer bias centred on cosmic voids, i.e. depressions of the density field that spatially dominate the Universe. We consider three types of tracers: galaxies, galaxy clusters and active galactic nuclei, extracted from the hydrodynamical simulation Magneticum Pathfinder. In contrast to common clustering statistics that focus on auto-correlations of tracers, we find that void-tracer cross-correlations are successfully described by a linear bias relation. The tracer-density profile of voids can thus be related to their matter-density profile by a single number. We show that it coincides with the linear tracer bias extracted from the large-scale auto-correlation function and expectations from theory, if sufficiently large voids are considered. For smaller voids we observe a shift towards higher values. This has important consequences for cosmological parameter inference, as the problem of unknown tracer bias is alleviated up to a constant number. The smallest scales in existing data sets become accessible to simpler models, providing numerous modes of the density field that have been disregarded so far, but may help to further reduce statistical errors in constraining cosmology.

  7. Multivariate Regression Analysis and Slaughter Livestock,

    DTIC Science & Technology

    (*AGRICULTURE, *ECONOMICS), (*MEAT, PRODUCTION), MULTIVARIATE ANALYSIS, REGRESSION ANALYSIS, ANIMALS, WEIGHT, COSTS, PREDICTIONS, STABILITY, MATHEMATICAL MODELS, STORAGE, BEEF, PORK, FOOD, STATISTICAL DATA, ACCURACY

  8. A Procedure To Detect Test Bias Present Simultaneously in Several Items.

    ERIC Educational Resources Information Center

    Shealy, Robin; Stout, William

    A statistical procedure is presented that is designed to test for unidirectional test bias existing simultaneously in several items of an ability test, based on the assumption that test bias is incipient within the two groups' ability differences. The proposed procedure--Simultaneous Item Bias (SIB)--is based on a multidimensional item response…

  9. The Probability Distribution for a Biased Spinner

    ERIC Educational Resources Information Center

    Foster, Colin

    2012-01-01

    This article advocates biased spinners as an engaging context for statistics students. Calculating the probability of a biased spinner landing on a particular side makes valuable connections between probability and other areas of mathematics. (Contains 2 figures and 1 table.)

  10. Cancer Survival Estimates Due to Non-Uniform Loss to Follow-Up and Non-Proportional Hazards

    PubMed

    K M, Jagathnath Krishna; Mathew, Aleyamma; Sara George, Preethi

    2017-06-25

    Background: Cancer survival depends on loss to follow-up (LFU) and non-proportional hazards (non-PH). If LFU is high, survival will be over-estimated. If a hazard is non-PH, rank tests will provide biased inference and the Cox model will provide a biased hazard ratio. We assessed the bias due to LFU and a non-PH factor in cancer survival and provide alternate methods for unbiased inference and hazard ratios. Materials and Methods: Kaplan-Meier survival curves were plotted using a realistic breast cancer (BC) data-set with >40% 5-year LFU and compared with those from another BC data-set with <15% 5-year LFU, to assess the bias in survival due to high LFU. Age at diagnosis in the latter data-set was used to illustrate the bias due to a non-PH factor. The log-rank test was employed to assess the bias in the p-value and the Cox model was used to assess the bias in the hazard ratio for the non-PH factor. The Schoenfeld statistic was used to test the non-PH of age. For the non-PH factor, we employed the Renyi statistic for inference and a time-dependent Cox model for the hazard ratio. Results: Five-year BC survival was 69% (SE: 1.1%) vs. 90% (SE: 0.7%) for the data with low vs. high LFU, respectively. Age (<45, 46-54 and >54 years) was a non-PH factor (p-value: 0.036). Survival by age was significant using the log-rank test (p-value: 0.026), but not using the Renyi statistic (p=0.067). The hazard ratio (HR) for age using the Cox model was 1.012 (95% CI: 1.004-1.019), while that from the time-dependent Cox model was in the other direction (HR: 0.997; 95% CI: 0.997-0.998). Conclusion: Over-estimated survival was observed for cancer data with high LFU. The log-rank statistic and Cox model provided biased results for the non-PH factor. For data with non-PH factors, the Renyi statistic and a time-dependent Cox model can be used as alternate methods to obtain unbiased inference and estimates.
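    The Kaplan-Meier product-limit estimator underlying the curves above can be sketched in a few lines; the toy data below are hypothetical and simply show how censoring (loss to follow-up) enters the calculation by removing subjects from the risk set:

```python
import numpy as np

def kaplan_meier(time, event):
    """Product-limit survival estimate. `event` is 1 for death, 0 for
    censoring (e.g. loss to follow-up). Minimal sketch, not the paper's code."""
    time = np.asarray(time, float)
    event = np.asarray(event, int)
    order = np.argsort(time)
    time, event = time[order], event[order]
    surv, s = [], 1.0
    for t in np.unique(time[event == 1]):       # each distinct death time
        at_risk = np.sum(time >= t)             # censored subjects have left the risk set
        deaths = np.sum((time == t) & (event == 1))
        s *= 1.0 - deaths / at_risk
        surv.append((t, s))
    return surv

# hypothetical follow-up times (months) with interleaved censoring
times  = [2, 3, 3, 5, 6, 7, 8, 10, 12, 14]
events = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
km = kaplan_meier(times, events)
```

    Heavy censoring removes later deaths from the risk set before they can occur, which is exactly how a high LFU rate produces the over-estimated survival the abstract reports.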

  11. Sleep Apnea, Sleep Duration and Brain MRI Markers of Cerebral Vascular Disease and Alzheimer's Disease: The Atherosclerosis Risk in Communities Study (ARIC).

    PubMed

    Lutsey, Pamela L; Norby, Faye L; Gottesman, Rebecca F; Mosley, Thomas; MacLehose, Richard F; Punjabi, Naresh M; Shahar, Eyal; Jack, Clifford R; Alonso, Alvaro

    2016-01-01

    A growing body of literature has suggested that obstructive sleep apnea (OSA) and habitual short sleep duration are linked to poor cognitive function. Neuroimaging studies may provide insight into this relation. We tested the hypotheses that OSA and habitual short sleep duration, measured at ages 54-73 years, would be associated with adverse brain morphology at ages 67-89 years. Included in this analysis are 312 ARIC study participants who underwent in-home overnight polysomnography in 1996-1998 and brain MRI scans about 15 years later (2012-2013). Sleep apnea was quantified by the apnea-hypopnea index and categorized as moderate/severe (≥15.0 events/hour), mild (5.0-14.9 events/hour), or normal (<5.0 events/hour). Habitual sleep duration was categorized, in hours, as <7, 7 to <8, and ≥8. MRI outcomes included the number of infarcts (total, subcortical, and cortical) and white matter hyperintensity (WMH) and Alzheimer's disease signature region volumes. Multivariable adjusted logistic and linear regression models were used. All models incorporated inverse probability weighting to adjust for potential selection bias. At the time of the sleep study, participants were 61.7 (SD: 5.0) years old and 54% female; 19% had moderate/severe sleep apnea. MRI scanning took place 14.8 (SD: 1.0) years later, when participants were 76.5 (SD: 5.2) years old. In multivariable models that accounted for body mass index, neither OSA nor abnormal sleep duration was statistically significantly associated with the odds of cerebral infarcts, WMH brain volumes or regional brain volumes. In this community-based sample, mid-life OSA and habitually short sleep duration were not associated with later-life cerebral markers of vascular dementia and Alzheimer's disease. However, selection bias may have influenced our results and the modest sample size led to relatively imprecise associations.

  12. Statistical analysis of multivariate atmospheric variables. [cloud cover

    NASA Technical Reports Server (NTRS)

    Tubbs, J. D.

    1979-01-01

    Topics covered include: (1) estimation in discrete multivariate distributions; (2) a procedure to predict cloud cover frequencies in the bivariate case; (3) a program to compute conditional bivariate normal parameters; (4) the transformation of nonnormal multivariate data to near-normality; (5) a test of fit for the extreme value distribution based upon the generalized minimum chi-square; (6) a test of fit for continuous distributions based upon the generalized minimum chi-square; (7) the effect of correlated observations on confidence sets based upon chi-square statistics; and (8) the generation of random variates from specified distributions.

  13. Multivariate mixed linear model analysis of longitudinal data: an information-rich statistical technique for analyzing disease resistance data

    USDA-ARS?s Scientific Manuscript database

    The mixed linear model (MLM) is currently among the most advanced and flexible statistical modeling techniques and its use in tackling problems in plant pathology has begun surfacing in the literature. The longitudinal MLM is a multivariate extension that handles repeatedly measured data, such as r...

  14. Functional Path Analysis as a Multivariate Technique in Developing a Theory of Participation in Adult Education.

    ERIC Educational Resources Information Center

    Martin, James L.

    This paper reports on attempts by the author to construct a theoretical framework of adult education participation using a theory development process and the corresponding multivariate statistical techniques. Two problems are identified: the lack of theoretical framework in studying problems, and the limiting of statistical analysis to univariate…

  15. Methodological approaches in analysing observational data: A practical example on how to address clustering and selection bias.

    PubMed

    Trutschel, Diana; Palm, Rebecca; Holle, Bernhard; Simon, Michael

    2017-11-01

    Because not every scientific question on effectiveness can be answered with randomised controlled trials, research methods that minimise bias in observational studies are required. Two major concerns influence the internal validity of effect estimates: selection bias and clustering. Hence, to reduce the bias of the effect estimates, more sophisticated statistical methods are needed. We introduce statistical approaches such as propensity score matching and mixed models through a representative real-world analysis, and present their implementation in the statistical software R so that the results can be reproduced. We perform a two-level analytic strategy to address the problems of bias and clustering: (i) generalised models with different abilities to adjust for dependencies are used to analyse binary data and (ii) the genetic matching and covariate adjustment methods are used to adjust for selection bias. Hence, we analyse the data from two population samples, the sample produced by the matching method and the full sample. The different analysis methods in this article present different results but still point in the same direction. In our example, the estimate of the probability of receiving a case conference is higher in the treatment group than in the control group. Both strategies, genetic matching and covariate adjustment, have their limitations but complement each other to provide the whole picture. The statistical approaches were feasible for reducing bias but were nevertheless limited by the sample used. For each study and obtained sample, the pros and cons of the different methods have to be weighed. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.
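    The propensity score matching idea discussed above can be sketched in plain NumPy (a stand-in for the R packages the paper uses); the data, the simple gradient-ascent logistic fit, and the greedy matching rule are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_logistic(X, y, iters=500, lr=0.5):
    """Plain gradient-ascent logistic regression (a stand-in for R's glm);
    returns the fitted propensity score for every unit."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w += lr * Xb.T @ (y - p) / len(y)
    return 1.0 / (1.0 + np.exp(-Xb @ w))

# synthetic observational data: older units are more likely to be treated,
# so the raw treated/control groups are imbalanced on age
n = 1000
age = rng.normal(0.0, 1.0, n)
treated = (rng.random(n) < 1.0 / (1.0 + np.exp(-(1.5 * age - 1.1)))).astype(float)
ps = fit_logistic(age.reshape(-1, 1), treated)

# greedy 1:1 nearest-neighbour matching on the propensity score, without replacement
t_idx = np.flatnonzero(treated == 1)
pool = list(np.flatnonzero(treated == 0))
matched = []
for i in t_idx:
    j = min(pool, key=lambda k: abs(ps[k] - ps[i]))
    matched.append(j)
    pool.remove(j)

gap_before = abs(age[treated == 1].mean() - age[treated == 0].mean())
gap_after = abs(age[t_idx].mean() - age[matched].mean())
```

    Matching on the estimated propensity score shrinks the covariate imbalance between the groups, which is the selection-bias reduction the article's matched-sample analysis relies on.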

  16. Explanation of Two Anomalous Results in Statistical Mediation Analysis.

    PubMed

    Fritz, Matthew S; Taylor, Aaron B; Mackinnon, David P

    2012-01-01

    Previous studies of different methods of testing mediation models have consistently found two anomalous results. The first result is elevated Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap tests not found in nonresampling tests or in resampling tests that did not include a bias correction. This is of special concern as the bias-corrected bootstrap is often recommended and used due to its higher statistical power compared with other tests. The second result is statistical power reaching an asymptote far below 1.0 and in some conditions even declining slightly as the size of the relationship between X and M, a, increased. Two computer simulations were conducted to examine these findings in greater detail. Results from the first simulation found that the increased Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap are a function of an interaction between the size of the individual paths making up the mediated effect and the sample size, such that elevated Type I error rates occur when the sample size is small and the effect size of the nonzero path is medium or larger. Results from the second simulation found that stagnation and decreases in statistical power as a function of the effect size of the a path occurred primarily when the path between M and Y, b, was small. Two empirical mediation examples are provided using data from a steroid prevention and health promotion program aimed at high school football players (Athletes Training and Learning to Avoid Steroids; Goldberg et al., 1996), one to illustrate a possible Type I error for the bias-corrected bootstrap test and a second to illustrate a loss in power related to the size of a. Implications of these findings are discussed.
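    The bias-corrected bootstrap test discussed above can be sketched as follows, assuming the usual product-of-coefficients estimate ab of the mediated effect; the toy data are synthetic, not the ATLAS data used in the article:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)

def indirect_effect(x, m, y):
    """a*b from two OLS fits: m ~ x (slope a) and y ~ x + m (slope b)."""
    a = np.polyfit(x, m, 1)[0]
    X = np.column_stack([np.ones_like(x), x, m])
    b = np.linalg.lstsq(X, y, rcond=None)[0][2]
    return a * b

def bc_bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05):
    """Bias-corrected (BC) percentile bootstrap CI for the mediated effect."""
    theta = indirect_effect(x, m, y)
    n = len(x)
    boot = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)
        boot[i] = indirect_effect(x[idx], m[idx], y[idx])
    nd = NormalDist()
    z0 = nd.inv_cdf(np.mean(boot < theta))       # bias-correction constant
    z = nd.inv_cdf(1 - alpha / 2)
    lo = nd.cdf(2 * z0 - z)                      # shifted percentile levels
    hi = nd.cdf(2 * z0 + z)
    return np.quantile(boot, lo), np.quantile(boot, hi)

# toy mediation data with a genuine indirect path x -> m -> y
n = 200
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.5 * m + rng.normal(size=n)
ci = bc_bootstrap_ci(x, m, y)
```

    The bias-correction constant z0 shifts the percentile endpoints according to how far the bootstrap distribution sits from the point estimate; when the paths are small and the sample is small, this shift is what inflates the Type I error rate described above.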

  17. Selection bias of Internet panel surveys: a comparison with a paper-based survey and national governmental statistics in Japan.

    PubMed

    Tsuboi, Satoshi; Yoshida, Honami; Ae, Ryusuke; Kojo, Takao; Nakamura, Yosikazu; Kitamura, Kunio

    2015-03-01

    To investigate the selection bias of an Internet panel survey organized by a commercial company. A descriptive study was conducted. The authors compared the characteristics of the Internet panel survey with a national paper-based survey and with national governmental statistics in Japan. The participants in the Internet panel survey were composed of more women, were older, and resided in large cities. Regardless of age and sex, the prevalence of highly educated people in the Internet panel survey was higher than in the paper-based survey and the national statistics. In men, the prevalence of heavy drinkers among the 30- to 49-year-old population and of habitual smokers among the 20- to 49-year-old population in the Internet panel survey was lower than what was found in the national statistics. The estimated characteristics of commercial Internet panel surveys were quite different from the national statistical data. In a commercial Internet panel survey, selection bias should not be underestimated. © 2012 APJPH.

  18. A multivariate model and statistical method for validating tree grade lumber yield equations

    Treesearch

    Donald W. Seegrist

    1975-01-01

    Lumber yields within lumber grades can be described by a multivariate linear model. A method for validating lumber yield prediction equations when there are several tree grades is presented. The method is based on multivariate simultaneous test procedures.

  19. The impact of moderate wine consumption on the risk of developing prostate cancer.

    PubMed

    Vartolomei, Mihai Dorin; Kimura, Shoji; Ferro, Matteo; Foerster, Beat; Abufaraj, Mohammad; Briganti, Alberto; Karakiewicz, Pierre I; Shariat, Shahrokh F

    2018-01-01

    To investigate the impact of moderate wine consumption on the risk of prostate cancer (PCa). We focused on the differential effect of moderate consumption of red versus white wine. This study was a meta-analysis that includes data from case-control and cohort studies. A systematic search of Web of Science, Medline/PubMed, and Cochrane library was performed on December 1, 2017. Studies were deemed eligible if they assessed the risk of PCa due to red, white, or any wine using multivariable logistic regression analysis. We performed a formal meta-analysis for the risk of PCa according to moderate wine and wine type consumption (white or red). Heterogeneity between studies was assessed using Cochrane's Q test and I² statistics. Publication bias was assessed using Egger's regression test. A total of 930 abstracts and titles were initially identified. After removal of duplicates, reviews, and conference abstracts, 83 full-text original articles were screened. Seventeen studies (611,169 subjects) were included for final evaluation and fulfilled the inclusion criteria. In the case of moderate wine consumption: the pooled risk ratio (RR) for the risk of PCa was 0.98 (95% CI 0.92-1.05, p = 0.57) in the multivariable analysis. Moderate white wine consumption increased the risk of PCa with a pooled RR of 1.26 (95% CI 1.10-1.43, p = 0.001) in the multivariable analysis. Meanwhile, moderate red wine consumption had a protective role reducing the risk by 12% (RR 0.88, 95% CI 0.78-0.999, p = 0.047) in the multivariable analysis that comprised 222,447 subjects. In this meta-analysis, moderate wine consumption did not impact the risk of PCa. Interestingly, regarding the type of wine, moderate consumption of white wine increased the risk of PCa, whereas moderate consumption of red wine had a protective effect. Further analyses are needed to assess the differential molecular effect of white and red wine conferring their impact on PCa risk.

  20. Predicting the multi-domain progression of Parkinson's disease: a Bayesian multivariate generalized linear mixed-effect model.

    PubMed

    Wang, Ming; Li, Zheng; Lee, Eun Young; Lewis, Mechelle M; Zhang, Lijun; Sterling, Nicholas W; Wagner, Daymond; Eslinger, Paul; Du, Guangwei; Huang, Xuemei

    2017-09-25

    It is challenging for current statistical models to predict the clinical progression of Parkinson's disease (PD) because of the involvement of multiple domains and longitudinal data. Past univariate longitudinal or multivariate analyses from cross-sectional trials have limited power to predict individual outcomes or a single moment. A multivariate generalized linear mixed-effect model (GLMM) under the Bayesian framework was proposed to study multi-domain longitudinal outcomes obtained at baseline, 18, and 36 months. The outcomes included motor, non-motor, and postural instability scores from the MDS-UPDRS, and demographic and standardized clinical data were utilized as covariates. Dynamic prediction was performed for both internal and external subjects using samples from the posterior distributions of the parameter estimates and random effects, and predictive accuracy was evaluated based on the root mean square error (RMSE), absolute bias (AB) and the area under the receiver operating characteristic (ROC) curve. First, our prediction model identified clinical data that were differentially associated with motor, non-motor, and postural stability scores. Second, the predictive accuracy of our model for the training data was assessed, and improved prediction was gained, particularly for non-motor scores (RMSE and AB: 2.89 and 2.20), compared to univariate analysis (RMSE and AB: 3.04 and 2.35). Third, individual-level predictions of longitudinal trajectories for the testing data were performed, with ~80% of observed values falling within the 95% credible intervals. Multivariate generalized linear mixed models hold promise for predicting the clinical progression of individual outcomes in PD. The data were obtained from Dr. Xuemei Huang's NIH grant R01 NS060722, part of the NINDS PD Biomarker Program (PDBP). All data were entered within 24 h of collection into the Data Management Repository (DMR), which is publicly available ( https://pdbp.ninds.nih.gov/data-management ).

  1. Atrial Electrogram Fractionation Distribution before and after Pulmonary Vein Isolation in Human Persistent Atrial Fibrillation-A Retrospective Multivariate Statistical Analysis.

    PubMed

    Almeida, Tiago P; Chu, Gavin S; Li, Xin; Dastagir, Nawshin; Tuan, Jiun H; Stafford, Peter J; Schlindwein, Fernando S; Ng, G André

    2017-01-01

    Purpose: Complex fractionated atrial electrograms (CFAE)-guided ablation after pulmonary vein isolation (PVI) has been used for persistent atrial fibrillation (persAF) therapy. This strategy has shown suboptimal outcomes due to, among other factors, undetected changes in the atrial tissue following PVI. In the present work, we investigate CFAE distribution before and after PVI in patients with persAF using a multivariate statistical model. Methods: 207 pairs of atrial electrograms (AEGs) were collected before and after PVI from corresponding LA regions in 18 persAF patients. Twelve attributes were measured from the AEGs, before and after PVI. Statistical models based on multivariate analysis of variance (MANOVA) and linear discriminant analysis (LDA) were used to characterize the atrial regions and AEGs. Results: PVI significantly reduced CFAEs in the LA (70 vs. 40%; P < 0.0001). Four types of LA regions were identified, based on the AEG characteristics: (i) fractionated before PVI that remained fractionated after PVI (31% of the collected points); (ii) fractionated that converted to normal (39%); (iii) normal prior to PVI that became fractionated (9%); and (iv) normal that remained normal (21%). Individually, the attributes failed to distinguish these LA regions, but multivariate statistical models were effective in their discrimination (P < 0.0001). Conclusion: Our results have unveiled that there are LA regions resistant to PVI, while others are affected by it. Although traditional methods were unable to identify these different regions, the proposed multivariate statistical model discriminated LA regions resistant to PVI from those affected by it without prior ablation information.
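    The LDA part of such a model can be illustrated with Fisher's discriminant direction w = Sw^-1 (mu1 - mu2), which finds the linear combination of attributes that best separates two classes even when no single attribute does. The 12-attribute data below are synthetic stand-ins for the AEG attributes:

```python
import numpy as np

rng = np.random.default_rng(4)

def fisher_lda_direction(A, B):
    """Fisher discriminant: w = Sw^-1 (mean(A) - mean(B)), the projection
    that best separates two classes of multivariate measurements."""
    Sw = np.cov(A, rowvar=False) + np.cov(B, rowvar=False)
    return np.linalg.solve(Sw, A.mean(axis=0) - B.mean(axis=0))

# two classes of synthetic 12-attribute "electrograms" that overlap heavily
# on every single attribute but separate once the attributes are combined
d = 12
X1 = rng.normal(0.0, 1.0, (300, d))
X2 = rng.normal(0.0, 1.0, (300, d)) + 0.4       # small shift on each attribute
w = fisher_lda_direction(X2, X1)
p1, p2 = X1 @ w, X2 @ w

# standardized separation of the 1-D projections vs. the best single attribute
proj_gap = (p2.mean() - p1.mean()) / np.sqrt(0.5 * (p1.var() + p2.var()))
best_single = max(abs(X2[:, j].mean() - X1[:, j].mean()) /
                  np.sqrt(0.5 * (X1[:, j].var() + X2[:, j].var()))
                  for j in range(d))
```

    The projection separates the classes far better than any individual attribute, mirroring the abstract's finding that the attributes discriminate LA regions only when combined in a multivariate model.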

  2. Effects of poverty and lack of insurance on perceptions of racial and ethnic bias in health care.

    PubMed

    Stepanikova, Irena; Cook, Karen S

    2008-06-01

    To investigate whether poverty and lack of insurance are associated with perceived racial and ethnic bias in health care. 2001 Survey on Disparities in Quality of Health Care, a nationally representative telephone survey. We use data on black, Hispanic, and white adults who have a regular physician (N=4,556). We estimate multivariate logistic regression models to examine the effects of poverty and lack of health insurance on perceived racial and ethnic bias in health care for all respondents and by racial, ethnic, and language groups. Controlling for sociodemographic and other factors, uninsured blacks and Hispanics interviewed in English are more likely to report racial and ethnic bias in health care compared with their privately insured counterparts. Poor whites are more likely to report racial and ethnic bias in health care compared with other whites. Good physician-patient communication is negatively associated with perceived racial and ethnic bias. Compared with their more socioeconomically advantaged counterparts, poor whites, uninsured blacks, and some uninsured Hispanics are more likely to perceive that racial and ethnic bias operates in the health care they receive. Providing health insurance for the uninsured may help reduce this perceived bias among some minority groups.

  3. Application of multivariate statistical techniques in microbial ecology

    PubMed Central

    Paliy, O.; Shankar, V.

    2016-01-01

    Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological datasets. An especially noticeable effect has been attained in the field of microbial ecology, where new experimental approaches have provided in-depth assessments of the composition, functions, and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces large amounts of data, powerful statistical techniques of multivariate analysis are well suited to analyze and interpret these datasets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular dataset. In this review we describe and compare the most widely used multivariate statistical techniques, including exploratory, interpretive, and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and dataset structure. PMID:26786791
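    As a concrete example of the exploratory procedures reviewed above, principal component analysis can be written in a few lines of NumPy; the mock abundance table below is simulated (a hypothetical stand-in for an OTU table), with a single dominant community gradient:

```python
import numpy as np

rng = np.random.default_rng(5)

def pca(X, k):
    """Principal component analysis via SVD of the centred data matrix;
    returns the first k sample scores and the explained-variance ratios."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / (len(X) - 1)
    return Xc @ Vt[:k].T, var / var.sum()

# mock community data: 50 samples x 20 taxa driven by one latent gradient
gradient = rng.normal(size=(50, 1))
loadings = rng.normal(size=(1, 20))
X = gradient @ loadings + 0.3 * rng.normal(size=(50, 20))
scores, explained = pca(X, 2)
```

    When one environmental gradient dominates the community, the first principal component captures most of the variance, which is the typical ordination result such reviews describe.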

  4. Multivariate analysis in thoracic research.

    PubMed

    Mengual-Macenlle, Noemí; Marcos, Pedro J; Golpe, Rafael; González-Rivas, Diego

    2015-03-01

    Multivariate analysis is based in observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest. The development of multivariate methods emerged to analyze large databases and increasingly complex data. Since the best way to represent the knowledge of reality is the modeling, we should use multivariate statistical methods. Multivariate methods are designed to simultaneously analyze data sets, i.e., the analysis of different variables for each person or object studied. Keep in mind at all times that all variables must be treated accurately reflect the reality of the problem addressed. There are different types of multivariate analysis and each one should be employed according to the type of variables to analyze: dependent, interdependence and structural methods. In conclusion, multivariate methods are ideal for the analysis of large data sets and to find the cause and effect relationships between variables; there is a wide range of analysis types that we can use.

  5. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments

    PubMed Central

    Avalappampatty Sivasamy, Aneetha; Sundan, Bose

    2015-01-01

    The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T2 method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T2 statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better. PMID:26357668
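    The core of the approach, the T² distance of an observed traffic profile from a baseline profile, can be sketched as follows. The feature values, dimensions, and the mean-plus-three-sigma threshold rule are illustrative assumptions (the paper derives its threshold range via the central limit theorem and evaluates on KDD Cup'99):

```python
import numpy as np

rng = np.random.default_rng(6)

def hotelling_t2(X_train, x):
    """T^2 distance of observation x from the baseline profile:
    T^2 = (x - xbar)' S^-1 (x - xbar)."""
    xbar = X_train.mean(axis=0)
    S = np.cov(X_train, rowvar=False)
    d = x - xbar
    return float(d @ np.linalg.solve(S, d))

# baseline: 4 hypothetical "traffic features" under normal conditions
normal = rng.multivariate_normal([10.0, 5.0, 2.0, 1.0],
                                 np.diag([4.0, 1.0, 0.5, 0.1]), size=500)

# threshold from the empirical T^2 scores of the baseline itself
scores = np.array([hotelling_t2(normal, row) for row in normal])
threshold = scores.mean() + 3.0 * scores.std()

attack = np.array([25.0, 12.0, 6.0, 3.0])   # a clearly anomalous profile
t2_attack = hotelling_t2(normal, attack)
```

    A profile whose T² score exceeds the threshold is flagged as an attack; because T² accounts for the covariance of the features, it can flag combinations of values that look unremarkable feature-by-feature.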

  6. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments.

    PubMed

    Sivasamy, Aneetha Avalappampatty; Sundan, Bose

    2015-01-01

    The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T2 method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T2 statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better.

  7. Multivariate analysis: A statistical approach for computations

    NASA Astrophysics Data System (ADS)

    Michu, Sachin; Kaushik, Vandana

    2014-10-01

    Multivariate analysis is a statistical approach commonly used in automotive diagnosis, educational evaluation, clustering in finance, and more recently in the health-related professions. The objective of the paper is to provide a detailed exploratory discussion of factor analysis (FA) in image retrieval and correlation analysis (CA) of network traffic. Image retrieval methods aim to retrieve relevant images from a collected database based on their content. The problem is made more difficult by the high dimension of the variable space in which the images are represented. Multivariate correlation analysis provides an anomaly detection and analysis method based on the correlation coefficient matrix. Anomalous behaviors in the network include attacks such as DDoS attacks and network scanning.
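    The correlation-matrix idea can be illustrated with a minimal sketch. The traffic features are synthetic and the Frobenius-norm deviation score is an assumption of this sketch; the paper's actual scoring rule is not specified in the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)

def corr_distance(window, baseline_corr):
    """Frobenius distance between a window's correlation matrix and a baseline."""
    return np.linalg.norm(np.corrcoef(window, rowvar=False) - baseline_corr)

# Baseline traffic: 3 weakly related features.
baseline = rng.normal(size=(1000, 3))
baseline_corr = np.corrcoef(baseline, rowvar=False)

# Normal window: same generating process as the baseline.
normal_window = rng.normal(size=(200, 3))

# Attack-like window: features become strongly correlated (e.g. a flood
# driving packet count, byte count and connection rate together).
common = rng.normal(size=200)
attack_window = np.column_stack([common, common * 1.1, common * 0.9]) \
    + 0.05 * rng.normal(size=(200, 3))

print(corr_distance(normal_window, baseline_corr))
print(corr_distance(attack_window, baseline_corr))
```

    The attack-like window, whose correlation structure departs sharply from the baseline, scores far higher than the normal window.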

  8. Addressing the mischaracterization of extreme rainfall in regional climate model simulations - A synoptic pattern based bias correction approach

    NASA Astrophysics Data System (ADS)

    Li, Jingwan; Sharma, Ashish; Evans, Jason; Johnson, Fiona

    2018-01-01

    Addressing systematic biases in regional climate model simulations of extreme rainfall is a necessary first step before assessing changes in future rainfall extremes. Commonly used bias correction methods are designed to match statistics of the overall simulated rainfall with observations. This assumes that any change in the mix of different types of extreme rainfall events (i.e. convective and non-convective) in a warmer climate is of little relevance to the estimation of overall change, an assumption that is not supported by empirical or physical evidence. This study proposes an alternative approach that accounts for the potential change of alternate rainfall types, characterized here by synoptic weather patterns (SPs) using self-organizing maps classification. The objective of this study is to evaluate the added influence of SPs on the bias correction, which is achieved by comparing the corrected distribution of future extreme rainfall with that obtained using conventional quantile mapping. A comprehensive synthetic experiment is first defined to investigate the conditions under which the additional information of SPs makes a significant difference to the bias correction. Using over 600,000 synthetic cases, statistically significant differences are found in 46% of cases. This is followed by a case study over the Sydney region using a high-resolution run of the Weather Research and Forecasting (WRF) regional climate model, which indicates a small change in the proportions of the SPs and a statistically significant change in the extreme rainfall over the region, although the differences between the changes obtained from the two bias correction methods are not statistically significant.
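    Empirical quantile mapping, the baseline this study builds on, and its pattern-conditioned variant can be sketched as follows. The two synoptic patterns, their rainfall distributions and the model biases are synthetic assumptions of this sketch, not the study's data.

```python
import numpy as np

def fit_quantile_map(model, obs):
    """Return an empirical quantile-mapping function model -> obs."""
    mq, oq = np.sort(model), np.sort(obs)
    pm = np.linspace(0.0, 1.0, mq.size)
    po = np.linspace(0.0, 1.0, oq.size)
    def correct(x):
        q = np.interp(x, mq, pm)     # empirical quantile in the model climate
        return np.interp(q, po, oq)  # value at the same quantile in observations
    return correct

rng = np.random.default_rng(7)

# Two hypothetical synoptic patterns with different rainfall regimes; the
# model overestimates pattern A and underestimates pattern B.
obs_a, obs_b = rng.gamma(2.0, 3.0, 5000), rng.gamma(2.0, 8.0, 5000)
mod_a, mod_b = obs_a * 1.4, obs_b * 0.7

# Conditioning the mapping on the pattern, as the paper proposes, rather
# than pooling all days into one transfer function:
qm_a = fit_quantile_map(mod_a, obs_a)
qm_b = fit_quantile_map(mod_b, obs_b)

print(abs(qm_a(mod_a).mean() - obs_a.mean()))  # small after correction
```

    Fitting one transfer function per pattern lets each rainfall type keep its own correction, which matters when the mix of patterns changes in a future climate.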

  9. Explicating the Conditions Under Which Multilevel Multiple Imputation Mitigates Bias Resulting from Random Coefficient-Dependent Missing Longitudinal Data.

    PubMed

    Gottfredson, Nisha C; Sterba, Sonya K; Jackson, Kristina M

    2017-01-01

    Random coefficient-dependent (RCD) missingness is a non-ignorable mechanism through which missing data can arise in longitudinal designs. RCD, for which we cannot test, is a problematic form of missingness that occurs if subject-specific random effects correlate with propensity for missingness or dropout. Particularly when covariate missingness is a problem, investigators typically handle missing longitudinal data by using single-level multiple imputation procedures implemented with long-format data, which ignores within-person dependency entirely, or implemented with wide-format (i.e., multivariate) data, which ignores some aspects of within-person dependency. When either of these standard approaches to handling missing longitudinal data is used, RCD missingness leads to parameter bias and incorrect inference. We explain why multilevel multiple imputation (MMI) should alleviate bias induced by a RCD missing data mechanism under conditions that contribute to stronger determinacy of random coefficients. We evaluate our hypothesis with a simulation study. Three design factors are considered: intraclass correlation (ICC; ranging from .25 to .75), number of waves (ranging from 4 to 8), and percent of missing data (ranging from 20 to 50%). We find that MMI greatly outperforms the single-level wide-format (multivariate) method for imputation under a RCD mechanism. For the MMI analyses, bias was most alleviated when the ICC was high, when there were more waves of data, and when there was less missing data. Practical recommendations for handling longitudinal missing data are suggested.

  10. Selecting climate simulations for impact studies based on multivariate patterns of climate change.

    PubMed

    Mendlik, Thomas; Gobiet, Andreas

    In climate change impact research it is crucial to carefully select the meteorological input for impact models. We present a method for model selection that enables the user to shrink the ensemble to a few representative members, conserving the model spread and accounting for model similarity. This is done in three steps: First, using principal component analysis for a multitude of meteorological parameters, to find common patterns of climate change within the multi-model ensemble. Second, detecting model similarities with regard to these multivariate patterns using cluster analysis. And third, sampling models from each cluster, to generate a subset of representative simulations. We present an application based on the ENSEMBLES regional multi-model ensemble with the aim to provide input for a variety of climate impact studies. We find that the two most dominant patterns of climate change relate to temperature and humidity patterns. The ensemble can be reduced from 25 to 5 simulations while still maintaining its essential characteristics. Having such a representative subset of simulations reduces computational costs for climate impact modeling and enhances the quality of the ensemble at the same time, as it prevents double-counting of dependent simulations that would lead to biased statistics. The online version of this article (doi:10.1007/s10584-015-1582-0) contains supplementary material, which is available to authorized users.
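    The three steps can be sketched with synthetic data. The ensemble values, the choice of two leading components and five clusters, and the nearest-to-centroid sampling rule are illustrative assumptions, not the ENSEMBLES analysis itself.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(3)

# Hypothetical ensemble: 25 simulations x 10 climate-change signals
# (e.g. seasonal temperature and humidity changes).
ensemble = rng.normal(size=(25, 10))

# Step 1: principal components of the multivariate change patterns.
centred = ensemble - ensemble.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
scores = centred @ vt[:2].T  # keep the two leading patterns

# Step 2: cluster similar simulations in the reduced space.
centroids, _ = kmeans2(scores, 5, seed=3, minit='++')

# Step 3: pick the member closest to each cluster centroid as the
# representative of that cluster.
subset = [int(np.argmin(((scores - c) ** 2).sum(axis=1)))
          for c in centroids]
print(sorted(set(subset)))
```

    Taking one member per cluster shrinks the ensemble while spanning its dominant patterns of change, and it avoids double-counting near-duplicate simulations that would bias ensemble statistics.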

  11. Long working hours and coronary heart disease: a systematic review and meta-analysis.

    PubMed

    Virtanen, Marianna; Heikkilä, Katriina; Jokela, Markus; Ferrie, Jane E; Batty, G David; Vahtera, Jussi; Kivimäki, Mika

    2012-10-01

    The authors aggregated the results of observational studies examining the association between long working hours and coronary heart disease (CHD). Data sources used were MEDLINE (through January 19, 2011) and Web of Science (through March 14, 2011). Two investigators independently extracted results from eligible studies. Heterogeneity between the studies was assessed using the I(2) statistic, and the possibility of publication bias was assessed using the funnel plot and Egger's test for small-study effects. Twelve studies were identified (7 case-control, 4 prospective, and 1 cross-sectional). For a total of 22,518 participants (2,313 CHD cases), the minimally adjusted relative risk of CHD for long working hours was 1.80 (95% confidence interval (CI): 1.42, 2.29), and in the maximally (multivariate-) adjusted analysis the relative risk was 1.59 (95% CI: 1.23, 2.07). The 4 prospective studies produced a relative risk of 1.39 (95% CI: 1.12, 1.72), while the corresponding relative risk in the 7 case-control studies was 2.43 (95% CI: 1.81, 3.26). Little evidence of publication bias but relatively large heterogeneity was observed. Studies varied in size, design, measurement of exposure and outcome, and adjustments. In conclusion, results from prospective observational studies suggest an approximately 40% excess risk of CHD in employees working long hours.
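    The pooling and heterogeneity statistics named above (inverse-variance pooling, and the I^2 statistic derived from Cochran's Q) can be computed as in the following sketch. The study-level numbers are hypothetical, not those of the twelve reviewed studies.

```python
import math

# Per-study log relative risks and standard errors (hypothetical values).
log_rr = [0.59, 0.33, 0.89, 0.18, 0.47]
se     = [0.25, 0.20, 0.35, 0.15, 0.30]

# Fixed-effect (inverse-variance) pooling.
w = [1.0 / s**2 for s in se]
pooled = sum(wi * y for wi, y in zip(w, log_rr)) / sum(w)

# Cochran's Q and the I^2 heterogeneity statistic.
q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, log_rr))
df = len(log_rr) - 1
i_squared = max(0.0, (q - df) / q)

print(f"pooled RR = {math.exp(pooled):.2f}, I^2 = {100 * i_squared:.0f}%")
```

    I^2 expresses the share of between-study variability in excess of what sampling error alone would produce; values near 0 indicate homogeneous studies.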

  12. Long Working Hours and Coronary Heart Disease: A Systematic Review and Meta-Analysis

    PubMed Central

    Virtanen, Marianna; Heikkilä, Katriina; Jokela, Markus; Ferrie, Jane E.; Batty, G. David; Vahtera, Jussi; Kivimäki, Mika

    2012-01-01

    The authors aggregated the results of observational studies examining the association between long working hours and coronary heart disease (CHD). Data sources used were MEDLINE (through January 19, 2011) and Web of Science (through March 14, 2011). Two investigators independently extracted results from eligible studies. Heterogeneity between the studies was assessed using the I2 statistic, and the possibility of publication bias was assessed using the funnel plot and Egger's test for small-study effects. Twelve studies were identified (7 case-control, 4 prospective, and 1 cross-sectional). For a total of 22,518 participants (2,313 CHD cases), the minimally adjusted relative risk of CHD for long working hours was 1.80 (95% confidence interval (CI): 1.42, 2.29), and in the maximally (multivariate-) adjusted analysis the relative risk was 1.59 (95% CI: 1.23, 2.07). The 4 prospective studies produced a relative risk of 1.39 (95% CI: 1.12, 1.72), while the corresponding relative risk in the 7 case-control studies was 2.43 (95% CI: 1.81, 3.26). Little evidence of publication bias but relatively large heterogeneity was observed. Studies varied in size, design, measurement of exposure and outcome, and adjustments. In conclusion, results from prospective observational studies suggest an approximately 40% excess risk of CHD in employees working long hours. PMID:22952309

  13. Evaluating change in attitude towards mathematics using the 'then-now' procedure in a cooperative learning programme.

    PubMed

    Townsend, Michael; Wilton, Keri

    2003-12-01

    Tertiary students' attitudes to mathematics are frequently negative and resistant to change, reflecting low self-efficacy. Some educators believe that greater use should be made of small group, collaborative teaching. However, the results of such interventions should be subject to assessments of bias caused by a shift in the frame of reference used by students in reporting their attitudes. This study was designed to assess whether traditional pretest-post-test procedures would indicate positive changes in mathematics attitude during a programme of cooperative learning, and whether an examination of any attitudinal change using the 'then-now' procedure would indicate bias in the results due to a shift in the internal standards for expressing attitude. Participants were 141 undergraduate students enrolled in a 12-week statistics and research design component of a course in educational psychology. Using multivariate procedures, pretest, post-test, and then-test measures of mathematics self-concept and anxiety were examined in conjunction with a cooperative learning approach to teaching. Significant positive changes between pretest and post-test were found for both mathematics self-concept and mathematics anxiety. There were no significant differences between the actual pretest and retrospective pretest measures of attitude. The results were not moderated by prior level of mathematics study. Conclusions about the apparent effectiveness of a cooperative learning programme were strengthened by the use of the retrospective pretest procedure.

  14. Glyphosate epidemiology expert panel review: a weight of evidence systematic review of the relationship between glyphosate exposure and non-Hodgkin's lymphoma or multiple myeloma.

    PubMed

    Acquavella, John; Garabrant, David; Marsh, Gary; Sorahan, Tom; Weed, Douglas L

    2016-09-01

    We conducted a systematic review of the epidemiologic literature for glyphosate focusing on non-Hodgkin's lymphoma (NHL) and multiple myeloma (MM) - two cancers that were the focus of a recent review by an International Agency for Research on Cancer Working Group. Our approach was consistent with Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines for systematic reviews. We evaluated each relevant study according to a priori criteria for study quality: adequacy of study size, likelihood of confounding, potential for other biases and adequacy of the statistical analyses. Our evaluation included seven unique studies for NHL and four for MM, all but one of which were case-control studies for each cancer. For NHL, the case-control studies were all limited by the potential for recall bias and the lack of adequate multivariate adjustment for multiple pesticide and other farming exposures. Only the Agricultural Health (cohort) Study met our a priori quality standards and this study found no evidence of an association between glyphosate and NHL. For MM, the case-control studies shared the same limitations as noted for the NHL case-control studies and, in aggregate, the data were too sparse to enable an informed causal judgment. Overall, our review did not find support in the epidemiologic literature for a causal association between glyphosate and NHL or MM.

  15. Analytic Methods for Adjusting Subjective Rating Schemes.

    ERIC Educational Resources Information Center

    Cooper, Richard V. L.; Nelson, Gary R.

    Statistical and econometric techniques of correcting for supervisor bias in models of individual performance appraisal were developed, using a variant of the classical linear regression model. Location bias occurs when individual performance is systematically overestimated or underestimated, while scale bias results when raters either exaggerate…

  16. Can We Spin Straw Into Gold? An Evaluation of Immigrant Legal Status Imputation Approaches

    PubMed Central

    Van Hook, Jennifer; Bachmeier, James D.; Coffman, Donna; Harel, Ofer

    2014-01-01

    Researchers have developed logical, demographic, and statistical strategies for imputing immigrants’ legal status, but these methods have never been empirically assessed. We used Monte Carlo simulations to test whether, and under what conditions, legal status imputation approaches yield unbiased estimates of the association of unauthorized status with health insurance coverage. We tested five methods under a range of missing data scenarios. Logical and demographic imputation methods yielded biased estimates across all missing data scenarios. Statistical imputation approaches yielded unbiased estimates only when unauthorized status was jointly observed with insurance coverage; when this condition was not met, these methods overestimated insurance coverage for unauthorized relative to legal immigrants. We next showed how bias can be reduced by incorporating prior information about unauthorized immigrants. Finally, we demonstrated the utility of the best-performing statistical method for increasing power. We used it to produce state/regional estimates of insurance coverage among unauthorized immigrants in the Current Population Survey, a data source that contains no direct measures of immigrants’ legal status. We conclude that commonly employed legal status imputation approaches are likely to produce biased estimates, but data and statistical methods exist that could substantially reduce these biases. PMID:25511332

  17. MesoNAM Verification Phase II

    NASA Technical Reports Server (NTRS)

    Watson, Leela R.

    2011-01-01

    The 45th Weather Squadron Launch Weather Officers use the 12-km resolution North American Mesoscale model (MesoNAM) forecasts to support launch weather operations. In Phase I, the performance of the model at KSC/CCAFS was measured objectively by conducting a detailed statistical analysis of model output compared to observed values. The objective analysis compared the MesoNAM forecast winds, temperature, and dew point to the observed values from the sensors in the KSC/CCAFS wind tower network. In Phase II, the AMU modified the current tool by adding an additional 15 months of model output to the database and recalculating the verification statistics. The bias, standard deviation of bias, Root Mean Square Error, and hypothesis test for bias were calculated to verify the performance of the model. The results indicated that the accuracy decreased as the forecast progressed, there was a diurnal signal in temperature with a cool bias during the late night and a warm bias during the afternoon, and there was a diurnal signal in dew point temperature with a low bias during the afternoon and a high bias during the late night.
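    The verification statistics listed above (bias, standard deviation of bias, RMSE, and a hypothesis test for bias) can be sketched on a synthetic forecast/observation pair; the tower temperatures and the +0.8 degC warm bias are assumptions of this sketch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

# Hypothetical paired series: observed tower temperatures and model
# forecasts carrying a +0.8 degC warm bias plus random error.
obs = 25.0 + 2.0 * rng.normal(size=200)
fcst = obs + 0.8 + 0.5 * rng.normal(size=200)

err = fcst - obs
bias = err.mean()
bias_sd = err.std(ddof=1)
rmse = np.sqrt((err ** 2).mean())

# Hypothesis test for bias: is the mean error different from zero?
t_stat, p_value = stats.ttest_1samp(err, 0.0)

print(f"bias={bias:.2f} sd={bias_sd:.2f} rmse={rmse:.2f} p={p_value:.3g}")
```

    A small p-value indicates the mean error is statistically distinguishable from zero, i.e. a systematic warm or cool bias rather than random scatter.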

  18. Acoustic neuroma: potential risk factors and audiometric surveillance in the aluminium industry.

    PubMed

    Taiwo, Oyebode; Galusha, Deron; Tessier-Sherman, Baylah; Kirsche, Sharon; Cantley, Linda; Slade, Martin D; Cullen, Mark R; Donoghue, A Michael

    2014-09-01

    To look for an association between acoustic neuroma (AN) and participation in a hearing conservation programme (HCP) and also for an association between AN and possible occupational risk factors in the aluminium industry. We conducted a case-control analysis of a population of US aluminium production workers in 8 smelters and 43 other plants. Using insurance claims data, 97 cases of AN were identified between 1996 and 2009. Each was matched with four controls. Covariates included participation in a HCP, working in an aluminium smelter, working in an electrical job and hearing loss. In the bivariate analyses, covariates associated with AN were participation in the HCP (OR=1.72; 95% CI 1.09 to 2.69) and smelter work (OR=1.88; 95% CI 1.06 to 3.36). Electrical work was not significant (OR=1.60; 95% CI 0.65 to 3.94). Owing to high participation in the HCP in smelters, multivariate subanalyses were required. In the multivariate analyses, participation in the HCP was the only statistically significant risk factor for AN. In the multivariate analysis restricted to employees not working in a smelter, the OR was 1.81 (95% CI 1.04 to 3.17). Hearing loss, an indirect measure of in-ear noise dose, was not predictive of AN. Our results suggest the incidental detection of previously undiagnosed tumours in workers who participated in the company-sponsored HCP. The increased medical surveillance among this population of workers most likely introduced detection bias, leading to the identification of AN cases that would have otherwise remained undetected.

  19. Improved Correction of Misclassification Bias With Bootstrap Imputation.

    PubMed

    van Walraven, Carl

    2018-07-01

    Diagnostic codes used in administrative database research can create bias due to misclassification. Quantitative bias analysis (QBA) can correct for this bias and requires only code sensitivity and specificity, but it may return invalid results. Bootstrap imputation (BI) can also address misclassification bias but traditionally requires multivariate models to accurately estimate disease probability. This study compared misclassification bias correction using QBA and BI. Serum creatinine measures were used to determine severe renal failure status in 100,000 hospitalized patients. Prevalence of severe renal failure in 86 patient strata and its association with 43 covariates was determined and compared with results in which renal failure status was determined using diagnostic codes (sensitivity 71.3%, specificity 96.2%). Differences in results (misclassification bias) were then corrected with QBA or BI (using progressively more complex methods to estimate disease probability). In total, 7.4% of patients had severe renal failure. Imputing disease status with diagnostic codes exaggerated prevalence estimates [median relative change (range), 16.6% (0.8%-74.5%)] and its association with covariates [median (range) exponentiated absolute parameter estimate difference, 1.16 (1.01-2.04)]. QBA produced invalid results 9.3% of the time and increased bias in estimates of both disease prevalence and covariate associations. BI decreased misclassification bias with increasingly accurate disease probability estimates. QBA can produce invalid results and increase misclassification bias. BI avoids invalid results and can importantly decrease misclassification bias when accurate disease probability estimates are used.
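    The standard QBA correction for misclassified prevalence can be illustrated with the code accuracy quoted above (sensitivity 71.3%, specificity 96.2%). This is the textbook prevalence formula, a sketch rather than the study's full covariate-level procedure.

```python
sens, spec = 0.713, 0.962  # code accuracy quoted in the abstract
p_true = 0.074             # true prevalence of severe renal failure

# What the diagnostic codes would show under this misclassification:
p_obs = p_true * sens + (1 - p_true) * (1 - spec)

# Standard quantitative-bias-analysis correction for prevalence;
# it is invalid (negative) whenever p_obs < 1 - spec.
p_corrected = (p_obs - (1 - spec)) / (sens + spec - 1)

print(round(p_obs, 4), round(p_corrected, 4))
```

    Because the observed prevalence here is constructed directly from the true one, the correction recovers it exactly; in real strata, sampling error in p_obs can push the formula below zero, the "invalid results" the study reports.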

  20. Deconstructing multivariate decoding for the study of brain function.

    PubMed

    Hebart, Martin N; Baker, Chris I

    2017-08-04

    Multivariate decoding methods were developed originally as tools to enable accurate predictions in real-world applications. The realization that these methods can also be employed to study brain function has led to their widespread adoption in the neurosciences. However, prior to the rise of multivariate decoding, the study of brain function was firmly embedded in a statistical philosophy grounded on univariate methods of data analysis. In this way, multivariate decoding for brain interpretation grew out of two established frameworks: multivariate decoding for predictions in real-world applications, and classical univariate analysis based on the study and interpretation of brain activation. We argue that this led to two confusions, one reflecting a mixture of multivariate decoding for prediction or interpretation, and the other a mixture of the conceptual and statistical philosophies underlying multivariate decoding and classical univariate analysis. Here we attempt to systematically disambiguate multivariate decoding for the study of brain function from the frameworks it grew out of. After elaborating these confusions and their consequences, we describe six, often unappreciated, differences between classical univariate analysis and multivariate decoding. We then focus on how the common interpretation of what is signal and noise changes in multivariate decoding. Finally, we use four examples to illustrate where these confusions may impact the interpretation of neuroimaging data. We conclude with a discussion of potential strategies to help resolve these confusions in interpreting multivariate decoding results, including the potential departure from multivariate decoding methods for the study of brain function.

  1. NSAIDs and spontaneous abortions – true effect or an indication bias?

    PubMed Central

    Daniel, Sharon; Koren, Gideon; Lunenfeld, Eitan; Levy, Amalia

    2015-01-01

    Aim The aim of the study was to characterize the extent of indication bias resulting from the excessive use of NSAIDs on the days preceding a spontaneous abortion to relieve pain. Methods We used data from a retrospective cohort study assessing the risk for spontaneous abortions following exposure to NSAIDs. Three definitions of exposure for cases of spontaneous abortions were compared: from the first day of pregnancy until the day of spontaneous abortion, and until 3 and 2 days before a spontaneous abortion. Statistical analysis was performed using multivariate time programmed Cox regression. Results A sharp increase was observed in the dispensation of indomethacin, diclofenac and naproxen, and a milder increase was found in the use of ibuprofen during the week before a spontaneous abortion. Non-selective COX inhibitors in general, and specifically diclofenac and indomethacin, were found to be associated with spontaneous abortions when the exposure period was defined until the day of spontaneous abortion (hazard ratio (HR) 1.15, 95% confidence interval (CI) 1.04, 1.28; HR 1.31, 95% CI 1.08, 1.59 and HR 3.33, 95% CI 2.09, 5.29, respectively). The effect disappears when exposures occurring on the day before the spontaneous abortion are excluded for non-selective COX inhibitors, and exposures in the last week before the spontaneous abortion for indomethacin. In general, decreasing HRs were found with the exclusion of exposures occurring on the days immediately before the spontaneous abortion. Conclusions The increased use of NSAIDs during the last few days preceding a spontaneous abortion, to relieve pain associated with the miscarriage, could bias studies assessing the association between exposure to NSAIDs and spontaneous abortions. PMID:25858169

  2. A Quantile Mapping Bias Correction Method Based on Hydroclimatic Classification of the Guiana Shield

    PubMed Central

    Ringard, Justine; Seyler, Frederique; Linguet, Laurent

    2017-01-01

    Satellite precipitation products (SPPs) provide alternative precipitation data for regions with sparse rain gauge measurements. However, SPPs are subject to different types of error that need correction. Most SPP bias correction methods use the statistical properties of the rain gauge data to adjust the corresponding SPP data. The statistical adjustment does not make it possible to correct the pixels of SPP data for which there is no rain gauge data. The solution proposed in this article is to correct the daily SPP data for the Guiana Shield using a novel two-step approach, without taking into account the daily gauge data of the pixel to be corrected, but the daily gauge data from surrounding pixels. In this case, a spatial analysis must be involved. The first step defines hydroclimatic areas using a spatial classification that considers precipitation data with the same temporal distributions. The second step uses the Quantile Mapping bias correction method to correct the daily SPP data contained within each hydroclimatic area. We validate the results by comparing the corrected SPP data and daily rain gauge measurements using relative RMSE and relative bias statistical errors. The results show that analysis scale variation reduces rBIAS and rRMSE significantly. The spatial classification avoids mixing rainfall data with different temporal characteristics in each hydroclimatic area, and the defined bias correction parameters are more realistic and appropriate. This study demonstrates that hydroclimatic classification is relevant for implementing bias correction methods at the local scale. PMID:28621723
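    The validation metrics used here, relative bias (rBIAS) and relative RMSE (rRMSE), can be sketched on synthetic gauge and satellite series. The 1.3x overestimation and the simple rescaling are assumptions standing in for an actual Quantile Mapping correction.

```python
import numpy as np

def r_bias(sim, obs):
    """Relative bias: total simulated minus observed, as a fraction of observed."""
    return (sim.sum() - obs.sum()) / obs.sum()

def r_rmse(sim, obs):
    """RMSE of daily values relative to the mean observed value."""
    return np.sqrt(((sim - obs) ** 2).mean()) / obs.mean()

rng = np.random.default_rng(5)
gauge = rng.gamma(2.0, 5.0, 365)               # hypothetical daily gauge rainfall
raw_spp = gauge * 1.3 + rng.normal(0, 1, 365)  # overestimating satellite product
corrected = raw_spp / 1.3                      # stand-in for the QM correction

print(r_bias(raw_spp, gauge), r_bias(corrected, gauge))
```

    Both metrics are dimensionless, which is what lets the study compare correction skill across hydroclimatic areas with different rainfall totals.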

  3. A Quantile Mapping Bias Correction Method Based on Hydroclimatic Classification of the Guiana Shield.

    PubMed

    Ringard, Justine; Seyler, Frederique; Linguet, Laurent

    2017-06-16

    Satellite precipitation products (SPPs) provide alternative precipitation data for regions with sparse rain gauge measurements. However, SPPs are subject to different types of error that need correction. Most SPP bias correction methods use the statistical properties of the rain gauge data to adjust the corresponding SPP data. The statistical adjustment does not make it possible to correct the pixels of SPP data for which there is no rain gauge data. The solution proposed in this article is to correct the daily SPP data for the Guiana Shield using a novel two-step approach, without taking into account the daily gauge data of the pixel to be corrected, but the daily gauge data from surrounding pixels. In this case, a spatial analysis must be involved. The first step defines hydroclimatic areas using a spatial classification that considers precipitation data with the same temporal distributions. The second step uses the Quantile Mapping bias correction method to correct the daily SPP data contained within each hydroclimatic area. We validate the results by comparing the corrected SPP data and daily rain gauge measurements using relative RMSE and relative bias statistical errors. The results show that analysis scale variation reduces rBIAS and rRMSE significantly. The spatial classification avoids mixing rainfall data with different temporal characteristics in each hydroclimatic area, and the defined bias correction parameters are more realistic and appropriate. This study demonstrates that hydroclimatic classification is relevant for implementing bias correction methods at the local scale.

  4. OSSOS: X. How to use a Survey Simulator: Statistical Testing of Dynamical Models Against the Real Kuiper Belt

    NASA Astrophysics Data System (ADS)

    Lawler, Samantha M.; Kavelaars, J. J.; Alexandersen, Mike; Bannister, Michele T.; Gladman, Brett; Petit, Jean-Marc; Shankman, Cory

    2018-05-01

    All surveys include observational biases, which makes it impossible to directly compare properties of discovered trans-Neptunian Objects (TNOs) with dynamical models. However, by carefully keeping track of survey pointings on the sky, detection limits, tracking fractions, and rate cuts, the biases from a survey can be modelled in Survey Simulator software. A Survey Simulator takes an intrinsic orbital model (from, for example, the output of a dynamical Kuiper belt emplacement simulation) and applies the survey biases, so that the biased simulated objects can be directly compared with real discoveries. This methodology has been used with great success in the Outer Solar System Origins Survey (OSSOS) and its predecessor surveys. In this chapter, we give four examples of ways to use the OSSOS Survey Simulator to gain knowledge about the true structure of the Kuiper Belt. We demonstrate how to statistically compare different dynamical model outputs with real TNO discoveries, how to quantify detection biases within a TNO population, how to measure intrinsic population sizes, and how to use upper limits from non-detections. We hope this will provide a framework for dynamical modellers to statistically test the validity of their models.
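    The forward-modelling logic of a survey simulator, biasing the intrinsic model rather than trying to debias the detections, can be sketched as follows. The magnitude distribution and the logistic detection efficiency are toy assumptions, not the OSSOS survey characterization.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)

# Intrinsic model: hypothetical TNO apparent magnitudes from a dynamical model.
intrinsic = rng.normal(23.5, 1.0, 100_000)

def detection_probability(mag, limit=24.5, width=0.3):
    """Toy survey detection efficiency: smooth falloff at the limiting magnitude."""
    return 1.0 / (1.0 + np.exp((mag - limit) / width))

# Apply the survey biases to the model output.
detected_model = intrinsic[rng.random(intrinsic.size)
                           < detection_probability(intrinsic)]

# Hypothetical real discoveries, here drawn from the same biased process.
real = rng.normal(23.5, 1.0, 300)
real = real[rng.random(real.size) < detection_probability(real)]

# Statistical comparison of the biased model against the discoveries
# (a two-sample KS test as one possible choice).
stat, p = stats.ks_2samp(detected_model, real)
print(f"KS statistic={stat:.3f}, p={p:.3f}")
```

    Because the same detection biases are applied to both, the biased model and the discoveries are directly comparable; a dynamical model whose biased output fails such a test can be rejected.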

  5. Orientation decoding: Sense in spirals?

    PubMed

    Clifford, Colin W G; Mannion, Damien J

    2015-04-15

    The orientation of a visual stimulus can be successfully decoded from the multivariate pattern of fMRI activity in human visual cortex. Whether this capacity requires coarse-scale orientation biases is controversial. We and others have advocated the use of spiral stimuli to eliminate a potential coarse-scale bias (the radial bias toward local orientations that are collinear with the centre of gaze) and hence narrow down the potential coarse-scale biases that could contribute to orientation decoding. The usefulness of this strategy is challenged by the computational simulations of Carlson (2014), who reported the ability to successfully decode spirals of opposite sense (opening clockwise or counter-clockwise) from the pooled output of purportedly unbiased orientation filters. Here, we elaborate the mathematical relationship between spirals of opposite sense to confirm that they cannot be discriminated on the basis of the pooled output of unbiased or radially biased orientation filters. We then demonstrate that Carlson's (2014) reported decoding ability is consistent with the presence of inadvertent biases in the set of orientation filters; biases introduced by their digital implementation and unrelated to the brain's processing of orientation. These analyses demonstrate that spirals must be processed with an orientation bias other than the radial bias for successful decoding of spiral sense.

  6. Interior renovation of a general practitioner office leads to a perceptual bias on patient experience for over one year.

    PubMed

    Gauthey, Jérôme; Tièche, Raphaël; Streit, Sven

    2018-01-01

    Measuring patient experience is key when assessing quality of care but can be biased: a perceptual bias occurs when renovation of the interior design of a general practitioner (GP) office improves how patients assess quality of care. The aim was to assess the duration of this perceptual bias and whether it could be reproduced after a second renovation. A GP office with 2 GPs in Switzerland was renovated twice within 3 years. We assessed patient experience at baseline, 2 months and 14 months after the first renovation, and 3 months after the second renovation. Each time, we invited a sample of 180 consecutive patients who anonymously graded patient experience in 4 domains: appearance of the office; qualities of the medical assistants and GPs; and general satisfaction. We compared crude mean scores per domain from baseline until follow-up. In a multivariate model, we adjusted for patients' age, gender and for how long patients had been with their GP. At baseline, patients were aged 60.9 (17.7) years and 52% were female. After the first renovation, we found a regression to the baseline level of patient experience after 14 months, except for appearance of the office (p<0.001). After the second renovation, patient experience improved again in appearance of the office (p = 0.008), qualities of the GP (p = 0.008), and general satisfaction (p = 0.014). Qualities of the medical assistant showed a slight improvement (p = 0.068). Results were unchanged in the multivariate model. Interior renovation of a GP office probably causes a perceptual bias for more than 1 year that improves how patients rate quality of care. This bias could be reproduced after a second renovation, strengthening a possible causal relationship. These findings imply that measurement of patient experience should be timed at least one year after interior renovation of GP practices, to avoid environmental changes influencing the estimates.

  7. Interior renovation of a general practitioner office leads to a perceptual bias on patient experience for over one year

    PubMed Central

    2018-01-01

    Introduction Measuring patient experience is key when assessing quality of care but can be biased: a perceptual bias occurs when renovation of the interior design of a general practitioner (GP) office improves how patients assess quality of care. The aim was to assess how long this perceptual bias lasts and whether it could be reproduced after a second renovation. Methods A GP office with 2 GPs in Switzerland was renovated twice within 3 years. We assessed patient experience at baseline, 2 months and 14 months after the first renovation, and 3 months after the second. Each time, we invited a sample of 180 consecutive patients who anonymously graded patient experience in 4 domains: appearance of the office; qualities of the medical assistants; qualities of the GPs; and general satisfaction. We compared crude mean scores per domain from baseline until follow-up. In a multivariate model, we adjusted for patients' age and gender and for how long patients had been with their GP. Results At baseline, mean patient age was 60.9 (SD 17.7) years and 52% of patients were female. After the first renovation, patient experience had regressed to the baseline level after 14 months in all domains except appearance of the office (p<0.001). After the second renovation, patient experience again improved in appearance of the office (p = 0.008), qualities of the GP (p = 0.008), and general satisfaction (p = 0.014); qualities of the medical assistants showed a slight improvement (p = 0.068). Results were unchanged in the multivariate model. Conclusions Interior renovation of a GP office probably causes a perceptual bias for >1 year that improves how patients rate quality of care. This bias could be reproduced after a second renovation, strengthening the case for a causal relationship. These findings imply that measurement of patient experience should be timed at least one year after interior renovation of a GP practice, so that environmental changes do not influence the estimates. PMID:29462196

  8. Multivariate meta-analysis: a robust approach based on the theory of U-statistic.

    PubMed

    Ma, Yan; Mazumdar, Madhu

    2011-10-30

    Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously, taking into account the correlation between them. Likelihood-based approaches, in particular the restricted maximum likelihood (REML) method, are commonly utilized in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for meta-analyses with a small number of component studies. The use of REML also requires iterative estimation of parameters, which demands moderately high computation time, especially when the dimension of the outcomes is large. A multivariate method of moments (MMM) is available and has been shown to perform as well as REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normality. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis based on the theory of U-statistics and compare the properties of the three procedures under both normal and skewed data through simulation studies. It is shown that the effect of non-normal data distributions on estimates from REML is marginal and that the estimates from the MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. Easy implementation of all three methods is illustrated by their application to data from two published meta-analyses from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on U-statistics for testing the significance of between-study heterogeneity and for extending the work to the meta-regression setting. Copyright © 2011 John Wiley & Sons, Ltd.
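The method-of-moments idea that the abstract contrasts with REML can be illustrated in its simplest univariate form, the DerSimonian-Laird estimator. This is a generic sketch of moment-based random-effects pooling, not the paper's multivariate MMM or U-statistic procedure:

```python
def dersimonian_laird(effects, variances):
    """Pooled effect and tau^2 via the DerSimonian-Laird moment estimator."""
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures between-study heterogeneity
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    k = len(effects)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                    # moment estimate of tau^2
    # random-effects weights add the between-study variance to each study
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, tau2
```

Like the MMM described above, this is non-iterative: tau^2 comes from a single moment equation rather than from maximizing a normal likelihood.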

  9. Time Series Model Identification by Estimating Information.

    DTIC Science & Technology

    1982-11-01

    …principle, Applications of Statistics, P. R. Krishnaiah, ed., North-Holland: Amsterdam, 27-41. Anderson, T. W. (1971). The Statistical Analysis of Time Series. … Parzen, E. (1969). Multiple Time Series Modeling, Multivariate Analysis II, P. R. Krishnaiah, ed., Academic Press: New York, 389-409. Parzen, E. (1981). … Newton, H. J. (1980). Multiple Time Series Modeling II, Multivariate Analysis V, P. R. Krishnaiah, ed., North-Holland: Amsterdam, 181-197. Shibata, R

  10. A generalized K statistic for estimating phylogenetic signal from shape and other high-dimensional multivariate data.

    PubMed

    Adams, Dean C

    2014-09-01

    Phylogenetic signal is the tendency for closely related species to display similar trait values due to their common ancestry. Several methods have been developed for quantifying phylogenetic signal in univariate traits and for sets of traits treated simultaneously, and the statistical properties of these approaches have been extensively studied. However, methods for assessing phylogenetic signal in high-dimensional multivariate traits like shape are less well developed, and their statistical performance is not well characterized. In this article, I describe a generalization of the K statistic of Blomberg et al. that is useful for quantifying and evaluating phylogenetic signal in high-dimensional multivariate data. The method (K(mult)) is derived from the equivalence between statistical methods based on covariance matrices and those based on distance matrices. Using computer simulations based on Brownian motion, I demonstrate that the expected value of K(mult) remains at 1.0 as trait variation among species is increased or decreased, and as the number of trait dimensions is increased. By contrast, estimates of phylogenetic signal found with a squared-change parsimony procedure for multivariate data change with increasing trait variation among species and with increasing numbers of trait dimensions, confounding biological interpretations. I also evaluate the statistical performance of hypothesis testing procedures based on K(mult) and find that the method displays appropriate Type I error and high statistical power for detecting phylogenetic signal in high-dimensional data. Statistical properties of K(mult) were consistent for simulations using bifurcating and random phylogenies, for simulations using different numbers of species, for simulations that varied the number of trait dimensions, and for different underlying models of trait covariance structure.
Overall, these findings demonstrate that K(mult) provides a useful means of evaluating phylogenetic signal in high-dimensional multivariate traits. Finally, I illustrate the utility of the new approach by evaluating the strength of phylogenetic signal for head shape in a lineage of Plethodon salamanders. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
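The univariate K of Blomberg et al., which K(mult) generalizes, is the ratio MSE0/MSE divided by its Brownian-motion expectation (tr(C) − n/(1ᵀC⁻¹1))/(n − 1), where C is the phylogenetic covariance matrix. A minimal sketch of the univariate statistic (not K(mult) itself), assuming a trait vector `y` and covariance matrix `C`:

```python
import numpy as np

def blomberg_k(y, C):
    """Univariate Blomberg's K: observed MSE0/MSE over its expectation under
    Brownian motion on the phylogeny encoded by covariance matrix C."""
    n = len(y)
    Cinv = np.linalg.inv(C)
    one = np.ones(n)
    # phylogenetically weighted estimate of the root state
    a = (one @ Cinv @ y) / (one @ Cinv @ one)
    r = y - a
    mse0 = (r @ r) / (n - 1)            # ordinary mean squared error
    mse = (r @ Cinv @ r) / (n - 1)      # phylogenetically corrected MSE
    expected = (np.trace(C) - n / (one @ Cinv @ one)) / (n - 1)
    return (mse0 / mse) / expected
```

With a star phylogeny (C equal to the identity matrix) the observed and expected ratios coincide, so K = 1 regardless of the trait values, matching the calibration property described in the abstract.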

  11. Optical biopsy using fluorescence spectroscopy for prostate cancer diagnosis

    NASA Astrophysics Data System (ADS)

    Wu, Binlin; Gao, Xin; Smith, Jason; Bailin, Jacob

    2017-02-01

    Native fluorescence spectra are acquired from fresh normal and cancerous human prostate tissues. The fluorescence data are analyzed using a multivariate analysis algorithm such as non-negative matrix factorization. The non-negative spectral components are retrieved and attributed to the native fluorophores such as collagen, reduced nicotinamide adenine dinucleotide (NADH), and flavin adenine dinucleotide (FAD) in tissue. The retrieved weights of the components, e.g., NADH and FAD, are used to estimate the relative concentrations of the native fluorophores and the redox ratio. A machine learning algorithm such as a support vector machine (SVM) is used for classification to distinguish normal and cancerous tissue samples based on either the relative concentrations of NADH and FAD or the redox ratio alone. The classification performance is reported using statistical measures such as sensitivity, specificity, and accuracy, along with the area under the receiver operating characteristic (ROC) curve. A cross-validation method such as leave-one-out is used to evaluate the predictive performance of the SVM classifier and avoid bias due to overfitting.
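The role of leave-one-out cross-validation described above can be shown with a deliberately simplified stand-in: a toy nearest-mean rule on a single hypothetical "redox ratio" feature. This is not the study's NMF-plus-SVM pipeline; it only illustrates why each sample must be held out of training before it is scored:

```python
def loocv_accuracy(values, labels):
    """Leave-one-out accuracy of a nearest-class-mean rule on a 1-D feature."""
    correct = 0
    for i in range(len(values)):
        # hold out sample i, fit class means on the remaining samples
        train = [(v, l) for j, (v, l) in enumerate(zip(values, labels)) if j != i]
        means = {}
        for cls in set(l for _, l in train):
            cls_vals = [v for v, l in train if l == cls]
            means[cls] = sum(cls_vals) / len(cls_vals)
        # predict the class whose mean is nearest to the held-out value
        pred = min(means, key=lambda c: abs(values[i] - means[c]))
        correct += (pred == labels[i])
    return correct / len(values)
```

Because every prediction is made by a model that never saw the held-out sample, the resulting accuracy estimates generalization rather than training fit.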

  12. Synonymous codon choices in the extremely GC-poor genome of Plasmodium falciparum: compositional constraints and translational selection.

    PubMed

    Musto, H; Romero, H; Zavala, A; Jabbari, K; Bernardi, G

    1999-07-01

    We have analyzed the patterns of synonymous codon preferences of the nuclear genes of Plasmodium falciparum, a unicellular parasite characterized by an extremely GC-poor genome. When all genes are considered, codon usage is strongly biased toward A and T in third codon positions, as expected, but multivariate statistical analysis detects a major trend among genes. At one end, genes display codon choices determined mainly by the extreme genome composition of this parasite, and very probably their expression level is low. At the other end, a few genes exhibit an increased relative usage of a particular subset of codons, many of which are C-ending. Since the majority of these few genes are putatively highly expressed, we postulate that the increased C-ending codons are translationally optimal. In conclusion, while codon usage of the majority of P. falciparum genes is determined mainly by compositional constraints, a small number of genes exhibit translational selection.

  13. Normalization, bias correction, and peak calling for ChIP-seq

    PubMed Central

    Diaz, Aaron; Park, Kiyoub; Lim, Daniel A.; Song, Jun S.

    2012-01-01

    Next-generation sequencing is rapidly transforming our ability to profile the transcriptional, genetic, and epigenetic states of a cell. In particular, sequencing DNA from the immunoprecipitation of protein-DNA complexes (ChIP-seq) and methylated DNA (MeDIP-seq) can reveal the locations of protein binding sites and epigenetic modifications. These approaches are subject to numerous biases that may significantly influence the interpretation of the resulting data. Rigorous computational methods for detecting and removing such biases are still lacking. Multi-sample normalization also remains an important open problem. This theoretical paper systematically characterizes the biases and properties of ChIP-seq data by comparing 62 separate publicly available datasets, using rigorous statistical models and signal processing techniques. Statistical methods for separating ChIP-seq signal from background noise, as well as for correcting enrichment test statistics for sequence-dependent and sonication biases, are presented. Our method effectively separates reads into signal and background components prior to normalization, improving the signal-to-noise ratio. Moreover, most peak callers currently use a generic null model which suffers from low specificity at the sensitivity level requisite for detecting subtle, but true, ChIP enrichment. The proposed method of determining a cell type-specific null model, which accounts for cell type-specific biases, is shown to be capable of achieving a lower false discovery rate at a given significance threshold than current methods. PMID:22499706
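The "generic null model" that the abstract criticizes is often just a Poisson test on a window's read count against an assumed uniform background rate. A minimal sketch of that baseline (the rate `lam` is an assumed illustrative value, not a quantity from the paper, and the cell type-specific corrections the paper proposes are not reproduced here):

```python
import math

def poisson_pvalue(count, lam):
    """P(X >= count) under a Poisson(lam) background null for a window's
    read count; small p-values flag candidate enrichment peaks."""
    # sum the lower tail and subtract; adequate for the small counts used here
    cdf = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(count))
    return 1.0 - cdf
```

Because this null ignores sequence-dependent and sonication biases, windows with inflated background rates can appear spuriously significant, which is the specificity problem the cell type-specific null is designed to fix.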

  14. Language learning, language use and the evolution of linguistic variation

    PubMed Central

    Smith, Kenny; Perfors, Amy; Fehér, Olga; Samara, Anna; Swoboda, Kate; Wonnacott, Elizabeth

    2017-01-01

    Linguistic universals arise from the interaction between the processes of language learning and language use. A test case for the relationship between these factors is linguistic variation, which tends to be conditioned on linguistic or sociolinguistic criteria. How can we explain the scarcity of unpredictable variation in natural language, and to what extent is this property of language a straightforward reflection of biases in statistical learning? We review three strands of experimental work exploring these questions, and introduce a Bayesian model of the learning and transmission of linguistic variation along with a closely matched artificial language learning experiment with adult participants. Our results show that while the biases of language learners can potentially play a role in shaping linguistic systems, the relationship between biases of learners and the structure of languages is not straightforward. Weak biases can have strong effects on language structure as they accumulate over repeated transmission. But the opposite can also be true: strong biases can have weak or no effects. Furthermore, the use of language during interaction can reshape linguistic systems. Combining data and insights from studies of learning, transmission and use is therefore essential if we are to understand how biases in statistical learning interact with language transmission and language use to shape the structural properties of language. This article is part of the themed issue ‘New frontiers for statistical learning in the cognitive sciences’. PMID:27872370

  15. Language learning, language use and the evolution of linguistic variation.

    PubMed

    Smith, Kenny; Perfors, Amy; Fehér, Olga; Samara, Anna; Swoboda, Kate; Wonnacott, Elizabeth

    2017-01-05

    Linguistic universals arise from the interaction between the processes of language learning and language use. A test case for the relationship between these factors is linguistic variation, which tends to be conditioned on linguistic or sociolinguistic criteria. How can we explain the scarcity of unpredictable variation in natural language, and to what extent is this property of language a straightforward reflection of biases in statistical learning? We review three strands of experimental work exploring these questions, and introduce a Bayesian model of the learning and transmission of linguistic variation along with a closely matched artificial language learning experiment with adult participants. Our results show that while the biases of language learners can potentially play a role in shaping linguistic systems, the relationship between biases of learners and the structure of languages is not straightforward. Weak biases can have strong effects on language structure as they accumulate over repeated transmission. But the opposite can also be true: strong biases can have weak or no effects. Furthermore, the use of language during interaction can reshape linguistic systems. Combining data and insights from studies of learning, transmission and use is therefore essential if we are to understand how biases in statistical learning interact with language transmission and language use to shape the structural properties of language. This article is part of the themed issue 'New frontiers for statistical learning in the cognitive sciences'. © 2016 The Authors.

  16. The effect of berberine on insulin resistance in women with polycystic ovary syndrome: detailed statistical analysis plan (SAP) for a multicenter randomized controlled trial.

    PubMed

    Zhang, Ying; Sun, Jin; Zhang, Yun-Jiao; Chai, Qian-Yun; Zhang, Kang; Ma, Hong-Li; Wu, Xiao-Ke; Liu, Jian-Ping

    2016-10-21

    Although Traditional Chinese Medicine (TCM) has been widely used in clinical settings, a major challenge that remains in TCM is to evaluate its efficacy scientifically. This randomized controlled trial aims to evaluate the efficacy and safety of berberine in the treatment of patients with polycystic ovary syndrome. In order to improve the transparency and research quality of this clinical trial, we prepared this statistical analysis plan (SAP). The trial design, primary and secondary outcomes, and safety outcomes were declared in order to reduce selection bias in data analysis and result reporting. We specified detailed methods for data management and statistical analyses. Statistics in the corresponding tables, listings, and graphs were outlined. The SAP provides more detailed information than the trial protocol on data management and statistical analysis methods. Any post hoc analyses can be identified by referring to this SAP, and possible selection bias and performance bias will thereby be reduced in the trial. This study is registered at ClinicalTrials.gov, NCT01138930, registered on 7 June 2010.

  17. Exploring and accounting for publication bias in mental health: a brief overview of methods.

    PubMed

    Mavridis, Dimitris; Salanti, Georgia

    2014-02-01

    OBJECTIVE Publication bias undermines the integrity of published research. The aim of this paper is to present a synopsis of methods for exploring and accounting for publication bias. METHODS We discuss the main features of the following methods for assessing publication bias: funnel plot analysis, trim-and-fill methods, regression techniques, and selection models. We applied these methods to a well-known example comparing published antidepressant trials with those submitted to the Food and Drug Administration (FDA) for regulatory approval. RESULTS The funnel plot-related methods (visual inspection, trim-and-fill, regression models) revealed an association between effect size and standard error. Contours of statistical significance showed that the asymmetry in the funnel plot is probably due to publication bias. The selection model found a significant correlation between effect size and propensity for publication. CONCLUSIONS Researchers should always consider the possible impact of publication bias. Funnel plot-related methods should be seen as a means of examining small-study effects and should not be directly equated with publication bias. Possible causes of funnel plot asymmetry should be explored. Contours of statistical significance may help disentangle whether asymmetry in a funnel plot is caused by publication bias. Selection models, although underused, could be a useful resource when publication bias and heterogeneity are suspected, because they directly address the problem of publication bias rather than that of small-study effects.
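Among the regression techniques mentioned above, a standard choice is Egger's test, which regresses the standardized effect (effect/SE) on precision (1/SE) and reads funnel plot asymmetry off the intercept. A minimal sketch, assuming per-study effect sizes and standard errors (the significance test on the intercept is omitted):

```python
def egger_intercept(effects, ses):
    """Egger's regression intercept: z = effect/SE regressed on precision = 1/SE.
    An intercept far from zero suggests small-study effects / funnel asymmetry."""
    z = [e / s for e, s in zip(effects, ses)]
    prec = [1.0 / s for s in ses]
    n = len(z)
    mp, mz = sum(prec) / n, sum(z) / n
    slope = (sum((p - mp) * (zi - mz) for p, zi in zip(prec, z))
             / sum((p - mp) ** 2 for p in prec))
    return mz - slope * mp
```

Consistent with the paper's caution, a nonzero intercept flags small-study effects generally; publication bias is only one possible cause of that asymmetry.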

  18. The halo Boltzmann equation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Biagetti, Matteo; Desjacques, Vincent; Kehagias, Alex

    2016-04-01

    Dark matter halos are the building blocks of the universe, as they host galaxies and clusters. Knowledge of the clustering properties of halos is therefore essential for understanding the statistical properties of galaxies. We derive an effective halo Boltzmann equation which can be used to describe halo clustering statistics. In particular, we show how the halo Boltzmann equation encodes a statistically biased gravitational force which generates a bias in the peculiar velocities of virialized halos with respect to the underlying dark matter, as recently observed in N-body simulations.

  19. A Study of Effects of MultiCollinearity in the Multivariable Analysis

    PubMed Central

    Yoo, Wonsuk; Mayberry, Robert; Bae, Sejong; Singh, Karan; (Peter) He, Qinghua; Lillard, James W.

    2015-01-01

    A multivariable analysis is the most popular approach for investigating associations between risk factors and disease. However, the efficiency of a multivariable analysis depends strongly on the correlation structure among the predictive variables. When the covariates in the model are not independent of one another, collinearity/multicollinearity problems arise in the analysis, leading to biased estimation. This work performs a simulation study under various scenarios of collinearity structure to investigate the effects of collinearity among predictive and explanatory variables and to compare the results with existing guidelines for deciding when collinearity is harmful. Three correlation scenarios among predictor variables are considered: (1) a bivariate collinear structure, the simplest collinearity case; (2) a multivariate collinear structure in which an explanatory variable is correlated with two other covariates; and (3) a more realistic scenario in which an independent variable can be expressed as various functions of the other variables. PMID:25664257

  20. A Study of Effects of MultiCollinearity in the Multivariable Analysis.

    PubMed

    Yoo, Wonsuk; Mayberry, Robert; Bae, Sejong; Singh, Karan; Peter He, Qinghua; Lillard, James W

    2014-10-01

    A multivariable analysis is the most popular approach for investigating associations between risk factors and disease. However, the efficiency of a multivariable analysis depends strongly on the correlation structure among the predictive variables. When the covariates in the model are not independent of one another, collinearity/multicollinearity problems arise in the analysis, leading to biased estimation. This work performs a simulation study under various scenarios of collinearity structure to investigate the effects of collinearity among predictive and explanatory variables and to compare the results with existing guidelines for deciding when collinearity is harmful. Three correlation scenarios among predictor variables are considered: (1) a bivariate collinear structure, the simplest collinearity case; (2) a multivariate collinear structure in which an explanatory variable is correlated with two other covariates; and (3) a more realistic scenario in which an independent variable can be expressed as various functions of the other variables.
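A standard diagnostic for the collinearity problems described above is the variance inflation factor, VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. A minimal numpy sketch (the common rule-of-thumb thresholds of 5 or 10 are guidelines, not part of the computation):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the n x p design matrix X:
    VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the rest."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        # remaining predictors plus an intercept column
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return out
```

Independent predictors give VIF values near 1, while near-collinear predictors inflate the VIF sharply, mirroring the variance inflation of the corresponding coefficient estimates.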

  1. Mixed Model Association with Family-Biased Case-Control Ascertainment.

    PubMed

    Hayeck, Tristan J; Loh, Po-Ru; Pollack, Samuela; Gusev, Alexander; Patterson, Nick; Zaitlen, Noah A; Price, Alkes L

    2017-01-05

    Mixed models have become the tool of choice for genetic association studies; however, standard mixed model methods may be poorly calibrated or underpowered under family sampling bias and/or case-control ascertainment. Previously, we introduced a liability threshold-based mixed model association statistic (LTMLM) to address case-control ascertainment in unrelated samples. Here, we consider family-biased case-control ascertainment, where case and control subjects are ascertained non-randomly with respect to family relatedness. Previous work has shown that this type of ascertainment can severely bias heritability estimates; we show here that it also impacts mixed model association statistics. We introduce a family-based association statistic (LT-Fam) that is robust to this problem. Similar to LTMLM, LT-Fam is computed from posterior mean liabilities (PML) under a liability threshold model; however, LT-Fam uses published narrow-sense heritability estimates to avoid the problem of biased heritability estimation, enabling correct calibration. In simulations with family-biased case-control ascertainment, LT-Fam was correctly calibrated (average χ2 = 1.00-1.02 for null SNPs), whereas the Armitage trend test (ATT), standard mixed model association (MLM), and case-control retrospective association test (CARAT) were mis-calibrated (e.g., average χ2 = 0.50-1.22 for MLM, 0.89-2.65 for CARAT). LT-Fam also attained higher power than other methods in some settings. In 1,259 type 2 diabetes-affected case subjects and 5,765 control subjects from the CARe cohort, downsampled to induce family-biased ascertainment, LT-Fam was correctly calibrated whereas ATT, MLM, and CARAT were again mis-calibrated. Our results highlight the importance of modeling family sampling bias in case-control datasets with related samples. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
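The Armitage trend test (ATT) that serves as a comparison baseline above can be written as n times the squared Pearson correlation between genotype dosage (0/1/2) and binary case status, referred to a chi-square distribution with 1 degree of freedom. A minimal sketch of the standard unrelated-samples version (not the LT-Fam statistic, and without the family-relatedness modeling the paper shows is needed):

```python
def armitage_trend_chi2(genotypes, status):
    """Armitage trend test statistic: n times the squared Pearson correlation
    between genotype dosage (0/1/2) and binary case status (0/1).
    Compare against a chi-square distribution with 1 degree of freedom."""
    n = len(genotypes)
    mg = sum(genotypes) / n
    ms = sum(status) / n
    cov = sum((g - mg) * (s - ms) for g, s in zip(genotypes, status))
    vg = sum((g - mg) ** 2 for g in genotypes)
    vs = sum((s - ms) ** 2 for s in status)
    return n * cov * cov / (vg * vs)
```

Under family-biased ascertainment this statistic's null distribution drifts away from chi-square(1), which is exactly the mis-calibration (average χ2 departing from 1.0) reported in the abstract.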

  2. Alternative Regression Equations for Estimation of Annual Peak-Streamflow Frequency for Undeveloped Watersheds in Texas using PRESS Minimization

    USGS Publications Warehouse

    Asquith, William H.; Thompson, David B.

    2008-01-01

    The U.S. Geological Survey, in cooperation with the Texas Department of Transportation and in partnership with Texas Tech University, investigated a refinement of the regional regression method and developed alternative equations for estimation of peak-streamflow frequency for undeveloped watersheds in Texas. A common model for estimation of peak-streamflow frequency is based on the regional regression method. The current (2008) regional regression equations for 11 regions of Texas are based on log10 transformations of all regression variables (drainage area, main-channel slope, and watershed shape). Exclusive use of the log10 transformation does not fully linearize the relations between the variables. As a result, some systematic bias remains in the current equations. The bias results in overestimation of peak streamflow for both the smallest and largest watersheds, and it increases with increasing recurrence interval. The primary source of the bias is the discernible curvilinear relation in log10 space between peak streamflow and drainage area. The bias is demonstrated by selected residual plots with superimposed LOWESS trend lines. To address the bias, a statistical framework based on minimization of the PRESS statistic through power transformation of drainage area is described and implemented, and the resulting regression equations are reported. Compared with the log10-exclusive equations, the equations derived from PRESS minimization have smaller PRESS statistics and residual standard errors. Selected residual plots for the PRESS-minimized equations are presented to demonstrate that systematic bias in regional regression equations for peak-streamflow frequency estimation in Texas can be reduced. Because the overall error is similar to the error associated with the previous equations and because the bias is reduced, the PRESS-minimized equations reported here provide alternative equations for peak-streamflow frequency estimation.
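The PRESS statistic minimized in this framework is the sum of squared leave-one-out prediction errors, which for ordinary least squares can be computed without refitting via the identity e_i/(1 − h_ii), where e_i is the ordinary residual and h_ii the hat-matrix leverage. A minimal sketch of that computation (a generic OLS PRESS, not the report's specific regression equations):

```python
import numpy as np

def press(X, y):
    """PRESS statistic for an OLS fit of y on X (with intercept), using the
    leave-one-out shortcut: deleted residual_i = e_i / (1 - h_ii)."""
    Z = np.column_stack([np.ones(len(y)), X])     # add intercept column
    H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T          # hat (projection) matrix
    e = y - H @ y                                 # ordinary residuals
    h = np.diag(H)                                # leverages h_ii
    return float(np.sum((e / (1.0 - h)) ** 2))
```

Minimizing PRESS over candidate transformations of a predictor (such as the power transformation of drainage area described above) selects the model with the best leave-one-out predictive performance rather than the best in-sample fit.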

  3. Optimism bias leads to inconclusive results - an empirical study

    PubMed Central

    Djulbegovic, Benjamin; Kumar, Ambuj; Magazin, Anja; Schroen, Anneke T.; Soares, Heloisa; Hozo, Iztok; Clarke, Mike; Sargent, Daniel; Schell, Michael J.

    2010-01-01

    Objective Optimism bias refers to unwarranted belief in the efficacy of new therapies. We assessed the impact of optimism bias on the proportion of trials that did not answer their research question successfully, and explored whether poor accrual or optimism bias is responsible for inconclusive results. Study Design Systematic review. Setting Retrospective analysis of a consecutive series of phase III randomized controlled trials (RCTs) performed under the aegis of the National Cancer Institute Cooperative groups. Results 359 trials (374 comparisons) enrolling 150,232 patients were analyzed. 70% (262/374) of the trials generated conclusive results according to the statistical criteria. Investigators made definitive statements related to the treatment preference in 73% (273/374) of studies. Investigators’ judgments and statistical inferences were concordant in 75% (279/374) of trials. Investigators consistently overestimated their expected treatment effects, but to a significantly larger extent for inconclusive trials. The median ratio of expected to observed hazard ratio or odds ratio was 1.34 (range 0.19 – 15.40) in conclusive trials compared to 1.86 (range 1.09 – 12.00) in inconclusive studies (p<0.0001). Only 17% of the trials had treatment effects that matched the original researchers’ expectations. Conclusion Formal statistical inference is sufficient to answer the research question in 75% of RCTs. The answers to the other 25% depend mostly on subjective judgments, which at times are in conflict with statistical inference. Optimism bias significantly contributes to inconclusive results. PMID:21163620

  4. Optimism bias leads to inconclusive results-an empirical study.

    PubMed

    Djulbegovic, Benjamin; Kumar, Ambuj; Magazin, Anja; Schroen, Anneke T; Soares, Heloisa; Hozo, Iztok; Clarke, Mike; Sargent, Daniel; Schell, Michael J

    2011-06-01

    Optimism bias refers to unwarranted belief in the efficacy of new therapies. We assessed the impact of optimism bias on the proportion of trials that did not answer their research question successfully and explored whether poor accrual or optimism bias is responsible for inconclusive results. Systematic review. Retrospective analysis of a consecutive series of phase III randomized controlled trials (RCTs) performed under the aegis of the National Cancer Institute Cooperative groups. Three hundred fifty-nine trials (374 comparisons) enrolling 150,232 patients were analyzed. Seventy percent (262 of 374) of the trials generated conclusive results according to the statistical criteria. Investigators made definitive statements related to the treatment preference in 73% (273 of 374) of studies. Investigators' judgments and statistical inferences were concordant in 75% (279 of 374) of trials. Investigators consistently overestimated their expected treatment effects, but to a significantly larger extent for inconclusive trials. The median ratio of expected to observed hazard ratio or odds ratio was 1.34 (range: 0.19-15.40) in conclusive trials compared with 1.86 (range: 1.09-12.00) in inconclusive studies (P<0.0001). Only 17% of the trials had treatment effects that matched the original researchers' expectations. Formal statistical inference is sufficient to answer the research question in 75% of RCTs. The answers to the other 25% depend mostly on subjective judgments, which at times are in conflict with statistical inference. Optimism bias significantly contributes to inconclusive results. Copyright © 2011 Elsevier Inc. All rights reserved.

  5. Unconscious race and social class bias among acute care surgical clinicians and clinical treatment decisions.

    PubMed

    Haider, Adil H; Schneider, Eric B; Sriram, N; Dossick, Deborah S; Scott, Valerie K; Swoboda, Sandra M; Losonczy, Lia; Haut, Elliott R; Efron, David T; Pronovost, Peter J; Lipsett, Pamela A; Cornwell, Edward E; MacKenzie, Ellen J; Cooper, Lisa A; Freischlag, Julie A

    2015-05-01

    Significant health inequities persist among minority and socially disadvantaged patients. Better understanding of how unconscious biases affect clinical decision making may help to illuminate clinicians' roles in propagating disparities. To determine whether clinicians' unconscious race and/or social class biases correlate with patient management decisions, we conducted a web-based survey among 230 physicians from surgery and related specialties at an academic, level I trauma center from December 1, 2011, through January 31, 2012. We administered clinical vignettes, each with 3 management questions. Eight vignettes assessed the relationship between unconscious bias and clinical decision making. We performed ordered logistic regression analysis on the Implicit Association Test (IAT) scores and used multivariable analysis to determine whether implicit bias was associated with the vignette responses. Differential response times (D scores) on the IAT served as a surrogate for unconscious bias. Patient management vignettes varied by patient race or social class, and D scores were calculated for each management decision. In total, 215 clinicians were included, consisting of 74 attending surgeons, 32 fellows, 86 residents, 19 interns, and 4 physicians with an undetermined level of education. Specialties included surgery (32.1%), anesthesia (18.1%), emergency medicine (18.1%), orthopedics (7.9%), otolaryngology (7.0%), neurosurgery (7.0%), critical care (6.0%), and urology (2.8%); 1.9% did not report a departmental affiliation. Implicit race and social class biases were present in most respondents. Among all clinicians, mean IAT D scores for race and social class were 0.42 (95% CI, 0.37-0.48) and 0.71 (95% CI, 0.65-0.78), respectively. Race and class scores were similar across departments (general surgery, orthopedics, urology, etc.), races, and ages. Women demonstrated less bias concerning race (mean IAT D score, 0.39 [95% CI, 0.29-0.49]) and social class (mean IAT D score, 0.66 [95% CI, 0.57-0.75]) than men (mean IAT D scores, 0.44 [95% CI, 0.37-0.52] and 0.82 [95% CI, 0.75-0.89], respectively). In univariate analyses, we found an association between race/social class bias and 3 of 27 possible patient-care decisions. Multivariable analyses revealed no association between the IAT D scores and vignette-based clinical assessments. Unconscious social class and race biases were not significantly associated with clinical decision making among acute care surgical clinicians. Further studies involving real physician-patient interactions may be warranted.

  6. A Statistical Discrimination Experiment for Eurasian Events Using a Twenty-Seven-Station Network

    DTIC Science & Technology

    1980-07-08

    to test the effectiveness of a multivariate method of analysis for distinguishing earthquakes from explosions. The data base for the experiment...the weight assigned to each variable whenever a new one is added. Jennrich, R. I. (1977). Stepwise discriminant analysis, in Statistical Methods for

  7. Matching pollution with adaptive changes in mangrove plants by multivariate statistics. A case study, Rhizophora mangle from four neotropical mangroves in Brazil.

    PubMed

    Souza, Iara da Costa; Morozesk, Mariana; Duarte, Ian Drumond; Bonomo, Marina Marques; Rocha, Lívia Dorsch; Furlan, Larissa Maria; Arrivabene, Hiulana Pereira; Monferrán, Magdalena Victoria; Matsumoto, Silvia Tamie; Milanez, Camilla Rozindo Dias; Wunderlin, Daniel Alberto; Fernandes, Marisa Narciso

    2014-08-01

    Roots of mangrove trees have an important role in depurating water and sediments by retaining metals that may accumulate in different plant tissues, affecting physiological processes and anatomy. The present study aimed to evaluate adaptive changes in roots of Rhizophora mangle in response to different levels of chemical elements (metals/metalloids) in interstitial water and sediments from four neotropical mangroves in Brazil. What sets this study apart from other studies is that we not only investigated adaptive modifications in R. mangle but also changes in the environments where this plant grows, evaluating the correspondence between physical, chemical and biological variables by a combined set of multivariate statistical methods (pattern recognition). Thus, we sought to match changes in the environment with adaptations in plants. Multivariate statistics highlighted that the lignified periderm and the air gaps are directly related to environmental contamination. Current results provide new evidence of root anatomical strategies for dealing with contaminated environments. Multivariate statistics contributes greatly to extrapolating results from the complex data matrices obtained when analyzing environmental issues, pointing out parameters involved in environmental changes and evidencing the adaptive response of the exposed biota. Copyright © 2014 Elsevier Ltd. All rights reserved.

  8. Estimating an Effect Size in One-Way Multivariate Analysis of Variance (MANOVA)

    ERIC Educational Resources Information Center

    Steyn, H. S., Jr.; Ellis, S. M.

    2009-01-01

    When two or more univariate population means are compared, the proportion of variation in the dependent variable accounted for by population group membership is eta-squared. This effect size can be generalized by using multivariate measures of association, based on the multivariate analysis of variance (MANOVA) statistics, to establish whether…

  9. Root Cause Analysis of Quality Defects Using HPLC-MS Fingerprint Knowledgebase for Batch-to-batch Quality Control of Herbal Drugs.

    PubMed

    Yan, Binjun; Fang, Zhonghua; Shen, Lijuan; Qu, Haibin

    2015-01-01

    The batch-to-batch quality consistency of herbal drugs has always been an important issue. The objective was to propose a methodology for batch-to-batch quality control based on HPLC-MS fingerprints and a process knowledgebase. The extraction process of Compound E-jiao Oral Liquid was taken as a case study. After establishing the HPLC-MS fingerprint analysis method, the fingerprints of the extract solutions produced under normal and abnormal operation conditions were obtained. Multivariate statistical models were built for fault detection, and a discriminant analysis model was built using the probabilistic discriminant partial-least-squares method for fault diagnosis. Based on multivariate statistical analysis, process knowledge was acquired and the cause-effect relationship between process deviations and quality defects was revealed. The quality defects were detected successfully by multivariate statistical control charts, and the types of process deviation were diagnosed correctly by discriminant analysis. This work has demonstrated the benefits of combining HPLC-MS fingerprints, process knowledge and multivariate analysis for the quality control of herbal drugs. Copyright © 2015 John Wiley & Sons, Ltd.
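
    The multivariate statistical control charts used for fault detection above can be sketched with a Hotelling's T²-style statistic: a new batch is flagged when it lies far from the in-control distribution. The two-variable "fingerprint summary" below is purely illustrative, not the paper's actual HPLC-MS data.

```python
import random, statistics

# Two hypothetical batch-quality variables (e.g. two fingerprint peak areas);
# the numbers are invented, not the paper's HPLC-MS data.
random.seed(0)
normal_batches = [(random.gauss(10, 0.5), random.gauss(5, 0.3)) for _ in range(50)]

# Fit the in-control mean vector and 2x2 covariance matrix.
xs = [b[0] for b in normal_batches]
ys = [b[1] for b in normal_batches]
mx, my = statistics.fmean(xs), statistics.fmean(ys)
sxx, syy = statistics.variance(xs), statistics.variance(ys)
sxy = sum((x - mx)*(y - my) for x, y in normal_batches)/(len(normal_batches) - 1)
det = sxx*syy - sxy*sxy

def t2(x, y):
    """Hotelling's T^2: squared Mahalanobis distance from the in-control mean."""
    dx, dy = x - mx, y - my
    return (syy*dx*dx - 2*sxy*dx*dy + sxx*dy*dy)/det

print(t2(10.1, 5.0))   # small: batch consistent with normal operation
print(t2(13.0, 4.0))   # large: flagged on the control chart as a quality defect
```

    In practice the flagging threshold comes from an F-distribution quantile; here only the distance itself is shown.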

  10. Application of multivariate statistical techniques in microbial ecology.

    PubMed

    Paliy, O; Shankar, V

    2016-03-01

    Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. In particular, a noticeable effect has been attained in the field of microbial ecology, where new experimental approaches provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces a large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure. © 2016 John Wiley & Sons Ltd.
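
    Of the exploratory techniques such reviews survey, principal component analysis (PCA) is the most common. A minimal sketch of its core computation, finding the leading eigenvector of the covariance matrix by power iteration, on made-up two-variable data (the ecological framing is illustrative only):

```python
import random

# Made-up two-variable data with strong negative correlation, so a single
# axis captures most of the variation.
random.seed(1)
data = []
for _ in range(100):
    t = random.gauss(0, 2)
    data.append((t + random.gauss(0, 0.3), -t + random.gauss(0, 0.3)))

# Center the columns and form the 2x2 sample covariance matrix.
n = len(data)
m0 = sum(x for x, _ in data)/n
m1 = sum(y for _, y in data)/n
centered = [(x - m0, y - m1) for x, y in data]
c00 = sum(x*x for x, _ in centered)/(n - 1)
c11 = sum(y*y for _, y in centered)/(n - 1)
c01 = sum(x*y for x, y in centered)/(n - 1)

# Power iteration finds the leading eigenvector: the first principal component.
v = (1.0, 0.0)
for _ in range(100):
    w = (c00*v[0] + c01*v[1], c01*v[0] + c11*v[1])
    norm = (w[0]**2 + w[1]**2)**0.5
    v = (w[0]/norm, w[1]/norm)

var_pc1 = c00*v[0]**2 + 2*c01*v[0]*v[1] + c11*v[1]**2
explained = var_pc1/(c00 + c11)
print(f"PC1 = ({v[0]:+.2f}, {v[1]:+.2f}), fraction of variance: {explained:.2f}")
```

    Real ecological PCA works on many more dimensions, but the computation is the same eigen-decomposition applied to a larger covariance matrix.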

  11. Applying the multivariate time-rescaling theorem to neural population models

    PubMed Central

    Gerhard, Felipe; Haslinger, Robert; Pipa, Gordon

    2011-01-01

    Statistical models of neural activity are integral to modern neuroscience. Recently, interest has grown in modeling the spiking activity of populations of simultaneously recorded neurons to study the effects of correlations and functional connectivity on neural information processing. However any statistical model must be validated by an appropriate goodness-of-fit test. Kolmogorov-Smirnov tests based upon the time-rescaling theorem have proven to be useful for evaluating point-process-based statistical models of single-neuron spike trains. Here we discuss the extension of the time-rescaling theorem to the multivariate (neural population) case. We show that even in the presence of strong correlations between spike trains, models which neglect couplings between neurons can be erroneously passed by the univariate time-rescaling test. We present the multivariate version of the time-rescaling theorem, and provide a practical step-by-step procedure for applying it towards testing the sufficiency of neural population models. Using several simple analytically tractable models and also more complex simulated and real data sets, we demonstrate that important features of the population activity can only be detected using the multivariate extension of the test. PMID:21395436
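
    A minimal univariate sketch of the time-rescaling idea (the multivariate extension discussed above applies the same rescaling to each neuron conditioned on the full population history): simulate spikes from a known intensity, rescale inter-spike intervals by the integrated rate, and check uniformity with a Kolmogorov-Smirnov distance. The rate function here is invented for illustration.

```python
import math, random

# Simulate an inhomogeneous Poisson spike train by thinning.
def rate(t):
    return 5 + 4*math.sin(t)           # spikes per second (illustrative)

random.seed(2)
rmax, T = 9.0, 200.0                   # upper bound on rate; duration in s
spikes, t = [], 0.0
while True:
    t += random.expovariate(rmax)
    if t > T:
        break
    if random.random() < rate(t)/rmax:
        spikes.append(t)

# Time-rescaling: z_k is the integrated intensity over each inter-spike
# interval; under the true model, z_k is i.i.d. exponential(1).
def cum(t):                            # closed-form integral of rate()
    return 5*t + 4*(1 - math.cos(t))

z = [cum(b) - cum(a) for a, b in zip(spikes, spikes[1:])]
u = sorted(1 - math.exp(-zk) for zk in z)   # should be Uniform(0, 1)

# Kolmogorov-Smirnov distance between u and the uniform CDF.
n = len(u)
ks = max(max(abs(uk - k/n), abs(uk - (k + 1)/n)) for k, uk in enumerate(u))
print(f"n={n}, KS distance={ks:.3f}, 95% band ~ {1.36/math.sqrt(n):.3f}")
```

    Because the simulated model is the true one, the KS distance falls inside the 95% band; a misspecified intensity would push it out.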

  12. A comparison of ensemble post-processing approaches that preserve correlation structures

    NASA Astrophysics Data System (ADS)

    Schefzik, Roman; Van Schaeybroeck, Bert; Vannitsem, Stéphane

    2016-04-01

    Despite the fact that ensemble forecasts address the major sources of uncertainty, they exhibit biases and dispersion errors and are therefore known to benefit from calibration or statistical post-processing. For instance, the ensemble model output statistics (EMOS) method, also known as the non-homogeneous regression approach (Gneiting et al., 2005), is known to strongly improve forecast skill. EMOS is based on fitting and adjusting a parametric probability density function (PDF). However, EMOS and other common post-processing approaches apply to a single weather quantity at a single location for a single look-ahead time. They are therefore unable to take into account spatial, inter-variable and temporal dependence structures. Recently, many research efforts have been invested in designing post-processing methods that resolve this drawback, as well as in verification methods that enable the detection of dependence structures. New verification methods are applied to two classes of post-processing methods, both generating physically coherent ensembles. A first class uses ensemble copula coupling (ECC), which starts from EMOS but adjusts the rank structure (Schefzik et al., 2013). The second class is a member-by-member post-processing (MBM) approach that maps each raw ensemble member to a corrected one (Van Schaeybroeck and Vannitsem, 2015). We compare variants of the EMOS-ECC and MBM classes and highlight a specific theoretical connection between them. All post-processing variants are applied in the context of the ensemble system of the European Centre for Medium-Range Weather Forecasts (ECMWF) and compared using multivariate verification tools including the energy score, the variogram score (Scheuerer and Hamill, 2015) and the band depth rank histogram (Thorarinsdottir et al., 2015). Gneiting, Raftery, Westveld, and Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098-1118. 
    Scheuerer and Hamill, 2015: Variogram-based proper scoring rules for probabilistic forecasts of multivariate quantities. Mon. Wea. Rev., 143, 1321-1334. Schefzik, Thorarinsdottir and Gneiting, 2013: Uncertainty quantification in complex simulation models using ensemble copula coupling. Statistical Science, 28, 616-640. Thorarinsdottir, Scheuerer and Heinz, 2015: Assessing the calibration of high-dimensional ensemble forecasts using rank histograms. arXiv:1310.0236. Van Schaeybroeck and Vannitsem, 2015: Ensemble post-processing using member-by-member approaches: theoretical aspects. Q.J.R. Meteorol. Soc., 141, 807-818.
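
    Among the multivariate verification tools mentioned, the energy score is the simplest to sketch: it generalizes the CRPS to vector-valued forecasts and penalizes both bias and miscalibration of the ensemble. The toy two-dimensional ensembles below are invented for illustration.

```python
import math, random

def energy_score(ensemble, obs):
    """Energy score (lower is better) for a multivariate ensemble forecast:
    ES = mean ||x_i - y|| - 0.5 * mean ||x_i - x_j|| over members x_i and
    observation y (the multivariate generalization of the CRPS)."""
    m = len(ensemble)
    dist = lambda a, b: math.sqrt(sum((ai - bi)**2 for ai, bi in zip(a, b)))
    term1 = sum(dist(x, obs) for x in ensemble)/m
    term2 = sum(dist(x, y) for x in ensemble for y in ensemble)/(2*m*m)
    return term1 - term2

random.seed(3)
obs = (1.0, 2.0)   # verifying observation (e.g. temperature, precipitation)
sharp = [(random.gauss(1, 0.3), random.gauss(2, 0.3)) for _ in range(20)]
biased = [(random.gauss(3, 0.3), random.gauss(0, 0.3)) for _ in range(20)]
print(energy_score(sharp, obs))    # small: ensemble centred on the truth
print(energy_score(biased, obs))   # larger: penalized for displacement
```
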

  13. Comparison of measurement methods with a mixed effects procedure accounting for replicated evaluations (COM3PARE): method comparison algorithm implementation for head and neck IGRT positional verification.

    PubMed

    Roy, Anuradha; Fuller, Clifton D; Rosenthal, David I; Thomas, Charles R

    2015-08-28

    Comparison of imaging measurement devices in the absence of a gold-standard comparator remains a vexing problem, especially in scenarios where multiple, non-paired, replicated measurements occur, as in image-guided radiotherapy (IGRT). As the number of commercially available IGRT systems presents a challenge in determining whether different IGRT methods may be used interchangeably, there is an unmet need for a conceptually parsimonious and statistically robust method to evaluate the agreement between two methods with replicated observations. Consequently, we sought to determine, using a previously reported head and neck positional verification dataset, the feasibility and utility of a Comparison of Measurement Methods with the Mixed Effects Procedure Accounting for Replicated Evaluations (COM3PARE), a unified conceptual schema and analytic algorithm based upon Roy's linear mixed effects (LME) model with Kronecker product covariance structure in a doubly multivariate set-up, for IGRT method comparison. An anonymized dataset consisting of 100 paired coordinate (X/Y/Z) measurements from a sequential series of head and neck cancer patients imaged near-simultaneously with cone beam CT (CBCT) and kilovoltage X-ray (KVX) imaging was used for model implementation. Software-suggested CBCT and KVX shifts for the lateral (X), vertical (Y) and longitudinal (Z) dimensions were evaluated for bias, inter-method (between-subject variation), intra-method (within-subject variation), and overall agreement using a script implementing COM3PARE with the MIXED procedure of the statistical software package SAS (SAS Institute, Cary, NC, USA). COM3PARE showed a statistically significant difference in bias and inter-method agreement between CBCT and KVX in the Z-axis (both p < 0.01). Intra-method and overall agreement differences were statistically significant for both the X- and Z-axes (all p < 0.01). 
Using pre-specified criteria, based on intra-method agreement, CBCT was deemed preferable for X-axis positional verification, with KVX preferred for superoinferior alignment. The COM3PARE methodology was validated as feasible and useful in this pilot head and neck cancer positional verification dataset. COM3PARE represents a flexible and robust standardized analytic methodology for IGRT comparison. The implemented SAS script is included to encourage other groups to implement COM3PARE in other anatomic sites or IGRT platforms.

  14. Restoration of MRI Data for Field Nonuniformities using High Order Neighborhood Statistics

    PubMed Central

    Hadjidemetriou, Stathis; Studholme, Colin; Mueller, Susanne; Weiner, Michael; Schuff, Norbert

    2007-01-01

    MRI at high magnetic fields (>3.0 T) is complicated by strong inhomogeneous radio-frequency fields, sometimes termed the “bias field”. These lead to nonuniformity of image intensity, greatly complicating further analysis such as registration and segmentation. Existing methods for bias field correction are effective for 1.5 T or 3.0 T MRI, but are not completely satisfactory for higher field data. This paper develops an effective bias field correction for high field MRI based on the assumption that the nonuniformity is smoothly varying in space. The nonuniformity is quantified and unmixed using high order neighborhood statistics of intensity cooccurrences, computed within spherical windows of limited size over the entire image. The restoration is iterative and makes use of a novel stable stopping criterion that depends on the scaled entropy of the cooccurrence statistics (the Shannon entropy normalized to the effective dynamic range of the image), which is a non-monotonic function of the iterations. The algorithm restores whole head data, is robust to intense nonuniformities present in high field acquisitions, and is robust to variations in anatomy. This algorithm significantly improves bias field correction in comparison to N3 on phantom 1.5 T head data and high field 4 T human head data. PMID:18193095

  15. Errors in causal inference: an organizational schema for systematic error and random error.

    PubMed

    Suzuki, Etsuji; Tsuda, Toshihide; Mitsuhashi, Toshiharu; Mansournia, Mohammad Ali; Yamamoto, Eiji

    2016-11-01

    To provide an organizational schema for systematic error and random error in estimating causal measures, aimed at clarifying the concept of errors from the perspective of causal inference. We propose to divide systematic error into structural error and analytic error. With regard to random error, our schema shows its four major sources: nondeterministic counterfactuals, sampling variability, a mechanism that generates exposure events, and measurement variability. Structural error is defined from the perspective of counterfactual reasoning and divided into nonexchangeability bias (which comprises confounding bias and selection bias) and measurement bias. Directed acyclic graphs are useful to illustrate this kind of error. Nonexchangeability bias implies a lack of "exchangeability" between the selected exposed and unexposed groups. A lack of exchangeability is not a primary concern of measurement bias, justifying its separation from confounding bias and selection bias. Many forms of analytic error result from the small-sample properties of the estimator used and vanish asymptotically. Analytic error also results from wrong (misspecified) statistical models and inappropriate statistical methods. Our organizational schema is helpful for understanding the relationship between systematic error and random error from a previously less investigated aspect, enabling us to better understand the relationship between accuracy, validity, and precision. Copyright © 2016 Elsevier Inc. All rights reserved.

  16. A minimalist approach to bias estimation for passive sensor measurements with targets of opportunity

    NASA Astrophysics Data System (ADS)

    Belfadel, Djedjiga; Osborne, Richard W.; Bar-Shalom, Yaakov

    2013-09-01

    In order to carry out data fusion, registration error correction is crucial in multisensor systems. This requires estimation of the sensor measurement biases. It is important to correct for these bias errors so that the multiple sensor measurements and/or tracks can be referenced as accurately as possible to a common tracking coordinate system. This paper provides a solution for bias estimation for the minimum number of passive sensors (two), when only targets of opportunity are available. The sensor measurements are assumed time-coincident (synchronous) and perfectly associated. Since these sensors provide only line of sight (LOS) measurements, a single composite Cartesian measurement is formed by fusing the LOS measurements from the different sensors, avoiding the need for nonlinear filtering. We evaluate the Cramer-Rao Lower Bound (CRLB) on the covariance of the bias estimate, i.e., the quantification of the available information about the biases. Statistical tests on the results of simulations show that this method is statistically efficient, even for small sample sizes (as few as two sensors and six points on the trajectory of a single target of opportunity). We also show that the RMS position error is significantly improved with bias estimation compared with the target position estimation using the original biased measurements.
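
    The composite-measurement idea can be illustrated in two dimensions: with known sensor positions and perfect bearings, intersecting the two lines of sight yields a Cartesian fix directly. This is a hypothetical geometry that ignores the measurement biases and noise the paper is concerned with estimating.

```python
import math

def fuse_los(p1, az1, p2, az2):
    """Intersect two bearing lines from sensors at p1 and p2 (angles measured
    from the x-axis). Line i: p_i + t_i*(cos az_i, sin az_i); solve for t1."""
    d1 = (math.cos(az1), math.sin(az1))
    d2 = (math.cos(az2), math.sin(az2))
    # 2x2 system [d1, -d2] @ [t1, t2] = p2 - p1, solved by Cramer's rule.
    det = d1[0]*(-d2[1]) - (-d2[0])*d1[1]
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (rx*(-d2[1]) - (-d2[0])*ry)/det
    return (p1[0] + t1*d1[0], p1[1] + t1*d1[1])

target = (10.0, 20.0)
s1, s2 = (0.0, 0.0), (30.0, 0.0)          # known sensor positions
az1 = math.atan2(target[1] - s1[1], target[0] - s1[0])
az2 = math.atan2(target[1] - s2[1], target[0] - s2[0])
print(fuse_los(s1, az1, s2, az2))          # recovers the target position
```

    With biased bearings the intersection drifts away from the target, which is precisely why the bias terms must be estimated jointly in the paper's formulation.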

  17. Bootstrap Estimation of Sample Statistic Bias in Structural Equation Modeling.

    ERIC Educational Resources Information Center

    Thompson, Bruce; Fan, Xitao

    This study empirically investigated bootstrap bias estimation in the area of structural equation modeling (SEM). Three correctly specified SEM models were used under four different sample size conditions. Monte Carlo experiments were carried out to generate the criteria against which bootstrap bias estimation should be judged. For SEM fit indices,…
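
    Bootstrap bias estimation itself is easy to sketch outside the SEM context: resample the data with replacement, recompute the statistic, and compare the mean over resamples to the original estimate. The plug-in variance example below is a standard textbook illustration, not taken from the study.

```python
import random, statistics

def bootstrap_bias(sample, stat, n_boot=2000, seed=0):
    """Bootstrap bias estimate: mean of `stat` over resamples minus stat(sample)."""
    rng = random.Random(seed)
    theta_hat = stat(sample)
    boots = [stat([rng.choice(sample) for _ in sample]) for _ in range(n_boot)]
    return statistics.fmean(boots) - theta_hat

# Classic check: the plug-in variance (divide by n) is biased low, and the
# bootstrap recovers a negative bias of roughly -variance/n.
def plugin_var(xs):
    m = statistics.fmean(xs)
    return sum((x - m)**2 for x in xs)/len(xs)

random.seed(4)
sample = [random.gauss(0, 1) for _ in range(20)]
print(bootstrap_bias(sample, plugin_var))  # negative
```

    A bias-corrected estimate is then `stat(sample) - bootstrap_bias(...)`, the quantity whose quality the Monte Carlo experiments above are judging.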

  18. Systematic Error Modeling and Bias Estimation

    PubMed Central

    Zhang, Feihu; Knoll, Alois

    2016-01-01

    This paper analyzes the statistic properties of the systematic error in terms of range and bearing during the transformation process. Furthermore, we rely on a weighted nonlinear least square method to calculate the biases based on the proposed models. The results show the high performance of the proposed approach for error modeling and bias estimation. PMID:27213386

  19. Explanation of Two Anomalous Results in Statistical Mediation Analysis

    ERIC Educational Resources Information Center

    Fritz, Matthew S.; Taylor, Aaron B.; MacKinnon, David P.

    2012-01-01

    Previous studies of different methods of testing mediation models have consistently found two anomalous results. The first result is elevated Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap tests not found in nonresampling tests or in resampling tests that did not include a bias correction. This is of special…

  20. Publication Bias in Meta-Analyses of the Efficacy of Psychotherapeutic Interventions for Depression

    ERIC Educational Resources Information Center

    Niemeyer, Helen; Musch, Jochen; Pietrowsky, Reinhard

    2013-01-01

    Objective: The aim of this study was to assess whether systematic reviews investigating psychotherapeutic interventions for depression are affected by publication bias. Only homogeneous data sets were included, as heterogeneous data sets can distort statistical tests of publication bias. Method: We applied Begg and Mazumdar's adjusted rank…

  1. Investigating the Stability of Four Methods for Estimating Item Bias.

    ERIC Educational Resources Information Center

    Perlman, Carole L.; And Others

    The reliability of item bias estimates was studied for four methods: (1) the transformed delta method; (2) Shepard's modified delta method; (3) Rasch's one-parameter residual analysis; and (4) the Mantel-Haenszel procedure. Bias statistics were computed for each sample using all methods. Data were from administration of multiple-choice items from…

  2. Conceptual Complexity and the Bias/Variance Tradeoff

    ERIC Educational Resources Information Center

    Briscoe, Erica; Feldman, Jacob

    2011-01-01

    In this paper we propose that the conventional dichotomy between exemplar-based and prototype-based models of concept learning is helpfully viewed as an instance of what is known in the statistical learning literature as the "bias/variance tradeoff". The bias/variance tradeoff can be thought of as a sliding scale that modulates how closely any…

  3. Ranking Bias in Association Studies

    PubMed Central

    Jeffries, Neal O.

    2009-01-01

    Background It is widely appreciated that genomewide association studies often yield overestimates of the association of a marker with disease when attention focuses upon the marker showing the strongest relationship. For example, in a case-control setting the largest (in absolute value) estimated odds ratio has been found to typically overstate the association as measured in a second, independent set of data. The most common reason given for this observation is that the choice of the most extreme test statistic is often conditional upon first observing a significant p value associated with the marker. A second, less appreciated reason is described here. Under common circumstances it is the multiple testing of many markers and subsequent focus upon those with most extreme test statistics (i.e. highly ranked results) that leads to bias in the estimated effect sizes. Conclusions This bias, termed ranking bias, is separate from that arising from conditioning on a significant p value and may often be a more important factor in generating bias. An analytic description of this bias, simulations demonstrating its extent, and identification of some factors leading to its exacerbation are presented. PMID:19172085
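
    The ranking bias the author describes is easy to reproduce in simulation: give many markers the same modest true effect, then record how much the top-ranked estimate overstates it. All numbers below are invented for illustration.

```python
import random

# Every marker has the same modest true effect; focusing on the most extreme
# estimate systematically overstates it, even without any p-value conditioning.
random.seed(5)
true_effect, n_markers, se = 0.2, 1000, 0.15
trials, overstatement = 200, 0.0
for _ in range(trials):
    estimates = [random.gauss(true_effect, se) for _ in range(n_markers)]
    top = max(estimates, key=abs)      # most extreme test statistic
    overstatement += abs(top) - true_effect
print(f"mean overstatement of the top-ranked marker: {overstatement/trials:.2f}")
```

    The overstatement grows with the number of markers tested and with the per-marker standard error, which matches the exacerbating factors the article identifies.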

  4. Cognitive biases, linguistic universals, and constraint-based grammar learning.

    PubMed

    Culbertson, Jennifer; Smolensky, Paul; Wilson, Colin

    2013-07-01

    According to classical arguments, language learning is both facilitated and constrained by cognitive biases. These biases are reflected in linguistic typology-the distribution of linguistic patterns across the world's languages-and can be probed with artificial grammar experiments on child and adult learners. Beginning with a widely successful approach to typology (Optimality Theory), and adapting techniques from computational approaches to statistical learning, we develop a Bayesian model of cognitive biases and show that it accounts for the detailed pattern of results of artificial grammar experiments on noun-phrase word order (Culbertson, Smolensky, & Legendre, 2012). Our proposal has several novel properties that distinguish it from prior work in the domains of linguistic theory, computational cognitive science, and machine learning. This study illustrates how ideas from these domains can be synthesized into a model of language learning in which biases range in strength from hard (absolute) to soft (statistical), and in which language-specific and domain-general biases combine to account for data from the macro-level scale of typological distribution to the micro-level scale of learning by individuals. Copyright © 2013 Cognitive Science Society, Inc.

  5. Self-referent information processing in individuals with bipolar spectrum disorders.

    PubMed

    Molz Adams, Ashleigh; Shapero, Benjamin G; Pendergast, Laura H; Alloy, Lauren B; Abramson, Lyn Y

    2014-01-01

    Bipolar spectrum disorders (BSDs) are common and impairing, which has led to an examination of risk factors for their development and maintenance. Historically, research has examined cognitive vulnerabilities to BSDs derived largely from the unipolar depression literature. Specifically, theorists propose that dysfunctional information processing guided by negative self-schemata may be a risk factor for depression. However, few studies have examined whether BSD individuals also show self-referent processing biases. This study examined self-referent information processing differences between 66 individuals with and 58 individuals without a BSD in a young adult sample (age M=19.65, SD=1.74; 62% female; 47% Caucasian). Repeated measures multivariate analysis of variance (MANOVA) was conducted to examine multivariate effects of BSD diagnosis on 4 self-referent processing variables (self-referent judgments, response latency, behavioral predictions, and recall) in response to depression-related and nondepression-related stimuli. Bipolar individuals endorsed and recalled more negative and fewer positive self-referent adjectives, as well as made more negative and fewer positive behavioral predictions. Many of these information-processing biases were partially, but not fully, mediated by depressive symptoms. Our sample was not a clinical or treatment-seeking sample, so we cannot generalize our results to clinical BSD samples. No participants had a bipolar I disorder at baseline. This study provides further evidence that individuals with BSDs exhibit a negative self-referent information processing bias. This may mean that those with BSDs have selective attention and recall of negative information about themselves, highlighting the need for attention to cognitive biases in therapy. © 2013 Elsevier B.V. All rights reserved.

  6. Self-referent information processing in individuals with bipolar spectrum disorders

    PubMed Central

    Molz Adams, Ashleigh; Shapero, Benjamin G.; Pendergast, Laura H.; Alloy, Lauren B.; Abramson, Lyn Y.

    2014-01-01

    Background Bipolar spectrum disorders (BSDs) are common and impairing, which has led to an examination of risk factors for their development and maintenance. Historically, research has examined cognitive vulnerabilities to BSDs derived largely from the unipolar depression literature. Specifically, theorists propose that dysfunctional information processing guided by negative self-schemata may be a risk factor for depression. However, few studies have examined whether BSD individuals also show self-referent processing biases. Methods This study examined self-referent information processing differences between 66 individuals with and 58 individuals without a BSD in a young adult sample (age M = 19.65, SD = 1.74; 62% female; 47% Caucasian). Repeated measures multivariate analysis of variance (MANOVA) was conducted to examine multivariate effects of BSD diagnosis on 4 self-referent processing variables (self-referent judgments, response latency, behavioral predictions, and recall) in response to depression-related and nondepression-related stimuli. Results Bipolar individuals endorsed and recalled more negative and fewer positive self-referent adjectives, as well as made more negative and fewer positive behavioral predictions. Many of these information-processing biases were partially, but not fully, mediated by depressive symptoms. Limitations Our sample was not a clinical or treatment-seeking sample, so we cannot generalize our results to clinical BSD samples. No participants had a bipolar I disorder at baseline. Conclusions This study provides further evidence that individuals with BSDs exhibit a negative self-referent information processing bias. This may mean that those with BSDs have selective attention and recall of negative information about themselves, highlighting the need for attention to cognitive biases in therapy. PMID:24074480

  7. Examining publication bias—a simulation-based evaluation of statistical tests on publication bias

    PubMed Central

    2017-01-01

    Background Publication bias is a form of scientific misconduct. It threatens the validity of research results and the credibility of science. Although several tests on publication bias exist, no in-depth evaluations are available that examine which test performs best for different research settings. Methods Four tests on publication bias, Egger’s test (FAT), p-uniform, the test of excess significance (TES), as well as the caliper test, were evaluated in a Monte Carlo simulation. Two different types of publication bias and its degree (0%, 50%, 100%) were simulated. The type of publication bias was defined either as file-drawer, meaning the repeated analysis of new datasets, or p-hacking, meaning the inclusion of covariates in order to obtain a significant result. In addition, the underlying effect (β = 0, 0.5, 1, 1.5), effect heterogeneity, the number of observations in the simulated primary studies (N = 100, 500), and the number of observations for the publication bias tests (K = 100, 1,000) were varied. Results All tests evaluated were able to identify publication bias both in the file-drawer and p-hacking condition. The false positive rates were, with the exception of the 15%- and 20%-caliper tests, unbiased. The FAT had the largest statistical power in the file-drawer conditions, whereas under p-hacking the TES was, except under effect heterogeneity, slightly better. The caliper tests were, however, inferior to the other tests under effect homogeneity and had a decent statistical power only in conditions with 1,000 primary studies. Discussion The FAT is recommended as a test for publication bias in standard meta-analyses with no or only small effect heterogeneity. If two-sided publication bias is suspected, as well as under p-hacking, the TES is the first alternative to the FAT. 
The 5%-caliper test is recommended under conditions of effect heterogeneity and a large number of primary studies, which may be found if publication bias is examined in a discipline-wide setting when primary studies cover different research problems. PMID:29204324

  8. Comparative forensic soil analysis of New Jersey state parks using a combination of simple techniques with multivariate statistics.

    PubMed

    Bonetti, Jennifer; Quarino, Lawrence

    2014-05-01

    This study has shown that the combination of simple techniques with the use of multivariate statistics offers the potential for the comparative analysis of soil samples. Five samples were obtained from each of twelve state parks across New Jersey in both the summer and fall seasons. Each sample was examined using particle-size distribution, pH analysis in both water and 1 M CaCl2 , and a loss on ignition technique. Data from each of the techniques were combined, and principal component analysis (PCA) and canonical discriminant analysis (CDA) were used for multivariate data transformation. Samples from different locations could be visually differentiated from one another using these multivariate plots. Hold-one-out cross-validation analysis showed error rates as low as 3.33%. Ten blind study samples were analyzed resulting in no misclassifications using Mahalanobis distance calculations and visual examinations of multivariate plots. Seasonal variation was minimal between corresponding samples, suggesting potential success in forensic applications. © 2014 American Academy of Forensic Sciences.
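
    The hold-one-out scheme in this abstract can be sketched with a simplified distance classifier: two invented "sites" with two features (stand-ins for pH and loss on ignition), each sample assigned to the nearest class mean under a scaled distance, with one sample held out at a time. The full study uses PCA/CDA and a proper Mahalanobis distance; this is a deliberately reduced version.

```python
import random, statistics

# Two hypothetical sampling sites with distinct (pH, loss-on-ignition) profiles.
random.seed(7)
site_a = [(random.gauss(5.5, 0.2), random.gauss(8.0, 1.0)) for _ in range(30)]
site_b = [(random.gauss(6.8, 0.2), random.gauss(3.0, 1.0)) for _ in range(30)]
data = [(p, "A") for p in site_a] + [(p, "B") for p in site_b]

def classify(point, train):
    """Assign to the nearest class mean, features scaled by the training-set
    std (a diagonal-covariance stand-in for a full Mahalanobis distance)."""
    sd = [statistics.stdev([p[i] for p, _ in train]) for i in (0, 1)]
    best = None
    for label in ("A", "B"):
        grp = [p for p, l in train if l == label]
        mean = [statistics.fmean(p[i] for p in grp) for i in (0, 1)]
        d = sum(((point[i] - mean[i])/sd[i])**2 for i in (0, 1))
        if best is None or d < best[0]:
            best = (d, label)
    return best[1]

# Hold-one-out cross-validation error rate.
errors = sum(classify(p, data[:k] + data[k + 1:]) != label
             for k, (p, label) in enumerate(data))
print(f"hold-one-out error rate: {errors/len(data):.1%}")
```
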

  9. Implicit Physician Biases in Periviability Counseling.

    PubMed

    Shapiro, Natasha; Wachtel, Elena V; Bailey, Sean M; Espiritu, Michael M

    2018-06-01

    To assess whether neonatologists show implicit racial and/or socioeconomic biases and whether these are predictive of recommendations at extreme periviability. A nationwide survey using a clinical vignette of a woman in labor at 23 2/7 weeks of gestation asked physicians how likely they were to recommend intensive vs comfort care. Participants were randomized to 1 of 4 versions of the vignette in which racial and socioeconomic stimuli were varied, followed by 2 implicit association tests (IATs). IATs revealed implicit preferences favoring white (mean IAT score = 0.48, P < .001) and greater socioeconomic status (mean IAT score = 0.73, P < .001). Multivariable linear regression analysis showed that physicians with implicit bias toward greater socioeconomic status were more likely than those without bias to recommend comfort care when presented with a patient of high socioeconomic status (P = .037). No significant effect was seen for implicit racial bias. Building on previous demonstrations of unconscious racial and socioeconomic biases among physicians and their predictive validity, our results suggest that unconscious socioeconomic bias influences recommendations when counseling at the limits of viability. Physicians who display a negative socioeconomic bias are less likely to recommend resuscitation when counseling women of high socioeconomic status. The influence of implicit socioeconomic bias on recommendations at periviability may influence neonatal healthcare disparities and should be explored in future studies. Copyright © 2018 Elsevier Inc. All rights reserved.

  10. Application of multivariable statistical techniques in plant-wide WWTP control strategies analysis.

    PubMed

    Flores, X; Comas, J; Roda, I R; Jiménez, L; Gernaey, K V

    2007-01-01

    The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques make it possible (i) to determine natural groups or clusters of control strategies with similar behaviour, (ii) to find and interpret hidden, complex and causal relationships in the data set and (iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation of complex multicriteria data sets and allows an improved use of information for effective evaluation of control strategies.

  11. Application of multivariate statistical techniques for differentiation of ripe banana flour based on the composition of elements.

    PubMed

    Alkarkhi, Abbas F M; Ramli, Saifullah Bin; Easa, Azhar Mat

    2009-01-01

    Major (sodium, potassium, calcium, magnesium) and minor elements (iron, copper, zinc, manganese) and one heavy metal (lead) of Cavendish banana flour and Dream banana flour were determined, and data were analyzed using multivariate statistical techniques of factor analysis and discriminant analysis. Factor analysis yielded four factors explaining more than 81% of the total variance: the first factor explained 28.73%, comprising magnesium, sodium, and iron; the second factor explained 21.47%, comprising only manganese and copper; the third factor explained 15.66%, comprising zinc and lead; while the fourth factor explained 15.50%, comprising potassium. Discriminant analysis showed that magnesium and sodium exhibited a strong contribution in discriminating the two types of banana flour, affording 100% correct assignation. This study presents the usefulness of multivariate statistical techniques for analysis and interpretation of complex mineral content data from banana flour of different varieties.

  12. Non-Gaussian bias: insights from discrete density peaks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Desjacques, Vincent; Riotto, Antonio; Gong, Jinn-Ouk, E-mail: Vincent.Desjacques@unige.ch, E-mail: jinn-ouk.gong@apctp.org, E-mail: Antonio.Riotto@unige.ch

    2013-09-01

    Corrections induced by primordial non-Gaussianity to the linear halo bias can be computed from a peak-background split or the widespread local bias model. However, numerical simulations clearly support the prediction of the former, in which the non-Gaussian amplitude is proportional to the linear halo bias. To understand better the reasons behind the failure of standard Lagrangian local bias, in which the halo overdensity is a function of the local mass overdensity only, we explore the effect of a primordial bispectrum on the 2-point correlation of discrete density peaks. We show that the effective local bias expansion to peak clustering vastly simplifies the calculation. We generalize this approach to excursion set peaks and demonstrate that the resulting non-Gaussian amplitude, which is a weighted sum of quadratic bias factors, precisely agrees with the peak-background split expectation, which is a logarithmic derivative of the halo mass function with respect to the normalisation amplitude. We point out that statistics of thresholded regions can be computed using the same formalism. Our results suggest that halo clustering statistics can be modelled consistently (in the sense that the Gaussian and non-Gaussian bias factors agree with peak-background split expectations) from a Lagrangian bias relation only if the latter is specified as a set of constraints imposed on the linear density field. This is clearly not the case of standard Lagrangian local bias. Therefore, one is led to consider additional variables beyond the local mass overdensity.

  13. Bias Reduction and Filter Convergence for Long Range Stereo

    NASA Technical Reports Server (NTRS)

    Sibley, Gabe; Matthies, Larry; Sukhatme, Gaurav

    2005-01-01

    We are concerned here with improving long range stereo by filtering image sequences. Traditionally, measurement errors from stereo camera systems have been approximated as 3-D Gaussians, where the mean is derived by triangulation and the covariance by linearized error propagation. However, there are two problems that arise when filtering such 3-D measurements. First, stereo triangulation suffers from a range dependent statistical bias; when filtering this leads to over-estimating the true range. Second, filtering 3-D measurements derived via linearized error propagation leads to apparent filter divergence; the estimator is biased to under-estimate range. To address the first issue, we examine the statistical behavior of stereo triangulation and show how to remove the bias by series expansion. The solution to the second problem is to filter with image coordinates as measurements instead of triangulated 3-D coordinates.
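
    The range-dependent triangulation bias described above follows from Jensen's inequality: range r = f·b/d is convex in the disparity d, so zero-mean disparity noise inflates the expected triangulated range. A small simulation (made-up camera constants, not the authors' setup) demonstrates both the bias and the second-order series prediction of its size:

```python
import numpy as np

rng = np.random.default_rng(2)

f_b = 1000.0                      # focal length x baseline (made-up units)
true_range = 50.0
true_disp = f_b / true_range      # 20 px
sigma = 0.5                       # disparity noise, px

# Triangulate each noisy measurement, then average: biased long,
# because range = f*b/d is convex in the disparity d (Jensen's inequality).
disp = true_disp + rng.normal(0, sigma, 200_000)
mean_range = (f_b / disp).mean()

# Filtering in image (disparity) space instead removes the bias.
range_from_mean_disp = f_b / disp.mean()

# Second-order series expansion predicts the size of the bias:
# E[f b / d] ~ (f b / d0) * (1 + sigma^2 / d0^2)
predicted = true_range * (1 + sigma**2 / true_disp**2)
```

    Averaging in disparity space mirrors the paper's second remedy: filter with image coordinates as measurements rather than triangulated 3-D points.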

  14. Bias correction of satellite precipitation products for flood forecasting application at the Upper Mahanadi River Basin in Eastern India

    NASA Astrophysics Data System (ADS)

    Beria, H.; Nanda, T., Sr.; Chatterjee, C.

    2015-12-01

    High resolution satellite precipitation products such as Tropical Rainfall Measuring Mission (TRMM), Climate Forecast System Reanalysis (CFSR), European Centre for Medium-Range Weather Forecasts (ECMWF), etc., offer a promising alternative for flood forecasting in data-scarce regions. At the current state of the art, these products cannot be used in raw form for flood forecasting, even at smaller lead times. In the current study, these precipitation products are bias corrected using statistical techniques, such as additive and multiplicative bias corrections, and wavelet multi-resolution analysis (MRA), with the India Meteorological Department (IMD) gridded precipitation product obtained from gauge-based rainfall estimates. Neural-network-based rainfall-runoff modeling using these bias corrected products provides encouraging results for flood forecasting up to 48 hours lead time. We will present various statistical and graphical interpretations of catchment response to high rainfall events using both the raw and bias corrected precipitation products at different lead times.
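
    The additive and multiplicative corrections mentioned in the record are simple mean-matching adjustments. A minimal sketch on hypothetical daily series (stand-ins for the satellite and IMD gauge products, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical daily series standing in for the IMD gauge product ("truth")
# and a satellite estimate with additive and multiplicative error.
gauge = rng.gamma(2.0, 4.0, 365)
sat = np.clip(1.4 * gauge + rng.normal(2.0, 1.0, 365), 0, None)

# Additive correction: remove the mean error over the calibration period.
add_corr = sat - (sat.mean() - gauge.mean())

# Multiplicative correction: rescale by the ratio of period means
# (often preferred for precipitation, which is bounded below by zero).
mult_corr = sat * (gauge.mean() / sat.mean())
```

    Both corrections match the period mean exactly by construction; the multiplicative form also preserves zero-rainfall days.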

  15. Unifying quantum heat transfer in a nonequilibrium spin-boson model with full counting statistics

    NASA Astrophysics Data System (ADS)

    Wang, Chen; Ren, Jie; Cao, Jianshu

    2017-02-01

    To study the full counting statistics of quantum heat transfer in a driven nonequilibrium spin-boson model, we develop a generalized nonequilibrium polaron-transformed Redfield equation with an auxiliary counting field. This enables us to study the impact of qubit-bath coupling ranging from weak to strong regimes. Without external modulations, we observe maximal values of both steady-state heat flux and noise power in moderate coupling regimes, below which we find that these two transport quantities are enhanced by the finite qubit energy bias. With external modulations, the geometric-phase-induced heat flux shows a monotonic decrease upon increasing the qubit-bath coupling at zero qubit energy bias (without bias), whereas under finite qubit energy bias (with bias) it exhibits an interesting reversal behavior in the strong coupling regime. Our results unify the seemingly contradictory results in weak and strong qubit-bath coupling regimes and provide detailed dissections for the quantum fluctuation of nonequilibrium heat transfer.

  16. Calibrating genomic and allelic coverage bias in single-cell sequencing.

    PubMed

    Zhang, Cheng-Zhong; Adalsteinsson, Viktor A; Francis, Joshua; Cornils, Hauke; Jung, Joonil; Maire, Cecile; Ligon, Keith L; Meyerson, Matthew; Love, J Christopher

    2015-04-16

    Artifacts introduced in whole-genome amplification (WGA) make it difficult to derive accurate genomic information from single-cell genomes and require different analytical strategies from bulk genome analysis. Here, we describe statistical methods to quantitatively assess the amplification bias resulting from whole-genome amplification of single-cell genomic DNA. Analysis of single-cell DNA libraries generated by different technologies revealed universal features of the genome coverage bias predominantly generated at the amplicon level (1-10 kb). The magnitude of coverage bias can be accurately calibrated from low-pass sequencing (∼0.1×) to predict the depth-of-coverage yield of single-cell DNA libraries sequenced at arbitrary depths. We further provide a benchmark comparison of single-cell libraries generated by multiple displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC). Finally, we develop statistical models to calibrate allelic bias in single-cell whole-genome amplification and demonstrate a census-based strategy for efficient and accurate variant detection from low-input biopsy samples.

  17. Calibrating genomic and allelic coverage bias in single-cell sequencing

    PubMed Central

    Francis, Joshua; Cornils, Hauke; Jung, Joonil; Maire, Cecile; Ligon, Keith L.; Meyerson, Matthew; Love, J. Christopher

    2016-01-01

    Artifacts introduced in whole-genome amplification (WGA) make it difficult to derive accurate genomic information from single-cell genomes and require different analytical strategies from bulk genome analysis. Here, we describe statistical methods to quantitatively assess the amplification bias resulting from whole-genome amplification of single-cell genomic DNA. Analysis of single-cell DNA libraries generated by different technologies revealed universal features of the genome coverage bias predominantly generated at the amplicon level (1–10 kb). The magnitude of coverage bias can be accurately calibrated from low-pass sequencing (~0.1×) to predict the depth-of-coverage yield of single-cell DNA libraries sequenced at arbitrary depths. We further provide a benchmark comparison of single-cell libraries generated by multiple displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC). Finally, we develop statistical models to calibrate allelic bias in single-cell whole-genome amplification and demonstrate a census-based strategy for efficient and accurate variant detection from low-input biopsy samples. PMID:25879913

  18. A Not-So-Fundamental Limitation on Studying Complex Systems with Statistics: Comment on Rabin (2011)

    NASA Astrophysics Data System (ADS)

    Thomas, Drew M.

    2012-12-01

    Although living organisms are affected by many interrelated and unidentified variables, this complexity does not automatically impose a fundamental limitation on statistical inference. Nor need one invoke such complexity as an explanation of the "Truth Wears Off" or "decline" effect; similar "decline" effects occur with far simpler systems studied in physics. Selective reporting and publication bias, and scientists' biases in favor of reporting eye-catching results (in general) or conforming to others' results (in physics) better explain this feature of the "Truth Wears Off" effect than Rabin's suggested limitation on statistical inference.

  19. Discordance between net analyte signal theory and practical multivariate calibration.

    PubMed

    Brown, Christopher D

    2004-08-01

    Lorber's concept of net analyte signal is reviewed in the context of classical and inverse least-squares approaches to multivariate calibration. It is shown that, in the presence of device measurement error, the classical and inverse calibration procedures have radically different theoretical prediction objectives, and the assertion that the popular inverse least-squares procedures (including partial least squares, principal components regression) approximate Lorber's net analyte signal vector in the limit is disproved. Exact theoretical expressions for the prediction error bias, variance, and mean-squared error are given under general measurement error conditions, which reinforce the very discrepant behavior between these two predictive approaches, and Lorber's net analyte signal theory. Implications for multivariate figures of merit and numerous recently proposed preprocessing treatments involving orthogonal projections are also discussed.

  20. Why is "S" a Biased Estimate of [sigma]?

    ERIC Educational Resources Information Center

    Sanqui, Jose Almer T.; Arnholt, Alan T.

    2011-01-01

    This article describes a simulation activity that can be used to help students see that the estimator "S" is a biased estimator of [sigma]. The activity can be implemented using either a statistical package such as R, Minitab, or a Web applet. In the activity, the students investigate and compare the bias of "S" when sampling from different…
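
    The simulation activity the record describes is easy to reproduce; here is a minimal NumPy version (our own sketch, not the article's R/Minitab/applet code) showing that the sample standard deviation S systematically underestimates σ for small n:

```python
import numpy as np
from math import gamma, sqrt

rng = np.random.default_rng(4)
sigma, n, reps = 1.0, 5, 100_000

# Draw many small samples and compute the usual sample standard deviation S.
samples = rng.normal(0.0, sigma, (reps, n))
mean_s = samples.std(axis=1, ddof=1).mean()

# For normal data E[S] = c4(n) * sigma, with c4(5) ~ 0.940, so S
# underestimates sigma even though S^2 is an unbiased estimator of sigma^2.
c4 = sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)
```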

  1. Self-Regulated Learning Strategies in Relation with Statistics Anxiety

    ERIC Educational Resources Information Center

    Kesici, Sahin; Baloglu, Mustafa; Deniz, M. Engin

    2011-01-01

    Dealing with students' attitudinal problems related to statistics is an important aspect of statistics instruction. Employing the appropriate learning strategies may have a relationship with anxiety during the process of statistics learning. Thus, the present study investigated multivariate relationships between self-regulated learning strategies…

  2. EXTENDING MULTIVARIATE DISTANCE MATRIX REGRESSION WITH AN EFFECT SIZE MEASURE AND THE ASYMPTOTIC NULL DISTRIBUTION OF THE TEST STATISTIC

    PubMed Central

    McArtor, Daniel B.; Lubke, Gitta H.; Bergeman, C. S.

    2017-01-01

    Person-centered methods are useful for studying individual differences in terms of (dis)similarities between response profiles on multivariate outcomes. Multivariate distance matrix regression (MDMR) tests the significance of associations of response profile (dis)similarities and a set of predictors using permutation tests. This paper extends MDMR by deriving and empirically validating the asymptotic null distribution of its test statistic, and by proposing an effect size for individual outcome variables, which is shown to recover true associations. These extensions alleviate the computational burden of permutation tests currently used in MDMR and render more informative results, thus making MDMR accessible to new research domains. PMID:27738957

  3. Extending multivariate distance matrix regression with an effect size measure and the asymptotic null distribution of the test statistic.

    PubMed

    McArtor, Daniel B; Lubke, Gitta H; Bergeman, C S

    2017-12-01

    Person-centered methods are useful for studying individual differences in terms of (dis)similarities between response profiles on multivariate outcomes. Multivariate distance matrix regression (MDMR) tests the significance of associations of response profile (dis)similarities and a set of predictors using permutation tests. This paper extends MDMR by deriving and empirically validating the asymptotic null distribution of its test statistic, and by proposing an effect size for individual outcome variables, which is shown to recover true associations. These extensions alleviate the computational burden of permutation tests currently used in MDMR and render more informative results, thus making MDMR accessible to new research domains.
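
    The permutation test whose computational burden MDMR's asymptotic theory alleviates can be sketched as follows. This uses the common McArdle-Anderson pseudo-F formulation on toy data; it is an illustrative implementation, not the paper's code, and the specific statistic may differ in detail from the one the authors analyze.

```python
import numpy as np

def mdmr_pseudo_f(D, X):
    """Pseudo-F for multivariate distance matrix regression in the common
    McArdle-Anderson form: D is an n x n distance matrix, X an n x m
    predictor matrix (intercept added internally)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J                     # Gower-centered matrix
    X1 = np.column_stack([np.ones(n), X])
    H = X1 @ np.linalg.pinv(X1.T @ X1) @ X1.T       # hat matrix
    m = X.shape[1]
    num = np.trace(H @ G @ H) / m
    den = np.trace((np.eye(n) - H) @ G @ (np.eye(n) - H)) / (n - m - 1)
    return num / den

def mdmr_perm_test(D, X, n_perm=199, seed=0):
    """Permutation p-value: shuffle rows of X and recompute pseudo-F."""
    rng = np.random.default_rng(seed)
    f_obs = mdmr_pseudo_f(D, X)
    exceed = sum(mdmr_pseudo_f(D, X[rng.permutation(len(X))]) >= f_obs
                 for _ in range(n_perm))
    return f_obs, (exceed + 1) / (n_perm + 1)

# Toy data with a strong association between x and the outcome profiles.
rng = np.random.default_rng(5)
x = rng.normal(size=(60, 1))
Y = np.hstack([2 * x, -x, 0.3 * rng.normal(size=(60, 3))])
D = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))   # Euclidean
f_obs, p_val = mdmr_perm_test(D, x)
```

    The cost of the loop over permutations is exactly what an asymptotic null distribution for the test statistic avoids.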

  4. Bootstrap imputation with a disease probability model minimized bias from misclassification due to administrative database codes.

    PubMed

    van Walraven, Carl

    2017-04-01

    Diagnostic codes used in administrative databases cause bias due to misclassification of patient disease status. It is unclear which methods minimize this bias. Serum creatinine measures were used to determine severe renal failure status in 50,074 hospitalized patients. The true prevalence of severe renal failure and its association with covariates were measured. These were compared to results for which renal failure status was determined using surrogate measures including the following: (1) diagnostic codes; (2) categorization of probability estimates of renal failure determined from a previously validated model; or (3) bootstrap methods imputation of disease status using model-derived probability estimates. Bias in estimates of severe renal failure prevalence and its association with covariates were minimal when bootstrap methods were used to impute renal failure status from model-based probability estimates. In contrast, biases were extensive when renal failure status was determined using codes or methods in which model-based condition probability was categorized. Bias due to misclassification from inaccurate diagnostic codes can be minimized using bootstrap methods to impute condition status using multivariable model-derived probability estimates. Copyright © 2017 Elsevier Inc. All rights reserved.
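
    The contrast between categorizing model-based probabilities and imputing disease status from them can be sketched as follows (hypothetical probabilities, not the study's renal-failure model):

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical model-derived probabilities of severe renal failure:
# skewed so that most patients are low-risk.
p = rng.beta(0.5, 6.0, 50_000)
true_prev = p.mean()                    # prevalence implied by the model

# Categorizing the probability (e.g. "diseased if p > 0.5") misclassifies
# low-but-nonzero risks as disease-free and underestimates prevalence here.
naive_prev = (p > 0.5).mean()

# Bootstrap-style imputation: repeatedly draw disease status from the
# probabilities and average the quantity of interest over the draws.
B = 200
imputed_prev = (rng.random((B, p.size)) < p).mean()
```

    The same draw-and-average pattern extends to association estimates: fit the covariate model within each imputed replicate and pool the results.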

  5. qFeature

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2015-09-14

    This package contains statistical routines for extracting features from multivariate time-series data which can then be used for subsequent multivariate statistical analysis to identify patterns and anomalous behavior. It calculates local linear or quadratic regression model fits to moving windows for each series and then summarizes the model coefficients across user-defined time intervals for each series. These methods are domain agnostic-but they have been successfully applied to a variety of domains, including commercial aviation and electric power grid data.
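
    The moving-window regression idea can be sketched in a few lines; this is our own illustration of the technique, not the package's actual API (qFeature is an R package, and `window_features` is a hypothetical name):

```python
import numpy as np

def window_features(y, width, step=1):
    """Fit a local linear model y ~ a + b*t over each moving window and
    return one (intercept, slope) row per window."""
    t = np.arange(width) - (width - 1) / 2.0        # centered time index
    A = np.column_stack([np.ones(width), t])
    coefs = []
    for start in range(0, len(y) - width + 1, step):
        b, *_ = np.linalg.lstsq(A, y[start:start + width], rcond=None)
        coefs.append(b)
    return np.array(coefs)

# On a noiseless ramp every window recovers the same slope, and the
# intercept tracks the series value at the window center.
y = 0.5 * np.arange(100) + 3.0
feats = window_features(y, width=11)
```

    Summaries of these per-window coefficients over longer intervals (means, extremes) then become the features fed to downstream multivariate analysis.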

  6. Asymptotic Distribution of the Likelihood Ratio Test Statistic for Sphericity of Complex Multivariate Normal Distribution.

    DTIC Science & Technology

    1981-08-01

    RATIO TEST STATISTIC FOR SPHERICITY OF COMPLEX MULTIVARIATE NORMAL DISTRIBUTION* C. Fang P. R. Krishnaiah B. N. Nagarsenker** August 1981 Technical...and their applications in time series, the reader is referred to Krishnaiah (1976). Motivated by the applications in the area of inference on multiple...for practical purposes. Here, we note that Krishnaiah, Lee and Chang (1976) approximated the null distribution of certain power of the likeli

  7. Implications of the methodological choices for hydrologic portrayals of climate change over the contiguous United States: Statistically downscaled forcing data and hydrologic models

    USGS Publications Warehouse

    Mizukami, Naoki; Clark, Martyn P.; Gutmann, Ethan D.; Mendoza, Pablo A.; Newman, Andrew J.; Nijssen, Bart; Livneh, Ben; Hay, Lauren E.; Arnold, Jeffrey R.; Brekke, Levi D.

    2016-01-01

    Continental-domain assessments of climate change impacts on water resources typically rely on statistically downscaled climate model outputs to force hydrologic models at a finer spatial resolution. This study examines the effects of four statistical downscaling methods [bias-corrected constructed analog (BCCA), bias-corrected spatial disaggregation applied at daily (BCSDd) and monthly scales (BCSDm), and asynchronous regression (AR)] on retrospective hydrologic simulations using three hydrologic models with their default parameters (the Community Land Model, version 4.0; the Variable Infiltration Capacity model, version 4.1.2; and the Precipitation–Runoff Modeling System, version 3.0.4) over the contiguous United States (CONUS). Biases of hydrologic simulations forced by statistically downscaled climate data relative to the simulation with observation-based gridded data are presented. Each statistical downscaling method produces different meteorological portrayals including precipitation amount, wet-day frequency, and the energy input (i.e., shortwave radiation), and their interplay affects estimations of precipitation partitioning between evapotranspiration and runoff, extreme runoff, and hydrologic states (i.e., snow and soil moisture). The analyses show that BCCA underestimates annual precipitation by as much as −250 mm, leading to unreasonable hydrologic portrayals over the CONUS for all models. Although the other three statistical downscaling methods produce a comparable precipitation bias ranging from −10 to 8 mm across the CONUS, BCSDd severely overestimates the wet-day fraction by up to 0.25, leading to different precipitation partitioning compared to the simulations with other downscaled data. Overall, the choice of downscaling method contributes to less spread in runoff estimates (by a factor of 1.5–3) than the choice of hydrologic model with use of the default parameters if BCCA is excluded.

  8. Effects of freezing on white perch Morone americana (Gmelin, 1789): Implications for multivariate morphometrics

    USGS Publications Warehouse

    Kocovsky, Patrick

    2016-01-01

    This study tested the hypothesis that duration of freezing differentially affects whole-body morphometrics of a derived teleost. Whole-body morphometrics are frequently analyzed to test hypotheses of different species, or stocks within a species, of fishes. Specimens used for morphometric analyses are typically fixed or preserved prior to analysis, yet little research has been done on how fixation or preservation methods or duration of preservation of specimens might affect outcomes of multivariate statistical analyses of differences in shape. To determine whether whole-body morphometrics changed as a result of freezing, 23 whole-body morphometrics of age-1 white perch (Morone americana) from western Lake Erie (n = 211) were analyzed immediately after capture, after being held on ice overnight, and after freezing for 100 or 200 days. Discriminant function analysis revealed that all four groups differed significantly from one another (P < 0.0001). The first canonical axis reflected long-axis morphometrics, where there was a clear pattern of positive translation along this axis with duration of preservation. Re-classification analysis demonstrated fish were typically assigned to their original preservation class except for fish frozen 100 days, which assigned mostly to frozen 200 days. Morphometric comparisons using frozen fish must be done on fish frozen for identical periods of time to avoid biases related to the length of time they were frozen. Similar experiments should be conducted on other species and also using formalin- and alcohol-preserved specimens.

  9. Policy Safeguards and the Legitimacy of Highway Interdiction

    DTIC Science & Technology

    2016-12-01

    Table-of-contents excerpts: B. Bias within Law Enforcement; C. Statistical Data Gathering; 3. Controlling Discretion; 4. Statistical Data Collection for Traffic Stops; A. Description of Statistical Data Collected; B. Data Organization and Analysis

  10. Biased relevance filtering in the auditory system: A test of confidence-weighted first-impressions.

    PubMed

    Mullens, D; Winkler, I; Damaso, K; Heathcote, A; Whitson, L; Provost, A; Todd, J

    2016-03-01

    Although first-impressions are known to impact decision-making and to have prolonged effects on reasoning, it is less well known that the same type of rapidly formed assumptions can explain biases in automatic relevance filtering outside of deliberate behavior. This paper features two studies in which participants were asked to ignore sequences of sound while focusing attention on a silent movie. The sequences consisted of blocks, each with a high-probability repetition interrupted by rare acoustic deviations (i.e., a sound of different pitch or duration). The probabilities of the two different sounds alternated across the concatenated blocks within the sequence (i.e., short-to-long and long-to-short). The sound probabilities are rapidly and automatically learned for each block and a perceptual inference is formed predicting the most likely characteristics of the upcoming sound. Deviations elicit a prediction-error signal known as mismatch negativity (MMN). Computational models of MMN generally assume that its elicitation is governed by transition statistics that define what sound attributes are most likely to follow the current sound. MMN amplitude reflects prediction confidence, which is derived from the stability of the current transition statistics. However, our prior research showed that MMN amplitude is modulated by a strong first-impression bias that outweighs transition statistics. Here we test the hypothesis that this bias can be attributed to assumptions about predictable vs. unpredictable nature of each tone within the first encountered context, which is weighted by the stability of that context. The results of Study 1 show that this bias is initially prevented if there is no 1:1 mapping between sound attributes and probability, but it returns once the auditory system determines which properties provide the highest predictive value. 
The results of Study 2 show that confidence in the first-impression bias drops if assumptions about the temporal stability of the transition-statistics are violated. Both studies provide compelling evidence that the auditory system extrapolates patterns on multiple timescales to adjust its response to prediction-errors, while profoundly distorting the effects of transition-statistics by the assumptions formed on the basis of first-impressions. Copyright © 2016 Elsevier B.V. All rights reserved.

  11. Fractures in women treated with raloxifene or alendronate: a retrospective database analysis

    PubMed Central

    2013-01-01

    Background Raloxifene and alendronate are anti-resorptive therapies approved for the prevention and treatment of postmenopausal osteoporosis. Raloxifene is also indicated to reduce the risk of invasive breast cancer in postmenopausal women with osteoporosis and in postmenopausal women at high risk of invasive breast cancer. A definitive study comparing the fracture effectiveness and rate of breast cancer for raloxifene and alendronate has not been published. The purpose of this retrospective cohort study was to evaluate fracture and breast cancer rates among patients treated with raloxifene or alendronate. Methods Females ≥45 years who initiated raloxifene or alendronate in 1998–2006 Truven Health Analytics MarketScan® Databases, had continuous enrollment 12 months prior to and at least 12 months after the index date, and had a treatment medication possession ratio ≥80% were included in this study. Rates of vertebral and nonvertebral fractures and breast cancer during 1, 3, 5, 6, 7, and 8 years of treatment with raloxifene or alendronate were evaluated. Fracture rates were adjusted for potential treatment bias using inverse probability of treatment weights. Multivariate hazard ratios were estimated for vertebral and nonvertebral fractures. Results Raloxifene patients had statistically significantly lower rates of vertebral fractures in 1, 3, 5, and 7 years and for nonvertebral fractures in 1 and 5 years. There were no statistically significant differences in the adjusted fracture rates between raloxifene and alendronate cohorts, except in the 3-year nonvertebral fracture rates where raloxifene was higher. Multivariate hazard ratios of raloxifene versus alendronate cohorts were not significantly different for vertebral and nonvertebral fracture in 1, 3, 5, 6, 7, and 8 years. Unweighted and weighted breast cancer rates were lower among raloxifene recipients. 
Conclusions Patients treated with alendronate and raloxifene had similar adjusted fracture rates in up to 8 years of adherent treatment, and raloxifene patients had lower breast cancer rates. PMID:23521803

  12. Self-testing of vaginal pH to prevent preterm delivery: a controlled trial.

    PubMed

    Bitzer, Eva-Maria; Schneider, Andrea; Wenzlaff, Paul; Hoyme, Udo B; Siegmund-Schultze, Elisabeth

    2011-02-01

    From 2004 to 2006, in a model project carried out by four German health insurers, expectant mothers were offered self-testing of vaginal pH in order to prevent preterm delivery. They were given pH test gloves on request so that they could measure their vaginal pH twice a week from the 12th to the 32nd week of gestation. They were instructed to consult with a gynecologist after any positive result. All further diagnostic or therapeutic decisions were at the discretion of the treating gynecologist. We assessed the effectiveness of the screening intervention, using delivery before the 37th week of gestation as the primary endpoint. In this prospective, controlled trial, we collected data on deliveries from 2004 to 2006 that were covered by the four participating insurers in five German federal states. We compared the outcomes of pregnancy in women who did and did not request test gloves (intervention group, [IG], and control group, [CG]). The data were derived from claims data of the participating insurers, as well as from a nationwide quality assurance auditing program for obstetrics and perinatal care. Propensity score matching and multivariate adjustment were used to control for the expected self-selection bias. The study sample comprised 149 082 deliveries. 13% of the expectant mothers requested test gloves, about half of them up to the 16th week of gestation. As expected, women with an elevated risk of preterm birth requested test gloves more often. Delivery before the 37th week of gestation was slightly more common in the intervention group than in the control group (IG 7.97%, CG 7.52%, relative risk 1.06, 95% confidence interval 1.00-1.12). This result was of borderline statistical significance in the propensity score matched analysis, but it was not statistically significant in the multivariate model. This trial did not demonstrate the efficacy of self-testing of vaginal pH for the prevention of preterm delivery (< 37 weeks of gestation).

  13. Fresh Biomass Estimation in Heterogeneous Grassland Using Hyperspectral Measurements and Multivariate Statistical Analysis

    NASA Astrophysics Data System (ADS)

    Darvishzadeh, R.; Skidmore, A. K.; Mirzaie, M.; Atzberger, C.; Schlerf, M.

    2014-12-01

    Accurate estimation of grassland biomass at peak productivity can provide crucial information regarding the functioning and productivity of rangelands. Hyperspectral remote sensing has proved valuable for estimating vegetation biophysical parameters such as biomass using different statistical techniques. However, in statistical analysis of hyperspectral data, multicollinearity is a common problem because of the large number of correlated hyperspectral reflectance measurements. The aim of this study was to examine the prospects for above-ground biomass estimation in a heterogeneous Mediterranean rangeland using multivariate calibration methods. Canopy spectral measurements were made in the field with a GER 3700 spectroradiometer, along with concomitant in situ measurements of above-ground biomass for 170 sample plots. Multivariate calibration methods including partial least squares regression (PLSR), principal component regression (PCR), and least-squares support vector machine (LS-SVM) were used to estimate above-ground biomass. The prediction accuracy of the multivariate calibration methods was assessed using cross-validated R2 and RMSE. The best model performance was obtained with LS-SVM, followed by PLSR, both calibrated with the first-derivative reflectance dataset (R2cv = 0.88 and 0.86, RMSEcv = 1.15 and 1.07, respectively). The weakest prediction accuracy appeared when PCR was used (R2cv = 0.31 and RMSEcv = 2.48). These results highlight the importance of multivariate calibration methods for biomass estimation when hyperspectral data are used.
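    The cross-validated R2 and RMSE used above to rank the calibration methods can be illustrated with a minimal leave-one-out sketch. This is a hypothetical one-predictor example (a single vegetation index against biomass), not the study's 170-plot hyperspectral calibration:

```python
import math

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def loo_cv(x, y):
    """Leave-one-out cross-validation for a simple linear model y = a + b*x;
    returns cross-validated R2 and RMSE."""
    preds = []
    for i in range(len(x)):
        xs = [v for j, v in enumerate(x) if j != i]
        ys = [v for j, v in enumerate(y) if j != i]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        b = sum((u - mx) * (w - my) for u, w in zip(xs, ys)) / sum((u - mx) ** 2 for u in xs)
        preds.append((my - b * mx) + b * x[i])  # predict the held-out point
    return r2(y, preds), rmse(y, preds)

# Hypothetical data: one vegetation index vs. fresh biomass
vi = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
biomass = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.2]
r2cv, rmsecv = loo_cv(vi, biomass)
```

    In practice the study's methods (PLSR, PCR, LS-SVM) replace the simple linear fit, but the scoring of held-out predictions works the same way.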

  14. GAISE 2016 Promotes Statistical Literacy

    ERIC Educational Resources Information Center

    Schield, Milo

    2017-01-01

    In the 2005 Guidelines for Assessment and Instruction in Statistics Education (GAISE), statistical literacy featured as a primary goal. The 2016 revision eliminated statistical literacy as a stated goal. Although this looks like a rejection, this paper argues that by including multivariate thinking and--more importantly--confounding as recommended…

  15. Accuracy and the Effect of Possible Subject-Based Confounders of Magnitude-Based MRI for Estimating Hepatic Proton Density Fat Fraction in Adults, Using MR Spectroscopy as Reference

    PubMed Central

    Heba, Elhamy R.; Desai, Ajinkya; Zand, Kevin A.; Hamilton, Gavin; Wolfson, Tanya; Schlein, Alexandra N.; Gamst, Anthony; Loomba, Rohit; Sirlin, Claude B.; Middleton, Michael S.

    2016-01-01

    Purpose To determine the accuracy and the effect of possible subject-based confounders of magnitude-based magnetic resonance imaging (MRI) for estimating hepatic proton density fat fraction (PDFF) for different numbers of echoes in adults with known or suspected nonalcoholic fatty liver disease, using MR spectroscopy (MRS) as a reference. Materials and Methods In this retrospective analysis of 506 adults, hepatic PDFF was estimated by unenhanced 3.0T MRI, using right-lobe MRS as reference. Regions of interest placed on source images and on six-echo parametric PDFF maps were colocalized to MRS voxel location. Accuracy using different numbers of echoes was assessed by regression and Bland–Altman analysis; slope, intercept, average bias, and R2 were calculated. The effect of age, sex, and body mass index (BMI) on hepatic PDFF accuracy was investigated using multivariate linear regression analyses. Results MRI closely agreed with MRS for all tested methods. For three- to six-echo methods, slope, regression intercept, average bias, and R2 were 1.01–0.99, 0.11–0.62%, 0.24–0.56%, and 0.981–0.982, respectively. Slope was closest to unity for the five-echo method. The two-echo method was least accurate, underestimating PDFF by an average of 2.93%, compared to an average of 0.23–0.69% for the other methods. Statistically significant but clinically nonmeaningful effects on PDFF error were found for subject BMI (P range: 0.0016 to 0.0783), male sex (P range: 0.015 to 0.037), and no statistically significant effect was found for subject age (P range: 0.18–0.24). Conclusion Hepatic magnitude-based MRI PDFF estimates using three, four, five, and six echoes, and six-echo parametric maps are accurate compared to reference MRS values, and that accuracy is not meaningfully confounded by age, sex, or BMI. PMID:26201284

  16. Do state-of-the-art CMIP5 ESMs accurately represent observed vegetation-rainfall feedbacks? Focus on the Sahel

    NASA Astrophysics Data System (ADS)

    Notaro, M.; Wang, F.; Yu, Y.; Mao, J.; Shi, X.; Wei, Y.

    2017-12-01

    The semi-arid Sahel ecoregion is an established hotspot of land-atmosphere coupling. Ocean-land-atmosphere interactions received considerable attention from modeling studies in response to the devastating 1970s-90s Sahel drought, which models suggest was driven by sea-surface temperature (SST) anomalies and amplified by local vegetation-atmosphere feedbacks. Vegetation affects the atmosphere through biophysical feedbacks by altering the albedo, roughness, and transpiration and thereby modifying exchanges of energy, momentum, and moisture with the atmosphere. The current understanding of these potentially competing processes is primarily based on modeling studies, with biophysical feedbacks serving as a key uncertainty source in regional climate change projections among Earth System Models (ESMs). In order to reduce this uncertainty, it is critical to rigorously evaluate the representation of vegetation feedbacks in ESMs against an observational benchmark in order to diagnose systematic biases and their sources. However, it is challenging to successfully isolate vegetation's feedbacks on the atmosphere, since the atmospheric control on vegetation growth dominates the atmospheric feedback response to vegetation anomalies and the atmosphere is simultaneously influenced by oceanic and terrestrial anomalies. In response to this challenge, a model-validated multivariate statistical method, Stepwise Generalized Equilibrium Feedback Assessment (SGEFA), is developed, which extracts the forcing of a slowly-evolving environmental variable [e.g. SST or leaf area index (LAI)] on the rapidly-evolving atmosphere. By applying SGEFA to observational and remotely-sensed data, an observational benchmark is established for Sahel vegetation feedbacks. 
In this work, the simulated responses in key atmospheric variables, including evapotranspiration, albedo, wind speed, vertical motion, temperature, stability, and rainfall, to Sahel LAI anomalies are statistically assessed in Coupled Model Intercomparison Project Phase 5 (CMIP5) ESMs through SGEFA. The dominant mechanism, such as albedo feedback, moisture recycling, or momentum feedback, in each ESM is evaluated against the observed benchmark. SGEFA facilitates a systematic assessment of model biases in land-atmosphere interactions.

  17. The heterogeneity statistic I² can be biased in small meta-analyses.

    PubMed

    von Hippel, Paul T

    2015-04-14

    Estimated effects vary across studies, partly because of random sampling error and partly because of heterogeneity. In meta-analysis, the fraction of variance that is due to heterogeneity is estimated by the statistic I². We calculate the bias of I², focusing on the situation where the number of studies in the meta-analysis is small. Small meta-analyses are common; in the Cochrane Library, the median number of studies per meta-analysis is 7 or fewer. We use Mathematica software to calculate the expectation and bias of I². I² has a substantial bias when the number of studies is small. The bias is positive when the true fraction of heterogeneity is small, but the bias is typically negative when the true fraction of heterogeneity is large. For example, with 7 studies and no true heterogeneity, I² will overestimate heterogeneity by an average of 12 percentage points, but with 7 studies and 80 percent true heterogeneity, I² can underestimate heterogeneity by an average of 28 percentage points. Biases of 12-28 percentage points are not trivial when one considers that, in the Cochrane Library, the median I² estimate is 21 percent. The point estimate I² should be interpreted cautiously when a meta-analysis has few studies. In small meta-analyses, confidence intervals should supplement or replace the biased point estimate I².
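    The statistic in question is derived from Cochran's Q. A minimal sketch, with hypothetical effect sizes and variances (and the standard truncation of negative values to zero, which is one source of the small-sample bias discussed above):

```python
def i_squared(effects, variances):
    """Higgins' I-squared (in percent) from study effects and within-study variances."""
    w = [1.0 / v for v in variances]                                   # inverse-variance weights
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)         # fixed-effect pooled estimate
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))       # Cochran's Q
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0            # truncate at 0

# Hypothetical meta-analysis of 7 studies with substantial heterogeneity
effects = [0.0, 0.6, 0.3, -0.2, 0.8, 0.4, 0.1]
variances = [0.02] * 7
i2 = i_squared(effects, variances)
```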

  18. Southeast Atlantic Cloud Properties in a Multivariate Statistical Model - How Relevant is Air Mass History for Local Cloud Properties?

    NASA Astrophysics Data System (ADS)

    Fuchs, Julia; Cermak, Jan; Andersen, Hendrik

    2017-04-01

    This study aims at untangling the impacts of external dynamics and local conditions on cloud properties in the Southeast Atlantic (SEA) by combining satellite and reanalysis data using multivariate statistics. The understanding of clouds and their determinants at different scales is important for constraining the Earth's radiative budget, and thus prominent in climate-system research. In this study, SEA stratocumulus cloud properties are observed not only as the result of local environmental conditions but also as affected by external dynamics and spatial origins of air masses entering the study area. In order to assess to what extent cloud properties are impacted by aerosol concentration, air mass history, and meteorology, a multivariate approach is conducted using satellite observations of aerosol and cloud properties (MODIS, SEVIRI), information on aerosol species composition (MACC) and meteorological context (ERA-Interim reanalysis). To account for the often-neglected but important role of air mass origin, information on air mass history based on HYSPLIT modeling is included in the statistical model. This multivariate approach is intended to lead to a better understanding of the physical processes behind observed stratocumulus cloud properties in the SEA.

  19. Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics

    PubMed Central

    Girshick, Ahna R.; Landy, Michael S.; Simoncelli, Eero P.

    2011-01-01

    Humans are remarkably good at performing visual tasks, but experimental measurements reveal substantial biases in the perception of basic visual attributes. An appealing hypothesis is that these biases arise through a process of statistical inference, in which information from noisy measurements is fused with a probabilistic model of the environment. But such inference is optimal only if the observer’s internal model matches the environment. Here, we provide evidence that this is the case. We measured performance in an orientation-estimation task, demonstrating the well-known fact that orientation judgements are more accurate at cardinal (horizontal and vertical) orientations, along with a new observation that judgements made under conditions of uncertainty are strongly biased toward cardinal orientations. We estimate observers’ internal models for orientation and find that they match the local orientation distribution measured in photographs. We also show how a neural population could embed probabilistic information responsible for such biases. PMID:21642976

  20. Investigation of Particle Sampling Bias in the Shear Flow Field Downstream of a Backward Facing Step

    NASA Technical Reports Server (NTRS)

    Meyers, James F.; Kjelgaard, Scott O.; Hepner, Timothy E.

    1990-01-01

    The flow field about a backward facing step was investigated to determine the characteristics of particle sampling bias in the various flow phenomena. The investigation used the calculation of the velocity:data rate correlation coefficient as a measure of statistical dependence and thus the degree of velocity bias. While the investigation found negligible dependence within the free stream region, increased dependence was found within the boundary and shear layers. Full classic correction techniques over-compensated the data since the dependence was weak, even in the boundary layer and shear regions. The paper emphasizes the necessity to determine the degree of particle sampling bias for each measurement ensemble and not use generalized assumptions to correct the data. Further, it recommends the calculation of the velocity:data rate correlation coefficient become a standard statistical calculation in the analysis of all laser velocimeter data.
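    The recommended velocity:data rate correlation coefficient is an ordinary Pearson correlation between each measured velocity and the instantaneous data rate at which it was acquired. A sketch with a hypothetical ensemble in which data rate tracks velocity, so the coefficient comes out large and sampling bias would be indicated:

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical laser velocimeter ensemble: velocities (m/s) and data rates (Hz)
velocity = [10.2, 11.5, 9.8, 12.1, 10.9, 11.8, 9.5, 12.4]
data_rate = [105, 118, 99, 124, 111, 120, 96, 127]
r = pearson_r(velocity, data_rate)
```

    A coefficient near zero, as found in the free stream here, would suggest negligible velocity bias; a large magnitude, as in the shear layer, indicates statistical dependence.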

  1. Integrated GIS and multivariate statistical analysis for regional scale assessment of heavy metal soil contamination: A critical review.

    PubMed

    Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan

    2017-12-01

    Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0-0.10 m, or 0-0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km², with a median of 0.4 samples per km². The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. 
It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis (PCA) and cluster analysis (CA). Copyright © 2017 Elsevier Ltd. All rights reserved.
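    Inverse distance weighted interpolation, one of the two interpolators the review identifies as most widely used, can be sketched in a few lines. The sample coordinates and Pb concentrations below are hypothetical:

```python
def idw(sample_points, query, power=2):
    """Inverse-distance-weighted estimate of a concentration at a query point.

    sample_points: list of ((x, y), value) tuples; query: (x, y) coordinates.
    """
    num = den = 0.0
    for (x, y), value in sample_points:
        d2 = (x - query[0]) ** 2 + (y - query[1]) ** 2
        if d2 == 0:
            return value  # query coincides with a sampled location
        w = 1.0 / d2 ** (power / 2)  # weight falls off as distance^(-power)
        num += w * value
        den += w
    return num / den

# Hypothetical Pb concentrations (mg/kg) at four sampled locations (km coordinates)
samples = [((0.0, 0.0), 20.0), ((1.0, 0.0), 40.0), ((0.0, 1.0), 30.0), ((1.0, 1.0), 50.0)]
pb_est = idw(samples, (0.5, 0.5))
```

    At the centre of this symmetric layout all weights are equal, so the estimate is simply the mean of the four samples.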

  2. Resemblance profiles as clustering decision criteria: Estimating statistical power, error, and correspondence for a hypothesis test for multivariate structure.

    PubMed

    Kilborn, Joshua P; Jones, David L; Peebles, Ernst B; Naar, David F

    2017-04-01

    Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis testing-based approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine the performance variability based on the algorithm's assumptions or any underlying data structures. Here, we use simulation studies to estimate the statistical error rates for the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA) to estimate the proficiency of clustering with DISPROF as a decision criterion. We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, and grouped data with increasing overlap, overdispersion for ecological data, and correlation among descriptors within groups. Using simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference grouping partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls during the application of methods and the interpretation of results.

  3. Learning investment indicators through data extension

    NASA Astrophysics Data System (ADS)

    Dvořák, Marek

    2017-07-01

    Stock prices in the form of time series were analysed using univariate and multivariate statistical methods. After simple data preprocessing in the form of logarithmic differences, the univariate time series was augmented into a multivariate representation. The method uses sliding windows to calculate several dozen new variables with simple statistics, such as first and second moments, as well as more complicated statistics, such as autoregression coefficients and residual analysis, followed by an optional quadratic transformation used for further data extension. These served as explanatory variables in a regularized logistic LASSO regression that estimated the Buy-Sell Index (BSI) from real stock market data.
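    The preprocessing and window-augmentation steps can be sketched as follows. The prices are hypothetical, and the three features per window (mean, standard deviation, lag-1 autocorrelation) stand in for the several dozen variables the paper describes:

```python
import math
import statistics

def log_returns(prices):
    """Logarithmic differences of a price series."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def window_features(series, width):
    """Augment a univariate series into per-window feature vectors:
    (mean, std, lag-1 autocorrelation) over each sliding window."""
    rows = []
    for i in range(len(series) - width + 1):
        w = series[i:i + width]
        mu = statistics.fmean(w)
        sd = statistics.pstdev(w)
        num = sum((a - mu) * (b - mu) for a, b in zip(w, w[1:]))
        den = sum((a - mu) ** 2 for a in w)
        ac1 = num / den if den > 0 else 0.0
        rows.append((mu, sd, ac1))
    return rows

# Hypothetical daily closing prices
prices = [100, 101, 99, 102, 104, 103, 105, 107, 106, 108]
feats = window_features(log_returns(prices), width=5)
```

    The resulting feature rows would then feed a regularized (LASSO) logistic regression as explanatory variables.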

  4. Sympathetic bias.

    PubMed

    Levy, David M; Peart, Sandra J

    2008-06-01

    We wish to deal with investigator bias in a statistical context. We sketch how a textbook solution to the problem of "outliers", which avoids one sort of investigator bias, creates the temptation for another sort. We write down a model of the approbation-seeking statistician who is tempted by sympathy for the client to violate disciplinary standards. We give a simple account of one context in which we might expect investigator bias to flourish. Finally, we offer tentative suggestions, following from our account, to deal with the problem of investigator bias. As we have given a very sparse and stylized account of investigator bias, we ask what might be done to overcome this limitation.

  5. Validation of satellite-based rainfall in Kalahari

    NASA Astrophysics Data System (ADS)

    Lekula, Moiteela; Lubczynski, Maciek W.; Shemang, Elisha M.; Verhoef, Wouter

    2018-06-01

    Water resources management in arid and semi-arid areas is hampered by insufficient rainfall data, typically obtained from sparsely distributed rain gauges. Satellite-based rainfall estimates (SREs) are alternative sources of such data in these areas. In this study, daily rainfall estimates from FEWS-RFE∼11 km, TRMM-3B42∼27 km, CMORPH∼27 km and CMORPH∼8 km were evaluated against nine daily rain gauge records in the Central Kalahari Basin (CKB) over a five-year period, 01/01/2001-31/12/2005. The aims were to evaluate the daily rainfall detection capabilities of the four SRE algorithms, analyze the spatio-temporal variability of rainfall in the CKB and perform bias correction of the four SREs. Evaluation methods included scatter plot analysis, descriptive statistics, categorical statistics and bias decomposition. The spatio-temporal variability of rainfall was assessed using the SREs' mean annual rainfall, standard deviation, coefficient of variation and spatial correlation functions. Bias correction of the four SREs was conducted using a Time-Varying Space-Fixed (TVSF) bias-correction scheme. The results underlined the importance of validating daily SREs, as they had different rainfall detection capabilities in the CKB. The FEWS-RFE∼11 km performed best, providing better descriptive and categorical statistics than the other three SREs, although bias decomposition showed that all SREs underestimated rainfall. The analysis showed that the most reliable indicators of SRE performance were the frequency of "miss" rainfall events and the "miss bias", as they directly indicate an SRE's sensitivity and bias of rainfall detection, respectively. The TVSF bias-correction scheme improved some error measures but reduced the spatial correlation distance, thus increasing the already high spatial rainfall variability of all four SREs. 
    This study highlighted SREs as a valuable source of daily rainfall data, providing good spatio-temporal coverage especially suitable for areas with sparse rain gauges, such as the CKB, but it also emphasized the SREs' drawbacks, creating an avenue for follow-up research.
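    The categorical statistics behind the "miss" frequency indicator reduce to a daily contingency table of hits, misses and false alarms. A minimal sketch with hypothetical gauge and SRE series and an assumed 1 mm/day event threshold (the study's actual threshold may differ):

```python
def categorical_stats(gauge, sre, threshold=1.0):
    """Contingency-table scores for daily rainfall detection.

    An event is a day with rainfall >= threshold (mm). Returns probability of
    detection (POD), false alarm ratio (FAR), and frequency bias.
    """
    hits = misses = false_alarms = 0
    for g, s in zip(gauge, sre):
        g_ev, s_ev = g >= threshold, s >= threshold
        if g_ev and s_ev:
            hits += 1
        elif g_ev and not s_ev:
            misses += 1          # gauge rained, satellite missed it
        elif s_ev and not g_ev:
            false_alarms += 1    # satellite rained, gauge did not
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    fbias = (hits + false_alarms) / (hits + misses)
    return pod, far, fbias

# Hypothetical daily series (mm): rain gauge vs. satellite rainfall estimate
gauge = [0.0, 5.2, 0.0, 12.0, 0.3, 8.1, 0.0, 2.5]
sre = [0.0, 4.0, 1.5, 10.5, 0.0, 0.2, 0.0, 3.1]
pod, far, fbias = categorical_stats(gauge, sre)
```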

  6. Verify MesoNAM Performance

    NASA Technical Reports Server (NTRS)

    Bauman, William H., III

    2010-01-01

    The AMU conducted an objective analysis of the MesoNAM forecasts compared to observed values from sensors at specified KSC/CCAFS wind towers by calculating the following statistics to verify the performance of the model: 1) Bias (mean difference), 2) Standard deviation of Bias, 3) Root Mean Square Error (RMSE), and 4) Hypothesis test for Bias = 0. The 45 WS LWOs use the MesoNAM to support launch weather operations. However, the actual performance of the model at KSC and CCAFS had not been measured objectively. The analysis compared the MesoNAM forecast winds, temperature (T) and dew point (Td) to the observed values from the sensors on wind towers. The data were stratified by tower sensor, month and onshore/offshore wind direction based on the orientation of the coastline to each tower's location. The model's performance statistics were then calculated for each wind tower based on sensor height and model initialization time. The period of record for the data used in this task was based on the operational start of the current MesoNAM in mid-August 2006, so the task began with the first full month of data, September 2006, and ran through May 2010. The analysis of model performance indicated: a) The accuracy decreased as the forecast valid time from the model initialization increased, b) There was a diurnal signal in T with a cool bias during the late night and a warm bias during the afternoon, c) There was a diurnal signal in Td with a low bias during the afternoon and a high bias during the late night, and d) The model parameters at each vertical level most closely matched the observed parameters at heights closest to those vertical levels. The AMU developed a GUI that consists of a multi-level drop-down menu written in JavaScript embedded within the HTML code. This tool allows the LWO to easily and efficiently navigate among the charts and spreadsheet files containing the model performance statistics. 
    The objective statistics give the LWOs knowledge of the model's strengths and weaknesses, and the GUI allows quick access to the data, which will result in improved forecasts for operations.
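    The first three verification statistics reduce to a few lines of code; a sketch with hypothetical forecast/observed temperature pairs (the hypothesis test for bias = 0 is omitted):

```python
import math
import statistics

def verification_stats(forecast, observed):
    """Bias (mean difference), standard deviation of the bias, and RMSE."""
    diffs = [f - o for f, o in zip(forecast, observed)]
    bias = statistics.fmean(diffs)                         # mean forecast - observed
    sd = statistics.stdev(diffs)                           # spread of the differences
    rmse = math.sqrt(statistics.fmean(d * d for d in diffs))
    return bias, sd, rmse

# Hypothetical forecast vs. tower-observed temperatures (deg C)
fcst = [25.1, 26.3, 24.8, 27.0, 25.5]
obs = [24.6, 26.0, 25.2, 26.1, 25.0]
bias, sd, rmse = verification_stats(fcst, obs)
```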

  7. Publication Bias in Antipsychotic Trials: An Analysis of Efficacy Comparing the Published Literature to the US Food and Drug Administration Database

    PubMed Central

    Turner, Erick H.; Knoepflmacher, Daniel; Shapley, Lee

    2012-01-01

    Background Publication bias compromises the validity of evidence-based medicine, yet a growing body of research shows that this problem is widespread. Efficacy data from drug regulatory agencies, e.g., the US Food and Drug Administration (FDA), can serve as a benchmark or control against which data in journal articles can be checked. Thus one may determine whether publication bias is present and quantify the extent to which it inflates apparent drug efficacy. Methods and Findings FDA Drug Approval Packages for eight second-generation antipsychotics—aripiprazole, iloperidone, olanzapine, paliperidone, quetiapine, risperidone, risperidone long-acting injection (risperidone LAI), and ziprasidone—were used to identify a cohort of 24 FDA-registered premarketing trials. The results of these trials according to the FDA were compared with the results conveyed in corresponding journal articles. The relationship between study outcome and publication status was examined, and effect sizes derived from the two data sources were compared. Among the 24 FDA-registered trials, four (17%) were unpublished. Of these, three failed to show that the study drug had a statistical advantage over placebo, and one showed the study drug was statistically inferior to the active comparator. Among the 20 published trials, the five that were not positive, according to the FDA, showed some evidence of outcome reporting bias. However, the association between trial outcome and publication status did not reach statistical significance. Further, the apparent increase in the effect size point estimate due to publication bias was modest (8%) and not statistically significant. On the other hand, the effect size for unpublished trials (0.23, 95% confidence interval 0.07 to 0.39) was less than half that for the published trials (0.47, 95% confidence interval 0.40 to 0.54), a difference that was significant. 
Conclusions The magnitude of publication bias found for antipsychotics was less than that found previously for antidepressants, possibly because antipsychotics demonstrate superiority to placebo more consistently. Without increased access to regulatory agency data, publication bias will continue to blur distinctions between effective and ineffective drugs. Please see later in the article for the Editors' Summary PMID:22448149

  8. Multivariate meta-analysis using individual participant data

    PubMed Central

    Riley, R. D.; Price, M. J.; Jackson, D.; Wardle, M.; Gueyffier, F.; Wang, J.; Staessen, J. A.; White, I. R.

    2016-01-01

    When combining results across related studies, a multivariate meta-analysis allows the joint synthesis of correlated effect estimates from multiple outcomes. Joint synthesis can improve efficiency over separate univariate syntheses, may reduce selective outcome reporting biases, and enables joint inferences across the outcomes. A common issue is that within-study correlations needed to fit the multivariate model are unknown from published reports. However, provision of individual participant data (IPD) allows them to be calculated directly. Here, we illustrate how to use IPD to estimate within-study correlations, using a joint linear regression for multiple continuous outcomes and bootstrapping methods for binary, survival and mixed outcomes. In a meta-analysis of 10 hypertension trials, we then show how these methods enable multivariate meta-analysis to address novel clinical questions about continuous, survival and binary outcomes; treatment–covariate interactions; adjusted risk/prognostic factor effects; longitudinal data; prognostic and multiparameter models; and multiple treatment comparisons. Both frequentist and Bayesian approaches are applied, with example software code provided to derive within-study correlations and to fit the models. PMID:26099484
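    The bootstrap estimation of a within-study correlation from IPD can be sketched as follows. This is a simplified illustration, correlating bootstrap replicates of two outcome means on synthetic data, rather than the paper's full procedure for binary, survival and mixed outcomes:

```python
import math
import random

def bootstrap_within_study_corr(y1, y2, n_boot=2000, seed=1):
    """Bootstrap correlation between two outcome effect estimates
    (here the sample means of two continuous outcomes) in one study."""
    rng = random.Random(seed)
    n = len(y1)
    m1, m2 = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample participants with replacement
        m1.append(sum(y1[i] for i in idx) / n)
        m2.append(sum(y2[i] for i in idx) / n)
    a1, a2 = sum(m1) / n_boot, sum(m2) / n_boot
    cov = sum((p - a1) * (q - a2) for p, q in zip(m1, m2))
    s1 = math.sqrt(sum((p - a1) ** 2 for p in m1))
    s2 = math.sqrt(sum((q - a2) ** 2 for q in m2))
    return cov / (s1 * s2)                           # Pearson r across bootstrap replicates

# Synthetic paired outcomes for a single hypothetical trial
rng = random.Random(42)
out1 = [rng.gauss(0, 1) for _ in range(200)]
out2 = [0.7 * x + rng.gauss(0, 0.5) for x in out1]
rho = bootstrap_within_study_corr(out1, out2)
```

    The resulting within-study correlation would then enter the multivariate meta-analysis model alongside those from the other trials.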

  9. Dark skin decreases the accuracy of pulse oximeters at low oxygen saturation: the effects of oximeter probe type and gender.

    PubMed

    Feiner, John R; Severinghaus, John W; Bickler, Philip E

    2007-12-01

    Pulse oximetry may overestimate arterial oxyhemoglobin saturation (Sao2) at low Sao2 levels in individuals with darkly pigmented skin, but other factors, such as gender and oximeter probe type, remain less studied. We studied the relationship between skin pigment and oximeter accuracy in 36 subjects (19 males, 17 females) of a range of skin tones. Clip-on type sensors and adhesive/disposable finger probes for the Masimo Radical, Nellcor N-595, and Nonin 9700 were studied. Semisupine subjects breathed air-nitrogen-CO2 mixtures via a mouthpiece to rapidly achieve 2- to 3-min stable plateaus of Sao2. Comparisons of Sao2 measured by pulse oximetry (Spo2) with Sao2 (by Radiometer OSM-3) were used in a multivariate model to assess the source of errors. The mean bias (Spo2 - Sao2) for the 70%-80% saturation range was 2.61% for the Masimo Radical with clip-on sensor, -1.58% for the Radical with disposable sensor, 2.59% for the Nellcor clip, 3.6% for the Nellcor disposable, -0.60% for the Nonin clip, and 2.43% for the Nonin disposable. Dark skin increased bias at low Sao2; greater bias was seen with adhesive/disposable sensors than with the clip-on types. Up to 10% differences in saturation estimates were found among different instruments in dark-skinned subjects at low Sao2. Multivariate analysis indicated that Sao2 level, sensor type, skin color, and gender were predictive of errors in Spo2 estimates at low Sao2 levels. The data suggest that clinically important bias should be considered when monitoring patients with saturations below 80%, especially those with darkly pigmented skin; but further study is needed to confirm these observations in the relevant populations.

  10. Applying quantitative bias analysis to estimate the plausible effects of selection bias in a cluster randomised controlled trial: secondary analysis of the Primary care Osteoarthritis Screening Trial (POST).

    PubMed

    Barnett, L A; Lewis, M; Mallen, C D; Peat, G

    2017-12-04

    Selection bias is a concern when designing cluster randomised controlled trials (c-RCT). Despite addressing potential issues at the design stage, bias cannot always be eradicated from a trial design. The application of bias analysis presents an important step forward in evaluating whether trial findings are credible. The aim of this paper is to give an example of the technique to quantify potential selection bias in c-RCTs. This analysis uses data from the Primary care Osteoarthritis Screening Trial (POST). The primary aim of this trial was to test whether screening for anxiety and depression, and providing appropriate care for patients consulting their GP with osteoarthritis, would improve clinical outcomes. Quantitative bias analysis is a seldom-used technique that can quantify types of bias present in studies. Due to lack of information on the selection probability, probabilistic bias analysis with a range of triangular distributions was also used, applied at all three follow-up time points: 3, 6, and 12 months post consultation. A simple bias analysis was also applied to the study. Worse pain outcomes were observed among intervention participants than control participants (crude odds ratio at 3, 6, and 12 months: 1.30 (95% CI 1.01, 1.67), 1.39 (95% CI 1.07, 1.80), and 1.17 (95% CI 0.90, 1.53), respectively). Probabilistic bias analysis suggested that the observed effect became statistically non-significant if the selection probability ratio was between 1.2 and 1.4. Selection probability ratios of > 1.8 were needed to mask a statistically significant benefit of the intervention. The use of probabilistic bias analysis in this c-RCT suggested that worse outcomes observed in the intervention arm could plausibly be attributed to selection bias. A very large degree of selection bias would have been needed to mask a beneficial effect of the intervention, making this interpretation less plausible.
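    The core probabilistic step can be sketched as drawing a selection probability ratio from a triangular distribution and dividing the observed odds ratio by it. This illustrative sketch uses the trial's 6-month crude OR of 1.39 and the 1.2-1.4 range mentioned above; it ignores sampling error in the OR and is not the authors' full analysis:

```python
import random

def adjusted_or_median(or_obs, low, mode, high, n_sim=5000, seed=7):
    """Probabilistic selection-bias analysis (illustrative sketch).

    Draws a selection probability ratio from a triangular(low, mode, high)
    distribution, divides the observed odds ratio by it, and returns the
    median bias-adjusted odds ratio over n_sim simulations.
    """
    rng = random.Random(seed)
    adjusted = sorted(or_obs / rng.triangular(low, high, mode)  # note arg order: low, high, mode
                      for _ in range(n_sim))
    return adjusted[n_sim // 2]

# 6-month crude OR of 1.39; selection probability ratio assumed in 1.2-1.4
or_adj = adjusted_or_median(1.39, low=1.2, mode=1.3, high=1.4)
```

    With the ratio centred on 1.3, the bias-adjusted OR falls to roughly 1.39/1.3, i.e. close to the null, consistent with the paper's conclusion that the observed excess could plausibly be attributed to selection bias.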

  11. Invited Commentary: The Need for Cognitive Science in Methodology.

    PubMed

    Greenland, Sander

    2017-09-15

    There is no complete solution for the problem of abuse of statistics, but methodological training needs to cover cognitive biases and other psychosocial factors affecting inferences. The present paper discusses 3 common cognitive distortions: 1) dichotomania, the compulsion to perceive quantities as dichotomous even when dichotomization is unnecessary and misleading, as in inferences based on whether a P value is "statistically significant"; 2) nullism, the tendency to privilege the hypothesis of no difference or no effect when there is no scientific basis for doing so, as when testing only the null hypothesis; and 3) statistical reification, treating hypothetical data distributions and statistical models as if they reflect known physical laws rather than speculative assumptions for thought experiments. As commonly misused, null-hypothesis significance testing combines these cognitive problems to produce highly distorted interpretation and reporting of study results. Interval estimation has so far proven to be an inadequate solution because it involves dichotomization, an avenue for nullism. Sensitivity and bias analyses have been proposed to address reproducibility problems (Am J Epidemiol. 2017;186(6):646-647); these methods can indeed address reification, but they can also introduce new distortions via misleading specifications for bias parameters. P values can be reframed to lessen distortions by presenting them without reference to a cutoff, providing them for relevant alternatives to the null, and recognizing their dependence on all assumptions used in their computation; they nonetheless require rescaling for measuring evidence. I conclude that methodological development and training should go beyond coverage of mechanistic biases (e.g., confounding, selection bias, measurement error) to cover distortions of conclusions produced by statistical methods and psychosocial forces. © The Author(s) 2017. 
Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  12. Multivariate statistical analysis of low-voltage EDS spectrum images

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, I.M.

    1998-03-01

    Whereas energy-dispersive X-ray spectrometry (EDS) has been used for compositional analysis in the scanning electron microscope for 30 years, the benefits of using low operating voltages for such analyses have been explored only during the last few years. This paper couples low-voltage EDS with two other emerging areas of characterization: spectrum imaging and multivariate statistical analysis. The specimen analyzed for this study was a finished Intel Pentium processor, with the polyimide protective coating stripped off to expose the final active layers.

  13. Elemental analysis of soils using laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) and laser-induced breakdown spectroscopy (LIBS) with multivariate discrimination: tape mounting as an alternative to pellets for small forensic transfer specimens.

    PubMed

    Jantzi, Sarah C; Almirall, José R

    2014-01-01

    Elemental analysis of soil is a useful application of both laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) and laser-induced breakdown spectroscopy (LIBS) in geological, agricultural, environmental, archeological, planetary, and forensic sciences. In forensic science, the question to be answered is often whether soil specimens found on objects (e.g., shoes, tires, or tools) originated from the crime scene or other location of interest. Elemental analysis of the soil from the object and the locations of interest results in a characteristic elemental profile of each specimen, consisting of the amount of each element present. Because multiple elements are measured, multivariate statistics can be used to compare the elemental profiles in order to determine whether the specimen from the object is similar to one of the locations of interest. Previous work involved milling and pressing 0.5 g of soil into pellets before analysis using LA-ICP-MS and LIBS. However, forensic examiners prefer techniques that require smaller samples, are less time consuming, and are less destructive, allowing for future analysis by other techniques. An alternative sample introduction method was developed to meet these needs while still providing quantitative results suitable for multivariate comparisons. The tape-mounting method involved deposition of a thin layer of soil onto double-sided adhesive tape. A comparison of tape-mounting and pellet method performance is reported for both LA-ICP-MS and LIBS. Calibration standards and reference materials, prepared using the tape method, were analyzed by LA-ICP-MS and LIBS. As with the pellet method, linear calibration curves were achieved with the tape method, as well as good precision and low bias. Soil specimens from Miami-Dade County were prepared by both the pellet and tape methods and analyzed by LA-ICP-MS and LIBS. Principal components analysis and linear discriminant analysis were applied to the multivariate data. 
Results from both the tape method and the pellet method were nearly identical, with clear groupings and correct classification rates of >94%.

  14. The impact of moderate wine consumption on the risk of developing prostate cancer

    PubMed Central

    Ferro, Matteo; Foerster, Beat; Abufaraj, Mohammad; Briganti, Alberto; Karakiewicz, Pierre I; Shariat, Shahrokh F

    2018-01-01

    Objective To investigate the impact of moderate wine consumption on the risk of prostate cancer (PCa). We focused on the differential effect of moderate consumption of red versus white wine. Design This study was a meta-analysis that includes data from case–control and cohort studies. Materials and methods A systematic search of Web of Science, Medline/PubMed, and Cochrane Library was performed on December 1, 2017. Studies were deemed eligible if they assessed the risk of PCa due to red, white, or any wine using multivariable logistic regression analysis. We performed a formal meta-analysis of the risk of PCa according to moderate consumption of wine overall and by wine type (white or red). Heterogeneity between studies was assessed using Cochran's Q test and I2 statistics. Publication bias was assessed using Egger's regression test. Results A total of 930 abstracts and titles were initially identified. After removal of duplicates, reviews, and conference abstracts, 83 full-text original articles were screened. Seventeen studies (611,169 subjects) fulfilled the inclusion criteria and were included in the final evaluation. For moderate wine consumption overall, the pooled risk ratio (RR) for the risk of PCa was 0.98 (95% CI 0.92–1.05, p=0.57) in the multivariable analysis. Moderate white wine consumption increased the risk of PCa, with a pooled RR of 1.26 (95% CI 1.10–1.43, p=0.001) in the multivariable analysis. Meanwhile, moderate red wine consumption had a protective role, reducing the risk by 12% (RR 0.88, 95% CI 0.78–0.999, p=0.047) in the multivariable analysis, which comprised 222,447 subjects. Conclusions In this meta-analysis, moderate wine consumption did not impact the risk of PCa. Interestingly, regarding the type of wine, moderate consumption of white wine increased the risk of PCa, whereas moderate consumption of red wine had a protective effect. 
Further analyses are needed to assess the differential molecular effect of white and red wine conferring their impact on PCa risk. PMID:29713200
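    The pooled RRs and I2 values above come from standard inverse-variance meta-analysis on the log scale. A minimal sketch of that computation, using invented study results rather than the data from the paper:

    ```python
    import math

    def pool_risk_ratios(rrs, cis):
        """Fixed-effect inverse-variance pooling of risk ratios on the log
        scale, with Cochran's Q and the I^2 heterogeneity statistic."""
        logs = [math.log(rr) for rr in rrs]
        # Approximate SE of log RR from a 95% CI: (log(hi) - log(lo)) / (2 * 1.96)
        ses = [(math.log(hi) - math.log(lo)) / (2 * 1.96) for lo, hi in cis]
        ws = [1.0 / se**2 for se in ses]                       # inverse-variance weights
        pooled = sum(w * l for w, l in zip(ws, logs)) / sum(ws)
        q = sum(w * (l - pooled) ** 2 for w, l in zip(ws, logs))
        df = len(rrs) - 1
        i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0    # percent heterogeneity
        return math.exp(pooled), q, i2

    # Hypothetical study RRs and 95% CIs, not the values from the meta-analysis above
    rr, q, i2 = pool_risk_ratios(
        [0.9, 1.1, 1.3],
        [(0.7, 1.16), (0.9, 1.34), (1.05, 1.61)],
    )
    ```

    A random-effects version (as used in the paper's primary analyses) would additionally inflate each study's variance by a between-study component estimated from Q.
    
    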

  15. Fine-Tuning Cross-Battery Assessment Procedures: After Follow-Up Testing, Use All Valid Scores, Cohesive or Not

    ERIC Educational Resources Information Center

    Schneider, W. Joel; Roman, Zachary

    2018-01-01

    We used data simulations to test whether composites consisting of cohesive subtest scores are more accurate than composites consisting of divergent subtest scores. We demonstrate that when multivariate normality holds, divergent and cohesive scores are equally accurate. Furthermore, excluding divergent scores results in biased estimates of…

  16. A General Approach for Estimating Scale Score Reliability for Panel Survey Data

    ERIC Educational Resources Information Center

    Biemer, Paul P.; Christ, Sharon L.; Wiesen, Christopher A.

    2009-01-01

    Scale score measures are ubiquitous in the psychological literature and can be used as both dependent and independent variables in data analysis. Poor reliability of scale score measures leads to inflated standard errors and/or biased estimates, particularly in multivariate analysis. Reliability estimation is usually an integral step to assess…

  17. Exploring Sex Differences in Worry with a Cognitive Vulnerability Model

    ERIC Educational Resources Information Center

    Zalta, Alyson K.; Chambless, Dianne L.

    2008-01-01

    A multivariate model was developed to examine the relative contributions of mastery, stress, interpretive bias, and coping to sex differences in worry. Rumination was incorporated as a second outcome variable to test the specificity of these associations. Participants included two samples of undergraduates totaling 302 men and 379 women. A path…

  18. Quick Overview Scout 2008 Version 1.0

    EPA Science Inventory

    The Scout 2008 version 1.0 statistical software package has been updated from past DOS and Windows versions to provide classical and robust univariate and multivariate graphical and statistical methods that are not typically available in commercial or freeware statistical softwar...

  19. Parametric study of statistical bias in laser Doppler velocimetry

    NASA Technical Reports Server (NTRS)

    Gould, Richard D.; Stevenson, Warren H.; Thompson, H. Doyle

    1989-01-01

    Analytical studies have often assumed that LDV velocity bias depends on turbulence intensity in conjunction with one or more characteristic time scales, such as the time between validated signals, the time between data samples, and the integral turbulence time-scale. These parameters are presently varied independently, in an effort to quantify the biasing effect. Neither of the post facto correction methods employed is entirely accurate. The mean velocity bias error is found to be nearly independent of data validation rate.
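    One classical post facto correction of the kind evaluated in studies like this weights each burst by the inverse velocity magnitude (in the spirit of McLaughlin and Tiederman), since faster fluid carries proportionally more particles through the probe volume. A rough sketch with synthetic velocity-proportional sampling; all numbers here are invented, not the paper's:

    ```python
    import random

    random.seed(3)
    u_mean, intensity = 10.0, 0.3   # true mean velocity and turbulence intensity

    # Simulate arrival-rate bias: accept a sample with probability proportional
    # to |u|, so faster realizations are over-represented.
    samples = []
    while len(samples) < 20000:
        u = random.gauss(u_mean, intensity * u_mean)
        if random.random() < abs(u) / 20.0:
            samples.append(u)

    naive = sum(samples) / len(samples)        # biased high by roughly sigma^2 / u_mean

    # Inverse-velocity-magnitude weighting undoes the sampling bias
    w = [1.0 / abs(u) for u in samples]
    corrected = sum(wi * ui for wi, ui in zip(w, samples)) / sum(w)
    ```

    With these settings the naive mean overshoots by about sigma^2/u_mean, while the weighted mean recovers the true value, which is the effect such correction methods target.
    
    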

  20. The Statistical Consulting Center for Astronomy (SCCA)

    NASA Technical Reports Server (NTRS)

    Akritas, Michael

    2001-01-01

    The process by which raw astronomical data are transformed into scientifically meaningful results and interpretation typically involves many statistical steps. Traditional astronomy limits itself to a narrow range of old and familiar statistical methods: means and standard deviations; least-squares methods like chi-squared minimization; and simple nonparametric procedures such as the Kolmogorov-Smirnov tests. These tools are often inadequate for the complex problems and datasets under investigation, and recent years have witnessed an increased usage of maximum-likelihood, survival analysis, multivariate analysis, wavelet and advanced time-series methods. The Statistical Consulting Center for Astronomy (SCCA) assisted astronomers in the use of sophisticated tools, and matched these tools with specific problems. The SCCA operated with two professors of statistics and a professor of astronomy working together. Questions were received by e-mail, and were discussed in detail with the questioner. Summaries of those questions and answers leading to new approaches were posted on the Web (www.stat.psu.edu/~mga/SCCA). In addition to serving individual astronomers, the SCCA established a Web site for general use that provides hypertext links to selected on-line public-domain statistical software and services. The StatCodes site (www.astro.psu.edu/statcodes) provides over 200 links in the areas of: Bayesian statistics; censored and truncated data; correlation and regression; density estimation and smoothing; general statistics packages and information; image analysis; interactive Web tools; multivariate analysis; multivariate clustering and classification; nonparametric analysis; software written by astronomers; spatial statistics; statistical distributions; time series analysis; and visualization tools. StatCodes has received a remarkably high and constant hit rate of 250 hits/week (over 10,000/year) since its inception in mid-1997. 
It is of interest to scientists both within and outside of astronomy. The most popular sections are multivariate techniques, image analysis, and time series analysis. Hundreds of copies of the ASURV, SLOPES and CENS-TAU codes developed by SCCA scientists were also downloaded from the StatCodes site. In addition to formal SCCA duties, SCCA scientists continued a variety of related activities in astrostatistics, including refereeing of statistically oriented papers submitted to the Astrophysical Journal, talks in meetings including Feigelson's talk to science journalists entitled "The reemergence of astrostatistics" at the American Association for the Advancement of Science meeting, and published papers of astrostatistical content.

  1. The use of multivariate statistics in studies of wildlife habitat

    Treesearch

    David E. Capen

    1981-01-01

    This report contains edited and reviewed versions of papers presented at a workshop held at the University of Vermont in April 1980. Topics include sampling avian habitats, multivariate methods, applications, examples, and new approaches to analysis and interpretation.

  2. Rejection of Multivariate Outliers.

    DTIC Science & Technology

    1983-05-01

    available in Gnanadesikan (1977). The motivation for the present investigation lies in a recent paper of Schwager and Margolin (1982), who derive a... Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate Observations. Wiley, New York. [7] Hawkins, D.M. (1980). Identification of

  3. Multivariate analysis: greater insights into complex systems

    USDA-ARS?s Scientific Manuscript database

    Many agronomic researchers measure and collect multiple response variables in an effort to understand the more complex nature of the system being studied. Multivariate (MV) statistical methods encompass the simultaneous analysis of all random variables (RV) measured on each experimental or sampling ...

  4. Sources of method bias in social science research and recommendations on how to control it.

    PubMed

    Podsakoff, Philip M; MacKenzie, Scott B; Podsakoff, Nathan P

    2012-01-01

    Despite the concern that has been expressed about potential method biases, and the pervasiveness of research settings with the potential to produce them, there is disagreement about whether they really are a problem for researchers in the behavioral sciences. Therefore, the purpose of this review is to explore the current state of knowledge about method biases. First, we explore the meaning of the terms "method" and "method bias" and then we examine whether method biases influence all measures equally. Next, we review the evidence of the effects that method biases have on individual measures and on the covariation between different constructs. Following this, we evaluate the procedural and statistical remedies that have been used to control method biases and provide recommendations for minimizing method bias.

  5. Length bias correction in gene ontology enrichment analysis using logistic regression.

    PubMed

    Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H

    2012-01-01

    When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
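    The adjustment described above can be sketched as follows. The simulated data, effect sizes, and the small IRLS fitter are illustrative assumptions, not the authors' code: differential expression (DE) calls and GO membership both depend on transcript length, so an unadjusted logistic regression of DE on GO membership shows a spurious enrichment that vanishes once log length enters as a covariate.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical simulation: longer transcripts are both more likely to be in
    # this GO category and more likely to be called DE (length bias).
    n = 2000
    log_len = rng.normal(8.0, 1.0, n)                        # log transcript length
    in_go = (rng.random(n) < 1 / (1 + np.exp(-(log_len - 8.0)))).astype(float)
    de = (rng.random(n) < 1 / (1 + np.exp(-0.5 * (log_len - 8.0)))).astype(float)

    def logit_fit(X, y, iters=25):
        """Logistic regression fit by iteratively reweighted least squares."""
        X = np.column_stack([np.ones(len(y)), X])
        beta = np.zeros(X.shape[1])
        for _ in range(iters):
            p = 1 / (1 + np.exp(-(X @ beta)))
            W = p * (1 - p)
            beta = beta + np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
        return beta

    # Unadjusted: DE ~ GO membership; the GO coefficient absorbs the length effect.
    b_unadj = logit_fit(in_go[:, None], de)
    # Adjusted: adding log length as a covariate removes the spurious association.
    b_adj = logit_fit(np.column_stack([in_go, log_len]), de)
    ```

    Here `b_unadj[1]` is clearly positive even though DE was generated independently of GO membership given length, while the adjusted GO coefficient `b_adj[1]` shrinks toward zero, which is the confounding structure the paper makes transparent.
    
    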

  6. Design, analysis, and interpretation of field quality-control data for water-sampling projects

    USGS Publications Warehouse

    Mueller, David K.; Schertz, Terry L.; Martin, Jeffrey D.; Sandstrom, Mark W.

    2015-01-01

    The report provides extensive information about statistical methods used to analyze quality-control data in order to estimate potential bias and variability in environmental data. These methods include construction of confidence intervals on various statistical measures, such as the mean, percentiles and percentages, and standard deviation. The methods are used to compare quality-control results with the larger set of environmental data in order to determine whether the effects of bias and variability might interfere with interpretation of these data. Examples from published reports are presented to illustrate how the methods are applied, how bias and variability are reported, and how the interpretation of environmental data can be qualified based on the quality-control analysis.
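    As a minimal illustration of one of the methods described (a confidence interval on the mean of quality-control results, here used to judge whether blank-sample bias is distinguishable from zero), with hypothetical concentrations:

    ```python
    import math

    def mean_ci(xs, z=1.96):
        """Approximate 95% confidence interval on the mean of QC results."""
        n = len(xs)
        m = sum(xs) / n
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))  # sample std dev
        h = z * s / math.sqrt(n)                                 # half-width
        return m - h, m + h

    # Hypothetical blank-sample concentrations (ug/L): does the interval
    # around the mean bias include zero?
    blanks = [0.02, -0.01, 0.00, 0.03, 0.01, 0.02, -0.02, 0.01]
    lo, hi = mean_ci(blanks)
    ```

    Since the interval straddles zero, this toy QC set would not indicate a bias large enough to qualify interpretation of the environmental data; a t-based multiplier rather than z = 1.96 would be more appropriate at this small n.
    
    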

  7. Dealing with dietary measurement error in nutritional cohort studies.

    PubMed

    Freedman, Laurence S; Schatzkin, Arthur; Midthune, Douglas; Kipnis, Victor

    2011-07-20

    Dietary measurement error creates serious challenges to reliably discovering new diet-disease associations in nutritional cohort studies. Such error causes substantial underestimation of relative risks and reduction of statistical power for detecting associations. On the basis of data from the Observing Protein and Energy Nutrition Study, we recommend the following approaches to deal with these problems. Regarding data analysis of cohort studies using food-frequency questionnaires, we recommend 1) using energy adjustment for relative risk estimation; 2) reporting estimates adjusted for measurement error along with the usual relative risk estimates, whenever possible (this requires data from a relevant, preferably internal, validation study in which participants report intakes using both the main instrument and a more detailed reference instrument such as a 24-hour recall or multiple-day food record); 3) performing statistical adjustment of relative risks, based on such validation data, if they exist, using univariate (only for energy-adjusted intakes such as densities or residuals) or multivariate regression calibration. We note that whereas unadjusted relative risk estimates are biased toward the null value, statistical significance tests of unadjusted relative risk estimates are approximately valid. Regarding study design, we recommend increasing the sample size to remedy loss of power; however, it is important to understand that this will often be an incomplete solution because the attenuated signal may be too small to distinguish from unmeasured confounding in the model relating disease to reported intake. Future work should be devoted to alleviating the problem of signal attenuation, possibly through the use of improved self-report instruments or by combining dietary biomarkers with self-report instruments.
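    Univariate regression calibration of the kind recommended above can be sketched with simulated data. The reliability of 0.5, the validation-substudy design, and the use of the true intake as the "reference instrument" are assumptions for illustration only:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Assumed setup: true intake x drives outcome y, but the main instrument
    # observes x with error, attenuating the naive slope toward the null.
    n = 5000
    x_true = rng.normal(0, 1, n)
    x_obs = x_true + rng.normal(0, 1, n)       # measurement error halves reliability
    y = 0.5 * x_true + rng.normal(0, 1, n)

    beta_naive = np.polyfit(x_obs, y, 1)[0]    # attenuated estimate (~0.25 here)

    # Regression calibration with an internal validation substudy: regress the
    # reference measurement on the main instrument to estimate attenuation.
    # (In practice the reference instrument has its own error; truth is used
    # here only to keep the sketch short.)
    x_ref = x_true[:500]
    lam = np.polyfit(x_obs[:500], x_ref, 1)[0]  # attenuation factor (~0.5 here)
    beta_corrected = beta_naive / lam
    ```

    The corrected slope recovers the true association of 0.5, illustrating why unadjusted relative risks are biased toward the null while calibration-adjusted estimates are not.
    
    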

  8. A Framework for Establishing Standard Reference Scale of Texture by Multivariate Statistical Analysis Based on Instrumental Measurement and Sensory Evaluation.

    PubMed

    Zhi, Ruicong; Zhao, Lei; Xie, Nan; Wang, Houyin; Shi, Bolin; Shi, Jingye

    2016-01-13

    A framework for establishing a standard reference scale (texture) is proposed by multivariate statistical analysis based on instrumental measurement and sensory evaluation. Multivariate statistical analysis is conducted to rapidly select typical reference samples with characteristics of universality, representativeness, stability, substitutability, and traceability. The reasonableness of the framework method is verified by establishing a standard reference scale for a texture attribute (hardness) with well-known Chinese foods. More than 100 food products in 16 categories were tested using instrumental measurement (TPA test), and the results were analyzed with clustering analysis, principal component analysis, relative standard deviation, and analysis of variance. As a result, nine kinds of foods were selected to construct the hardness standard reference scale. The results indicate that the regression between the estimated sensory value and the instrumentally measured value is significant (R^2 = 0.9765), which fits well with Stevens's theory. The research provides a reliable theoretical basis and practical guide for establishing quantitative standard reference scales for food texture characteristics.
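    Stevens's power law, which the regression above agrees with, states that perceived intensity grows as a power of stimulus intensity, S = k * I**a, so it fits as a straight line on log scales. A sketch with invented hardness data (the values below are not from the study):

    ```python
    import math

    # Hypothetical paired measurements: instrumental hardness vs. mean panel score
    instrument = [2.0, 5.0, 10.0, 20.0, 40.0]   # instrumental hardness
    sensory = [1.9, 3.1, 4.6, 6.2, 9.0]         # mean sensory rating

    # Ordinary least squares on the log-log pairs gives exponent a and scale k
    lx = [math.log(v) for v in instrument]
    ly = [math.log(v) for v in sensory]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    a = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / sum((x - mx) ** 2 for x in lx)
    k = math.exp(my - a * mx)
    ```

    For these made-up data the fitted exponent comes out near 0.5, i.e. perceived hardness grows roughly as the square root of instrumental hardness.
    
    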

  9. Controlling Guessing Bias in the Dichotomous Rasch Model Applied to a Large-Scale, Vertically Scaled Testing Program

    ERIC Educational Resources Information Center

    Andrich, David; Marais, Ida; Humphry, Stephen Mark

    2016-01-01

    Recent research has shown how the statistical bias in Rasch model difficulty estimates induced by guessing in multiple-choice items can be eliminated. Using vertical scaling of a high-profile national reading test, it is shown that the dominant effect of removing such bias is a nonlinear change in the unit of scale across the continuum. The…

  10. An Examination of Potential Racial and Gender Bias in the Principal Version of the Interactive Computer Interview System

    ERIC Educational Resources Information Center

    DiPonio, Joseph M.

    2010-01-01

    The primary objective of this study was to determine whether racial and/or gender bias was evidenced in the use of the ICIS-Principal. Specifically, will the use of the ICIS-Principal result in biased scores at a statistically significant level when rating current practicing administrators of varying gender and race? The study involved simulated…

  11. A Statistical Analysis of Reviewer Agreement and Bias in Evaluating Medical Abstracts 1

    PubMed Central

    Cicchetti, Domenic V.; Conn, Harold O.

    1976-01-01

    Observer variability affects virtually all aspects of clinical medicine and investigation. One important aspect, not previously examined, is the selection of abstracts for presentation at national medical meetings. In the present study, 109 abstracts, submitted to the American Association for the Study of Liver Disease, were evaluated by three “blind” reviewers for originality, design-execution, importance, and overall scientific merit. Of the 77 abstracts rated for all parameters by all observers, interobserver agreement ranged between 81 and 88%. However, corresponding intraclass correlations varied between 0.16 (approaching statistical significance) and 0.37 (p < 0.01). Specific tests of systematic differences in scoring revealed statistically significant levels of observer bias on most of the abstract components. Moreover, the mean differences in interobserver ratings were quite small compared to the standard deviations of these differences. These results emphasize the importance of evaluating the simple percentage of rater agreement within the broader context of observer variability and systematic bias. PMID:997596

  12. Symmetries, invariants and generating functions: higher-order statistics of biased tracers

    NASA Astrophysics Data System (ADS)

    Munshi, Dipak

    2018-01-01

    Gravitationally collapsed objects are known to be biased tracers of an underlying density contrast. Using symmetry arguments, generalised biasing schemes have recently been developed to relate the halo density contrast δh to the underlying density contrast δ, the velocity divergence θ and their higher-order derivatives. This is done by constructing invariants such as s, t, ψ and η. We show how the generating function formalism in Eulerian standard perturbation theory (SPT) can be used to show that many of the additional terms based on extended Galilean and Lifshitz symmetry actually do not make any contribution to the higher-order statistics of biased tracers. Other terms can also be drastically simplified, allowing us to write the vertices associated with δh in terms of the vertices of δ and θ, the higher-order derivatives and the bias coefficients. We also compute the cumulant correlators (CCs) for two different tracer populations. These perturbative results are valid for tree-level contributions but at an arbitrary order. We also take into account the stochastic nature of bias in our analysis. Extending previous results for a local polynomial model of bias, we express the one-point cumulants S_N and their two-point counterparts, the CCs C_pq, of biased tracers in terms of those of their underlying density contrast counterparts. As a by-product of our calculation we also discuss results using approximations based on Lagrangian perturbation theory (LPT).
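    For orientation, the local polynomial bias model referred to above expands the tracer density contrast in powers of δ; under deterministic local bias, the standard tree-level relation between the tracer skewness and the matter skewness (a classic result of this type, restated here as background rather than taken from this paper) reads:

    ```latex
    \delta_h \;=\; \sum_{k \ge 1} \frac{b_k}{k!}\,\delta^k ,
    \qquad
    S_3^{h} \;=\; \frac{1}{b_1}\left(S_3 + 3\,\frac{b_2}{b_1}\right)
    ```

    Higher-order cumulants S_N and the cumulant correlators C_pq of the tracers inherit analogous combinations of the bias coefficients b_k, which is the structure the paper generalizes.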

  13. Linear and non-linear bias: predictions versus measurements

    NASA Astrophysics Data System (ADS)

    Hoffmann, K.; Bel, J.; Gaztañaga, E.

    2017-02-01

    We study the linear and non-linear bias parameters which determine the mapping between the distributions of galaxies and the full matter density fields, comparing different measurements and predictions. Associating galaxies with dark matter haloes in the Marenostrum Institut de Ciències de l'Espai (MICE) Grand Challenge N-body simulation, we directly measure the bias parameters by comparing the smoothed density fluctuations of haloes and matter in the same region at different positions as a function of smoothing scale. Alternatively, we measure the bias parameters by matching the probability distributions of halo and matter density fluctuations, which can be applied to observations. These direct bias measurements are compared to corresponding measurements from two-point and different third-order correlations, as well as predictions from the peak-background model, which we presented in previous papers using the same data. We find an overall variation of the linear bias measurements and predictions of ~5 per cent with respect to results from two-point correlations for different halo samples with masses between ~10^12 and ~10^15 h^-1 M⊙ at the redshifts z = 0.0 and 0.5. Differences between the second- and third-order bias parameters from the different methods are larger, but show consistent trends in mass and redshift. The various bias measurements reveal a tight relation between the linear and the quadratic bias parameters, which is consistent with results from the literature based on simulations with different cosmologies. Such a universal relation might improve constraints on cosmological models derived from second-order clustering statistics at small scales or from higher-order clustering statistics.

  14. Interpreting support vector machine models for multivariate group wise analysis in neuroimaging

    PubMed Central

    Gaonkar, Bilwaj; Shinohara, Russell T; Davatzikos, Christos

    2015-01-01

    Machine learning based classification algorithms like support vector machines (SVMs) have shown great promise for turning high-dimensional neuroimaging data into clinically useful decision criteria. However, tracing the imaging-based patterns that contribute significantly to classifier decisions remains an open problem. This is an issue of critical importance in imaging studies seeking to determine which anatomical or physiological imaging features contribute to the classifier's decision, thereby allowing users to critically evaluate the findings of such machine learning methods and to understand disease mechanisms. The majority of published work addresses the question of statistical inference for support vector classification using permutation tests based on SVM weight vectors. Such permutation testing ignores the SVM margin, which is critical in SVM theory. In this work we emphasize the use of a statistic that explicitly accounts for the SVM margin and show that the null distributions associated with this statistic are asymptotically normal. Further, our experiments show that this statistic is far less conservative than weight-based permutation tests, yet specific enough to tease out multivariate patterns in the data. Thus, we can better understand the multivariate patterns that the SVM uses for neuroimaging-based classification. PMID:26210913

  15. Traditional agricultural practices and the sex ratio today

    PubMed Central

    2018-01-01

    We study the historical origins of cross-country differences in the male-to-female sex ratio. Our analysis focuses on the use of the plough in traditional agriculture. In societies that did not use the plough, women tended to participate in agriculture as actively as men. By contrast, in societies that used the plough, men specialized in agricultural work, due to the physical strength needed to pull the plough or control the animal that pulls it. We hypothesize that this difference caused plough-using societies to value boys more than girls. Today, this belief is reflected in male-biased sex ratios, which arise due to sex-selective abortion or infanticide, or gender differences in access to family resources, which result in higher mortality rates for girls. Testing this hypothesis, we show that descendants of societies that traditionally practiced plough agriculture today have higher average male-to-female sex ratios. We find that this effect systematically increases in magnitude and statistical significance as one looks at older cohorts. Estimates using instrumental variables confirm our findings from multivariate OLS analysis. PMID:29338023

  16. Assessment of trace elements levels in patients with Type 2 diabetes using multivariate statistical analysis.

    PubMed

    Badran, M; Morsy, R; Soliman, H; Elnimr, T

    2016-01-01

    The metabolism of trace elements has been reported to play specific roles in the pathogenesis and progression of diabetes mellitus. Given the continuous increase in the population of patients with Type 2 diabetes (T2D), this study aims to assess the levels and inter-relationships of fasting blood glucose (FBG) and serum trace elements in Type 2 diabetic patients. The study was conducted on 40 Egyptian Type 2 diabetic patients and 36 healthy volunteers (Hospital of Tanta University, Tanta, Egypt). The blood serum was digested and then used to determine the levels of 24 trace elements using inductively coupled plasma mass spectrometry (ICP-MS). Multivariate statistical analyses based on correlation coefficients, cluster analysis (CA), and principal component analysis (PCA) were used to analyze the data. The results exhibited significant changes in FBG and in the serum levels of eight trace elements (Zn, Cu, Se, Fe, Mn, Cr, Mg, and As) in Type 2 diabetic patients relative to healthy controls. The multivariate statistical techniques reduced the number of experimental variables and grouped the trace elements in patients into three clusters. The application of PCA revealed a distinct difference in the associations of trace elements and their clustering patterns between the control and patient groups, in particular for Mg, Fe, Cu, and Zn, which appeared to be the factors most closely related to Type 2 diabetes. Therefore, on the basis of this study, the contributions of trace element content in Type 2 diabetic patients can be determined and specified using correlation and multivariate statistical analysis, which confirms that alteration of some essential trace metals may play a role in the development of diabetes mellitus. Copyright © 2015 Elsevier GmbH. All rights reserved.
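    The PCA step described above can be sketched as follows; the element subset, group means, and sample sizes are invented for illustration and are not the study's measurements:

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # Hypothetical serum levels of four elements (e.g. Mg, Fe, Cu, Zn) for
    # 36 controls and 40 patients, with shifted means in the patient group.
    controls = rng.normal([80, 15, 1.0, 1.1], [5, 2, 0.1, 0.1], size=(36, 4))
    patients = rng.normal([70, 12, 1.3, 0.9], [5, 2, 0.1, 0.1], size=(40, 4))
    X = np.vstack([controls, patients])

    # PCA by SVD of the standardized data matrix
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt.T                   # principal component scores per subject
    explained = s**2 / np.sum(s**2)     # fraction of variance per component
    ```

    With group differences like these, the first component dominates the variance and its scores separate patients from controls, which is the kind of clustering pattern the PCA in the study revealed.
    
    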

  17. Antiviral treatment of Bell's palsy based on baseline severity: a systematic review and meta-analysis.

    PubMed

    Turgeon, Ricky D; Wilby, Kyle J; Ensom, Mary H H

    2015-06-01

    We conducted a systematic review with meta-analysis to evaluate the efficacy of antiviral agents on complete recovery of Bell's palsy. We searched CENTRAL, Embase, MEDLINE, International Pharmaceutical Abstracts, and sources of unpublished literature to November 1, 2014. Primary and secondary outcomes were complete and satisfactory recovery, respectively. To evaluate statistical heterogeneity, we performed subgroup analysis of baseline severity of Bell's palsy and between-study sensitivity analyses based on risk of allocation and detection bias. The 10 included randomized controlled trials (2419 patients; 807 with severe Bell's palsy at onset) had variable risk of bias, with 9 trials having a high risk of bias in at least 1 domain. Complete recovery was not statistically significantly greater with antiviral use versus no antiviral use in the random-effects meta-analysis of 6 trials (relative risk, 1.06; 95% confidence interval, 0.97-1.16; I(2) = 65%). Conversely, random-effects meta-analysis of 9 trials showed a statistically significant difference in satisfactory recovery (relative risk, 1.10; 95% confidence interval, 1.02-1.18; I(2) = 63%). Response to antiviral agents did not differ visually or statistically between patients with severe symptoms at baseline and those with milder disease (test for interaction, P = .11). Sensitivity analyses did not show a clear effect of bias on outcomes. Antiviral agents are not efficacious in increasing the proportion of patients with Bell's palsy who achieved complete recovery, regardless of baseline symptom severity. Copyright © 2015 Elsevier Inc. All rights reserved.

  18. Science Possible Selves and the Desire to be a Scientist: Mindsets, Gender Bias, and Confidence during Early Adolescence.

    PubMed

    Hill, Patricia Wonch; McQuillan, Julia; Talbert, Eli; Spiegel, Amy; Gauthier, G Robin; Diamond, Judy

    2017-06-01

    In the United States, gender gaps in science interest widen during the middle school years. Recent research on adults shows that gender gaps in some academic fields are associated with mindsets about ability and gender-science biases. In a sample of 529 students in a U.S. middle school, we assess how explicit boy-science bias, science confidence, science possible self (belief in being able to become a scientist), and desire to be a scientist vary by gender. Guided by theories and prior research, we use a series of multivariate logistic regression models to examine the relationships between mindsets about ability and these variables. We control for self-reported science grades, social capital, and race/ethnic minority status. Results show that seeing academic ability as innate ("fixed mindsets") is associated with boy-science bias, and that younger girls have less boy-science bias than older girls. Fixed mindsets and boy-science bias are both negatively associated with a science possible self; science confidence is positively associated with a science possible self. In the final model, high science confidence and having a science possible self are positively associated with a desire to be a scientist. Facilitating growth mindsets and countering boy-science bias in middle school may be fruitful interventions for widening participation in science careers.

  19. Science Possible Selves and the Desire to be a Scientist: Mindsets, Gender Bias, and Confidence during Early Adolescence

    PubMed Central

    McQuillan, Julia; Talbert, Eli; Spiegel, Amy; Gauthier, G. Robin; Diamond, Judy

    2017-01-01

    In the United States, gender gaps in science interest widen during the middle school years. Recent research on adults shows that gender gaps in some academic fields are associated with mindsets about ability and gender-science biases. In a sample of 529 students in a U.S. middle school, we assess how explicit boy-science bias, science confidence, science possible self (belief in being able to become a scientist), and desire to be a scientist vary by gender. Guided by theories and prior research, we use a series of multivariate logistic regression models to examine the relationships between mindsets about ability and these variables. We control for self-reported science grades, social capital, and race/ethnic minority status. Results show that seeing academic ability as innate (“fixed mindsets”) is associated with boy-science bias, and that younger girls have less boy-science bias than older girls. Fixed mindsets and boy-science bias are both negatively associated with a science possible self; science confidence is positively associated with a science possible self. In the final model, high science confidence and having a science possible self are positively associated with a desire to be a scientist. Facilitating growth mindsets and countering boy-science bias in middle school may be fruitful interventions for widening participation in science careers. PMID:29527360

  20. LD Score Regression Distinguishes Confounding from Polygenicity in Genome-Wide Association Studies

    PubMed Central

    Bulik-Sullivan, Brendan K.; Loh, Po-Ru; Finucane, Hilary; Ripke, Stephan; Yang, Jian; Patterson, Nick; Daly, Mark J.; Price, Alkes L.; Neale, Benjamin M.

    2015-01-01

    Both polygenicity (i.e., many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification, can yield an inflated distribution of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from true polygenic signal and bias. We have developed an approach, LD Score regression, that quantifies the contribution of each by examining the relationship between test statistics and linkage disequilibrium (LD). The LD Score regression intercept can be used to estimate a more powerful and accurate correction factor than genomic control. We find strong evidence that polygenicity accounts for the majority of test statistic inflation in many GWAS of large sample size. PMID:25642630

  1. Observers Exploit Stochastic Models of Sensory Change to Help Judge the Passage of Time

    PubMed Central

    Ahrens, Misha B.; Sahani, Maneesh

    2011-01-01

    Summary Sensory stimulation can systematically bias the perceived passage of time [1–5], but why and how this happens is mysterious. In this report, we provide evidence that such biases may ultimately derive from an innate and adaptive use of stochastically evolving dynamic stimuli to help refine estimates derived from internal timekeeping mechanisms [6–15]. A simplified statistical model based on probabilistic expectations of stimulus change derived from the second-order temporal statistics of the natural environment [16, 17] makes three predictions. First, random noise-like stimuli whose statistics violate natural expectations should induce timing bias. Second, a previously unexplored obverse of this effect is that similar noise stimuli with natural statistics should reduce the variability of timing estimates. Finally, this reduction in variability should scale with the interval being timed, so as to preserve the overall Weber law of interval timing. All three predictions are borne out experimentally. Thus, in the context of our novel theoretical framework, these results suggest that observers routinely rely on sensory input to augment their sense of the passage of time, through a process of Bayesian inference based on expectations of change in the natural environment. PMID:21256018

  2. Multivariate Analysis and Prediction of Dioxin-Furan ...

    EPA Pesticide Factsheets

    Peer Review Draft of Regional Methods Initiative Final Report Dioxins, which are bioaccumulative and environmentally persistent, pose an ongoing risk to human and ecosystem health. Fish constitute a significant source of dioxin exposure for humans and fish-eating wildlife. Current dioxin analytical methods are costly, time-consuming, and produce hazardous by-products. A Danish team developed a novel, multivariate statistical methodology based on the covariance of dioxin-furan congener Toxic Equivalences (TEQs) and fatty acid methyl esters (FAMEs) and applied it to North Atlantic Ocean fishmeal samples. The goal of the current study was to extend this Danish methodology to 77 whole and composite fish samples from three trophic groups: predator (whole largemouth bass), benthic (whole flathead and channel catfish) and forage fish (composite bluegill, pumpkinseed and green sunfish) from two dioxin contaminated rivers (Pocatalico R. and Kanawha R.) in West Virginia, USA. Multivariate statistical analyses, including Principal Components Analysis (PCA), Hierarchical Clustering, and Partial Least Squares Regression (PLS), were used to assess the relationship between the FAMEs and TEQs in these dioxin contaminated freshwater fish from the Kanawha and Pocatalico Rivers. These three multivariate statistical methods all confirm that the pattern of FAMEs in these freshwater fish covaries with and is predictive of the WHO TE

  3. Identifying Pleiotropic Genes in Genome-Wide Association Studies for Multivariate Phenotypes with Mixed Measurement Scales

    PubMed Central

    Williams, L. Keoki; Buu, Anne

    2017-01-01

    We propose a multivariate genome-wide association test for mixed continuous, binary, and ordinal phenotypes. A latent response model is used to estimate the correlation between phenotypes with different measurement scales so that the empirical distribution of the Fisher’s combination statistic under the null hypothesis is estimated efficiently. The simulation study shows that our proposed correlation estimation methods have high levels of accuracy. More importantly, our approach conservatively estimates the variance of the test statistic so that the type I error rate is controlled. The simulation also shows that the proposed test maintains the power at the level very close to that of the ideal analysis based on known latent phenotypes while controlling the type I error. In contrast, conventional approaches–dichotomizing all observed phenotypes or treating them as continuous variables–could either reduce the power or employ a linear regression model unfit for the data. Furthermore, the statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that conducting a multivariate test on multiple phenotypes can increase the power of identifying markers that may not be, otherwise, chosen using marginal tests. The proposed method also offers a new approach to analyzing the Fagerström Test for Nicotine Dependence as multivariate phenotypes in genome-wide association studies. PMID:28081206
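    The Fisher's combination statistic mentioned in this abstract combines per-phenotype p-values as X = -2 Σ ln p_i; under independence X follows a chi-square distribution with 2k degrees of freedom, but with correlated phenotypes the null must be estimated empirically, as the abstract describes. A minimal sketch, with hypothetical p-values and a simulated (independent-uniform) null purely for illustration:

```python
import math
import random

def fisher_combination(pvals):
    """Fisher's combination statistic: X = -2 * sum(ln p_i)."""
    return -2.0 * sum(math.log(p) for p in pvals)

def empirical_pvalue(observed_stat, null_stats):
    """P-value of the observed statistic against an empirical null distribution."""
    return sum(s >= observed_stat for s in null_stats) / len(null_stats)

# With correlated phenotypes the chi-square(2k) null is miscalibrated, so the
# null distribution is built empirically (here from independent uniforms,
# standing in for the latent-model simulation used in the paper).
random.seed(0)
null = [fisher_combination([random.random() for _ in range(3)]) for _ in range(10000)]
obs = fisher_combination([0.01, 0.04, 0.20])   # hypothetical per-phenotype p-values
print(empirical_pvalue(obs, null))
```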

  4. Impact of a statistical bias correction on the projected simulated hydrological changes obtained from three GCMs and two hydrology models

    NASA Astrophysics Data System (ADS)

    Hagemann, Stefan; Chen, Cui; Haerter, Jan O.; Gerten, Dieter; Heinke, Jens; Piani, Claudio

    2010-05-01

    Future climate model scenarios depend crucially on their adequate representation of the hydrological cycle. Within the European project "Water and Global Change" (WATCH) special care is taken to couple state-of-the-art climate model output to a suite of hydrological models. This coupling is expected to lead to a better assessment of changes in the hydrological cycle. However, due to the systematic model errors of climate models, their output is often not directly applicable as input for hydrological models. Thus, the methodology of a statistical bias correction has been developed, which can be used for correcting climate model output to produce internally consistent fields that have the same statistical intensity distribution as the observations. As observations, global re-analysed daily data of precipitation and temperature are used that are obtained in the WATCH project. We will apply the bias correction to global climate model data of precipitation and temperature from the GCMs ECHAM5/MPIOM, CNRM-CM3 and LMDZ-4, and intercompare the bias corrected data to the original GCM data and the observations. Then, the original and the bias corrected GCM data will be used to force two global hydrology models: (1) the hydrological model of the Max Planck Institute for Meteorology (MPI-HM) consisting of the Simplified Land surface (SL) scheme and the Hydrological Discharge (HD) model, and (2) the dynamic vegetation model LPJmL operated by the Potsdam Institute for Climate Impact Research. The impact of the bias correction on the projected simulated hydrological changes will be analysed, and the resulting behaviour of the two hydrology models will be compared.
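    A bias correction that matches the statistical intensity distribution of observations, as described above, is commonly implemented as empirical quantile mapping. The following is a minimal nearest-rank sketch (the WATCH project's actual transfer functions are more elaborate); the temperature series are invented for illustration:

```python
import bisect

def quantile_map(x, model_ref, obs_ref):
    """Empirical quantile mapping: find the non-exceedance probability of x in
    the model's reference-period sample, then return the observation quantile
    at that probability, so corrected output matches the observed distribution."""
    m = sorted(model_ref)
    o = sorted(obs_ref)
    rank = bisect.bisect_right(m, x)
    p = rank / (len(m) + 1)                       # empirical CDF value of x
    idx = min(len(o) - 1, max(0, int(p * len(o))))  # nearest-rank obs quantile
    return o[idx]

# Hypothetical daily temperatures: the model runs ~2 degrees too cold
model_ref = [10 + 0.01 * i for i in range(1000)]
obs_ref = [12 + 0.01 * i for i in range(1000)]
print(quantile_map(15.0, model_ref, obs_ref))
```

    In practice the mapping is fitted per variable, per grid cell, and often per calendar month, and interpolation between ranks replaces the nearest-rank lookup used here.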

  5. Borrowing of strength and study weights in multivariate and network meta-analysis.

    PubMed

    Jackson, Dan; White, Ian R; Price, Malcolm; Copas, John; Riley, Richard D

    2017-12-01

    Multivariate and network meta-analysis have the potential for the estimated mean of one effect to borrow strength from the data on other effects of interest. The extent of this borrowing of strength is usually assessed informally. We present new mathematical definitions of 'borrowing of strength'. Our main proposal is based on a decomposition of the score statistic, which we show can be interpreted as comparing the precision of estimates from the multivariate and univariate models. Our definition of borrowing of strength therefore emulates the usual informal assessment. We also derive a method for calculating study weights, which we embed into the same framework as our borrowing of strength statistics, so that percentage study weights can accompany the results from multivariate and network meta-analyses as they do in conventional univariate meta-analyses. Our proposals are illustrated using three meta-analyses involving correlated effects for multiple outcomes, multiple risk factor associations and multiple treatments (network meta-analysis).

  6. Borrowing of strength and study weights in multivariate and network meta-analysis

    PubMed Central

    Jackson, Dan; White, Ian R; Price, Malcolm; Copas, John; Riley, Richard D

    2016-01-01

    Multivariate and network meta-analysis have the potential for the estimated mean of one effect to borrow strength from the data on other effects of interest. The extent of this borrowing of strength is usually assessed informally. We present new mathematical definitions of ‘borrowing of strength’. Our main proposal is based on a decomposition of the score statistic, which we show can be interpreted as comparing the precision of estimates from the multivariate and univariate models. Our definition of borrowing of strength therefore emulates the usual informal assessment. We also derive a method for calculating study weights, which we embed into the same framework as our borrowing of strength statistics, so that percentage study weights can accompany the results from multivariate and network meta-analyses as they do in conventional univariate meta-analyses. Our proposals are illustrated using three meta-analyses involving correlated effects for multiple outcomes, multiple risk factor associations and multiple treatments (network meta-analysis). PMID:26546254

  7. Effects of Kindergarten Retention on Children's Social-Emotional Development: An Application of Propensity Score Method to Multivariate, Multilevel Data

    ERIC Educational Resources Information Center

    Hong, Guanglei; Yu, Bing

    2008-01-01

    This study examines the effects of kindergarten retention on children's social-emotional development in the early, middle, and late elementary years. Previous studies have generated mixed results partly due to some major methodological challenges, including selection bias, measurement error, and divergent perceptions of multiple respondents in…

  8. Analyzing gene expression from relative codon usage bias in Yeast genome: a statistical significance and biological relevance.

    PubMed

    Das, Shibsankar; Roymondal, Uttam; Sahoo, Satyabrata

    2009-08-15

    Based on the hypothesis that highly expressed genes are often characterized by strong compositional bias in terms of codon usage, there are a number of measures currently in use that quantify codon usage bias in genes, and hence provide numerical indices to predict the expression levels of genes. With the recent advent of an expression measure based on the score of the relative codon usage bias (RCBS), we have explicitly tested the performance of this numerical measure to predict the gene expression level and illustrate this with an analysis of Yeast genomes. In contrast with previous studies, we observe a weak correlation between GC content and RCBS, but a selective pressure on the codon preferences in highly expressed genes. The assertion that the expression of a given gene depends on the score of relative codon usage bias (RCBS) is supported by the data. We further observe a strong correlation between RCBS and protein length indicating natural selection in favour of shorter genes to be expressed at higher level. We also attempt a statistical analysis to assess the strength of relative codon bias in genes as a guide to their likely expression level, suggesting a decrease of the informational entropy in the highly expressed genes.
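    To make the notion of codon usage bias concrete, here is a sketch of the classic Relative Synonymous Codon Usage (RSCU) measure, the observed count of a codon divided by the mean count over its synonymous family. Note this is a generic codon-level bias measure, not the gene-level RCBS score used in the abstract; the toy sequence is invented:

```python
from collections import Counter

def rscu(codon_counts, synonym_families):
    """Relative Synonymous Codon Usage: observed count of each codon divided by
    the mean count within its synonymous family (1.0 = no usage bias)."""
    out = {}
    for family in synonym_families:
        total = sum(codon_counts.get(c, 0) for c in family)
        mean = total / len(family)
        for c in family:
            out[c] = codon_counts.get(c, 0) / mean if mean else 0.0
    return out

# Toy coding sequence of six lysine codons (AAA x4, AAG x2)
seq = "AAAAAAAAGAAAAAGAAA"
codons = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
print(rscu(codons, [("AAA", "AAG")]))
```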

  9. Assessing signal-to-noise in quantitative proteomics: multivariate statistical analysis in DIGE experiments.

    PubMed

    Friedman, David B

    2012-01-01

    All quantitative proteomics experiments measure variation between samples. When performing large-scale experiments that involve multiple conditions or treatments, the experimental design should include the appropriate number of individual biological replicates from each condition to enable the distinction between a relevant biological signal from technical noise. Multivariate statistical analyses, such as principal component analysis (PCA), provide a global perspective on experimental variation, thereby enabling the assessment of whether the variation describes the expected biological signal or the unanticipated technical/biological noise inherent in the system. Examples will be shown from high-resolution multivariable DIGE experiments where PCA was instrumental in demonstrating biologically significant variation as well as sample outliers, fouled samples, and overriding technical variation that would not be readily observed using standard univariate tests.
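    The global perspective PCA gives on experimental variation can be sketched with a few lines of linear algebra. Below is a minimal SVD-based PCA with simulated data standing in for DIGE gels (two hypothetical treatment groups of biological replicates); all names and numbers are illustrative assumptions:

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA via SVD: rows are samples (e.g. gels), columns are variables
    (e.g. protein-spot abundances). Returns sample scores and the fraction
    of variance explained by each retained component."""
    Xc = X - X.mean(axis=0)                       # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2) / (s ** 2).sum()
    return scores, explained[:n_components]

# Two simulated groups of 4 biological replicates x 50 spot abundances;
# group B carries a mean shift representing the biological signal
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.1, size=(4, 50))
group_b = rng.normal(0.5, 0.1, size=(4, 50))
scores, explained = pca_scores(np.vstack([group_a, group_b]))
print(scores[:, 0], explained)
```

    In a well-behaved experiment the groups separate along a leading component; an outlier or fouled sample instead shows up as a single gel far from its replicates, which is the diagnostic use described in the abstract.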

  10. Applying Sociocultural Theory to Teaching Statistics for Doctoral Social Work Students

    ERIC Educational Resources Information Center

    Mogro-Wilson, Cristina; Reeves, Michael G.; Charter, Mollie Lazar

    2015-01-01

    This article describes the development of two doctoral-level multivariate statistics courses utilizing sociocultural theory, an integrative pedagogical framework. In the first course, the implementation of sociocultural theory helps to support the students through a rigorous introduction to statistics. The second course involves students…

  11. A review on the multivariate statistical methods for dimensional reduction studies

    NASA Astrophysics Data System (ADS)

    Aik, Lim Eng; Kiang, Lam Chee; Mohamed, Zulkifley Bin; Hong, Tan Wei

    2017-05-01

    In this research study we discuss multivariate statistical methods for dimensionality reduction, as developed by various researchers. Dimensionality reduction is valuable for accelerating algorithm training, and may also improve the final classification/clustering accuracy. Noisy or even erroneous input data often lead to unsatisfactory algorithm performance; removing uninformative or misleading data components can help an algorithm find more general grouping regions and rules and achieve better overall performance on new data sets.

  12. An experimental verification of laser-velocimeter sampling bias and its correction

    NASA Technical Reports Server (NTRS)

    Johnson, D. A.; Modarress, D.; Owen, F. K.

    1982-01-01

    The existence of 'sampling bias' in individual-realization laser velocimeter measurements is experimentally verified and shown to be independent of sample rate. The experiments were performed in a simple two-stream mixing shear flow with the standard for comparison being laser-velocimeter results obtained under continuous-wave conditions. It is also demonstrated that the errors resulting from sampling bias can be removed by a proper interpretation of the sampling statistics. In addition, data obtained in a shock-induced separated flow and in the near-wake of airfoils are presented, both bias-corrected and uncorrected, to illustrate the effects of sampling bias in the extreme.

  13. Generating an Empirical Probability Distribution for the Andrews-Pregibon Statistic.

    ERIC Educational Resources Information Center

    Jarrell, Michele G.

    A probability distribution was developed for the Andrews-Pregibon (AP) statistic. The statistic, developed by D. F. Andrews and D. Pregibon (1978), identifies multivariate outliers. It is a ratio of the determinant of the data matrix with an observation deleted to the determinant of the entire data matrix. Although the AP statistic has been used…

  14. The bias of the log power spectrum for discrete surveys

    NASA Astrophysics Data System (ADS)

    Repp, Andrew; Szapudi, István

    2018-03-01

    A primary goal of galaxy surveys is to tighten constraints on cosmological parameters, and the power spectrum P(k) is the standard means of doing so. However, at translinear scales P(k) is blind to much of these surveys' information - information which the log density power spectrum recovers. For discrete fields (such as the galaxy density), A* denotes the statistic analogous to the log density: A* is a `sufficient statistic' in that its power spectrum (and mean) capture virtually all of a discrete survey's information. However, the power spectrum of A* is biased with respect to the corresponding log spectrum for continuous fields, and to use P_{A^*}(k) to constrain the values of cosmological parameters, we require some means of predicting this bias. Here, we present a prescription for doing so; for Euclid-like surveys (with cubical cells 16h-1 Mpc across) our bias prescription's error is less than 3 per cent. This prediction will facilitate optimal utilization of the information in future galaxy surveys.

  15. A method to preserve trends in quantile mapping bias correction of climate modeled temperature

    NASA Astrophysics Data System (ADS)

    Grillakis, Manolis G.; Koutroulis, Aristeidis G.; Daliakopoulos, Ioannis N.; Tsanis, Ioannis K.

    2017-09-01

    Bias correction of climate variables is a standard practice in climate change impact (CCI) studies. Various methodologies have been developed within the framework of quantile mapping. However, it is well known that quantile mapping may significantly modify the long-term statistics due to the time dependency of the temperature bias. Here, a method to overcome this issue without compromising the day-to-day correction statistics is presented. The methodology separates the modeled temperature signal into a normalized and a residual component relative to the modeled reference period climatology, in order to adjust the biases only for the former and preserve the signal of the later. The results show that this method allows for the preservation of the originally modeled long-term signal in the mean, the standard deviation and higher and lower percentiles of temperature. To illustrate the improvements, the methodology is tested on daily time series obtained from five Euro CORDEX regional climate models (RCMs).
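    The core idea above, adjust the bias while leaving the model's long-term signal untouched, can be illustrated with a deliberately simplified mean-only correction (the paper's method generalizes this to the full distribution via quantile mapping of a normalized component). This is a sketch of the principle, not the authors' exact algorithm, and the temperature values are invented:

```python
def trend_preserving_correction(model_hist, model_fut, obs_hist):
    """Remove the reference-period mean bias from a future model series.
    Because the adjustment is constant in time, the model's own long-term
    signal (future mean minus historical mean) is preserved exactly."""
    hist_mean = sum(model_hist) / len(model_hist)
    obs_mean = sum(obs_hist) / len(obs_hist)
    bias = hist_mean - obs_mean          # mean bias in the reference period
    return [x - bias for x in model_fut]

model_hist = [14.0, 15.0, 16.0]   # hypothetical reference-period temperatures
model_fut = [16.0, 17.0, 18.0]    # the model warms by 2 degrees
obs_hist = [15.0, 16.0, 17.0]     # observations: model is 1 degree too cold
print(trend_preserving_correction(model_hist, model_fut, obs_hist))
```

    The abstract's point is that naive quantile mapping, unlike this constant shift, can distort the warming signal because the temperature bias is time dependent; their decomposition corrects only the normalized component so the modeled signal survives, as the constant shift does here.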

  16. A comparator-hypothesis account of biased contingency detection.

    PubMed

    Vadillo, Miguel A; Barberia, Itxaso

    2018-02-12

    Our ability to detect statistical dependencies between different events in the environment is strongly biased by the number of coincidences between them. Even when there is no true covariation between a cue and an outcome, if the marginal probability of either of them is high, people tend to perceive some degree of statistical contingency between both events. The present paper explores the ability of the Comparator Hypothesis to explain the general pattern of results observed in this literature. Our simulations show that this model can account for the biasing effects of the marginal probabilities of cues and outcomes. Furthermore, the overall fit of the Comparator Hypothesis to a sample of experimental conditions from previous studies is comparable to that of the popular Rescorla-Wagner model. These results should encourage researchers to further explore and put to the test the predictions of the Comparator Hypothesis in the domain of biased contingency detection. Copyright © 2018 Elsevier B.V. All rights reserved.

  17. Verification and classification bias interactions in diagnostic test accuracy studies for fine-needle aspiration biopsy.

    PubMed

    Schmidt, Robert L; Walker, Brandon S; Cohen, Michael B

    2015-03-01

    Reliable estimates of accuracy are important for any diagnostic test. Diagnostic accuracy studies are subject to unique sources of bias. Verification bias and classification bias are 2 sources of bias that commonly occur in diagnostic accuracy studies. Statistical methods are available to estimate the impact of these sources of bias when they occur alone. The impact of interactions when these types of bias occur together has not been investigated. We developed mathematical relationships to show the combined effect of verification bias and classification bias. A wide range of case scenarios were generated to assess the impact of bias components and interactions on total bias. Interactions between verification bias and classification bias caused overestimation of sensitivity and underestimation of specificity. Interactions had more effect on sensitivity than specificity. Sensitivity was overestimated by at least 7% in approximately 6% of the tested scenarios. Specificity was underestimated by at least 7% in less than 0.1% of the scenarios. Interactions between verification bias and classification bias create distortions in accuracy estimates that are greater than would be predicted from each source of bias acting independently. © 2014 American Cancer Society.
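    The direction of verification bias described above can be reproduced with a small expected-count calculation: if test-positives are verified by the gold standard more often than test-negatives, the 2x2 table of verified cases overstates sensitivity and understates specificity. A sketch with hypothetical accuracy values and verification probabilities:

```python
def apparent_accuracy(se, sp, prev, verify_pos=1.0, verify_neg=0.1):
    """Expected apparent sensitivity/specificity under partial verification:
    test-positives are verified with probability verify_pos, test-negatives
    with probability verify_neg; only verified cases enter the 2x2 table."""
    tp = prev * se * verify_pos              # diseased, test positive, verified
    fn = prev * (1 - se) * verify_neg        # diseased, test negative, verified
    fp = (1 - prev) * (1 - sp) * verify_pos  # healthy, test positive, verified
    tn = (1 - prev) * sp * verify_neg        # healthy, test negative, verified
    return tp / (tp + fn), tn / (tn + fp)

# True Se = 0.80, Sp = 0.90: verifying mostly test-positives inflates apparent Se
app_se, app_sp = apparent_accuracy(0.80, 0.90, prev=0.3)
print(round(app_se, 3), round(app_sp, 3))
```

    Layering classification bias (an imperfect gold standard) on top of this calculation is what produces the interaction effects quantified in the study.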

  18. The effect of rare variants on inflation of the test statistics in case-control analyses.

    PubMed

    Pirie, Ailith; Wood, Angela; Lush, Michael; Tyrer, Jonathan; Pharoah, Paul D P

    2015-02-20

    The detection of bias due to cryptic population structure is an important step in the evaluation of findings of genetic association studies. The standard method of measuring this bias in a genetic association study is to compare the observed median association test statistic to the expected median test statistic. This ratio is inflated in the presence of cryptic population structure. However, inflation may also be caused by the properties of the association test itself particularly in the analysis of rare variants. We compared the properties of the three most commonly used association tests: the likelihood ratio test, the Wald test and the score test when testing rare variants for association using simulated data. We found evidence of inflation in the median test statistics of the likelihood ratio and score tests for tests of variants with less than 20 heterozygotes across the sample, regardless of the total sample size. The test statistics for the Wald test were under-inflated at the median for variants below the same minor allele frequency. In a genetic association study, if a substantial proportion of the genetic variants tested have rare minor allele frequencies, the properties of the association test may mask the presence or absence of bias due to population structure. The use of either the likelihood ratio test or the score test is likely to lead to inflation in the median test statistic in the absence of population structure. In contrast, the use of the Wald test is likely to result in under-inflation of the median test statistic which may mask the presence of population structure.
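    The standard inflation measure the abstract refers to is the genomic-control factor: the observed median association test statistic divided by the median of its null distribution (chi-square with 1 degree of freedom). A minimal sketch with hypothetical test statistics:

```python
from statistics import median

CHI2_1DF_MEDIAN = 0.4549364231195724   # median of the chi-square(1 df) distribution

def lambda_gc(chi2_stats):
    """Genomic-control inflation factor: observed median test statistic over
    the expected median under the null. Values well above 1 suggest inflation
    (from structure, or, per the abstract, from the test's small-sample behavior)."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN

# Hypothetical 1-df association statistics; lambda near 1 means no inflation
print(round(lambda_gc([0.2, 0.5, 0.45, 1.1, 0.35]), 2))
```

    The abstract's warning is that for rare variants this median can be inflated (likelihood ratio, score test) or deflated (Wald test) even with no population structure at all, so lambda alone cannot diagnose confounding.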

  20. A new u-statistic with superior design sensitivity in matched observational studies.

    PubMed

    Rosenbaum, Paul R

    2011-09-01

    In an observational or nonrandomized study of treatment effects, a sensitivity analysis indicates the magnitude of bias from unmeasured covariates that would need to be present to alter the conclusions of a naïve analysis that presumes adjustments for observed covariates suffice to remove all bias. The power of sensitivity analysis is the probability that it will reject a false hypothesis about treatment effects allowing for a departure from random assignment of a specified magnitude; in particular, if this specified magnitude is "no departure" then this is the same as the power of a randomization test in a randomized experiment. A new family of u-statistics is proposed that includes Wilcoxon's signed rank statistic but also includes other statistics with substantially higher power when a sensitivity analysis is performed in an observational study. Wilcoxon's statistic has high power to detect small effects in large randomized experiments-that is, it often has good Pitman efficiency-but small effects are invariably sensitive to small unobserved biases. Members of this family of u-statistics that emphasize medium to large effects can have substantially higher power in a sensitivity analysis. For example, in one situation with 250 pair differences that are Normal with expectation 1/2 and variance 1, the power of a sensitivity analysis that uses Wilcoxon's statistic is 0.08 while the power of another member of the family of u-statistics is 0.66. The topic is examined by performing a sensitivity analysis in three observational studies, using an asymptotic measure called the design sensitivity, and by simulating power in finite samples. The three examples are drawn from epidemiology, clinical medicine, and genetic toxicology. © 2010, The International Biometric Society.

  1. Pre-Then-Post Testing: A Tool To Improve the Accuracy of Management Training Program Evaluation.

    ERIC Educational Resources Information Center

    Mezoff, Bob

    1981-01-01

    Explains a procedure to avoid the detrimental biases of conventional self-reports of training outcomes. The evaluation format provided is a method for using statistical procedures to increase the accuracy of self-reports by overcoming response-shift bias. (Author/MER)

  2. Climate model biases and statistical downscaling for application in hydrologic model

    USDA-ARS?s Scientific Manuscript database

    Climate change impact studies use global climate model (GCM) simulations to define future temperature and precipitation. The best available bias-corrected GCM output was obtained from Coupled Model Intercomparison Project phase 5 (CMIP5). CMIP5 data (temperature and precipitation) are available in d...

  3. Four Reasons to Question the Accuracy of a Biotic Index; the Risk of Metric Bias and the Scope to Improve Accuracy

    PubMed Central

    Monaghan, Kieran A.

    2016-01-01

    Natural ecological variability and analytical design can bias the derived value of a biotic index through the variable influence of indicator body-size, abundance, richness, and ascribed tolerance scores. Descriptive statistics highlight this risk for 26 aquatic indicator systems; detailed analysis is provided for contrasting weighted-average indices applying the example of the BMWP, which has the best supporting data. Differences in body size between taxa from respective tolerance classes is a common feature of indicator systems; in some it represents a trend ranging from comparatively small pollution tolerant to larger intolerant organisms. Under this scenario, the propensity to collect a greater proportion of smaller organisms is associated with negative bias; however, positive bias may occur when equipment (e.g. mesh-size) selectively samples larger organisms. Biotic indices are often derived from systems where indicator taxa are unevenly distributed along the gradient of tolerance classes. Such skews in indicator richness can distort index values in the direction of taxonomically rich indicator classes with the subsequent degree of bias related to the treatment of abundance data. The misclassification of indicator taxa causes bias that varies with the magnitude of the misclassification, the relative abundance of misclassified taxa and the treatment of abundance data. These artifacts of assessment design can compromise the ability to monitor biological quality. The statistical treatment of abundance data and the manipulation of indicator assignment and class richness can be used to improve index accuracy. While advances in methods of data collection (i.e. DNA barcoding) may facilitate improvement, the scope to reduce systematic bias is ultimately limited to a strategy of optimal compromise. The shortfall in accuracy must be addressed by statistical pragmatism. 
At any particular site, the net bias is a probabilistic function of the sample data, resulting in an error variance around an average deviation. Following standardized protocols and assigning precise reference conditions, the error variance of their comparative ratio (test-site:reference) can be measured and used to estimate the accuracy of the resultant assessment. PMID:27392036
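    The body-size bias mechanism described above is easy to demonstrate with a BMWP-style weighted-average index. The sketch below uses an Average Score Per Taxon (ASPT-like) calculation on invented tolerance scores: when the large, high-scoring (pollution-intolerant) taxa are under-sampled, the index is biased downward.

```python
def aspt(scores):
    """Average Score Per Taxon for a weighted-average biotic index:
    sum of the tolerance scores of recorded taxa divided by taxon count."""
    return sum(scores) / len(scores)

# Hypothetical tolerance scores of taxa truly present at a site (1-10 scale)
true_taxa = [10, 8, 7, 5, 4, 2, 1]
# Bias scenario from the abstract: the largest, highest-scoring taxa are
# missed because sampling favours smaller, tolerant organisms
sampled_taxa = [7, 5, 4, 2, 1]
print(aspt(true_taxa), aspt(sampled_taxa))   # the sampled index is lower
```

    Reversing the scenario (mesh that retains only large organisms) drops the low-scoring taxa instead and biases the same index upward, which is why the paper treats net bias at a site as a probabilistic function of the sample data.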

  4. Birth order and sibship size: evaluation of the role of selection bias in a case-control study of non-Hodgkin's lymphoma.

    PubMed

    Mensah, F K; Willett, E V; Simpson, J; Smith, A G; Roman, E

    2007-09-15

    Substantial heterogeneity has been observed among case-control studies investigating associations between non-Hodgkin's lymphoma and familial characteristics, such as birth order and sibship size. The potential role of selection bias in explaining such heterogeneity is considered within this study. Selection bias according to familial characteristics and socioeconomic status is investigated within a United Kingdom-based case-control study of non-Hodgkin's lymphoma diagnosed during 1998-2001. Reported distributions of birth order and maternal age are each compared with expected reference distributions derived using national birth statistics from the United Kingdom. A method is detailed in which yearly data are used to derive expected distributions, taking account of variability in birth statistics over time. Census data are used to reweight both the case and control study populations such that they are comparable with the general population with regard to socioeconomic status. The authors found little support for an association between non-Hodgkin's lymphoma and birth order or family size and little evidence for an influence of selection bias. However, the findings suggest that between-study heterogeneity could be explained by selection biases that influence the demographic characteristics of participants.

  5. Reirradiation for recurrent head and neck cancers using charged particle or photon radiotherapy.

    PubMed

    Yamazaki, Hideya; Demizu, Yusuke; Okimoto, Tomoaki; Ogita, Mikio; Himei, Kengo; Nakamura, Satoaki; Suzuki, Gen; Yoshida, Ken; Kotsuma, Tadayuki; Yoshioka, Yasuo; Oh, Ryoongjin

    2017-07-01

    To examine the outcomes of reirradiation for recurrent head and neck cancers using different modalities. This retrospective study included 26 patients who received charged particle radiotherapy (CP) and 150 who received photon radiotherapy (117 CyberKnife radiotherapy [CK] and 36 intensity-modulated radiotherapy [IMRT]). Inverse probability of treatment weighting (IPTW) involving propensity scores was used to reduce background selection bias. Higher prescribed doses were used in CP than in photon radiotherapy. The 1-year overall survival (OS) rates were 67.9% for CP and 54.1% for photon radiotherapy (p = 0.15; 55% for CK and 51% for IMRT). In multivariate Cox regression, the significant prognostic factors for better survival were nasopharyngeal cancer, higher prescribed dose, and lower tumor volume. IPTW showed a statistically significant difference between CP and photon radiotherapy (p = 0.04). The local control rates for patients treated with CP and photon radiotherapy at 1 year were 66.9% (range 46.3-87.5%) and 67.1% (range 58.3-75.9%), respectively. A total of 48 patients (27%) experienced toxicity grade ≥3 (24% in the photon radiotherapy group and 46% in the CP group), including 17 patients with grade 5 toxicity. Multivariate analysis revealed that younger age and a larger planning target volume (PTV) were significant risk factors for grade 3 or worse toxicity. CP provided a superior survival outcome compared to photon radiotherapy. Tumor volume, primary site (nasopharyngeal), and prescribed dose were identified as survival factors. Younger patients with a larger PTV experienced toxicity grade ≥3.
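
The IPTW idea used above — weight each patient by the inverse probability of the treatment they actually received, so the two arms become comparable — can be sketched with stratum-based propensity scores. This is a toy illustration only: the cohort, strata, and outcomes are invented, and a real analysis would estimate propensities with a regression model rather than stratum proportions.

```python
from collections import defaultdict

# Toy cohort: (covariate stratum, treated with CP?, survived 1 year?).
# All values invented for illustration.
cohort = [
    ("nasopharynx", 1, 1), ("nasopharynx", 0, 1), ("nasopharynx", 0, 1),
    ("other", 1, 1), ("other", 1, 0), ("other", 0, 0),
    ("other", 0, 1), ("other", 0, 0),
]

# Propensity score: P(treated | stratum), estimated by stratum proportions.
n_total, n_treated = defaultdict(int), defaultdict(int)
for s, t, _ in cohort:
    n_total[s] += 1
    n_treated[s] += t
ps = {s: n_treated[s] / n_total[s] for s in n_total}

# IPTW weight: 1/ps for treated patients, 1/(1 - ps) for controls.
def weight(s, t):
    return 1 / ps[s] if t else 1 / (1 - ps[s])

# Weighted outcome means estimate survival in the pseudo-population
# where treatment assignment is independent of the stratum.
est = {}
for arm in (1, 0):
    w = [weight(s, t) for s, t, y in cohort if t == arm]
    wy = [weight(s, t) * y for s, t, y in cohort if t == arm]
    est[arm] = sum(wy) / sum(w)
print({arm: round(v, 3) for arm, v in est.items()})
```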

  6. Total Shoulder Arthroplasty: Is Less Time in the Hospital Better?

    PubMed

    Duchman, Kyle R; Anthony, Chris A; Westermann, Robert W; Pugely, Andrew J; Gao, Yubo; Hettrich, Carolyn M

    2017-01-01

    The incidence of total shoulder arthroplasty (TSA) has increased significantly over the last decade. Short-stay protocols for other high-volume procedures have been shown to be safe and effective but have yet to be fully explored for TSA. Our purpose in comparing short-stay and inpatient TSA was to determine: (1) patient demographics and comorbidities, (2) 30-day morbidity, mortality, and readmissions using a matched analysis, and (3) independent predictors of 30-day complications. The American College of Surgeons National Surgical Quality Improvement (ACS NSQIP) database was queried and all patients undergoing elective, primary TSA between 2006 and 2013 were identified. Patients were categorized as short-stay or inpatient based on day of discharge. Propensity score matching was used to adjust for selection bias. Univariate and multivariate statistical analysis was used to compare 30-day morbidity and mortality between the two cohorts. Overall, 4,619 cases were available, with inpatient admission occurring in 65.7% of patients. Prior to propensity score matching, short-stay patients were significantly younger, more frequently male, and had fewer comorbid conditions. After matching, inpatient admission was associated with increased rates of urinary tract infection (1.1% vs. 0.1%; p = 0.001), blood transfusion (5.3% vs. 0.8%; p < 0.001), and total complications (4.7% vs. 1.8%; p < 0.001). Multivariate analysis identified inpatient admission as an independent risk factor for 30-day complication following TSA. Short-stay TSA is a safe option for the appropriately selected patient. Inpatient admission was an independent risk factor for complication following TSA. Level of Evidence: III.
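
Propensity score matching of the kind used above can be sketched as greedy 1:1 nearest-neighbour matching with a caliper. This is an illustrative toy: the propensity scores, outcomes, and caliper are invented, and NSQIP-scale analyses would match on model-estimated scores over thousands of patients.

```python
# Toy data: each patient is (propensity score, had a complication?).
# All values invented for illustration.
short_stay = [(0.20, 0), (0.35, 0), (0.60, 1)]
inpatient = [(0.22, 0), (0.33, 1), (0.58, 1), (0.80, 1), (0.40, 0)]

# Greedy 1:1 nearest-neighbour matching on the propensity score,
# without replacement, with a caliper of 0.05.
matches = []
pool = list(inpatient)
for ps, y in short_stay:
    best = min(pool, key=lambda p: abs(p[0] - ps))
    if abs(best[0] - ps) <= 0.05:
        matches.append(((ps, y), best))
        pool.remove(best)

# Complication rates compared within the matched pairs only.
rate_short = sum(y for (_, y), _ in matches) / len(matches)
rate_inpt = sum(y for _, (_, y) in matches) / len(matches)
print(len(matches), round(rate_short, 3), round(rate_inpt, 3))
```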

  7. Illustrating, Quantifying, and Correcting for Bias in Post-hoc Analysis of Gene-Based Rare Variant Tests of Association

    PubMed Central

    Grinde, Kelsey E.; Arbet, Jaron; Green, Alden; O'Connell, Michael; Valcarcel, Alessandra; Westra, Jason; Tintle, Nathan

    2017-01-01

    To date, gene-based rare variant testing approaches have focused on aggregating information across sets of variants to maximize statistical power in identifying genes showing significant association with diseases. Beyond identifying genes that are associated with diseases, the identification of causal variant(s) in those genes and estimation of their effect is crucial for planning replication studies and characterizing the genetic architecture of the locus. However, we illustrate that straightforward single-marker association statistics can suffer from substantial bias introduced by conditioning on gene-based test significance, due to the phenomenon often referred to as “winner's curse.” We illustrate the ramifications of this bias on variant effect size estimation and variant prioritization/ranking approaches, outline parameters of genetic architecture that affect this bias, and propose a bootstrap resampling method to correct for this bias. We find that our correction method significantly reduces the bias due to winner's curse (average two-fold decrease in bias, p < 2.2 × 10−6) and, consequently, substantially improves mean squared error and variant prioritization/ranking. The method is particularly helpful in adjustment for winner's curse effects when the initial gene-based test has low power and for relatively more common, non-causal variants. Adjustment for winner's curse is recommended for all post-hoc estimation and ranking of variants after a gene-based test. Further work is necessary to continue seeking ways to reduce bias and improve inference in post-hoc analysis of gene-based tests under a wide variety of genetic architectures. PMID:28959274
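
The bootstrap correction idea above — resample the data, condition on the resample again passing the significance screen, and subtract the average over-estimation — can be sketched in simplified form. Everything here is a toy: the phenotype values are invented, and a crude effect-size threshold stands in for the actual gene-based test statistic.

```python
import random
random.seed(1)

# Invented phenotypes for carriers vs. non-carriers of a rare variant.
carriers = [1.2, 0.8, 1.5, 0.9, 1.1]
noncarriers = [0.1, -0.2, 0.3, 0.0, 0.2, -0.1, 0.4, 0.1]

def effect(c, n):
    """Naive single-variant effect: difference in phenotype means."""
    return sum(c) / len(c) - sum(n) / len(n)

naive = effect(carriers, noncarriers)

# Bootstrap the conditional (winner's-curse) bias: among resamples that
# would again pass the screen, average how far the effect estimate
# exceeds the full-sample estimate.
threshold, passed = 0.9, []
for _ in range(2000):
    c = [random.choice(carriers) for _ in carriers]
    n = [random.choice(noncarriers) for _ in noncarriers]
    e = effect(c, n)
    if e > threshold:  # the conditioning (significance) event
        passed.append(e)
bias = sum(passed) / len(passed) - naive
corrected = naive - bias
print(round(naive, 3), round(corrected, 3))
```

Because only resamples with large effects survive the screen, the conditional mean sits above the naive estimate, and subtracting that excess shrinks the estimate — the direction of correction the abstract describes.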

  8. Schematic for efficient computation of GC, GC3, and AT3 bias spectra of genome

    PubMed Central

    Rizvi, Ahsan Z; Venu Gopal, T; Bhattacharya, C

    2012-01-01

    Selection of synonymous codons for an amino acid is biased in the protein translation process. This biased selection causes repetition of synonymous codons in structural parts of the genome, which accounts for high N/3 peaks in the DNA spectrum. The period-3 spectral property is utilized here to produce a 3-phase network model based on polyphase filterbank concepts for derivation of codon bias spectra (CBS). Modification of parameters in this model can produce GC, GC3, and AT3 bias spectra. A complete schematic in the LabVIEW platform is presented here for efficient and parallel computation of GC, GC3, and AT3 bias spectra of genomes, along with results of CBS patterns. We have performed correlation coefficient analysis of the GC, GC3, and AT3 bias spectra with the codon bias patterns of CBS to establish the biological and statistical significance of this model. PMID:22368390
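
Outside the filterbank framework, the underlying quantities GC, GC3, and AT3 can be computed directly from a coding sequence, which clarifies what the spectra measure. A minimal sketch; the example sequence is arbitrary.

```python
# GC: fraction of G/C bases overall.  GC3 / AT3: fraction of G/C or A/T
# at third codon positions, where synonymous codon choice acts.
def base_stats(seq):
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    third = [c[2] for c in codons]
    gc = sum(b in "GC" for b in seq) / len(seq)
    gc3 = sum(b in "GC" for b in third) / len(third)
    at3 = sum(b in "AT" for b in third) / len(third)
    return gc, gc3, at3

gc, gc3, at3 = base_stats("ATGGCCAAGTTTGGA")  # 5 codons
print(round(gc, 2), round(gc3, 2), round(at3, 2))  # → 0.47 0.6 0.4
```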

  9. Schematic for efficient computation of GC, GC3, and AT3 bias spectra of genome.

    PubMed

    Rizvi, Ahsan Z; Venu Gopal, T; Bhattacharya, C

    2012-01-01

    Selection of synonymous codons for an amino acid is biased in the protein translation process. This biased selection causes repetition of synonymous codons in structural parts of the genome, which accounts for high N/3 peaks in the DNA spectrum. The period-3 spectral property is utilized here to produce a 3-phase network model based on polyphase filterbank concepts for derivation of codon bias spectra (CBS). Modification of parameters in this model can produce GC, GC3, and AT3 bias spectra. A complete schematic in the LabVIEW platform is presented here for efficient and parallel computation of GC, GC3, and AT3 bias spectra of genomes, along with results of CBS patterns. We have performed correlation coefficient analysis of the GC, GC3, and AT3 bias spectra with the codon bias patterns of CBS to establish the biological and statistical significance of this model.

  10. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, George

    1993-01-01

    The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.

  11. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, Stanislav

    1992-01-01

    The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.

  12. The Empirical Nature and Statistical Treatment of Missing Data

    ERIC Educational Resources Information Center

    Tannenbaum, Christyn E.

    2009-01-01

    Introduction. Missing data is a common problem in research and can produce severely misleading analyses, including biased estimates of statistical parameters, and erroneous conclusions. In its 1999 report, the APA Task Force on Statistical Inference encouraged authors to report complications such as missing data and discouraged the use of…

  13. Risk factors for chronic subdural haematoma formation do not account for the established male bias.

    PubMed

    Marshman, Laurence A G; Manickam, Appukutty; Carter, Danielle

    2015-04-01

    The 'subdural space' is an artefact of inner dural border layer disruption: it is not anatomical but always pathological. A male bias has long been accepted for chronic subdural haematomas (CSDH), and increased male frequencies of trauma and/or alcohol abuse are often cited as likely explanations: however, no study has validated this. We investigated to see which risk factors accounted for the male bias with CSDH. Retrospective review of prospectively collected data. A male bias (M:F 97:58) for CSDH was confirmed in n=155 patients. The largest risk factor for CSDH was cerebral atrophy (M:F 94% vs. 91%): whilst a male bias prevailed in mild-moderate cases (M:F 58% vs. 41%), a female bias prevailed for severe atrophy (F:M 50% vs. 36%) (χ²=3.88, P=0.14). Risk factors for atrophy also demonstrated a female bias, some approached statistical significance: atrial fibrillation (P=0.05), stroke/TIA (P=0.06) and diabetes mellitus (P=0.07). There was also a trend for older age in females (F:M 72±13 years vs. 68±15 years, P=0.09). The third largest risk factor, after atrophy and trauma (i.e. anti-coagulant and anti-platelet use) was statistically significantly biased towards females (F:M 50% vs. 33%, P=0.04). No risk factor accounted for the established male bias with CSDH. In particular, a history of trauma (head injury or fall [M:F 50% vs. 57%, P=0.37]), and alcohol abuse (M:F 17% vs. 16%, P=0.89) was remarkably similar between genders. No recognised risk factor for CSDH formation accounted for the established male bias: risk factor trends generally favoured females. In particular, and in contrast to popular belief, a male CSDH bias did not relate to increased male frequencies of trauma and/or alcohol abuse. Crown Copyright © 2015. Published by Elsevier B.V. All rights reserved.

  14. Dangers in Using Analysis of Covariance Procedures.

    ERIC Educational Resources Information Center

    Campbell, Kathleen T.

    Problems associated with the use of analysis of covariance (ANCOVA) as a statistical control technique are explained. Three problems relate to the use of "OVA" methods (analysis of variance, analysis of covariance, multivariate analysis of variance, and multivariate analysis of covariance) in general. These are: (1) the wasting of information when…

  15. A Multivariate Solution of the Multivariate Ranking and Selection Problem

    DTIC Science & Technology

    1980-02-01

    Taneja (1972)), a ’a for a vector of constants c (Krishnaiah and Rizvi (1966)), the generalized variance (Gnanadesikan and Gupta (1970)), iegier (1976...Olkin, I. and Sobel, M. (1977). Selecting and Ordering Populations: A New Statistical Methodology, John Wiley & Sons, Inc., New York. Gnanadesikan

  16. Evaluation of Meteorite Amino Acid Analysis Data Using Multivariate Techniques

    NASA Technical Reports Server (NTRS)

    McDonald, G.; Storrie-Lombardi, M.; Nealson, K.

    1999-01-01

    The amino acid distributions in the Murchison carbonaceous chondrite, Mars meteorite ALH84001, and ice from the Allan Hills region of Antarctica are shown, using a multivariate technique known as Principal Component Analysis (PCA), to be statistically distinct from the average amino acid composition of 101 terrestrial protein superfamilies.
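
PCA of the kind used above projects composition profiles onto the directions of largest variance, so that compositionally distinct sources separate along the leading components. A minimal sketch: the four-column "profiles" below are invented, not measured abundances.

```python
import numpy as np

# Hypothetical relative-abundance profiles (rows = samples,
# columns = amino acids); values are illustrative only.
X = np.array([
    [0.30, 0.10, 0.40, 0.20],   # two meteorite-like profiles
    [0.28, 0.12, 0.38, 0.22],
    [0.10, 0.35, 0.15, 0.40],   # two protein-like profiles
    [0.12, 0.33, 0.17, 0.38],
])

# PCA via SVD of the mean-centred matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T              # projections onto principal components
explained = s**2 / np.sum(s**2)  # variance fraction per component

print(np.round(explained, 3))
print(np.round(scores[:, 0], 3))
```

With these two well-separated groups, the first component absorbs nearly all the variance and the two groups land on opposite sides of it, which is the sense in which the distributions are "statistically distinct".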

  17. Bias-correction and Spatial Disaggregation for Climate Change Impact Assessments at a basin scale

    NASA Astrophysics Data System (ADS)

    Nyunt, Cho; Koike, Toshio; Yamamoto, Akio; Nemoto, Toshihoro; Kitsuregawa, Masaru

    2013-04-01

    Basin-scale climate change impact studies mainly rely on general circulation models (GCMs) comprising the related emission scenarios. Realistic and reliable GCM data are crucial for national- or basin-scale impact and vulnerability assessments to build a safe society under climate change. However, GCMs fail to simulate regional climate features due to imprecise parameterization schemes in atmospheric physics and their coarse resolution. This study describes how to exclude unsatisfactory GCMs with respect to the focused basin, how to minimize the biases of GCM precipitation through statistical bias correction, and how to apply a spatial disaggregation scheme, a kind of downscaling, within a basin. GCM rejection is based on the regional climate features of seasonal evolution as a benchmark and mainly depends on the spatial correlation and root mean square error of precipitation and atmospheric variables over the target region. The Global Precipitation Climatology Project (GPCP) and the Japanese 25-year Reanalysis Project (JRA-25) are specified as references in evaluating the spatial pattern and error of each GCM. The statistical bias-correction scheme addresses three main flaws of GCM precipitation: too many low-intensity drizzle days with no dry days, underestimation of heavy rainfall, and misrepresented inter-annual variability of the local climate. Biases in heavy rainfall are corrected by fitting a generalized Pareto distribution (GPD) over a peaks-over-threshold series. Errors in rain-day frequency are fixed by rank-order statistics, and the seasonal variation problem is addressed by fitting a gamma distribution in each month to in-situ stations and their corresponding GCM grids. By applying the proposed bias-correction technique to all in-situ stations and their respective GCM grids, an easy and effective downscaling process for impact studies at the basin scale is accomplished.
The applicability of the proposed method has been examined in several basins across various climate regions worldwide, and the scheme controls the biases well in all of them. The bias-corrected and downscaled GCM precipitation is then ready for use in the Water and Energy Budget based Distributed Hydrological Model (WEB-DHM) to analyse stream flow change or the water availability of a target basin under near-future climate change. Furthermore, it can support interdisciplinary studies of drought, flood, food, health, and so on. In summary, an effective and comprehensive statistical bias-correction method was established to bridge the gap between GCM scale and basin scale without difficulty. This gap filling also supports sound river-management decisions in the basin, providing more reliable information for building a resilient society.
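
The heavy-rainfall step — fit a generalized Pareto distribution (GPD) to peaks over a threshold in both the GCM series and the reference series, then quantile-map model extremes through the reference fit — can be sketched as follows. Synthetic gamma-distributed "rainfall" stands in for real station and GCM series; the threshold choice and parameters are illustrative only.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)

# Synthetic daily rainfall: "observed" has heavier extremes than "GCM".
obs = rng.gamma(0.8, 12.0, 5000)
gcm = rng.gamma(0.8, 8.0, 5000)

# Peaks-over-threshold: fit a GPD to excesses above the 95th percentile.
u_obs, u_gcm = np.quantile(obs, 0.95), np.quantile(gcm, 0.95)
c_o, _, s_o = genpareto.fit(obs[obs > u_obs] - u_obs, floc=0)
c_g, _, s_g = genpareto.fit(gcm[gcm > u_gcm] - u_gcm, floc=0)

# Quantile mapping in the tail: find the non-exceedance probability of a
# GCM extreme under the GCM GPD, then invert the observed GPD at the
# same probability.
def correct_extreme(x):
    p = genpareto.cdf(x - u_gcm, c_g, scale=s_g)
    return u_obs + genpareto.ppf(p, c_o, scale=s_o)

x = float(np.quantile(gcm, 0.99))
print(round(x, 1), round(float(correct_extreme(x)), 1))
```

Because the reference tail is heavier here, the mapped value exceeds the raw GCM extreme, correcting the underestimation of heavy rainfall the record describes.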

  18. Multivariate meta-analysis using individual participant data.

    PubMed

    Riley, R D; Price, M J; Jackson, D; Wardle, M; Gueyffier, F; Wang, J; Staessen, J A; White, I R

    2015-06-01

    When combining results across related studies, a multivariate meta-analysis allows the joint synthesis of correlated effect estimates from multiple outcomes. Joint synthesis can improve efficiency over separate univariate syntheses, may reduce selective outcome reporting biases, and enables joint inferences across the outcomes. A common issue is that within-study correlations needed to fit the multivariate model are unknown from published reports. However, provision of individual participant data (IPD) allows them to be calculated directly. Here, we illustrate how to use IPD to estimate within-study correlations, using a joint linear regression for multiple continuous outcomes and bootstrapping methods for binary, survival and mixed outcomes. In a meta-analysis of 10 hypertension trials, we then show how these methods enable multivariate meta-analysis to address novel clinical questions about continuous, survival and binary outcomes; treatment-covariate interactions; adjusted risk/prognostic factor effects; longitudinal data; prognostic and multiparameter models; and multiple treatment comparisons. Both frequentist and Bayesian approaches are applied, with example software code provided to derive within-study correlations and to fit the models. © 2014 The Authors. Research Synthesis Methods published by John Wiley & Sons, Ltd.
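
The bootstrap route to a within-study correlation — resample participants, re-estimate both treatment effects each time, and correlate the estimates across replicates — can be sketched on invented IPD from a single toy trial. The data and outcomes below are illustrative only, not from the hypertension meta-analysis.

```python
import random
random.seed(42)

# Invented IPD from one trial:
# (treated?, systolic BP change, diastolic BP change).
ipd = [(1, -12.0, -7.0), (1, -9.0, -5.5), (1, -15.0, -8.0), (1, -8.0, -4.0),
       (0, -3.0, -2.0), (0, -1.0, -0.5), (0, -4.0, -3.0), (0, 0.0, -1.0)]

def effects(rows):
    """Treatment effect (difference in means) for each outcome."""
    t = [r for r in rows if r[0] == 1]
    c = [r for r in rows if r[0] == 0]
    mean = lambda xs: sum(xs) / len(xs)
    return (mean([r[1] for r in t]) - mean([r[1] for r in c]),
            mean([r[2] for r in t]) - mean([r[2] for r in c]))

def boot():
    """One bootstrap replicate, redrawn until both arms are present."""
    while True:
        rows = [random.choice(ipd) for _ in ipd]
        if any(r[0] == 1 for r in rows) and any(r[0] == 0 for r in rows):
            return effects(rows)

# The correlation of the two effect estimates across replicates
# estimates the within-study correlation needed by the multivariate model.
pairs = [boot() for _ in range(2000)]
xs, ys = zip(*pairs)
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in pairs) / len(pairs)
sx = (sum((x - mx) ** 2 for x in xs) / len(xs)) ** 0.5
sy = (sum((y - my) ** 2 for y in ys) / len(ys)) ** 0.5
rho = cov / (sx * sy)
print(round(rho, 2))
```

Since systolic and diastolic responses co-move within participants here, the two effect estimates are strongly positively correlated across replicates — the quantity a published report typically omits but IPD recovers.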

  19. Unveiling Galaxy Bias via the Halo Model, KiDS and GAMA

    NASA Astrophysics Data System (ADS)

    Dvornik, Andrej; Hoekstra, Henk; Kuijken, Konrad; Schneider, Peter; Amon, Alexandra; Nakajima, Reiko; Viola, Massimo; Choi, Ami; Erben, Thomas; Farrow, Daniel J.; Heymans, Catherine; Hildebrandt, Hendrik; Sifón, Cristóbal; Wang, Lingyu

    2018-06-01

    We measure the projected galaxy clustering and galaxy-galaxy lensing signals using the Galaxy And Mass Assembly (GAMA) survey and Kilo-Degree Survey (KiDS) to study galaxy bias. We use the concept of non-linear and stochastic galaxy biasing in the framework of halo occupation statistics to constrain the parameters of the halo occupation statistics and to unveil the origin of galaxy biasing. The bias function Γgm(rp), where rp is the projected comoving separation, is evaluated using the analytical halo model from which the scale dependence of Γgm(rp), and the origin of the non-linearity and stochasticity in halo occupation models can be inferred. Our observations unveil the physical reason for the non-linearity and stochasticity, further explored using hydrodynamical simulations, with the stochasticity mostly originating from the non-Poissonian behaviour of satellite galaxies in the dark matter haloes and their spatial distribution, which does not follow the spatial distribution of dark matter in the halo. The observed non-linearity is mostly due to the presence of the central galaxies, as was noted from previous theoretical work on the same topic. We also see that overall, more massive galaxies reveal a stronger scale dependence, and out to a larger radius. Our results show that a wealth of information about galaxy bias is hidden in halo occupation models. These models should therefore be used to determine the influence of galaxy bias in cosmological studies.

  20. Bias Corrections for Regional Estimates of the Time-averaged Geomagnetic Field

    NASA Astrophysics Data System (ADS)

    Constable, C.; Johnson, C. L.

    2009-05-01

    We assess two sources of bias in the time-averaged geomagnetic field (TAF) and paleosecular variation (PSV): inadequate temporal sampling, and the use of unit vectors in deriving temporal averages of the regional geomagnetic field. For the first question we use statistical resampling of existing data sets to minimize and correct for bias arising from uneven temporal sampling in studies of the TAF and its PSV. The techniques are illustrated using data derived from Hawaiian lava flows for 0-5 Ma: directional observations are an updated version of a previously published compilation of paleomagnetic directional data centered on ±20° latitude by Lawrence et al. (2006); intensity data are drawn from Tauxe & Yamazaki (2007). We conclude that poor temporal sampling can produce biased estimates of TAF and PSV, and resampling to an appropriate statistical distribution of ages reduces this bias. We suggest that similar resampling should be attempted as a bias correction for all regional paleomagnetic data to be used in TAF and PSV modeling. The second potential source of bias is the use of directional data in place of full vector data to estimate the average field. This is investigated for the full vector subset of the updated Hawaiian data set. Lawrence, K.P., C.G. Constable, and C.L. Johnson, 2006, Geochem. Geophys. Geosyst., 7, Q07007, DOI 10.1029/2005GC001181. Tauxe, L., & Yamazaki, 2007, Treatise on Geophysics, 5, Geomagnetism, Elsevier, Amsterdam, Chapter 13, p. 509.

  1. Self-calibration of photometric redshift scatter in weak-lensing surveys

    DOE PAGES

    Zhang, Pengjie; Pen, Ue -Li; Bernstein, Gary

    2010-06-11

    Photo-z errors, especially catastrophic errors, are a major uncertainty for precision weak lensing cosmology. We find that the shear-(galaxy number) density and density-density cross correlation measurements between photo-z bins, available from the same lensing surveys, contain valuable information for self-calibration of the scattering probabilities between the true-z and photo-z bins. The self-calibration technique we propose does not rely on cosmological priors or a parameterization of the photo-z probability distribution function, and preserves all of the cosmological information available from shear-shear measurement. We estimate the calibration accuracy through the Fisher matrix formalism. We find that, for advanced lensing surveys such as the planned stage IV surveys, the rate of photo-z outliers can be determined with statistical uncertainties of 0.01-1% for z < 2 galaxies. Among the several sources of calibration error that we identify and investigate, the galaxy distribution bias is likely the most dominant systematic error, whereby photo-z outliers have different redshift distributions and/or bias than non-outliers from the same bin. This bias affects all photo-z calibration techniques based on correlation measurements. As a result, galaxy bias variations of O(0.1) produce biases in photo-z outlier rates similar to the statistical errors of our method, so this galaxy distribution bias may bias the reconstructed scatters at the several-σ level, but is unlikely to completely invalidate the self-calibration technique.

  2. Multivariate probability distribution for sewer system vulnerability assessment under data-limited conditions.

    PubMed

    Del Giudice, G; Padulano, R; Siciliano, D

    2016-01-01

    The lack of geometrical and hydraulic information about sewer networks often excludes the adoption of in-depth modeling tools to obtain prioritization strategies for funds management. The present paper describes a novel statistical procedure for defining the prioritization scheme for preventive maintenance strategies based on a small sample of failure data collected by the Sewer Office of the Municipality of Naples (IT). Novel aspects include, among others, treating sewer parameters as continuous statistical variables and accounting for their interdependences. After a statistical analysis of maintenance interventions, the most important available factors affecting the process are selected and their mutual correlations identified. Then, after a Box-Cox transformation of the original variables, a methodology is provided for the evaluation of a vulnerability map of the sewer network by adopting a joint multivariate normal distribution with different parameter sets. The goodness-of-fit is eventually tested for each distribution by means of a multivariate plotting position. The developed methodology is expected to assist municipal engineers in identifying critical sewers, prioritizing sewer inspections in order to fulfill rehabilitation requirements.
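
The core of the procedure — Box-Cox transform the skewed sewer attributes, fit a joint multivariate normal, and read the fitted density as a failure-similarity score — can be sketched as follows. Synthetic lognormal "age" and "diameter" samples stand in for the Naples failure records; all values and the two-variable choice are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical attributes of pipes with recorded failures:
# age (years) and diameter (mm), both right-skewed.
age = rng.lognormal(3.5, 0.4, 300)
diam = rng.lognormal(5.5, 0.3, 300)

# Box-Cox each variable towards normality (requires positive data).
age_t, lam_age = stats.boxcox(age)
diam_t, lam_diam = stats.boxcox(diam)

# Fit a joint bivariate normal to the transformed, correlated variables.
X = np.column_stack([age_t, diam_t])
mvn = stats.multivariate_normal(mean=X.mean(axis=0), cov=np.cov(X.T))

# Density of the fitted failure distribution at a candidate pipe:
# higher density = more similar to past failures = higher priority.
pipe = [stats.boxcox(np.array([40.0]), lam_age)[0],
        stats.boxcox(np.array([250.0]), lam_diam)[0]]
print(float(mvn.pdf(pipe)))
```

Ranking every pipe in the network by this density would produce the kind of vulnerability map the record describes.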

  3. A guide to statistical analysis in microbial ecology: a community-focused, living review of multivariate data analyses.

    PubMed

    Buttigieg, Pier Luigi; Ramette, Alban

    2014-12-01

    The application of multivariate statistical analyses has become a consistent feature in microbial ecology. However, many microbial ecologists are still in the process of developing a deep understanding of these methods and appreciating their limitations. As a consequence, staying abreast of progress and debate in this arena poses an additional challenge to many microbial ecologists. To address these issues, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME): a dynamic, web-based resource providing accessible descriptions of numerous multivariate techniques relevant to microbial ecologists. A combination of interactive elements allows users to discover and navigate between methods relevant to their needs and examine how they have been used by others in the field. We have designed GUSTA ME to become a community-led and -curated service, which we hope will provide a common reference and forum to discuss and disseminate analytical techniques relevant to the microbial ecology community. © 2014 The Authors. FEMS Microbiology Ecology published by John Wiley & Sons Ltd on behalf of Federation of European Microbiological Societies.

  4. Ensembles of radial basis function networks for spectroscopic detection of cervical precancer

    NASA Technical Reports Server (NTRS)

    Tumer, K.; Ramanujam, N.; Ghosh, J.; Richards-Kortum, R.

    1998-01-01

    The mortality related to cervical cancer can be substantially reduced through early detection and treatment. However, current detection techniques, such as Pap smear and colposcopy, fail to achieve a concurrently high sensitivity and specificity. In vivo fluorescence spectroscopy is a technique which quickly, noninvasively and quantitatively probes the biochemical and morphological changes that occur in precancerous tissue. A multivariate statistical algorithm was used to extract clinically useful information from tissue spectra acquired from 361 cervical sites from 95 patients at 337-, 380-, and 460-nm excitation wavelengths. The multivariate statistical analysis was also employed to reduce the number of fluorescence excitation-emission wavelength pairs required to discriminate healthy tissue samples from precancerous tissue samples. The use of connectionist methods such as multilayered perceptrons, radial basis function (RBF) networks, and ensembles of such networks was investigated. RBF ensemble algorithms based on fluorescence spectra potentially provide automated and near real-time implementation of precancer detection in the hands of nonexperts. The results are more reliable, direct, and accurate than those achieved by either human experts or multivariate statistical algorithms.

  5. SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression *

    PubMed Central

    Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.

    2014-01-01

    The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised to specifically address the low statistical power issue of many standard statistical approaches, such as Hotelling's T² test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPReM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis have shown that SPReM outperforms other state-of-the-art methods. PMID:26527844

  6. A statistical approach for segregating cognitive task stages from multivariate fMRI BOLD time series.

    PubMed

    Demanuele, Charmaine; Bähner, Florian; Plichta, Michael M; Kirsch, Peter; Tost, Heike; Meyer-Lindenberg, Andreas; Durstewitz, Daniel

    2015-01-01

    Multivariate pattern analysis can reveal new information from neuroimaging data to illuminate human cognition and its disturbances. Here, we develop a methodological approach, based on multivariate statistical/machine learning and time series analysis, to discern cognitive processing stages from functional magnetic resonance imaging (fMRI) blood oxygenation level dependent (BOLD) time series. We apply this method to data recorded from a group of healthy adults whilst performing a virtual reality version of the delayed win-shift radial arm maze (RAM) task. This task has been frequently used to study working memory and decision making in rodents. Using linear classifiers and multivariate test statistics in conjunction with time series bootstraps, we show that different cognitive stages of the task, as defined by the experimenter, namely, the encoding/retrieval, choice, reward and delay stages, can be statistically discriminated from the BOLD time series in brain areas relevant for decision making and working memory. Discrimination of these task stages was significantly reduced during poor behavioral performance in dorsolateral prefrontal cortex (DLPFC), but not in the primary visual cortex (V1). Experimenter-defined dissection of time series into class labels based on task structure was confirmed by an unsupervised, bottom-up approach based on Hidden Markov Models. Furthermore, we show that different groupings of recorded time points into cognitive event classes can be used to test hypotheses about the specific cognitive role of a given brain region during task execution. We found that the DLPFC strongly differentiated between task stages associated with different memory loads but not between different visual-spatial aspects, whereas the reverse was true for V1.
Our methodology illustrates how different aspects of cognitive information processing during one and the same task can be separated and attributed to specific brain regions based on information contained in multivariate patterns of voxel activity.
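The core ingredients of the approach — a linear classifier over multivoxel patterns plus a resampling test of stage discrimination — can be sketched on synthetic data. This is a toy reconstruction: it uses a nearest-class-mean classifier and a label-permutation null rather than the paper's exact classifiers and time-series bootstraps, and the "BOLD patterns" are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_mean_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a nearest-class-mean linear classifier."""
    classes = np.unique(y_train)
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    d = ((X_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return (classes[d.argmin(axis=1)] == y_test).mean()

# Synthetic "multivoxel patterns": two task stages, 50 voxels.
n, p = 200, 50
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p)) + 0.8 * y[:, None]

tr = np.arange(n) < 150                       # train/test split in time order
acc = nearest_mean_accuracy(X[tr], y[tr], X[~tr], y[~tr])

# Label-shuffle null distribution for the discrimination statistic.
null = np.array([nearest_mean_accuracy(X[tr], rng.permutation(y[tr]),
                                       X[~tr], y[~tr]) for _ in range(200)])
p_value = (np.sum(null >= acc) + 1) / (len(null) + 1)
```

If the stages carry no pattern information, `acc` falls inside the null distribution and `p_value` is large; reliable discrimination drives `p_value` toward its floor of 1/(permutations + 1).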

  7. Hunting high and low: disentangling primordial and late-time non-Gaussianity with cosmic densities in spheres

    NASA Astrophysics Data System (ADS)

    Uhlemann, C.; Pajer, E.; Pichon, C.; Nishimichi, T.; Codis, S.; Bernardeau, F.

    2018-03-01

    Non-Gaussianities of dynamical origin are disentangled from primordial ones using the formalism of large deviation statistics with spherical collapse dynamics. This is achieved by relying on accurate analytical predictions for the one-point probability distribution function and the two-point clustering of spherically averaged cosmic densities (sphere bias). Sphere bias extends the idea of halo bias to intermediate density environments and voids as underdense regions. In the presence of primordial non-Gaussianity, sphere bias displays a strong scale dependence relevant for both high- and low-density regions, which is predicted analytically. The statistics of densities in spheres are built to model primordial non-Gaussianity via an initial skewness with a scale dependence that depends on the bispectrum of the underlying model. The analytical formulas with the measured non-linear dark matter variance as input are successfully tested against numerical simulations. For local non-Gaussianity with a range from fNL = -100 to +100, they are found to agree within 2 per cent or better for densities ρ ∈ [0.5, 3] in spheres of radius 15 Mpc h-1 down to z = 0.35. The validity of the large deviation statistics formalism is thereby established for all observationally relevant local-type departures from perfectly Gaussian initial conditions. The corresponding estimators for the amplitude of the non-linear variance σ8 and primordial skewness fNL are validated using a fiducial joint maximum likelihood experiment. The influence of observational effects and the prospects for a future detection of primordial non-Gaussianity from joint one- and two-point densities-in-spheres statistics are discussed.

  8. Calypso: a user-friendly web-server for mining and visualizing microbiome-environment interactions.

    PubMed

    Zakrzewski, Martha; Proietti, Carla; Ellis, Jonathan J; Hasan, Shihab; Brion, Marie-Jo; Berger, Bernard; Krause, Lutz

    2017-03-01

    Calypso is an easy-to-use online software suite that allows non-expert users to mine, interpret and compare taxonomic information from metagenomic or 16S rDNA datasets. Calypso has a focus on multivariate statistical approaches that can identify complex environment-microbiome associations. The software enables quantitative visualizations, statistical testing, multivariate analysis, supervised learning, factor analysis, multivariable regression, network analysis and diversity estimates. Comprehensive help pages, tutorials and videos are provided via a wiki page. The web-interface is accessible via http://cgenome.net/calypso/ . The software is programmed in Java, PERL and R and the source code is available from Zenodo ( https://zenodo.org/record/50931 ). The software is freely available for non-commercial users. l.krause@uq.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  9. Bayesian inference on risk differences: an application to multivariate meta-analysis of adverse events in clinical trials.

    PubMed

    Chen, Yong; Luo, Sheng; Chu, Haitao; Wei, Peng

    2013-05-01

    Multivariate meta-analysis is useful in combining evidence from independent studies which involve several comparisons among groups based on a single outcome. For binary outcomes, the commonly used statistical models for multivariate meta-analysis are multivariate generalized linear mixed effects models which assume risks, after some transformation, follow a multivariate normal distribution with possible correlations. In this article, we consider an alternative model for multivariate meta-analysis where the risks are modeled by the multivariate beta distribution proposed by Sarmanov (1966). This model has several attractive features compared to the conventional multivariate generalized linear mixed effects models, including a simple likelihood function, no need to specify a link function, and a closed-form expression of the distribution functions for study-specific risk differences. We investigate the finite sample performance of this model by simulation studies and illustrate its use with an application to multivariate meta-analysis of adverse events of tricyclic antidepressants treatment in clinical trials.
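A minimal sketch of how Sarmanov's construction correlates two risks while leaving the beta marginals intact, via rejection sampling. The shape parameters and `omega` below are hypothetical; `omega` must satisfy the positivity constraint |omega| <= 1/(c1*c2) for the density to be valid.

```python
import numpy as np

def sample_sarmanov_beta(a1, b1, a2, b2, omega, n, rng):
    """Rejection sampler for Sarmanov's bivariate beta density
    f(x, y) = Beta(x; a1, b1) * Beta(y; a2, b2) * [1 + omega*(x - mu1)*(y - mu2)].
    The tilt factor integrates out, so both marginals stay beta."""
    mu1, mu2 = a1 / (a1 + b1), a2 / (a2 + b2)
    c1, c2 = max(mu1, 1 - mu1), max(mu2, 1 - mu2)
    assert abs(omega) <= 1 / (c1 * c2), "omega outside the valid range"
    bound = 1 + abs(omega) * c1 * c2          # upper bound on the tilt factor
    xs, ys = [], []
    while len(xs) < n:
        x, y = rng.beta(a1, b1), rng.beta(a2, b2)
        if rng.uniform() * bound < 1 + omega * (x - mu1) * (y - mu2):
            xs.append(x)
            ys.append(y)
    return np.array(xs), np.array(ys)

rng = np.random.default_rng(1)
x, y = sample_sarmanov_beta(2, 2, 2, 3, omega=3.0, n=5000, rng=rng)
risk_diff = x - y          # draws of the study-specific risk difference
```

Because the marginals are preserved, `x` and `y` keep their beta means (0.5 and 0.4 here) while a positive `omega` induces positive correlation between the two risks.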

  10. Predictors of Major Depression and Posttraumatic Stress Disorder Following Traumatic Brain Injury: A Systematic Review and Meta-Analysis.

    PubMed

    Cnossen, Maryse C; Scholten, Annemieke C; Lingsma, Hester F; Synnot, Anneliese; Haagsma, Juanita; Steyerberg, Ewout W; Polinder, Suzanne

    2017-01-01

    Although major depressive disorder (MDD) and posttraumatic stress disorder (PTSD) are prevalent after traumatic brain injury (TBI), little is known about which patients are at risk for developing them. The authors systematically reviewed the literature on predictors and multivariable models for MDD and PTSD after TBI. The authors included 26 observational studies. MDD was associated with female gender, preinjury depression, postinjury unemployment, and lower brain volume, whereas PTSD was related to shorter posttraumatic amnesia, memory of the traumatic event, and early posttraumatic symptoms. Risk of bias ratings for most studies were acceptable, although studies that developed a multivariable model suffered from methodological shortcomings.

  11. The impact of non-response bias due to sampling in public health studies: A comparison of voluntary versus mandatory recruitment in a Dutch national survey on adolescent health.

    PubMed

    Cheung, Kei Long; Ten Klooster, Peter M; Smit, Cees; de Vries, Hein; Pieterse, Marcel E

    2017-03-23

    In public health monitoring of young people it is critical to understand the effects of selective non-response, in particular when a controversial topic is involved like substance abuse or sexual behaviour. Research that is dependent upon voluntary subject participation is particularly vulnerable to sampling bias. As respondents whose participation is hardest to elicit on a voluntary basis are also more likely to report risk behaviour, this potentially leads to underestimation of risk factor prevalence. Inviting adolescents to participate in a home-sent postal survey is a typical voluntary recruitment strategy with high non-response, as opposed to mandatory participation during school time. This study examines the extent to which prevalence estimates of adolescent health-related characteristics are biased due to different sampling methods, and whether this also biases within-subject analyses. Cross-sectional datasets collected in 2011 in Twente and IJsselland, two similar and adjacent regions in the Netherlands, were used. In total, 9360 youngsters in a mandatory sample (Twente) and 1952 youngsters in a voluntary sample (IJsselland) participated in the study. To test whether the samples differed on health-related variables, we conducted both univariate and multivariable logistic regression analyses controlling for any demographic difference between the samples. Additional multivariable logistic regressions were conducted to examine moderating effects of sampling method on associations between health-related variables. As expected, females, older individuals, as well as individuals with higher education levels, were over-represented in the voluntary sample, compared to the mandatory sample. 
Respondents in the voluntary sample tended to smoke less, consume less alcohol (ever, lifetime, and past four weeks), have better mental health, have better subjective health status, have more positive school experiences and have less sexual intercourse than respondents in the mandatory sample. No moderating effects were found for sampling method on associations between variables. This is one of the first studies to provide strong evidence that voluntary recruitment may lead to a strong non-response bias in health-related prevalence estimates in adolescents, as compared to mandatory recruitment. The resulting underestimation in prevalence of health behaviours and well-being measures appeared large, up to a four-fold lower proportion for self-reported alcohol consumption. Correlations between variables, though, appeared to be insensitive to sampling bias.
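As a simplified, univariate version of the sample comparison, a crude odds ratio with a Wald confidence interval shows how recruitment mode can shift a prevalence estimate. The smoker counts below are hypothetical (only the sample sizes echo the abstract), not the study's data.

```python
import numpy as np

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald 95% CI for a 2x2 table:
    rows = recruitment mode, columns = behaviour yes/no."""
    or_ = (a * d) / (b * c)
    se = np.sqrt(1.0/a + 1.0/b + 1.0/c + 1.0/d)   # SE of log odds ratio
    lo = np.exp(np.log(or_) - z * se)
    hi = np.exp(np.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts of current smokers / non-smokers:
# voluntary sample (120 of 1952), mandatory sample (1100 of 9360).
or_, lo, hi = odds_ratio_ci(120, 1952 - 120, 1100, 9360 - 1100)
```

An odds ratio whose entire confidence interval sits below 1 indicates the voluntary sample reports significantly less of the behaviour, consistent with non-response bias; the full study additionally adjusts for demographic differences via multivariable logistic regression.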

  12. Race and Gender Bias in the Administration of Corporal Punishment.

    ERIC Educational Resources Information Center

    Shaw, Steven R.; Braden, Jeffery B.

    1990-01-01

    Examined disciplinary actions taken by school building administrators after receiving discipline referral to identify evidence of race and gender bias in administration of corporal punishment (CP). Analysis of discipline files (n=6,244) demonstrated statistically significant relationships between race and CP and between gender and CP. Results…

  13. Instrumentation for Assessing Culturally Different: Some Conceptual Issues.

    ERIC Educational Resources Information Center

    Rosenbach, John H.

    Since their beginning, intelligence tests have favored the higher social classes. Despite federal mandates to the contrary, bias in assessment is likely to remain a problem. Claims of test bias can be categorized as popular (naive); clinical (intuitive); statistical (predictive); and psychometric (construct and content). A literature review has…

  14. ESL Student Bias in Instructional Evaluation.

    ERIC Educational Resources Information Center

    Wennerstrom, Ann K.; Heiser, Patty

    1992-01-01

    Reports on a statistical analysis of English-as-a-Second-Language (ESL) student evaluations of teachers in two large programs. Results indicated a systematic bias occurs in ESL student evaluations, raising issues of fairness in the use of student evaluations of ESL teachers for purposes of personnel decisions. (21 references) (GLR)

  15. Misclassification bias in areal estimates

    Treesearch

    Raymond L. Czaplewski

    1992-01-01

    In addition to thematic maps, remote sensing provides estimates of area in different thematic categories. Areal estimates are frequently used for resource inventories, management planning, and assessment analyses. Misclassification causes bias in these statistical areal estimates. For example, if a small percentage of a common cover type is misclassified as a rare...

  16. Racial Bias in the Juvenile Justice System in the United States.

    ERIC Educational Resources Information Center

    Pratt, Menah A. E.

    1993-01-01

    Examines research on racial bias, focusing on the discretionary decision-making junctures of the juvenile court system. Research and statistics continue to suggest that the public, the police, the prosecution, and the jury all treat blacks more unfairly based to a large extent on race. (SLD)

  17. Detecting subtle hydrochemical anomalies with multivariate statistics: an example from homogeneous groundwaters in the Great Artesian Basin, Australia

    NASA Astrophysics Data System (ADS)

    O'Shea, Bethany; Jankowski, Jerzy

    2006-12-01

    The major ion composition of Great Artesian Basin groundwater in the lower Namoi River valley is relatively homogeneous in chemical composition. Traditional graphical techniques have been combined with multivariate statistical methods to determine whether subtle differences in the chemical composition of these waters can be delineated. Hierarchical cluster analysis and principal components analysis were successful in delineating minor variations within the groundwaters of the study area that were not visually identified in the graphical techniques applied. Hydrochemical interpretation allowed geochemical processes to be identified in each statistically defined water type and illustrated how these groundwaters differ from one another. Three main geochemical processes were identified in the groundwaters: ion exchange, precipitation, and mixing between waters from different sources. Both statistical methods delineated an anomalous sample suspected of being influenced by magmatic CO2 input. The use of statistical methods to complement traditional graphical techniques for waters appearing homogeneous is emphasized for all investigations of this type.
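A minimal numpy sketch of the anomaly-detection idea: standardize the major-ion data, rotate into principal-component space, and flag the sample farthest from the bulk. The ion concentrations are hypothetical, and the planted anomaly mimics a CO2-influenced sample with excess bicarbonate.

```python
import numpy as np

def pca_scores(X):
    """Scores of column-standardized data in principal-component space
    (all components, via SVD)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt.T

rng = np.random.default_rng(2)
# Hypothetical, near-homogeneous major-ion data (Na, HCO3, Cl, Ca in mg/L).
X = rng.normal([300.0, 400.0, 150.0, 20.0], [5.0, 8.0, 4.0, 1.0], size=(40, 4))
X[-1, 1] += 100.0                        # one sample with excess HCO3
scores = pca_scores(X)
dist = np.linalg.norm(scores - np.median(scores, axis=0), axis=1)
anomaly = int(dist.argmax())             # index of the most anomalous sample
```

In near-homogeneous data the multivariate distance isolates the anomalous sample even though each individual ion plot would look unremarkable, which is the point the study makes for complementing graphical techniques.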

  18. Phylogeography, intraspecific structure and sex-biased dispersal of Dall's porpoise, Phocoenoides dalli, revealed by mitochondrial and microsatellite DNA analyses.

    PubMed

    Escorza-Treviño, S; Dizon, A E

    2000-08-01

    Mitochondrial DNA (mtDNA) control-region sequences and microsatellite loci length polymorphisms were used to estimate phylogeographical patterns (historical patterns underlying contemporary distribution), intraspecific population structure and gender-biased dispersal of Phocoenoides dalli dalli across its entire range. One-hundred and thirteen animals from several geographical strata were sequenced over 379 bp of mtDNA, resulting in 58 mtDNA haplotypes. Analysis using F(ST) values (based on haplotype frequencies) and phi(ST) values (based on frequencies and genetic distances between haplotypes) yielded statistically significant separation (bootstrap values P < 0.05) among most of the stocks currently used for management purposes. A minimum spanning network of haplotypes showed two very distinctive clusters, differentially occupied by western and eastern populations, with some common widespread haplotypes. This suggests some degree of phyletic radiation from west to east, superimposed on gene flow. Highly male-biased migration was detected for several population comparisons. Nuclear microsatellite DNA markers (119 individuals and six loci) provided additional support for population subdivision and gender-biased dispersal detected in the mtDNA sequences. Analysis using F(ST) values (based on allelic frequencies) yielded statistically significant separation between some, but not all, populations distinguished by mtDNA analysis. R(ST) values (based on frequencies of and genetic distance between alleles) showed no statistically significant subdivision. Again, highly male-biased dispersal was detected for all population comparisons, suggesting, together with morphological and reproductive data, the existence of sexual selection. Our molecular results argue for nine distinct dalli-type populations that should be treated as separate units for management purposes.
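The frequency-based F_ST statistic underlying this kind of haplotype analysis can be computed directly from a population-by-haplotype count table; the counts below are invented for illustration, echoing the west/east cluster structure.

```python
import numpy as np

def fst(counts):
    """Wright's F_ST from a (populations x haplotypes) count table:
    F_ST = (H_T - H_S) / H_T, with H = 1 - sum(p_i^2) (gene diversity)."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum(axis=1, keepdims=True)   # per-population frequencies
    h_s = (1.0 - (p ** 2).sum(axis=1)).mean()        # mean within-population diversity
    p_bar = counts.sum(axis=0) / counts.sum()        # pooled frequencies
    h_t = 1.0 - (p_bar ** 2).sum()                   # total diversity
    return (h_t - h_s) / h_t

# Invented haplotype counts for a western and an eastern population.
west = [40, 10, 5, 5]
east = [5, 5, 10, 40]
v = fst([west, east])       # sizeable differentiation between the clusters
```

Identical populations give F_ST = 0; the mirror-image frequency profiles above give roughly 0.25. The paper's phi(ST) and R(ST) variants additionally weight by genetic distances between haplotypes or alleles.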

  19. Collaborative Project: The problem of bias in defining uncertainty in computationally enabled strategies for data-driven climate model development. Final Technical Report.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huerta, Gabriel

    The objective of the project is to develop strategies for better representing scientific sensibilities within statistical measures of model skill that then can be used within a Bayesian statistical framework for data-driven climate model development and improved measures of model scientific uncertainty. One of the thorny issues in model evaluation is quantifying the effect of biases on climate projections. While any bias is not desirable, only those biases that affect feedbacks affect scatter in climate projections. The effort at the University of Texas is to analyze previously calculated ensembles of CAM3.1 with perturbed parameters to discover how biases affect projections of global warming. The hypothesis is that compensating errors in the control model can be identified by their effect on a combination of processes and that developing metrics that are sensitive to dependencies among state variables would provide a way to select versions of climate models that may reduce scatter in climate projections. Gabriel Huerta at the University of New Mexico is responsible for developing statistical methods for evaluating these field dependencies. The UT effort will incorporate these developments into MECS, which is a set of python scripts being developed at the University of Texas for managing the workflow associated with data-driven climate model development over HPC resources. This report reflects the main activities at the University of New Mexico, where the PI (Huerta) and the postdocs (Nosedal, Hattab and Karki) worked on the project.

  20. Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes.

    PubMed

    Harrington, Peter de Boves

    2018-01-02

    Validation of multivariate models is of current importance for a wide range of chemical applications. Although important, it is neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to the single validation set of data. In addition, no statistics are available regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes efficient use of the data because each object is used once for validation. It was reviewed a decade earlier, but primarily for the optimization of chemometric models; this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported, and powerful matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.
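A sketch of the scheme on synthetic data: stratified ("Latin") partitions assign each object to exactly one validation fold per split, the splits are redrawn many times, and the figure of merit is reported as an average with a spread rather than a single number. A nearest-class-mean classifier stands in for the chemometric model; everything below is illustrative, not the paper's implementation.

```python
import numpy as np

def latin_partitions(y, n_parts, rng):
    """Assign each object to one validation fold, stratified by class so
    class proportions are (nearly) equal across folds."""
    fold = np.empty(len(y), dtype=int)
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        fold[idx] = np.arange(len(idx)) % n_parts
    return fold

def bootstrapped_lp_accuracy(X, y, n_parts=2, n_boot=20, rng=None):
    """Average accuracy (and its spread) over repeatedly redrawn Latin
    partitions; each object is validated exactly once per partition set."""
    if rng is None:
        rng = np.random.default_rng(0)
    accs = []
    for _ in range(n_boot):
        fold = latin_partitions(y, n_parts, rng)
        correct = 0
        for k in range(n_parts):
            tr, te = fold != k, fold == k
            means = {c: X[tr & (y == c)].mean(axis=0) for c in np.unique(y)}
            for xi, yi in zip(X[te], y[te]):
                pred = min(means, key=lambda c: np.linalg.norm(xi - means[c]))
                correct += int(pred == yi)
        accs.append(correct / len(y))
    accs = np.asarray(accs)
    return accs.mean(), accs.std(ddof=1)

rng = np.random.default_rng(6)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 5)) + 2.0 * y[:, None]
mean_acc, acc_sd = bootstrapped_lp_accuracy(X, y, rng=rng)
```

The spread `acc_sd` is exactly what a single external validation set cannot provide: a precision estimate for the FOM, usable in matched-sample comparisons between models.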

  1. Targeted estimation of nuisance parameters to obtain valid statistical inference.

    PubMed

    van der Laan, Mark J

    2014-01-01

    In order to obtain concrete results, we focus on estimation of the treatment specific mean, controlling for all measured baseline covariates, based on observing independent and identically distributed copies of a random variable consisting of baseline covariates, a subsequently assigned binary treatment, and a final outcome. The statistical model only assumes possible restrictions on the conditional distribution of treatment, given the covariates, the so-called propensity score. Estimators of the treatment specific mean involve estimation of the propensity score and/or estimation of the conditional mean of the outcome, given the treatment and covariates. In order to make these estimators asymptotically unbiased at any data distribution in the statistical model, it is essential to use data-adaptive estimators of these nuisance parameters such as ensemble learning, and specifically super-learning. Because such estimators involve optimal trade-off of bias and variance w.r.t. the infinite dimensional nuisance parameter itself, they result in a sub-optimal bias/variance trade-off for the resulting real-valued estimator of the estimand. We demonstrate that additional targeting of the estimators of these nuisance parameters guarantees that this bias for the estimand is second order and thereby allows us to prove theorems that establish asymptotic linearity of the estimator of the treatment specific mean under regularity conditions. These insights result in novel targeted minimum loss-based estimators (TMLEs) that use ensemble learning with additional targeted bias reduction to construct estimators of the nuisance parameters. In particular, we construct collaborative TMLEs (C-TMLEs) with known influence curve allowing for statistical inference, even though these C-TMLEs involve variable selection for the propensity score based on a criterion that measures how effective the resulting fit of the propensity score is in removing bias for the estimand. 
As a particular special case, we also demonstrate the required targeting of the propensity score for the inverse probability of treatment weighted estimator using super-learning to fit the propensity score.
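The inverse-probability-of-treatment-weighted estimator around which the targeting is built can be illustrated with the true propensity score on simulated confounded data. This toy uses the stabilized (Hajek) form and known propensities; it is a sketch of the estimand, not the paper's C-TMLE or super-learner machinery.

```python
import numpy as np

def iptw_mean(y, a, pscore):
    """Stabilized (Hajek) IPTW estimate of the treatment-specific mean
    E[Y(1)], given treatment indicator a and propensity scores P(A=1|W)."""
    return np.sum(a * y / pscore) / np.sum(a / pscore)

rng = np.random.default_rng(3)
n = 20_000
w = rng.normal(size=n)                    # baseline covariate (confounder)
p = 1.0 / (1.0 + np.exp(-w))              # true propensity P(A=1|W)
a = rng.binomial(1, p)                    # treatment assignment
y = 2.0 + w + rng.normal(size=n)          # outcome; E[Y(1)] = 2 by design
est = iptw_mean(y, a, p)                  # approximately unbiased
naive = y[a == 1].mean()                  # confounded: biased upward
```

The naive treated-group mean is pulled up because high-`w` subjects are both more likely to be treated and have higher outcomes; weighting by the inverse propensity removes that selection. In practice the propensity must be estimated, which is where the targeted, data-adaptive fitting discussed above comes in.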

  2. Can Family Planning Service Statistics Be Used to Track Population-Level Outcomes?

    PubMed

    Magnani, Robert J; Ross, John; Williamson, Jessica; Weinberger, Michelle

    2018-03-21

    The need for annual family planning program tracking data under the Family Planning 2020 (FP2020) initiative has contributed to renewed interest in family planning service statistics as a potential data source for annual estimates of the modern contraceptive prevalence rate (mCPR). We sought to assess (1) how well a set of commonly recorded data elements in routine service statistics systems could, with some fairly simple adjustments, track key population-level outcome indicators, and (2) whether some data elements performed better than others. We used data from 22 countries in Africa and Asia to analyze 3 data elements collected from service statistics: (1) number of contraceptive commodities distributed to clients, (2) number of family planning service visits, and (3) number of current contraceptive users. Data quality was assessed via analysis of mean square errors, using the United Nations Population Division World Contraceptive Use annual mCPR estimates as the "gold standard." We also examined the magnitude of several components of measurement error: (1) variance, (2) level bias, and (3) slope (or trend) bias. Our results indicate modest levels of tracking error for data on commodities to clients (7%) and service visits (10%), and somewhat higher error rates for data on current users (19%). Variance and slope bias were relatively small for all data elements. Level bias was by far the largest contributor to tracking error. Paired comparisons of data elements in countries that collected at least 2 of the 3 data elements indicated a modest advantage of data on commodities to clients. None of the data elements considered was sufficiently accurate to be used to produce reliable stand-alone annual estimates of mCPR. 
However, the relatively low levels of variance and slope bias indicate that trends calculated from these 3 data elements can be productively used in conjunction with the Family Planning Estimation Tool (FPET) currently used to produce annual mCPR tracking estimates for FP2020. © Magnani et al.
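The error decomposition described above (variance, level bias, slope bias) can be made concrete by regressing the tracking error on time: the intercept-like mean offset is level bias, the fitted trend in the error is slope bias, and what remains is variance. The mCPR series below is hypothetical, not the study's data.

```python
import numpy as np

def decompose_tracking_error(estimate, gold):
    """Split mean squared tracking error into a level bias (mean offset),
    a slope bias (trend mismatch) and residual variance."""
    t = np.arange(len(gold), dtype=float)
    e = estimate - gold                      # tracking error per year
    level = e.mean()
    slope = np.polyfit(t, e, 1)[0]
    resid = e - level - slope * (t - t.mean())
    return {"mse": np.mean(e ** 2), "level_bias": level,
            "slope_bias": slope, "variance": resid.var()}

gold = np.array([20, 21, 22, 23, 24, 25], dtype=float)        # "true" mCPR (%)
est = gold + 3.0 + 0.5 * np.arange(6) \
      + np.array([0.2, -0.1, 0.0, 0.1, -0.2, 0.0])            # offset + drift + noise
out = decompose_tracking_error(est, gold)
```

Here the large level bias dominates the error while slope bias and variance are small, which mirrors the study's finding: poor for stand-alone mCPR levels, but usable for trends.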

  3. Inferring Master Painters' Esthetic Biases from the Statistics of Portraits

    PubMed Central

    Aleem, Hassan; Correa-Herran, Ivan; Grzywacz, Norberto M.

    2017-01-01

    The Processing Fluency Theory posits that the ease of sensory information processing in the brain facilitates esthetic pleasure. Accordingly, the theory would predict that master painters should display biases toward visual properties such as symmetry, balance, and moderate complexity. Have these biases been occurring, and if so, have painters been optimizing these properties (fluency variables)? Here, we address these questions with statistics of portrait paintings from the Early Renaissance period. To do this, we first developed different computational measures for each of the aforementioned fluency variables. Then, we measured their statistics in 153 portraits from 26 master painters, in 27 photographs of people in three controlled poses, and in 38 quickly snapped photographs of individual persons. A statistical comparison between Early Renaissance portraits and quickly snapped photographs revealed that painters showed a bias toward balance, symmetry, and moderate complexity. However, a comparison between portraits and controlled-pose photographs showed that painters did not optimize each of these properties. Instead, different painters presented biases toward different, narrow ranges of fluency variables. Further analysis suggested that the painters' individuality stemmed in part from having to resolve the tension between complexity vs. symmetry and balance. We additionally found that constraints on the use of different painting materials by distinct painters modulated these fluency variables systematically. In conclusion, the Processing Fluency Theory of Esthetic Pleasure would need expansion if we were to apply it to the history of visual art since it cannot explain the lack of optimization of each fluency variable. To expand the theory, we propose the existence of a Neuroesthetic Space, which encompasses the possible values that each of the fluency variables can reach in any given art period.
We discuss the neural mechanisms of this Space and propose that it has a distributed representation in the human brain. We further propose that different artists reside in different, small sub-regions of the Space. This Neuroesthetic-Space hypothesis raises the question of how painters and their paintings evolve across art periods. PMID:28337133
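The three fluency variables can be given toy computational definitions on a grayscale image array; the paper's actual measures are more elaborate, so treat these as stand-ins: mirror symmetry as correlation with the left-right flip, balance as the horizontal centroid offset, and complexity as mean gradient magnitude.

```python
import numpy as np

def fluency_measures(img):
    """Toy fluency variables for a 2-D grayscale image array:
    (symmetry, balance, complexity)."""
    img = np.asarray(img, dtype=float)
    # Mirror symmetry: correlation between the image and its left-right flip.
    symmetry = np.corrcoef(img.ravel(), img[:, ::-1].ravel())[0, 1]
    # Balance: how close the horizontal centre of "mass" is to mid-frame.
    cols = img.sum(axis=0)
    centroid = (cols * np.arange(img.shape[1])).sum() / cols.sum()
    half = (img.shape[1] - 1) / 2.0
    balance = 1.0 - abs(centroid - half) / half
    # Complexity: mean gradient magnitude (edge density proxy).
    gy, gx = np.gradient(img)
    complexity = np.hypot(gx, gy).mean()
    return symmetry, balance, complexity

sym_img = np.tile(np.hanning(64), (64, 1))   # perfectly mirror-symmetric image
s, b, c = fluency_measures(sym_img)
```

A perfectly mirror-symmetric, centred image scores 1 on both symmetry and balance, giving a sanity check before applying such measures to scanned portraits.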

  4. A Summary of Biases in the BATSE Burst Trigger

    NASA Technical Reports Server (NTRS)

    Meegan, Charles A; Pendleton, Geoffrey N.; Mallozzi, Robert S.

    1999-01-01

    The BATSE threshold for triggering on a gamma-ray burst is generally expressed in units of peak flux between 50 and 300 keV averaged over 1024 milliseconds. The completeness of the sample is affected by several systematic and statistical effects. A study is currently underway to characterize two of these that have not yet been included in the BATSE trigger efficiency calculation. They are: 1) the effects of statistical fluctuations on the measurement of peak flux and, 2) the effect on the trigger threshold of "slow risers", in which some of the burst flux is identified as background. Some other biases that have been identified are in fact Malmquist-type biases, which relate to a volume-limited, rather than peak-flux-limited, burst source distribution and which cannot be determined without knowledge of the burst luminosity distribution.
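The first effect — statistical fluctuations on peak flux near the trigger threshold — can be demonstrated with a toy Poisson Monte Carlo: bursts that fluctuate upward preferentially trigger, so the triggered sample's measured peak flux is biased high. The count levels below are illustrative only, not BATSE's actual rates.

```python
import numpy as np

rng = np.random.default_rng(4)

background = 1000.0    # background counts per 1024 ms window (illustrative)
true_peak = 180.0      # true source counts, deliberately near threshold
threshold = 5.5 * np.sqrt(background)   # trigger excess in counts (~5.5 sigma)

n = 100_000
# Measured peak counts after background subtraction, with Poisson noise.
measured = rng.poisson(background + true_peak, size=n) - background
triggered = measured > threshold
frac = triggered.mean()                              # triggering probability
flux_bias = measured[triggered].mean() - true_peak   # > 0: biased high
```

For bursts well above threshold the bias vanishes; it is concentrated in the near-threshold population, which is exactly where it distorts the completeness of a peak-flux-limited sample.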

  5. Medical School Factors Associated with Changes in Implicit and Explicit Bias Against Gay and Lesbian People among 3492 Graduating Medical Students.

    PubMed

    Phelan, Sean M; Burke, Sara E; Hardeman, Rachel R; White, Richard O; Przedworski, Julia; Dovidio, John F; Perry, Sylvia P; Plankey, Michael; Cunningham, Brooke A; Finstad, Deborah; Yeazel, Mark W; van Ryn, Michelle

    2017-11-01

    Implicit and explicit bias among providers can influence the quality of healthcare. Efforts to address sexual orientation bias in new physicians are hampered by a lack of knowledge of school factors that influence bias among students. To determine whether medical school curriculum, role modeling, diversity climate, and contact with sexual minorities predict bias among graduating students against gay and lesbian people. Prospective cohort study. A sample of 4732 first-year medical students was recruited from a stratified random sample of 49 US medical schools in the fall of 2010 (81% response; 55% of eligible), of which 94.5% (4473) identified as heterosexual. Seventy-eight percent of baseline respondents (3492) completed a follow-up survey in their final semester (spring 2014). Medical school predictors included formal curriculum, role modeling, diversity climate, and contact with sexual minorities. Outcomes were year 4 implicit and explicit bias against gay men and lesbian women, adjusted for bias at year 1. In multivariate models, lower explicit bias against gay men and lesbian women was associated with more favorable contact with LGBT faculty, residents, students, and patients, and perceived skill and preparedness for providing care to LGBT patients. Greater explicit bias against lesbian women was associated with discrimination reported by sexual minority students (b = 1.43 [0.16, 2.71]; p = 0.03). Lower implicit sexual orientation bias was associated with more frequent contact with LGBT faculty, residents, students, and patients (b = -0.04 [-0.07, -0.01]; p = 0.008). Greater implicit bias was associated with more faculty role modeling of discriminatory behavior (b = 0.34 [0.11, 0.57]; p = 0.004). Medical schools may reduce bias against sexual minority patients by reducing negative role modeling, improving the diversity climate, and improving student preparedness to care for this population.

  6. Skin Temperature Analysis and Bias Correction in a Coupled Land-Atmosphere Data Assimilation System

    NASA Technical Reports Server (NTRS)

    Bosilovich, Michael G.; Radakovich, Jon D.; daSilva, Arlindo; Todling, Ricardo; Verter, Frances

    2006-01-01

    In an initial investigation, remotely sensed surface temperature is assimilated into a coupled atmosphere/land global data assimilation system, with explicit accounting for biases in the model state. In this scheme, an incremental bias correction term is introduced in the model's surface energy budget. In its simplest form, the algorithm estimates and corrects a constant time mean bias for each gridpoint; additional benefits are attained with a refined version of the algorithm which allows for a correction of the mean diurnal cycle. The method is validated against the assimilated observations, as well as independent near-surface air temperature observations. In many regions, not accounting for the diurnal cycle of bias caused degradation of the diurnal amplitude of background model air temperature. Energy fluxes collected through the Coordinated Enhanced Observing Period (CEOP) are used to more closely inspect the surface energy budget. In general, sensible heat flux is improved with the surface temperature assimilation, and two stations show a reduction of bias by as much as 30 W m^-2. At the Rondonia station in Amazonia, the Bowen ratio changes direction, an improvement related to the temperature assimilation. However, at many stations the monthly latent heat flux bias is slightly increased. These results show the impact of univariate assimilation of surface temperature observations on the surface energy budget, and suggest the need for multivariate land data assimilation. The results also show the need for independent validation data, especially flux stations in varied climate regimes.
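The simplest form of the scheme — a constant per-gridpoint bias estimate that is nudged by each innovation — can be sketched in a scalar toy assimilation. The gain and nudging coefficients below are arbitrary illustrative choices, not those of the operational system.

```python
import numpy as np

def assimilate_with_bias_correction(forecasts, obs, gain=0.2, gamma=0.1):
    """Toy sequential assimilation with an incremental bias estimate b:
    the analysis uses the bias-corrected background, and b slowly absorbs
    any persistent forecast offset via the innovations."""
    b = 0.0
    analyses = []
    for f, o in zip(forecasts, obs):
        f_corr = f - b                    # bias-corrected background
        innovation = o - f_corr
        analyses.append(f_corr + gain * innovation)
        b -= gamma * innovation           # nudge the bias estimate
    return np.array(analyses), b

rng = np.random.default_rng(5)
truth = 290.0 + np.sin(np.linspace(0, 20, 200))   # "true" skin temperature (K)
obs = truth + rng.normal(0.0, 0.1, 200)           # assimilated observations
forecasts = truth + 2.0                           # model runs 2 K warm
analyses, b = assimilate_with_bias_correction(forecasts, obs)
```

Without the `b` term the analyses would inherit part of the 2 K warm bias every cycle; with it, the bias estimate converges to the offset and the analyses track the truth. The refined version in the paper replaces the single constant with a mean diurnal cycle of corrections.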

  7. The classification of secondary colorectal liver cancer in human biopsy samples using angular dispersive x-ray diffraction and multivariate analysis

    NASA Astrophysics Data System (ADS)

    Theodorakou, Chrysoula; Farquharson, Michael J.

    2009-08-01

    The motivation behind this study is to assess whether angular dispersive x-ray diffraction (ADXRD) data, processed using multivariate analysis techniques, can be used for classifying secondary colorectal liver cancer tissue and normal surrounding liver tissue in human liver biopsy samples. The ADXRD profiles from a total of 60 samples of normal liver tissue and colorectal liver metastases were measured using a synchrotron radiation source. The data were analysed for 56 samples using nonlinear peak-fitting software. Four peaks were fitted to all of the ADXRD profiles, and the amplitude, area, amplitude and area ratios for three of the four peaks were calculated and used for the statistical and multivariate analysis. The statistical analysis showed that there are significant differences between all the peak-fitting parameters and ratios between the normal and the diseased tissue groups. The technique of soft independent modelling of class analogy (SIMCA) was used to classify normal liver tissue and colorectal liver metastases resulting in 67% of the normal tissue samples and 60% of the secondary colorectal liver tissue samples being classified correctly. This study has shown that the ADXRD data of normal and secondary colorectal liver cancer are statistically different and x-ray diffraction data analysed using multivariate analysis have the potential to be used as a method of tissue classification.
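Because the amplitude of a Gaussian peak at a fixed position enters the model linearly, the amplitude and area parameters used for the statistical analysis can be sketched with a plain linear least-squares fit. The peak centers, width, and profile below are invented; real ADXRD profiles would also need nonlinear fitting of peak positions and widths, as the study's peak-fitting software does.

```python
import numpy as np

def fit_peak_amplitudes(x, profile, centers, width):
    """Fit amplitudes of Gaussian peaks at known positions by linear least
    squares; returns amplitudes and the corresponding peak areas."""
    basis = np.exp(-0.5 * ((x[:, None] - np.asarray(centers)[None, :]) / width) ** 2)
    amps, *_ = np.linalg.lstsq(basis, profile, rcond=None)
    areas = amps * width * np.sqrt(2 * np.pi)   # area of a Gaussian peak
    return amps, areas

x = np.linspace(0.0, 40.0, 400)           # scattering variable (arbitrary units)
centers, width = [8.0, 15.0, 24.0, 31.0], 1.5
true_amps = np.array([1.0, 0.6, 0.9, 0.3])
true_basis = np.exp(-0.5 * ((x[:, None] - np.array(centers)) / width) ** 2)
profile = true_basis @ true_amps          # noise-free synthetic diffraction profile
amps, areas = fit_peak_amplitudes(x, profile, centers, width)
```

The fitted amplitudes, areas, and their ratios are exactly the kind of per-sample features that can then be fed into the statistical comparison and SIMCA classification described above.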

  8. SOCR Motion Charts: An Efficient, Open-Source, Interactive and Dynamic Applet for Visualizing Longitudinal Multivariate Data

    PubMed Central

    Al-Aziz, Jameel; Christou, Nicolas; Dinov, Ivo D.

    2011-01-01

    The amount, complexity and provenance of data have dramatically increased in the past five years. Visualization of observed and simulated data is a critical component of any social, environmental, biomedical or scientific quest. Dynamic, exploratory and interactive visualization of multivariate data, without preprocessing by dimensionality reduction, remains a nearly insurmountable challenge. The Statistics Online Computational Resource (www.SOCR.ucla.edu) provides portable online aids for probability and statistics education, technology-based instruction and statistical computing. We have developed a new Java-based infrastructure, SOCR Motion Charts, for discovery-based exploratory analysis of multivariate data. This interactive data visualization tool enables the visualization of high-dimensional longitudinal data. SOCR Motion Charts allows mapping of ordinal, nominal and quantitative variables onto time, 2D axes, size, colors, glyphs and appearance characteristics, which facilitates the interactive display of multidimensional data. We validated this new visualization paradigm using several publicly available multivariate datasets including Ice-Thickness, Housing Prices, Consumer Price Index, and California Ozone Data. SOCR Motion Charts is designed using object-oriented programming, implemented as a Java Web-applet and is available to the entire community on the web at www.socr.ucla.edu/SOCR_MotionCharts. It can be used as an instructional tool for rendering and interrogating high-dimensional data in the classroom, as well as a research tool for exploratory data analysis. PMID:21479108

  9. A Descriptive Study of Individual and Cross-Cultural Differences in Statistics Anxiety

    ERIC Educational Resources Information Center

    Baloglu, Mustafa; Deniz, M. Engin; Kesici, Sahin

    2011-01-01

    The present study investigated individual and cross-cultural differences in statistics anxiety among 223 Turkish and 237 American college students. A 2 x 2 between-subjects factorial multivariate analysis of covariance (MANCOVA) was performed on the six dependent variables which are the six subscales of the Statistical Anxiety Rating Scale.…

  10. Hyperspectral target detection using heavy-tailed distributions

    NASA Astrophysics Data System (ADS)

    Willis, Chris J.

    2009-09-01

    One promising approach to target detection in hyperspectral imagery exploits a statistical mixture model to represent scene content at a pixel level. The process then goes on to look for pixels which are rare, when judged against the model, and marks them as anomalies. It is assumed that military targets will themselves be rare and therefore likely to be detected amongst these anomalies. For the typical assumption of multivariate Gaussianity for the mixture components, the presence of the anomalous pixels within the training data will have a deleterious effect on the quality of the model. In particular, the derivation process itself is adversely affected by the attempt to accommodate the anomalies within the mixture components. This will bias the statistics of at least some of the components away from their true values and towards the anomalies. In many cases this will result in a reduction in the detection performance and an increased false alarm rate. This paper considers the use of heavy-tailed statistical distributions within the mixture model. Such distributions are better able to account for anomalies in the training data within the tails of their distributions, and the balance of the pixels within their central masses. This means that an improved model of the majority of the pixels in the scene may be produced, ultimately leading to a better anomaly detection result. The anomaly detection techniques are examined using both synthetic data and hyperspectral imagery with injected anomalous pixels. A range of results is presented for the baseline Gaussian mixture model and for models accommodating heavy-tailed distributions, for different parameterizations of the algorithms. These include scene understanding results, anomalous pixel maps at given significance levels and Receiver Operating Characteristic curves.
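    The core claim above, that anomalous pixels left in the training data bias the fitted background statistics and shrink the anomaly scores, can be demonstrated with a single Gaussian component. The scene dimensions, contamination level and Mahalanobis scoring below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
background = rng.normal(0.0, 1.0, size=(1000, 3))   # clean scene pixels
anomalies = rng.normal(8.0, 1.0, size=(10, 3))      # rare target-like pixels
train = np.vstack([background, anomalies])          # contaminated training set

def mahalanobis(x, mean, cov):
    """Per-row Mahalanobis distance of x from a Gaussian fit."""
    d = x - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d))

# Scores of the anomalies under a fit that (wrongly) includes them in training,
# versus a fit on the clean background only.
scores_naive = mahalanobis(anomalies, train.mean(axis=0), np.cov(train.T))
scores_clean = mahalanobis(anomalies, background.mean(axis=0), np.cov(background.T))
```

    The contaminated fit inflates the covariance toward the anomalies, so their scores drop and detection performance is degraded, which is the motivation for heavy-tailed mixture components.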

  11. Statistics of natural movements are reflected in motor errors.

    PubMed

    Howard, Ian S; Ingram, James N; Körding, Konrad P; Wolpert, Daniel M

    2009-09-01

    Humans use their arms to engage in a wide variety of motor tasks during everyday life. However, little is known about the statistics of these natural arm movements. Studies of the sensory system have shown that the statistics of sensory inputs are key to determining sensory processing. We hypothesized that the statistics of natural everyday movements may, in a similar way, influence motor performance as measured in laboratory-based tasks. We developed a portable motion-tracking system that could be worn by subjects as they went about their daily routine outside of a laboratory setting. We found that the well-documented symmetry bias is reflected in the relative incidence of movements made during everyday tasks. Specifically, symmetric and antisymmetric movements are predominant at low frequencies, whereas only symmetric movements are predominant at high frequencies. Moreover, the statistics of natural movements, that is, their relative incidence, correlated with subjects' performance on a laboratory-based phase-tracking task. These results provide a link between natural movement statistics and motor performance and confirm that the symmetry bias documented in laboratory studies is a natural feature of human movement.

  12. A Statistical Bias Correction Tool for Generating Climate Change Scenarios in Indonesia based on CMIP5 Datasets

    NASA Astrophysics Data System (ADS)

    Faqih, A.

    2017-03-01

    Providing information regarding future climate scenarios is very important in climate change studies. The climate scenario can be used as basic information to support adaptation and mitigation studies. In order to deliver future climate scenarios over a specific region, baseline and projection data from the outputs of global climate models (GCM) are needed. However, due to its coarse resolution, the data have to be downscaled and bias corrected in order to get scenario data with better spatial resolution that match the characteristics of the observed data. Generating this downscaled data is often difficult for scientists who do not have a specific background, experience and skill in dealing with the complex data from the GCM outputs. In this regard, it is necessary to develop a tool that can be used to simplify the downscaling processes in order to help scientists, especially in Indonesia, generate future climate scenario data that can be used for their climate change-related studies. In this paper, we introduce a tool called “Statistical Bias Correction for Climate Scenarios (SiBiaS)”. The tool is specially designed to facilitate the use of CMIP5 GCM data outputs and process their statistical bias corrections relative to the reference data from observations. It is prepared for supporting capacity building in climate modeling in Indonesia as part of the Indonesia 3rd National Communication (TNC) project activities.
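    A common form of statistical bias correction that tools of this kind automate is empirical quantile mapping. The sketch below is a minimal illustration under that assumption (the abstract does not specify which method SiBiaS implements); the synthetic "rainfall" data and function names are invented for the example.

```python
import numpy as np

def quantile_map(model_hist, obs_hist, model_future):
    """Map model values onto the observed distribution via empirical CDFs."""
    # Rank each value within the historical model distribution...
    quantiles = np.searchsorted(np.sort(model_hist), model_future) / len(model_hist)
    quantiles = np.clip(quantiles, 0.0, 1.0)
    # ...and read off the observed value at the same quantile.
    return np.quantile(obs_hist, quantiles)

rng = np.random.default_rng(2)
obs = rng.gamma(2.0, 5.0, size=2000)        # "observed" rainfall
model = obs * 0.5 + 3.0                     # biased model: wrong mean and scale
corrected = quantile_map(model, obs, model)
```

    After correction, the distribution of the corrected series closely matches the observed one; applying the same mapping to projection data transfers the correction to the future period.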

  13. Multivariate geomorphic analysis of forest streams: Implications for assessment of land use impacts on channel condition

    Treesearch

    Richard. D. Wood-Smith; John M. Buffington

    1996-01-01

    Multivariate statistical analyses of geomorphic variables from 23 forest stream reaches in southeast Alaska result in successful discrimination between pristine streams and those disturbed by land management, specifically timber harvesting and associated road building. Results of discriminant function analysis indicate that a three-variable model discriminates 10...

  14. Parametric Cost Models for Space Telescopes

    NASA Technical Reports Server (NTRS)

    Stahl, H. Philip

    2010-01-01

    A study is in process to develop a multivariable parametric cost model for space telescopes. Cost and engineering parametric data have been collected on 30 different space telescopes. Statistical correlations have been developed for 19 of the 59 variables sampled. Single-variable and multi-variable cost estimating relationships have been developed. Results are being published.

  15. Preliminary Multi-Variable Parametric Cost Model for Space Telescopes

    NASA Technical Reports Server (NTRS)

    Stahl, H. Philip; Hendrichs, Todd

    2010-01-01

    This slide presentation reviews the creation of a preliminary multi-variable cost model for the contract costs of making a space telescope. There is discussion of the methodology for collecting the data, definition of the statistical analysis methodology, single-variable model results, testing of historical models and an introduction of the multi-variable models.

  16. Effects of Heterogeneity on Spatial Pattern Analysis of Wild Pistachio Trees in Zagros Woodlands, Iran

    NASA Astrophysics Data System (ADS)

    Erfanifard, Y.; Rezayan, F.

    2014-10-01

    Vegetation heterogeneity biases second-order summary statistics, e.g., Ripley's K-function, applied for spatial pattern analysis in ecology. Second-order investigation based on Ripley's K-function and related statistics (i.e., the L-function and the pair correlation function g) is widely used in ecology to develop hypotheses on underlying processes by characterizing spatial patterns of vegetation. The aim of this study was to demonstrate the effects of underlying heterogeneity of wild pistachio (Pistacia atlantica Desf.) trees on the second-order summary statistics of point pattern analysis in a part of the Zagros woodlands, Iran. The spatial distribution of 431 wild pistachio trees was accurately mapped in a 40 ha stand in the Wild Pistachio & Almond Research Site, Fars province, Iran. Three commonly used second-order summary statistics (i.e., the K-, L-, and g-functions) were applied to analyse their spatial pattern. The two-sample Kolmogorov-Smirnov goodness-of-fit test showed that the observed pattern significantly followed an inhomogeneous Poisson process null model in the study region. The results also showed that the heterogeneous pattern of wild pistachio trees biased the homogeneous forms of the K-, L-, and g-functions, indicating a stronger aggregation of the trees at scales of 0-50 m than actually existed, and an apparent aggregation at scales of 150-200 m where the trees were in fact regularly distributed. Consequently, we showed that heterogeneity of point patterns may bias the results of homogeneous second-order summary statistics, and we suggest applying inhomogeneous summary statistics with related null models for spatial pattern analysis of heterogeneous vegetation.
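    The homogeneous K-function at the center of this study can be sketched with a naive estimator (no edge correction, so values near the boundary are slightly deflated). The window size, radius and point counts below are illustrative, not the study's data.

```python
import numpy as np

def ripley_k(points, r, area):
    """Naive Ripley's K estimate: mean number of neighbors within r, scaled by intensity."""
    n = len(points)
    lam = n / area                                  # intensity (points per unit area)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                     # exclude self-pairs
    return (d < r).sum() / (n * lam)

rng = np.random.default_rng(3)
pts = rng.uniform(0.0, 100.0, size=(400, 2))        # CSR pattern on a 100 x 100 window
k10 = ripley_k(pts, 10.0, 100.0 * 100.0)
# Under complete spatial randomness K(r) is about pi * r^2 (ignoring edge effects).
```

    For a heterogeneous pattern, this homogeneous estimator mixes intensity variation into K(r), which is exactly the bias the study demonstrates; the inhomogeneous variants divide out a spatially varying intensity estimate instead.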

  17. Facilitating the Transition from Bright to Dim Environments

    DTIC Science & Technology

    2016-03-04

    For the parametric data, a multivariate ANOVA was used in determining the systematic presence of any statistically significant performance differences...performed. All significance levels were p < 0.05, and statistical analyses were performed with the Statistical Package for Social Sciences (SPSS...1950. Age changes in rate and level of visual dark adaptation. Journal of Applied Physiology, 2, 407–411. Field, A. 2009. Discovering statistics

  18. Artificial bias typically neglected in comparisons of uncertain atmospheric data

    NASA Astrophysics Data System (ADS)

    Pitkänen, Mikko R. A.; Mikkonen, Santtu; Lehtinen, Kari E. J.; Lipponen, Antti; Arola, Antti

    2016-09-01

    Publications in atmospheric sciences typically neglect biases caused by regression dilution (bias of the ordinary least squares line fitting) and regression to the mean (RTM) in comparisons of uncertain data. We use synthetic observations mimicking real atmospheric data to demonstrate how the biases arise from random data uncertainties of measurements, model output, or satellite retrieval products. Further, we provide examples of typical methods of data comparisons that have a tendency to pronounce the biases. The results show that data uncertainties can significantly bias data comparisons due to regression dilution and RTM, a fact that is known in statistics but disregarded in atmospheric sciences. Thus, we argue that these biases are often regarded as measurement or modeling errors when they are in fact artificial. It is essential that the atmospheric and geoscience communities become aware of and consider these features in research.
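    Regression dilution is easy to reproduce with synthetic data, in the spirit of the paper's approach: adding random error to the predictor attenuates the ordinary least squares slope toward zero. The sample sizes and noise levels below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
x_true = rng.normal(0.0, 1.0, n)
y = 2.0 * x_true + rng.normal(0.0, 0.5, n)    # true slope is 2.0

x_noisy = x_true + rng.normal(0.0, 1.0, n)    # x observed with unit measurement error

slope_clean = np.polyfit(x_true, y, 1)[0]
slope_noisy = np.polyfit(x_noisy, y, 1)[0]
# Attenuation factor is var(x) / (var(x) + var(error)) = 1 / (1 + 1) = 0.5 here,
# so the slope on noisy x is expected near 1.0 rather than 2.0.
```

    The halved slope is purely an artifact of the x-uncertainty, not a real change in the relationship, which is the point the paper argues is routinely misread as a model or measurement error.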

  19. Treatment of patent ductus arteriosus and neonatal mortality/morbidities: adjustment for treatment selection bias.

    PubMed

    Mirea, Lucia; Sankaran, Koravangattu; Seshia, Mary; Ohlsson, Arne; Allen, Alexander C; Aziz, Khalid; Lee, Shoo K; Shah, Prakesh S

    2012-10-01

    To examine the association between treatment for patent ductus arteriosus (PDA) and neonatal outcomes in preterm infants, after adjustment for treatment selection bias. Secondary analyses were conducted using data collected by the Canadian Neonatal Network for neonates born at a gestational age ≤ 32 weeks and admitted to neonatal intensive care units in Canada between 2004 and 2008. Infants who had PDA and survived beyond 72 hours were included in multivariable logistic regression analyses that compared mortality or any severe neonatal morbidity (intraventricular hemorrhage grades ≥ 3, retinopathy of prematurity stages ≥ 3, bronchopulmonary dysplasia, or necrotizing enterocolitis stages ≥ 2) between treatment groups (conservative management, indomethacin only, surgical ligation only, or both indomethacin and ligation). Propensity scores (PS) were estimated for each pair of treatment comparisons, and used in PS-adjusted and PS-matched analyses. Among 3556 eligible infants with a diagnosis of PDA, 577 (16%) were conservatively managed, 2026 (57%) received indomethacin only, 327 (9%) underwent ligation only, and 626 (18%) were treated with both indomethacin and ligation. All multivariable and PS-based analyses detected significantly higher mortality/morbidities for surgically ligated infants, irrespective of prior indomethacin treatment (OR ranged from 1.25-2.35) compared with infants managed conservatively or those who received only indomethacin. No significant differences were detected between infants treated with only indomethacin and those managed conservatively. Surgical ligation of PDA in preterm neonates was associated with increased neonatal mortality/morbidity in all analyses adjusted for measured confounders that attempt to account for treatment selection bias. Copyright © 2012 Mosby, Inc. All rights reserved.
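    Propensity-score adjustment of the kind used in this study can be illustrated with a toy simulation: a confounder drives both treatment assignment and outcome, so the naive group difference is biased, and stratifying on an estimated propensity score largely removes the bias. The tiny logistic fit, the stratum count and all names below are illustrative assumptions, not the study's actual model.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4000
x = rng.normal(0.0, 1.0, n)                       # confounder (e.g., illness severity)
treat = (rng.uniform(0, 1, n) < 1 / (1 + np.exp(-1.5 * x))).astype(float)
y = 1.0 * x + 0.0 * treat + rng.normal(0.0, 1.0, n)  # true treatment effect is zero

# Fit a one-variable logistic regression by gradient ascent on the log-likelihood.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(w * x + b)))
    w += 0.01 * np.mean((treat - p) * x)
    b += 0.01 * np.mean(treat - p)
ps = 1 / (1 + np.exp(-(w * x + b)))               # estimated propensity scores

naive = y[treat == 1].mean() - y[treat == 0].mean()   # confounded contrast
strata = np.clip((ps * 5).astype(int), 0, 4)          # five propensity strata
effects = [y[(strata == s) & (treat == 1)].mean()
           - y[(strata == s) & (treat == 0)].mean() for s in range(5)]
adjusted = np.mean(effects)
```

    The naive contrast is strongly biased away from the true null effect, while the stratified estimate is close to zero; PS matching, as used in the study, pushes the same idea further by comparing individually matched pairs.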

  20. A statistical framework for neuroimaging data analysis based on mutual information estimated via a gaussian copula

    PubMed Central

    Giordano, Bruno L.; Kayser, Christoph; Rousselet, Guillaume A.; Gross, Joachim; Schyns, Philippe G.

    2016-01-01

    Abstract We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open‐source Matlab and Python code implementing the new methods accompanies this article. Hum Brain Mapp 38:1541–1573, 2017. © 2016 Wiley Periodicals, Inc. PMID:27860095
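    The copula-plus-Gaussian-entropy idea can be sketched in a few lines for two scalar variables: rank-transform each variable, map the ranks through the inverse normal CDF, and use the closed-form Gaussian mutual information, -0.5 * log(1 - rho^2). This is a simplified univariate illustration of the estimator's principle, not the authors' released code.

```python
import numpy as np
from statistics import NormalDist

def copula_normalize(x):
    """Map a sample to Gaussian scores via its empirical copula (ranks)."""
    ranks = np.argsort(np.argsort(x)) + 1.0
    u = ranks / (len(x) + 1.0)                      # uniform scores in (0, 1)
    return np.array([NormalDist().inv_cdf(p) for p in u])

def gcmi(x, y):
    """Gaussian-copula mutual information estimate, in nats."""
    gx, gy = copula_normalize(x), copula_normalize(y)
    rho = np.corrcoef(gx, gy)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)

rng = np.random.default_rng(6)
x = rng.normal(0.0, 1.0, 3000)
y = x + rng.normal(0.0, 1.0, 3000)                  # correlation 1/sqrt(2)
# True MI for a bivariate Gaussian with rho^2 = 0.5 is -0.5 * ln(0.5), about 0.347 nats.
mi = gcmi(np.exp(x), y)                             # invariant to monotone transforms of x
```

    Because only ranks enter the estimate, the monotone `exp` transform of x does not change the result, which is one source of the robustness the abstract highlights.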

  1. Bias estimation for moving optical sensor measurements with targets of opportunity

    NASA Astrophysics Data System (ADS)

    Belfadel, Djedjiga; Osborne, Richard W.; Bar-Shalom, Yaakov

    2014-06-01

    Integration of space based sensors into a Ballistic Missile Defense System (BMDS) allows for detection and tracking of threats over a larger area than ground based sensors [1]. This paper examines the effect of sensor bias error on the tracking quality of a Space Tracking and Surveillance System (STSS) for the highly non-linear problem of tracking a ballistic missile. The STSS constellation consists of two or more satellites (on known trajectories) for tracking ballistic targets. Each satellite is equipped with an IR sensor that provides azimuth and elevation to the target. The tracking problem is made more difficult due to a constant or slowly varying bias error present in each sensor's line of sight measurements. It is important to correct for these bias errors so that the multiple sensor measurements and/or tracks can be referenced as accurately as possible to a common tracking coordinate system. The measurements provided by these sensors are assumed time-coincident (synchronous) and perfectly associated. The line of sight (LOS) measurements from the sensors can be fused into measurements which are the Cartesian target position, i.e., linear in the target state. We evaluate the Cramér-Rao Lower Bound (CRLB) on the covariance of the bias estimates, which serves as a quantification of the available information about the biases. Statistical tests on the results of simulations show that this method is statistically efficient, even for small sample sizes (as few as two sensors and six points on the (unknown) trajectory of a single target of opportunity). We also show that the RMS position error is significantly improved with bias estimation compared with the target position estimation using the original biased measurements.

  2. Comparison of two control groups for estimation of oral cholera vaccine effectiveness using a case-control study design.

    PubMed

    Franke, Molly F; Jerome, J Gregory; Matias, Wilfredo R; Ternier, Ralph; Hilaire, Isabelle J; Harris, Jason B; Ivers, Louise C

    2017-10-13

    Case-control studies to quantify oral cholera vaccine effectiveness (VE) often rely on neighbors without diarrhea as community controls. Test-negative controls can be easily recruited and may minimize bias due to differential health-seeking behavior and recall. We compared VE estimates derived from community and test-negative controls and conducted bias-indicator analyses to assess potential bias with community controls. From October 2012 through November 2016, patients with acute watery diarrhea were recruited from cholera treatment centers in rural Haiti. Cholera cases had a positive stool culture. Non-cholera diarrhea cases (test-negative controls and non-cholera diarrhea cases for bias-indicator analyses) had a negative culture and rapid test. Up to four community controls were matched to diarrhea cases by age group, time, and neighborhood. Primary analyses included 181 cholera cases, 157 non-cholera diarrhea cases, 716 VE community controls and 625 bias-indicator community controls. VE for self-reported vaccination with two doses was consistent across the two control groups, with statistically significant VE estimates ranging from 72 to 74%. Sensitivity analyses revealed similar, though somewhat attenuated estimates for self-reported two dose VE. Bias-indicator estimates were consistently less than one, with VE estimates ranging from 19 to 43%, some of which were statistically significant. OCV estimates from case-control analyses using community and test-negative controls were similar. While bias-indicator analyses suggested possible over-estimation of VE estimates using community controls, test-negative analyses suggested this bias, if present, was minimal. Test-negative controls can be a valid low-cost and time-efficient alternative to community controls for OCV effectiveness estimation and may be especially relevant in emergency situations. Copyright © 2017. Published by Elsevier Ltd.

  3. Estimating the decomposition of predictive information in multivariate systems

    NASA Astrophysics Data System (ADS)

    Faes, Luca; Kugiumtzis, Dimitris; Nollo, Giandomenico; Jurysta, Fabrice; Marinazzo, Daniele

    2015-03-01

    In the study of complex systems from observed multivariate time series, the evolution of one system under investigation can be explained by the information storage of the system and the information transfer from other interacting systems. We present a framework for the model-free estimation of information storage and information transfer, computed as the terms composing the predictive information about the target of a multivariate dynamical process. The approach tackles the curse of dimensionality by employing a nonuniform embedding scheme that selects progressively, among the past components of the multivariate process, only those that contribute most, in terms of conditional mutual information, to the present target process. Moreover, it computes all information-theoretic quantities using a nearest-neighbor technique designed to compensate for the bias due to the different dimensionality of individual entropy terms. The resulting estimators of prediction entropy, storage entropy, transfer entropy, and partial transfer entropy are tested on simulations of coupled linear stochastic and nonlinear deterministic dynamic processes, demonstrating the superiority of the proposed approach over traditional estimators based on uniform embedding. The framework is then applied to multivariate physiologic time series, resulting in physiologically well-interpretable information decompositions of cardiovascular and cardiorespiratory interactions during head-up tilt and of joint brain-heart dynamics during sleep.

  4. Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design

    PubMed Central

    Lu, Tsui-Shan; Longnecker, Matthew P.; Zhou, Haibo

    2016-01-01

    An outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme in which one observes the exposure with a probability that depends on the outcome. Well-known such designs are the case-control design for a binary response, the case-cohort design for failure time data and the general ODS design for a continuous response. While substantial work has been done for the univariate response case, statistical inference and design for ODS with multivariate cases remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (Multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the Multivariate-ODS design is semiparametric, with all the underlying distributions of covariates modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and derive its asymptotic normality. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the Multivariate-ODS or the estimator from a simple random sample with the same sample size. The Multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of PCB exposure with hearing loss in children born into the Collaborative Perinatal Study. PMID:27966260

  5. Consequences of Outcome Reporting Bias in Education Research

    ERIC Educational Resources Information Center

    Williams, R. T.; Polanin, J. R.

    2014-01-01

    Publication bias is a term that typically refers to the well-known tendency for studies lacking statistically significant results to be less likely to be published in peer-reviewed journals. This happens because authors are less likely to submit, while editors and reviewers are less likely to accept for publication, papers that lack statistically…

  6. Making Heads or Tails of Probability: An Experiment with Random Generators

    ERIC Educational Resources Information Center

    Morsanyi, Kinga; Handley, Simon J.; Serpell, Sylvie

    2013-01-01

    Background: The equiprobability bias is a tendency for individuals to think of probabilistic events as "equiprobable" by nature, and to judge outcomes that occur with different probabilities as equally likely. The equiprobability bias has been repeatedly found to be related to formal education in statistics, and it is claimed to be based…

  7. The Detection and Correction of Bias in Student Ratings of Instruction.

    ERIC Educational Resources Information Center

    Haladyna, Thomas; Hess, Robert K.

    1994-01-01

    A Rasch model was used to detect and correct bias in Likert rating scales used to assess student perceptions of college teaching, using a database of ratings. Statistical corrections were significant, supporting the model's potential utility. Recommendations are made for a theoretical rationale and further research on the model. (Author/MSE)

  8. Quantile regression reveals hidden bias and uncertainty in habitat models

    Treesearch

    Brian S. Cade; Barry R. Noon; Curtis H. Flather

    2005-01-01

    We simulated the effects of missing information on statistical distributions of animal response that covaried with measured predictors of habitat to evaluate the utility and performance of quantile regression for providing more useful intervals of uncertainty in habitat relationships. These procedures were evaulated for conditions in which heterogeneity and hidden bias...

  9. Calibration of remotely sensed proportion or area estimates for misclassification error

    Treesearch

    Raymond L. Czaplewski; Glenn P. Catts

    1992-01-01

    Classifications of remotely sensed data contain misclassification errors that bias areal estimates. Monte Carlo techniques were used to compare two statistical methods that correct or calibrate remotely sensed areal estimates for misclassification bias using reference data from an error matrix. The inverse calibration estimator was consistently superior to the...

  10. Stand-Biased Versus Seated Classrooms and Childhood Obesity: A Randomized Experiment in Texas

    PubMed Central

    Wendel, Monica L.; Zhao, Hongwei; Jeffrey, Christina

    2016-01-01

    Objectives. To measure changes in body mass index (BMI) percentiles among third- and fourth-grade students in stand-biased classrooms and traditional seated classrooms in 3 Texas elementary schools. Methods. Research staff recorded the height and weight of 380 students in 24 classrooms across the 3 schools at the beginning (2011–2012) and end (2012–2013) of the 2-year study. Results. After adjustment for grade, race/ethnicity, and gender, there was a statistically significant decrease in BMI percentile in the group that used stand-biased desks for 2 consecutive years relative to the group that used standard desks during both years. Mean BMI increased by 0.1 and 0.4 kilograms per meter squared in the treatment and control groups, respectively. The between-group difference in BMI percentile change was 5.24 (SE = 2.50; P = .037). No other covariates had a statistically significant impact on BMI percentile changes. Conclusions. Changing a classroom to a stand-biased environment had a significant effect on students’ BMI percentile, indicating the need to redesign traditional classroom environments. PMID:27552276

  11. A method for estimation of bias and variability of continuous gas monitor data: application to carbon monoxide monitor accuracy.

    PubMed

    Shulman, Stanley A; Smith, Jerome P

    2002-01-01

    A method is presented for the evaluation of the bias, variability, and accuracy of gas monitors. This method is based on using the parameters for the fitted response curves of the monitors. Thereby, variability between calibrations, between dates within each calibration period, and between different units can be evaluated at several different standard concentrations. By combining variability information with bias information, accuracy can be assessed. An example using carbon monoxide monitor data is provided. Although the most general statistical software required for these tasks is not available on a spreadsheet, when the same number of dates in a calibration period are evaluated for each monitor unit, the calculations can be done on a spreadsheet. An example of such calculations, together with the formulas needed for their implementation, is provided. In addition, the methods can be extended by use of appropriate statistical models and software to evaluate monitor trends within calibration periods, as well as consider the effects of other variables, such as humidity and temperature, on monitor variability and bias.

  12. Statistical Analysis of Model Data for Operational Space Launch Weather Support at Kennedy Space Center and Cape Canaveral Air Force Station

    NASA Technical Reports Server (NTRS)

    Bauman, William H., III

    2010-01-01

    The 12-km resolution North American Mesoscale (NAM) model (MesoNAM) is used by the 45th Weather Squadron (45 WS) Launch Weather Officers at Kennedy Space Center (KSC) and Cape Canaveral Air Force Station (CCAFS) to support space launch weather operations. The 45 WS tasked the Applied Meteorology Unit to conduct an objective statistics-based analysis of MesoNAM output compared to wind tower mesonet observations and then develop an operational tool to display the results. The National Centers for Environmental Prediction began running the current version of the MesoNAM in mid-August 2006. The period of record for the dataset was 1 September 2006 - 31 January 2010. The AMU evaluated MesoNAM hourly forecasts from 0 to 84 hours based on model initialization times of 00, 06, 12 and 18 UTC. The MesoNAM forecast winds, temperature and dew point were compared to the observed values of these parameters from the sensors in the KSC/CCAFS wind tower network. The data sets were stratified by model initialization time, month and onshore/offshore flow for each wind tower. Statistics computed included bias (mean difference), standard deviation of the bias, root mean square error (RMSE) and a hypothesis test for bias = 0. Twelve wind towers located in close proximity to key launch complexes were used for the statistical analysis, with the sensors on the towers positioned at varying heights to include 6 ft, 30 ft, 54 ft, 60 ft, 90 ft, 162 ft, 204 ft and 230 ft depending on the launch vehicle and associated weather launch commit criteria being evaluated. These twelve wind towers support activities for the Space Shuttle (launch and landing), Delta IV, Atlas V and Falcon 9 launch vehicles. For all twelve towers, the results indicate a diurnal signal in the bias of temperature (T) and a weaker but discernible diurnal signal in the bias of dewpoint temperature (T(sub d)) in the MesoNAM forecasts.
Also, the standard deviation of the bias and RMSE of T, T(sub d), wind speed and wind direction indicated that the model error increased with the forecast period for all four parameters. Hypothesis testing uses statistics to determine the probability that a given hypothesis is true. The goal of the hypothesis test was to determine whether the model bias of any of the parameters assessed throughout the model forecast period was statistically zero. For this dataset, if the test statistic at a data point fell between -1.96 and 1.96, then the bias at that point was effectively zero and the model forecast for that point was considered to have no error. A graphical user interface (GUI) was developed so the 45 WS would have an operational tool at their disposal that would be easy to navigate among the multiple stratifications of information, including tower locations, month, model initialization times, sensor heights and onshore/offshore flow. The AMU developed the GUI using HyperText Markup Language (HTML) so the tool could be used in most popular web browsers on computers running different operating systems such as Microsoft Windows and Linux.
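
The verification statistics named above (bias, standard deviation of the bias, RMSE, and a test of bias = 0 using the ±1.96 critical values) can be sketched in a few lines. This is an illustrative reconstruction under our own assumptions, not the AMU's actual code; the function and variable names are ours:

```python
import numpy as np

def verification_stats(forecast, observed):
    """Paired forecast-vs-observation statistics: bias (mean difference),
    standard deviation of the bias, RMSE, and a z-style test of H0: bias = 0."""
    diff = np.asarray(forecast, float) - np.asarray(observed, float)
    n = diff.size
    bias = diff.mean()                       # mean difference
    sd = diff.std(ddof=1)                    # standard deviation of the bias
    rmse = np.sqrt(np.mean(diff ** 2))       # root mean square error
    z = bias / (sd / np.sqrt(n))             # test statistic for H0: bias = 0
    # |z| <= 1.96: bias not distinguishable from zero at the 5% level
    return bias, sd, rmse, z, -1.96 <= z <= 1.96
```

In practice each stratum (tower, month, initialization time, flow regime) would be passed through such a function separately.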

  13. Correcting power and p-value calculations for bias in diffusion tensor imaging.

    PubMed

    Lauzon, Carolyn B; Landman, Bennett A

    2013-07-01

Diffusion tensor imaging (DTI) provides quantitative parametric maps sensitive to tissue microarchitecture (e.g., fractional anisotropy, FA). These maps are estimated through computational processes and are subject to distortions, including variance and bias. Traditional statistical procedures commonly used for study planning (including power analyses and p-value/alpha-rate thresholds) specifically model variability but neglect the potential impacts of bias. Herein, we quantitatively investigate the impacts of bias in DTI on hypothesis test properties (power and alpha rate) using a two-sided hypothesis testing framework. We present a theoretical evaluation of bias on hypothesis test properties, evaluate the bias estimation technique SIMEX for DTI hypothesis testing using simulated data, and evaluate the impacts of bias on spatially varying power and alpha rates in an empirical study of 21 subjects. Bias is shown to inflate alpha rates, distort the power curve, and cause significant power loss even in empirical settings where the expected difference in bias between groups is zero. These adverse effects can be attenuated by properly accounting for bias in the calculation of power and p-values. Copyright © 2013 Elsevier Inc. All rights reserved.
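
SIMEX, the bias-estimation technique named above, can be illustrated on a toy estimator (a sketch under our own assumptions, not the authors' DTI pipeline): re-estimate after adding extra simulated noise of known variance λ·σ², fit the trend of the estimate in λ, and extrapolate back to λ = -1, i.e., to zero measurement noise:

```python
import numpy as np

def simex(estimator, x, sigma, lambdas=(0.5, 1.0, 1.5, 2.0), B=50, seed=0):
    """Simulation-extrapolation (SIMEX) sketch for a noise-biased estimator.
    x is data observed with additive noise of known std sigma."""
    rng = np.random.default_rng(seed)
    lams, thetas = [0.0], [float(estimator(x))]
    for lam in lambdas:
        # Estimates after boosting the noise variance by lam * sigma^2.
        boosted = [estimator(x + np.sqrt(lam) * sigma * rng.standard_normal(x.shape))
                   for _ in range(B)]
        lams.append(lam)
        thetas.append(float(np.mean(boosted)))
    slope, intercept = np.polyfit(lams, thetas, 1)  # linear trend in lambda
    return intercept - slope                        # extrapolate to lambda = -1
```

For an estimator such as the mean square of noisy data, whose bias grows linearly with the noise variance, the linear extrapolation recovers the noise-free value; real DTI applications use richer extrapolation models.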

  14. A comparison of data-driven groundwater vulnerability assessment methods

    USGS Publications Warehouse

    Sorichetta, Alessandro; Ballabio, Cristiano; Masetti, Marco; Robinson, Gilpin R.; Sterlacchini, Simone

    2013-01-01

    Increasing availability of geo-environmental data has promoted the use of statistical methods to assess groundwater vulnerability. Nitrate is a widespread anthropogenic contaminant in groundwater and its occurrence can be used to identify aquifer settings vulnerable to contamination. In this study, multivariate Weights of Evidence (WofE) and Logistic Regression (LR) methods, where the response variable is binary, were used to evaluate the role and importance of a number of explanatory variables associated with nitrate sources and occurrence in groundwater in the Milan District (central part of the Po Plain, Italy). The results of these models have been used to map the spatial variation of groundwater vulnerability to nitrate in the region, and we compare the similarities and differences of their spatial patterns and associated explanatory variables. We modify the standard WofE method used in previous groundwater vulnerability studies to a form analogous to that used in LR; this provides a framework to compare the results of both models and reduces the effect of sampling bias on the results of the standard WofE model. In addition, a nonlinear Generalized Additive Model has been used to extend the LR analysis. Both approaches improved discrimination of the standard WofE and LR models, as measured by the c-statistic. Groundwater vulnerability probability outputs, based on rank-order classification of the respective model results, were similar in spatial patterns and identified similar strong explanatory variables associated with nitrate source (population density as a proxy for sewage systems and septic sources) and nitrate occurrence (groundwater depth).

  15. Ametropia, retinal anatomy, and OCT abnormality patterns in glaucoma. 1. Impacts of refractive error and interartery angle

    NASA Astrophysics Data System (ADS)

    Elze, Tobias; Baniasadi, Neda; Jin, Qingying; Wang, Hui; Wang, Mengyu

    2017-12-01

Retinal nerve fiber layer thickness (RNFLT) measured by optical coherence tomography (OCT) is widely used in clinical practice to support glaucoma diagnosis. Clinicians frequently interpret peripapillary RNFLT areas marked as abnormal by OCT machines. However, clinical OCT machines do not presently take individual variation in retinal anatomy into account, and corresponding diagnostic biases have been shown, particularly for patients with ametropia. The angle between the two major temporal retinal arteries (interartery angle, IAA) is considered a fundamental retinal ametropia marker. Here, we analyze peripapillary spectral domain OCT RNFLT scans of 691 glaucoma patients and apply multivariate logistic regression to quantitatively compare the diagnostic biases of the spherical equivalent (SE) of refractive error and the IAA, and to identify the precise retinal locations of false-positive/negative abnormality marks. Independent of glaucoma severity (visual field mean deviation), IAA/SE variations biased abnormality marks on OCT RNFLT printouts at 36.7%/22.9% of the peripapillary area, respectively. 17.2% of the biases due to SE are not explained by IAA variation, particularly in inferonasal areas. In conclusion, the inclusion of SE and IAA in OCT RNFLT norms would help to increase diagnostic accuracy. Our detailed location maps may help clinicians reduce diagnostic bias when interpreting retinal OCT scans.

  16. Coarse-Scale Biases for Spirals and Orientation in Human Visual Cortex

    PubMed Central

    Heeger, David J.

    2013-01-01

    Multivariate decoding analyses are widely applied to functional magnetic resonance imaging (fMRI) data, but there is controversy over their interpretation. Orientation decoding in primary visual cortex (V1) reflects coarse-scale biases, including an over-representation of radial orientations. But fMRI responses to clockwise and counter-clockwise spirals can also be decoded. Because these stimuli are matched for radial orientation, while differing in local orientation, it has been argued that fine-scale columnar selectivity for orientation contributes to orientation decoding. We measured fMRI responses in human V1 to both oriented gratings and spirals. Responses to oriented gratings exhibited a complex topography, including a radial bias that was most pronounced in the peripheral representation, and a near-vertical bias that was most pronounced near the foveal representation. Responses to clockwise and counter-clockwise spirals also exhibited coarse-scale organization, at the scale of entire visual quadrants. The preference of each voxel for clockwise or counter-clockwise spirals was predicted from the preferences of that voxel for orientation and spatial position (i.e., within the retinotopic map). Our results demonstrate a bias for local stimulus orientation that has a coarse spatial scale, is robust across stimulus classes (spirals and gratings), and suffices to explain decoding from fMRI responses in V1. PMID:24336733

  17. Parameter optimization in biased decoy-state quantum key distribution with both source errors and statistical fluctuations

    NASA Astrophysics Data System (ADS)

    Zhu, Jian-Rong; Li, Jian; Zhang, Chun-Mei; Wang, Qin

    2017-10-01

The decoy-state method has been widely used in commercial quantum key distribution (QKD) systems. In view of practical decoy-state QKD with both source errors and statistical fluctuations, we propose a universal model of full parameter optimization in biased decoy-state QKD with phase-randomized sources. We then adopt this model to carry out simulations for two widely used sources: the weak coherent source (WCS) and the heralded single-photon source (HSPS). Results show that full parameter optimization can significantly improve not only the secure transmission distance but also the final key generation rate. Moreover, when taking source errors and statistical fluctuations into account, the performance of decoy-state QKD using an HSPS suffers less than that of decoy-state QKD using a WCS.

  18. Analysis and assessment on heavy metal sources in the coastal soils developed from alluvial deposits using multivariate statistical methods.

    PubMed

    Li, Jinling; He, Ming; Han, Wei; Gu, Yifan

    2009-05-30

An investigation of heavy metal sources, i.e., Cu, Zn, Ni, Pb, Cr, and Cd, in the coastal soils of Shanghai, China, was conducted using multivariate statistical methods (principal component analysis, clustering analysis, and correlation analysis). All the results of the multivariate analysis showed that: (i) Cu, Ni, Pb, and Cd had anthropogenic sources (e.g., overuse of chemical fertilizers and pesticides, industrial and municipal discharges, animal wastes, sewage irrigation, etc.); (ii) Zn and Cr were associated with parent materials and therefore had natural sources (e.g., the weathering of parent materials and subsequent pedogenesis in the alluvial deposits). The distribution of heavy metals in the soils was greatly affected by soil formation, atmospheric deposition, and human activities. These findings provide essential information on the possible sources of heavy metals, which will contribute to the monitoring and assessment of agricultural soils in regions worldwide.
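
The principal component step in such source-apportionment studies can be sketched minimally: variables (metals) that load on the same component covary across samples and may share a source. A hedged illustration on hypothetical data, via eigendecomposition of the correlation matrix (not the authors' workflow):

```python
import numpy as np

def pca_on_correlation(X, k=2):
    """PCA via eigendecomposition of the correlation matrix of X
    (samples in rows, variables in columns); returns the top-k
    eigenvalues (explained variance) and loading vectors."""
    R = np.corrcoef(X, rowvar=False)
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]          # components by explained variance
    return vals[order][:k], vecs[:, order[:k]]
```

Metals loading strongly on one component would then be interpreted jointly (e.g., an "anthropogenic" versus a "parent material" factor), with clustering and correlation analysis as cross-checks.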

  19. Analysis/forecast experiments with a multivariate statistical analysis scheme using FGGE data

    NASA Technical Reports Server (NTRS)

    Baker, W. E.; Bloom, S. C.; Nestler, M. S.

    1985-01-01

A three-dimensional multivariate statistical analysis method, optimal interpolation (OI), is described for modeling meteorological data from widely dispersed sites. The model was developed to analyze FGGE data at the NASA-Goddard Laboratory of Atmospherics. The model features a multivariate surface analysis over the oceans, including maintenance of the Ekman balance and a geographically dependent correlation function. Preliminary comparisons are made between the OI model and similar schemes employed at the European Centre for Medium-Range Weather Forecasts and the National Meteorological Center. The OI scheme is used to provide input to a GCM, and model error correlations are calculated for forecasts of 500 mb vertical water mixing ratios and the wind profiles. Comparisons are made between the predictions and measured data. The model is shown to be as accurate as a successive corrections model out to 4.5 days.

  20. Predicting trauma patient mortality: ICD [or ICD-10-AM] versus AIS based approaches.

    PubMed

    Willis, Cameron D; Gabbe, Belinda J; Jolley, Damien; Harrison, James E; Cameron, Peter A

    2010-11-01

The International Classification of Diseases Injury Severity Score (ICISS) has been proposed as an International Classification of Diseases (ICD)-10-based alternative to mortality prediction tools that use Abbreviated Injury Scale (AIS) data, including the Trauma and Injury Severity Score (TRISS). To date, studies have not examined the performance of ICISS using Australian trauma registry data. This study aimed to compare the performance of ICISS with other mortality prediction tools in an Australian trauma registry. This was a retrospective review of prospectively collected data from the Victorian State Trauma Registry. A training dataset was created for model development and a validation dataset for evaluation. The multiplicative ICISS model was compared with a worst-injury ICISS approach, Victorian TRISS (V-TRISS, using local coefficients), maximum AIS severity and a multivariable model including ICD-10-AM codes as predictors. Models were investigated for discrimination (C-statistic) and calibration (Hosmer-Lemeshow [H-L] statistic). The multivariable approach had the highest level of discrimination (C-statistic 0.90) and calibration (H-L 7.65, P = 0.468). Worst-injury ICISS, V-TRISS and maximum AIS had similar performance. The multiplicative ICISS produced the lowest level of discrimination (C-statistic 0.80) and poorest calibration (H-L 50.23, P < 0.001). The performance of ICISS may be affected by the data used to develop estimates, the ICD version employed, the methods for deriving estimates and the inclusion of covariates. In this analysis, a multivariable approach using ICD-10-AM codes was the best-performing method. A multivariable ICISS approach may therefore be a useful alternative to AIS-based methods and may have comparable predictive performance to locally derived TRISS models. © 2010 The Authors. ANZ Journal of Surgery © 2010 Royal Australasian College of Surgeons.
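
The two ICISS variants compared above differ only in how a patient's injury codes are combined: the multiplicative form takes the product of the survival risk ratios (SRRs) of all recorded codes, while the worst-injury form keeps only the single lowest SRR. A sketch with a made-up SRR table (the codes and values below are hypothetical, not real ICD-10-AM estimates):

```python
def iciss_multiplicative(codes, srr):
    """Predicted survival probability = product of the SRRs of all injuries."""
    p = 1.0
    for code in codes:
        p *= srr[code]
    return p

def iciss_worst_injury(codes, srr):
    """Worst-injury variant: SRR of the single most lethal code."""
    return min(srr[code] for code in codes)

# Hypothetical SRR table for illustration only.
SRR = {"S06.0": 0.95, "S27.0": 0.80, "S72.0": 0.98}
```

The multivariable alternative discussed in the abstract instead enters the codes as predictors in a regression model rather than combining pre-computed SRRs.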

  1. NONPARAMETRIC MANOVA APPROACHES FOR NON-NORMAL MULTIVARIATE OUTCOMES WITH MISSING VALUES

    PubMed Central

    He, Fanyin; Mazumdar, Sati; Tang, Gong; Bhatia, Triptish; Anderson, Stewart J.; Dew, Mary Amanda; Krafty, Robert; Nimgaonkar, Vishwajit; Deshpande, Smita; Hall, Martica; Reynolds, Charles F.

    2017-01-01

    Between-group comparisons often entail many correlated response variables. The multivariate linear model, with its assumption of multivariate normality, is the accepted standard tool for these tests. When this assumption is violated, the nonparametric multivariate Kruskal-Wallis (MKW) test is frequently used. However, this test requires complete cases with no missing values in response variables. Deletion of cases with missing values likely leads to inefficient statistical inference. Here we extend the MKW test to retain information from partially-observed cases. Results of simulated studies and analysis of real data show that the proposed method provides adequate coverage and superior power to complete-case analyses. PMID:29416225

  2. Reexamining Sample Size Requirements for Multivariate, Abundance-Based Community Research: When Resources are Limited, the Research Does Not Have to Be.

    PubMed

    Forcino, Frank L; Leighton, Lindsey R; Twerdy, Pamela; Cahill, James F

    2015-01-01

Community ecologists commonly perform multivariate techniques (e.g., ordination, cluster analysis) to assess patterns and gradients of taxonomic variation. A critical requirement for a meaningful statistical analysis is accurate information on the taxa found within an ecological sample. However, oversampling (too many individuals counted per sample) also comes at a cost, particularly for ecological systems in which identification and quantification are substantially more resource-consuming than the field expedition itself. In such systems, an increasingly larger sample size will eventually result in diminishing returns in improving any pattern or gradient revealed by the data, but will also lead to continually increasing costs. Here, we examine 396 datasets: 44 previously published and 352 created datasets. Using meta-analytic and simulation-based approaches, the present paper seeks (1) to determine the minimal sample sizes required to produce robust multivariate statistical results in abundance-based community ecology research, and (2) to determine the dataset parameters (i.e., evenness, number of taxa, number of samples) that require larger sample sizes, regardless of resource availability. We found that in the 44 previously published and the 220 created datasets with randomly chosen abundances, a conservative estimate of a sample size of 58 produced the same multivariate results as all larger sample sizes. However, this minimal number varies as a function of evenness, where increased evenness resulted in increased minimal sample sizes. Sample sizes as small as 58 individuals are sufficient for a broad range of multivariate abundance-based research. In cases when resource availability is the limiting factor for conducting a project (e.g., a small university, or limited time to conduct the research project), statistically viable results can still be obtained with less of an investment.

  3. Bias and heteroscedastic memory error in self-reported health behavior: an investigation using covariance structure analysis

    PubMed Central

    Kupek, Emil

    2002-01-01

Background Frequent use of self-reports for investigating recent and past behavior in medical research requires statistical techniques capable of analyzing complex sources of bias associated with this methodology. In particular, although decreasing accuracy in recalling more distant past events is commonplace, the resulting bias due to differential memory errors has rarely been modeled statistically. Methods Covariance structure analysis was used to estimate the recall error of the self-reported number of sexual partners for past periods of varying duration and its implication for the bias. Results Results indicated increasing levels of inaccuracy for reports about the more distant past. Considerable positive bias was found for a small fraction of respondents who reported ten or more partners in the last year, last two years and last five years. This is consistent with the effect of heteroscedastic random error, where the majority of partners had been acquired in the more distant past and therefore were recalled less accurately than the partners acquired more recently to the time of interviewing. Conclusions Memory errors of this type depend on the salience of the events recalled and are likely to be present in many areas of health research based on self-reported behavior. PMID:12435276

  4. Optimistic Bias, Risk Factors, and Development of High Blood Pressure and Obesity among African American Adolescents in Mississippi (USA)

    PubMed Central

    White, Monique S.; Addison, Clifton C.; Campbell Jenkins, Brenda W.; Bland, Vanessa; Clark, Adrianne; Antoine LaVigne, Donna

    2017-01-01

Childhood obesity has reached epidemic proportions and is linked to hypertension among African American youth. Optimistic bias influences the behavior of youth, causing them to underestimate their susceptibility to negative health outcomes. This study explored adolescent behaviors and the prevalence of high blood pressure and obesity in a school district. We examined the relationship between individual health risk practices and optimistic bias on health outcomes; 433 African American high school students were administered a survey and had their obesity and blood pressure measured by the school nurse. Canonical correlation analyses were used to examine relationships between health risk practices and descriptive statistics for optimistic bias and health outcomes. Engaging in moderate exercise for at least 30 min in the last 7 days and lower blood pressure was the only statistically significant relationship. Two-thirds of the students did not perceive themselves to be at risk of developing cardiovascular disease, with males at greater risk than females, despite the presence of clinical risk factors for hypertension and obesity. Reducing optimistic bias is an effective way of motivating young people to adopt more positive behaviors; educational institutions can implement intervention programs that promote positive health behavior as a way to reduce health disparities. PMID:28230728

  5. Optimistic Bias, Risk Factors, and Development of High Blood Pressure and Obesity among African American Adolescents in Mississippi (USA).

    PubMed

    White, Monique S; Addison, Clifton C; Jenkins, Brenda W Campbell; Bland, Vanessa; Clark, Adrianne; LaVigne, Donna Antoine

    2017-02-20

Childhood obesity has reached epidemic proportions and is linked to hypertension among African American youth. Optimistic bias influences the behavior of youth, causing them to underestimate their susceptibility to negative health outcomes. This study explored adolescent behaviors and the prevalence of high blood pressure and obesity in a school district. We examined the relationship between individual health risk practices and optimistic bias on health outcomes; 433 African American high school students were administered a survey and had their obesity and blood pressure measured by the school nurse. Canonical correlation analyses were used to examine relationships between health risk practices and descriptive statistics for optimistic bias and health outcomes. Engaging in moderate exercise for at least 30 min in the last 7 days and lower blood pressure was the only statistically significant relationship. Two-thirds of the students did not perceive themselves to be at risk of developing cardiovascular disease, with males at greater risk than females, despite the presence of clinical risk factors for hypertension and obesity. Reducing optimistic bias is an effective way of motivating young people to adopt more positive behaviors; educational institutions can implement intervention programs that promote positive health behavior as a way to reduce health disparities.

  6. Neither fixed nor random: weighted least squares meta-regression.

    PubMed

    Stanley, T D; Doucouliagos, Hristos

    2017-03-01

Our study revisits and challenges two core conventional meta-regression estimators: the prevalent use of 'mixed-effects' or random-effects meta-regression analysis and the correction of standard errors that defines fixed-effects meta-regression analysis (FE-MRA). We show how and explain why an unrestricted weighted least squares MRA (WLS-MRA) estimator is superior to conventional random-effects (or mixed-effects) meta-regression when there is publication (or small-sample) bias, is as good as FE-MRA in all cases, and is better than fixed effects in most practical applications. Simulations and statistical theory show that WLS-MRA provides satisfactory estimates of meta-regression coefficients that are practically equivalent to mixed effects or random effects when there is no publication bias. When there is publication selection bias, WLS-MRA always has smaller bias than mixed effects or random effects. In practical applications, an unrestricted WLS meta-regression is likely to give practically equivalent or superior estimates to fixed-effects, random-effects, and mixed-effects meta-regression approaches. However, random-effects meta-regression remains viable and perhaps somewhat preferable if selection for statistical significance (publication bias) can be ruled out and when random, additive normal heterogeneity is known to directly affect the 'true' regression coefficient. Copyright © 2016 John Wiley & Sons, Ltd.
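
The computational core of a WLS meta-regression is an ordinary weighted regression of effect sizes on moderators with inverse-variance weights 1/SE²; what distinguishes the unrestricted variant is that the multiplicative variance term is estimated rather than fixed at 1, which affects standard errors rather than the coefficients themselves. A minimal coefficient-only sketch in our own notation (not the authors' code):

```python
import numpy as np

def wls_mra(effect, se, moderators):
    """Weighted least squares meta-regression coefficients with
    weights 1/SE^2 (intercept prepended automatically)."""
    effect = np.asarray(effect, float)
    w = 1.0 / np.asarray(se, float) ** 2
    X = np.column_stack([np.ones(effect.size), moderators])
    sw = np.sqrt(w)                                   # weight via sqrt(w) scaling
    beta, *_ = np.linalg.lstsq(X * sw[:, None], effect * sw, rcond=None)
    return beta
```

With a single moderator, beta[0] is the meta-regression intercept and beta[1] the moderator coefficient.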

  7. A new method to address verification bias in studies of clinical screening tests: cervical cancer screening assays as an example.

    PubMed

    Xue, Xiaonan; Kim, Mimi Y; Castle, Philip E; Strickler, Howard D

    2014-03-01

    Studies to evaluate clinical screening tests often face the problem that the "gold standard" diagnostic approach is costly and/or invasive. It is therefore common to verify only a subset of negative screening tests using the gold standard method. However, undersampling the screen negatives can lead to substantial overestimation of the sensitivity and underestimation of the specificity of the diagnostic test. Our objective was to develop a simple and accurate statistical method to address this "verification bias." We developed a weighted generalized estimating equation approach to estimate, in a single model, the accuracy (eg, sensitivity/specificity) of multiple assays and simultaneously compare results between assays while addressing verification bias. This approach can be implemented using standard statistical software. Simulations were conducted to assess the proposed method. An example is provided using a cervical cancer screening trial that compared the accuracy of human papillomavirus and Pap tests, with histologic data as the gold standard. The proposed approach performed well in estimating and comparing the accuracy of multiple assays in the presence of verification bias. The proposed approach is an easy to apply and accurate method for addressing verification bias in studies of multiple screening methods. Copyright © 2014 Elsevier Inc. All rights reserved.
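
The intuition behind the weighting can be shown with simple inverse-probability weights: each verified subject is up-weighted by the inverse of their probability of being sent for gold-standard verification, so undersampled screen-negatives count for more. This is a simplified sketch of the general weighting idea, not the authors' generalized estimating equation model:

```python
import numpy as np

def ipw_sens_spec(screen_pos, disease, verified, p_verified):
    """Inverse-probability-weighted sensitivity/specificity when only a
    subset of subjects receives the gold-standard test. disease entries
    for unverified subjects are ignored (their weight is zero)."""
    screen_pos = np.asarray(screen_pos, float)
    disease = np.asarray(disease, float)
    w = np.asarray(verified, float) / np.asarray(p_verified, float)
    tp = np.sum(w * screen_pos * disease)
    fn = np.sum(w * (1 - screen_pos) * disease)
    tn = np.sum(w * (1 - screen_pos) * (1 - disease))
    fp = np.sum(w * screen_pos * (1 - disease))
    return tp / (tp + fn), tn / (tn + fp)
```

If screen-negatives are verified with probability 0.1, each verified screen-negative carries weight 10, counteracting the overestimation of sensitivity described above.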

  8. Attitudes toward Advanced and Multivariate Statistics When Using Computers.

    ERIC Educational Resources Information Center

    Kennedy, Robert L.; McCallister, Corliss Jean

    This study investigated the attitudes toward statistics of graduate students who studied advanced statistics in a course in which the focus of instruction was the use of a computer program in class. The use of the program made it possible to provide an individualized, self-paced, student-centered, and activity-based course. The three sections…

  9. Statistics Anxiety and Worry: The Roles of Worry Beliefs, Negative Problem Orientation, and Cognitive Avoidance

    ERIC Educational Resources Information Center

    Williams, Amanda S.

    2015-01-01

    Statistics anxiety is a common problem for graduate students. This study explores the multivariate relationship between a set of worry-related variables and six types of statistics anxiety. Canonical correlation analysis indicates a significant relationship between the two sets of variables. Findings suggest that students who are more intolerant…

  10. A two centre observational study of simultaneous pulse oximetry and arterial oxygen saturation recordings in intensive care unit patients.

    PubMed

    Ebmeier, S J; Barker, M; Bacon, M; Beasley, R C; Bellomo, R; Knee Chong, C; Eastwood, G M; Gilchrist, J; Kagaya, H; Pilcher, J; Reddy, S K; Ridgeon, E; Sarma, N; Sprogis, S; Tanaka, A; Tweedie, M; Weatherall, M; Young, P J

    2018-05-01

The influence of variables that might affect the accuracy of pulse oximetry (SpO2) recordings in critically ill patients is not well established. We sought to describe the relationship between paired SpO2/SaO2 (oxygen saturation via arterial blood gas analysis) measurements in adult intensive care unit (ICU) patients and to describe the diagnostic performance of SpO2 in detecting low SaO2 and PaO2. A paired SpO2/SaO2 measurement was obtained from 404 adults in ICU. Measurements were used to calculate bias, precision, and limits of agreement. Associations between bias and variables including vasopressor and inotrope use, capillary refill time, hand temperature, pulse pressure, body temperature, oximeter model, and skin colour were estimated. There was no overall statistically significant bias in paired SpO2/SaO2 measurements; observed limits of agreement were ±4.4%. However, body temperature, oximeter model, and skin colour were statistically significantly associated with the degree of bias. SpO2 <89% had a sensitivity of 3/7 (42.9%; 95% confidence interval, CI, 9.9% to 81.6%) and a specificity of 344/384 (89.6%; 95% CI 86.1% to 92.5%) for detecting SaO2 <89%. The absence of statistically significant bias in paired SpO2/SaO2 in adult ICU patients provides support for the use of pulse oximetry to titrate oxygen therapy. However, SpO2 recordings alone should be used cautiously when SaO2 recordings 4.4% higher or lower than the observed SpO2 would be of concern. A range of variables relevant to the critically ill had little or no effect on bias.
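
The bias and 95% limits of agreement reported above follow the standard Bland-Altman calculation on the paired differences. A hedged sketch on made-up numbers (not the study data):

```python
import numpy as np

def bias_and_limits(spo2, sao2):
    """Mean difference (bias) and 95% limits of agreement for paired
    SpO2/SaO2 readings: bias +/- 1.96 * SD of the differences."""
    d = np.asarray(spo2, float) - np.asarray(sao2, float)
    bias = d.mean()
    sd = d.std(ddof=1)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```

A bias near zero with limits of roughly ±4.4%, as in the study, means an individual SpO2 reading may plausibly sit several percentage points away from the true SaO2 even though the average error is negligible.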

  11. Implications of clinical trial design on sample size requirements.

    PubMed

    Leon, Andrew C

    2008-07-01

    The primary goal in designing a randomized controlled clinical trial (RCT) is to minimize bias in the estimate of treatment effect. Randomized group assignment, double-blinded assessments, and control or comparison groups reduce the risk of bias. The design must also provide sufficient statistical power to detect a clinically meaningful treatment effect and maintain a nominal level of type I error. An attempt to integrate neurocognitive science into an RCT poses additional challenges. Two particularly relevant aspects of such a design often receive insufficient attention in an RCT. Multiple outcomes inflate type I error, and an unreliable assessment process introduces bias and reduces statistical power. Here we describe how both unreliability and multiple outcomes can increase the study costs and duration and reduce the feasibility of the study. The objective of this article is to consider strategies that overcome the problems of unreliability and multiplicity.

  12. Dynamical evolution of topology of large-scale structure [in distribution of galaxies]

    NASA Technical Reports Server (NTRS)

    Park, Changbom; Gott, J. R., III

    1991-01-01

    The nonlinear effects of statistical biasing and gravitational evolution on the genus are studied. The biased galaxy subset is picked for the first time by actually identifying galaxy-sized peaks above a fixed threshold in the initial conditions, and their subsequent evolution is followed. It is found that in the standard cold dark matter (CDM) model the statistical biasing in the locations of galaxies produces asymmetry in the genus curve and coupling with gravitational evolution gives rise to a shift in the genus curve to the left in moderately nonlinear regimes. Gravitational evolution alone reduces the amplitude of the genus curve due to strong phase correlations in the density field and also produces asymmetry in the curve. Results on the genus of the mass density field for both CDM and hot dark matter models are consistent with previous work by Melott, Weinberg, and Gott (1987).

  13. Contemporary use trends and survival outcomes in patients undergoing radical cystectomy or bladder-preservation therapy for muscle-invasive bladder cancer.

    PubMed

    Cahn, David B; Handorf, Elizabeth A; Ghiraldi, Eric M; Ristau, Benjamin T; Geynisman, Daniel M; Churilla, Thomas M; Horwitz, Eric M; Sobczak, Mark L; Chen, David Y T; Viterbo, Rosalia; Greenberg, Richard E; Kutikov, Alexander; Uzzo, Robert G; Smaldone, Marc C

    2017-11-15

    The current study was performed to examine temporal trends and compare overall survival (OS) in patients undergoing radical cystectomy (RC) or bladder-preservation therapy (BPT) for muscle-invasive urothelial carcinoma of the bladder. The authors reviewed the National Cancer Data Base to identify patients with AJCC stage II to III urothelial carcinoma of the bladder from 2004 through 2013. Patients receiving BPT were stratified as having received any external-beam radiotherapy (any XRT), definitive XRT (50-80 grays), and definitive XRT with chemotherapy (CRT). Treatment trends and OS outcomes for the BPT and RC cohorts were evaluated using Cochran-Armitage tests, unadjusted Kaplan-Meier curves, adjusted Cox multivariate regression, and propensity score matching, using increasingly stringent selection criteria. A total of 32,300 patients met the inclusion criteria and were treated with RC (22,680 patients) or BPT (9620 patients). Of the patients treated with BPT, 26.4% (2540 patients) and 15.5% (1489 patients), respectively, were treated with definitive XRT and CRT. Improved OS was observed for RC in all groups. After adjustments with more rigorous statistical models controlling for confounders and with more restrictive BPT cohorts, the magnitude of the OS benefit became attenuated on multivariate (any XRT: hazard ratio [HR], 2.115 [95% confidence interval [95% CI], 2.045-2.188]; definitive XRT: HR, 1.870 [95% CI, 1.773-1.972]; and CRT: HR, 1.578 [95% CI, 1.474-1.691]) and propensity score (any XRT: HR, 2.008 [95% CI, 1.871-2.154]; definitive XRT: HR, 1.606 [95% CI, 1.453-1.776]; and CRT: HR, 1.406 [95% CI, 1.235-1.601]) analyses. In the National Cancer Data Base, receipt of BPT was associated with decreased OS compared with RC in patients with stage II to III urothelial carcinoma. Increasingly stringent definitions of BPT and more rigorous statistical methods adjusting for selection biases attenuated observed survival differences. Cancer 2017;123:4337-45. 
    © 2017 American Cancer Society.
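    The propensity-score step of an analysis like this one can be sketched in a few lines. Everything below is hypothetical and illustrative: the cohort, the two covariates (age, comorbidity count), and the treatment model are simulated, not taken from the National Cancer Data Base, and the matching shown is a plain greedy 1:1 nearest-neighbour match on the estimated score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cohort: age and comorbidity burden influence treatment choice.
n = 500
age = rng.normal(65, 8, n)
comorb = rng.poisson(2, n)
logit = -10 + 0.12 * age + 0.3 * comorb
treated = rng.random(n) < 1 / (1 + np.exp(-logit))

# Fit a logistic regression for the propensity score by Newton-Raphson.
X = np.column_stack([np.ones(n), age, comorb])
beta = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (treated - p)
    hess = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, grad)
ps = 1 / (1 + np.exp(-X @ beta))

# Greedy 1:1 nearest-neighbour matching on the score, without replacement.
treated_idx = np.where(treated)[0]
control_idx = list(np.where(~treated)[0])
pairs = []
for t in treated_idx:
    j = min(range(len(control_idx)), key=lambda k: abs(ps[control_idx[k]] - ps[t]))
    pairs.append((t, control_idx.pop(j)))
```

    Outcomes (e.g., survival) would then be compared within the matched pairs; stricter cohort definitions simply shrink the pool before this step.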

  14. Can bias correction and statistical downscaling methods improve the skill of seasonal precipitation forecasts?

    NASA Astrophysics Data System (ADS)

    Manzanas, R.; Lucero, A.; Weisheimer, A.; Gutiérrez, J. M.

    2018-02-01

    Statistical downscaling methods are popular post-processing tools which are widely used in many sectors to adapt the coarse-resolution biased outputs from global climate simulations to the regional-to-local scale typically required by users. They range from simple and pragmatic Bias Correction (BC) methods, which directly adjust the model outputs of interest (e.g. precipitation) according to the available local observations, to more complex Perfect Prognosis (PP) ones, which indirectly derive local predictions (e.g. precipitation) from appropriate upper-air large-scale model variables (predictors). Statistical downscaling methods have been extensively used and critically assessed in climate change applications; however, their advantages and limitations in seasonal forecasting are not well understood yet. In particular, a key problem in this context is whether they serve to improve the forecast quality/skill of raw model outputs beyond the adjustment of their systematic biases. In this paper we analyze this issue by applying two state-of-the-art BC and two PP methods to downscale precipitation from a multimodel seasonal hindcast in a challenging tropical region, the Philippines. To properly assess the potential added value beyond the reduction of model biases, we consider two validation scores which are not sensitive to changes in the mean (correlation and reliability categories). Our results show that, whereas BC methods maintain or worsen the skill of the raw model forecasts, PP methods can yield significant skill improvement (worsening) in cases for which the large-scale predictor variables considered are better (worse) predicted by the model than precipitation. For instance, PP methods are found to increase (decrease) model reliability in nearly 40% of the stations considered in boreal summer (autumn). Therefore, the choice of a convenient downscaling approach (either BC or PP) depends on the region and the season.
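    A minimal example of the BC family of methods discussed here is empirical quantile mapping: each raw model value is passed through the model's empirical CDF and mapped onto the observed quantile function. The sketch below uses simulated "observed" and "model" precipitation (gamma draws, hypothetical parameters), not the hindcast data of the study; note that, exactly as the abstract argues, the mapping is monotone and so cannot change rank-based skill measures such as correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
obs = rng.gamma(2.0, 4.0, 3000)         # "observed" precipitation (hypothetical)
model = rng.gamma(2.0, 6.0, 3000) + 1   # biased raw model output (hypothetical)

def quantile_map(x, model_ref, obs_ref):
    # Empirical CDF of the model, then the observed quantile function.
    q = np.searchsorted(np.sort(model_ref), x) / len(model_ref)
    q = np.clip(q, 0.0, 1.0 - 1e-9)
    return np.quantile(obs_ref, q)

corrected = quantile_map(model, model, obs)
```

    The corrected series inherits the observed distribution but keeps the model's temporal ordering, which is why the study evaluates BC with scores insensitive to changes in the mean.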

  15. Outliers in Questionnaire Data: Can They Be Detected and Should They Be Removed?

    ERIC Educational Resources Information Center

    Zijlstra, Wobbe P.; van der Ark, L. Andries; Sijtsma, Klaas

    2011-01-01

    Outliers in questionnaire data are unusual observations, which may bias statistical results, and outlier statistics may be used to detect such outliers. The authors investigated the effect outliers have on the specificity and the sensitivity of each of six different outlier statistics. The Mahalanobis distance and the item-pair based outlier…
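    One of the outlier statistics named above, the Mahalanobis distance, can be sketched directly. The questionnaire data below are simulated (200 consistent respondents on 5 correlated Likert-type items plus one planted inconsistent responder); the chi-square cutoff is the usual rule of thumb, not a recommendation from this article.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical questionnaire data with one planted inconsistent responder.
clean = rng.multivariate_normal(np.full(5, 3.0), 0.5 * np.eye(5) + 0.3, size=200)
data = np.vstack([clean, [[5.0, 1.0, 5.0, 1.0, 5.0]]])

mu = data.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
diff = data - mu
d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)  # squared Mahalanobis distances

CUTOFF = 15.086  # 0.99 quantile of the chi-square distribution with 5 df
flagged = np.where(d2 > CUTOFF)[0]
```

    Whether flagged respondents should then be removed is exactly the question the article raises: a large distance marks an unusual response pattern, not necessarily an invalid one.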

  16. Why Are People Bad at Detecting Randomness? A Statistical Argument

    ERIC Educational Resources Information Center

    Williams, Joseph J.; Griffiths, Thomas L.

    2013-01-01

    Errors in detecting randomness are often explained in terms of biases and misconceptions. We propose and provide evidence for an account that characterizes the contribution of the inherent statistical difficulty of the task. Our account is based on a Bayesian statistical analysis, focusing on the fact that a random process is a special case of…
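    The statistical difficulty of the task can be made concrete with a toy Bayesian comparison (a sketch in the spirit of the abstract, not the authors' actual model): a Bayes factor for "fair random coin" against "coin with unknown bias p" under a uniform prior on p.

```python
from math import comb

def bayes_factor_random(seq):
    """Bayes factor for 'fair coin' vs. 'coin with unknown bias p'.

    With p ~ Uniform(0, 1), the marginal likelihood of a specific binary
    sequence containing k ones is the Beta-Binomial value 1 / ((n+1) * C(n, k)).
    """
    n, k = len(seq), sum(seq)
    p_random = 0.5 ** n
    p_biased = 1.0 / ((n + 1) * comb(n, k))
    return p_random / p_biased

irregular = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1]  # BF > 1: evidence for "random"
```

    Even an all-heads run of 12 only shifts the odds by a few hundred to one, illustrating how weak the evidence from short sequences is — the inherent difficulty the authors point to.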

  17. Critical Views of 8th Grade Students toward Statistical Data in Newspaper Articles: Analysis in Light of Statistical Literacy

    ERIC Educational Resources Information Center

    Guler, Mustafa; Gursoy, Kadir; Guven, Bulent

    2016-01-01

    Understanding and interpreting biased data, decision-making in accordance with the data, and critically evaluating situations involving data are among the fundamental skills necessary in the modern world. To develop these required skills, emphasis on statistical literacy in school mathematics has been gradually increased in recent years. The…

  18. Trace Element Study of H Chondrites: Evidence for Meteoroid Streams.

    NASA Astrophysics Data System (ADS)

    Wolf, Stephen Frederic

    1993-01-01

    Multivariate statistical analyses, both linear discriminant analysis and logistic regression, of the volatile trace elemental concentrations in H4-6 chondrites reveal compositionally distinguishable subpopulations. The observed difference in volatile trace element composition between Antarctic and non-Antarctic H4-6 chondrites (Lipschutz and Samuels, 1991) can be explained by a compositionally distinct subpopulation found in Victoria Land, Antarctica. This population of H4-6 chondrites is compositionally distinct from non-Antarctic H4-6 chondrites and from Antarctic H4-6 chondrites from Queen Maud Land. Comparisons of Queen Maud Land H4-6 chondrites with non-Antarctic H4-6 chondrites do not give reason to believe that these two populations are distinguishable from each other on the basis of the ten volatile trace element concentrations measured. ANOVA indicates that these differences are not the result of trivial causes such as weathering and analytical bias. The thermoluminescence properties of these populations parallel the results of the volatile trace element comparisons. Given the differences in terrestrial age between Victoria Land, Queen Maud Land, and modern H4-6 chondrite falls, these results are consistent with a variation in H4-6 chondrite flux on a 300 ky timescale. This conclusion requires the existence of co-orbital meteoroid streams. Statistical analyses of the volatile trace elemental concentrations in non-Antarctic modern falls of H4-6 chondrites also demonstrate that a group of 13 H4-6 chondrites, Cluster 1, selected exclusively for their distinct fall parameters (Dodd, 1992), is compositionally distinguishable from a control group of 45 non-Antarctic modern H4-6 chondrites on the basis of the ten volatile trace element concentrations measured. Model-independent randomization-simulations based on both linear discriminant analysis and logistic regression verify these results.
While ANOVA identifies two possible causes for this difference, analytical bias and group classification, a test validation experiment verifies that group classification is the more significant cause of the compositional difference between Cluster 1 and non-Cluster 1 modern H4-6 chondrite falls. The thermoluminescence properties of these populations parallel the results of the volatile trace element comparisons. This suggests that these meteorites are fragments of a co-orbital meteoroid stream derived from a single parent body.
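    The two-group discriminant analysis at the core of this record can be sketched with Fisher's linear discriminant. The data below are simulated log-concentrations for two hypothetical subpopulations (they stand in for the ten measured volatile trace elements, which are not reproduced here), and the accuracy reported is simple resubstitution accuracy; the paper's model-independent validation would instead permute the group labels and repeat the fit.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical log-concentrations of 10 trace elements, two subpopulations.
n, p = 40, 10
pop_a = rng.normal(0.0, 1.0, (n, p))
pop_b = rng.normal(0.8, 1.0, (n, p))
X = np.vstack([pop_a, pop_b])
y = np.array([0] * n + [1] * n)

# Fisher's linear discriminant: w = Sw^{-1} (mu1 - mu0), midpoint threshold.
mu0, mu1 = pop_a.mean(0), pop_b.mean(0)
Sw = np.cov(pop_a, rowvar=False) * (n - 1) + np.cov(pop_b, rowvar=False) * (n - 1)
w = np.linalg.solve(Sw, mu1 - mu0)
threshold = w @ (mu0 + mu1) / 2
pred = (X @ w > threshold).astype(int)
accuracy = (pred == y).mean()
```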

  19. Controlling Guessing Bias in the Dichotomous Rasch Model Applied to a Large-Scale, Vertically Scaled Testing Program

    PubMed Central

    Andrich, David; Marais, Ida; Humphry, Stephen Mark

    2015-01-01

    Recent research has shown how the statistical bias in Rasch model difficulty estimates induced by guessing in multiple-choice items can be eliminated. Using vertical scaling of a high-profile national reading test, it is shown that the dominant effect of removing such bias is a nonlinear change in the unit of scale across the continuum. The consequence is that the proficiencies of the more proficient students are increased relative to those of the less proficient. Not controlling the guessing bias underestimates the progress of students across 7 years of schooling with important educational implications. PMID:29795871
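    The direction of the guessing bias is easy to demonstrate with a simulation (a sketch, not the paper's estimator): responses are generated from a Rasch item with an added guessing probability, and the difficulty is then recovered by a naive method-of-moments fit that ignores guessing. The recovered difficulty is biased downward, i.e., the item looks easier than it is.

```python
import math, random

random.seed(4)
# Hypothetical item: true Rasch difficulty b = 1.0; examinees also guess
# correctly with probability c = 0.25, which the pure Rasch fit ignores.
b_true, c = 1.0, 0.25
thetas = [random.gauss(0.0, 1.0) for _ in range(4000)]

def p_rasch(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Responses generated WITH guessing...
responses = [random.random() < c + (1 - c) * p_rasch(t, b_true) for t in thetas]
p_obs = sum(responses) / len(responses)

# ...difficulty recovered IGNORING guessing, by matching proportion correct.
def mean_p(b):
    return sum(p_rasch(t, b) for t in thetas) / len(thetas)

lo, hi = -4.0, 4.0
for _ in range(60):                 # bisection: mean_p is decreasing in b
    mid = (lo + hi) / 2
    if mean_p(mid) > p_obs:
        lo = mid
    else:
        hi = mid
b_hat = (lo + hi) / 2               # substantially below b_true
```

    Because the bias is larger for hard items, removing it rescales the continuum nonlinearly, which is the unit-of-scale effect the paper documents.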

  20. Statistical methods and neural network approaches for classification of data from multiple sources

    NASA Technical Reports Server (NTRS)

    Benediktsson, Jon Atli; Swain, Philip H.

    1990-01-01

    Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A general problem with conventional multivariate statistical approaches to classifying data of multiple types is that a single multivariate distribution cannot be assumed for the classes in the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable. This means that the data sources need to be weighted according to their reliability, but most statistical classification methods provide no mechanism for such weighting. This research focuses first on statistical methods which can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Secondly, this research focuses on neural network models. The neural networks are distribution-free, since no prior knowledge of the statistical distribution of the data is needed. This is an obvious advantage over most statistical classification methods. The neural networks also automatically take care of the question of how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.
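    The consensus-theory idea of reliability weighting can be sketched with a logarithmic opinion pool: per-source class posteriors are raised to reliability exponents and renormalized. The posteriors and weights below are hypothetical placeholders, not values from the study.

```python
import numpy as np

def log_opinion_pool(posteriors, weights):
    """Combine per-source class posteriors with reliability exponents."""
    combined = np.ones_like(posteriors[0])
    for p, w in zip(posteriors, weights):
        combined *= p ** w
    return combined / combined.sum()

src1 = np.array([0.7, 0.2, 0.1])   # reliable source (hypothetical)
src2 = np.array([0.3, 0.4, 0.3])   # noisier source, down-weighted
post = log_opinion_pool([src1, src2], [1.0, 0.5])
```

    A weight of 0 removes a source entirely; a weight of 1 trusts it fully — the reliability measures investigated in the paper would set these exponents from the data.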

  1. Interfaces between statistical analysis packages and the ESRI geographic information system

    NASA Technical Reports Server (NTRS)

    Masuoka, E.

    1980-01-01

    Interfaces between ESRI's geographic information system (GIS) data files and real-valued data files, written to facilitate statistical analysis and display of spatially referenced multivariable data, are described. An example of data analysis that utilized the GIS and the statistical analysis system is presented to illustrate the utility of combining the analytic capability of a statistical package with the data management and display features of the GIS.

  2. Single-variant and multi-variant trend tests for genetic association with next-generation sequencing that are robust to sequencing error.

    PubMed

    Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Alejandro Q; Musolf, Anthony; Matise, Tara C; Finch, Stephen J; Gordon, Derek

    2012-01-01

    As with any new technology, next-generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to those data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power for all three methods is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. 
Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have lower power than the corresponding single-variant simulation results, most probably due to our specification of multi-variant SNP correlation values. In conclusion, our LTTae,NGS addresses two key challenges with NGS disease studies; first, it allows for differential misclassification when computing the statistic; and second, it addresses the multiple-testing issue in that there is a multi-variant form of the statistic that has only one degree of freedom, and provides a single p value, no matter how many loci. Copyright © 2013 S. Karger AG, Basel.
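    The classical (error-free) linear trend test that LTTae,NGS extends is the Cochran-Armitage test, which can be sketched as follows. The genotype counts are hypothetical; the score statistic U and its null variance follow the standard hypergeometric formulation.

```python
import math

# Hypothetical genotype counts for 0, 1, 2 copies of the variant allele.
cases    = [60, 30, 10]
controls = [80, 18,  2]
weights  = [0, 1, 2]          # additive-model trend scores

R = sum(cases); N = R + sum(controls)
totals = [c + d for c, d in zip(cases, controls)]

# Score statistic U and its null (hypergeometric) variance.
U = sum(w * (r - R * t / N) for w, r, t in zip(weights, cases, totals))
var_U = (R * (N - R) / (N**2 * (N - 1))) * (
    N * sum(w * w * t for w, t in zip(weights, totals))
    - sum(w * t for w, t in zip(weights, totals)) ** 2
)
z = U / math.sqrt(var_U)
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
```

    The authors' extension replaces the observed genotype counts with error-adjusted expected counts under differential misclassification; that adjustment is not shown here.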

  3. Single variant and multi-variant trend tests for genetic association with next generation sequencing that are robust to sequencing error

    PubMed Central

    Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Andrew; Musolf, Anthony; Matise, Tara C.; Finch, Stephen J.; Gordon, Derek

    2013-01-01

    As with any new technology, next generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model, based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to that data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power for all three methods is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. 
Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have lower power than the corresponding single variant simulation results, most probably due to our specification of multi-variant SNP correlation values. In conclusion, our LTTae,NGS addresses two key challenges with NGS disease studies; first, it allows for differential misclassification when computing the statistic; and second, it addresses the multiple-testing issue in that there is a multi-variant form of the statistic that has only one degree of freedom, and provides a single p-value, no matter how many loci. PMID:23594495

  4. Decomposing biodiversity data using the Latent Dirichlet Allocation model, a probabilistic multivariate statistical method

    Treesearch

    Denis Valle; Benjamin Baiser; Christopher W. Woodall; Robin Chazdon; Jerome Chave

    2014-01-01

    We propose a novel multivariate method to analyse biodiversity data based on the Latent Dirichlet Allocation (LDA) model. LDA, a probabilistic model, reduces assemblages to sets of distinct component communities. It produces easily interpretable results, can represent abrupt and gradual changes in composition, accommodates missing data and allows for coherent estimates...
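    The decomposition the authors describe can be sketched with a small collapsed Gibbs sampler for LDA, treating sites as "documents" and individual organisms as "words" over a species vocabulary. The site-by-species counts below are simulated from two artificial component communities; this is a generic LDA sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(5)
# Toy biodiversity matrix: 20 sites x 6 species from two component
# communities (sites 0-9 drawn from A, sites 10-19 from B).
comm = np.array([[0.45, 0.45, 0.05, 0.02, 0.02, 0.01],
                 [0.02, 0.02, 0.05, 0.45, 0.30, 0.16]])
counts = np.array([rng.multinomial(50, comm[0 if s < 10 else 1]) for s in range(20)])

K, alpha, beta = 2, 0.1, 0.1
docs = np.repeat(np.arange(20), counts.sum(axis=1))
words = np.concatenate([np.repeat(np.arange(6), row) for row in counts])
z = rng.integers(0, K, docs.size)
ndk = np.zeros((20, K)); nkw = np.zeros((K, 6)); nk = np.zeros(K)
np.add.at(ndk, (docs, z), 1); np.add.at(nkw, (z, words), 1); np.add.at(nk, z, 1)

for _ in range(100):                       # collapsed Gibbs sweeps
    for i in range(docs.size):
        d, w, k = docs[i], words[i], z[i]
        ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
        p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + 6 * beta)
        k = rng.choice(K, p=p / p.sum())
        z[i] = k
        ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

site_mix = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
```

    `site_mix` gives each site's estimated mixture over component communities, which is the easily interpretable output the abstract emphasizes.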

  5. Departure from Normality in Multivariate Normative Comparison: The Cramer Alternative for Hotelling's "T[squared]"

    ERIC Educational Resources Information Center

    Grasman, Raoul P. P. P.; Huizenga, Hilde M.; Geurts, Hilde M.

    2010-01-01

    Crawford and Howell (1998) have pointed out that the common practice of z-score inference on cognitive disability is inappropriate if a patient's performance on a task is compared with relatively few typical control individuals. Appropriate univariate and multivariate statistical tests have been proposed for these studies, but these are only valid…

  6. Applied statistics in agricultural, biological, and environmental sciences.

    USDA-ARS?s Scientific Manuscript database

    Agronomic research often involves measurement and collection of multiple response variables in an effort to understand the more complex nature of the system being studied. Multivariate statistical methods encompass the simultaneous analysis of all random variables measured on each experimental or s...

  7. Combined data preprocessing and multivariate statistical analysis characterizes fed-batch culture of mouse hybridoma cells for rational medium design.

    PubMed

    Selvarasu, Suresh; Kim, Do Yun; Karimi, Iftekhar A; Lee, Dong-Yup

    2010-10-01

    We present an integrated framework for characterizing fed-batch cultures of mouse hybridoma cells producing monoclonal antibody (mAb). This framework systematically combines data preprocessing, elemental balancing and statistical analysis techniques. Initially, specific rates of cell growth, glucose/amino acid consumptions and mAb/metabolite productions were calculated via curve fitting using logistic equations, with subsequent elemental balancing of the preprocessed data indicating the presence of experimental measurement errors. Multivariate statistical analysis was then employed to understand physiological characteristics of the cellular system. The results from principal component analysis (PCA) revealed three major clusters of amino acids with similar trends in their consumption profiles: (i) arginine, threonine and serine, (ii) glycine, tyrosine, phenylalanine, methionine, histidine and asparagine, and (iii) lysine, valine and isoleucine. Further analysis using partial least squares (PLS) regression identified key amino acids which were positively or negatively correlated with the cell growth, mAb production and the generation of lactate and ammonia. Based on these results, the optimal concentrations of key amino acids in the feed medium can be inferred, potentially leading to an increase in cell viability and productivity, as well as a decrease in toxic waste production. The study demonstrated how the current methodological framework using multivariate statistical analysis techniques can serve as a potential tool for deriving rational medium design strategies. Copyright © 2010 Elsevier B.V. All rights reserved.
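    The PCA step of such a workflow can be sketched via the SVD. The consumption profiles below are simulated (12 hypothetical amino acids generated from two latent temporal patterns plus noise), not the hybridoma measurements of the study; clustering amino acids by their scores on the leading components is what produces groupings like clusters (i)-(iii) above.

```python
import numpy as np

rng = np.random.default_rng(6)
# Hypothetical specific-rate profiles: 12 amino acids x 8 time points,
# generated from two latent temporal patterns plus measurement noise.
t = np.linspace(0, 1, 8)
patterns = np.vstack([np.sin(np.pi * t), t**2])
loadings = rng.normal(0, 1, (12, 2))
X = loadings @ patterns + rng.normal(0, 0.05, (12, 8))

# PCA on row-standardized profiles via the SVD.
Xc = (X - X.mean(1, keepdims=True)) / X.std(1, keepdims=True)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / (S**2).sum()
scores = U[:, :2] * S[:2]          # amino-acid scores on the first two PCs
```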

  8. Multivariate model of female black bear habitat use for a Geographic Information System

    USGS Publications Warehouse

    Clark, Joseph D.; Dunn, James E.; Smith, Kimberly G.

    1993-01-01

    Simple univariate statistical techniques may not adequately assess the multidimensional nature of habitats used by wildlife. Thus, we developed a multivariate method to model habitat-use potential using a set of female black bear (Ursus americanus) radio locations and habitat data consisting of forest cover type, elevation, slope, aspect, distance to roads, distance to streams, and forest cover type diversity score in the Ozark Mountains of Arkansas. The model is based on the Mahalanobis distance statistic coupled with Geographic Information System (GIS) technology. That statistic is a measure of dissimilarity and represents a standardized squared distance between a set of sample variates and an ideal based on the mean of variates associated with animal observations. Calculations were made with the GIS to produce a map containing Mahalanobis distance values within each cell on a 60- × 60-m grid. The model identified areas of high habitat use potential that could not otherwise be identified by independent perusal of any single map layer. This technique avoids many pitfalls that commonly affect typical multivariate analyses of habitat use and is a useful tool for habitat manipulation or mitigation to favor terrestrial vertebrates that use habitats on a landscape scale.
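    The core of the model — scoring every grid cell by its squared Mahalanobis distance from the mean of the variates at animal locations — can be sketched directly. The habitat variables and their values below are hypothetical stand-ins for the layers used in the study.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical habitat variables at radio locations: elevation (m),
# slope (degrees), distance to road (m).
use = np.column_stack([
    rng.normal(600, 60, 120),
    rng.normal(15, 4, 120),
    rng.normal(900, 200, 120),
])
mu = use.mean(0)                                   # the "ideal" habitat vector
inv_cov = np.linalg.inv(np.cov(use, rowvar=False))

def habitat_d2(cells):
    """Squared Mahalanobis distance of each grid cell from the ideal mean."""
    d = cells - mu
    return np.einsum('ij,jk,ik->i', d, inv_cov, d)

# Score a batch of grid cells: low D^2 = high habitat-use potential.
grid = np.column_stack([
    rng.normal(600, 120, 1000),
    rng.normal(15, 8, 1000),
    rng.normal(900, 400, 1000),
])
d2 = habitat_d2(grid)
```

    In the paper this computation is run inside the GIS over a 60- × 60-m grid, producing a map of distance values rather than a flat array.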

  9. Multivariate analysis of heavy metal contamination using river sediment cores of Nankan River, northern Taiwan

    NASA Astrophysics Data System (ADS)

    Lee, An-Sheng; Lu, Wei-Li; Huang, Jyh-Jaan; Chang, Queenie; Wei, Kuo-Yen; Lin, Chin-Jung; Liou, Sofia Ya Hsuan

    2016-04-01

    Through the geology and climate characteristic in Taiwan, generally rivers carry a lot of suspended particles. After these particles settled, they become sediments which are good sorbent for heavy metals in river system. Consequently, sediments can be found recording contamination footprint at low flow energy region, such as estuary. Seven sediment cores were collected along Nankan River, northern Taiwan, which is seriously contaminated by factory, household and agriculture input. Physico-chemical properties of these cores were derived from Itrax-XRF Core Scanner and grain size analysis. In order to interpret these complex data matrices, the multivariate statistical techniques (cluster analysis, factor analysis and discriminant analysis) were introduced to this study. Through the statistical determination, the result indicates four types of sediment. One of them represents contamination event which shows high concentration of Cu, Zn, Pb, Ni and Fe, and low concentration of Si and Zr. Furthermore, three possible contamination sources of this type of sediment were revealed by Factor Analysis. The combination of sediment analysis and multivariate statistical techniques used provides new insights into the contamination depositional history of Nankan River and could be similarly applied to other river systems to determine the scale of anthropogenic contamination.
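    The variable-clustering step of such an analysis can be sketched with hierarchical clustering on a correlation distance. The sediment parameters and data below are simulated placeholders (two co-varying contamination indicators against independent background variables), not the Nankan River core measurements.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(8)
# Hypothetical core data: 30 depth intervals x 6 parameters; the first two
# parameters co-vary (a shared contamination signal), the rest are independent.
shared = rng.normal(0, 1, 30)
data = np.column_stack([
    shared + rng.normal(0, 0.3, 30),   # e.g. Cu
    shared + rng.normal(0, 0.3, 30),   # e.g. Zn
    rng.normal(0, 1, (30, 4)),         # e.g. Si, Zr, and background variables
])

# Cluster the parameters: correlation distance + average linkage.
corr_dist = pdist(data.T, metric="correlation")
Z = linkage(corr_dist, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
```

    Factor analysis would then be applied to the same matrix to attribute the correlated group to possible sources, as done in the study.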

  10. Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques

    NASA Astrophysics Data System (ADS)

    Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein

    2017-10-01

    Groundwater samples from the Rapur area were collected at different sites to evaluate the major-ion chemistry. The large volume of data can lead to difficulties in the integration, interpretation, and representation of the results. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness in classifying and identifying the geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations. This yielded two important variable clusters, viz., cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS), which are released to the study area from different sources. The application of multivariate statistical techniques such as principal component analysis (PCA) assists in the interpretation of complex data matrices for a better understanding of the water quality of a study area. The PCA showed that the first factor (factor 1), which accounted for 36.2% of the total variance, had high positive loadings on EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of the similarity of their water quality.
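    Extracting factor loadings and scores of the kind reported here (e.g., factor 1 loading on EC, TDS, and hardness) can be sketched with an eigendecomposition of the correlation matrix. The water-quality data below are simulated with one shared salinity-like factor; they are not the Rapur measurements.

```python
import numpy as np

rng = np.random.default_rng(9)
# Hypothetical data: 30 stations x 5 parameters (EC, TDS, hardness, pH, Si),
# with a shared latent "salinity" factor driving the first three.
salinity = rng.normal(0, 1, 30)
data = np.column_stack([
    salinity + rng.normal(0, 0.3, 30),   # EC
    salinity + rng.normal(0, 0.3, 30),   # TDS
    salinity + rng.normal(0, 0.4, 30),   # hardness
    rng.normal(0, 1, (30, 2)),           # pH, Si (independent here)
])

# PCA on the correlation matrix: loadings and station scores.
Zd = (data - data.mean(0)) / data.std(0)
eigval, eigvec = np.linalg.eigh(np.corrcoef(Zd, rowvar=False))
order = eigval.argsort()[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]
explained = eigval / eigval.sum()
loadings = eigvec * np.sqrt(eigval)      # variable loadings on each factor
pc_scores = Zd @ eigvec                  # station scores used for grouping
```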

  11. Is it time for studying real-life debiasing? Evaluation of the effectiveness of an analogical intervention technique

    PubMed Central

    Aczel, Balazs; Bago, Bence; Szollosi, Aba; Foldes, Andrei; Lukacs, Bence

    2015-01-01

    The aim of this study was to initiate the exploration of debiasing methods applicable in real-life settings for achieving lasting improvement in decision-making competence regarding multiple decision biases. Here, we tested the potential of the analogical encoding method for decision debiasing. The advantage of this method is that it can foster the transfer from learning abstract principles to improving behavioral performance. For the purpose of the study, we devised an analogical debiasing technique for 10 biases (covariation detection, insensitivity to sample size, base rate neglect, regression to the mean, outcome bias, sunk cost fallacy, framing effect, anchoring bias, overconfidence bias, planning fallacy) and assessed the susceptibility of the participants (N = 154) to these biases before and 4 weeks after the training. We also compared the effect of the analogical training to the effect of ‘awareness training’ and a ‘no-training’ control group. Results suggested improved performance of the analogical training group only on tasks where violations of statistical principles are measured. The interpretation of these findings requires further investigation, yet it is possible that analogical training may be most effective for learning abstract concepts, such as statistical principles, which are otherwise difficult to master. The study encourages systematic research on debiasing training and the development of intervention assessment methods to measure the endurance of behavior change in decision debiasing. PMID:26300816

  12. Addressing criticisms of existing predictive bias research: cognitive ability test scores still overpredict African Americans' job performance.

    PubMed

    Berry, Christopher M; Zhao, Peng

    2015-01-01

    Predictive bias studies have generally suggested that cognitive ability test scores overpredict job performance of African Americans, meaning these tests are not predictively biased against African Americans. However, at least 2 issues call into question existing over-/underprediction evidence: (a) a bias identified by Aguinis, Culpepper, and Pierce (2010) in the intercept test typically used to assess over-/underprediction and (b) a focus on the level of observed validity instead of operational validity. The present study developed and utilized a method of assessing over-/underprediction that draws on the math of subgroup regression intercept differences, does not rely on the biased intercept test, allows for analysis at the level of operational validity, and can use meta-analytic estimates as input values. Therefore, existing meta-analytic estimates of key parameters, corrected for relevant statistical artifacts, were used to determine whether African American job performance remains overpredicted at the level of operational validity. African American job performance was typically overpredicted by cognitive ability tests across levels of job complexity and across conditions wherein African American and White regression slopes did and did not differ. Because the present study does not rely on the biased intercept test and because appropriate statistical artifact corrections were carried out, the present study's results are not affected by the 2 issues mentioned above. The present study represents strong evidence that cognitive ability tests generally overpredict job performance of African Americans. (c) 2015 APA, all rights reserved.

  13. Attentional bias and disinhibition toward gaming cues are related to problem gaming in male adolescents.

    PubMed

    van Holst, Ruth J; Lemmens, Jeroen S; Valkenburg, Patti M; Peter, Jochen; Veltman, Dick J; Goudriaan, Anna E

    2012-06-01

    The aim of this study was to examine whether behavioral tendencies commonly related to addictive behaviors are also related to problematic computer and video game playing in adolescents. The study of attentional bias and response inhibition, characteristic for addictive disorders, is relevant to the ongoing discussion on whether problematic gaming should be classified as an addictive disorder. We tested the relation between self-reported levels of problem gaming and two behavioral domains: attentional bias and response inhibition. Ninety-two male adolescents performed two attentional bias tasks (addiction-Stroop, dot-probe) and a behavioral inhibition task (go/no-go). Self-reported problem gaming was measured by the game addiction scale, based on the Diagnostic and Statistical Manual of Mental Disorders-fourth edition criteria for pathological gambling and time spent on computer and/or video games. Male adolescents with higher levels of self-reported problem gaming displayed signs of error-related attentional bias to game cues. Higher levels of problem gaming were also related to more errors on response inhibition, but only when game cues were presented. These findings are in line with the findings of attentional bias reported in clinically recognized addictive disorders, such as substance dependence and pathological gambling, and contribute to the discussion on the proposed concept of "Addiction and Related Disorders" (which may include non-substance-related addictive behaviors) in the Diagnostic and Statistical Manual of Mental Disorders-fourth edition. Copyright © 2012 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.

  14. Publication bias in obesity treatment trials?

    PubMed

    Allison, D B; Faith, M S; Gorman, B S

    1996-10-01

    The present investigation examined the extent of publication bias (namely the tendency to publish significant findings and file away non-significant findings) within the obesity treatment literature. Quantitative literature synthesis of four published meta-analyses from the obesity treatment literature. Interventions in these studies included pharmacological, educational, child, and couples treatments. To assess publication bias, several regression procedures (for example weighted least-squares, random-effects multi-level modeling, and robust regression methods) were used to regress effect sizes onto their standard errors, or proxies thereof, within each of the four meta-analyses. A significant positive beta weight in these analyses signified publication bias. There was evidence for publication bias within two of the four published meta-analyses, such that reviews of published studies were likely to overestimate clinical efficacy. The lack of evidence for publication bias within the two other meta-analyses might have been due to insufficient statistical power rather than the absence of selection bias. As in other disciplines, publication bias appears to exist in the obesity treatment literature. Suggestions are offered for managing publication bias once identified or reducing its likelihood in the first place.
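    The regression procedure described — regressing effect sizes onto their standard errors, with a positive slope signaling bias — can be sketched on simulated data. The literature below is hypothetical: small studies are "published" mostly when their effects look significant, which induces the positive effect-size/SE association the test detects. This is a generic Egger-style weighted least-squares sketch, not the authors' exact models.

```python
import numpy as np

rng = np.random.default_rng(10)
# Simulated literature with selective publication of small studies.
true_effect = 0.3
se = rng.uniform(0.05, 0.5, 400)
eff = rng.normal(true_effect, se)
published = (eff / se > 1.0) | (se < 0.15) | (rng.random(400) < 0.2)
eff, se = eff[published], se[published]

# Weighted LS regression of effect on SE (inverse-variance weights).
X = np.column_stack([np.ones(len(se)), se])
sw = 1 / se                                   # sqrt of weights 1/SE^2
beta, *_ = np.linalg.lstsq(X * sw[:, None], eff * sw, rcond=None)
intercept, slope = beta                       # slope > 0 signals bias
```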

  15. The Weak Spots in Contemporary Science (and How to Fix Them)

    PubMed Central

    2017-01-01

    Simple Summary Several fraud cases, widespread failure to replicate or reproduce seminal findings, and pervasive error in the scientific literature have led to a crisis of confidence in the biomedical, behavioral, and social sciences. In this review, the author discusses some of the core findings that point at weak spots in contemporary science and considers the human factors that underlie them. He delves into the human tendencies that create errors and biases in data collection, analyses, and reporting of research results. He presents several solutions to deal with observer bias, publication bias, the researcher’s tendency to exploit degrees of freedom in their analysis of data, low statistical power, and errors in the reporting of results, with a focus on the specific challenges in animal welfare research. Abstract In this review, the author discusses several of the weak spots in contemporary science, including scientific misconduct, the problems of post hoc hypothesizing (HARKing), outcome switching, theoretical bloopers in formulating research questions and hypotheses, selective reading of the literature, selective citing of previous results, improper blinding and other design failures, p-hacking or researchers’ tendency to analyze data in many different ways to find positive (typically significant) results, errors and biases in the reporting of results, and publication bias. The author presents some empirical results highlighting problems that lower the trustworthiness of reported results in scientific literatures, including that of animal welfare studies. Some of the underlying causes of these biases are discussed based on the notion that researchers are only human and hence are not immune to confirmation bias, hindsight bias, and minor ethical transgressions. 
The author discusses solutions in the form of enhanced transparency, sharing of data and materials, (post-publication) peer review, pre-registration, registered reports, improved training, reporting guidelines, replication, dealing with publication bias, alternative inferential techniques, power, and other statistical tools. PMID:29186879

  16. Effect of Belief Bias on the Development of Undergraduate Students' Reasoning about Inference

    ERIC Educational Resources Information Center

    Kaplan, Jennifer K.

    2009-01-01

    Psychologists have discovered a phenomenon called "Belief Bias" in which subjects rate the strength of arguments based on the believability of the conclusions. This paper reports the results of a small qualitative pilot study of undergraduate students who had previously taken an algebra-based introduction to statistics class. The subjects in this…

  17. Neuroanatomical morphometric characterization of sex differences in youth using statistical learning.

    PubMed

    Sepehrband, Farshid; Lynch, Kirsten M; Cabeen, Ryan P; Gonzalez-Zacarias, Clio; Zhao, Lu; D'Arcy, Mike; Kesselman, Carl; Herting, Megan M; Dinov, Ivo D; Toga, Arthur W; Clark, Kristi A

    2018-05-15

    Exploring neuroanatomical sex differences using a multivariate statistical learning approach can yield insights that cannot be derived with univariate analysis. While gross differences in total brain volume are well-established, uncovering the more subtle, regional sex-related differences in neuroanatomy requires a multivariate approach that can accurately model spatial complexity as well as the interactions between neuroanatomical features. Here, we developed a multivariate statistical learning model using a support vector machine (SVM) classifier to predict sex from MRI-derived regional neuroanatomical features from a single-site study of 967 healthy youth from the Philadelphia Neurodevelopmental Cohort (PNC). Then, we validated the multivariate model on an independent dataset of 682 healthy youth from the multi-site Pediatric Imaging, Neurocognition and Genetics (PING) cohort study. The trained model exhibited an 83% cross-validated prediction accuracy, and correctly predicted the sex of 77% of the subjects from the independent multi-site dataset. Results showed that cortical thickness of the middle occipital lobes and the angular gyri are major predictors of sex. Results also demonstrated the inferential benefits of going beyond classical regression approaches to capture the interactions among brain features in order to better characterize sex differences in male and female youths. We also identified specific cortical morphological measures and parcellation techniques, such as cortical thickness as derived from the Destrieux atlas, that are better able to discriminate between males and females in comparison to other brain atlases (Desikan-Killiany, Brodmann and subcortical atlases). Copyright © 2018 Elsevier Inc. All rights reserved.
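A minimal sketch of the kind of multivariate pipeline the abstract describes: a cross-validated linear SVM predicting a binary label from regional features. The data below are synthetic stand-ins, not the PNC/PING measurements:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Hypothetical stand-in for MRI-derived regional features (e.g., cortical
# thickness per parcel): 200 subjects x 50 regions, with a small mean
# shift between the two classes in a subset of regions.
n, p = 200, 50
y = rng.integers(0, 2, size=n)          # binary label (sex, coded arbitrarily)
X = rng.normal(size=(n, p))
X[:, :10] += 0.6 * y[:, None]           # signal confined to 10 regions

# Linear SVM with feature standardization, scored by cross-validation,
# mirroring the cross-validated prediction accuracy reported above.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.2f}")
```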

  18. Multivariate Statistical Approach Applied to Sediment Source Tracking Through Quantification and Mineral Identification, Cheyenne River, South Dakota

    NASA Astrophysics Data System (ADS)

    Valder, J.; Kenner, S.; Long, A.

    2008-12-01

Portions of the Cheyenne River are characterized as impaired by the U.S. Environmental Protection Agency because of water-quality exceedances. The Cheyenne River watershed includes the Black Hills National Forest and part of the Badlands National Park. Preliminary analysis indicates that the Badlands National Park is a major contributor to the exceedances of the water-quality constituents for total dissolved solids and total suspended solids. Water-quality data have been collected continuously since 2007, and in the second year of collection (2008), monthly grab and passive sediment samplers are being used to collect total suspended sediment and total dissolved solids in both base-flow and runoff-event conditions. In addition, sediment samples from the river channel, including bed, bank, and floodplain, have been collected. These samples are being analyzed at the South Dakota School of Mines and Technology's X-Ray Diffraction Lab to quantify the mineralogy of the sediments. A multivariate statistical approach (including principal components, least squares, and maximum likelihood techniques) is applied to the mineral percentages that were characterized for each site to identify the contributing source areas that are causing exceedances of sediment transport in the Cheyenne River watershed. Results of the multivariate analysis demonstrate the likely sources of solids found in the Cheyenne River samples. A further refinement of the methods is in progress that utilizes a conceptual model which, when applied with the multivariate statistical approach, provides a better estimate for sediment sources.
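The principal-components step of such a multivariate source-tracking approach can be sketched as follows; the mineral names and percentages are hypothetical stand-ins for XRD-derived data, not the Cheyenne River measurements:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Hypothetical mineral-percentage matrix from XRD: rows are sediment
# samples, columns are mineral fractions summing to 100%.
minerals = ["quartz", "feldspar", "clay", "calcite"]
raw = rng.dirichlet(alpha=[4.0, 3.0, 2.0, 1.0], size=30) * 100.0

# Principal components of the compositions; the loadings indicate which
# mineral combinations best separate candidate source areas.
pca = PCA(n_components=2)
scores = pca.fit_transform(raw)
top = minerals[int(np.argmax(np.abs(pca.components_[0])))]
print("explained variance ratios:", np.round(pca.explained_variance_ratio_, 2))
print("PC1 dominated by:", top)
```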

  19. Statistical Learning Is Constrained to Less Abstract Patterns in Complex Sensory Input (but not the Least)

    PubMed Central

    Emberson, Lauren L.; Rubinstein, Dani

    2016-01-01

    The influence of statistical information on behavior (either through learning or adaptation) is quickly becoming foundational to many domains of cognitive psychology and cognitive neuroscience, from language comprehension to visual development. We investigate a central problem impacting these diverse fields: when encountering input with rich statistical information, are there any constraints on learning? This paper examines learning outcomes when adult learners are given statistical information across multiple levels of abstraction simultaneously: from abstract, semantic categories of everyday objects to individual viewpoints on these objects. After revealing statistical learning of abstract, semantic categories with scrambled individual exemplars (Exp. 1), participants viewed pictures where the categories as well as the individual objects predicted picture order (e.g., bird1—dog1, bird2—dog2). Our findings suggest that participants preferentially encode the relationships between the individual objects, even in the presence of statistical regularities linking semantic categories (Exps. 2 and 3). In a final experiment we investigate whether learners are biased towards learning object-level regularities or simply construct the most detailed model given the data (and therefore best able to predict the specifics of the upcoming stimulus) by investigating whether participants preferentially learn from the statistical regularities linking individual snapshots of objects or the relationship between the objects themselves (e.g., bird_picture1— dog_picture1, bird_picture2—dog_picture2). We find that participants fail to learn the relationships between individual snapshots, suggesting a bias towards object-level statistical regularities as opposed to merely constructing the most complete model of the input. This work moves beyond the previous existence proofs that statistical learning is possible at both very high and very low levels of abstraction (categories vs. 
individual objects) and suggests that, at least with the current categories and type of learner, there are biases to pick up on statistical regularities between individual objects even when robust statistical information is present at other levels of abstraction. These findings speak directly to emerging theories about how systems supporting statistical learning and prediction operate in our structure-rich environments. Moreover, the theoretical implications of the current work across multiple domains of study is already clear: statistical learning cannot be assumed to be unconstrained even if statistical learning has previously been established at a given level of abstraction when that information is presented in isolation. PMID:27139779

  20. Bias to CMB lensing reconstruction from temperature anisotropies due to large-scale galaxy motions

    NASA Astrophysics Data System (ADS)

    Ferraro, Simone; Hill, J. Colin

    2018-01-01

    Gravitational lensing of the cosmic microwave background (CMB) is expected to be amongst the most powerful cosmological tools for ongoing and upcoming CMB experiments. In this work, we investigate a bias to CMB lensing reconstruction from temperature anisotropies due to the kinematic Sunyaev-Zel'dovich (kSZ) effect, that is, the Doppler shift of CMB photons induced by Compton scattering off moving electrons. The kSZ signal yields biases due to both its own intrinsic non-Gaussianity and its nonzero cross-correlation with the CMB lensing field (and other fields that trace the large-scale structure). This kSZ-induced bias affects both the CMB lensing autopower spectrum and its cross-correlation with low-redshift tracers. Furthermore, it cannot be removed by multifrequency foreground separation techniques because the kSZ effect preserves the blackbody spectrum of the CMB. While statistically negligible for current data sets, we show that it will be important for upcoming surveys, and failure to account for it can lead to large biases in constraints on neutrino masses or the properties of dark energy. For a stage 4 CMB experiment, the bias can be as large as ≈15 % or 12% in cross-correlation with LSST galaxy lensing convergence or galaxy overdensity maps, respectively, when the maximum temperature multipole used in the reconstruction is ℓmax=4000 , and about half of that when ℓmax=3000 . Similarly, we find that the CMB lensing autopower spectrum can be biased by up to several percent. These biases are many times larger than the expected statistical errors. We validate our analytical predictions with cosmological simulations and present the first complete estimate of secondary-induced CMB lensing biases. The predicted bias is sensitive to the small-scale gas distribution, which is affected by pressure and feedback mechanisms, thus making removal via "bias-hardened" estimators challenging. 
Reducing ℓmax can significantly mitigate the bias at the cost of a decrease in the overall lensing reconstruction signal-to-noise. A bias ≲1 % on large scales requires ℓmax≲2000 , which leads to a reduction in signal-to-noise by a factor of ≈3 - 5 for a stage 4 CMB experiment. Polarization-only reconstruction may be the most robust mitigation strategy.

  1. Motion Direction Biases and Decoding in Human Visual Cortex

    PubMed Central

    Wang, Helena X.; Merriam, Elisha P.; Freeman, Jeremy

    2014-01-01

Functional magnetic resonance imaging (fMRI) studies have relied on multivariate analysis methods to decode visual motion direction from measurements of cortical activity. Above-chance decoding has been commonly used to infer the motion-selective response properties of the underlying neural populations. Moreover, patterns of reliable response biases across voxels that underlie decoding have been interpreted to reflect maps of functional architecture. Using fMRI, we identified a direction-selective response bias in human visual cortex that: (1) predicted motion-decoding accuracy; (2) depended on the shape of the stimulus aperture rather than the absolute direction of motion, such that response amplitudes gradually decreased with distance from the stimulus aperture edge corresponding to motion origin; and (3) was present in V1, V2, V3, but not evident in MT+, explaining the higher motion-decoding accuracies reported previously in early visual cortex. These results demonstrate that fMRI-based motion decoding has little or no dependence on the underlying functional organization of motion selectivity. PMID:25209297

  2. Efficient Determination of Free Energy Landscapes in Multiple Dimensions from Biased Umbrella Sampling Simulations Using Linear Regression.

    PubMed

    Meng, Yilin; Roux, Benoît

    2015-08-11

The weighted histogram analysis method (WHAM) is a standard protocol for postprocessing the information from biased umbrella sampling simulations to construct the potential of mean force with respect to a set of order parameters. By virtue of the WHAM equations, the unbiased density of states is determined by satisfying a self-consistent condition through an iterative procedure. While the method works very effectively when the number of order parameters is small, its computational cost grows rapidly in higher dimensions. Here, we present a simple and efficient alternative strategy, which avoids solving the self-consistent WHAM equations iteratively. An efficient multivariate linear regression framework is utilized to link the biased probability densities of individual umbrella windows and yield an unbiased global free energy landscape in the space of order parameters. It is demonstrated with practical examples that free energy landscapes that are comparable in accuracy to WHAM can be generated at a small fraction of the cost.
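A schematic 1-D illustration of the regression idea, not the authors' actual multivariate implementation: each umbrella window contributes linear equations relating its biased log-density to the unknown free energy values and a per-window offset, so the landscape comes from a single least-squares solve rather than a self-consistent iteration. All numbers below are a toy setup:

```python
import numpy as np

rng = np.random.default_rng(3)
kT = 1.0

# Toy 1-D free energy landscape (double well) sampled through three
# harmonic umbrella windows.
grid = np.linspace(-2.0, 2.0, 41)
F_true = (grid**2 - 1.0) ** 2

def window_bias(x, x0, k=10.0):
    return 0.5 * k * (x - x0) ** 2

centers = [-1.0, 0.0, 1.0]
logp = []  # log biased density per window, on the grid
for x0 in centers:
    w = np.exp(-(F_true + window_bias(grid, x0)) / kT)
    counts = rng.multinomial(5000, w / w.sum())  # finite-sampling noise
    logp.append(np.log(np.clip(counts, 1, None) / counts.sum()))

# Regression step: for window i and bin b,
#   -kT * log p_i(b) - w_i(b) = F(b) - C_i
# is linear in the unknowns F(b) and the per-window offsets C_i.
nb, nw = len(grid), len(centers)
rows, rhs = [], []
for i, x0 in enumerate(centers):
    for b in range(nb):
        row = np.zeros(nb + nw)
        row[b], row[nb + i] = 1.0, -1.0
        rows.append(row)
        rhs.append(-kT * logp[i][b] - window_bias(grid[b], x0))
sol, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(rhs), rcond=None)
F_est = sol[:nb] - sol[:nb].min()  # free energy up to an additive constant
```

A production version would weight each equation by its bin count and drop unvisited bins; treating empty bins as count 1, as this sketch does, is its crudest simplification.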

  3. Efficient Determination of Free Energy Landscapes in Multiple Dimensions from Biased Umbrella Sampling Simulations Using Linear Regression

    PubMed Central

    2015-01-01

    The weighted histogram analysis method (WHAM) is a standard protocol for postprocessing the information from biased umbrella sampling simulations to construct the potential of mean force with respect to a set of order parameters. By virtue of the WHAM equations, the unbiased density of state is determined by satisfying a self-consistent condition through an iterative procedure. While the method works very effectively when the number of order parameters is small, its computational cost grows rapidly in higher dimension. Here, we present a simple and efficient alternative strategy, which avoids solving the self-consistent WHAM equations iteratively. An efficient multivariate linear regression framework is utilized to link the biased probability densities of individual umbrella windows and yield an unbiased global free energy landscape in the space of order parameters. It is demonstrated with practical examples that free energy landscapes that are comparable in accuracy to WHAM can be generated at a small fraction of the cost. PMID:26574437

  4. Testing competing forms of the Milankovitch hypothesis: A multivariate approach

    NASA Astrophysics Data System (ADS)

    Kaufmann, Robert K.; Juselius, Katarina

    2016-02-01

    We test competing forms of the Milankovitch hypothesis by estimating the coefficients and diagnostic statistics for a cointegrated vector autoregressive model that includes 10 climate variables and four exogenous variables for solar insolation. The estimates are consistent with the physical mechanisms postulated to drive glacial cycles. They show that the climate variables are driven partly by solar insolation, determining the timing and magnitude of glaciations and terminations, and partly by internal feedback dynamics, pushing the climate variables away from equilibrium. We argue that the latter is consistent with a weak form of the Milankovitch hypothesis and that it should be restated as follows: internal climate dynamics impose perturbations on glacial cycles that are driven by solar insolation. Our results show that these perturbations are likely caused by slow adjustment between land ice volume and solar insolation. The estimated adjustment dynamics show that solar insolation affects an array of climate variables other than ice volume, each at a unique rate. This implies that previous efforts to test the strong form of the Milankovitch hypothesis by examining the relationship between solar insolation and a single climate variable are likely to suffer from omitted variable bias.

  5. MANCOVA for one way classification with homogeneity of regression coefficient vectors

    NASA Astrophysics Data System (ADS)

    Mokesh Rayalu, G.; Ravisankar, J.; Mythili, G. Y.

    2017-11-01

MANOVA and MANCOVA extend the univariate ANOVA and ANCOVA techniques to multidimensional (vector-valued) observations. In the statistical models underlying these techniques, the Gaussian distributional assumption is replaced by the multivariate Gaussian distribution for the observation vectors and the residual terms. The objective of MANCOVA is to determine whether statistically reliable mean differences exist between groups after adjusting the dependent variables for the covariates. When randomized assignment of samples or subjects to groups is not possible, multivariate analysis of covariance (MANCOVA) provides statistical matching of groups by adjusting the dependent variables as if all subjects had scored the same on the covariates. In this research article, the MANCOVA technique is extended to a larger number of covariates, and the homogeneity of the regression coefficient vectors is also tested.
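As a sketch, a one-way MANCOVA can be fit as a multivariate linear model in which the covariate enters the formula alongside the group factor; the data below are simulated, and the variable names are placeholders:

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(7)

# Hypothetical one-way layout: 3 groups, two correlated outcomes, one covariate.
n = 90
group = np.repeat(["A", "B", "C"], n // 3)
x = rng.normal(size=n)
y1 = 0.8 * x + (group == "B") * 0.5 + rng.normal(size=n)
y2 = 0.5 * x + (group == "C") * 0.7 + rng.normal(size=n)
df = pd.DataFrame({"group": group, "x": x, "y1": y1, "y2": y2})

# MANCOVA as a multivariate linear model: the covariate x adjusts the
# outcome vector (y1, y2) before the group effect is tested.
m = MANOVA.from_formula("y1 + y2 ~ group + x", data=df)
print(m.mv_test())
```

Homogeneity of the regression coefficient vectors across groups could be probed by adding a `group:x` interaction term to the formula and testing it.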

  6. The extent and consequences of p-hacking in science.

    PubMed

    Head, Megan L; Holman, Luke; Lanfear, Rob; Kahn, Andrew T; Jennions, Michael D

    2015-03-01

    A focus on novel, confirmatory, and statistically significant results leads to substantial bias in the scientific literature. One type of bias, known as "p-hacking," occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant. Here, we use text-mining to demonstrate that p-hacking is widespread throughout science. We then illustrate how one can test for p-hacking when performing a meta-analysis and show that, while p-hacking is probably common, its effect seems to be weak relative to the real effect sizes being measured. This result suggests that p-hacking probably does not drastically alter scientific consensuses drawn from meta-analyses.
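A heavily simplified version of one such meta-analytic check compares the counts of reported p-values in two narrow bins just below .05; an excess in the bin nearest the threshold is suggestive of p-hacking. The counts below are hypothetical, and this is only a sketch of the idea, not the authors' text-mining pipeline:

```python
from scipy.stats import binomtest

# Hypothetical counts of reported p-values text-mined from a literature.
# Under a right-skewed (evidential) p-curve with no p-hacking, fewer
# results should fall in (0.045, 0.05] than in (0.04, 0.045].
n_upper = 38   # p in (0.045, 0.05]
n_lower = 30   # p in (0.04, 0.045]

# One-sided binomial test for an excess just under .05.
res = binomtest(n_upper, n_upper + n_lower, p=0.5, alternative="greater")
print(f"p = {res.pvalue:.3f}")
```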

  7. Analytical framework for reconstructing heterogeneous environmental variables from mammal community structure.

    PubMed

    Louys, Julien; Meloro, Carlo; Elton, Sarah; Ditchfield, Peter; Bishop, Laura C

    2015-01-01

    We test the performance of two models that use mammalian communities to reconstruct multivariate palaeoenvironments. While both models exploit the correlation between mammal communities (defined in terms of functional groups) and arboreal heterogeneity, the first uses a multiple multivariate regression of community structure and arboreal heterogeneity, while the second uses a linear regression of the principal components of each ecospace. The success of these methods means the palaeoenvironment of a particular locality can be reconstructed in terms of the proportions of heavy, moderate, light, and absent tree canopy cover. The linear regression is less biased, and more precisely and accurately reconstructs heavy tree canopy cover than the multiple multivariate model. However, the multiple multivariate model performs better than the linear regression for all other canopy cover categories. Both models consistently perform better than randomly generated reconstructions. We apply both models to the palaeocommunity of the Upper Laetolil Beds, Tanzania. Our reconstructions indicate that there was very little heavy tree cover at this site (likely less than 10%), with the palaeo-landscape instead comprising a mixture of light and absent tree cover. These reconstructions help resolve the previous conflicting palaeoecological reconstructions made for this site. Copyright © 2014 Elsevier Ltd. All rights reserved.

  8. The choice of prior distribution for a covariance matrix in multivariate meta-analysis: a simulation study.

    PubMed

    Hurtado Rúa, Sandra M; Mazumdar, Madhu; Strawderman, Robert L

    2015-12-30

Bayesian meta-analysis is an increasingly important component of clinical research, with multivariate meta-analysis a promising tool for studies with multiple endpoints. Model assumptions, including the choice of priors, are crucial aspects of multivariate Bayesian meta-analysis (MBMA) models. In a given model, two different prior distributions can lead to different inferences about a particular parameter. A simulation study was performed in which the impact of families of prior distributions for the covariance matrix of a multivariate normal random effects MBMA model was analyzed. Inferences about effect sizes were not particularly sensitive to prior choice, but the related covariance estimates were. A few families of prior distributions with small relative biases, tight mean squared errors, and close to nominal coverage for the effect size estimates were identified. Our results demonstrate the need for sensitivity analysis and suggest some guidelines for choosing prior distributions in this class of problems. The MBMA models proposed here are illustrated in a small meta-analysis example from the periodontal field and a medium meta-analysis from the study of stroke. Copyright © 2015 John Wiley & Sons, Ltd.

  9. Linear Regression Quantile Mapping (RQM) - A new approach to bias correction with consistent quantile trends

    NASA Astrophysics Data System (ADS)

    Passow, Christian; Donner, Reik

    2017-04-01

Quantile mapping (QM) is an established concept that allows one to correct systematic biases in multiple quantiles of the distribution of a climatic observable. It shows remarkable results in correcting biases in historical simulations against observational data and outperforms simpler correction methods that adjust only the mean or variance. Since it has been shown that bias correction of future predictions or scenario runs with basic QM can result in misleading trends in the projection, adjusted, trend-preserving versions of QM were introduced in the form of detrended quantile mapping (DQM) and quantile delta mapping (QDM) (Cannon, 2015, 2016). Still, all previous versions and applications of QM-based bias correction rely on the assumption of time-independent quantiles over the investigated period, which can be misleading in the context of a changing climate. Here, we propose a novel combination of linear quantile regression (QR) with the classical QM method to introduce a consistent, time-dependent, and trend-preserving approach to bias correction for historical and future projections. Since QR is a regression method, it is possible to estimate quantiles at the same resolution as the given data and to include trends or other dependencies. We demonstrate the performance of the new method of linear regression quantile mapping (RQM) in correcting biases of temperature and precipitation products from historical runs (1959-2005) of the COSMO model in climate mode (CCLM) from the Euro-CORDEX ensemble relative to gridded E-OBS data of the same spatial and temporal resolution. A thorough comparison with established bias correction methods highlights the strengths and potential weaknesses of the new RQM approach. References: A.J. Cannon, S.R. Sorbie, T.Q. Murdock: Bias Correction of GCM Precipitation by Quantile Mapping - How Well Do Methods Preserve Changes in Quantiles and Extremes? Journal of Climate, 28, 6038, 2015. A.J. Cannon: Multivariate Bias Correction of Climate Model Outputs - Matching Marginal Distributions and Inter-variable Dependence Structure. Journal of Climate, 29, 7045, 2016.
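Basic (non-trend-preserving) quantile mapping of the kind the abstract builds on can be sketched in a few lines; the temperature series below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical daily temperatures: a biased model run vs. observations.
obs = rng.normal(12.0, 4.0, size=3000)
model = rng.normal(10.0, 5.0, size=3000)   # cold, over-dispersed bias

def quantile_map(x, model_ref, obs_ref, n_q=100):
    """Basic empirical quantile mapping: map each value of x through the
    model's quantiles onto the observed quantiles (no trend handling)."""
    q = np.linspace(0.0, 1.0, n_q)
    mq = np.quantile(model_ref, q)
    oq = np.quantile(obs_ref, q)
    return np.interp(x, mq, oq)

corrected = quantile_map(model, model, obs)
print(f"mean bias before: {model.mean() - obs.mean():+.2f}, "
      f"after: {corrected.mean() - obs.mean():+.2f}")
```

Applied to a future scenario, this static mapping is exactly what DQM, QDM, and the proposed RQM refine, since the quantiles themselves are assumed time-independent here.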

  10. A new multivariate zero-adjusted Poisson model with applications to biomedicine.

    PubMed

    Liu, Yin; Tian, Guo-Liang; Tang, Man-Lai; Yuen, Kam Chuen

    2018-05-25

Recently, although advances have been made in modeling multivariate count data, existing models have several limitations: (i) the multivariate Poisson log-normal model (Aitchison and Ho, ) cannot be used to fit multivariate count data with excess zero-vectors; (ii) the multivariate zero-inflated Poisson (ZIP) distribution (Li et al., 1999) cannot be used to model zero-truncated/deflated count data and is difficult to apply to high-dimensional cases; (iii) the Type I multivariate zero-adjusted Poisson (ZAP) distribution (Tian et al., 2017) can only model multivariate count data with a special correlation structure in which the correlations between components are all positive or all negative. In this paper, we first introduce a new multivariate ZAP distribution, based on a multivariate Poisson distribution, which allows a more flexible dependency structure between components; that is, some of the correlation coefficients can be positive while others are negative. We then develop its important distributional properties and provide efficient statistical inference methods for the multivariate ZAP model with or without covariates. Two real data examples in biomedicine are used to illustrate the proposed methods. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Noncentral Chi-Square versus Normal Distributions in Describing the Likelihood Ratio Statistic: The Univariate Case and Its Multivariate Implication

    ERIC Educational Resources Information Center

    Yuan, Ke-Hai

    2008-01-01

    In the literature of mean and covariance structure analysis, noncentral chi-square distribution is commonly used to describe the behavior of the likelihood ratio (LR) statistic under alternative hypothesis. Due to the inaccessibility of the rather technical literature for the distribution of the LR statistic, it is widely believed that the…

  12. Some Tests of Randomness with Applications

    DTIC Science & Technology

    1981-02-01

freedom. For further details, the reader is referred to Gnanadesikan (1977, p. 169) wherein other relevant tests are also given, Graphical tests, as...sample from a gamma distribution. J. Am. Statist. Assoc. 71, 480-7. Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate

  13. Statistical polarization in greenhouse gas emissions: Theory and evidence.

    PubMed

    Remuzgo, Lorena; Trueba, Carmen

    2017-11-01

The current debate on climate change is over whether global warming can be limited in order to lessen its impacts. In this sense, evidence of a decrease in the statistical polarization in greenhouse gas (GHG) emissions could encourage countries to establish a stronger multilateral climate change agreement. Based on the interregional and intraregional components of the multivariate generalised entropy measures (Maasoumi, 1986), Gigliarano and Mosler (2009) proposed studying the statistical polarization concept from a multivariate view. In this paper, we apply this approach to study the evolution of this phenomenon in the global distribution of the main GHGs. The empirical analysis has been carried out for the time period 1990-2011, considering an endogenous grouping of countries (Aghevli and Mehran, 1981; Davies and Shorrocks, 1989). Most of the statistical polarization indices showed a slightly increasing pattern that was similar regardless of the number of groups considered. Finally, some policy implications are discussed. Copyright © 2017 Elsevier Ltd. All rights reserved.

  14. Propensity score to detect baseline imbalance in cluster randomized trials: the role of the c-statistic.

    PubMed

    Leyrat, Clémence; Caille, Agnès; Foucher, Yohann; Giraudeau, Bruno

    2016-01-22

Despite randomization, baseline imbalance and confounding bias may occur in cluster randomized trials (CRTs). Covariate imbalance may jeopardize the validity of statistical inferences if it occurs on prognostic factors. Thus, diagnosing such an imbalance is essential so that the statistical analysis can be adjusted if required. We developed a tool based on the c-statistic of the propensity score (PS) model to detect global baseline covariate imbalance in CRTs and assess the risk of confounding bias. We performed a simulation study to assess the performance of the proposed tool and applied this method to analyze the data from 2 published CRTs. The proposed method had good performance for large sample sizes (n = 500 per arm) and when the number of unbalanced covariates was not too small as compared with the total number of baseline covariates (≥40% of unbalanced covariates). We also provide a strategy for pre-selecting the covariates to be included in the PS model to enhance imbalance detection. The proposed tool could be useful in deciding whether covariate adjustment is required before performing statistical analyses of CRTs.
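The diagnostic described above can be sketched as follows: fit a propensity score model of arm membership on the baseline covariates and read off its c-statistic (ROC AUC). The data, sample sizes, and effect sizes below are simulated, not from the cited CRTs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)

# Hypothetical trial: 500 subjects per arm, 5 baseline covariates, with a
# deliberate imbalance on the first two covariates.
n = 500
X0 = rng.normal(size=(n, 5))          # control arm
X1 = rng.normal(size=(n, 5))          # intervention arm
X1[:, :2] += 0.3                      # imbalance on 2 of 5 covariates (40%)
X = np.vstack([X0, X1])
arm = np.r_[np.zeros(n), np.ones(n)]

# Propensity score model: arm ~ baseline covariates. The c-statistic of
# this model measures how separable the arms are at baseline; values
# well above 0.5 flag global covariate imbalance.
ps_model = LogisticRegression(max_iter=1000).fit(X, arm)
c_stat = roc_auc_score(arm, ps_model.predict_proba(X)[:, 1])
print(f"c-statistic: {c_stat:.2f}")
```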

  15. Towards identification of relevant variables in the observed aerosol optical depth bias between MODIS and AERONET observations

    NASA Astrophysics Data System (ADS)

    Malakar, N. K.; Lary, D. J.; Gencaga, D.; Albayrak, A.; Wei, J.

    2013-08-01

Measurements from satellite remote sensing, the Moderate Resolution Imaging Spectroradiometer (MODIS), and the globally distributed Aerosol Robotic Network (AERONET) are compared. Comparison of aerosol optical depth values from the two datasets shows that there are biases between the two data products. In this paper, we present a general framework for identifying the set of variables most responsible for the observed bias, which might be associated with measurement conditions such as the solar and sensor zenith angles, the solar and sensor azimuths, the scattering angles, and the surface reflectivity at the various measured wavelengths. Specifically, we analyzed the Aqua-Land data set and used a machine learning technique, in this case a neural network, to perform multivariate regression between the ground truth and the training data sets. Finally, we used the mutual information between the observed and predicted values as the measure of similarity to identify the most relevant set of variables. The search is a brute-force method, as all possible combinations must be considered; the computation involves substantial number crunching, which we implemented as a job-parallel program.
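A toy version of the described search, under assumed synthetic data: regress the bias on each candidate variable subset with a small neural network, then score each fit by the mutual information between observed and predicted values. The variables and functional form below are placeholders, not the actual MODIS/AERONET predictors:

```python
import numpy as np
from itertools import combinations
from sklearn.neural_network import MLPRegressor
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(9)

# Synthetic stand-in: the AOD bias depends on two of four candidate
# variables (think zenith angle and surface reflectivity).
n = 400
X = rng.normal(size=(n, 4))
bias_obs = np.sin(X[:, 0]) + 0.5 * X[:, 2] + 0.1 * rng.normal(size=n)

# Brute-force search over variable subsets, keeping the subset whose
# predictions share the most mutual information with the observed bias.
best = (None, -np.inf)
for k in (1, 2):
    for subset in combinations(range(4), k):
        cols = list(subset)
        net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=1000,
                           random_state=0).fit(X[:, cols], bias_obs)
        pred = net.predict(X[:, cols])
        mi = mutual_info_regression(pred.reshape(-1, 1), bias_obs,
                                    random_state=0)[0]
        if mi > best[1]:
            best = (subset, mi)
print("most relevant subset:", best[0])
```

The independent per-subset fits are what makes the real search embarrassingly job-parallel, as the abstract notes.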

  16. Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design.

    PubMed

    Lu, Tsui-Shan; Longnecker, Matthew P; Zhou, Haibo

    2017-03-15

An outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme in which one observes the exposure with a probability that depends on the outcome. Well-known examples are the case-control design for a binary response, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for the ODS remain under-developed for multivariate cases. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric, with all the underlying distributions of covariates modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and derive its asymptotic normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample with the same sample size. The multivariate-ODS design, together with the proposed estimator, provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of polychlorinated biphenyl exposure with hearing loss in children born to the Collaborative Perinatal Study. Copyright © 2016 John Wiley & Sons, Ltd.

  17. TATES: Efficient Multivariate Genotype-Phenotype Analysis for Genome-Wide Association Studies

    PubMed Central

    van der Sluis, Sophie; Posthuma, Danielle; Dolan, Conor V.

    2013-01-01

    To date, the genome-wide association study (GWAS) is the primary tool to identify genetic variants that cause phenotypic variation. As GWAS analyses are generally univariate in nature, multivariate phenotypic information is usually reduced to a single composite score. This practice often results in loss of statistical power to detect causal variants. Multivariate genotype–phenotype methods do exist but attain maximal power only in special circumstances. Here, we present a new multivariate method that we refer to as TATES (Trait-based Association Test that uses Extended Simes procedure), inspired by the GATES procedure proposed by Li et al (2011). For each component of a multivariate trait, TATES combines p-values obtained in standard univariate GWAS to acquire one trait-based p-value, while correcting for correlations between components. Extensive simulations, probing a wide variety of genotype–phenotype models, show that TATES's false positive rate is correct, and that TATES's statistical power to detect causal variants explaining 0.5% of the variance can be 2.5–9 times higher than the power of univariate tests based on composite scores and 1.5–2 times higher than the power of the standard MANOVA. Unlike other multivariate methods, TATES detects both genetic variants that are common to multiple phenotypes and genetic variants that are specific to a single phenotype, i.e. TATES provides a more complete view of the genetic architecture of complex traits. As the actual causal genotype–phenotype model is usually unknown and probably phenotypically and genetically complex, TATES, available as an open source program, constitutes a powerful new multivariate strategy that allows researchers to identify novel causal variants, while the complexity of traits is no longer a limiting factor. PMID:23359524
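The p-value combination at the heart of TATES can be sketched in its uncorrected form; the full method replaces the raw counts below with effective numbers of tests estimated from the phenotype correlation matrix, so this plain Simes version is the special case of independent phenotypes:

```python
import numpy as np

def simes(pvals):
    """Simes combination of per-phenotype GWAS p-values for one SNP.
    TATES substitutes effective numbers of tests (derived from the
    phenotype correlations) for m and the ranks; this sketch is the
    uncorrected special case for independent phenotypes."""
    p = np.sort(np.asarray(pvals, dtype=float))
    m = len(p)
    ranks = np.arange(1, m + 1)
    return float(np.min(m * p / ranks))

# One strong phenotype-specific association among four phenotypes
# survives the combination (it is not diluted into a composite score):
print(simes([0.0004, 0.40, 0.55, 0.90]))   # ≈ 0.0016
```

Note how the trait-specific signal drives the combined p-value, matching the abstract's point that TATES detects variants specific to a single phenotype as well as shared ones.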

  18. Chemotherapy Versus Chemoradiation for Node-Positive Bladder Cancer: Practice Patterns and Outcomes from the National Cancer Data Base

    PubMed Central

    Haque, Waqar; Verma, Vivek; Butler, E. Brian; Teh, Bin S.

    2017-01-01

    Background: Management of clinically node-positive bladder cancer (cN+ BC) is poorly defined; national guidelines recommend chemotherapy (CT) alone or chemoradiation (CRT). Objective: Using a large, contemporary dataset, we evaluated national practice patterns and outcomes of CT versus CRT to elucidate the optimal therapy for this patient population. Methods: The National Cancer Data Base (NCDB) was queried (2004–2013) for patients diagnosed with cTanyN1-3M0 BC. Patients were divided into two groups: CT alone or CRT. Statistics included multivariable logistic regression to determine factors predictive of receiving additional radiotherapy, Kaplan-Meier analysis to evaluate overall survival (OS), and Cox proportional hazards modeling to determine variables associated with OS. Propensity score matching was performed to assess the groups in a balanced manner while reducing indication biases. Results: Of 1,783 total patients, 1,388 (77.8%) underwent CT alone and 395 (22.2%) CRT. Although patients receiving CRT tended to be of higher socioeconomic status, they were more likely to be older (p = 0.053), to have higher T stage, N1 (versus N2) disease, and squamous histology, and to be treated at a non-academic center (p < 0.05). Median OS was 19.0 months for patients receiving CRT and 13.8 months for those receiving CT (p < 0.001). On Cox multivariate analysis, receipt of CRT was independently associated with improved survival (p < 0.001). Outcome improvements with CRT persisted on evaluation of propensity-matched populations (p < 0.001). Conclusions: CRT is underutilized in the United States for cN+ BC but is independently associated with improved survival, despite being preferentially administered to a somewhat higher-risk population. PMID:29152552

  19. Estimators for longitudinal latent exposure models: examining measurement model assumptions.

    PubMed

    Sánchez, Brisa N; Kim, Sehee; Sammel, Mary D

    2017-06-15

    Latent variable (LV) models are increasingly being used in environmental epidemiology as a way to summarize multiple environmental exposures and thus minimize statistical concerns that arise in multiple regression. LV models may be especially useful when multivariate exposures are collected repeatedly over time. LV models can accommodate a variety of assumptions but, at the same time, present the user with many choices for model specification particularly in the case of exposure data collected repeatedly over time. For instance, the user could assume conditional independence of observed exposure biomarkers given the latent exposure and, in the case of longitudinal latent exposure variables, time invariance of the measurement model. Choosing which assumptions to relax is not always straightforward. We were motivated by a study of prenatal lead exposure and mental development, where assumptions of the measurement model for the time-changing longitudinal exposure have appreciable impact on (maximum-likelihood) inferences about the health effects of lead exposure. Although we were not particularly interested in characterizing the change of the LV itself, imposing a longitudinal LV structure on the repeated multivariate exposure measures could result in high efficiency gains for the exposure-disease association. We examine the biases of maximum likelihood estimators when assumptions about the measurement model for the longitudinal latent exposure variable are violated. We adapt existing instrumental variable estimators to the case of longitudinal exposures and propose them as an alternative to estimate the health effects of a time-changing latent predictor. We show that instrumental variable estimators remain unbiased for a wide range of data generating models and have advantages in terms of mean squared error. Copyright © 2017 John Wiley & Sons, Ltd.

  20. Radiation Therapy Administration and Survival in Stage I/II Extranodal Marginal Zone B-Cell Lymphoma of Mucosa-Associated Lymphoid Tissue

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Olszewski, Adam J., E-mail: adam_olszewski@brown.edu; Desai, Amrita

    2014-03-01

    Purpose: To determine the factors associated with the use of radiation therapy and associated survival outcomes in early-stage marginal zone lymphoma of the mucosa-associated lymphoid tissue (MALT). Methods and Materials: We extracted data on adult patients with stage I/II MALT lymphoma diagnoses between 1998 and 2010 recorded in the Surveillance, Epidemiology, and End Results (SEER) database. We studied factors associated with radiation therapy administration in a logistic regression model and described the cumulative incidence of lymphoma-related death (LRD) according to receipt of the treatment. The association of radiation therapy with survival was explored in multivariate models with adjustment for immortal time bias. Results: Of the 7774 identified patients, 36% received radiation therapy as part of the initial course of treatment. Older patients; black or Hispanic men; white, Hispanic, and black women; and socioeconomically disadvantaged and underinsured patients had a significantly lower chance of receiving radiation therapy. Radiation therapy administration was associated with a lower chance of LRD in most sites. In cutaneous, ocular, and salivary MALT lymphomas, the 5-year estimate of LRD after radiation therapy was 0%. The association of radiation therapy with overall survival in different lymphoma sites was heterogeneous, and statistically significant in cutaneous (hazard ratio 0.45, P=.009) and ocular (hazard ratio 0.47, P<.0001) locations after multivariate adjustment. Conclusions: Demographic factors are associated with the use of radiation therapy in MALT lymphoma. Clinicians should be sensitive to those disparities because the administration of radiation therapy may be associated with improved survival, particularly in cutaneous and ocular lymphomas.

  1. A New Multivariate Approach in Generating Ensemble Meteorological Forcings for Hydrological Forecasting

    NASA Astrophysics Data System (ADS)

    Khajehei, Sepideh; Moradkhani, Hamid

    2015-04-01

    Hydrologic ensemble forecasts are subject to various sources of uncertainty, including meteorological forcing, initial conditions, model structure, and model parameters. Producing reliable and skillful precipitation ensemble forecasts is one approach to reducing the total uncertainty in hydrological applications. Currently, Numerical Weather Prediction (NWP) models are developing ensemble forecasts for various temporal ranges, but raw products from NWP models are known to be biased in both mean and spread. There is therefore a need for methods able to generate reliable ensemble forecasts for hydrological applications. One common technique is to apply statistical procedures to generate an ensemble forecast from NWP-generated single-value forecasts. The procedure is based on the bivariate probability distribution between the observation and the single-value precipitation forecast. However, one of the assumptions of the current method is that Gaussian distributions fit the marginal distributions of the observed and modeled climate variables. Here, we describe and evaluate a Bayesian approach based on copula functions to develop an ensemble precipitation forecast from the conditional distribution of single-value precipitation forecasts. Copula functions express the multivariate joint distribution of univariate marginal distributions and are presented as an alternative procedure for capturing the uncertainties related to meteorological forcing. Copulas can model the joint distribution of two variables with any level of correlation and dependency. This study is conducted over a sub-basin of the Columbia River Basin in the USA, using monthly precipitation forecasts from the Climate Forecast System (CFS) at 0.5 x 0.5 degree spatial resolution to reproduce the observations. Verification is conducted on a different period, and the procedure is compared with the Ensemble Pre-Processor approach currently used by National Weather Service River Forecast Centers in the USA.
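    The conditional-sampling idea behind such a copula approach can be sketched as follows, assuming a bivariate Gaussian copula with a known correlation `rho` and known marginals (frozen `scipy.stats` distributions); the study's actual Bayesian estimation of these components is not reproduced here.

```python
import numpy as np
from scipy import stats

def conditional_ensemble(fcst_value, rho, fcst_marginal, obs_marginal,
                         n_members, rng):
    # map the single-value forecast to Gaussian space through its marginal CDF
    z_f = stats.norm.ppf(fcst_marginal.cdf(fcst_value))
    # conditional law of z_obs given z_f under a bivariate Gaussian copula
    z_o = rng.normal(rho * z_f, np.sqrt(1.0 - rho**2), n_members)
    # map back through the observation marginal to obtain ensemble members
    return obs_marginal.ppf(stats.norm.cdf(z_o))
```

    With `rho = 0` the ensemble simply resamples the observation climatology; as `rho` approaches 1 the members concentrate around the forecast value.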

  2. A multi-institutional, propensity-score-matched comparison of post-operative outcomes between general anesthesia and monitored anesthesia care with intravenous sedation in umbilical hernia repair.

    PubMed

    Vu, M M; Galiano, R D; Souza, J M; Du Qin, C; Kim, J Y S

    2016-08-01

    Monitored anesthesia care with intravenous sedation (MAC/IV), recently proposed as a good choice for hernia repair, offers faster recovery and better patient satisfaction than general anesthesia; however, the possibility of oversedation and respiratory distress is a widespread concern. There is a paucity of literature examining umbilical hernia repair (UHR) and optimal anesthesia choice, despite its importance in determining clinical outcomes. A retrospective analysis of anesthesia type in UHR was performed in the National Surgical Quality Improvement Program 2005-2013 database. General anesthesia and MAC/IV groups were propensity-score matched (PSM) to reduce treatment selection bias. Surgical complications, medical complications, and post-operative hospital stays exceeding 1 day were the primary outcomes of interest. Pre-operative characteristics and post-operative outcomes were compared between the two anesthesia groups using univariate and multivariate statistics. PSM removed all observed differences between the two groups (p > 0.05 for all tracked pre-operative characteristics). MAC/IV cases required fewer post-operative hospital stays exceeding 1 day (3.5 vs 6.3%, p < 0.001). Univariate analysis showed that the overall complication rate did not differ (1.7 vs 1.8%, p = 0.569); however, MAC/IV cases resulted in fewer incidences of septic shock (<0.1 vs 0.1%, p = 0.016). On multivariate logistic regression, MAC/IV yielded significantly lower odds of overall medical complications (OR = 0.654, p = 0.046). UHR under MAC/IV causes fewer medical complications and reduces post-operative hospital stays compared with general anesthesia. The implications for surgeons and patients are broad, including improved surgical safety, cost-effective care, and patient satisfaction.
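    The propensity-score-matching step described here can be illustrated with a minimal greedy nearest-neighbour matcher; this is a generic sketch, not the study's actual algorithm, and the caliper value is an arbitrary assumption.

```python
import numpy as np

def greedy_match(ps_treated, ps_control, caliper=0.05):
    # 1:1 nearest-neighbour matching without replacement on the propensity
    # score, within a fixed caliper; returns (treated_idx, control_idx) pairs
    available = set(range(len(ps_control)))
    pairs = []
    for i in np.argsort(ps_treated)[::-1]:        # match high scores first
        if not available:
            break
        j = min(available, key=lambda k: abs(ps_control[k] - ps_treated[i]))
        if abs(ps_control[j] - ps_treated[i]) <= caliper:
            pairs.append((int(i), int(j)))
            available.remove(j)
    return pairs
```

    After matching, covariate balance is typically rechecked (as the abstract reports: p > 0.05 for all tracked pre-operative characteristics) before comparing outcomes on the matched pairs.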

  3. The Association Between Immigration Status and Office-based Medical Provider Visits for Cancer Patients in the United States.

    PubMed

    Wang, Yang; Wilson, Fernando A; Chen, Li-Wu

    2017-06-01

    We examined differences in cancer-related office-based provider visits associated with immigration status in the United States. Data from the 2007-2012 Medical Expenditure Panel Survey and National Health Interview Survey included adult patients diagnosed with cancer. Univariate analyses described the distributions of cancer-related office-based provider visits, expenditures, and visit characteristics, as well as demographic, socioeconomic, and health covariates, across immigration groups. We measured the relationships of immigrant status to the number of visits and associated expenditures within the past 12 months, adjusting for age, sex, educational attainment, race/ethnicity, self-reported health status, time since cancer diagnosis, cancer remission status, marital status, poverty status, insurance status, and usual source of care. Finally, we performed sensitivity analyses for the regression results using propensity score matching to adjust for potential selection bias. Noncitizens had about 2 fewer visits in a 12-month period than US-born citizens (4.0 vs 5.9). Total expenditure per patient was higher for US-born citizens than for immigrants, although the difference was not statistically significant. Noncitizens (88.3%) were more likely than US-born citizens (76.6%) to be seen by a medical doctor during a visit. Multivariate regression results showed that noncitizens had a 42% lower number of office-based medical provider visits for cancer care than US-born citizens, after adjusting for all other covariates. There were no significant differences in expenditures across immigration groups. The propensity score matching results were largely consistent with those of the multivariate-adjusted regressions. The results suggest that targeted interventions are needed to reduce disparities in utilization between immigrant and US-born citizen cancer patients.

  4. Novel statistical tools for management of public databases facilitate community-wide replicability and control of false discovery.

    PubMed

    Rosset, Saharon; Aharoni, Ehud; Neuvirth, Hani

    2014-07-01

    Issues of publication bias, lack of replicability, and false discovery have long plagued the genetics community. Proper utilization of public and shared data resources presents an opportunity to ameliorate these problems. We present an approach to public database management that we term Quality Preserving Database (QPD). It enables perpetual use of the database for testing statistical hypotheses while controlling false discovery and avoiding publication bias on the one hand, and maintaining testing power on the other hand. We demonstrate it on a use case of a replication server for GWAS findings, underlining its practical utility. We argue that a shift to using QPD in managing current and future biological databases will significantly enhance the community's ability to make efficient and statistically sound use of the available data resources. © 2014 WILEY PERIODICALS, INC.

  5. Galaxy bias and primordial non-Gaussianity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Assassi, Valentin; Baumann, Daniel; Schmidt, Fabian, E-mail: assassi@ias.edu, E-mail: D.D.Baumann@uva.nl, E-mail: fabians@MPA-Garching.MPG.DE

    2015-12-01

    We present a systematic study of galaxy biasing in the presence of primordial non-Gaussianity. For a large class of non-Gaussian initial conditions, we define a general bias expansion and prove that it is closed under renormalization, thereby showing that the basis of operators in the expansion is complete. We then study the effects of primordial non-Gaussianity on the statistics of galaxies. We show that the equivalence principle enforces a relation between the scale-dependent bias in the galaxy power spectrum and that in the dipolar part of the bispectrum. This provides a powerful consistency check to confirm the primordial origin of any observed scale-dependent bias. Finally, we also discuss the imprints of anisotropic non-Gaussianity as motivated by recent studies of higher-spin fields during inflation.

  6. A re-examination of the effects of biased lineup instructions in eyewitness identification.

    PubMed

    Clark, Steven E

    2005-10-01

    A meta-analytic review of research comparing biased and unbiased instructions in eyewitness identification experiments showed an asymmetry; specifically, that biased instructions led to a large and consistent decrease in accuracy in target-absent lineups, but produced inconsistent results for target-present lineups, with an average effect size near zero (Steblay, 1997). The results for target-present lineups are surprising, and are inconsistent with statistical decision theories (i.e., Green & Swets, 1966). A re-examination of the relevant studies and the meta-analysis of those studies shows clear evidence that correct identification rates do increase with biased lineup instructions, and that biased witnesses make correct identifications at a rate considerably above chance. Implications for theory, as well as police procedure and policy, are discussed.

  7. A re-examination of the effects of biased lineup instructions in eyewitness identification.

    PubMed

    Clark, Steven E

    2005-08-01

    A meta-analytic review of research comparing biased and unbiased instructions in eyewitness identification experiments showed an asymmetry, specifically that biased instructions led to a large and consistent decrease in accuracy in target-absent lineups, but produced inconsistent results for target-present lineups, with an average effect size near zero (N. M. Steblay, 1997). The results for target-present lineups are surprising, and are inconsistent with statistical decision theories (i.e., D. M. Green & J. A. Swets, 1966). A re-examination of the relevant studies and the meta-analysis of those studies shows clear evidence that correct identification rates do increase with biased lineup instructions, and that biased witnesses make correct identifications at a rate considerably above chance. Implications for theory, as well as police procedure and policy, are discussed.

  8. Using Bayesian Models to Assess the Effects of Under-reporting of Cannabis Use on the Association with Birth Defects, National Birth Defects Prevention Study, 1997–2005

    PubMed Central

    van Gelder, Marleen M. H. J.; Donders, A. Rogier T.; Devine, Owen; Roeleveld, Nel; Reefhuis, Jennita

    2015-01-01

    Background Studies on associations between periconceptional cannabis exposure and birth defects have mainly relied on self-reported exposure. Therefore, the results may be biased due to underreporting of the exposure. The aim of this study was to quantify the potential effects of this form of exposure misclassification. Methods Using multivariable logistic regression, we re-analyzed associations between periconceptional cannabis use and 20 specific birth defects using data from the National Birth Defects Prevention Study from 1997–2005 for 13 859 case infants and 6556 control infants. For seven birth defects, we implemented four Bayesian models based on various assumptions concerning the sensitivity of self-reported cannabis use to estimate odds ratios (ORs), adjusted for confounding and underreporting of the exposure. We used information on sensitivity of self-reported cannabis use from the literature for prior assumptions. Results The results unadjusted for underreporting of the exposure showed an association between cannabis use and anencephaly (posterior OR 1.9 [95% credible interval (CRI) 1.1, 3.2]) which persisted after adjustment for potential exposure misclassification. Initially, no statistically significant associations were observed between cannabis use and the other birth defect categories studied. Although adjustment for underreporting did not notably change these effect estimates, cannabis use was associated with esophageal atresia (posterior OR 1.7 [95% CRI 1.0, 2.9]), diaphragmatic hernia (posterior OR 1.8 [95% CRI 1.1, 3.0]) and gastroschisis (posterior OR 1.7 [95% CRI 1.2, 2.3]) after correction for exposure misclassification. Conclusions Underreporting of the exposure may have obscured some cannabis-birth defect associations in previous studies. However, the resulting bias is likely to be limited. PMID:25155701

  9. Using bayesian models to assess the effects of under-reporting of cannabis use on the association with birth defects, national birth defects prevention study, 1997-2005.

    PubMed

    van Gelder, Marleen M H J; Donders, A Rogier T; Devine, Owen; Roeleveld, Nel; Reefhuis, Jennita

    2014-09-01

    Studies on associations between periconceptional cannabis exposure and birth defects have mainly relied on self-reported exposure. Therefore, the results may be biased due to under-reporting of the exposure. The aim of this study was to quantify the potential effects of this form of exposure misclassification. Using multivariable logistic regression, we re-analysed associations between periconceptional cannabis use and 20 specific birth defects using data from the National Birth Defects Prevention Study from 1997-2005 for 13 859 case infants and 6556 control infants. For seven birth defects, we implemented four Bayesian models based on various assumptions concerning the sensitivity of self-reported cannabis use to estimate odds ratios (ORs), adjusted for confounding and under-reporting of the exposure. We used information on sensitivity of self-reported cannabis use from the literature for prior assumptions. The results unadjusted for under-reporting of the exposure showed an association between cannabis use and anencephaly (posterior OR 1.9 [95% credible interval (CRI) 1.1, 3.2]) which persisted after adjustment for potential exposure misclassification. Initially, no statistically significant associations were observed between cannabis use and the other birth defect categories studied. Although adjustment for under-reporting did not notably change these effect estimates, cannabis use was associated with esophageal atresia (posterior OR 1.7 [95% CRI 1.0, 2.9]), diaphragmatic hernia (posterior OR 1.8 [95% CRI 1.1, 3.0]), and gastroschisis (posterior OR 1.7 [95% CRI 1.2, 2.3]) after correction for exposure misclassification. Under-reporting of the exposure may have obscured some cannabis-birth defect associations in previous studies. However, the resulting bias is likely to be limited. © 2014 John Wiley & Sons Ltd.
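    The flavour of such an under-reporting correction can be sketched with a simple Monte Carlo sensitivity analysis (not the paper's actual Bayesian model): assume perfect specificity, draw the sensitivity of self-report from a Beta prior, inflate the observed exposed counts accordingly, and recompute the odds ratio. All parameter values below are illustrative.

```python
import numpy as np

def corrected_or_samples(a, n_cases, b, n_controls,
                         sens_alpha, sens_beta, n_draws, rng):
    # draw sensitivities from a Beta prior; with perfect specificity, the
    # "true" exposed counts are the observed counts divided by sensitivity
    s = rng.beta(sens_alpha, sens_beta, n_draws)
    a_true = np.minimum(a / s, n_cases - 0.5)      # cap to keep odds finite
    b_true = np.minimum(b / s, n_controls - 0.5)
    return (a_true / (n_cases - a_true)) / (b_true / (n_controls - b_true))

# posterior-style summary of the corrected odds ratio (illustrative numbers)
rng = np.random.default_rng(42)
draws = corrected_or_samples(50, 1000, 25, 1000, 20, 5, 10_000, rng)
```

    When the sensitivity prior concentrates near 1, the corrected distribution collapses onto the crude odds ratio, which is a useful sanity check on any such correction.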

  10. In Utero Exposure to Histological Chorioamnionitis Primes the Exometabolomic Profiles of Preterm CD4+ T Lymphocytes.

    PubMed

    Matta, Poojitha; Sherrod, Stacy D; Marasco, Christina C; Moore, Daniel J; McLean, John A; Weitkamp, Joern-Hendrik

    2017-11-01

    Histological chorioamnionitis (HCA) is an intrauterine inflammatory condition that increases the risk for preterm birth, death, and disability because of persistent systemic and localized inflammation. The immunological mechanisms sustaining this response in the preterm newborn remain unclear. We sought to determine the consequences of HCA exposure on the fetal CD4+ T lymphocyte exometabolome. We cultured naive CD4+ T lymphocytes from HCA-positive and -negative preterm infants matched for gestational age, sex, race, prenatal steroid exposure, and delivery mode. We collected conditioned media samples before and after a 6-h in vitro activation of naive CD4+ T lymphocytes with soluble staphylococcal enterotoxin B and anti-CD28. We analyzed samples by ultraperformance liquid chromatography ion mobility-mass spectrometry. We determined the impact of HCA on the CD4+ T lymphocyte exometabolome and identified potential biomarker metabolites by multivariate statistical analyses. We discovered that: 1) CD4+ T lymphocytes exposed to HCA exhibit divergent exometabolomic profiles in both naive and activated states; 2) ∼30% of detected metabolites differentially expressed in response to activation were unique to HCA-positive CD4+ T lymphocytes; 3) metabolic pathways associated with glutathione detoxification and tryptophan degradation were altered in HCA-positive CD4+ T lymphocytes; and 4) flow cytometry and cytokine analyses suggested a Th1-biased immune response in HCA-positive samples. HCA exposure primes the neonatal adaptive immune processes by inducing changes to the exometabolomic profile of fetal CD4+ T lymphocytes. These exometabolomic changes may link HCA exposure to Th1 polarization of the neonatal adaptive immune response. Copyright © 2017 by The American Association of Immunologists, Inc.

  11. Estimation of the uncertainty of a climate model using an ensemble simulation

    NASA Astrophysics Data System (ADS)

    Barth, A.; Mathiot, P.; Goosse, H.

    2012-04-01

    The atmospheric forcings play an important role in the study of the ocean and sea-ice dynamics of the Southern Ocean. Errors in the atmospheric forcings inevitably result in uncertain model results. The sensitivity of the model results to errors in the atmospheric forcings is studied with ensemble simulations using multivariate perturbations of the atmospheric forcing fields. The numerical ocean model used is NEMO-LIM in a global configuration with a horizontal resolution of 2°. NCEP reanalyses are used to provide air temperature and wind data to force the ocean model over the last 50 years. A climatological mean is used to prescribe relative humidity, cloud cover, and precipitation. In a first step, the model results are compared with OSTIA SST and OSI SAF sea-ice concentration for the southern hemisphere. The seasonal behavior of the RMS difference and bias in SST and ice concentration is highlighted, as are the regions with relatively high RMS errors and biases, such as the Antarctic Circumpolar Current and the ice edge. Ensemble simulations are performed to statistically characterize the model error due to uncertainties in the atmospheric forcings; such information is a crucial element for future data assimilation experiments. Ensemble simulations are performed with perturbed air temperature and wind forcings. A Fourier decomposition of the NCEP wind vectors and air temperature for 2007 is used to generate ensemble perturbations. The perturbations are scaled such that the resulting ensemble spread approximately matches the RMS differences between the model and the satellite SST and sea-ice concentration. The ensemble spread and covariance are analyzed at the minimum and maximum sea-ice extent. It is shown that the effects of errors in the atmospheric forcings can extend to several hundred meters in depth near the Antarctic Circumpolar Current.

  12. Affective bias as a rational response to the statistics of rewards and punishments.

    PubMed

    Pulcu, Erdem; Browning, Michael

    2017-10-04

    Affective bias, the tendency to differentially prioritise the processing of negative relative to positive events, is commonly observed in clinical and non-clinical populations. However, why such biases develop is not known. Using a computational framework, we investigated whether affective biases may reflect individuals' estimates of the information content of negative relative to positive events. During a reinforcement learning task, the information content of positive and negative outcomes was manipulated independently by varying the volatility of their occurrence. Human participants altered the learning rates used for the outcomes selectively, preferentially learning from the most informative. This behaviour was associated with activity of the central norepinephrine system, estimated using pupillometry, for loss outcomes. Humans maintain independent estimates of the information content of distinct positive and negative outcomes which may bias their processing of affective events. Normalising affective biases using computationally inspired interventions may represent a novel approach to treatment development.
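    The core idea of outcome-specific learning rates can be sketched with a delta-rule learner whose rate depends on the sign of the prediction error; this is an illustrative simplification of the computational model used in the study, with hypothetical function names.

```python
def track_value(outcomes, alpha_pos, alpha_neg, v0=0.5):
    # delta-rule value learner with separate learning rates for positive
    # and negative prediction errors, mimicking differential prioritisation
    # of positive versus negative events
    v = v0
    trace = []
    for o in outcomes:
        pe = o - v                                   # prediction error
        v += (alpha_pos if pe > 0 else alpha_neg) * pe
        trace.append(v)
    return trace
```

    A learner with a larger `alpha_neg` converges quickly toward losses but slowly toward rewards, producing a negatively biased value estimate: the computational analogue of affective bias described above.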

  13. Affective bias as a rational response to the statistics of rewards and punishments

    PubMed Central

    Pulcu, Erdem

    2017-01-01

    Affective bias, the tendency to differentially prioritise the processing of negative relative to positive events, is commonly observed in clinical and non-clinical populations. However, why such biases develop is not known. Using a computational framework, we investigated whether affective biases may reflect individuals’ estimates of the information content of negative relative to positive events. During a reinforcement learning task, the information content of positive and negative outcomes was manipulated independently by varying the volatility of their occurrence. Human participants altered the learning rates used for the outcomes selectively, preferentially learning from the most informative. This behaviour was associated with activity of the central norepinephrine system, estimated using pupillometry, for loss outcomes. Humans maintain independent estimates of the information content of distinct positive and negative outcomes which may bias their processing of affective events. Normalising affective biases using computationally inspired interventions may represent a novel approach to treatment development. PMID:28976304

  14. Quantitative investigation of inappropriate regression model construction and the importance of medical statistics experts in observational medical research: a cross-sectional study.

    PubMed

    Nojima, Masanori; Tokunaga, Mutsumi; Nagamura, Fumitaka

    2018-05-05

    To investigate under what circumstances inappropriate use of 'multivariate analysis' is likely to occur and to identify the population that needs more support with medical statistics. The frequency of inappropriate regression model construction in multivariate analysis and related factors were investigated in observational medical research publications. The inappropriate algorithm of including only variables that were significant in univariate analysis was estimated to occur in 6.4% of publications (95% CI 4.8% to 8.5%). This was observed in 1.1% of publications with a medical statistics expert (hereinafter 'expert') as first author, in 3.5% if an expert was included as a coauthor, and in 12.2% if no experts were involved. In publications where the number of cases was 50 or less and the study did not include experts, inappropriate algorithm usage was observed at a high proportion of 20.2%. The OR for the involvement of experts for this outcome was 0.28 (95% CI 0.15 to 0.53). A further analysis showed that the involvement of experts and the implementation of unfavourable multivariate analysis are negatively associated at the national level (R = -0.652). Based on the results of this study, the benefit of the participation of medical statistics experts is obvious. Experts should be involved for proper confounding adjustment and interpretation of statistical models. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
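    One reason selecting covariates by univariate significance is inappropriate is that jointly important predictors can look marginally weak (a suppressor effect), so screening would drop them. A small simulated sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)  # corr(x1, x2) ~ 0.9
y = x1 - x2 + 0.5 * rng.normal(size=n)                    # both predictors matter

# univariate screening sees only weak marginal correlations ...
r1 = np.corrcoef(x1, y)[0, 1]
r2 = np.corrcoef(x2, y)[0, 1]

# ... while the joint model recovers the strong opposing effects
X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
```

    Here each marginal correlation is around ±0.15, yet the joint coefficients are close to +1 and -1: a screening rule keyed to univariate strength misrepresents the model that actually generated the data.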

  15. A statistical framework for neuroimaging data analysis based on mutual information estimated via a gaussian copula.

    PubMed

    Ince, Robin A A; Giordano, Bruno L; Kayser, Christoph; Rousselet, Guillaume A; Gross, Joachim; Schyns, Philippe G

    2017-03-01

    We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open-source Matlab and Python code implementing the new methods accompanies this article. Hum Brain Mapp 38:1541-1573, 2017. © 2016 The Authors. Human Brain Mapping published by Wiley Periodicals, Inc.
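    The estimator described can be sketched in a few lines for two one-dimensional variables: rank-transform each variable to standard-normal marginals, then apply the closed-form mutual information of a bivariate Gaussian. This is a simplified rendition; the open-source code accompanying the article is the authoritative implementation.

```python
import numpy as np
from scipy import stats

def copnorm(x):
    # rank-transform to standard-normal marginals (the Gaussian copula step)
    return stats.norm.ppf(stats.rankdata(x) / (len(x) + 1))

def gcmi(x, y):
    # mutual information (in bits) of the Gaussian copula fitted to (x, y);
    # a lower bound on the true MI for continuous 1-D variables
    r = np.corrcoef(copnorm(x), copnorm(y))[0, 1]
    return -0.5 * np.log2(1.0 - r**2)
```

    Because only the ranks enter, the estimate is invariant to any monotone transformation of either variable, which is what makes effect sizes comparable across recording modalities.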

  16. Data-Based Detection of Potential Terrorist Attacks: Statistical and Graphical Methods

    DTIC Science & Technology

    2010-06-01

    …Naren; Vasquez-Robinet, Cecilia; Watkinson, Jonathan: "A General Probabilistic Model of the PCR Process," Applied Mathematics and Computation 182(1…), September 2006. Seminar, Measuring the effect of length-biased sampling, Mathematical Sciences Section, National Security Agency, 19 September 2006. … Committee on National Statistics, 9 February 2007. Invited seminar, Statistical Tests for Bullet Lead Comparisons, Department of Mathematics, Butler…

  17. Multivariate postprocessing techniques for probabilistic hydrological forecasting

    NASA Astrophysics Data System (ADS)

    Hemri, Stephan; Lisniak, Dmytro; Klein, Bastian

    2016-04-01

    Hydrologic ensemble forecasts driven by atmospheric ensemble prediction systems need statistical postprocessing in order to account for systematic errors in terms of both mean and spread. Runoff is an inherently multivariate process with typical events lasting from hours in case of floods to weeks or even months in case of droughts. This calls for multivariate postprocessing techniques that yield well-calibrated forecasts in univariate terms and ensure a realistic temporal dependence structure at the same time. To this end, the univariate ensemble model output statistics (EMOS; Gneiting et al., 2005) postprocessing method is combined with two different copula approaches that ensure multivariate calibration throughout the entire forecast horizon. These approaches comprise ensemble copula coupling (ECC; Schefzik et al., 2013), which preserves the dependence structure of the raw ensemble, and a Gaussian copula approach (GCA; Pinson and Girard, 2012), which estimates the temporal correlations from training observations. Both methods are tested in a case study covering three subcatchments of the river Rhine that represent different sizes and hydrological regimes: the Upper Rhine up to the gauge Maxau, the river Moselle up to the gauge Trier, and the river Lahn up to the gauge Kalkofen. The results indicate that both ECC and GCA are suitable for modelling the temporal dependences of probabilistic hydrologic forecasts (Hemri et al., 2015). References: Gneiting, T., A. E. Raftery, A. H. Westveld, and T. Goldman (2005), Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation, Monthly Weather Review, 133(5), 1098-1118, DOI: 10.1175/MWR2904.1. Hemri, S., D. Lisniak, and B. Klein (2015), Multivariate postprocessing techniques for probabilistic hydrological forecasting, Water Resources Research, 51(9), 7436-7451, DOI: 10.1002/2014WR016473. Pinson, P., and R. Girard (2012), Evaluating the quality of scenarios of short-term wind power generation, Applied Energy, 96, 12-20, DOI: 10.1016/j.apenergy.2011.11.004. Schefzik, R., T. L. Thorarinsdottir, and T. Gneiting (2013), Uncertainty quantification in complex simulation models using ensemble copula coupling, Statistical Science, 28, 616-640, DOI: 10.1214/13-STS443.
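    The ECC step cited above (Schefzik et al., 2013) amounts to reordering the calibrated samples at each lead time so they inherit the rank order of the raw ensemble. A minimal sketch of that reordering, assuming equal raw and calibrated ensemble sizes and no ties; the EMOS fitting that produces the calibrated samples is not reproduced here.

```python
import numpy as np

def ecc_reorder(raw, calibrated):
    """Ensemble copula coupling: at each lead time, reorder the calibrated
    samples so they take the rank order of the raw ensemble.

    raw, calibrated: arrays of shape (m, T) -- m members, T lead times.
    """
    m, T = raw.shape
    out = np.empty_like(calibrated)
    for t in range(T):
        ranks = raw[:, t].argsort().argsort()         # rank of each raw member
        out[:, t] = np.sort(calibrated[:, t])[ranks]  # rank-k member gets k-th smallest value
    return out

rng = np.random.default_rng(1)
raw = np.cumsum(rng.normal(size=(10, 5)), axis=1)  # toy raw ensemble trajectories
cal = rng.normal(loc=2.0, size=(10, 5))            # toy calibrated (EMOS-like) samples
ecc = ecc_reorder(raw, cal)
# Univariate marginals are untouched -- only the ordering across members changes:
print(np.allclose(np.sort(ecc, axis=0), np.sort(cal, axis=0)))  # True
```

    Because each column of the output is a permutation of the calibrated column, univariate calibration is preserved while the raw ensemble's temporal (rank) dependence structure is imposed, which is exactly the trade-off the abstract describes between ECC and the observation-trained GCA.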

  18. Analysis of threats to research validity introduced by audio recording clinic visits: Selection bias, Hawthorne effect, both, or neither?

    PubMed Central

    Henry, Stephen G.; Jerant, Anthony; Iosif, Ana-Maria; Feldman, Mitchell D.; Cipri, Camille; Kravitz, Richard L.

    2015-01-01

    Objective: To identify factors associated with participant consent to record visits, and to estimate effects of recording on patient-clinician interactions. Methods: Secondary analysis of data from a randomized trial studying communication about depression; participants were asked for optional consent to audio record study visits. Multiple logistic regression was used to model the likelihood of patient and clinician consent. Multivariable regression and propensity score analyses were used to estimate effects of audio recording on six dependent variables: discussion of depressive symptoms, preventive health, and depression diagnosis; depression treatment recommendations; visit length; and visit difficulty. Results: Of 867 visits involving 135 primary care clinicians, 39% were recorded. For clinicians, only working in academic settings (P=0.003) and having worked longer at their current practice (P=0.02) were associated with increased likelihood of consent. For patients, white race (P=0.002) and diabetes (P=0.03) were associated with increased likelihood of consent. Neither multivariable regression nor propensity score analyses revealed any significant effects of recording on the variables examined. Conclusion: Few clinician or patient characteristics were significantly associated with consent. Audio recording had no significant effect on any dependent variable. Practice Implications: The benefits of recording clinic visits likely outweigh the risks of bias in this setting. PMID:25837372
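    The propensity-score analysis named in the abstract can be illustrated generically. The study's models and data are not public, so the covariates (`white`, `diabetes`), the consent mechanism, and the outcome below are hypothetical stand-ins; the sketch shows inverse-probability weighting estimating a near-zero recording effect when, by construction, none exists.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 2000
# Hypothetical covariates standing in for patient characteristics
white = rng.integers(0, 2, n)
diabetes = rng.integers(0, 2, n)
X = np.column_stack([white, diabetes])

# Consent to recording depends on covariates (selection into "recorded")
logit = -0.5 + 0.8 * white + 0.5 * diabetes
recorded = rng.random(n) < 1 / (1 + np.exp(-logit))

# Outcome (e.g., visit length in minutes) depends on covariates but NOT on recording
visit_len = 15 + 2 * diabetes + rng.normal(0, 3, n)

# Propensity scores and inverse-probability weights
ps = LogisticRegression().fit(X, recorded).predict_proba(X)[:, 1]
w = np.where(recorded, 1 / ps, 1 / (1 - ps))

# Weighted difference in means estimates the recording effect
effect = (np.average(visit_len[recorded], weights=w[recorded])
          - np.average(visit_len[~recorded], weights=w[~recorded]))
print(round(effect, 2))  # near zero in expectation: no true recording effect
```

    The weighting balances the covariate distributions of recorded and unrecorded visits, which is how the study could separate a Hawthorne-type recording effect from selection into consent.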

  19. Exploring the Structure of Library and Information Science Web Space Based on Multivariate Analysis of Social Tags

    ERIC Educational Resources Information Center

    Joo, Soohyung; Kipp, Margaret E. I.

    2015-01-01

    Introduction: This study examines the structure of Web space in the field of library and information science using multivariate analysis of social tags from the Website, Delicious.com. A few studies have examined mathematical modelling of tags, mainly examining tagging in terms of tripartite graphs, pattern tracing and descriptive statistics. This…

  20. Identification of Differential Item Functioning in Multiple-Group Settings: A Multivariate Outlier Detection Approach

    ERIC Educational Resources Information Center

    Magis, David; De Boeck, Paul

    2011-01-01

    We focus on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is a quite natural case. A robust statistics technique is…
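    The outlier-detection idea in the abstract can be illustrated with one common robust technique. The authors' exact statistic is not given in this excerpt, so the sketch below uses robust Mahalanobis distances from a minimum covariance determinant (MCD) fit, a standard robust choice assumed here, to flag items whose group-wise statistics are outlying; the item coordinates are simulated stand-ins for group-specific difficulty estimates.

```python
import numpy as np
from sklearn.covariance import MinCovDet
from scipy.stats import chi2

rng = np.random.default_rng(3)
# One point per item, one coordinate per examinee group; DIF items are outliers.
n_items, n_groups = 40, 3
items = rng.normal(size=(n_items, n_groups))
items[:3] += np.array([4.0, -4.0, 4.0])  # three DIF items drift between groups

mcd = MinCovDet(random_state=0).fit(items)  # robust location/scatter, unaffected by outliers
d2 = mcd.mahalanobis(items)                 # squared robust Mahalanobis distances
flagged = np.where(d2 > chi2.ppf(0.975, df=n_groups))[0]
print(flagged)  # includes the DIF items 0, 1, 2
```

    Using a robust fit matters because the DIF items themselves would otherwise inflate the covariance estimate and mask their own outlyingness, which is presumably why the authors invoke robust statistics for the multiple-group case.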
