Jiang, Xuejun; Guo, Xu; Zhang, Ning; Wang, Bo
2018-01-01
This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV. PMID:29672555
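A permutation test built on a Hodges-Lehmann location estimate can be sketched as follows. This is a minimal illustration, not the authors' procedure: the statistic (squared norm of the componentwise two-sample Hodges-Lehmann shift, unscaled), the function names, and the simulated data are all assumptions made for the example.

```python
import random
from statistics import median

def hl_shift(x, y):
    """Two-sample Hodges-Lehmann shift: median of all pairwise differences y_j - x_i."""
    return median(yj - xi for xi in x for yj in y)

def statistic(a, b):
    """Assumed unscaled statistic: squared norm of the componentwise HL shift vector."""
    dims = len(a[0])
    return sum(hl_shift([r[k] for r in a], [r[k] for r in b]) ** 2 for k in range(dims))

def permutation_pvalue(a, b, n_perm=500, seed=0):
    """Permutation p-value: reshuffle group labels and recompute the statistic."""
    rng = random.Random(seed)
    observed = statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)

# Hypothetical bivariate data: a location shift of 3 in both coordinates.
rng = random.Random(42)
control = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(15)]
treated = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(15)]
treated[0] = (50.0, -50.0)  # one gross outlier, to illustrate robustness
p_value = permutation_pvalue(control, treated)
```

Because the shift estimate is a median of pairwise differences, the single outlier barely moves the statistic, which is the robustness property the abstract refers to.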
A new test of multivariate nonlinear causality
Bai, Zhidong; Jiang, Dandan; Lv, Zhihui; Wong, Wing-Keung; Zheng, Shurong
2018-01-01
The multivariate nonlinear Granger causality test developed by Bai et al. (2010) (Mathematics and Computers in Simulation. 2010; 81: 5-17) plays an important role in detecting the dynamic interrelationships between two groups of variables. Following the idea of the Hiemstra-Jones (HJ) test proposed by Hiemstra and Jones (1994) (Journal of Finance. 1994; 49(5): 1639-1664), they attempt to establish a central limit theorem (CLT) for their test statistic by applying the asymptotic properties of multivariate U-statistics. However, Bai et al. (2016) (2016; arXiv: 1701.03992) revisit the HJ test and find that the test statistic given by HJ is NOT a function of U-statistics, which implies that neither the CLT proposed by Hiemstra and Jones (1994) nor the one extended by Bai et al. (2010) is valid for statistical inference. In this paper, we re-estimate the probabilities and reestablish the CLT of the new test statistic. Numerical simulation shows that our new estimates are consistent and that our new test has decent size and power. PMID:29304085
A new test of multivariate nonlinear causality.
Bai, Zhidong; Hui, Yongchang; Jiang, Dandan; Lv, Zhihui; Wong, Wing-Keung; Zheng, Shurong
2018-01-01
The multivariate nonlinear Granger causality test developed by Bai et al. (2010) (Mathematics and Computers in Simulation. 2010; 81: 5-17) plays an important role in detecting the dynamic interrelationships between two groups of variables. Following the idea of the Hiemstra-Jones (HJ) test proposed by Hiemstra and Jones (1994) (Journal of Finance. 1994; 49(5): 1639-1664), they attempt to establish a central limit theorem (CLT) for their test statistic by applying the asymptotic properties of multivariate U-statistics. However, Bai et al. (2016) (2016; arXiv: 1701.03992) revisit the HJ test and find that the test statistic given by HJ is NOT a function of U-statistics, which implies that neither the CLT proposed by Hiemstra and Jones (1994) nor the one extended by Bai et al. (2010) is valid for statistical inference. In this paper, we re-estimate the probabilities and reestablish the CLT of the new test statistic. Numerical simulation shows that our new estimates are consistent and that our new test has decent size and power.
Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti
2016-07-01
A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco. Contact: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J.; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T.; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti
2016-01-01
Motivation: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. Results: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco Contacts: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153689
Problems with Multivariate Normality: Can the Multivariate Bootstrap Help?
ERIC Educational Resources Information Center
Thompson, Bruce
Multivariate normality is required for some statistical tests. This paper explores the implications of violating the assumption of multivariate normality and illustrates a graphical procedure for evaluating multivariate normality. The logic for using the multivariate bootstrap is presented. The multivariate bootstrap can be used when distribution…
McArtor, Daniel B.; Lubke, Gitta H.; Bergeman, C. S.
2017-01-01
Person-centered methods are useful for studying individual differences in terms of (dis)similarities between response profiles on multivariate outcomes. Multivariate distance matrix regression (MDMR) tests the significance of associations of response profile (dis)similarities and a set of predictors using permutation tests. This paper extends MDMR by deriving and empirically validating the asymptotic null distribution of its test statistic, and by proposing an effect size for individual outcome variables, which is shown to recover true associations. These extensions alleviate the computational burden of permutation tests currently used in MDMR and render more informative results, thus making MDMR accessible to new research domains. PMID:27738957
McArtor, Daniel B; Lubke, Gitta H; Bergeman, C S
2017-12-01
Person-centered methods are useful for studying individual differences in terms of (dis)similarities between response profiles on multivariate outcomes. Multivariate distance matrix regression (MDMR) tests the significance of associations of response profile (dis)similarities and a set of predictors using permutation tests. This paper extends MDMR by deriving and empirically validating the asymptotic null distribution of its test statistic, and by proposing an effect size for individual outcome variables, which is shown to recover true associations. These extensions alleviate the computational burden of permutation tests currently used in MDMR and render more informative results, thus making MDMR accessible to new research domains.
Applying the multivariate time-rescaling theorem to neural population models
Gerhard, Felipe; Haslinger, Robert; Pipa, Gordon
2011-01-01
Statistical models of neural activity are integral to modern neuroscience. Recently, interest has grown in modeling the spiking activity of populations of simultaneously recorded neurons to study the effects of correlations and functional connectivity on neural information processing. However, any statistical model must be validated by an appropriate goodness-of-fit test. Kolmogorov-Smirnov tests based upon the time-rescaling theorem have proven to be useful for evaluating point-process-based statistical models of single-neuron spike trains. Here we discuss the extension of the time-rescaling theorem to the multivariate (neural population) case. We show that even in the presence of strong correlations between spike trains, models which neglect couplings between neurons can be erroneously passed by the univariate time-rescaling test. We present the multivariate version of the time-rescaling theorem, and provide a practical step-by-step procedure for applying it towards testing the sufficiency of neural population models. Using several simple analytically tractable models and also more complex simulated and real data sets, we demonstrate that important features of the population activity can only be detected using the multivariate extension of the test. PMID:21395436
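The univariate building block mentioned above (a Kolmogorov-Smirnov check based on time rescaling) can be sketched as follows. This is only an illustration under simplifying assumptions: a single simulated spike train, a constant-rate candidate model, and the asymptotic 5% KS critical value 1.36/sqrt(n). It does not implement the multivariate extension described in the abstract, and all names are hypothetical.

```python
import math
import random

def ks_statistic_uniform(z):
    """KS distance between the empirical CDF of z and the Uniform(0, 1) CDF."""
    z = sorted(z)
    n = len(z)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(z))

def rescaled_uniforms(spike_times, rate):
    """Time rescaling under a constant-rate model.

    tau_k = rate * (t_k - t_{k-1}) should be Exp(1) if the model is correct,
    so z_k = 1 - exp(-tau_k) should be Uniform(0, 1).
    """
    prev = 0.0
    out = []
    for t in spike_times:
        out.append(1.0 - math.exp(-rate * (t - prev)))
        prev = t
    return out

# Simulate a homogeneous Poisson spike train with rate 5 (assumed for the demo).
rng = random.Random(1)
t, spikes = 0.0, []
for _ in range(400):
    t += rng.expovariate(5.0)
    spikes.append(t)

crit = 1.36 / math.sqrt(len(spikes))  # asymptotic 5% critical value
d_good = ks_statistic_uniform(rescaled_uniforms(spikes, 5.0))   # correct model
d_bad = ks_statistic_uniform(rescaled_uniforms(spikes, 10.0))   # misspecified rate
```

A model is flagged as inadequate when its KS distance exceeds `crit`; here the misspecified rate gives a distance near 0.25, far above the threshold.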
Statistical analysis of multivariate atmospheric variables. [cloud cover
NASA Technical Reports Server (NTRS)
Tubbs, J. D.
1979-01-01
Topics covered include: (1) estimation in discrete multivariate distributions; (2) a procedure to predict cloud cover frequencies in the bivariate case; (3) a program to compute conditional bivariate normal parameters; (4) the transformation of nonnormal multivariate to near-normal; (5) test of fit for the extreme value distribution based upon the generalized minimum chi-square; (6) test of fit for continuous distributions based upon the generalized minimum chi-square; (7) effect of correlated observations on confidence sets based upon chi-square statistics; and (8) generation of random variates from specified distributions.
A Statistical Discrimination Experiment for Eurasian Events Using a Twenty-Seven-Station Network
1980-07-08
to test the effectiveness of a multivariate method of analysis for distinguishing earthquakes from explosions. The data base for the experiment...the weight assigned to each variable whenever a new one is added. Jennrich, R. I. (1977). Stepwise discriminant analysis, in Statistical Methods for
Taylor, Sandra L; Ruhaak, L Renee; Weiss, Robert H; Kelly, Karen; Kim, Kyoungmi
2017-01-01
High-throughput mass spectrometry (MS) is now being used to profile small molecular compounds across multiple biological sample types from the same subjects with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data and imputation can impact between-biospecimen correlation and multivariate analysis results. We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, we showed that multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds are differentially regulated in multiple biospecimens, but univariate methods can be more powerful if compounds are differentially regulated in only one biospecimen. We provide R functions to implement and illustrate our method as supplementary information. Contact: sltaylor@ucdavis.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Kilborn, Joshua P; Jones, David L; Peebles, Ernst B; Naar, David F
2017-04-01
Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis testing-based approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine the performance variability based on the algorithm's assumptions or any underlying data structures. Here, we use simulation studies to estimate the statistical error rates for the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA) to estimate the proficiency of clustering with DISPROF as a decision criterion. We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, and grouped data with increasing overlap, overdispersion for ecological data, and correlation among descriptors within groups. Using simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference grouping partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls during the application of methods and the interpretation of results.
Multivariate assessment of event-related potentials with the t-CWT method.
Bostanov, Vladimir
2015-11-05
Event-related brain potentials (ERPs) are usually assessed with univariate statistical tests although they are essentially multivariate objects. Brain-computer interface applications are a notable exception to this practice, because they are based on multivariate classification of single-trial ERPs. Multivariate ERP assessment can be facilitated by feature extraction methods. One such method is t-CWT, a mathematical-statistical algorithm based on the continuous wavelet transform (CWT) and Student's t-test. This article begins with a geometric primer on some basic concepts of multivariate statistics as applied to ERP assessment in general and to the t-CWT method in particular. Further, it presents for the first time a detailed, step-by-step, formal mathematical description of the t-CWT algorithm. A new multivariate outlier rejection procedure based on principal component analysis in the frequency domain is presented as an important pre-processing step. The MATLAB and GNU Octave implementation of t-CWT is also made publicly available for the first time as free and open source code. The method is demonstrated on some example ERP data obtained in a passive oddball paradigm. Finally, some conceptually novel applications of the multivariate approach in general and of the t-CWT method in particular are suggested and discussed. Hopefully, the publication of both the t-CWT source code and its underlying mathematical algorithm along with a didactic geometric introduction to some basic concepts of multivariate statistics would make t-CWT more accessible to both users and developers in the field of neuroscience research.
Williams, L. Keoki; Buu, Anne
2017-01-01
We propose a multivariate genome-wide association test for mixed continuous, binary, and ordinal phenotypes. A latent response model is used to estimate the correlation between phenotypes with different measurement scales so that the empirical distribution of the Fisher’s combination statistic under the null hypothesis is estimated efficiently. The simulation study shows that our proposed correlation estimation methods have high levels of accuracy. More importantly, our approach conservatively estimates the variance of the test statistic so that the type I error rate is controlled. The simulation also shows that the proposed test maintains the power at the level very close to that of the ideal analysis based on known latent phenotypes while controlling the type I error. In contrast, conventional approaches (dichotomizing all observed phenotypes or treating them as continuous variables) could either reduce the power or employ a linear regression model unfit for the data. Furthermore, the statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that conducting a multivariate test on multiple phenotypes can increase the power of identifying markers that might not otherwise be chosen using marginal tests. The proposed method also offers a new approach to analyzing the Fagerström Test for Nicotine Dependence as multivariate phenotypes in genome-wide association studies. PMID:28081206
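For orientation, the Fisher combination statistic itself is T = -2 * sum(ln p_i) over the k marginal p-values. The sketch below computes T and its reference p-value under the textbook assumption that the k tests are independent, in which case T follows a chi-square distribution with 2k degrees of freedom. That independence assumption is exactly what fails for correlated phenotypes, which is why the abstract estimates the null distribution instead; the function names here are hypothetical.

```python
import math

def fisher_statistic(pvalues):
    """Fisher combination statistic T = -2 * sum(log p_i)."""
    return -2.0 * sum(math.log(p) for p in pvalues)

def chi2_sf_even_df(x, k):
    """P(chi-square with 2k df > x), using the closed form valid for even df."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))

def fisher_combined_pvalue(pvalues):
    """Combined p-value ASSUMING the marginal tests are independent."""
    return chi2_sf_even_df(fisher_statistic(pvalues), len(pvalues))

# Two marginal p-values of 0.05 combine to roughly 0.0175 under independence.
combined = fisher_combined_pvalue([0.05, 0.05])
```

With positively correlated phenotypes this independent-case p-value is typically too small, so it should be read as a baseline rather than as the method proposed in the paper.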
The Effect of the Multivariate Box-Cox Transformation on the Power of MANOVA.
ERIC Educational Resources Information Center
Kirisci, Levent; Hsu, Tse-Chi
Most of the multivariate statistical techniques rely on the assumption of multivariate normality. The effects of non-normality on multivariate tests are assumed to be negligible when variance-covariance matrices and sample sizes are equal. Therefore, in practice, investigators do not usually attempt to remove non-normality. In this simulation…
1981-08-01
RATIO TEST STATISTIC FOR SPHERICITY OF COMPLEX MULTIVARIATE NORMAL DISTRIBUTION* C. Fang, P. R. Krishnaiah, B. N. Nagarsenker** August 1981 Technical...and their applications in time series, the reader is referred to Krishnaiah (1976). Motivated by the applications in the area of inference on multiple...for practical purposes. Here, we note that Krishnaiah, Lee and Chang (1976) approximated the null distribution of certain power of the likeli
A multivariate model and statistical method for validating tree grade lumber yield equations
Donald W. Seegrist
1975-01-01
Lumber yields within lumber grades can be described by a multivariate linear model. A method for validating lumber yield prediction equations when there are several tree grades is presented. The method is based on multivariate simultaneous test procedures.
NONPARAMETRIC MANOVA APPROACHES FOR NON-NORMAL MULTIVARIATE OUTCOMES WITH MISSING VALUES
He, Fanyin; Mazumdar, Sati; Tang, Gong; Bhatia, Triptish; Anderson, Stewart J.; Dew, Mary Amanda; Krafty, Robert; Nimgaonkar, Vishwajit; Deshpande, Smita; Hall, Martica; Reynolds, Charles F.
2017-01-01
Between-group comparisons often entail many correlated response variables. The multivariate linear model, with its assumption of multivariate normality, is the accepted standard tool for these tests. When this assumption is violated, the nonparametric multivariate Kruskal-Wallis (MKW) test is frequently used. However, this test requires complete cases with no missing values in response variables. Deletion of cases with missing values likely leads to inefficient statistical inference. Here we extend the MKW test to retain information from partially-observed cases. Results of simulated studies and analysis of real data show that the proposed method provides adequate coverage and superior power to complete-case analyses. PMID:29416225
Some Tests of Randomness with Applications
1981-02-01
freedom. For further details, the reader is referred to Gnanadesikan (1977, p. 169) wherein other relevant tests are also given. Graphical tests, as...sample from a gamma distribution. J. Am. Statist. Assoc. 71, 480-7. Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate
Interpreting support vector machine models for multivariate group wise analysis in neuroimaging
Gaonkar, Bilwaj; Shinohara, Russell T; Davatzikos, Christos
2015-01-01
Machine learning based classification algorithms like support vector machines (SVMs) have shown great promise for turning high-dimensional neuroimaging data into clinically useful decision criteria. However, tracing imaging based patterns that contribute significantly to classifier decisions remains an open problem. This is an issue of critical importance in imaging studies seeking to determine which anatomical or physiological imaging features contribute to the classifier’s decision, thereby allowing users to critically evaluate the findings of such machine learning methods and to understand disease mechanisms. The majority of published work addresses the question of statistical inference for support vector classification using permutation tests based on SVM weight vectors. Such permutation testing ignores the SVM margin, which is critical in SVM theory. In this work we emphasize the use of a statistic that explicitly accounts for the SVM margin and show that the null distributions associated with this statistic are asymptotically normal. Further, our experiments show that this statistic is much less conservative than weight based permutation tests and yet specific enough to tease out multivariate patterns in the data. Thus, we can better understand the multivariate patterns that the SVM uses for neuroimaging based classification. PMID:26210913
Multiple Hypothesis Testing for Experimental Gingivitis Based on Wilcoxon Signed Rank Statistics
Preisser, John S.; Sen, Pranab K.; Offenbacher, Steven
2011-01-01
Dental research often involves repeated multivariate outcomes on a small number of subjects, for which there is interest in identifying outcomes that exhibit change in their levels over time and in characterizing the nature of that change. In particular, periodontal research often involves the analysis of molecular mediators of inflammation for which multivariate parametric methods are highly sensitive to outliers and deviations from Gaussian assumptions. In such settings, nonparametric methods may be favored over parametric ones. Additionally, there is a need for statistical methods that control an overall error rate for multiple hypothesis testing. We review univariate and multivariate nonparametric hypothesis tests and apply them to longitudinal data to assess changes over time in 31 biomarkers measured from the gingival crevicular fluid in 22 subjects in whom gingivitis was induced by temporarily withholding tooth brushing. To identify biomarkers that can be induced to change, multivariate Wilcoxon signed rank tests for a set of four summary measures based upon area under the curve are applied for each biomarker and compared to their univariate counterparts. Multiple hypothesis testing methods with choice of control of the false discovery rate or strong control of the family-wise error rate are examined. PMID:21984957
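The two kinds of error-rate control mentioned above can be illustrated with two standard procedures: the Benjamini-Hochberg step-up rule for the false discovery rate and Holm's step-down rule for strong control of the family-wise error rate. The abstract does not say which procedures the authors used, so these are stand-ins chosen for illustration, and the p-values are made up.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Step-up FDR control: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= k * q / m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected

def holm(pvalues, alpha=0.05):
    """Step-down FWER control: compare the j-th smallest p-value with
    alpha / (m - j + 1) and stop at the first failure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = [False] * m
    for step, i in enumerate(order):
        if pvalues[i] > alpha / (m - step):
            break
        rejected[i] = True
    return rejected

# Hypothetical p-values for eight biomarkers.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
bh_rejections = benjamini_hochberg(p)
holm_rejections = holm(p)
```

On these values Benjamini-Hochberg rejects the two smallest p-values while Holm rejects only the smallest, which reflects the usual trade-off: family-wise control is stricter and therefore less powerful.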
Spatial Dynamics and Determinants of County-Level Education Expenditure in China
ERIC Educational Resources Information Center
Gu, Jiafeng
2012-01-01
In this paper, a multivariate spatial autoregressive model of local public education expenditure determination with autoregressive disturbance is developed and estimated. The existence of spatial interdependence is tested using Moran's I statistic and Lagrange multiplier test statistics for both the spatial error and spatial lag models. The full…
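As a reminder of what Moran's I measures, the statistic is I = (n / S0) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2, where w is a spatial weight matrix and S0 is the sum of all weights. The sketch below computes it for a toy example; the four-unit chain with binary adjacency weights is an assumption for illustration and has nothing to do with the county data in the paper.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)

# Four units arranged in a line; neighbours share an edge (binary weights).
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
i_trend = morans_i([1, 2, 3, 4], W)          # smooth gradient: positive autocorrelation
i_alternating = morans_i([1, -1, 1, -1], W)  # checkerboard: negative autocorrelation
```

Values near zero suggest no spatial autocorrelation; significance is usually judged against a permutation or normal-approximation reference distribution, which this sketch omits.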
Yang, James J; Williams, L Keoki; Buu, Anne
2017-08-24
A multivariate genome-wide association test is proposed for analyzing data on multivariate quantitative phenotypes collected from related subjects. The proposed method is a two-step approach. The first step models the association between the genotype and marginal phenotype using a linear mixed model. The second step uses the correlation between residuals of the linear mixed model to estimate the null distribution of the Fisher combination test statistic. The simulation results show that the proposed method controls the type I error rate and is more powerful than the marginal tests across different population structures (admixed or non-admixed) and relatedness (related or independent). The statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that applying the multivariate association test may facilitate identification of the pleiotropic genes contributing to the risk for alcohol dependence commonly expressed by four correlated phenotypes. This study proposes a multivariate method for identifying pleiotropic genes while adjusting for cryptic relatedness and population structure between subjects. The two-step approach is not only powerful but also computationally efficient even when the number of subjects and the number of phenotypes are both very large.
Zhi, Ruicong; Zhao, Lei; Xie, Nan; Wang, Houyin; Shi, Bolin; Shi, Jingye
2016-01-13
A framework for establishing a standard reference scale for texture is proposed, based on multivariate statistical analysis of instrumental measurements and sensory evaluations. Multivariate statistical analysis is used to rapidly select typical reference samples with the characteristics of universality, representativeness, stability, substitutability, and traceability. The reasonableness of the framework is verified by establishing a standard reference scale for a texture attribute (hardness) with well-known Chinese foods. More than 100 food products in 16 categories were tested using instrumental measurement (TPA test), and the results were analyzed with cluster analysis, principal component analysis, relative standard deviation, and analysis of variance. As a result, nine kinds of foods were selected to construct the hardness standard reference scale. The results indicate that the regression coefficient between the estimated sensory value and the instrumentally measured value is significant (R² = 0.9765), which fits well with Stevens's theory. The research provides a reliable theoretical basis and practical guide for establishing quantitative standard reference scales for food texture characteristics.
ERIC Educational Resources Information Center
Saw, J. G.
This paper deals with some tests of hypothesis frequently encountered in the analysis of multivariate data. The type of hypothesis considered is that which the statistician can answer in the negative or affirmative. The Doolittle method makes it possible to evaluate the determinant of a matrix of high order, to solve a matrix equation, or to…
Gordon, Derek; Londono, Douglas; Patel, Payal; Kim, Wonkuk; Finch, Stephen J; Heiman, Gary A
2016-01-01
Our motivation here is to calculate the power of 3 statistical tests used when there are genetic traits that operate under a pleiotropic mode of inheritance and when qualitative phenotypes are defined by use of thresholds for the multiple quantitative phenotypes. Specifically, we formulate a multivariate function that provides the probability that an individual has a vector of specific quantitative trait values conditional on having a risk locus genotype, and we apply thresholds to define qualitative phenotypes (affected, unaffected) and compute penetrances and conditional genotype frequencies based on the multivariate function. We extend the analytic power and minimum-sample-size-necessary (MSSN) formulas for 2 categorical data-based tests (genotype, linear trend test [LTT]) of genetic association to the pleiotropic model. We further compare the MSSN of the genotype test and the LTT with that of a multivariate ANOVA (Pillai). We approximate the MSSN for statistics by linear models using a factorial design and ANOVA. With ANOVA decomposition, we determine which factors most significantly change the power/MSSN for all statistics. Finally, we determine which test statistics have the smallest MSSN. In this work, MSSN calculations are for 2 traits (bivariate distributions) only (for illustrative purposes). We note that the calculations may be extended to address any number of traits. Our key findings are that the genotype test usually has lower MSSN requirements than the LTT. More inclusive thresholds (top/bottom 25% vs. top/bottom 10%) have higher sample size requirements. The Pillai test has a much larger MSSN than both the genotype test and the LTT, as a result of sample selection. With these formulas, researchers can specify how many subjects they must collect to localize genes for pleiotropic phenotypes. © 2017 S. Karger AG, Basel.
Testing for significance of phase synchronisation dynamics in the EEG.
Daly, Ian; Sweeney-Reed, Catherine M; Nasuto, Slawomir J
2013-06-01
A number of tests exist to check for statistical significance of phase synchronisation within the Electroencephalogram (EEG); however, the majority suffer from a lack of generality and applicability. They may also fail to account for temporal dynamics in the phase synchronisation, regarding synchronisation as a constant state instead of a dynamical process. Therefore, a novel test is developed for identifying the statistical significance of phase synchronisation based upon a combination of work characterising temporal dynamics of multivariate time-series and Markov modelling. We show how this method is better able to assess the significance of phase synchronisation than a range of commonly used significance tests. We also show how the method may be applied to identify and classify significantly different phase synchronisation dynamics in both univariate and multivariate datasets.
Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Alejandro Q; Musolf, Anthony; Matise, Tara C; Finch, Stephen J; Gordon, Derek
2012-01-01
As with any new technology, next-generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to those data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs.
Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have lower power than the corresponding single-variant simulation results, most probably due to our specification of multi-variant SNP correlation values. In conclusion, our LTTae,NGS addresses two key challenges with NGS disease studies; first, it allows for differential misclassification when computing the statistic; and second, it addresses the multiple-testing issue in that there is a multi-variant form of the statistic that has only one degree of freedom, and provides a single p value, no matter how many loci. Copyright © 2013 S. Karger AG, Basel.
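For orientation, the classical linear trend (Cochran-Armitage) test that the abstract extends can be sketched as follows. This is only the error-free baseline, not the LTTae,NGS statistic itself; the function name, the additive weights (0, 1, 2), and the example counts are assumptions made for illustration.

```python
def linear_trend_chi2(cases, controls, weights=(0, 1, 2)):
    """Classical Cochran-Armitage trend statistic (1 degree of freedom).

    cases[i] and controls[i] are counts in genotype category i;
    weights[i] is the score assigned to that category.
    """
    if not (len(cases) == len(controls) == len(weights)):
        raise ValueError("cases, controls and weights must have equal length")
    n_cases, n_controls = sum(cases), sum(controls)
    n_total = n_cases + n_controls
    column_totals = [r + s for r, s in zip(cases, controls)]
    # Trend numerator: weighted contrast between case and control counts.
    t_stat = sum(w * (r * n_controls - s * n_cases)
                 for w, r, s in zip(weights, cases, controls))
    sum_wn = sum(w * n for w, n in zip(weights, column_totals))
    sum_w2n = sum(w * w * n for w, n in zip(weights, column_totals))
    # Variance of the numerator under the null hypothesis of no association.
    variance = (n_cases * n_controls / n_total) * (n_total * sum_w2n - sum_wn ** 2)
    if variance <= 0:
        raise ValueError("degenerate table: variance is not positive")
    return t_stat ** 2 / variance

# Hypothetical genotype counts (0, 1, or 2 copies of the variant).
print(linear_trend_chi2(cases=[40, 45, 15], controls=[60, 35, 5]))
```

The result is referred to a chi-square distribution with one degree of freedom, which matches the single-degree-of-freedom property the abstract highlights.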
Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Andrew; Musolf, Anthony; Matise, Tara C.; Finch, Stephen J.; Gordon, Derek
2013-01-01
As with any new technology, next generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model, based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to those data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. 
Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have lower power than the corresponding single variant simulation results, most probably due to our specification of multi-variant SNP correlation values. In conclusion, our LTTae,NGS addresses two key challenges with NGS disease studies; first, it allows for differential misclassification when computing the statistic; and second, it addresses the multiple-testing issue in that there is a multi-variant form of the statistic that has only one degree of freedom, and provides a single p-value, no matter how many loci. PMID:23594495
MGAS: a powerful tool for multivariate gene-based genome-wide association analysis.
Van der Sluis, Sophie; Dolan, Conor V; Li, Jiang; Song, Youqiang; Sham, Pak; Posthuma, Danielle; Li, Miao-Xin
2015-04-01
Standard genome-wide association studies, testing the association between one phenotype and a large number of single nucleotide polymorphisms (SNPs), are limited in two ways: (i) traits are often multivariate, and analysis of composite scores entails loss in statistical power and (ii) gene-based analyses may be preferred, e.g. to decrease the multiple testing problem. Here we present a new method, multivariate gene-based association test by extended Simes procedure (MGAS), that allows gene-based testing of multivariate phenotypes in unrelated individuals. Through extensive simulation, we show that under most trait-generating genotype-phenotype models MGAS has superior statistical power to detect associated genes compared with gene-based analyses of univariate phenotypic composite scores (i.e. GATES, multiple regression), and multivariate analysis of variance (MANOVA). Re-analysis of metabolic data revealed 32 False Discovery Rate controlled genome-wide significant genes, and 12 regions harboring multiple genes; of these 44 regions, 30 were not reported in the original analysis. MGAS allows researchers to conduct their multivariate gene-based analyses efficiently, and without the loss of power that is often associated with incorrectly specified genotype-phenotype models. MGAS is freely available in KGG v3.0 (http://statgenpro.psychiatry.hku.hk/limx/kgg/download.php). Access to the metabolic dataset can be requested at dbGaP (https://dbgap.ncbi.nlm.nih.gov/). The R-simulation code is available from http://ctglab.nl/people/sophie_van_der_sluis. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
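As background for the "extended Simes procedure" named above, the classical Simes combination of m p-values is min over i of m * p(i) / i, where p(i) is the i-th smallest p-value. The sketch below implements only that classical form. The extension used by MGAS (which, per the abstract, handles multiple SNPs and multiple phenotypes) is not reproduced here, and the function name and example values are hypothetical.

```python
def simes_p(p_values):
    """Classical Simes combined p-value for a set of individual p-values."""
    m = len(p_values)
    if m == 0:
        raise ValueError("at least one p-value is required")
    ordered = sorted(p_values)
    # For the i-th smallest p-value, scale by m / i and keep the minimum.
    combined = min(m * p / rank for rank, p in enumerate(ordered, start=1))
    return min(1.0, combined)

# Hypothetical SNP-level p-values within one gene.
print(simes_p([0.01, 0.04, 0.03]))
```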
Cardot, J-M; Roudier, B; Schütz, H
2017-07-01
The f2 test is generally used for comparing dissolution profiles. In cases of high variability, the f2 test is not applicable, and the Multivariate Statistical Distance (MSD) test is frequently proposed as an alternative by the FDA and EMA. The guidelines provide only general recommendations. MSD tests can be performed either on raw data, with or without time as a variable, or on parameters of models. In addition, data can be limited, as in the case of the f2 test, to dissolutions of up to 85%, or can include all available data. In the context of the present paper, the recommended calculation included all raw dissolution data up to the first point greater than 85% as variables, without the various times as parameters. The proposed MSD overcomes several drawbacks found in other methods.
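For readers unfamiliar with the f2 similarity factor, it is commonly defined as f2 = 50 * log10(100 / sqrt(1 + mean squared difference)), computed on mean percent-dissolved values at matched time points. The sketch below assumes that standard definition; the function name and the two profiles are illustrative and are not taken from the paper.

```python
import math

def f2_similarity(reference, test):
    """Similarity factor f2 for two mean dissolution profiles (percent dissolved)."""
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must be non-empty and of equal length")
    # Mean squared difference between the profiles at matched time points.
    mean_sq_diff = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq_diff))

# Hypothetical mean profiles, for illustration only.
reference_profile = [20.0, 40.0, 60.0, 80.0]
test_profile = [22.0, 43.0, 61.0, 79.0]
print(round(f2_similarity(reference_profile, test_profile), 1))
```

Identical profiles give f2 = 100, and an average difference of 10 percentage points at every time point gives a value just under 50, the threshold usually quoted for similarity.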
SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression *
Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.
2014-01-01
The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised to specifically address the low statistical power issue of many standard statistical approaches, such as the Hotelling’s T2 test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPReM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis have shown that SPReM outperforms other state-of-the-art methods. PMID:26527844
NASA Technical Reports Server (NTRS)
Crutcher, H. L.; Falls, L. W.
1976-01-01
Sets of experimentally determined or routinely observed data provide information about the past, present and, hopefully, future sets of similarly produced data. An infinite set of statistical models exists which may be used to describe the data sets. The normal distribution is one model. If it serves at all, it serves well. If a data set, or a transformation of the set, representative of a larger population can be described by the normal distribution, then valid statistical inferences can be drawn. There are several tests which may be applied to a data set to determine whether the univariate normal model adequately describes the set. The chi-square test based on Pearson's work in the late nineteenth and early twentieth centuries is often used. Like all tests, it has some weaknesses which are discussed in elementary texts. Extension of the chi-square test to the multivariate normal model is provided. Tables and graphs permit easier application of the test in the higher dimensions. Several examples, using recorded data, illustrate the procedures. Tests of maximum absolute differences, mean sum of squares of residuals, runs and changes of sign are included in these tests. Dimensions one through five with selected sample sizes 11 to 101 are used to illustrate the statistical tests developed.
Preliminary Multi-Variable Parametric Cost Model for Space Telescopes
NASA Technical Reports Server (NTRS)
Stahl, H. Philip; Hendrichs, Todd
2010-01-01
This slide presentation reviews the creation of a preliminary multi-variable cost model for the contract costs of making a space telescope. It discusses the methodology for collecting the data, the definition of the statistical analysis methodology, single-variable model results, and testing of historical models, and introduces the multi-variable models.
Avalappampatty Sivasamy, Aneetha; Sundan, Bose
2015-01-01
The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T2 method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T2 statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better. PMID:26357668
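The T-square distance used in the abstract has the general form T² = (x − μ)ᵀ S⁻¹ (x − μ), where μ and S are the mean vector and covariance matrix of a baseline ("normal") profile. The sketch below works this out for two features only, in plain Python. The feature values, function names, and the idea of comparing against a fixed threshold are assumptions for illustration; the paper's actual preprocessing and its central-limit-theorem threshold are not reproduced.

```python
def mean_vector(rows):
    """Column means of a list of (x, y) observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_2x2(rows, mu):
    """Sample covariance (sxx, sxy, syy) of two-feature observations."""
    n = len(rows)
    sxx = sum((r[0] - mu[0]) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - mu[1]) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mu[0]) * (r[1] - mu[1]) for r in rows) / (n - 1)
    return sxx, sxy, syy

def t_squared(x, mu, cov):
    """T-square distance of observation x from the baseline profile."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form d' S^-1 d, using the closed-form inverse of a 2x2 matrix.
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

# Hypothetical baseline traffic features (e.g. packets/s, mean packet size).
baseline = [(10.0, 500.0), (12.0, 520.0), (11.0, 480.0), (9.0, 510.0)]
mu = mean_vector(baseline)
cov = covariance_2x2(baseline, mu)
print(t_squared((30.0, 900.0), mu, cov))  # a large value suggests an anomaly
```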
Sivasamy, Aneetha Avalappampatty; Sundan, Bose
2015-01-01
The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T(2) method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T(2) statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better.
Harnessing Multivariate Statistics for Ellipsoidal Data in Structural Geology
NASA Astrophysics Data System (ADS)
Roberts, N.; Davis, J. R.; Titus, S.; Tikoff, B.
2015-12-01
Most structural geology articles do not state significance levels, report confidence intervals, or perform regressions to find trends. This is, in part, because structural data tend to include directions, orientations, ellipsoids, and tensors, which are not treatable by elementary statistics. We describe a full procedural methodology for the statistical treatment of ellipsoidal data. We use a reconstructed dataset of deformed ooids in Maryland from Cloos (1947) to illustrate the process. Normalized ellipsoids have five degrees of freedom and can be represented by a second order tensor. This tensor can be permuted into a five dimensional vector that belongs to a vector space and can be treated with standard multivariate statistics. Cloos made several claims about the distribution of deformation in the South Mountain fold, Maryland, and we reexamine two particular claims using hypothesis testing: 1) octahedral shear strain increases towards the axial plane of the fold; 2) finite strain orientation varies systematically along the trend of the axial trace as it bends with the Appalachian orogen. We then test the null hypothesis that the southern segment of South Mountain is the same as the northern segment. This test illustrates the application of ellipsoidal statistics, which combine both orientation and shape. We report confidence intervals for each test, and graphically display our results with novel plots. This poster illustrates the importance of statistics in structural geology, especially when working with noisy or small datasets.
Robustness of Multiple Objective Decision Analysis Preference Functions
2002-06-01
p, p′: The probability of some event. $p_i, q_i$: The probability of event i. Π: An aggregation of proportional data used in calculating a test ... statistical tests of the significance of the term and also is conducted in a multivariate framework rather than the ROSA univariate approach. A ... residual error is $e = y - \hat{y}$ (45). The coefficient provides a ready indicator of the contribution for the associated variable and statistical tests
Extending local canonical correlation analysis to handle general linear contrasts for FMRI data.
Jin, Mingwu; Nandy, Rajesh; Curran, Tim; Cordes, Dietmar
2012-01-01
Local canonical correlation analysis (CCA) is a multivariate method that has been proposed to more accurately determine activation patterns in fMRI data. In its conventional formulation, CCA has several drawbacks that limit its usefulness in fMRI. A major drawback is that, unlike the general linear model (GLM), a test of general linear contrasts of the temporal regressors has not been incorporated into the CCA formalism. To overcome this drawback, a novel directional test statistic was derived using the equivalence of multivariate multiple regression (MVMR) and CCA. This extension will allow CCA to be used for inference of general linear contrasts in more complicated fMRI designs without reparameterization of the design matrix and without reestimating the CCA solutions for each particular contrast of interest. With the proper constraints on the spatial coefficients of CCA, this test statistic can yield a more powerful test on the inference of evoked brain regional activations from noisy fMRI data than the conventional t-test in the GLM. The quantitative results from simulated and pseudoreal data and activation maps from fMRI data were used to demonstrate the advantage of this novel test statistic.
Extending Local Canonical Correlation Analysis to Handle General Linear Contrasts for fMRI Data
Jin, Mingwu; Nandy, Rajesh; Curran, Tim; Cordes, Dietmar
2012-01-01
Local canonical correlation analysis (CCA) is a multivariate method that has been proposed to more accurately determine activation patterns in fMRI data. In its conventional formulation, CCA has several drawbacks that limit its usefulness in fMRI. A major drawback is that, unlike the general linear model (GLM), a test of general linear contrasts of the temporal regressors has not been incorporated into the CCA formalism. To overcome this drawback, a novel directional test statistic was derived using the equivalence of multivariate multiple regression (MVMR) and CCA. This extension will allow CCA to be used for inference of general linear contrasts in more complicated fMRI designs without reparameterization of the design matrix and without reestimating the CCA solutions for each particular contrast of interest. With the proper constraints on the spatial coefficients of CCA, this test statistic can yield a more powerful test on the inference of evoked brain regional activations from noisy fMRI data than the conventional t-test in the GLM. The quantitative results from simulated and pseudoreal data and activation maps from fMRI data were used to demonstrate the advantage of this novel test statistic. PMID:22461786
ERIC Educational Resources Information Center
Grasman, Raoul P. P. P.; Huizenga, Hilde M.; Geurts, Hilde M.
2010-01-01
Crawford and Howell (1998) have pointed out that the common practice of z-score inference on cognitive disability is inappropriate if a patient's performance on a task is compared with relatively few typical control individuals. Appropriate univariate and multivariate statistical tests have been proposed for these studies, but these are only valid…
Ringham, Brandy M; Kreidler, Sarah M; Muller, Keith E; Glueck, Deborah H
2016-07-30
Multilevel and longitudinal studies are frequently subject to missing data. For example, biomarker studies for oral cancer may involve multiple assays for each participant. Assays may fail, resulting in missing data values that can be assumed to be missing completely at random. Catellier and Muller proposed a data analytic technique to account for data missing at random in multilevel and longitudinal studies. They suggested modifying the degrees of freedom for both the Hotelling-Lawley trace F statistic and its null case reference distribution. We propose parallel adjustments to approximate power for this multivariate test in studies with missing data. The power approximations use a modified non-central F statistic, which is a function of (i) the expected number of complete cases, (ii) the expected number of non-missing pairs of responses, or (iii) the trimmed sample size, which is the planned sample size reduced by the anticipated proportion of missing data. The accuracy of the method is assessed by comparing the theoretical results to the Monte Carlo simulated power for the Catellier and Muller multivariate test. Over all experimental conditions, the closest approximation to the empirical power of the Catellier and Muller multivariate test is obtained by adjusting power calculations with the expected number of complete cases. The utility of the method is demonstrated with a multivariate power analysis for a hypothetical oral cancer biomarkers study. We describe how to implement the method using standard, commercially available software products and give example code. Copyright © 2015 John Wiley & Sons, Ltd.
Filipiak, Katarzyna; Klein, Daniel; Roy, Anuradha
2017-01-01
The problem of testing the separability of a covariance matrix against an unstructured variance-covariance matrix is studied in the context of multivariate repeated measures data using Rao's score test (RST). The RST statistic is developed with the first component of the separable structure as a first-order autoregressive (AR(1)) correlation matrix or an unstructured (UN) covariance matrix under the assumption of multivariate normality. It is shown that the distribution of the RST statistic under the null hypothesis of any separability does not depend on the true values of the mean or the unstructured components of the separable structure. A significant advantage of the RST is that it can be performed for small samples, even smaller than the dimension of the data, where the likelihood ratio test (LRT) cannot be used, and it outperforms the standard LRT in a number of contexts. Monte Carlo simulations are then used to study the comparative behavior of the null distribution of the RST statistic, as well as that of the LRT statistic, in terms of sample size considerations, and for the estimation of the empirical percentiles. Our findings are compared with existing results where the first component of the separable structure is a compound symmetry (CS) correlation matrix. It is also shown by simulations that the empirical null distribution of the RST statistic converges faster than the empirical null distribution of the LRT statistic to the limiting χ² distribution. The tests are implemented on a real dataset from medical studies. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Meta-analysis of gene-level associations for rare variants based on single-variant statistics.
Hu, Yi-Juan; Berndt, Sonja I; Gustafsson, Stefan; Ganna, Andrea; Hirschhorn, Joel; North, Kari E; Ingelsson, Erik; Lin, Dan-Yu
2013-08-08
Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
A Statistical Approach for Testing Cross-Phenotype Effects of Rare Variants
Broadaway, K. Alaine; Cutler, David J.; Duncan, Richard; Moore, Jacob L.; Ware, Erin B.; Jhun, Min A.; Bielak, Lawrence F.; Zhao, Wei; Smith, Jennifer A.; Peyser, Patricia A.; Kardia, Sharon L.R.; Ghosh, Debashis; Epstein, Michael P.
2016-01-01
Increasing empirical evidence suggests that many genetic variants influence multiple distinct phenotypes. When cross-phenotype effects exist, multivariate association methods that consider pleiotropy are often more powerful than univariate methods that model each phenotype separately. Although several statistical approaches exist for testing cross-phenotype effects for common variants, there is a lack of similar tests for gene-based analysis of rare variants. In order to fill this important gap, we introduce a statistical method for cross-phenotype analysis of rare variants using a nonparametric distance-covariance approach that compares similarity in multivariate phenotypes to similarity in rare-variant genotypes across a gene. The approach can accommodate both binary and continuous phenotypes and further can adjust for covariates. Our approach yields a closed-form test whose significance can be evaluated analytically, thereby improving computational efficiency and permitting application on a genome-wide scale. We use simulated data to demonstrate that our method, which we refer to as the Gene Association with Multiple Traits (GAMuT) test, provides increased power over competing approaches. We also illustrate our approach using exome-chip data from the Genetic Epidemiology Network of Arteriopathy. PMID:26942286
Stamate, Mirela Cristina; Todor, Nicolae; Cosgarea, Marcel
2015-01-01
The clinical utility of otoacoustic emissions as a noninvasive objective test of cochlear function has long been studied. Both transient otoacoustic emissions and distortion products can be used to identify hearing loss, but to what extent they can be used as predictors for hearing loss is still debated. Most studies agree that multivariate analyses have better test performances than univariate analyses. The aim of the study was to determine the performance of transient otoacoustic emissions and distortion products in identifying normal and impaired hearing, using the pure tone audiogram as a gold standard procedure and different multivariate statistical approaches. The study included 105 adult subjects with normal hearing and hearing loss who underwent the same test battery: pure-tone audiometry, tympanometry, otoacoustic emission tests. We chose to use logistic regression as a multivariate statistical technique. Three logistic regression models were developed to characterize the relations between different risk factors (age, sex, tinnitus, demographic features, cochlear status defined by otoacoustic emissions) and hearing status defined by pure-tone audiometry. The multivariate analyses allow the calculation of the logistic score, which is a combination of the inputs, weighted by coefficients, calculated within the analyses. The accuracy of each model was assessed using receiver operating characteristic curve analysis. We used the logistic score to generate receiver operating characteristic curves and to estimate the areas under the curves in order to compare different multivariate analyses. We compared the performance of each otoacoustic emission (transient, distortion product) using three different multivariate analyses for each ear, when multi-frequency gold standards were used. We demonstrated that all multivariate analyses provided high values of the area under the curve, proving the performance of the otoacoustic emissions. 
Each otoacoustic emission test presented high values of area under the curve, suggesting that implementing a multivariate approach to evaluate the performance of each otoacoustic emission test would serve to increase the accuracy in identifying normal and impaired ears. We encountered the highest area under the curve value for the combined multivariate analysis, suggesting that both otoacoustic emission tests should be used in assessing hearing status. Our multivariate analyses revealed that age is a constant predictive factor of the auditory status for both ears, but the presence of tinnitus was the most important predictor for the hearing level, only for the left ear. Age presented similar coefficients, but tinnitus coefficients, by their high value, produced the highest variations of the logistic scores, only for the left ear group, thus increasing the risk of hearing loss. We did not find gender differences between ears for any otoacoustic emission tests, but studies still debate this question as the results are contradictory. Neither gender nor environment of origin had any predictive value for the hearing status, according to the results of our study. Like any other audiological test, using otoacoustic emissions to identify hearing loss is not without error. Even when applying multivariate analysis, perfect test performance is never achieved. Although most studies demonstrated the benefit of using multivariate analysis, it has not been incorporated into clinical decisions, perhaps because of the idiosyncratic nature of multivariate solutions or because of the lack of validation studies.
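The two quantities the abstract relies on, a logistic score formed as a coefficient-weighted combination of inputs and the area under the ROC curve computed from that score, can be sketched as follows. The coefficients, inputs, and function names are hypothetical, and the AUC is computed by the pairwise-comparison (Mann-Whitney) formulation rather than by whatever software the authors used.

```python
def logistic_score(inputs, coefficients, intercept=0.0):
    """Linear predictor of a logistic model: intercept + sum(coef * input)."""
    if len(inputs) != len(coefficients):
        raise ValueError("inputs and coefficients must have equal length")
    return intercept + sum(c * x for c, x in zip(coefficients, inputs))

def auc(scores_impaired, scores_normal):
    """Area under the ROC curve: probability that an impaired ear scores
    higher than a normal ear, counting ties as one half."""
    if not scores_impaired or not scores_normal:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for s_pos in scores_impaired:
        for s_neg in scores_normal:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(scores_impaired) * len(scores_normal))

# Hypothetical coefficients for (age in decades, tinnitus present, emission absent).
coefs = [0.4, 1.2, 2.0]
impaired = [logistic_score(x, coefs, -3.0) for x in ([6, 1, 1], [5, 0, 1], [7, 1, 0])]
normal = [logistic_score(x, coefs, -3.0) for x in ([3, 0, 0], [4, 0, 0], [5, 1, 0])]
print(auc(impaired, normal))
```

Because the logistic transformation is monotone, ranking ears by the linear predictor or by the fitted probability yields the same AUC.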
STAMATE, MIRELA CRISTINA; TODOR, NICOLAE; COSGAREA, MARCEL
2015-01-01
Background and aim: The clinical utility of otoacoustic emissions as a noninvasive objective test of cochlear function has long been studied. Both transient otoacoustic emissions and distortion products can be used to identify hearing loss, but to what extent they can be used as predictors for hearing loss is still debated. Most studies agree that multivariate analyses have better test performances than univariate analyses. The aim of the study was to determine the performance of transient otoacoustic emissions and distortion products in identifying normal and impaired hearing, using the pure tone audiogram as a gold standard procedure and different multivariate statistical approaches. Methods: The study included 105 adult subjects with normal hearing and hearing loss who underwent the same test battery: pure-tone audiometry, tympanometry, otoacoustic emission tests. We chose to use logistic regression as a multivariate statistical technique. Three logistic regression models were developed to characterize the relations between different risk factors (age, sex, tinnitus, demographic features, cochlear status defined by otoacoustic emissions) and hearing status defined by pure-tone audiometry. The multivariate analyses allow the calculation of the logistic score, which is a combination of the inputs, weighted by coefficients, calculated within the analyses. The accuracy of each model was assessed using receiver operating characteristic curve analysis. We used the logistic score to generate receiver operating characteristic curves and to estimate the areas under the curves in order to compare different multivariate analyses. Results: We compared the performance of each otoacoustic emission (transient, distortion product) using three different multivariate analyses for each ear, when multi-frequency gold standards were used. We demonstrated that all multivariate analyses provided high values of the area under the curve, proving the performance of the otoacoustic emissions. 
Each otoacoustic emission test presented high values of area under the curve, suggesting that implementing a multivariate approach to evaluate the performance of each otoacoustic emission test would serve to increase the accuracy in identifying normal and impaired ears. We encountered the highest area under the curve value for the combined multivariate analysis, suggesting that both otoacoustic emission tests should be used in assessing hearing status. Our multivariate analyses revealed that age is a constant predictive factor of the auditory status for both ears, but the presence of tinnitus was the most important predictor for the hearing level, only for the left ear. Age presented similar coefficients, but tinnitus coefficients, by their high value, produced the highest variations of the logistic scores, only for the left ear group, thus increasing the risk of hearing loss. We did not find gender differences between ears for any otoacoustic emission tests, but studies still debate this question as the results are contradictory. Neither gender nor environment of origin had any predictive value for the hearing status, according to the results of our study. Conclusion: Like any other audiological test, using otoacoustic emissions to identify hearing loss is not without error. Even when applying multivariate analysis, perfect test performance is never achieved. Although most studies demonstrated the benefit of using multivariate analysis, it has not been incorporated into clinical decisions, perhaps because of the idiosyncratic nature of multivariate solutions or because of the lack of validation studies. PMID:26733749
Quantifying the impact of between-study heterogeneity in multivariate meta-analyses
Jackson, Dan; White, Ian R; Riley, Richard D
2012-01-01
Measures that quantify the impact of heterogeneity in univariate meta-analysis, including the very popular I2 statistic, are now well established. Multivariate meta-analysis, where studies provide multiple outcomes that are pooled in a single analysis, is also becoming more commonly used. The question of how to quantify heterogeneity in the multivariate setting is therefore raised. It is the univariate R2 statistic, the ratio of the variance of the estimated treatment effect under the random and fixed effects models, that generalises most naturally, so this statistic provides our basis. This statistic is then used to derive a multivariate analogue of I2, which we call . We also provide a multivariate H2 statistic, the ratio of a generalisation of Cochran's heterogeneity statistic and its associated degrees of freedom, with an accompanying generalisation of the usual I2 statistic, . Our proposed heterogeneity statistics can be used alongside all the usual estimates and inferential procedures used in multivariate meta-analysis. We apply our methods to some real datasets and show how our statistics are equally appropriate in the context of multivariate meta-regression, where study level covariate effects are included in the model. Our heterogeneity statistics may be used when applying any procedure for fitting the multivariate random effects model. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22763950
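As a point of reference for the statistics named above, the following sketch computes the familiar univariate quantities only: Cochran's Q, H² as Q divided by its degrees of freedom, and I² as (H² − 1)/H², truncated at zero. It does not implement the multivariate generalisations proposed in the paper, and the function name is an assumption made for illustration.

```python
import numpy as np

def univariate_heterogeneity(effects, variances):
    """Return Cochran's Q, H^2 = Q / (k - 1) and I^2 = (H^2 - 1) / H^2 for k studies."""
    effects = np.asarray(effects, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)  # inverse-variance weights
    pooled = np.sum(weights * effects) / np.sum(weights)  # fixed-effect estimate
    q = np.sum(weights * (effects - pooled) ** 2)
    degrees_of_freedom = effects.size - 1
    h2 = q / degrees_of_freedom
    i2 = max(0.0, (h2 - 1.0) / h2) if h2 > 0 else 0.0
    return q, h2, i2

# Hypothetical study effects with equal within-study variances.
q, h2, i2 = univariate_heterogeneity([0.0, 2.0, 4.0], [1.0, 1.0, 1.0])
print(q, h2, i2)
```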
NASA Astrophysics Data System (ADS)
Sneath, P. H. A.
A BASIC program is presented for significance tests to determine whether a dendrogram is derived from clustering of points that belong to a single multivariate normal distribution. The significance tests are based on statistics of the Kolmogorov-Smirnov type, obtained by comparing the observed cumulative graph of branch levels with a graph for the hypothesis of multivariate normality. The program also permits testing whether the dendrogram could be from a cluster of lower dimensionality due to character correlations. The program makes provision for three similarity coefficients: (1) Euclidean distances, (2) squared Euclidean distances, and (3) Simple Matching Coefficients; and for five cluster methods: (1) WPGMA, (2) UPGMA, (3) Single Linkage (or Minimum Spanning Trees), (4) Complete Linkage, and (5) Ward's Increase in Sums of Squares. The program is entitled DENBRAN.
Friedman, David B
2012-01-01
All quantitative proteomics experiments measure variation between samples. When performing large-scale experiments that involve multiple conditions or treatments, the experimental design should include the appropriate number of individual biological replicates from each condition to enable a relevant biological signal to be distinguished from technical noise. Multivariate statistical analyses, such as principal component analysis (PCA), provide a global perspective on experimental variation, thereby enabling the assessment of whether the variation describes the expected biological signal or the unanticipated technical/biological noise inherent in the system. Examples will be shown from high-resolution multivariable DIGE experiments where PCA was instrumental in demonstrating biologically significant variation as well as sample outliers, fouled samples, and overriding technical variation that would not be readily observed using standard univariate tests.
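A minimal sketch of PCA as used for this kind of global overview is given below. It assumes a samples-by-features abundance matrix and uses a plain singular value decomposition; the function name and the toy matrix are illustrative and do not come from the article.

```python
import numpy as np

def pca(data, n_components=2):
    """Project samples onto the leading principal components of the centered data."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)  # center each feature (column)
    u, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * singular_values[:n_components]
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return scores, vt[:n_components], explained[:n_components]

# Hypothetical matrix: 4 samples (rows) by 3 features (columns).
abundances = np.array([[1.0, 2.0, 0.5],
                       [1.1, 2.1, 0.4],
                       [3.0, 0.5, 2.0],
                       [3.2, 0.4, 2.1]])
scores, loadings, explained = pca(abundances)
print(scores)     # sample coordinates; outliers and batch effects show up here
print(explained)  # fraction of total variance carried by each component
```

Plotting the first two score columns, colored by condition and by batch, is the usual way to see whether the dominant variation is the expected biological signal or technical noise.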
MIDAS: Regionally linear multivariate discriminative statistical mapping.
Varol, Erdem; Sotiras, Aristeidis; Davatzikos, Christos
2018-07-01
Statistical parametric maps formed via voxel-wise mass-univariate tests, such as the general linear model, are commonly used to test hypotheses about regionally specific effects in neuroimaging cross-sectional studies where each subject is represented by a single image. Despite being informative, these techniques remain limited as they ignore multivariate relationships in the data. Most importantly, the commonly employed local Gaussian smoothing, which is important for accounting for registration errors and making the data follow Gaussian distributions, is usually chosen in an ad hoc fashion. Thus, it is often suboptimal for the task of detecting group differences and correlations with non-imaging variables. Information mapping techniques, such as searchlight, which use pattern classifiers to exploit multivariate information and obtain more powerful statistical maps, have become increasingly popular in recent years. However, existing methods may lead to important interpretation errors in practice (i.e., misidentifying a cluster as informative, or failing to detect truly informative voxels), while often being computationally expensive. To address these issues, we introduce a novel efficient multivariate statistical framework for cross-sectional studies, termed MIDAS, seeking highly sensitive and specific voxel-wise brain maps, while leveraging the power of regional discriminant analysis. In MIDAS, locally linear discriminative learning is applied to estimate the pattern that best discriminates between two groups, or predicts a variable of interest. This pattern is equivalent to local filtering by an optimal kernel whose coefficients are the weights of the linear discriminant. By composing information from all neighborhoods that contain a given voxel, MIDAS produces a statistic that collectively reflects the contribution of the voxel to the regional classifiers as well as the discriminative power of the classifiers. 
Critically, MIDAS efficiently assesses the statistical significance of the derived statistic by analytically approximating its null distribution without the need for computationally expensive permutation tests. The proposed framework was extensively validated using simulated atrophy in structural magnetic resonance imaging (MRI) and further tested using data from a task-based functional MRI study as well as a structural MRI study of cognitive performance. The performance of the proposed framework was evaluated against standard voxel-wise general linear models and other information mapping methods. The experimental results showed that MIDAS achieves relatively higher sensitivity and specificity in detecting group differences. Together, our results demonstrate the potential of the proposed approach to efficiently map effects of interest in both structural and functional data. Copyright © 2018. Published by Elsevier Inc.
Demanuele, Charmaine; Bähner, Florian; Plichta, Michael M; Kirsch, Peter; Tost, Heike; Meyer-Lindenberg, Andreas; Durstewitz, Daniel
2015-01-01
Multivariate pattern analysis can reveal new information from neuroimaging data to illuminate human cognition and its disturbances. Here, we develop a methodological approach, based on multivariate statistical/machine learning and time series analysis, to discern cognitive processing stages from functional magnetic resonance imaging (fMRI) blood oxygenation level dependent (BOLD) time series. We apply this method to data recorded from a group of healthy adults whilst performing a virtual reality version of the delayed win-shift radial arm maze (RAM) task. This task has been frequently used to study working memory and decision making in rodents. Using linear classifiers and multivariate test statistics in conjunction with time series bootstraps, we show that different cognitive stages of the task, as defined by the experimenter, namely, the encoding/retrieval, choice, reward and delay stages, can be statistically discriminated from the BOLD time series in brain areas relevant for decision making and working memory. Discrimination of these task stages was significantly reduced during poor behavioral performance in dorsolateral prefrontal cortex (DLPFC), but not in the primary visual cortex (V1). Experimenter-defined dissection of time series into class labels based on task structure was confirmed by an unsupervised, bottom-up approach based on Hidden Markov Models. Furthermore, we show that different groupings of recorded time points into cognitive event classes can be used to test hypotheses about the specific cognitive role of a given brain region during task execution. We found that whilst the DLPFC strongly differentiated between task stages associated with different memory loads, but not between different visual-spatial aspects, the reverse was true for V1. 
Our methodology illustrates how different aspects of cognitive information processing during one and the same task can be separated and attributed to specific brain regions based on information contained in multivariate patterns of voxel activity.
Performance of the S-χ² Statistic for Full-Information Bifactor Models
ERIC Educational Resources Information Center
Li, Ying; Rupp, Andre A.
2011-01-01
This study investigated the Type I error rate and power of the multivariate extension of the S-χ² statistic using unidimensional and multidimensional item response theory (UIRT and MIRT, respectively) models as well as full-information bifactor (FI-bifactor) models through simulation. Manipulated factors included test length, sample…
Exploring the Replicability of a Study's Results: Bootstrap Statistics for the Multivariate Case.
ERIC Educational Resources Information Center
Thompson, Bruce
Conventional statistical significance tests do not inform the researcher regarding the likelihood that results will replicate. One strategy for evaluating result replication is to use a "bootstrap" resampling of a study's data so that the stability of results across numerous configurations of the subjects can be explored. This paper…
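The resampling idea described here can be sketched as follows, assuming a subjects-by-variables data matrix and a user-supplied statistic. The function name and defaults are hypothetical; the sketch shows only the generic case-resampling bootstrap, not the specific multivariate procedures discussed in the paper.

```python
import numpy as np

def bootstrap_statistic(data, statistic, n_resamples=1000, seed=0):
    """Recompute `statistic` on resamples of the rows (subjects), drawn with replacement."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    n_subjects = data.shape[0]
    results = []
    for _ in range(n_resamples):
        rows = rng.integers(0, n_subjects, size=n_subjects)  # sample subjects with replacement
        results.append(statistic(data[rows]))
    return np.array(results)

# Hypothetical data: 30 subjects measured on 2 variables.
rng = np.random.default_rng(42)
sample = rng.normal(size=(30, 2))
replicates = bootstrap_statistic(sample, lambda rows: rows.mean(axis=0), n_resamples=500)
print(replicates.std(axis=0))  # spread of the statistic across resamples
```

A small spread across resamples suggests that the result is stable over different configurations of subjects.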
Conceptual and statistical problems associated with the use of diversity indices in ecology.
Barrantes, Gilbert; Sandoval, Luis
2009-09-01
Diversity indices, particularly the Shannon-Wiener index, have been used extensively in analyzing patterns of diversity at different geographic and ecological scales. These indices have serious conceptual and statistical problems which make comparisons of species richness or species abundances across communities nearly impossible. There is often no single statistical method that retains all the information needed to answer even a simple question. However, multivariate analyses, such as cluster analyses or multiple regressions, could be used instead of diversity indices. More complex multivariate analyses, such as Canonical Correspondence Analysis, provide very valuable information on environmental variables associated with the presence and abundance of the species in a community. In addition, particular hypotheses associated with changes in species richness across localities, or changes in the abundance of one species or a group of species, can be tested using univariate, bivariate, and/or rarefaction statistical tests. The rarefaction method has proved to be a robust way to standardize all samples to a common size. Even a method as simple as reporting the number of species per taxonomic category possibly provides more information than a diversity index value.
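Standardizing samples to a common size by rarefaction can be sketched with the classical expected-richness formula, in which each species contributes the probability of appearing at least once in a random subsample drawn without replacement. The function name and the counts below are illustrative assumptions.

```python
from math import comb

def rarefied_richness(counts, sample_size):
    """Expected number of species in a random subsample of `sample_size` individuals."""
    total = sum(counts)
    if not 0 <= sample_size <= total:
        raise ValueError("sample_size must lie between 0 and the total number of individuals")
    all_subsamples = comb(total, sample_size)
    # 1 - P(species absent from the subsample), summed over the species present.
    return sum(1 - comb(total - count, sample_size) / all_subsamples
               for count in counts if count > 0)

# Hypothetical communities of different sizes, compared at 20 individuals each.
community_a = [50, 30, 10, 5, 5]
community_b = [10, 5, 3, 2]
print(rarefied_richness(community_a, 20), rarefied_richness(community_b, 20))
```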
Modified Distribution-Free Goodness-of-Fit Test Statistic.
Chun, So Yeon; Browne, Michael W; Shapiro, Alexander
2018-03-01
Covariance structure analysis and its structural equation modeling extensions have become one of the most widely used methodologies in social sciences such as psychology, education, and economics. An important issue in such analysis is to assess the goodness of fit of a model under analysis. One of the most popular test statistics used in covariance structure analysis is the asymptotically distribution-free (ADF) test statistic introduced by Browne (Br J Math Stat Psychol 37:62-83, 1984). The ADF statistic can be used to test models without any specific distribution assumption (e.g., multivariate normal distribution) of the observed data. Despite its advantage, it has been shown in various empirical studies that unless sample sizes are extremely large, this ADF statistic could perform very poorly in practice. In this paper, we provide a theoretical explanation for this phenomenon and further propose a modified test statistic that improves the performance in samples of realistic size. The proposed statistic deals with the possible ill-conditioning of the involved large-scale covariance matrices.
Multivariate evoked response detection based on the spectral F-test.
Rocha, Paulo Fábio F; Felix, Leonardo B; Miranda de Sá, Antonio Mauricio F L; Mendes, Eduardo M A M
2016-05-01
Objective response detection techniques, such as magnitude-squared coherence, the component synchrony measure, and the spectral F-test, have been used to automate the detection of evoked responses. The performance of these detectors depends on both the signal-to-noise ratio (SNR) and the length of the electroencephalogram (EEG) signal. Recently, multivariate detectors were developed to increase the detection rate even in the case of a low signal-to-noise ratio or of short EEG data records. In this context, an extension of the spectral F-test detector to the multivariate case is proposed. The performance of this technique is assessed using Monte Carlo simulations. As an example, EEG data from 12 subjects during photic stimulation are used to demonstrate the usefulness of the proposed detector. The multivariate method showed detection rates consistently higher than those obtained when only one signal was used. It is shown that response detection in EEG signals with the multivariate technique was statistically significant if two or more EEG derivations were used. Copyright © 2016 Elsevier B.V. All rights reserved.
A power analysis for multivariate tests of temporal trend in species composition.
Irvine, Kathryn M; Dinger, Eric C; Sarr, Daniel
2011-10-01
Long-term monitoring programs emphasize power analysis as a tool to determine the sampling effort necessary to effectively document ecologically significant changes in ecosystems. Programs that monitor entire multispecies assemblages require a method for determining the power of multivariate statistical models to detect trend. We provide a method to simulate presence-absence species assemblage data that are consistent with increasing or decreasing directional change in species composition within multiple sites. This step is the foundation for using Monte Carlo methods to approximate the power of any multivariate method for detecting temporal trends. We focus on comparing the power of the Mantel test, permutational multivariate analysis of variance, and constrained analysis of principal coordinates. We find that the power of the various methods we investigate is sensitive to the number of species in the community, univariate species patterns, and the number of sites sampled over time. For increasing directional change scenarios, constrained analysis of principal coordinates was as powerful as or more powerful than permutational multivariate analysis of variance, and the Mantel test was the least powerful. However, in our investigation of decreasing directional change, the Mantel test was typically as powerful as or more powerful than the other methods.
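The Monte Carlo logic for approximating power (simulate data under a known effect, apply the test, and record how often it rejects) can be sketched as below. This is a deliberately simplified stand-in: it uses a univariate two-group permutation test on normal data rather than the presence-absence assemblage simulations and multivariate tests studied in the paper, and all names and defaults are assumptions.

```python
import numpy as np

def permutation_p_value(group_a, group_b, rng, n_permutations=199):
    """Two-sided permutation p-value for a difference in group means."""
    observed = abs(group_a.mean() - group_b.mean())
    pooled = np.concatenate([group_a, group_b])
    n_a = group_a.size
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        if abs(shuffled[:n_a].mean() - shuffled[n_a:].mean()) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_permutations + 1)

def monte_carlo_power(effect, n_per_group=20, n_simulations=200, alpha=0.05, seed=0):
    """Fraction of simulated data sets in which the test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_simulations):
        group_a = rng.normal(0.0, 1.0, n_per_group)
        group_b = rng.normal(effect, 1.0, n_per_group)  # shifted by the assumed effect
        if permutation_p_value(group_a, group_b, rng) <= alpha:
            rejections += 1
    return rejections / n_simulations

print(monte_carlo_power(effect=0.8, n_simulations=100))
```

Repeating the calculation over a grid of effect sizes and sample sizes gives the power curves that inform sampling effort.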
Calypso: a user-friendly web-server for mining and visualizing microbiome-environment interactions.
Zakrzewski, Martha; Proietti, Carla; Ellis, Jonathan J; Hasan, Shihab; Brion, Marie-Jo; Berger, Bernard; Krause, Lutz
2017-03-01
Calypso is an easy-to-use online software suite that allows non-expert users to mine, interpret and compare taxonomic information from metagenomic or 16S rDNA datasets. Calypso has a focus on multivariate statistical approaches that can identify complex environment-microbiome associations. The software enables quantitative visualizations, statistical testing, multivariate analysis, supervised learning, factor analysis, multivariable regression, network analysis and diversity estimates. Comprehensive help pages, tutorials and videos are provided via a wiki page. The web-interface is accessible via http://cgenome.net/calypso/ . The software is programmed in Java, PERL and R and the source code is available from Zenodo ( https://zenodo.org/record/50931 ). The software is freely available for non-commercial users. Contact: l.krause@uq.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Multivariate meta-analysis: a robust approach based on the theory of U-statistic.
Ma, Yan; Mazumdar, Madhu
2011-10-30
Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously, taking into account the correlation between the outcomes. Likelihood-based approaches, in particular the restricted maximum likelihood (REML) method, are commonly utilized in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for meta-analyses with a small number of component studies. The use of REML also requires iterative estimation between parameters, needing moderately high computation time, especially when the dimension of outcomes is large. A multivariate method of moments (MMM) is available and has been shown to perform as well as REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normality. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis on the basis of the theory of U-statistics and compare the properties of these three procedures under both normal and skewed data through simulation studies. It is shown that the effect of a non-normal data distribution on the REML estimates is marginal and that the estimates from the MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. Easy implementation of all three methods is illustrated by their application to data from two published meta-analyses from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on U-statistics for testing the significance of between-study heterogeneity and for extending the work to the meta-regression setting. Copyright © 2011 John Wiley & Sons, Ltd.
Holmes, Susan; Alekseyenko, Alexander; Timme, Alden; Nelson, Tyrrell; Pasricha, Pankaj Jay; Spormann, Alfred
2011-01-01
This article explains the statistical and computational methodology used to analyze species abundances collected using the LNBL Phylochip in a study of Irritable Bowel Syndrome (IBS) in rats. Some tools already available for the analysis of ordinary microarray data are useful in this type of statistical analysis. For instance in correcting for multiple testing we use Family Wise Error rate control and step-down tests (available in the multtest package). Once the most significant species are chosen we use the hypergeometric tests familiar for testing GO categories to test specific phyla and families. We provide examples of normalization, multivariate projections, batch effect detection and integration of phylogenetic covariation, as well as tree equalization and robustification methods.
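Family-wise error rate control with a step-down procedure can be illustrated with Holm's method, sketched here in Python with NumPy. This is a generic illustration of the idea; it does not reproduce the API or the resampling-based step-down procedures of the R multtest package mentioned above, and the p-values are hypothetical.

```python
import numpy as np

def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The k-th smallest p-value is multiplied by (m - k + 1); monotonicity is then enforced.
    stepped = (m - np.arange(m)) * p[order]
    adjusted_sorted = np.minimum(np.maximum.accumulate(stepped), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted

# Hypothetical per-species p-values.
raw = [0.01, 0.04, 0.03]
print(holm_adjust(raw))
```

A species is declared significant at family-wise level alpha when its adjusted p-value is at most alpha.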
Estimating and Testing the Sources of Evoked Potentials in the Brain.
ERIC Educational Resources Information Center
Huizenga, Hilde M.; Molenaar, Peter C. M.
1994-01-01
The source of an event-related brain potential (ERP) is estimated from multivariate measures of ERP on the head under several mathematical and physical constraints on the parameters of the source model. Statistical aspects of estimation are discussed, and new tests are proposed. (SLD)
TATES: Efficient Multivariate Genotype-Phenotype Analysis for Genome-Wide Association Studies
van der Sluis, Sophie; Posthuma, Danielle; Dolan, Conor V.
2013-01-01
To date, the genome-wide association study (GWAS) is the primary tool to identify genetic variants that cause phenotypic variation. As GWAS analyses are generally univariate in nature, multivariate phenotypic information is usually reduced to a single composite score. This practice often results in loss of statistical power to detect causal variants. Multivariate genotype–phenotype methods do exist but attain maximal power only in special circumstances. Here, we present a new multivariate method that we refer to as TATES (Trait-based Association Test that uses Extended Simes procedure), inspired by the GATES procedure proposed by Li et al (2011). For each component of a multivariate trait, TATES combines p-values obtained in standard univariate GWAS to acquire one trait-based p-value, while correcting for correlations between components. Extensive simulations, probing a wide variety of genotype–phenotype models, show that TATES's false positive rate is correct, and that TATES's statistical power to detect causal variants explaining 0.5% of the variance can be 2.5–9 times higher than the power of univariate tests based on composite scores and 1.5–2 times higher than the power of the standard MANOVA. Unlike other multivariate methods, TATES detects both genetic variants that are common to multiple phenotypes and genetic variants that are specific to a single phenotype, i.e. TATES provides a more complete view of the genetic architecture of complex traits. As the actual causal genotype–phenotype model is usually unknown and probably phenotypically and genetically complex, TATES, available as an open source program, constitutes a powerful new multivariate strategy that allows researchers to identify novel causal variants, while the complexity of traits is no longer a limiting factor. PMID:23359524
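For orientation, the classical Simes combination of p-values is sketched below. It is only the uncorrected starting point: the extended procedure used by TATES additionally replaces the counts of tests with effective numbers that account for correlations between phenotypes, and that correction is not implemented here. The function name and inputs are assumptions.

```python
import numpy as np

def simes_p_value(p_values):
    """Classical Simes combined p-value: min over j of m * p_(j) / j."""
    p = np.sort(np.asarray(p_values, dtype=float))
    m = p.size
    ranks = np.arange(1, m + 1)
    return float(min(1.0, np.min(m * p / ranks)))

# Hypothetical univariate p-values of one variant for three phenotypes.
print(simes_p_value([0.01, 0.04, 0.03]))
```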
Multivariate analysis of cytokine profiles in pregnancy complications.
Azizieh, Fawaz; Dingle, Kamaludin; Raghupathy, Raj; Johnson, Kjell; VanderPlas, Jacob; Ansari, Ali
2018-03-01
The immunoregulation to tolerate the semiallogeneic fetus during pregnancy includes a harmonious dynamic balance between anti- and pro-inflammatory cytokines. Several earlier studies reported significantly different levels and/or ratios of several cytokines in complicated pregnancy as compared to normal pregnancy. However, as cytokines operate in networks with potentially complex interactions, it is also interesting to compare groups with multi-cytokine data sets using multivariate analysis. Such analysis will further examine how great the differences are, and which cytokines are more different than others. Various multivariate statistical tools, such as the Cramer test, classification and regression trees, partial least squares regression figures, the 2-dimensional Kolmogorov-Smirnov test, principal component analysis and the gap statistic, were used to compare cytokine data of normal vs anomalous groups of different pregnancy complications. Multivariate analysis assisted in examining whether the groups were different, how strongly they differed, and in what ways they differed, and further reported evidence for subgroups in 1 group (pregnancy-induced hypertension), possibly indicating multiple causes for the complication. This work contributes to a better understanding of cytokine interactions and may have important implications for targeting cytokine balance modulation or for the design of future medications or interventions that best direct management or prevention from an immunological approach. © 2018 The Authors. American Journal of Reproductive Immunology Published by John Wiley & Sons Ltd.
1983-06-16
has been advocated by Gnanadesikan and Wilk (1969), and others in the literature. This suggests that, if we use the formal significance test type...American Statistical Asso., 62, 1159-1178. Gnanadesikan, R., and Wilk, M. B. (1969). Data Analytic Methods in Multivariate Statistical Analysis. In
Velasco-Tapia, Fernando
2014-01-01
Magmatic processes have usually been identified and evaluated using qualitative or semiquantitative geochemical or isotopic tools based on a restricted number of variables. However, a more complete and quantitative view could be reached by applying multivariate analysis, mass-balance techniques, and statistical tests. As an example, in this work a statistical and quantitative scheme is applied to analyze the geochemical features of the Sierra de las Cruces (SC) volcanic range (Mexican Volcanic Belt). In this locality, the volcanic activity (3.7 to 0.5 Ma) was dominantly dacitic, but the presence of spheroidal andesitic enclaves and/or diverse disequilibrium features in the majority of lavas confirms the operation of magma mixing/mingling. New discriminant-function-based multidimensional diagrams were used to discriminate the tectonic setting. Statistical tests of discordancy and significance were applied to evaluate the influence of the subducting Cocos plate, which seems to be rather negligible for the SC magmas in relation to several major and trace elements. A cluster analysis following Ward's linkage rule was carried out to classify the SC volcanic rocks into geochemical groups. Finally, two mass-balance schemes were applied for the quantitative evaluation of the proportion of the end-member components (dacitic and andesitic magmas) in the comingled lavas (binary mixtures).
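The idea of estimating end-member proportions in a binary mixture can be sketched with a simple least-squares mass balance, in which each element in the mixed lava is modeled as f times the first end-member plus (1 − f) times the second. This is a generic illustration under that linear-mixing assumption, not one of the two schemes applied in the article; the compositions below are hypothetical.

```python
import numpy as np

def mixing_fraction(mixture, end_member_a, end_member_b):
    """Least-squares fraction f of end-member A such that mixture ≈ f*A + (1 - f)*B."""
    mixture, a, b = (np.asarray(v, dtype=float)
                     for v in (mixture, end_member_a, end_member_b))
    span = a - b  # compositional difference between the two end-members
    return float(np.dot(mixture - b, span) / np.dot(span, span))

# Hypothetical compositions (e.g. SiO2 and MgO, wt%) for dacite, andesite and a mixed lava.
dacite = np.array([65.0, 2.0])
andesite = np.array([58.0, 5.0])
mixed_lava = 0.7 * dacite + 0.3 * andesite
print(mixing_fraction(mixed_lava, dacite, andesite))
```

In practice the elements would normally be scaled (for example by their analytical uncertainty) so that abundant oxides do not dominate the fit.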
Adams, Dean C
2014-09-01
Phylogenetic signal is the tendency for closely related species to display similar trait values due to their common ancestry. Several methods have been developed for quantifying phylogenetic signal in univariate traits and for sets of traits treated simultaneously, and the statistical properties of these approaches have been extensively studied. However, methods for assessing phylogenetic signal in high-dimensional multivariate traits like shape are less well developed, and their statistical performance is not well characterized. In this article, I describe a generalization of the K statistic of Blomberg et al. that is useful for quantifying and evaluating phylogenetic signal in highly dimensional multivariate data. The method (K(mult)) is found from the equivalency between statistical methods based on covariance matrices and those based on distance matrices. Using computer simulations based on Brownian motion, I demonstrate that the expected value of K(mult) remains at 1.0 as trait variation among species is increased or decreased, and as the number of trait dimensions is increased. By contrast, estimates of phylogenetic signal found with a squared-change parsimony procedure for multivariate data change with increasing trait variation among species and with increasing numbers of trait dimensions, confounding biological interpretations. I also evaluate the statistical performance of hypothesis testing procedures based on K(mult) and find that the method displays appropriate Type I error and high statistical power for detecting phylogenetic signal in high-dimensional data. Statistical properties of K(mult) were consistent for simulations using bifurcating and random phylogenies, for simulations using different numbers of species, for simulations that varied the number of trait dimensions, and for different underlying models of trait covariance structure. 
Overall these findings demonstrate that K(mult) provides a useful means of evaluating phylogenetic signal in high-dimensional multivariate traits. Finally, I illustrate the utility of the new approach by evaluating the strength of phylogenetic signal for head shape in a lineage of Plethodon salamanders. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved.
Karr, Justin E; Garcia-Barrera, Mauricio A; Holdnack, James A; Iverson, Grant L
2018-01-01
Multivariate base rates allow for the simultaneous statistical interpretation of multiple test scores, quantifying the normal frequency of low scores on a test battery. This study provides multivariate base rates for the Delis-Kaplan Executive Function System (D-KEFS). The D-KEFS consists of 9 tests with 16 Total Achievement scores (i.e. primary indicators of executive function ability). Stratified by education and intelligence, multivariate base rates were derived for the full D-KEFS and an abbreviated four-test battery (i.e. Trail Making, Color-Word Interference, Verbal Fluency, and Tower Test) using the adult portion of the normative sample (ages 16-89). Multivariate base rates are provided for the full and four-test D-KEFS batteries, calculated using five low score cutoffs (i.e. ≤25th, 16th, 9th, 5th, and 2nd percentiles). Low scores occurred commonly among the D-KEFS normative sample, with 82.6 and 71.8% of participants obtaining at least one score ≤16th percentile for the full and four-test batteries, respectively. Intelligence and education were inversely related to low score frequency. The base rates provided herein allow clinicians to interpret multiple D-KEFS scores simultaneously for the full D-KEFS and an abbreviated battery of commonly administered tests. The use of these base rates will support clinicians when differentiating between normal variations in cognitive performance and true executive function deficits.
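The notion of a multivariate base rate can be made concrete with a small sketch: given each participant's percentile ranks across a battery, count how many scores fall at or below a cutoff and report the proportion of participants with at least a given number of low scores. The data and function name are hypothetical; this is not the D-KEFS normative data or the authors' code.

```python
import numpy as np

def multivariate_base_rate(percentile_ranks, cutoff, min_low_scores=1):
    """Proportion of participants with at least `min_low_scores` scores at or below `cutoff`."""
    ranks = np.asarray(percentile_ranks, dtype=float)  # rows: participants, columns: scores
    low_counts = np.sum(ranks <= cutoff, axis=1)
    return float(np.mean(low_counts >= min_low_scores))

# Hypothetical percentile ranks for three participants on two scores.
ranks = [[10, 50],
         [30, 40],
         [5, 12]]
print(multivariate_base_rate(ranks, cutoff=16))                    # at least one low score
print(multivariate_base_rate(ranks, cutoff=16, min_low_scores=2))  # at least two low scores
```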
Testing for Granger Causality in the Frequency Domain: A Phase Resampling Method.
Liu, Siwei; Molenaar, Peter
2016-01-01
This article introduces phase resampling, an existing but rarely used surrogate data method for making statistical inferences of Granger causality in frequency domain time series analysis. Granger causality testing is essential for establishing causal relations among variables in multivariate dynamic processes. However, testing for Granger causality in the frequency domain is challenging due to the nonlinear relation between frequency domain measures (e.g., partial directed coherence, generalized partial directed coherence) and time domain data. Through a simulation study, we demonstrate that phase resampling is a general and robust method for making statistical inferences even with short time series. With Gaussian data, phase resampling yields satisfactory type I and type II error rates in all but one condition we examine: when a small effect size is combined with an insufficient number of data points. Violations of normality lead to slightly higher error rates but are mostly within acceptable ranges. We illustrate the utility of phase resampling with two empirical examples involving multivariate electroencephalography (EEG) and skin conductance data.
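The core of a phase-resampling surrogate can be sketched for a single series: keep the Fourier amplitudes, replace the phases with random ones, and transform back. Applying this independently to each channel preserves each channel's power spectrum while breaking cross-channel timing, which is the property a null distribution for a causality measure relies on. The function below is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

def phase_resample(series, rng):
    """Surrogate series with the same amplitude spectrum and randomized Fourier phases."""
    series = np.asarray(series, dtype=float)
    n = series.size
    spectrum = np.fft.rfft(series)
    random_phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.size)
    surrogate_spectrum = np.abs(spectrum) * np.exp(1j * random_phases)
    surrogate_spectrum[0] = spectrum[0]        # keep the zero-frequency term (the mean)
    if n % 2 == 0:
        surrogate_spectrum[-1] = spectrum[-1]  # the Nyquist term must stay real
    return np.fft.irfft(surrogate_spectrum, n=n)

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0.0, 8.0 * np.pi, 128)) + 0.3 * rng.normal(size=128)
surrogate = phase_resample(signal, rng)
```

Recomputing the frequency-domain statistic on many such surrogates gives a reference distribution against which the observed value can be compared.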
Del Giudice, G; Padulano, R; Siciliano, D
2016-01-01
The lack of geometrical and hydraulic information about sewer networks often precludes the adoption of in-depth modeling tools to obtain prioritization strategies for funds management. The present paper describes a novel statistical procedure for defining the prioritization scheme for preventive maintenance strategies based on a small sample of failure data collected by the Sewer Office of the Municipality of Naples (IT). Novel aspects include, among others, treating sewer parameters as continuous statistical variables and accounting for their interdependences. After a statistical analysis of maintenance interventions, the most important available factors affecting the process are selected and their mutual correlations identified. Then, after a Box-Cox transformation of the original variables, a methodology is provided for the evaluation of a vulnerability map of the sewer network by adopting a joint multivariate normal distribution with different parameter sets. The goodness-of-fit is finally tested for each distribution by means of a multivariate plotting position. The developed methodology is expected to assist municipal engineers in identifying critical sewers and prioritizing sewer inspections in order to fulfill rehabilitation requirements.
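The Box-Cox step mentioned above has a simple closed form, sketched here for a fixed, user-chosen exponent. How the exponent was selected in the paper is not stated in the abstract, so the value below is purely hypothetical, as are the variable names.

```python
import numpy as np

def box_cox(values, lam):
    """Box-Cox transform: (x**lam - 1) / lam, or log(x) when lam is 0; requires x > 0."""
    values = np.asarray(values, dtype=float)
    if np.any(values <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return np.log(values)
    return (values ** lam - 1.0) / lam

# Hypothetical right-skewed variable (e.g. pipe age in years), transformed with lam = 0.5.
pipe_age = np.array([4.0, 9.0, 25.0, 100.0])
print(box_cox(pipe_age, 0.5))
```

The transformed variables are the ones that would then be modeled with a joint multivariate normal distribution.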
Mathematical background and attitudes toward statistics in a sample of Spanish college students.
Carmona, José; Martínez, Rafael J; Sánchez, Manuel
2005-08-01
To examine the relation between mathematical background and initial attitudes toward statistics among Spanish college students in the social sciences, the Survey of Attitudes Toward Statistics was given to 827 students. Multivariate analyses tested the effects of two indicators of mathematical background (amount of exposure and achievement in previous courses) on the four subscales. Analysis suggested that grades in previous courses are more strongly related to initial attitudes toward statistics than the number of mathematics courses taken. Mathematical background was related to students' affective responses to statistics but not to their valuing of statistics. Implications of possible research are discussed.
MANCOVA for one way classification with homogeneity of regression coefficient vectors
NASA Astrophysics Data System (ADS)
Mokesh Rayalu, G.; Ravisankar, J.; Mythili, G. Y.
2017-11-01
MANOVA and MANCOVA are the extensions of the univariate ANOVA and ANCOVA techniques to multidimensional or vector-valued observations. In the statistical models of these techniques, the assumption of a Gaussian distribution is replaced by a multivariate Gaussian distribution for the data vectors and residual terms. The objective of MANCOVA is to determine whether there are statistically reliable mean differences between groups after adjusting the newly created variable. When random assignment of samples or subjects to groups is not possible, multivariate analysis of covariance (MANCOVA) provides statistical matching of groups by adjusting dependent variables as if all subjects scored the same on the covariates. In this research article, the MANCOVA technique is extended to a larger number of covariates, and homogeneity of regression coefficient vectors is also tested.
Prolonged instability prior to a regime shift
Spanbauer, Trisha; Allen, Craig R.; Angeler, David G.; Eason, Tarsha; Fritz, Sherilyn C.; Garmestani, Ahjond S.; Nash, Kirsty L.; Stone, Jeffery R.
2014-01-01
Regime shifts are generally defined as the point of ‘abrupt’ change in the state of a system. However, a seemingly abrupt transition can be the product of a system reorganization that has been ongoing much longer than is evident in statistical analysis of a single component of the system. Using both univariate and multivariate statistical methods, we tested a long-term high-resolution paleoecological dataset with a known change in species assemblage for a regime shift. Analysis of this dataset with Fisher Information and multivariate time series modeling showed that there was a ∼2000-year period of instability prior to the regime shift. This period of instability and the subsequent regime shift coincide with regional climate change, indicating that the system is undergoing extrinsic forcing. Paleoecological records offer a unique opportunity to test tools for the detection of thresholds and stable states, and thus to examine the long-term stability of ecosystems over periods of multiple millennia.
2013-01-01
Background Cognitive complaints are reported frequently after breast cancer treatments. Their association with neuropsychological (NP) test performance is not well-established. Methods Early-stage, posttreatment breast cancer patients were enrolled in a prospective, longitudinal, cohort study prior to starting endocrine therapy. Evaluation included an NP test battery and self-report questionnaires assessing symptoms, including cognitive complaints. Multivariable regression models assessed associations among cognitive complaints, mood, treatment exposures, and NP test performance. Results One hundred eighty-nine breast cancer patients, aged 21–65 years, completed the evaluation; 23.3% endorsed higher memory complaints and 19.0% reported higher executive function complaints (>1 SD above the mean for healthy control sample). Regression modeling demonstrated a statistically significant association of higher memory complaints with combined chemotherapy and radiation treatments (P = .01), poorer NP verbal memory performance (P = .02), and higher depressive symptoms (P < .001), controlling for age and IQ. For executive functioning complaints, multivariable modeling controlling for age, IQ, and other confounds demonstrated statistically significant associations with better NP visual memory performance (P = .03) and higher depressive symptoms (P < .001), whereas combined chemotherapy and radiation treatment (P = .05) approached statistical significance. Conclusions About one in five post–adjuvant treatment breast cancer patients had elevated memory and/or executive function complaints that were statistically significantly associated with domain-specific NP test performances and depressive symptoms; combined chemotherapy and radiation treatment was also statistically significantly associated with memory complaints. 
These results and other emerging studies suggest that subjective cognitive complaints in part reflect objective NP performance, although their etiology and biology appear to be multifactorial, motivating further transdisciplinary research. PMID:23606729
Use of the Analysis of the Volatile Faecal Metabolome in Screening for Colorectal Cancer
2015-01-01
Diagnosis of colorectal cancer requires an invasive and expensive colonoscopy, which is usually carried out after a positive screening test. Unfortunately, existing screening tests lack specificity and sensitivity, hence many unnecessary colonoscopies are performed. Here we report on a potential new screening test for colorectal cancer based on the analysis of volatile organic compounds (VOCs) in the headspace of faecal samples. Faecal samples were obtained from subjects who had a positive faecal occult blood test (FOBT). Subjects subsequently had colonoscopies performed to classify them into low risk (non-cancer) and high risk (colorectal cancer) groups. Volatile organic compounds were analysed by selected ion flow tube mass spectrometry (SIFT-MS) and then data were analysed using both univariate and multivariate statistical methods. Ions most likely from hydrogen sulphide, dimethyl sulphide and dimethyl disulphide are statistically significantly higher in samples from high risk rather than low risk subjects. Results using multivariate methods show that the test gives a correct classification of 75% with 78% specificity and 72% sensitivity on FOBT positive samples, offering a potentially effective alternative to FOBT. PMID:26086914
Moseson, Heidi; Gerdts, Caitlin; Dehlendorf, Christine; Hiatt, Robert A; Vittinghoff, Eric
2017-12-21
The list experiment is a promising measurement tool for eliciting truthful responses to stigmatized or sensitive health behaviors. However, investigators may be hesitant to adopt the method due to previously untestable assumptions and the perceived inability to conduct multivariable analysis. With a recently developed statistical test that can detect the presence of a design effect - the absence of which is a central assumption of the list experiment method - we sought to test the validity of a list experiment conducted on self-reported abortion in Liberia. We also aim to introduce recently developed multivariable regression estimators for the analysis of list experiment data, to explore relationships between respondent characteristics and having had an abortion - an important component of understanding the experiences of women who have abortions. To test the null hypothesis of no design effect in the Liberian list experiment data, we calculated the percentage of each respondent "type," characterized by response to the control items, and compared these percentages across treatment and control groups with a Bonferroni-adjusted alpha criterion. We then implemented two least squares and two maximum likelihood models (four total), each representing different bias-variance trade-offs, to estimate the association between respondent characteristics and abortion. We find no clear evidence of a design effect in list experiment data from Liberia (p = 0.18), affirming the first key assumption of the method. Multivariable analyses suggest a negative association between education and history of abortion. The retrospective nature of measuring lifetime experience of abortion, however, complicates interpretation of results, as the timing and safety of a respondent's abortion may have influenced her ability to pursue an education. 
Our work demonstrates that multivariable analyses, as well as statistical testing of a key design assumption, are possible with list experiment data, although with important limitations when considering lifetime measures. We outline how to implement this methodology with list experiment data in future research.
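For orientation, the basic list experiment estimate that the multivariable estimators build on is a difference in means: the control group reports how many of the control items apply, the treatment group sees the same list plus the sensitive item, and the gap between the two mean counts estimates prevalence. The sketch below uses made-up counts and a hypothetical function name; it does not implement the design-effect test or the regression estimators described in the abstract.

```python
def list_experiment_prevalence(control_counts, treatment_counts):
    """Difference-in-means estimate of the sensitive item's prevalence."""
    mean_control = sum(control_counts) / len(control_counts)
    mean_treatment = sum(treatment_counts) / len(treatment_counts)
    # Under the no-design-effect assumption, the extra item alone explains the gap.
    return mean_treatment - mean_control

# Hypothetical item counts, for illustration only.
control = [1, 2, 2, 3]
treatment = [2, 2, 3, 2]
estimate = list_experiment_prevalence(control, treatment)
```

The estimate is only interpretable if adding the sensitive item does not change how respondents answer the control items, which is the assumption the design-effect test examines.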
Normalization methods in time series of platelet function assays
Van Poucke, Sven; Zhang, Zhongheng; Roest, Mark; Vukicevic, Milan; Beran, Maud; Lauwereins, Bart; Zheng, Ming-Hua; Henskens, Yvonne; Lancé, Marcus; Marcus, Abraham
2016-01-01
Platelet function can be quantitatively assessed by specific assays such as light-transmission aggregometry, multiple-electrode aggregometry measuring the response to adenosine diphosphate (ADP), arachidonic acid, collagen, and thrombin-receptor activating peptide, and viscoelastic tests such as rotational thromboelastometry (ROTEM). Extracting meaningful statistical and clinical information from high-dimensional temporal clinical data represented as multivariate time series is a complex task. Building insightful visualizations for multivariate time series demands adequate use of normalization techniques. In this article, various methods for data normalization (z-transformation, range transformation, proportion transformation, and interquartile range) are presented and visualized, and the most suitable approach for platelet function data series is discussed. Normalization was calculated per assay (test) for all time points and per time point for all tests. Interquartile range, range transformation, and z-transformation demonstrated the correlation as calculated by the Spearman correlation test, when normalized per assay (test) for all time points. When normalizing per time point for all tests, no correlation could be abstracted from the charts, as was the case when using all data as 1 dataset for normalization. PMID:27428217
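The four normalization methods named above can be sketched for a single assay's series as follows. The abstract does not give formulas, so these are common textbook definitions and should be read as assumptions; in particular, the interquartile-range variant here centers on the median and divides by the IQR.

```python
import numpy as np

def z_transform(x):
    """Center on the mean and scale by the sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def range_transform(x):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def proportion_transform(x):
    """Express each value as a fraction of the series total."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def iqr_transform(x):
    """Center on the median and scale by the interquartile range (assumed form)."""
    x = np.asarray(x, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return (x - median) / (q3 - q1)
```

Normalizing "per assay for all time points" corresponds to applying one of these functions to each assay's series separately; normalizing "per time point for all tests" applies it across assays at a fixed time.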
A multi-analyte serum test for the detection of non-small cell lung cancer
Farlow, E C; Vercillo, M S; Coon, J S; Basu, S; Kim, A W; Faber, L P; Warren, W H; Bonomi, P; Liptay, M J; Borgia, J A
2010-01-01
Background: In this study, we appraised a wide assortment of biomarkers previously shown to have diagnostic or prognostic value for non-small cell lung cancer (NSCLC) with the intent of establishing a multi-analyte serum test capable of identifying patients with lung cancer. Methods: Circulating levels of 47 biomarkers were evaluated against patient cohorts consisting of 90 NSCLC and 43 non-cancer controls using commercial immunoassays. Multivariate statistical methods were used on all biomarkers achieving statistical relevance to define an optimised panel of diagnostic biomarkers for NSCLC. The resulting biomarkers were fashioned into a classification algorithm and validated against serum from a second patient cohort. Results: A total of 14 analytes achieved statistical relevance upon evaluation. Multivariate statistical methods then identified a panel of six biomarkers (tumour necrosis factor-α, CYFRA 21-1, interleukin-1ra, matrix metalloproteinase-2, monocyte chemotactic protein-1 and sE-selectin) as being the most efficacious for diagnosing early stage NSCLC. When tested against a second patient cohort, the panel successfully classified 75 of 88 patients. Conclusions: Here, we report the development of a serum algorithm with high specificity for classifying patients with NSCLC against cohorts of various ‘high-risk' individuals. A high rate of false positives was observed within the cohort in which patients had non-neoplastic lung nodules, possibly as a consequence of the inflammatory nature of these conditions. PMID:20859284
Velasco-Tapia, Fernando
2014-01-01
Magmatic processes have usually been identified and evaluated using qualitative or semiquantitative geochemical or isotopic tools based on a restricted number of variables. However, a more complete and quantitative view could be reached by applying multivariate analysis, mass balance techniques, and statistical tests. As an example, in this work a statistical and quantitative scheme is applied to analyze the geochemical features of the Sierra de las Cruces (SC) volcanic range (Mexican Volcanic Belt). In this locality, the volcanic activity (3.7 to 0.5 Ma) was dominantly dacitic, but the presence of spheroidal andesitic enclaves and/or diverse disequilibrium features in the majority of lavas confirms the operation of magma mixing/mingling. New discriminant-function-based multidimensional diagrams were used to discriminate tectonic setting. Statistical tests of discordancy and significance were applied to evaluate the influence of the subducting Cocos plate, which seems to be rather negligible for the SC magmas in relation to several major and trace elements. A cluster analysis following Ward's linkage rule was carried out to classify the SC volcanic rocks into geochemical groups. Finally, two mass-balance schemes were applied for the quantitative evaluation of the proportion of the end-member components (dacitic and andesitic magmas) in the comingled lavas (binary mixtures). PMID:24737994
NASA Technical Reports Server (NTRS)
Morrissey, L. A.; Weinstock, K. J.; Mouat, D. A.; Card, D. H.
1984-01-01
An evaluation of Thematic Mapper Simulator (TMS) data for the geobotanical discrimination of rock types based on vegetative cover characteristics is addressed in this research. A methodology for accomplishing this evaluation utilizing univariate and multivariate techniques is presented. TMS data acquired with a Daedalus DEI-1260 multispectral scanner were integrated with vegetation and geologic information for subsequent statistical analyses, which included a chi-square test, an analysis of variance, stepwise discriminant analysis, and Duncan's multiple range test. Results indicate that ultramafic rock types are spectrally separable from nonultramafics based on vegetative cover through the use of statistical analyses.
NASA Astrophysics Data System (ADS)
Coelho, Carlos A.; Marques, Filipe J.
2013-09-01
In this paper the authors combine the equicorrelation and equivariance test introduced by Wilks [13] with the likelihood ratio test (l.r.t.) for independence of groups of variables to obtain the l.r.t. of block equicorrelation and equivariance. This test, or its single-block version, may find applications in many areas, such as psychology, education, medicine, and genetics, and is important "in many tests of multivariate analysis, e.g. in MANOVA, Profile Analysis, Growth Curve analysis, etc" [12, 9]. By decomposing the overall hypothesis into the hypotheses of independence of groups of variables and the hypothesis of equicorrelation and equivariance we are able to obtain the expressions for the overall l.r.t. statistic and its moments. From these we obtain a suitable factorization of the characteristic function (c.f.) of the logarithm of the l.r.t. statistic, which enables us to develop highly manageable and precise near-exact distributions for the test statistic.
Fusion And Inference From Multiple And Massive Disparate Distributed Dynamic Data Sets
2017-07-01
principled methodology for two-sample graph testing; designed a provably almost-surely perfect vertex clustering algorithm for block model graphs; proved...3.7 Semi-Supervised Clustering Methodology ...................................................................... 9 3.8 Robust Hypothesis Testing...dimensional Euclidean space – allows the full arsenal of statistical and machine learning methodology for multivariate Euclidean data to be deployed for
Chiu, Chi-yang; Jung, Jeesun; Chen, Wei; Weeks, Daniel E; Ren, Haobo; Boehnke, Michael; Amos, Christopher I; Liu, Aiyi; Mills, James L; Ting Lee, Mei-ling; Xiong, Momiao; Fan, Ruzong
2017-01-01
To analyze next-generation sequencing data, multivariate functional linear models are developed for a meta-analysis of multiple studies to connect genetic variant data to multiple quantitative traits adjusting for covariates. The goal is to take advantage of both meta-analysis and pleiotropic analysis in order to improve power and to carry out a unified association analysis of multiple studies and multiple traits of complex disorders. Three types of approximate F-distributions based on Pillai–Bartlett trace, Hotelling–Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants. Simulation analysis is performed to evaluate false-positive rates and power of the proposed tests. The proposed methods are applied to analyze lipid traits in eight European cohorts. It is shown that it is more advantageous to perform multivariate analysis than univariate analysis in general, and it is more advantageous to perform meta-analysis of multiple studies instead of analyzing the individual studies separately. The proposed models require individual observations. The value of the current paper can be seen at least for two reasons: (a) the proposed methods can be applied to studies that have individual genotype data; (b) the proposed methods can be used as a criterion for future work that uses summary statistics to build test statistics to meta-analyze the data. PMID:28000696
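The three multivariate statistics named above are all functions of the eigenvalues of E⁻¹H, where H and E denote the hypothesis and error sums-of-squares-and-cross-products matrices. The sketch below assumes H and E have already been computed from a fitted multivariate linear model; the function name is hypothetical, and the approximate F-distributions used in the paper are not reproduced.

```python
import numpy as np

def manova_statistics(H, E):
    """Pillai-Bartlett trace, Hotelling-Lawley trace and Wilks's Lambda from H and E."""
    H = np.asarray(H, dtype=float)
    E = np.asarray(E, dtype=float)
    # Eigenvalues of E^{-1} H; real and non-negative in theory, so drop numerical noise.
    eigenvalues = np.linalg.eigvals(np.linalg.solve(E, H)).real
    eigenvalues = np.clip(eigenvalues, 0.0, None)
    return {
        "pillai_bartlett": float(np.sum(eigenvalues / (1.0 + eigenvalues))),
        "hotelling_lawley": float(np.sum(eigenvalues)),
        "wilks_lambda": float(np.prod(1.0 / (1.0 + eigenvalues))),
    }
```

Wilks's Lambda computed this way equals det(E) / det(E + H), which offers a quick consistency check on any implementation.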
Xu, Min; Zhang, Lei; Yue, Hong-Shui; Pang, Hong-Wei; Ye, Zheng-Liang; Ding, Li
2017-10-01
This study aimed to establish an on-line monitoring method for the extraction process of Schisandrae Chinensis Fructus, a formula medicinal material of Yiqi Fumai lyophilized injection, by combining near infrared spectroscopy with multivariate data analysis technology. The multivariate statistical process control (MSPC) model was established based on 5 normal production batches, and 2 test batches were monitored using PC scores, DModX and Hotelling T² control charts. The results showed that the MSPC model had a good monitoring ability for the extraction process. Applying the MSPC model to the actual production process could effectively achieve on-line monitoring of the extraction process of Schisandrae Chinensis Fructus and reflect changes in material properties during production in real time. The established process monitoring method could provide a reference for the application of process analysis technology in the process quality control of traditional Chinese medicine injections. Copyright© by the Chinese Pharmaceutical Association.
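A Hotelling T² monitoring statistic of the kind used in such MSPC charts can be sketched as below. This is an illustration under stated assumptions: a PCA model is fitted on mean-centered reference (normal-batch) data without further scaling, and T² for a new observation is the sum of its squared scores divided by the reference score variances. Control limits and the DModX statistic are not shown, and the function names are hypothetical.

```python
import numpy as np

def fit_pca_monitor(X_ref, n_components):
    """Fit a PCA model on reference data; return mean, loadings and score variances."""
    X_ref = np.asarray(X_ref, dtype=float)
    mean = X_ref.mean(axis=0)
    # SVD of the centered data gives the principal directions in the rows of Vt.
    _, singular_values, Vt = np.linalg.svd(X_ref - mean, full_matrices=False)
    loadings = Vt[:n_components].T
    score_variances = singular_values[:n_components] ** 2 / (X_ref.shape[0] - 1)
    return mean, loadings, score_variances

def hotelling_t2(X_new, mean, loadings, score_variances):
    """Hotelling T^2 of each row of X_new within the fitted PCA model."""
    scores = (np.asarray(X_new, dtype=float) - mean) @ loadings
    return np.sum(scores ** 2 / score_variances, axis=1)
```

Observations whose T² exceeds a chosen control limit would be flagged as deviating from normal operation along the modeled directions.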
Kocer, Naci; Mondel, Prabath Kumar; Yamac, Elif; Kavak, Ayse; Kizilkilic, Osman; Islak, Civan
2017-11-01
Flow diverters are increasingly used in the treatment of complex and giant intracranial aneurysms. However, they are associated with complications like late aneurysmal rupture. Additionally, flow diverters show a focal structural decrease in luminal diameter without any intimal hyperplasia. This resembles a "fish mouth" when viewed en face. In this pilot study, we tested the hypothesis of a possible association between flow diverter fish-mouthing and delayed-type hypersensitivity to its metal constituents. We retrospectively reviewed patient records from our center between May 2010 and November 2015. A total of nine patients had flow diverter fish mouthing. A control group of 25 patients was selected. All study participants underwent a prospective patch test to detect hypersensitivity to flow diverter metal constituents. Analysis was performed using logistic regression analysis and Wilcoxon sign rank sum test. Univariate and multivariate analyses were performed to test variables to predict flow diverter fish mouthing. The association between flow diverter fish mouthing and positive patch test was not statistically significant. In multivariate analysis, history of allergy and maximum aneurysm size category were associated with flow diverter fish mouthing. This was further confirmed on Wilcoxon sign rank sum test. The study showed a statistically significant association between flow diverter fish mouthing and history of contact allergy and a small aneurysmal size. Further large-scale studies are needed to detect a statistically significant association between flow diverter fish mouthing and patch test. We recommend early and more frequent follow-up imaging in patients with contact allergy to detect flow diverter fish mouthing and its subsequent evolution.
Giordano, Bruno L.; Kayser, Christoph; Rousselet, Guillaume A.; Gross, Joachim; Schyns, Philippe G.
2016-01-01
We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open‐source Matlab and Python code implementing the new methods accompanies this article. Hum Brain Mapp 38:1541–1573, 2017. © 2016 Wiley Periodicals, Inc. PMID:27860095
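The combination of copulas with the Gaussian entropy formula can be illustrated for two univariate continuous variables. The sketch below is an assumption-laden simplification, not the authors' released code: each variable is rank-transformed to a standard normal, and mutual information is computed from the closed form for a bivariate Gaussian, I = -0.5 log2(1 - r²). Function names are hypothetical, and bias correction and multidimensional variables are omitted.

```python
import numpy as np
from statistics import NormalDist

def copula_normalize(x):
    """Map a sample to standard-normal quantiles of its empirical ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1        # ranks 1..n (ties broken by position)
    uniform = ranks / (x.size + 1)               # strictly inside (0, 1)
    inverse_cdf = NormalDist().inv_cdf
    return np.array([inverse_cdf(u) for u in uniform])

def gaussian_copula_mi(x, y):
    """Mutual information estimate in bits between two univariate samples."""
    gx, gy = copula_normalize(x), copula_normalize(y)
    r = np.corrcoef(gx, gy)[0, 1]
    # Closed-form mutual information of a bivariate Gaussian with correlation r.
    return -0.5 * np.log2(1.0 - r ** 2)
```

Because only ranks enter the calculation, the estimate is unchanged by any strictly increasing transformation of either variable, which is one reason the approach is robust to the marginal distributions.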
NASA Astrophysics Data System (ADS)
Ghanate, A. D.; Kothiwale, S.; Singh, S. P.; Bertrand, Dominique; Krishna, C. Murali
2011-02-01
Cancer is now recognized as one of the major causes of morbidity and mortality. Histopathological diagnosis, the gold standard, is shown to be subjective, time consuming, prone to interobserver disagreement, and often fails to predict prognosis. Optical spectroscopic methods are being contemplated as adjuncts or alternatives to conventional cancer diagnostics. The most important aspect of these approaches is their objectivity, and multivariate statistical tools play a major role in realizing it. However, rigorous evaluation of the robustness of spectral models is a prerequisite. The utility of Raman spectroscopy in the diagnosis of cancers has been well established. Until now, the specificity and applicability of spectral models have been evaluated for specific cancer types. In this study, we have evaluated the utility of spectroscopic models representing normal and malignant tissues of the breast, cervix, colon, larynx, and oral cavity in a broader perspective, using different multivariate tests. The limit test, which was used in our earlier study, gave high sensitivity but suffered from poor specificity. The performance of other methods such as factorial discriminant analysis and partial least square discriminant analysis is on par with that of more complex nonlinear methods such as decision trees, but they provide very little information about the classification model. This comparative study thus demonstrates not just the efficacy of Raman spectroscopic models but also the applicability and limitations of different multivariate tools for discrimination under complex conditions such as the multicancer scenario.
The Statistical Consulting Center for Astronomy (SCCA)
NASA Technical Reports Server (NTRS)
Akritas, Michael
2001-01-01
The process by which raw astronomical data acquisition is transformed into scientifically meaningful results and interpretation typically involves many statistical steps. Traditional astronomy limits itself to a narrow range of old and familiar statistical methods: means and standard deviations; least-squares methods like χ² minimization; and simple nonparametric procedures such as the Kolmogorov-Smirnov tests. These tools are often inadequate for the complex problems and datasets under investigation, and recent years have witnessed an increased usage of maximum-likelihood, survival analysis, multivariate analysis, wavelet and advanced time-series methods. The Statistical Consulting Center for Astronomy (SCCA) assisted astronomers in using sophisticated tools and in matching these tools to specific problems. The SCCA operated with two professors of statistics and a professor of astronomy working together. Questions were received by e-mail, and were discussed in detail with the questioner. Summaries of those questions and answers leading to new approaches were posted on the Web (www.state.psu.edu/ mga/SCCA). In addition to serving individual astronomers, the SCCA established a Web site for general use that provides hypertext links to selected on-line public-domain statistical software and services. The StatCodes site (www.astro.psu.edu/statcodes) provides over 200 links in the areas of: Bayesian statistics; censored and truncated data; correlation and regression; density estimation and smoothing; general statistics packages and information; image analysis; interactive Web tools; multivariate analysis; multivariate clustering and classification; nonparametric analysis; software written by astronomers; spatial statistics; statistical distributions; time series analysis; and visualization tools. StatCodes has received a remarkably high and constant hit rate of 250 hits/week (over 10,000/year) since its inception in mid-1997. 
It is of interest to scientists both within and outside of astronomy. The most popular sections are multivariate techniques, image analysis, and time series analysis. Hundreds of copies of the ASURV, SLOPES and CENS-TAU codes developed by SCCA scientists were also downloaded from the StatCodes site. In addition to formal SCCA duties, SCCA scientists continued a variety of related activities in astrostatistics, including refereeing of statistically oriented papers submitted to the Astrophysical Journal, talks in meetings including Feigelson's talk to science journalists entitled "The reemergence of astrostatistics" at the American Association for the Advancement of Science meeting, and published papers of astrostatistical content.
Keenan, Michael R; Smentkowski, Vincent S; Ulfig, Robert M; Oltman, Edward; Larson, David J; Kelly, Thomas F
2011-06-01
We demonstrate for the first time that multivariate statistical analysis techniques can be applied to atom probe tomography data to estimate the chemical composition of a sample at the full spatial resolution of the atom probe in three dimensions. Whereas the raw atom probe data provide the specific identity of an atom at a precise location, the multivariate results can be interpreted in terms of the probabilities that an atom representing a particular chemical phase is situated there. When aggregated to the size scale of a single atom (∼0.2 nm), atom probe spectral-image datasets are huge and extremely sparse. In fact, the average spectrum will have somewhat less than one total count per spectrum due to imperfect detection efficiency. These conditions, under which the variance in the data is completely dominated by counting noise, test the limits of multivariate analysis, and an extensive discussion of how to extract the chemical information is presented. Efficient numerical approaches to performing principal component analysis (PCA) on these datasets, which may number hundreds of millions of individual spectra, are put forward, and it is shown that PCA can be computed in a few seconds on a typical laptop computer.
Assessing Cultural Competence in Graduating Students
ERIC Educational Resources Information Center
Kohli, Hermeet K.; Kohli, Amarpreet S.; Huber, Ruth; Faul, Anna C.
2010-01-01
The twofold purpose of this study was to develop a framework to understand cultural competence in graduating social work students, and to test that framework for appropriateness and predictability using multivariate statistics. Scale and predictor variables were collected using an online instrument from a nationwide convenience sample of graduating…
Corron, Louise; Marchal, François; Condemi, Silvana; Chaumoître, Kathia; Adalian, Pascal
2017-01-01
Juvenile age estimation methods used in forensic anthropology generally lack methodological consistency and/or statistical validity. Considering this, a standard approach using nonparametric Multivariate Adaptive Regression Splines (MARS) models was tested to predict age from iliac biometric variables of male and female juveniles from Marseilles, France, aged 0-12 years. Models using unidimensional (length and width) and bidimensional iliac data (module and surface) were constructed on a training sample of 176 individuals and validated on an independent test sample of 68 individuals. Results show that MARS prediction models using iliac width, module and area give overall better and statistically valid age estimates. These models integrate punctual nonlinearities of the relationship between age and osteometric variables. By constructing valid prediction intervals whose size increases with age, MARS models take into account the normal increase of individual variability. MARS models can qualify as a practical and standardized approach for juvenile age estimation. © 2016 American Academy of Forensic Sciences.
Gauging Skills of Hospital Security Personnel: a Statistically-driven, Questionnaire-based Approach
Rinkoo, Arvind Vashishta; Mishra, Shubhra; Rahesuddin; Nabi, Tauqeer; Chandra, Vidha; Chandra, Hem
2013-01-01
Objectives This study aims to gauge the technical and soft skills of the hospital security personnel so as to enable prioritization of their training needs. Methodology A cross sectional questionnaire based study was conducted in December 2011. Two separate predesigned and pretested questionnaires were used for gauging soft skills and technical skills of the security personnel. Extensive statistical analysis, including Multivariate Analysis (Pillai-Bartlett trace along with Multi-factorial ANOVA) and Post-hoc Tests (Bonferroni Test) was applied. Results The 143 participants performed better on the soft skills front with an average score of 6.43 and standard deviation of 1.40. The average technical skills score was 5.09 with a standard deviation of 1.44. The study avowed a need for formal hands on training with greater emphasis on technical skills. Multivariate analysis of the available data further helped in identifying 20 security personnel who should be prioritized for soft skills training and a group of 36 security personnel who should receive maximum attention during technical skills training. Conclusion This statistically driven approach can be used as a prototype by healthcare delivery institutions worldwide, after situation specific customizations, to identify the training needs of any category of healthcare staff. PMID:23559904
An Analysis of Methods Used to Examine Gender Differences in Computer-Related Behavior.
ERIC Educational Resources Information Center
Kay, Robin
1992-01-01
Review of research investigating gender differences in computer-related behavior examines statistical and methodological flaws. Issues addressed include sample selection, sample size, scale development, scale quality, the use of univariate and multivariate analyses, regressional analysis, construct definition, construct testing, and the…
A Primer on Multivariate Analysis of Variance (MANOVA) for Behavioral Scientists
ERIC Educational Resources Information Center
Warne, Russell T.
2014-01-01
Reviews of statistical procedures (e.g., Bangert & Baumberger, 2005; Kieffer, Reese, & Thompson, 2001; Warne, Lazo, Ramos, & Ritter, 2012) show that one of the most common multivariate statistical methods in psychological research is multivariate analysis of variance (MANOVA). However, MANOVA and its associated procedures are often not…
NASA Technical Reports Server (NTRS)
Wolf, S. F.; Lipschutz, M. E.
1993-01-01
Multivariate statistical analysis techniques (linear discriminant analysis and logistic regression) can provide powerful discrimination tools which are generally unfamiliar to the planetary science community. Fall parameters were used to identify a group of 17 H chondrites (Cluster 1) that were part of a coorbital stream which intersected Earth's orbit in May, from 1855 to 1895, and can be distinguished from all other H chondrite falls. Using multivariate statistical techniques, it was demonstrated that by a totally different criterion, labile trace element contents (hence thermal histories), 13 Cluster 1 meteorites are distinguishable from 45 non-Cluster 1 H chondrites. Here, we focus upon the principles of multivariate statistical techniques and illustrate their application using non-meteoritic and meteoritic examples.
Reporting Practices and Use of Quantitative Methods in Canadian Journal Articles in Psychology.
Counsell, Alyssa; Harlow, Lisa L
2017-05-01
With recent focus on the state of research in psychology, it is essential to assess the nature of the statistical methods and analyses used and reported by psychological researchers. To that end, we investigated the prevalence of different statistical procedures and the nature of statistical reporting practices in recent articles from the four major Canadian psychology journals. The majority of authors evaluated their research hypotheses through the use of analysis of variance (ANOVA), t-tests, and multiple regression. Multivariate approaches were less common. Null hypothesis significance testing remains a popular strategy, but the majority of authors reported a standardized or unstandardized effect size measure alongside their significance test results. Confidence intervals on effect sizes were infrequently employed. Many authors provided minimal details about their statistical analyses and less than a third of the articles reported on data complications such as missing data and violations of statistical assumptions. Strengths of and areas needing improvement for reporting quantitative results are highlighted. The paper concludes with recommendations for how researchers and reviewers can improve comprehension and transparency in statistical reporting.
Verbal Neuropsychological Functions in Aphasia: An Integrative Model
ERIC Educational Resources Information Center
Vigliecca, Nora Silvana; Báez, Sandra
2015-01-01
A theoretical framework which considers the verbal functions of the brain under a multivariate and comprehensive cognitive model was statistically analyzed. A confirmatory factor analysis was performed to verify whether some recognized aphasia constructs can be hierarchically integrated as latent factors from a homogenously verbal test. The Brief…
Influences of environment and disturbance on forest patterns in coastal Oregon watersheds.
Michael C. Wimberly; Thomas A. Spies
2001-01-01
Modern ecology often emphasizes the distinction between traditional theories of stable, environmentally structured communities and a new paradigm of disturbance driven, nonequilibrium dynamics. However, multiple hypotheses for observed vegetation patterns have seldom been explicitly tested. We used multivariate statistics and variation partitioning methods to assess...
NASA Technical Reports Server (NTRS)
Tripp, John S.; Tcheng, Ping
1999-01-01
Statistical tools, previously developed for nonlinear least-squares estimation of multivariate sensor calibration parameters and the associated calibration uncertainty analysis, have been applied to single- and multiple-axis inertial model attitude sensors used in wind tunnel testing to measure angle of attack and roll angle. The analysis provides confidence and prediction intervals of calibrated sensor measurement uncertainty as functions of applied input pitch and roll angles. A comparative performance study of various experimental designs for inertial sensor calibration is presented along with corroborating experimental data. The importance of replicated calibrations over extended time periods has been emphasized; replication provides independent estimates of calibration precision and bias uncertainties, statistical tests for calibration or modeling bias uncertainty, and statistical tests for sensor parameter drift over time. A set of recommendations for a new standardized model attitude sensor calibration method and usage procedures is included. The statistical information provided by these procedures is necessary for the uncertainty analysis of aerospace test results now required by users of industrial wind tunnel test facilities.
Ince, Robin A A; Giordano, Bruno L; Kayser, Christoph; Rousselet, Guillaume A; Gross, Joachim; Schyns, Philippe G
2017-03-01
We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open-source Matlab and Python code implementing the new methods accompanies this article. Hum Brain Mapp 38:1541-1573, 2017. © 2016 Wiley Periodicals, Inc. 2016 The Authors Human Brain Mapping Published by Wiley Periodicals, Inc.
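The combination of copulas with the closed-form Gaussian entropy can be illustrated for the simplest case of two one-dimensional variables. The following is a minimal sketch, not the authors' released code: it rank-transforms each variable to standard-normal quantiles and then applies the bivariate Gaussian formula I = -0.5 log2(1 - r²). The function names are hypothetical, and the sketch assumes no tied values and no perfectly monotonic relationship (r = ±1 would make the logarithm undefined).

```python
import math
from statistics import NormalDist


def copula_normalize(x):
    """Replace each value by the standard-normal quantile of its empirical rank."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    nd = NormalDist()
    return [nd.inv_cdf(r / (n + 1)) for r in ranks]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def gaussian_copula_mi(x, y):
    """Mutual information estimate in bits from copula-normalized data."""
    r = pearson(copula_normalize(x), copula_normalize(y))
    return -0.5 * math.log2(1.0 - r * r)


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]
print(gaussian_copula_mi(x, y))
```

Because only ranks enter the calculation, the estimate is unchanged by any strictly increasing transformation of either variable, which is one reason such an estimator is robust to outliers and to the marginal distributions of the data.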
Analyzing Faculty Salaries When Statistics Fail.
ERIC Educational Resources Information Center
Simpson, William A.
The role played by nonstatistical procedures, in contrast to multivariate statistical approaches, in analyzing faculty salaries is discussed. Multivariate statistical methods are usually used to establish or defend against prima facie cases of gender and ethnic discrimination with respect to faculty salaries. These techniques are not applicable,…
Kuselman, Ilya; Pennecchi, Francesca R; da Silva, Ricardo J N B; Hibbert, D Brynn
2017-11-01
The probability of a false decision on conformity of a multicomponent material due to measurement uncertainty is discussed when test results are correlated. Specification limits of the components' content of such a material generate a multivariate specification interval/domain. When true values of components' content and corresponding test results are modelled by multivariate distributions (e.g. by multivariate normal distributions), a total global risk of a false decision on the material conformity can be evaluated based on calculation of integrals of their joint probability density function. No transformation of the raw data is required for that. A total specific risk can be evaluated as the joint posterior cumulative function of true values of a specific batch or lot lying outside the multivariate specification domain, when the vector of test results, obtained for the lot, is inside this domain. It was shown, using a case study of four components under control in a drug, that the correlation influence on the risk value is not easily predictable. To assess this influence, the evaluated total risk values were compared with those calculated for independent test results and also with those assuming much stronger correlation than that observed. While the observed statistically significant correlation did not lead to a visible difference in the total risk values in comparison to the independent test results, the stronger correlation among the variables caused either the total risk decreasing or its increasing, depending on the actual values of the test results. Copyright © 2017 Elsevier B.V. All rights reserved.
Multivariate Relationships between Statistics Anxiety and Motivational Beliefs
ERIC Educational Resources Information Center
Baloglu, Mustafa; Abbassi, Amir; Kesici, Sahin
2017-01-01
In general, anxiety has been found to be associated with motivational beliefs and the current study investigated multivariate relationships between statistics anxiety and motivational beliefs among 305 college students (60.0% women). The Statistical Anxiety Rating Scale, the Motivated Strategies for Learning Questionnaire, and a set of demographic…
Felix, Leonardo Bonato; Miranda de Sá, Antonio Mauricio Ferreira Leite; Infantosi, Antonio Fernando Catelli; Yehia, Hani Camille
2007-03-01
The presence of cerebral evoked responses can be tested by using objective response detectors. They are statistical tests that provide a threshold above which responses can be assumed to have occurred. The detection power depends on the signal-to-noise ratio (SNR) of the response and the amount of data available. However, the correlation within the background noise could also affect the power of such detectors. For a fixed SNR, the detection can only be improved at the expense of using a longer stretch of signal. This can constitute a limitation, for instance, in monitored surgeries. Alternatively, multivariate objective response detection (MORD) could be used. This work applies two MORD techniques (multiple coherence and multiple component synchrony measure) to EEG data collected during intermittent photic stimulation. They were evaluated through Monte Carlo simulations, which also allowed verifying that correlation in the background reduces the detection rate. Considering the N EEG derivations as close as possible to the primary visual cortex, if N = 4, 6 or 8, multiple coherence leads to a statistically significant higher detection rate in comparison with multiple component synchrony measure. With the former, the best performance was obtained with six signals (O1, O2, T5, T6, P3 and P4).
Multivariate statistical approach to estimate mixing proportions for unknown end members
Valder, Joshua F.; Long, Andrew J.; Davis, Arden D.; Kenner, Scott J.
2012-01-01
A multivariate statistical method is presented, which includes principal components analysis (PCA) and an end-member mixing model to estimate unknown end-member hydrochemical compositions and the relative mixing proportions of those end members in mixed waters. PCA, together with the Hotelling T² statistic and a conceptual model of groundwater flow and mixing, was used in selecting samples that best approximate end members, which then were used as initial values in optimization of the end-member mixing model. This method was tested on controlled datasets (i.e., true values of estimates were known a priori) and found effective in estimating these end members and mixing proportions. The controlled datasets included synthetically generated hydrochemical data, synthetically generated mixing proportions, and laboratory analyses of sample mixtures, which were used in an evaluation of the effectiveness of this method for potential use in actual hydrological settings. For three different scenarios tested, correlation coefficients (R²) for linear regression between the estimated and known values ranged from 0.968 to 0.993 for mixing proportions and from 0.839 to 0.998 for end-member compositions. The method also was applied to field data from a study of end-member mixing in groundwater as a field example and partial method validation.
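The core of an end-member mixing model can be shown in a much simpler setting than the one in the abstract. The sketch below assumes two end members whose compositions are already known and solves for a single mixing fraction by least squares. The published method instead treats the end members as unknown and optimizes them jointly, so this is an illustration of the mixing equation only; the concentrations and the name `mixing_fraction` are hypothetical.

```python
def mixing_fraction(sample, end_a, end_b):
    """Least-squares fraction f of end member A in a two-member mixture.

    Model: sample[j] ~ f * end_a[j] + (1 - f) * end_b[j] for each solute j.
    The result is clamped to the physically meaningful range [0, 1].
    """
    num = sum((s - b) * (a - b) for s, a, b in zip(sample, end_a, end_b))
    den = sum((a - b) ** 2 for a, b in zip(end_a, end_b))
    if den == 0:
        raise ValueError("end members are identical; fraction is undefined")
    return min(1.0, max(0.0, num / den))


# Hypothetical concentrations of three solutes (arbitrary units).
end_a = [10.0, 200.0, 5.0]
end_b = [50.0, 100.0, 1.0]
mixed = [40.0, 125.0, 2.0]  # constructed as 0.25 * A + 0.75 * B

print(mixing_fraction(mixed, end_a, end_b))
```

With more than two end members the same idea becomes a constrained least-squares problem in which the fractions are non-negative and sum to one.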
Robust Optimum Invariant Tests for Random MANOVA Models.
1986-10-01
are assumed to be independent normal with zero mean and dispersion σ² and σ₁² respectively, Roy and Gnanadesikan (1959) considered the problem of...Part II: The multivariate case. Ann. Math. Statist. 31, 939-968. [7] Roy, S.N. and Gnanadesikan, R. (1959). Some contributions to ANOVA in one or more
Interactive visual analysis promotes exploration of long-term ecological data
T.N. Pham; J.A. Jones; R. Metoyer; F.J. Swanson; R.J. Pabst
2013-01-01
Long-term ecological data are crucial in helping ecologists understand ecosystem function and environmental change. Nevertheless, these kinds of data sets are difficult to analyze because they are usually large, multivariate, and spatiotemporal. Although existing analysis tools such as statistical methods and spreadsheet software permit rigorous tests of pre-conceived...
Prolonged Instability Prior to a Regime Shift | Science ...
Regime shifts are generally defined as the point of ‘abrupt’ change in the state of a system. However, a seemingly abrupt transition can be the product of a system reorganization that has been ongoing much longer than is evident in statistical analysis of a single component of the system. Using both univariate and multivariate statistical methods, we tested a long-term high-resolution paleoecological dataset with a known change in species assemblage for a regime shift. Analysis of this dataset with Fisher Information and multivariate time series modeling showed that there was a ∼2000-year period of instability prior to the regime shift. This period of instability and the subsequent regime shift coincide with regional climate change, indicating that the system is undergoing extrinsic forcing. Paleoecological records offer a unique opportunity to test tools for the detection of thresholds and stable states, and thus to examine the long-term stability of ecosystems over periods of multiple millennia. This manuscript explores various methods of assessing the transition between alternative states in an ecological system described by a long-term high-resolution paleoecological dataset.
A novel examination of atypical major depressive disorder based on attachment theory.
Levitan, Robert D; Atkinson, Leslie; Pedersen, Rebecca; Buis, Tom; Kennedy, Sidney H; Chopra, Kevin; Leung, Eman M; Segal, Zindel V
2009-06-01
While a large body of descriptive work has thoroughly investigated the clinical correlates of atypical depression, little is known about its fundamental origins. This study examined atypical depression from an attachment theory framework. Our hypothesis was that, compared to adults with melancholic depression, those with atypical depression would report more anxious-ambivalent attachment and less secure attachment. As gender has been an important consideration in prior work on atypical depression, this same hypothesis was further tested in female subjects only. One hundred ninety-nine consecutive adults presenting to a tertiary mood disorders clinic with major depressive disorder with either atypical or melancholic features according to the Structured Clinical Interview for DSM-IV Axis-I Disorders were administered a self-report adult attachment questionnaire to assess the core dimensions of secure, anxious-ambivalent, and avoidant attachment. Attachment scores were compared across the 2 depressed groups defined by atypical and melancholic features using multivariate analysis of variance. The study was conducted between 1999 and 2004. When men and women were considered together, the multivariate test comparing attachment scores by depressive group was statistically significant at p < .05. Between-subjects testing indicated that atypical depression was associated with significantly lower secure attachment scores, with a trend toward higher anxious-ambivalent attachment scores, than was melancholia. When women were analyzed separately, the multivariate test was statistically significant at p < .01, with both secure and anxious-ambivalent attachment scores differing significantly across depressive groups. These preliminary findings suggest that attachment theory, and insecure and anxious-ambivalent attachment in particular, may be a useful framework from which to study the origins, clinical correlates, and treatment of atypical depression. 
Gender may be an important consideration when considering atypical depression from an attachment perspective. Copyright 2009 Physicians Postgraduate Press, Inc.
Warton, David I; Thibaut, Loïc; Wang, Yi Alice
2017-01-01
Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)—common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of “model-free bootstrap”, adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods. PMID:28738071
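For count data, a PIT-residual is obtained by evaluating the fitted cumulative distribution function at the observation, with a uniform jitter across the jump of the discrete distribution. The following is a minimal sketch for a Poisson marginal with known means. Resampling the residuals with replacement and mapping them back through each observation's quantile function is an assumption of this sketch about how a PIT-residual bootstrap can be carried out, not a transcription of the authors' algorithm; all function names are hypothetical.

```python
import math
import random


def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return min(total, 1.0)


def pit_residual(y, mu, rng):
    """Randomized PIT-residual: uniform draw between F(y - 1) and F(y)."""
    lower = poisson_cdf(y - 1, mu)
    upper = poisson_cdf(y, mu)
    return lower + rng.random() * (upper - lower)


def poisson_quantile(u, mu, k_max=10_000):
    """Smallest k with F(k) >= u (search capped at k_max)."""
    k = 0
    while poisson_cdf(k, mu) < u and k < k_max:
        k += 1
    return k


def pit_bootstrap_sample(ys, mus, rng):
    """Resample PIT-residuals with replacement, then back-transform to counts."""
    us = [pit_residual(y, m, rng) for y, m in zip(ys, mus)]
    resampled = [rng.choice(us) for _ in us]
    return [poisson_quantile(u, m) for u, m in zip(resampled, mus)]


rng = random.Random(0)
ys = [0, 3, 1, 7]
mus = [0.5, 2.0, 1.5, 6.0]  # hypothetical fitted means
print(pit_bootstrap_sample(ys, mus, rng))
```

Because each resampled residual is pushed back through the quantile function of the observation it is assigned to, every bootstrap count is drawn on the scale of that observation's own fitted distribution, which is the sense in which the marginal distribution is preserved.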
Luo, Li; Zhu, Yun; Xiong, Momiao
2012-01-01
The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistics for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistics, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T², collapsing method, multivariate and collapsing (CMC) method, individual χ² test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets. PMID:22651812
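Of the seven statistics compared, the collapsing method is the simplest to illustrate: each individual is reduced to a carrier or non-carrier of any rare variant in the region, and carrier frequencies are compared between cases and controls. The sketch below uses a Pearson χ² test on the resulting 2×2 table; the genotype matrices are made up for illustration, and the function names are hypothetical rather than taken from any published software.

```python
import math


def collapse(genotypes):
    """1 if an individual carries at least one rare allele in the region, else 0."""
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 df, no continuity correction)."""
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    if den == 0:
        raise ValueError("a row or column of the 2x2 table is empty")
    return n * (a * d - b * c) ** 2 / den


def collapsing_test(cases, controls):
    """Return (statistic, p-value) comparing carrier frequencies."""
    carriers_cases = sum(collapse(cases))
    carriers_controls = sum(collapse(controls))
    stat = chi2_2x2(
        carriers_cases, len(cases) - carriers_cases,
        carriers_controls, len(controls) - carriers_controls,
    )
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Made-up genotypes: rows are individuals, columns are rare variant sites.
cases = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(collapsing_test(cases, controls))
```

Collapsing discards which site carries the variant, which is the limitation the abstract raises about group tests ignoring differences in genetic effects among locations.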
Mujica Ascencio, Saul; Choe, ChunSik; Meinke, Martina C; Müller, Rainer H; Maksimov, George V; Wigger-Alberti, Walter; Lademann, Juergen; Darvin, Maxim E
2016-07-01
Propylene glycol is one of the known substances added in cosmetic formulations as a penetration enhancer. Recently, nanocrystals have been employed also to increase the skin penetration of active components. Caffeine is a component with many applications and its penetration into the epidermis is controversially discussed in the literature. In the present study, the penetration ability of two components, caffeine nanocrystals and propylene glycol, applied topically on porcine ear skin in the form of a gel, was investigated ex vivo using two confocal Raman microscopes operated at different excitation wavelengths (785 nm and 633 nm). Several depth profiles were acquired in the fingerprint region and different spectral ranges, i.e., 526-600 cm⁻¹ and 810-880 cm⁻¹, were chosen for independent analysis of caffeine and propylene glycol penetration into the skin, respectively. Multivariate statistical methods such as principal component analysis (PCA) and linear discriminant analysis (LDA) combined with Student's t-test were employed to calculate the maximum penetration depths of each substance (caffeine and propylene glycol). The results show that propylene glycol penetrates significantly deeper than caffeine (20.7-22.0 μm versus 12.3-13.0 μm) without any penetration enhancement effect on caffeine. The results confirm that different substances, even if applied onto the skin as a mixture, can penetrate differently. The penetration depths of caffeine and propylene glycol obtained using two different confocal Raman microscopes are comparable, showing that both types of microscopes are well suited for such investigations and that multivariate statistical PCA-LDA methods combined with Student's t-test are very useful for analyzing the penetration of different substances into the skin. Copyright © 2016 Elsevier B.V. All rights reserved.
Lindberg, Ann-Sofie; Oksa, Juha; Antti, Henrik; Malm, Christer
2015-01-01
Physical capacity has previously been deemed important for firefighters' physical work capacity, and aerobic fitness, muscular strength, and muscular endurance are the most frequently investigated parameters of importance. Traditionally, bivariate and multivariate linear regression statistics have been used to study relationships between physical capacities and work capacities among firefighters. An alternative way to handle datasets consisting of numerous correlated variables is to use multivariate projection analyses, such as Orthogonal Projection to Latent Structures. The first aim of the present study was to evaluate the prediction and predictive power of field and laboratory tests, respectively, on firefighters' physical work capacity on selected work tasks, and to study whether valid predictions could be achieved without anthropometric data. The second aim was to externally validate selected models. The third aim was to validate selected models on firefighters and on civilians. A total of 38 (26 men and 12 women) + 90 (38 men and 52 women) subjects were included in the models and the external validation, respectively. The best prediction (R²) and predictive power (Q²) of Stairs, Pulling, Demolition, Terrain, and Rescue work capacities included field tests (R² = 0.73 to 0.84, Q² = 0.68 to 0.82). The best external validation was for Stairs work capacity (R² = 0.80) and worst for Demolition work capacity (R² = 0.40). In conclusion, field and laboratory tests could equally well predict physical work capacities for firefighting work tasks, and models excluding anthropometric data were valid. The predictive power was satisfactory for all included work tasks except Demolition.
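The distinction between prediction (R²) and predictive power (Q²) can be illustrated with a small calculation: R² measures fit on the data used to build the model, whereas Q² is computed from predictions for observations that were held out. The sketch below substitutes ordinary least squares with one predictor and leave-one-out cross-validation for the Orthogonal Projection to Latent Structures models of the study, so it shows the definitions rather than the authors' procedure; the data and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def r2_q2(x, y):
    """Return (R2, Q2), with Q2 from leave-one-out cross-validation."""
    n = len(y)
    my = sum(y) / n
    ss_tot = sum((b - my) ** 2 for b in y)

    # R2: residuals of the model fitted to all observations.
    b0, b1 = fit_line(x, y)
    ss_res = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

    # Q2: each observation is predicted by a model fitted without it.
    press = 0.0
    for i in range(n):
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (c0 + c1 * x[i])) ** 2

    return 1.0 - ss_res / ss_tot, 1.0 - press / ss_tot


# Hypothetical test scores (x) and work-task performance (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
print(r2_q2(x, y))
```

For a least-squares fit, Q² computed this way cannot exceed R², and a large gap between the two is the usual warning sign of a model that fits its training data better than it predicts new observations.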
Technical Reports Prepared Under Contract N00014-76-C-0475.
1987-05-29
264 Approximations to Densities in Geometric H. Solomon 10/27/78 Probability M.A. Stephens 3. Technical Report No. Title Author Date 265 Sequential ...Certain Multivariate S. Iyengar 8/12/82 Normal Probabilities 323 EDF Statistics for Testing for the Gamma M.A. Stephens 8/13/82 Distribution with...20-85 Nets 360 Random Sequential Coding By Hamming Distance Yoshiaki Itoh 07-11-85 Herbert Solomon 361 Transforming Censored Samples And Testing Fit
Instrumental Neutron Activation Analysis and Multivariate Statistics for Pottery Provenance
NASA Astrophysics Data System (ADS)
Glascock, M. D.; Neff, H.; Vaughn, K. J.
2004-06-01
The application of instrumental neutron activation analysis and multivariate statistics to archaeological studies of ceramics and clays is described. A small pottery data set from the Nasca culture in southern Peru is presented for illustration.
Benson, Nsikak U.; Asuquo, Francis E.; Williams, Akan B.; Essien, Joseph P.; Ekong, Cyril I.; Akpabio, Otobong; Olajire, Abaas A.
2016-01-01
Trace metal (Cd, Cr, Cu, Ni and Pb) concentrations in benthic sediments were analyzed through a multi-step fractionation scheme to assess the levels and sources of contamination in estuarine, riverine and freshwater ecosystems in the Niger Delta (Nigeria). The degree of contamination was assessed using the individual contamination factors (ICF) and global contamination factor (GCF). Multivariate statistical approaches including principal component analysis (PCA), cluster analysis and correlation test were employed to evaluate the interrelationships and associated sources of contamination. The spatial distribution of metal concentrations followed the pattern Pb>Cu>Cr>Cd>Ni. Ecological risk index by ICF showed significant potential mobility and bioavailability for Cu, Cu and Ni. The ICF contamination trend in the benthic sediments at all studied sites was Cu>Cr>Ni>Cd>Pb. The principal component and agglomerative clustering analyses indicate that trace metal contamination in the ecosystems was influenced by multiple pollution sources. PMID:27257934
Multivariate postprocessing techniques for probabilistic hydrological forecasting
NASA Astrophysics Data System (ADS)
Hemri, Stephan; Lisniak, Dmytro; Klein, Bastian
2016-04-01
Hydrologic ensemble forecasts driven by atmospheric ensemble prediction systems need statistical postprocessing in order to account for systematic errors in terms of both mean and spread. Runoff is an inherently multivariate process with typical events lasting from hours in case of floods to weeks or even months in case of droughts. This calls for multivariate postprocessing techniques that yield well calibrated forecasts in univariate terms and ensure a realistic temporal dependence structure at the same time. To this end, the univariate ensemble model output statistics (EMOS; Gneiting et al., 2005) postprocessing method is combined with two different copula approaches that ensure multivariate calibration throughout the entire forecast horizon. These approaches comprise ensemble copula coupling (ECC; Schefzik et al., 2013), which preserves the dependence structure of the raw ensemble, and a Gaussian copula approach (GCA; Pinson and Girard, 2012), which estimates the temporal correlations from training observations. Both methods are tested in a case study covering three subcatchments of the river Rhine that represent different sizes and hydrological regimes: the Upper Rhine up to the gauge Maxau, the river Moselle up to the gauge Trier, and the river Lahn up to the gauge Kalkofen. The results indicate that both ECC and GCA are suitable for modelling the temporal dependences of probabilistic hydrologic forecasts (Hemri et al., 2015). References Gneiting, T., A. E. Raftery, A. H. Westveld, and T. Goldman (2005), Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation, Monthly Weather Review, 133(5), 1098-1118, DOI: 10.1175/MWR2904.1. Hemri, S., D. Lisniak, and B. Klein, Multivariate postprocessing techniques for probabilistic hydrological forecasting, Water Resources Research, 51(9), 7436-7451, DOI: 10.1002/2014WR016473. Pinson, P., and R. 
Girard (2012), Evaluating the quality of scenarios of short-term wind power generation, Applied Energy, 96, 12-20, DOI: 10.1016/j.apenergy.2011.11.004. Schefzik, R., T. L. Thorarinsdottir, and T. Gneiting (2013), Uncertainty quantification in complex simulation models using ensemble copula coupling, Statistical Science, 28, 616-640, DOI: 10.1214/13-STS443.
ERIC Educational Resources Information Center
Fouladi, Rachel T.
2000-01-01
Provides an overview of standard and modified normal theory and asymptotically distribution-free covariance and correlation structure analysis techniques and details Monte Carlo simulation results on Type I and Type II error control. Demonstrates through the simulation that robustness and nonrobustness of structure analysis techniques vary as a…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ladd-Lively, Jennifer L
2014-01-01
The objective of this work was to determine the feasibility of using on-line multivariate statistical process control (MSPC) for safeguards applications in natural uranium conversion plants. Multivariate statistical process control is commonly used throughout industry for the detection of faults. For safeguards applications in uranium conversion plants, faults could include the diversion of intermediate products such as uranium dioxide, uranium tetrafluoride, and uranium hexafluoride. This study was limited to a 100 metric ton of uranium (MTU) per year natural uranium conversion plant (NUCP) using the wet solvent extraction method for the purification of uranium ore concentrate. A key component in the multivariate statistical methodology is the Principal Component Analysis (PCA) approach for the analysis of data, development of the base case model, and evaluation of future operations. The PCA approach was implemented through the use of singular value decomposition of the data matrix where the data matrix represents normal operation of the plant. Component mole balances were used to model each of the process units in the NUCP. However, this approach could be applied to any data set. The monitoring framework developed in this research could be used to determine whether or not a diversion of material has occurred at an NUCP as part of an International Atomic Energy Agency (IAEA) safeguards system. This approach can be used to identify the key monitoring locations, as well as locations where monitoring is unimportant. Detection limits at the key monitoring locations can also be established using this technique. Several faulty scenarios were developed to test the monitoring framework after the base case or normal operating conditions of the PCA model were established. In all of the scenarios, the monitoring framework was able to detect the fault. Overall this study was successful at meeting the stated objective.
[Analysis of variance of repeated data measured by water maze with SPSS].
Qiu, Hong; Jin, Guo-qin; Jin, Ru-feng; Zhao, Wei-kang
2007-01-01
To introduce the method of analyzing repeated data measured by water maze with SPSS 11.0, and offer a reference statistical method to clinical and basic medicine researchers who take the design of repeated measures. Using repeated measures and multivariate analysis of variance (ANOVA) process of the general linear model in SPSS and giving comparison among different groups and different measure time pairwise. Firstly, Mauchly's test of sphericity should be used to judge whether there were relations among the repeatedly measured data. If any (P
2017-09-01
efficacy of statistical post-processing methods downstream of these dynamical model components with a hierarchical multivariate Bayesian approach to...Bayesian hierarchical modeling, Markov chain Monte Carlo methods , Metropolis algorithm, machine learning, atmospheric prediction 15. NUMBER OF PAGES...scale processes. However, this dissertation explores the efficacy of statistical post-processing methods downstream of these dynamical model components
Clinical validation of robot simulation of toothbrushing - comparative plaque removal efficacy
2014-01-01
Background Clinical validation of laboratory toothbrushing tests has important advantages. It was, therefore, the aim to demonstrate correlation of tooth cleaning efficiency of a new robot brushing simulation technique with clinical plaque removal. Methods Clinical programme: 27 subjects received dental cleaning prior to 3-day-plaque-regrowth-interval. Plaque was stained, photographically documented and scored using planimetrical index. Subjects brushed teeth 33–47 with three techniques (horizontal, rotating, vertical), each for 20s buccally and for 20s orally in 3 consecutive intervals. The force was calibrated, the brushing technique was video supported. Two different brushes were randomly assigned to the subject. Robot programme: Clinical brushing programmes were transferred to a 6-axis-robot. Artificial teeth 33–47 were covered with plaque-simulating substrate. All brushing techniques were repeated 7 times, results were scored according to clinical planimetry. All data underwent statistical analysis by t-test, U-test and multivariate analysis. Results The individual clinical cleaning patterns are well reproduced by the robot programmes. Differences in plaque removal are statistically significant for the two brushes, reproduced in clinical and robot data. Multivariate analysis confirms the higher cleaning efficiency for anterior teeth and for the buccal sites. Conclusions The robot tooth brushing simulation programme showed good correlation with clinically standardized tooth brushing. This new robot brushing simulation programme can be used for rapid, reproducible laboratory testing of tooth cleaning. PMID:24996973
Clinical validation of robot simulation of toothbrushing--comparative plaque removal efficacy.
Lang, Tomas; Staufer, Sebastian; Jennes, Barbara; Gaengler, Peter
2014-07-04
Clinical validation of laboratory toothbrushing tests has important advantages. It was, therefore, the aim to demonstrate correlation of tooth cleaning efficiency of a new robot brushing simulation technique with clinical plaque removal. Clinical programme: 27 subjects received dental cleaning prior to 3-day-plaque-regrowth-interval. Plaque was stained, photographically documented and scored using planimetrical index. Subjects brushed teeth 33-47 with three techniques (horizontal, rotating, vertical), each for 20s buccally and for 20s orally in 3 consecutive intervals. The force was calibrated, the brushing technique was video supported. Two different brushes were randomly assigned to the subject. Robot programme: Clinical brushing programmes were transferred to a 6-axis-robot. Artificial teeth 33-47 were covered with plaque-simulating substrate. All brushing techniques were repeated 7 times, results were scored according to clinical planimetry. All data underwent statistical analysis by t-test, U-test and multivariate analysis. The individual clinical cleaning patterns are well reproduced by the robot programmes. Differences in plaque removal are statistically significant for the two brushes, reproduced in clinical and robot data. Multivariate analysis confirms the higher cleaning efficiency for anterior teeth and for the buccal sites. The robot tooth brushing simulation programme showed good correlation with clinically standardized tooth brushing. This new robot brushing simulation programme can be used for rapid, reproducible laboratory testing of tooth cleaning.
Clinical Trials With Large Numbers of Variables: Important Advantages of Canonical Analysis.
Cleophas, Ton J
2016-01-01
Canonical analysis assesses the combined effects of a set of predictor variables on a set of outcome variables, but it is little used in clinical trials despite the omnipresence of multiple variables. The aim of this study was to assess the performance of canonical analysis as compared with traditional multivariate methods using multivariate analysis of covariance (MANCOVA). As an example, a simulated data file with 12 gene expression levels and 4 drug efficacy scores was used. The correlation coefficient between the 12 predictor and 4 outcome variables was 0.87 (P = 0.0001) meaning that 76% of the variability in the outcome variables was explained by the 12 covariates. Repeated testing after the removal of 5 unimportant predictor and 1 outcome variable produced virtually the same overall result. The MANCOVA identified identical unimportant variables, but it was unable to provide overall statistics. (1) Canonical analysis is remarkable, because it can handle many more variables than traditional multivariate methods such as MANCOVA can. (2) At the same time, it accounts for the relative importance of the separate variables, their interactions and differences in units. (3) Canonical analysis provides overall statistics of the effects of sets of variables, whereas traditional multivariate methods only provide the statistics of the separate variables. (4) Unlike other methods for combining the effects of multiple variables such as factor analysis/partial least squares, canonical analysis is scientifically entirely rigorous. (5) Limitations include that it is less flexible than factor analysis/partial least squares, because only 2 sets of variables are used and because multiple solutions instead of one is offered. We do hope that this article will stimulate clinical investigators to start using this remarkable method.
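The 76% figure quoted in this abstract is consistent with squaring the reported canonical correlation coefficient. As a short worked check (the symbol R_c is introduced here for illustration and does not appear in the abstract):

```latex
R_c = 0.87 \quad\Longrightarrow\quad R_c^{2} = 0.87^{2} = 0.7569 \approx 76\%
```

Read this way, the squared coefficient is the share of variability that the canonical variate of the 12 predictors has in common with the canonical variate of the 4 outcomes.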
Multivariate meta-analysis: potential and promise.
Jackson, Dan; Riley, Richard; White, Ian R
2011-09-10
The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day 'Multivariate meta-analysis' event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various view points and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd.
Multivariate meta-analysis: Potential and promise
Jackson, Dan; Riley, Richard; White, Ian R
2011-01-01
The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day ‘Multivariate meta-analysis’ event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various view points and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd. PMID:21268052
Catelani, Tiago A; Santos, João Rodrigo; Páscoa, Ricardo N M J; Pezza, Leonardo; Pezza, Helena R; Lopes, João A
2018-03-01
This work proposes the use of near infrared (NIR) spectroscopy in diffuse reflectance mode and multivariate statistical process control (MSPC) based on principal component analysis (PCA) for real-time monitoring of the coffee roasting process. The main objective was the development of an MSPC methodology able to detect disturbances to the roasting process early, relying on real-time acquisition of NIR spectra. A total of fifteen roasting batches were defined according to an experimental design to develop the MSPC models. This methodology was tested on a set of five batches where disturbances of different nature were imposed to simulate real faulty situations. Some of these batches were used to optimize the model while the remainder were used to test the methodology. A modelling strategy based on a time sliding window provided the best results in terms of distinguishing batches with and without disturbances, relying on typical MSPC charts: Hotelling's T² and squared prediction error statistics. A PCA model encompassing a time window of four minutes with three principal components was able to efficiently detect all disturbances assayed. NIR spectroscopy combined with the MSPC approach proved to be an adequate auxiliary tool for coffee roasters to detect faults in a conventional roasting process in real-time. Copyright © 2017 Elsevier B.V. All rights reserved.
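To make the Hotelling's T² chart mentioned in this abstract more concrete, here is a minimal sketch for two monitored variables. It is not the authors' implementation: the abstract computes T² on PCA scores of NIR spectra, whereas this sketch works directly on two raw variables (equivalent to keeping all principal components), the data are invented, and the squared prediction error statistic is not covered.

```python
def mean_and_cov(rows):
    """Sample mean and 2x2 sample covariance of (x, y) reference observations."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def hotelling_t2(obs, mean, cov):
    """T^2 = (x - m)' S^{-1} (x - m), using the closed-form inverse of a 2x2 matrix."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = obs[0] - mean[0], obs[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# Hypothetical in-control reference batch: the two variables are positively correlated.
reference = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
mean, cov = mean_and_cov(reference)

t2_with_trend = hotelling_t2((5.0, 5.0), mean, cov)     # follows the correlation
t2_against_trend = hotelling_t2((5.0, 1.0), mean, cov)  # breaks the correlation
```

Both test points are equally far from the mean in Euclidean terms, but the one that breaks the correlation structure receives a much larger T², which is what makes the statistic useful for multivariate fault detection.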
TU-FG-201-05: Varian MPC as a Statistical Process Control Tool
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carver, A; Rowbottom, C
Purpose: Quality assurance in radiotherapy requires the measurement of various machine parameters to ensure they remain within permitted values over time. In Truebeam release 2.0 the Machine Performance Check (MPC) was released allowing beam output and machine axis movements to be assessed in a single test. We aim to evaluate the Varian Machine Performance Check (MPC) as a tool for Statistical Process Control (SPC). Methods: Varian’s MPC tool was used on three Truebeam and one EDGE linac for a period of approximately one year. MPC was commissioned against independent systems. After this period the data were reviewed to determine whether or not the MPC was useful as a process control tool. Individual tests were analysed with Shewhart control charts in Matlab. Principal component analysis was used to determine if a multivariate model was of any benefit in analysing the data. Results: Control charts were found to be useful to detect beam output changes, worn T-nuts and jaw calibration issues. Upper and lower control limits were defined at the 95% level. Multivariate SPC was performed using Principal Component Analysis. We found little evidence of clustering beyond that which might be naively expected such as beam uniformity and beam output. Whilst this makes multivariate analysis of little use it suggests that each test is giving independent information. Conclusion: The variety of independent parameters tested in MPC makes it a sensitive tool for routine machine QA. We have determined that using control charts in our QA programme would rapidly detect changes in machine performance. The use of control charts allows large quantities of tests to be performed on all linacs without visual inspection of all results. The use of control limits alerts users when data are inconsistent with previous measurements before they become out of specification. A. Carver has received a speaker’s honorarium from Varian.
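The control-limit idea in this abstract can be sketched as follows. This is an illustrative assumption rather than the authors' Matlab code: it takes the 95% level to mean the baseline mean plus or minus 1.96 sample standard deviations under a normal approximation, and the beam-output numbers are invented.

```python
import statistics


def control_limits(baseline, z=1.96):
    """Return (lower, centre, upper) limits estimated from baseline measurements."""
    centre = statistics.mean(baseline)
    spread = statistics.stdev(baseline)  # sample standard deviation
    return centre - z * spread, centre, centre + z * spread


def out_of_control(values, limits):
    """Indices of measurements that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical daily beam-output readings (percent of nominal) used as the baseline.
baseline = [100.1, 99.8, 100.3, 99.9, 100.0, 100.2, 99.7, 100.0]
limits = control_limits(baseline)

# New readings are compared against the limits; only the third one is flagged.
new_values = [100.1, 99.9, 101.2, 100.0]
flagged = out_of_control(new_values, limits)
```

A reading can be flagged this way while still inside the clinical tolerance, which matches the abstract's point that control limits warn before a parameter goes out of specification.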
Cain, Meghan K; Zhang, Zhiyong; Yuan, Ke-Hai
2017-10-01
Nonnormality of univariate data has been extensively examined previously (Blanca et al., Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 9(2), 78-84, 2013; Micceri, Psychological Bulletin, 105(1), 156, 1989). However, less is known of the potential nonnormality of multivariate data although multivariate analysis is commonly used in psychological and educational research. Using univariate and multivariate skewness and kurtosis as measures of nonnormality, this study examined 1,567 univariate distributions and 254 multivariate distributions collected from authors of articles published in Psychological Science and the American Educational Research Journal. We found that 74 % of univariate distributions and 68 % of multivariate distributions deviated from normal distributions. In a simulation study using typical values of skewness and kurtosis that we collected, we found that the resulting type I error rates were 17 % in a t-test and 30 % in a factor analysis under some conditions. Hence, we argue that it is time to routinely report skewness and kurtosis along with other summary statistics such as means and variances. To facilitate future reporting of skewness and kurtosis, we provide a tutorial on how to compute univariate and multivariate skewness and kurtosis by SAS, SPSS, R and a newly developed Web application.
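As a rough illustration of the univariate quantities this abstract recommends reporting, the sketch below computes moment-based skewness and excess kurtosis. It assumes the simple population-moment definitions; SAS, SPSS and R packages typically apply small-sample bias adjustments, so their output can differ slightly, and the multivariate (Mardia) versions are not shown.

```python
def skewness_kurtosis(xs):
    """Moment-based skewness and excess kurtosis of a sequence of numbers."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0       # zero for a normal distribution
    return skew, excess_kurtosis


symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric sample
right_skewed = [1, 1, 1, 2, 10]    # hypothetical sample with a long right tail

sym_skew, sym_kurt = skewness_kurtosis(symmetric)
skewed_skew, _ = skewness_kurtosis(right_skewed)
```

The symmetric sample has zero skewness and negative excess kurtosis (flatter than normal), while the second sample has clearly positive skewness.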
A refined method for multivariate meta-analysis and meta-regression
Jackson, Daniel; Riley, Richard D
2014-01-01
Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects’ standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:23996351
Cai, Li
2006-02-01
A permutation test typically requires fewer assumptions than does a comparable parametric counterpart. The multi-response permutation procedure (MRPP) is a class of multivariate permutation tests of group difference useful for the analysis of experimental data. However, psychologists seldom make use of the MRPP in data analysis, in part because the MRPP is not implemented in popular statistical packages that psychologists use. A set of SPSS macros implementing the MRPP test is provided in this article. The use of the macros is illustrated by analyzing example data sets.
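For readers unfamiliar with the MRPP, a minimal sketch of the idea follows. It is not the SPSS macro set described in the article: it assumes Euclidean distance, group weights n_g/N, and a Monte Carlo approximation of the permutation distribution, and the data are invented.

```python
import itertools
import math
import random


def mrpp_delta(points, labels):
    """Weighted mean of the average within-group pairwise distances."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    delta = 0.0
    for members in groups.values():
        pairs = list(itertools.combinations(members, 2))
        mean_dist = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
        delta += len(members) / len(points) * mean_dist
    return delta


def mrpp_test(points, labels, n_perm=999, seed=0):
    """Return (observed delta, Monte Carlo p-value); small delta means tight groups."""
    rng = random.Random(seed)
    observed = mrpp_delta(points, labels)
    shuffled = list(labels)
    count = 1  # the observed labelling counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes are preserved by shuffling labels
        if mrpp_delta(points, shuffled) <= observed + 1e-12:
            count += 1
    return observed, count / (n_perm + 1)


# Hypothetical bivariate responses for two clearly separated groups of five.
group_a = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.0, 0.4)]
group_b = [(10.0, 10.1), (10.2, 9.9), (9.8, 10.3), (10.1, 10.0), (9.9, 9.7)]
points = group_a + group_b
labels = ["a"] * 5 + ["b"] * 5

observed_delta, p_value = mrpp_test(points, labels)
```

Because the test only reshuffles group labels, it makes no normality assumption about the responses, which is the property the abstract highlights.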
Multivariate Phylogenetic Comparative Methods: Evaluations, Comparisons, and Recommendations.
Adams, Dean C; Collyer, Michael L
2018-01-01
Recent years have seen increased interest in phylogenetic comparative analyses of multivariate data sets, but to date the varied proposed approaches have not been extensively examined. Here we review the mathematical properties required of any multivariate method, and specifically evaluate existing multivariate phylogenetic comparative methods in this context. Phylogenetic comparative methods based on the full multivariate likelihood are robust to levels of covariation among trait dimensions and are insensitive to the orientation of the data set, but display increasing model misspecification as the number of trait dimensions increases. This is because the expected evolutionary covariance matrix (V) used in the likelihood calculations becomes more ill-conditioned as trait dimensionality increases, and as evolutionary models become more complex. Thus, these approaches are only appropriate for data sets with few traits and many species. Methods that summarize patterns across trait dimensions treated separately (e.g., SURFACE) incorrectly assume independence among trait dimensions, resulting in nearly a 100% model misspecification rate. Methods using pairwise composite likelihood are highly sensitive to levels of trait covariation, the orientation of the data set, and the number of trait dimensions. The consequences of these debilitating deficiencies are that a user can arrive at differing statistical conclusions, and therefore biological inferences, simply from a dataspace rotation, like principal component analysis. By contrast, algebraic generalizations of the standard phylogenetic comparative toolkit that use the trace of covariance matrices are insensitive to levels of trait covariation, the number of trait dimensions, and the orientation of the data set. Further, when appropriate permutation tests are used, these approaches display acceptable Type I error and statistical power. 
We conclude that methods summarizing information across trait dimensions, as well as pairwise composite likelihood methods should be avoided, whereas algebraic generalizations of the phylogenetic comparative toolkit provide a useful means of assessing macroevolutionary patterns in multivariate data. Finally, we discuss areas in which multivariate phylogenetic comparative methods are still in need of future development; namely highly multivariate Ornstein-Uhlenbeck models and approaches for multivariate evolutionary model comparisons. © The Author(s) 2017. Published by Oxford University Press on behalf of the Systematic Biology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
USDA-ARS's Scientific Manuscript database
Characterizing population genetic structure across geographic space is a fundamental challenge in population genetics. Multivariate statistical analyses are powerful tools for summarizing genetic variability, but geographic information and accompanying metadata is not always easily integrated into t...
Quantifying and mapping spatial variability in simulated forest plots
Gavin R. Corral; Harold E. Burkhart
2016-01-01
We used computer simulations to test the efficacy of multivariate statistical methods to detect, quantify, and map spatial variability of forest stands. Simulated stands were developed of regularly-spaced plantations of loblolly pine (Pinus taeda L.). We assumed no effects of competition or mortality, but random variability was added to individual tree characteristics...
Maserejian, Nancy N.; Trachtenberg, Felicia L.; Hauser, Russ; McKinlay, Sonja; Shrader, Peter; Bellinger, David C.
2012-01-01
Background Resin-based dental restorations may intra-orally release their components and bisphenol A. Gestational bisphenol A exposure has been associated with poorer executive functioning in children. Objectives To examine whether exposure to resin-based composite restorations is associated with neuropsychological development in children. Methods Secondary analysis of treatment level data from the New England Children’s Amalgam Trial, a 2-group randomized safety trial conducted from 1997–2006. Children (N=534) aged 6–10 y with >2 posterior tooth caries were randomized to treatment with amalgam or resin-based composites (bisphenol-A-diglycidyl-dimethacrylate-composite for permanent teeth; urethane dimethacrylate-based polyacid-modified compomer for primary teeth). Neuropsychological function at 4- and 5-year follow-up (N=444) was measured by a battery of tests of executive function, intelligence, memory, visual-spatial skills, verbal fluency, and problem-solving. Multivariable generalized linear regression models were used to examine the association between composite exposure levels and changes in neuropsychological test scores from baseline to follow-up. For comparison, data on children randomized to amalgam treatment were similarly analyzed. Results With greater exposure to either dental composite material, results were generally consistent in the direction of slightly poorer changes in tests of intelligence, achievement or memory, but there were no statistically significant associations. For the four primary measures of executive function, scores were slightly worse with greater total composite exposure, but statistically significant only for the test of Letter Fluency (10-surface-years β= −0.8, SE=0.4, P=0.035), and the subtest of color naming (β= −1.5, SE=0.5, P=0.004) in the Stroop Color-Word Interference Test. 
Multivariate analysis of variance confirmed that the negative associations between composite level and executive function were not statistically significant (MANOVA P=0.18). Results for greater amalgam exposure were mostly nonsignificant in the opposite direction of slightly improved scores over follow-up. Conclusions Dental composite restorations had statistically insignificant associations of small magnitude with impairments in neuropsychological test change scores over 4- or 5-years of follow-up in this trial. PMID:22906860
Wang, Longfei; Lee, Sungyoung; Gim, Jungsoo; Qiao, Dandi; Cho, Michael; Elston, Robert C; Silverman, Edwin K; Won, Sungho
2016-09-01
Family-based designs have been repeatedly shown to be powerful in detecting the significant rare variants associated with human diseases. Furthermore, human diseases are often defined by the outcomes of multiple phenotypes, and thus we expect multivariate family-based analyses may be very efficient in detecting associations with rare variants. However, few statistical methods implementing this strategy have been developed for family-based designs. In this report, we describe one such implementation: the multivariate family-based rare variant association tool (mFARVAT). mFARVAT is a quasi-likelihood-based score test for rare variant association analysis with multiple phenotypes, and tests both homogeneous and heterogeneous effects of each variant on multiple phenotypes. Simulation results show that the proposed method is generally robust and efficient for various disease models, and we identify some promising candidate genes associated with chronic obstructive pulmonary disease. The software of mFARVAT is freely available at http://healthstat.snu.ac.kr/software/mfarvat/, implemented in C++ and supported on Linux and MS Windows. © 2016 WILEY PERIODICALS, INC.
Multivariate Strategies in Functional Magnetic Resonance Imaging
ERIC Educational Resources Information Center
Hansen, Lars Kai
2007-01-01
We discuss aspects of multivariate fMRI modeling, including the statistical evaluation of multivariate models and means for dimensional reduction. In a case study we analyze linear and non-linear dimensional reduction tools in the context of a "mind reading" predictive multivariate fMRI model.
A functional U-statistic method for association analysis of sequencing data.
Jadhav, Sneha; Tong, Xiaoran; Lu, Qing
2017-11-01
Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of the sequencing data brings tremendous challenges to data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders) multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases for studying common genetic mechanism. Although jointly analyzing these phenotypes could potentially increase the power of identifying disease-associated genes, the different types of phenotypes pose challenges for association analysis. To address these challenges, we propose a nonparametric method, functional U-statistic method (FU), for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data, and then tests the association of these functions with multiple phenotypes by using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene using a smoothing function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could potentially increase the power of association analysis. Through simulations, we compared our method to the multivariate outcome score test (MOST), and found that our test attained better performance than MOST. In a real data application, we apply our method to the sequencing data from Minnesota Twin Study (MTS) and found potential associations of several nicotine receptor subunit (CHRN) genes, including CHRNB3, associated with nicotine dependence and/or alcohol dependence. © 2017 WILEY PERIODICALS, INC.
Santos, L N S; Cabral, P D S; Neves, G A R; Alves, F R; Teixeira, M B; Cunha, F N; Silva, N F
2017-03-16
The availability of common bean cultivars tolerant to Meloidogyne javanica is limited in Brazil. Thus, the present study aimed to evaluate the reactions of 33 common bean genotypes (23 landrace, 8 commercial, 1 susceptible standard and 1 resistant standard) to M. javanica, employing multivariate statistics to discriminate the reaction of the genotypes. The experiment was conducted in a greenhouse using a completely randomized design with seven replicates. The seeds were sown in 1-L pots containing autoclaved soil and sand in a 1:1 ratio (v:v). On day 19, after emergence of the seedlings, the plants were treated with inoculum containing 4000 eggs + second-stage juveniles (J2). At 60 days after inoculation, the seedlings were evaluated based on biometric and parasitism-related traits, such as number of galls, final nematode population per root system, reproduction factor, and percent reduction in the reproduction factor of the nematode (%RRF). The data were subjected to analysis of variance using the F-test. The Mahalanobis generalized distance was used to obtain the dissimilarity matrix, and the average linkage between groups was used for clustering. The use of multivariate statistics allowed groups to be separated according to the resistance levels of genotypes, as observed in the %RRF. The landrace genotypes FORT-09, FORT-17, FORT-31, FORT-32, FORT-34 and FORT-36 presented resistance to M. javanica; thus, these genotypes can be considered potential sources of resistance.
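The Mahalanobis generalized distance mentioned in the abstract above can be illustrated with a minimal sketch. It assumes only two traits per genotype and a known 2x2 covariance matrix (for example, a pooled within-genotype covariance); the function names `mahalanobis_sq` and `dissimilarity_matrix` and all numeric inputs are hypothetical, not taken from the study. The average-linkage clustering step is not shown.

```python
def mahalanobis_sq(x, y, cov):
    """Squared Mahalanobis distance between two 2-trait mean vectors.

    cov is a 2x2 covariance matrix given as ((a, b), (c, d)).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - y[0], x[1] - y[1]
    # (dx, dy) * inverse(cov) * (dx, dy)^T, with the 2x2 inverse written out
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def dissimilarity_matrix(means, cov):
    """Pairwise squared distances between genotype mean vectors."""
    n = len(means)
    return [[mahalanobis_sq(means[i], means[j], cov) for j in range(n)]
            for i in range(n)]
```

With an identity covariance the measure reduces to the squared Euclidean distance; a larger variance in one trait shrinks that trait's contribution to the distance.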
Statistical Analyses of Raw Material Data for MTM45-1/CF7442A-36% RW: CMH Cure Cycle
NASA Technical Reports Server (NTRS)
Coroneos, Rula; Pai, Shantaram, S.; Murthy, Pappu
2013-01-01
This report describes statistical characterization of physical properties of the composite material system MTM45-1/CF7442A, which has been tested and is currently being considered for use on spacecraft structures. This composite system is made of 6K plain weave graphite fibers in a highly toughened resin system. This report summarizes the distribution types and statistical details of the tests and the conditions for the experimental data generated. These distributions will be used in multivariate regression analyses to help determine material and design allowables for similar material systems and to establish a procedure for other material systems. Additionally, these distributions will be used in future probabilistic analyses of spacecraft structures. The specific properties characterized, using a commercially available statistical package, are the ultimate strength, modulus, and Poisson's ratio. Results are displayed using graphical and semigraphical methods and are included in the accompanying appendixes.
Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics
Chen, Wenan; Larrabee, Beth R.; Ovsyannikova, Inna G.; Kennedy, Richard B.; Haralambieva, Iana H.; Poland, Gregory A.; Schaid, Daniel J.
2015-01-01
Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf. PMID:25948564
A Civilian/Military Trauma Institute: National Trauma Coordinating Center
2015-12-01
zip codes was used in “proximity to violence” analysis. Data were analyzed using SPSS (version 20.0, SPSS Inc., Chicago, IL). Multivariable linear...number of adverse events and serious events was not statistically higher in one group, the incidence of deep venous thrombosis (DVT) was statistically ...subjects the lack of statistical difference on multivariate analysis may be related to an underpowered sample size. It was recommended that the
Applied Statistics: From Bivariate through Multivariate Techniques [with CD-ROM]
ERIC Educational Resources Information Center
Warner, Rebecca M.
2007-01-01
This book provides a clear introduction to widely used topics in bivariate and multivariate statistics, including multiple regression, discriminant analysis, MANOVA, factor analysis, and binary logistic regression. The approach is applied and does not require formal mathematics; equations are accompanied by verbal explanations. Students are asked…
NASA Astrophysics Data System (ADS)
Chen, Zhe; Qiu, Zurong; Huo, Xinming; Fan, Yuming; Li, Xinghua
2017-03-01
A fiber-capacitive drop analyzer is an instrument which monitors a growing droplet to produce a capacitive opto-tensiotrace (COT). Each COT is an integration of fiber light intensity signals and capacitance signals and can reflect the unique physicochemical property of a liquid. In this study, we propose a method for identifying solutions and quantifying their concentrations based on multivariate statistical methods. Eight characteristic values are extracted from each COT. A series of COT characteristic values of training solutions at different concentrations compose a data library of this kind of solution. A two-stage linear discriminant analysis is applied to analyze different solution libraries and establish discriminant functions. Test solutions can be discriminated by these functions. After determining the variety of test solutions, Spearman correlation test and principal components analysis are used to filter and reduce dimensions of eight characteristic values, producing a new representative parameter. A cubic spline interpolation function is built between the parameters and concentrations, based on which we can calculate the concentration of the test solution. Methanol, ethanol, n-propanol, and saline solutions are taken as experimental subjects in this paper. For each solution, nine or ten different concentrations are chosen to be the standard library, and the other two concentrations compose the test group. By using the methods mentioned above, all eight test solutions are correctly identified and the average relative error of quantitative analysis is 1.11%. The proposed method is feasible; it enlarges the applicable scope of COT-based liquid recognition and improves the precision of concentration quantification.
Yang, James J; Li, Jia; Williams, L Keoki; Buu, Anne
2016-01-05
In genome-wide association studies (GWAS) for complex diseases, the association between a SNP and each phenotype is usually weak. Combining multiple related phenotypic traits can increase the power of gene search and thus is a practically important area that requires methodology work. This study provides a comprehensive review of existing methods for conducting GWAS on complex diseases with multiple phenotypes including the multivariate analysis of variance (MANOVA), the principal component analysis (PCA), the generalized estimating equations (GEE), the trait-based association test involving the extended Simes procedure (TATES), and the classical Fisher combination test. We propose a new method that relaxes the unrealistic independence assumption of the classical Fisher combination test and is computationally efficient. To demonstrate applications of the proposed method, we also present the results of statistical analysis on the Study of Addiction: Genetics and Environment (SAGE) data. Our simulation study shows that the proposed method has higher power than existing methods while controlling for the type I error rate. The GEE and the classical Fisher combination test, on the other hand, do not control the type I error rate and thus are not recommended. In general, the power of the competing methods decreases as the correlation between phenotypes increases. All the methods tend to have lower power when the multivariate phenotypes come from long-tailed distributions. The real data analysis also demonstrates that the proposed method allows us to compare the marginal results with the multivariate results and specify which SNPs are specific to a particular phenotype or contribute to the common construct. The proposed method outperforms existing methods in most settings and also has great applications in GWAS on complex diseases with multiple phenotypes such as the substance abuse disorders.
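For orientation, the classical Fisher combination test named in the abstract above can be sketched as follows. This is the textbook version that assumes independent p-values, not the authors' proposed relaxation; the function name `fisher_combined_pvalue` is hypothetical. The statistic is X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom the tail probability has a closed form, so no external library is assumed.

```python
import math


def fisher_combined_pvalue(p_values):
    """Classical Fisher combination; assumes the p-values are independent."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # half_stat is X / 2, where X = -2 * sum(ln p_i)
    half_stat = -sum(math.log(p) for p in p_values)
    # Chi-square survival function for even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total
```

When the phenotypes are correlated, the independence assumption fails and this combined p-value is no longer calibrated, which is consistent with the abstract's remark that the classical test does not control the type I error rate.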
NASA Astrophysics Data System (ADS)
Schwartz, Craig R.; Thelen, Brian J.; Kenton, Arthur C.
1995-06-01
A statistical parametric multispectral sensor performance model was developed by ERIM to support mine field detection studies, multispectral sensor design/performance trade-off studies, and target detection algorithm development. The model assumes target detection algorithms and their performance models which are based on data assumed to obey multivariate Gaussian probability distribution functions (PDFs). The applicability of these algorithms and performance models can be generalized to data having non-Gaussian PDFs through the use of transforms which convert non-Gaussian data to Gaussian (or near-Gaussian) data. An example of one such transform is the Box-Cox power law transform. In practice, such a transform can be applied to non-Gaussian data prior to the introduction of a detection algorithm that is formally based on the assumption of multivariate Gaussian data. This paper presents an extension of these techniques to the case where the joint multivariate probability density function of the non-Gaussian input data is known, and where the joint estimate of the multivariate Gaussian statistics, under the Box-Cox transform, is desired. The jointly estimated multivariate Gaussian statistics can then be used to predict the performance of a target detection algorithm which has an associated Gaussian performance model.
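The Box-Cox power law transform referred to above has a simple closed form, sketched here for a single positive variable. The paper concerns the joint multivariate case; this univariate illustration, the helper names `box_cox` and `sample_skewness`, and the choice of skewness as a rough check of symmetry are assumptions made for the example only.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform: (x**lam - 1) / lam, or ln(x) when lam is 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def sample_skewness(values):
    """Moment-based skewness; values near 0 suggest a symmetric sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5
```

In practice the exponent would be chosen (for example, by maximum likelihood) so that the transformed data are as close to Gaussian as possible before a Gaussian-based detection algorithm is applied.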
Patient acceptance of non-invasive testing for fetal aneuploidy via cell-free fetal DNA.
Vahanian, Sevan A; Baraa Allaf, M; Yeh, Corinne; Chavez, Martin R; Kinzler, Wendy L; Vintzileos, Anthony M
2014-01-01
To evaluate factors associated with patient acceptance of noninvasive prenatal testing for trisomy 21, 18 and 13 via cell-free fetal DNA. This was a retrospective study of all patients who were offered noninvasive prenatal testing at a single institution from 1 March 2012 to 2 July 2012. Patients were identified through our perinatal ultrasound database; demographic information, testing indication and insurance coverage were compared between patients who accepted the test and those who declined. Parametric and nonparametric tests were used as appropriate. Significant variables were assessed using multivariate logistic regression. The value p < 0.05 was considered significant. Two hundred thirty-five patients were offered noninvasive prenatal testing. Ninety-three patients (40%) accepted testing and 142 (60%) declined. Women who accepted noninvasive prenatal testing were more commonly white, had private insurance and had more than one testing indication. There was no statistical difference in the number or the type of testing indications. Multivariable logistic regression analysis was then used to assess individual variables. After controlling for race, patients with public insurance were 83% less likely to accept noninvasive prenatal testing than those with private insurance (3% vs. 97%, adjusted RR 0.17, 95% CI 0.05-0.62). In our population, having public insurance was the factor most strongly associated with declining noninvasive prenatal testing.
Brouckaert, D; Uyttersprot, J-S; Broeckx, W; De Beer, T
2018-03-01
Calibration transfer or standardisation aims at creating a uniform spectral response on different spectroscopic instruments or under varying conditions, without requiring a full recalibration for each situation. In the current study, this strategy is applied to construct at-line multivariate calibration models and consequently employ them in-line in a continuous industrial production line, using the same spectrometer. Firstly, quantitative multivariate models are constructed at-line at laboratory scale for predicting the concentration of two main ingredients in hard surface cleaners. By regressing the Raman spectra of a set of small-scale calibration samples against their reference concentration values, partial least squares (PLS) models are developed to quantify the surfactant levels in the liquid detergent compositions under investigation. After evaluating the models' performance with a set of independent validation samples, a univariate slope/bias correction is applied in view of transporting these at-line calibration models to an in-line manufacturing set-up. This standardisation technique allows a fast and easy transfer of the PLS regression models, by simply correcting the model predictions on the in-line set-up, without changing the original multivariate calibration models. An extensive statistical analysis is performed in order to assess the predictive quality of the transferred regression models. Before and after transfer, the R² and RMSEP of both models are compared to evaluate whether their magnitudes are similar. T-tests are then performed to investigate whether the slope and intercept of the transferred regression line are not statistically different from 1 and 0, respectively. Furthermore, it is inspected whether no significant bias can be noted. F-tests are executed as well, for assessing the linearity of the transfer regression line and for investigating the statistical coincidence of the transfer and validation regression line. 
Finally, a paired t-test is performed to compare the original at-line model to the slope/bias corrected in-line model, using interval hypotheses. It is shown that the calibration models of Surfactant 1 and Surfactant 2 yield satisfactory in-line predictions after slope/bias correction. While Surfactant 1 passes seven out of eight statistical tests, the recommended validation parameters are 100% successful for Surfactant 2. It is hence concluded that the proposed strategy for transferring at-line calibration models to an in-line industrial environment via a univariate slope/bias correction of the predicted values offers a successful standardisation approach. Copyright © 2017 Elsevier B.V. All rights reserved.
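A univariate slope/bias correction of the kind described above can be sketched in a few lines. The sketch assumes the correction is obtained by ordinary least squares of reference concentrations on the uncorrected in-line predictions; the function names `fit_slope_bias` and `apply_correction` are hypothetical and the study's exact procedure may differ.

```python
def fit_slope_bias(predicted, reference):
    """Least-squares fit of reference = slope * predicted + bias."""
    n = len(predicted)
    if n < 2 or n != len(reference):
        raise ValueError("need two equal-length sequences with n >= 2")
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    if sxx == 0:
        raise ValueError("predictions must not all be identical")
    sxy = sum((p - mean_p) * (r - mean_r)
              for p, r in zip(predicted, reference))
    slope = sxy / sxx
    bias = mean_r - slope * mean_p
    return slope, bias


def apply_correction(predicted, slope, bias):
    """Correct new in-line predictions; the PLS model itself is untouched."""
    return [slope * p + bias for p in predicted]
```

Because only the predictions are rescaled, the original multivariate calibration model does not need to be refitted, which is the main appeal of this transfer strategy.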
Handwriting Examination: Moving from Art to Science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jarman, K.H.; Hanlen, R.C.; Manzolillo, P.A.
In this document, we present a method for validating the premises and methodology of forensic handwriting examination. This method is intuitively appealing because it relies on quantitative measurements currently used qualitatively by FDEs in making comparisons, and it is scientifically rigorous because it exploits the power of multivariate statistical analysis. This approach uses measures of both central tendency and variation to construct a profile for a given individual. (Central tendency and variation are important for characterizing an individual's writing and both are currently used by FDEs in comparative analyses.) Once constructed, different profiles are then compared for individuality using cluster analysis; they are grouped so that profiles within a group cannot be differentiated from one another based on the measured characteristics, whereas profiles between groups can. The cluster analysis procedure used here exploits the power of multivariate hypothesis testing. The result is not only a profile grouping but also an indication of statistical significance of the groups generated.
SPICE: exploration and analysis of post-cytometric complex multivariate datasets.
Roederer, Mario; Nozzi, Joshua L; Nason, Martha C
2011-02-01
Polychromatic flow cytometry results in complex, multivariate datasets. To date, tools for the aggregate analysis of these datasets across multiple specimens grouped by different categorical variables, such as demographic information, have not been optimized. Often, the exploration of such datasets is accomplished by visualization of patterns with pie charts or bar charts, without easy access to statistical comparisons of measurements that comprise multiple components. Here we report on algorithms and a graphical interface we developed for these purposes. In particular, we discuss thresholding necessary for accurate representation of data in pie charts, the implications for display and comparison of normalized versus unnormalized data, and the effects of averaging when samples with significant background noise are present. Finally, we define a statistic for the nonparametric comparison of complex distributions to test for difference between groups of samples based on multi-component measurements. While originally developed to support the analysis of T cell functional profiles, these techniques are amenable to a broad range of datatypes. Published 2011 Wiley-Liss, Inc.
NASA Technical Reports Server (NTRS)
Aires, Filipe; Rossow, William B.; Hansen, James E. (Technical Monitor)
2001-01-01
A new approach is presented for the analysis of feedback processes in a nonlinear dynamical system by observing its variations. The new methodology consists of statistical estimates of the sensitivities between all pairs of variables in the system based on a neural network modeling of the dynamical system. The model can then be used to estimate the instantaneous, multivariate and nonlinear sensitivities, which are shown to be essential for the analysis of the feedback processes involved in the dynamical system. The method is described and tested on synthetic data from the low-order Lorenz circulation model where the correct sensitivities can be evaluated analytically.
Lastoria, Secondo; Piccirillo, Maria Carmela; Caracò, Corradina; Nasti, Guglielmo; Aloj, Luigi; Arrichiello, Cecilia; de Lutio di Castelguidone, Elisabetta; Tatangelo, Fabiana; Ottaiano, Alessandro; Iaffaioli, Rosario Vincenzo; Izzo, Francesco; Romano, Giovanni; Giordano, Pasqualina; Signoriello, Simona; Gallo, Ciro; Perrone, Francesco
2013-12-01
Markers predictive of treatment effect might be useful to improve the treatment of patients with metastatic solid tumors. Particularly, early changes in tumor metabolism measured by PET/CT with (18)F-FDG could predict the efficacy of treatment better than standard dimensional Response Evaluation Criteria In Solid Tumors (RECIST) response. We performed PET/CT evaluation before and after 1 cycle of treatment in patients with resectable liver metastases from colorectal cancer, within a phase 2 trial of preoperative FOLFIRI plus bevacizumab. For each lesion, the maximum standardized uptake value (SUV) and the total lesion glycolysis (TLG) were determined. On the basis of previous studies, a ≤ -50% change from baseline was used as a threshold for significant metabolic response for maximum SUV and, exploratively, for TLG. Standard RECIST response was assessed with CT after 3 mo of treatment. Pathologic response was assessed in patients undergoing resection. The association between metabolic and CT/RECIST and pathologic response was tested with the McNemar test; the ability to predict progression-free survival (PFS) and overall survival (OS) was tested with the Log-rank test and a multivariable Cox model. Thirty-three patients were analyzed. After treatment, there was a notable decrease of all the parameters measured by PET/CT. Early metabolic PET/CT response (either SUV- or TLG-based) had a stronger, independent and statistically significant predictive value for PFS and OS than both CT/RECIST and pathologic response at multivariate analysis, although with different degrees of statistical significance. The predictive value of CT/RECIST response was not significant at multivariate analysis. PET/CT response was significantly predictive of long-term outcomes during preoperative treatment of patients with liver metastases from colorectal cancer, and its predictive ability was higher than that of CT/RECIST response after 3 mo of treatment. 
Such findings need to be confirmed by larger prospective trials.
Efficient Global Aerodynamic Modeling from Flight Data
NASA Technical Reports Server (NTRS)
Morelli, Eugene A.
2012-01-01
A method for identifying global aerodynamic models from flight data in an efficient manner is explained and demonstrated. A novel experiment design technique was used to obtain dynamic flight data over a range of flight conditions with a single flight maneuver. Multivariate polynomials and polynomial splines were used with orthogonalization techniques and statistical modeling metrics to synthesize global nonlinear aerodynamic models directly and completely from flight data alone. Simulation data and flight data from a subscale twin-engine jet transport aircraft were used to demonstrate the techniques. Results showed that global multivariate nonlinear aerodynamic dependencies could be accurately identified using flight data from a single maneuver. Flight-derived global aerodynamic model structures, model parameter estimates, and associated uncertainties were provided for all six nondimensional force and moment coefficients for the test aircraft. These models were combined with a propulsion model identified from engine ground test data to produce a high-fidelity nonlinear flight simulation very efficiently. Prediction testing using a multi-axis maneuver showed that the identified global model accurately predicted aircraft responses.
Quirós, Elia; Felicísimo, Angel M; Cuartero, Aurora
2009-01-01
This work proposes a new method to classify multi-spectral satellite images based on multivariate adaptive regression splines (MARS) and compares this classification system with the more common parallelepiped and maximum likelihood (ML) methods. We apply the classification methods to the land cover classification of a test zone located in southwestern Spain. The basis of the MARS method and its associated procedures are explained in detail, and the area under the ROC curve (AUC) is compared for the three methods. The results show that the MARS method provides better results than the parallelepiped method in all cases, and it provides better results than the maximum likelihood method in 13 cases out of 17. These results demonstrate that the MARS method can be used in isolation or in combination with other methods to improve the accuracy of soil cover classification. The improvement is statistically significant according to the Wilcoxon signed rank test.
User Selection Criteria of Airspace Designs in Flexible Airspace Management
NASA Technical Reports Server (NTRS)
Lee, Hwasoo E.; Lee, Paul U.; Jung, Jaewoo; Lai, Chok Fung
2011-01-01
Engineering Design Handbook. Army Weapon Systems Analysis. Part 2
1979-10-01
EXPERIMENTAL DESIGN ... 41-3; 41-5 RESULTS OF THE ASARS lIX SIMULATIONS ... 41-4; 41-6 LATIN...sciences and human factors engineering fields utilizing experimental methodology and multi-variable statistical techniques drawn from experimental...randomly to grenades for the test design. The nine experimental types of hand grenades (first nine in Table 33-2) had a "pip" on their spherical
A method of using cluster analysis to study statistical dependence in multivariate data
NASA Technical Reports Server (NTRS)
Borucki, W. J.; Card, D. H.; Lyle, G. C.
1975-01-01
A technique is presented that uses both cluster analysis and a Monte Carlo significance test of clusters to discover associations between variables in multidimensional data. The method is applied to an example of a noisy function in three-dimensional space, to a sample from a mixture of three bivariate normal distributions, and to the well-known Fisher's Iris data.
NASA Technical Reports Server (NTRS)
Park, Steve
1990-01-01
A large and diverse number of computational techniques are routinely used to process and analyze remotely sensed data. These techniques include: univariate statistics; multivariate statistics; principal component analysis; pattern recognition and classification; other multivariate techniques; geometric correction; registration and resampling; radiometric correction; enhancement; restoration; Fourier analysis; and filtering. Each of these techniques will be considered, in order.
J. Grabinsky; A. Aldama; A. Chacalo; H. J. Vazquez
2000-01-01
Inventory data of Mexico City's street trees were studied using classical statistical arboricultural and ecological statistical approaches. Multivariate techniques were applied to both. Results did not differ substantially and were complementary. It was possible to reduce inventory data and to group species, boroughs, blocks, and variables.
Comparison of connectivity analyses for resting state EEG data
NASA Astrophysics Data System (ADS)
Olejarczyk, Elzbieta; Marzetti, Laura; Pizzella, Vittorio; Zappasodi, Filippo
2017-06-01
Objective. In the present work, a nonlinear measure (transfer entropy, TE) was used in a multivariate approach for the analysis of effective connectivity in high density resting state EEG data in eyes open and eyes closed. Advantages of the multivariate approach in comparison to the bivariate one were tested. Moreover, the multivariate TE was compared to an effective linear measure, i.e. directed transfer function (DTF). Finally, the existence of a relationship between the information transfer and the level of brain synchronization as measured by phase synchronization value (PLV) was investigated. Approach. The comparison between the connectivity measures, i.e. bivariate versus multivariate TE, TE versus DTF, TE versus PLV, was performed by means of statistical analysis of indexes based on graph theory. Main results. The multivariate approach is less sensitive to false indirect connections with respect to the bivariate estimates. The multivariate TE differentiated better between eyes closed and eyes open conditions compared to DTF. Moreover, the multivariate TE evidenced non-linear phenomena in information transfer, which are not evidenced by the use of DTF. We also showed that the target of information flow, in particular the frontal region, is an area of greater brain synchronization. Significance. Comparison of different connectivity analysis methods pointed to the advantages of nonlinear methods, and indicated a relationship existing between the flow of information and the level of synchronization of the brain.
FGWAS: Functional genome wide association analysis.
Huang, Chao; Thompson, Paul; Wang, Yalin; Yu, Yang; Zhang, Jingwen; Kong, Dehan; Colen, Rivka R; Knickmeyer, Rebecca C; Zhu, Hongtu
2017-10-01
Functional phenotypes (e.g., subcortical surface representation), which commonly arise in imaging genetic studies, have been used to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. However, existing statistical methods largely ignore the functional features (e.g., functional smoothness and correlation). The aim of this paper is to develop a functional genome-wide association analysis (FGWAS) framework to efficiently carry out whole-genome analyses of functional phenotypes. FGWAS consists of three components: a multivariate varying coefficient model, a global sure independence screening procedure, and a test procedure. Compared with the standard multivariate regression model, the multivariate varying coefficient model explicitly models the functional features of functional phenotypes through the integration of smooth coefficient functions and functional principal component analysis. Statistically, compared with existing methods for genome-wide association studies (GWAS), FGWAS can substantially boost the detection power for discovering important genetic variants influencing brain structure and function. Simulation studies show that FGWAS outperforms existing GWAS methods for searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. We have successfully applied FGWAS to large-scale analysis of data from the Alzheimer's Disease Neuroimaging Initiative for 708 subjects, 30,000 vertices on the left and right hippocampal surfaces, and 501,584 SNPs. Copyright © 2017 Elsevier Inc. All rights reserved.
Rare Variant Association Test with Multiple Phenotypes
Lee, Selyeong; Won, Sungho; Kim, Young Jin; Kim, Yongkang; Kim, Bong-Jo; Park, Taesung
2016-01-01
Although genome-wide association studies (GWAS) have now discovered thousands of genetic variants associated with common traits, such variants cannot explain the large degree of “missing heritability,” likely due to rare variants. The advent of next generation sequencing technology has allowed rare variant detection and association with common traits, often by investigating specific genomic regions for rare variant effects on a trait. Although multiply correlated phenotypes are often concurrently observed in GWAS, most studies analyze only single phenotypes, which may lessen statistical power. To increase power, multivariate analyses, which consider correlations between multiple phenotypes, can be used. However, few existing multi-variant analyses can identify rare variants for assessing multiple phenotypes. Here, we propose Multivariate Association Analysis using Score Statistics (MAAUSS), to identify rare variants associated with multiple phenotypes, based on the widely used Sequence Kernel Association Test (SKAT) for a single phenotype. We applied MAAUSS to Whole Exome Sequencing (WES) data from a Korean population of 1,058 subjects, to discover genes associated with multiple traits of liver function. We then assessed validation of those genes by a replication study, using an independent dataset of 3,445 individuals. Notably, we detected the gene ZNF620 among five significant genes. We then performed a simulation study to compare MAAUSS's performance with existing methods. Overall, MAAUSS successfully conserved type 1 error rates and in many cases, had a higher power than the existing methods. This study illustrates a feasible and straightforward approach for identifying rare variants correlated with multiple phenotypes, with likely relevance to missing heritability. PMID:28039885
Lotan, Tamara L.; Wei, Wei; Morais, Carlos L.; Hawley, Sarah T.; Fazli, Ladan; Hurtado-Coll, Antonio; Troyer, Dean; McKenney, Jesse K.; Simko, Jeffrey; Carroll, Peter R.; Gleave, Martin; Lance, Raymond; Lin, Daniel W.; Nelson, Peter S.; Thompson, Ian M.; True, Lawrence D.; Feng, Ziding; Brooks, James D.
2015-01-01
Background: PTEN is the most commonly deleted tumor suppressor gene in primary prostate cancer (PCa) and its loss is associated with poor clinical outcomes and ERG gene rearrangement. Objective: We tested whether PTEN loss is associated with shorter recurrence-free survival (RFS) in surgically treated PCa patients with known ERG status. Design, setting, and participants: A genetically validated, automated PTEN immunohistochemistry (IHC) protocol was used for 1275 primary prostate tumors from the Canary Foundation retrospective PCa tissue microarray cohort to assess homogeneous (in all tumor tissue sampled) or heterogeneous (in a subset of tumor tissue sampled) PTEN loss. ERG status as determined by a genetically validated IHC assay was available for a subset of 938 tumors. Outcome measurements and statistical analysis: Associations between PTEN and ERG status were assessed using Fisher’s exact test. Kaplan-Meier and multivariate weighted Cox proportional models for RFS were constructed. Results and limitations: When compared to intact PTEN, homogeneous (hazard ratio [HR] 1.66, p = 0.001) but not heterogeneous (HR 1.24, p = 0.14) PTEN loss was significantly associated with shorter RFS in multivariate models. Among ERG-positive tumors, homogeneous (HR 3.07, p < 0.0001) but not heterogeneous (HR 1.46, p = 0.10) PTEN loss was significantly associated with shorter RFS. Among ERG-negative tumors, PTEN did not reach significance for inclusion in the final multivariate models. The interaction term for PTEN and ERG status with respect to RFS did not reach statistical significance (p = 0.11) for the current sample size. Conclusions: These data suggest that PTEN is a useful prognostic biomarker and that there is no statistically significant interaction between PTEN and ERG status for RFS.
Patient summary: We found that loss of the PTEN tumor suppressor gene in prostate tumors as assessed by tissue staining is correlated with shorter time to prostate cancer recurrence after radical prostatectomy. PMID:27617307
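The abstract above assesses the association between PTEN and ERG status with Fisher's exact test. As a minimal sketch of how such a test works on a 2x2 table, the Python function below sums hypergeometric probabilities over all tables that are no more likely than the observed one. The function name and the counts in the usage line are hypothetical illustrations and are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over every table that is at most as probable as the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts (rows: marker A lost / intact, columns: marker B +/-).
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

With cohorts of the size reported above, a statistics library would normally be used instead; the sketch is only meant to show what the test computes.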
ERDEMİR, Ugur; YİLDİZ, Esra; EREN, Meltem Mert; OZEL, Sevda
2013-01-01
Objectives: This study evaluated the effect of sports and energy drinks on the surface hardness of different composite resin restorative materials over a 1-month period. Material and Methods: A total of 168 specimens of Compoglass F, Filtek Z250, Filtek Supreme, and Premise were prepared using a customized cylindrical metal mould and were divided into six groups (N=42; n=7 per group). For the control groups, the specimens were stored in distilled water for 24 hours at 37 °C and the water was renewed daily. For the experimental groups, the specimens were immersed in 5 mL of one of the following test solutions: Powerade, Gatorade, X-IR, Burn, and Red Bull, for two minutes daily for up to a 1-month test period, and all the solutions were refreshed daily. Surface hardness was measured using a Vickers hardness measuring instrument at baseline, after 1 week, and after 1 month. Data were statistically analyzed using multivariate repeated measures ANOVA and Bonferroni's multiple comparison tests (α=0.05). Results: Multivariate repeated measures ANOVA revealed that there were statistically significant differences in the hardness of the restorative materials at different immersion times (p<0.001) and in different solutions (p<0.001). The effect of different solutions on the surface hardness values of the restorative materials was tested using Bonferroni's multiple comparison tests, and it was observed that specimens stored in distilled water demonstrated statistically significantly lower mean surface hardness reductions when compared to the specimens immersed in sports and energy drinks after a 1-month evaluation period (p<0.001). The compomer was the most affected by an acidic environment, whereas the composite resin materials were the least affected materials. Conclusions: The effect of sports and energy drinks on the surface hardness of a restorative material depends on the duration of exposure time and the composition of the material. PMID:23739850
Multivariate Regression Analysis and Slaughter Livestock,
AGRICULTURE, *ECONOMICS), (*MEAT, PRODUCTION), MULTIVARIATE ANALYSIS, REGRESSION ANALYSIS , ANIMALS, WEIGHT, COSTS, PREDICTIONS, STABILITY, MATHEMATICAL MODELS, STORAGE, BEEF, PORK, FOOD, STATISTICAL DATA, ACCURACY
USDA-ARS's Scientific Manuscript database
The mixed linear model (MLM) is currently among the most advanced and flexible statistical modeling techniques and its use in tackling problems in plant pathology has begun surfacing in the literature. The longitudinal MLM is a multivariate extension that handles repeatedly measured data, such as r...
ERIC Educational Resources Information Center
Martin, James L.
This paper reports on attempts by the author to construct a theoretical framework of adult education participation using a theory development process and the corresponding multivariate statistical techniques. Two problems are identified: the lack of theoretical framework in studying problems, and the limiting of statistical analysis to univariate…
Multivariate methods to visualise colour-space and colour discrimination data.
Hastings, Gareth D; Rubin, Alan
2015-01-01
Despite most modern colour spaces treating colour as three-dimensional (3-D), colour data is usually not visualised in 3-D (and two-dimensional (2-D) projection-plane segments and multiple 2-D perspective views are used instead). The objectives of this article are firstly, to introduce a truly 3-D percept of colour space using stereo-pairs, secondly to view colour discrimination data using that platform, and thirdly to apply formal statistics and multivariate methods to analyse the data in 3-D. This is the first demonstration of the software that generated stereo-pairs of RGB colour space, as well as of a new computerised procedure that investigated colour discrimination by measuring colour just noticeable differences (JND). An initial pilot study and thorough investigation of instrument repeatability were performed. Thereafter, to demonstrate the capabilities of the software, five colour-normal and one colour-deficient subject were examined using the JND procedure and multivariate methods of data analysis. Scatter plots of responses were meaningfully examined in 3-D and were useful in evaluating multivariate normality as well as identifying outliers. The extent and direction of the difference between each JND response and the stimulus colour point was calculated and appreciated in 3-D. Ellipsoidal surfaces of constant probability density (distribution ellipsoids) were fitted to response data; the volumes of these ellipsoids appeared useful in differentiating the colour-deficient subject from the colour-normals. Hypothesis tests of variances and covariances showed many statistically significant differences between the results of the colour-deficient subject and those of the colour-normals, while far fewer differences were found when comparing within colour-normals. 
The 3-D visualisation of colour data using stereo-pairs, as well as the statistics and multivariate methods of analysis employed, were found to be unique and useful tools in the representation and study of colour. Many additional studies using these methods along with the JND and other procedures have been identified and will be reported in future publications. © 2014 The Authors Ophthalmic & Physiological Optics © 2014 The College of Optometrists.
An Evaluation of the Euroncap Crash Test Safety Ratings in the Real World
Segui-Gomez, Maria; Lopez-Valdes, Francisco J.; Frampton, Richard
2007-01-01
We investigated whether the rating obtained in the EuroNCAP test procedures correlates with injury protection to vehicle occupants in real crashes using data in the UK Cooperative Crash Injury Study (CCIS) database from 1996 to 2005. Multivariate Poisson regression models were developed, using the Abbreviated Injury Scale (AIS) score by body region as the dependent variable and the EuroNCAP score for that particular body region, seat belt use, mass ratio and Equivalent Test Speed (ETS) as independent variables. Our models identified statistically significant relationships between injury severity and safety belt use, mass ratio and ETS. We could not identify any statistically significant relationships between the EuroNCAP body region scores and real injury outcome except for the protection to pelvis-femur-knee in frontal impacts where scoring “green” is significantly better than scoring “yellow” or “red”.
Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics.
Chen, Wenan; Larrabee, Beth R; Ovsyannikova, Inna G; Kennedy, Richard B; Haralambieva, Iana H; Poland, Gregory A; Schaid, Daniel J
2015-07-01
Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf. Copyright © 2015 by the Genetics Society of America.
Yoshida, Hiroyuki; Shibata, Hiroko; Izutsu, Ken-Ichi; Goda, Yukihiro
2017-01-01
The current Japanese Ministry of Health Labour and Welfare (MHLW)'s Guideline for Bioequivalence Studies of Generic Products uses averaged dissolution rates for the assessment of dissolution similarity between test and reference formulations. This study clarifies how the application of model-independent multivariate confidence region procedure (Method B), described in the European Medicines Agency and U.S. Food and Drug Administration guidelines, affects similarity outcomes obtained empirically from dissolution profiles with large variations in individual dissolution rates. Sixty-one datasets of dissolution profiles for immediate release, oral generic, and corresponding innovator products that showed large variation in individual dissolution rates in generic products were assessed on their similarity by using the f2 statistics defined in the MHLW guidelines (MHLW f2 method) and two different Method B procedures, including a bootstrap method applied with f2 statistics (BS method) and a multivariate analysis method using the Mahalanobis distance (MV method). The MHLW f2 and BS methods provided similar dissolution similarities between reference and generic products. Although a small difference in the similarity assessment may be due to the decrease in the lower confidence interval for expected f2 values derived from the large variation in individual dissolution rates, the MV method provided results different from those obtained through MHLW f2 and BS methods. Analysis of actual dissolution data for products with large individual variations would provide valuable information towards an enhanced understanding of these methods and their possible incorporation in the MHLW guidelines.
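The f2 statistic referred to above is usually written as f2 = 50 · log10(100 / sqrt(1 + mean squared difference)), where the difference is taken between the mean percent dissolved of the reference and test products at each time point. The sketch below computes that standard similarity factor in Python; it assumes averaged profiles sampled at the same time points, the function name and profile values are hypothetical, and it does not reproduce the bootstrap (BS) or Mahalanobis distance (MV) procedures compared in the study.

```python
from math import log10, sqrt


def f2_similarity(reference, test):
    """Similarity factor f2 for two mean dissolution profiles (% dissolved)."""
    if not reference or len(reference) != len(test):
        raise ValueError("profiles must be non-empty and of equal length")
    # Mean squared difference between the profiles across time points.
    msd = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return 50 * log10(100 / sqrt(1 + msd))


# Hypothetical mean profiles at four sampling times, not data from the study.
reference_profile = [20.0, 45.0, 70.0, 90.0]
test_profile = [18.0, 41.0, 66.0, 88.0]
print(round(f2_similarity(reference_profile, test_profile), 1))
```

Identical profiles give f2 = 100, and the value falls as the profiles diverge; an average difference of 10 percentage points at every time point gives a value just under 50.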
Statistical Significance and Baseline Monitoring.
1984-07-01
impacted at once... 6 Observed versus nominal α levels for multivariate tests of data sets (50 runs of 4 groups each...cumulative proportion of the observations found for each nominal level. The results of the comparisons of the observed versus nominal α levels for the...α values are always higher than nominal levels. Virtually all nominal α levels are below 0.20. In other words, the discriminant analysis models
[Sem: a suitable statistical software adapted for research in oncology].
Kwiatkowski, F; Girard, M; Hacene, K; Berlie, J
2000-10-01
Many software packages have been adapted for medical use, but they rarely offer convenient data management and statistics together. A recent cooperative effort produced a new package, Sem (Statistics Epidemiology Medicine), which supports both data management for trials and statistical analysis of the data. It is convenient enough to be used by people who are not professional statisticians (biologists, doctors, researchers, data managers), since the software usually selects the most adequate test by itself (except for multivariate models), after which complementary tests can be requested if needed. The Sem database manager (DBM) is not compatible with common DBMs, which constitutes a first protection against loss of privacy. Other safeguards (passwords, encryption...) strengthen data security, which is all the more necessary today since Sem can be run on computer networks. Data organization allows multiplicity: forms can be duplicated for each patient. Dates are treated in a special but transparent manner (sorting, date and delay calculations...). Sem communicates with common desktop software, often with a simple copy/paste. Statistics can thus easily be performed on data stored in external spreadsheets, and slides can be prepared by pasting graphs (survival curves...) with a single mouse click. Already used at over fifty sites in different hospitals for daily work, this product, combining data management and statistics, appears to be a convenient and innovative solution.
Prognostic Significance of POLE Proofreading Mutations in Endometrial Cancer
Church, David N.; Stelloo, Ellen; Nout, Remi A.; Valtcheva, Nadejda; Depreeuw, Jeroen; ter Haar, Natalja; Noske, Aurelia; Amant, Frederic; Wild, Peter J.; Lambrechts, Diether; Jürgenliemk-Schulz, Ina M.; Jobsen, Jan J.; Smit, Vincent T. H. B. M.; Creutzberg, Carien L.; Bosse, Tjalling
2015-01-01
Background: Current risk stratification in endometrial cancer (EC) results in frequent over- and underuse of adjuvant therapy, and may be improved by novel biomarkers. We examined whether POLE proofreading mutations, recently reported in about 7% of ECs, predict prognosis. Methods: We performed targeted POLE sequencing in ECs from the PORTEC-1 and -2 trials (n = 788), and analyzed clinical outcome according to POLE status. We combined these results with those from three additional series (n = 628) by meta-analysis to generate multivariable-adjusted, pooled hazard ratios (HRs) for recurrence-free survival (RFS) and cancer-specific survival (CSS) of POLE-mutant ECs. All statistical tests were two-sided. Results: POLE mutations were detected in 48 of 788 (6.1%) ECs from PORTEC-1 and-2 and were associated with high tumor grade (P < .001). Women with POLE-mutant ECs had fewer recurrences (6.2% vs 14.1%) and EC deaths (2.3% vs 9.7%), though, in the total PORTEC cohort, differences in RFS and CSS were not statistically significant (multivariable-adjusted HR = 0.43, 95% CI = 0.13 to 1.37, P = .15; HR = 0.19, 95% CI = 0.03 to 1.44, P = .11 respectively). However, of 109 grade 3 tumors, 0 of 15 POLE-mutant ECs recurred, compared with 29 of 94 (30.9%) POLE wild-type cancers; reflected in statistically significantly greater RFS (multivariable-adjusted HR = 0.11, 95% CI = 0.001 to 0.84, P = .03). In the additional series, there were no EC-related events in any of 33 POLE-mutant ECs, resulting in a multivariable-adjusted, pooled HR of 0.33 for RFS (95% CI = 0.12 to 0.91, P = .03) and 0.26 for CSS (95% CI = 0.06 to 1.08, P = .06). Conclusion: POLE proofreading mutations predict favorable EC prognosis, independently of other clinicopathological variables, with the greatest effect seen in high-grade tumors. This novel biomarker may help to reduce overtreatment in EC. PMID:25505230
Introduction to multivariate discrimination
NASA Astrophysics Data System (ADS)
Kégl, Balázs
2013-07-01
Multivariate discrimination or classification is one of the best-studied problems in machine learning, with a plethora of well-tested and well-performing algorithms. There are also several good general textbooks [1-9] on the subject written for an average engineering, computer science, or statistics graduate student; most of them are also accessible to an average physics student with some background in computer science and statistics. Hence, instead of writing a generic introduction, we concentrate here on relating the subject to a practitioner experimental physicist. After a short introduction on the basic setup (Section 1) we delve into the practical issues of complexity regularization, model selection, and hyperparameter optimization (Section 2), since it is this step that makes high-complexity non-parametric fitting so different from low-dimensional parametric fitting. To emphasize that this issue is not restricted to classification, we illustrate the concept on a low-dimensional but non-parametric regression example (Section 2.1). Section 3 describes the common algorithmic-statistical formal framework that unifies the main families of multivariate classification algorithms. We explain here the large-margin principle that partly explains why these algorithms work. Section 4 is devoted to the description of the three main (families of) classification algorithms: neural networks, the support vector machine, and AdaBoost. We do not go into the algorithmic details; the goal is to give an overview of the form of the functions these methods learn and of the objective functions they optimize. Besides their technical description, we also make an attempt to put these algorithms into a socio-historical context. We then briefly describe some rather heterogeneous applications to illustrate the pattern recognition pipeline and to show how widespread the use of these methods is (Section 5).
We conclude the chapter with three essentially open research problems that are either relevant to or even motivated by certain unorthodox applications of multivariate discrimination in experimental physics.
Almeida, Tiago P; Chu, Gavin S; Li, Xin; Dastagir, Nawshin; Tuan, Jiun H; Stafford, Peter J; Schlindwein, Fernando S; Ng, G André
2017-01-01
Purpose: Complex fractionated atrial electrograms (CFAE)-guided ablation after pulmonary vein isolation (PVI) has been used for persistent atrial fibrillation (persAF) therapy. This strategy has shown suboptimal outcomes due to, among other factors, undetected changes in the atrial tissue following PVI. In the present work, we investigate CFAE distribution before and after PVI in patients with persAF using a multivariate statistical model. Methods: 207 pairs of atrial electrograms (AEGs) were collected before and after PVI, respectively, from corresponding LA regions in 18 persAF patients. Twelve attributes were measured from the AEGs, before and after PVI. Statistical models based on multivariate analysis of variance (MANOVA) and linear discriminant analysis (LDA) were used to characterize the atrial regions and AEGs. Results: PVI significantly reduced CFAEs in the LA (70 vs. 40%; P < 0.0001). Four types of LA regions were identified, based on the AEG characteristics: (i) fractionated before PVI that remained fractionated after PVI (31% of the collected points); (ii) fractionated that converted to normal (39%); (iii) normal prior to PVI that became fractionated (9%); and (iv) normal that remained normal (21%). Individually, the attributes failed to distinguish these LA regions, but multivariate statistical models were effective in their discrimination (P < 0.0001). Conclusion: Our results show that there are LA regions resistant to PVI, while others are affected by it. Although traditional methods were unable to identify these different regions, the proposed multivariate statistical model discriminated LA regions resistant to PVI from those affected by it without prior ablation information.
Martin, Lisa; Watanabe, Sharon; Fainsinger, Robin; Lau, Francis; Ghosh, Sunita; Quan, Hue; Atkins, Marlis; Fassbender, Konrad; Downing, G Michael; Baracos, Vickie
2010-10-01
To determine whether elements of a standard nutritional screening assessment are independently prognostic of survival in patients with advanced cancer. A prospective nested cohort of patients with metastatic cancer was accrued from different units of a Regional Palliative Care Program. Patients completed a nutritional screen on admission. Data included age, sex, cancer site, height, weight history, dietary intake, 13 nutrition impact symptoms, and patient- and physician-reported performance status (PS). Univariate and multivariate survival analyses were conducted. Concordance statistics (c-statistics) were used to test the predictive accuracy of models based on training and validation sets; a c-statistic of 0.5 indicates the model predicts the outcome as well as chance; perfect prediction has a c-statistic of 1.0. A training set of patients in palliative home care (n = 1,164) was used to identify prognostic variables. Primary disease site, PS, short-term weight change (either gain or loss), dietary intake, and dysphagia predicted survival in multivariate analysis (P < .05). A model including only patients separated by disease site and PS had high c-statistics between predicted and observed responses for survival in the training set (0.90) and validation set (0.88; n = 603). The addition of weight change, dietary intake, and dysphagia did not further improve the c-statistic of the model. The c-statistic was also not altered by substituting physician-rated palliative PS for patient-reported PS. We demonstrate a high probability of concordance between predicted and observed survival for patients in distinct palliative care settings (home care, tertiary inpatient, ambulatory outpatient) based on patient-reported information.
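The c-statistic described above can be illustrated with a small pairwise calculation. The sketch below assumes a Harrell-style concordance index for right-censored survival data, in which a pair of patients is comparable when the one with the shorter follow-up time had an observed event; the abstract does not state which variant the authors used, and the function name and example values are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher-risk patient died first.

    times: follow-up time per patient
    events: 1 if death was observed, 0 if the patient was censored
    risk_scores: model output, higher meaning shorter predicted survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: patient i had an observed event before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times (months), event flags, and risk scores.
times = [2, 5, 7, 12]
events = [1, 1, 0, 1]
scores = [0.9, 0.6, 0.4, 0.1]
print(concordance_index(times, events, scores))
```

A model that assigns every patient the same score lands at 0.5, which matches the chance level mentioned in the abstract.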
Application of multivariate statistical techniques in microbial ecology
Paliy, O.; Shankar, V.
2016-01-01
Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological datasets. The effect has been especially noticeable in the field of microbial ecology, where new experimental approaches have provided in-depth assessments of the composition, functions, and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces large amounts of data, powerful statistical techniques of multivariate analysis are well suited to analyze and interpret these datasets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular dataset. In this review we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive, and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and dataset structure. PMID:26786791
Multivariate analysis in thoracic research.
Mengual-Macenlle, Noemí; Marcos, Pedro J; Golpe, Rafael; González-Rivas, Diego
2015-03-01
Multivariate analysis is based on the observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest. Multivariate methods were developed to analyze large databases and increasingly complex data. Since modeling is the best way to represent our knowledge of reality, we should use multivariate statistical methods. Multivariate methods are designed to analyze data sets simultaneously, i.e., to analyze different variables for each person or object studied. Keep in mind at all times that all variables must be treated so that they accurately reflect the reality of the problem addressed. There are different types of multivariate analysis and each one should be employed according to the type of variables to analyze: dependence, interdependence, and structural methods. In conclusion, multivariate methods are ideal for the analysis of large data sets and for finding the cause and effect relationships between variables; there is a wide range of analysis types that we can use.
NASA Astrophysics Data System (ADS)
Attia, Khalid A. M.; Nassar, Mohammed W. I.; El-Zeiny, Mohamed B.; Serag, Ahmed
2017-01-01
For the first time, a new variable selection method based on swarm intelligence namely firefly algorithm is coupled with three different multivariate calibration models namely, concentration residual augmented classical least squares, artificial neural network and support vector regression in UV spectral data. A comparative study between the firefly algorithm and the well-known genetic algorithm was developed. The discussion revealed the superiority of using this new powerful algorithm over the well-known genetic algorithm. Moreover, different statistical tests were performed and no significant differences were found between all the models regarding their predictabilities. This ensures that simpler and faster models were obtained without any deterioration of the quality of the calibration.
Multivariate analysis: A statistical approach for computations
NASA Astrophysics Data System (ADS)
Michu, Sachin; Kaushik, Vandana
2014-10-01
Multivariate analysis is a statistical approach commonly used in automotive diagnosis, education, evaluating clusters in finance, etc., and more recently in the health-related professions. The objective of the paper is to provide a detailed exploratory discussion of factor analysis (FA) in image retrieval and correlation analysis (CA) of network traffic. Image retrieval methods aim to retrieve relevant images from a collected database, based on their content. The problem is made more difficult by the high dimension of the variable space in which the images are represented. Multivariate correlation analysis provides an anomaly detection and analysis method based on the correlation coefficient matrix. Anomalous behaviors in the network include various attacks on the network, such as DDoS attacks and network scanning.
Real, Jordi; Forné, Carles; Roso-Llorach, Albert; Martínez-Sánchez, Jose M
2016-05-01
Controlling for confounders is a crucial step in analytical observational studies, and multivariable models are widely used as statistical adjustment techniques. However, the validation of the assumptions of the multivariable regression models (MRMs) should be made clear in scientific reporting. The objective of this study is to review the quality of statistical reporting of the most commonly used MRMs (logistic, linear, and Cox regression) that were applied in analytical observational studies published between 2003 and 2014 by journals indexed in MEDLINE. Review of a representative sample of articles indexed in MEDLINE (n = 428) with observational design and use of MRMs (logistic, linear, and Cox regression). We assessed the quality of reporting about: model assumptions and goodness-of-fit, interactions, sensitivity analysis, crude and adjusted effect estimate, and specification of more than 1 adjusted model. The tests of underlying assumptions or goodness-of-fit of the MRMs used were described in 26.2% (95% CI: 22.0-30.3) of the articles and 18.5% (95% CI: 14.8-22.1) reported the interaction analysis. Reporting of all items assessed was higher in articles published in journals with a higher impact factor. A low percentage of articles indexed in MEDLINE that used multivariable techniques provided information demonstrating rigorous application of the model selected as an adjustment method. Given the importance of these methods to the final results and conclusions of observational studies, greater rigor is required in reporting the use of MRMs in the scientific literature.
Wu, Hao
2018-05-01
In structural equation modelling (SEM), a robust adjustment to the test statistic or to its reference distribution is needed when its null distribution deviates from a χ² distribution, which usually arises when data do not follow a multivariate normal distribution. Unfortunately, existing studies on this issue typically focus on only a few methods and neglect the majority of alternative methods in statistics. Existing simulation studies typically consider only non-normal distributions of data that either satisfy asymptotic robustness or lead to an asymptotic scaled χ² distribution. In this work we conduct a comprehensive study that involves both typical methods in SEM and less well-known methods from the statistics literature. We also propose the use of several novel non-normal data distributions that are qualitatively different from the non-normal distributions widely used in existing studies. We found that several under-studied methods give the best performance under specific conditions, but the Satorra-Bentler method remains the most viable method for most situations. © 2017 The British Psychological Society.
Martin, Guillaume; Chapuis, Elodie; Goudet, Jérôme
2008-01-01
Neutrality tests in quantitative genetics provide a statistical framework for the detection of selection on polygenic traits in wild populations. However, the existing method based on comparisons of divergence at neutral markers and quantitative traits (Qst–Fst) suffers from several limitations that hinder a clear interpretation of the results with typical empirical designs. In this article, we propose a multivariate extension of this neutrality test based on empirical estimates of the among-populations (D) and within-populations (G) covariance matrices by MANOVA. A simple pattern is expected under neutrality: D = 2Fst/(1 − Fst)G, so that neutrality implies both proportionality of the two matrices and a specific value of the proportionality coefficient. This pattern is tested using Flury's framework for matrix comparison [common principal-component (CPC) analysis], a well-known tool in G matrix evolution studies. We show the importance of using a Bartlett adjustment of the test for the small sample sizes typically found in empirical studies. We propose a dual test: (i) that the proportionality coefficient is not different from its neutral expectation [2Fst/(1 − Fst)] and (ii) that the MANOVA estimates of mean square matrices between and among populations are proportional. These two tests combined provide a more stringent test for neutrality than the classic Qst–Fst comparison and avoid several statistical problems. Extensive simulations of realistic empirical designs suggest that these tests correctly detect the expected pattern under neutrality and have enough power to efficiently detect mild to strong selection (homogeneous, heterogeneous, or mixed) when it is occurring on a set of traits. This method also provides a rigorous and quantitative framework for disentangling the effects of different selection regimes and of drift on the evolution of the G matrix. 
We discuss practical requirements for the proper application of our test in empirical studies and potential extensions. PMID:18245845
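As a small worked illustration of the neutral expectation quoted above, D = 2Fst/(1 − Fst)G, the sketch below computes the proportionality coefficient and the among-population matrix it implies for a given G. The function names, the example G matrix and the Fst value are hypothetical and chosen only for illustration; this is not the authors' testing procedure, which relies on CPC analysis and a Bartlett adjustment.

```python
import numpy as np


def neutral_coefficient(fst: float) -> float:
    """Proportionality coefficient expected under neutrality: 2*Fst / (1 - Fst)."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("Fst must lie in [0, 1)")
    return 2.0 * fst / (1.0 - fst)


def expected_d_matrix(g_matrix: np.ndarray, fst: float) -> np.ndarray:
    """Among-population covariance matrix D implied by neutrality for a given G."""
    return neutral_coefficient(fst) * np.asarray(g_matrix, dtype=float)


# Hypothetical two-trait G matrix and Fst estimate (illustration only).
G = np.array([[1.0, 0.3],
              [0.3, 2.0]])
D_neutral = expected_d_matrix(G, fst=0.2)
print(neutral_coefficient(0.2))
print(D_neutral)
```

An empirical D that is not proportional to G, or that is proportional with a clearly different coefficient, would be the kind of departure the dual test is designed to detect.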
Deconstructing multivariate decoding for the study of brain function.
Hebart, Martin N; Baker, Chris I
2017-08-04
Multivariate decoding methods were developed originally as tools to enable accurate predictions in real-world applications. The realization that these methods can also be employed to study brain function has led to their widespread adoption in the neurosciences. However, prior to the rise of multivariate decoding, the study of brain function was firmly embedded in a statistical philosophy grounded on univariate methods of data analysis. In this way, multivariate decoding for brain interpretation grew out of two established frameworks: multivariate decoding for predictions in real-world applications, and classical univariate analysis based on the study and interpretation of brain activation. We argue that this led to two confusions, one reflecting a mixture of multivariate decoding for prediction or interpretation, and the other a mixture of the conceptual and statistical philosophies underlying multivariate decoding and classical univariate analysis. Here we attempt to systematically disambiguate multivariate decoding for the study of brain function from the frameworks it grew out of. After elaborating these confusions and their consequences, we describe six, often unappreciated, differences between classical univariate analysis and multivariate decoding. We then focus on how the common interpretation of what is signal and noise changes in multivariate decoding. Finally, we use four examples to illustrate where these confusions may impact the interpretation of neuroimaging data. We conclude with a discussion of potential strategies to help resolve these confusions in interpreting multivariate decoding results, including the potential departure from multivariate decoding methods for the study of brain function. Copyright © 2017. Published by Elsevier Inc.
Voss, Jesse S; Iqbal, Seher; Jenkins, Sarah M; Henry, Michael R; Clayton, Amy C; Jett, James R; Kipp, Benjamin R; Halling, Kevin C; Maldonado, Fabien
2014-01-01
Studies have shown that fluorescence in situ hybridization (FISH) testing increases lung cancer detection on cytology specimens in peripheral nodules. The goal of this study was to determine whether a predictive model using clinical features and routine cytology with FISH results could predict lung malignancy after a nondiagnostic bronchoscopic evaluation. Patients with an indeterminate peripheral lung nodule that had a nondiagnostic bronchoscopic evaluation were included in this study (N = 220). FISH was performed on residual bronchial brushing cytology specimens diagnosed as negative (n = 195), atypical (n = 16), or suspicious (n = 9). FISH results included hypertetrasomy (n = 30) and negative (n = 190). Primary study end points included lung cancer status along with time to diagnosis of lung cancer or date of last clinical follow-up. Hazard ratios (HRs) were calculated using Cox proportional hazards regression model analyses, and P values < .05 were considered statistically significant. The mean age of the 220 patients was 66.7 years (range, 35-91), and most (58%) were men. Most patients (79%) were current or former smokers with a mean pack year history of 43.2 years (median, 40; range, 1-200). After multivariate analysis, hypertetrasomy FISH (HR = 2.96, P < .001), pack years (HR = 1.03 per pack year up to 50, P = .001), age (HR = 1.04 per year, P = .02), atypical or suspicious cytology (HR = 2.02, P = .04), and nodule spiculation (HR = 2.36, P = .003) were independent predictors of malignancy over time and were used to create a prediction model (C-statistic = 0.78). These results suggest that this multivariate model including test results and clinical features may be useful following a nondiagnostic bronchoscopic examination. © 2013.
Lotan, Tamara L; Wei, Wei; Morais, Carlos L; Hawley, Sarah T; Fazli, Ladan; Hurtado-Coll, Antonio; Troyer, Dean; McKenney, Jesse K; Simko, Jeffrey; Carroll, Peter R; Gleave, Martin; Lance, Raymond; Lin, Daniel W; Nelson, Peter S; Thompson, Ian M; True, Lawrence D; Feng, Ziding; Brooks, James D
2016-06-01
PTEN is the most commonly deleted tumor suppressor gene in primary prostate cancer (PCa) and its loss is associated with poor clinical outcomes and ERG gene rearrangement. We tested whether PTEN loss is associated with shorter recurrence-free survival (RFS) in surgically treated PCa patients with known ERG status. A genetically validated, automated PTEN immunohistochemistry (IHC) protocol was used for 1275 primary prostate tumors from the Canary Foundation retrospective PCa tissue microarray cohort to assess homogeneous (in all tumor tissue sampled) or heterogeneous (in a subset of tumor tissue sampled) PTEN loss. ERG status as determined by a genetically validated IHC assay was available for a subset of 938 tumors. Associations between PTEN and ERG status were assessed using Fisher's exact test. Kaplan-Meier and multivariate weighted Cox proportional models for RFS were constructed. When compared to intact PTEN, homogeneous (hazard ratio [HR] 1.66, p = 0.001) but not heterogeneous (HR 1.24, p = 0.14) PTEN loss was significantly associated with shorter RFS in multivariate models. Among ERG-positive tumors, homogeneous (HR 3.07, p < 0.0001) but not heterogeneous (HR 1.46, p = 0.10) PTEN loss was significantly associated with shorter RFS. Among ERG-negative tumors, PTEN did not reach significance for inclusion in the final multivariate models. The interaction term for PTEN and ERG status with respect to RFS did not reach statistical significance ( p = 0.11) for the current sample size. These data suggest that PTEN is a useful prognostic biomarker and that there is no statistically significant interaction between PTEN and ERG status for RFS. We found that loss of the PTEN tumor suppressor gene in prostate tumors as assessed by tissue staining is correlated with shorter time to prostate cancer recurrence after radical prostatectomy.
Chen, Gang; Adleman, Nancy E; Saad, Ziad S; Leibenluft, Ellen; Cox, Robert W
2014-10-01
All neuroimaging packages can handle group analysis with t-tests or general linear modeling (GLM). However, they are quite hamstrung when there are multiple within-subject factors or when quantitative covariates are involved in the presence of a within-subject factor. In addition, sphericity is typically assumed for the variance-covariance structure when there are more than two levels in a within-subject factor. To overcome such limitations in the traditional AN(C)OVA and GLM, we adopt a multivariate modeling (MVM) approach to analyzing neuroimaging data at the group level with the following advantages: a) there is no limit on the number of factors as long as sample sizes are deemed appropriate; b) quantitative covariates can be analyzed together with within-subject factors; c) when a within-subject factor is involved, three testing methodologies are provided: traditional univariate testing (UVT) with sphericity assumption (UVT-UC) and with correction when the assumption is violated (UVT-SC), and within-subject multivariate testing (MVT-WS); d) to correct for sphericity violation at the voxel level, we propose a hybrid testing (HT) approach that achieves equal or higher power via combining traditional sphericity correction methods (Greenhouse-Geisser and Huynh-Feldt) with MVT-WS. To validate the MVM methodology, we performed simulations to assess the controllability for false positives and power achievement. A real FMRI dataset was analyzed to demonstrate the capability of the MVM approach. The methodology has been implemented into an open source program 3dMVM in AFNI, and all the statistical tests can be performed through symbolic coding with variable names instead of the tedious process of dummy coding. Our data indicates that the severity of sphericity violation varies substantially across brain regions. 
The differences among various modeling methodologies were addressed through direct comparisons between the MVM approach and some of the GLM implementations in the field, and the following two issues were raised: a) the improper formulation of test statistics in some univariate GLM implementations when a within-subject factor is involved in a data structure with two or more factors, and b) the unjustified presumption of uniform sphericity violation and the practice of estimating the variance-covariance structure through pooling across brain regions. Published by Elsevier Inc.
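To make the sphericity correction mentioned above more concrete, here is a minimal sketch of the Greenhouse-Geisser epsilon computed from the covariance matrix of the levels of a within-subject factor. It uses the standard textbook formula and is not taken from 3dMVM; the function name and the example matrices are assumptions made for illustration.

```python
import numpy as np


def greenhouse_geisser_epsilon(cov: np.ndarray) -> float:
    """Greenhouse-Geisser epsilon for a k x k covariance matrix of repeated measures.

    Epsilon equals 1 when sphericity holds and drops towards 1/(k - 1)
    as the violation becomes more severe.
    """
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    # Double-centre the covariance matrix: S = C @ cov @ C with C = I - J/k.
    centring = np.eye(k) - np.ones((k, k)) / k
    s = centring @ cov @ centring
    return float(np.trace(s) ** 2 / ((k - 1) * np.trace(s @ s)))


# Spherical case: equal variances, no correlation between levels.
print(greenhouse_geisser_epsilon(np.eye(3)))

# Strong violation: all variance lies along a single contrast direction.
v = np.array([1.0, 0.0, -1.0])
print(greenhouse_geisser_epsilon(np.outer(v, v)))
```

In a univariate test with correction, both degrees of freedom of the F statistic are multiplied by this epsilon, which is why region-specific estimates matter if the severity of the violation varies across the brain.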
Wilcox, Jared T; Satkunendrarajah, Kajana; Nasirzadeh, Yasmin; Laliberte, Alex M; Lip, Alyssa; Cadotte, David W; Foltz, Warren D; Fehlings, Michael G
2017-09-01
The majority of spinal cord injuries (SCI) occur at the cervical level, which results in significant impairment. Neurologic level and severity of injury are primary endpoints in clinical trials; however, how level-specific damages relate to behavioural performance in cervical injury is incompletely understood. We hypothesized that ascending level of injury leads to worsening forelimb performance, and correlates with loss of neural tissue and muscle-specific neuron pools. A direct comparison of multiple models was made with injury realized at the C5, C6, C7 and T7 vertebral levels using clip compression with sham-operated controls. Animals were assessed for 10 weeks post-injury with numerous (40) outcome measures, including: classic behavioural tests, CatWalk, non-invasive MRI, electrophysiology, histologic lesion morphometry, neuron counts, and motor compartment quantification, and multivariate statistics on the total dataset. Histologic staining and T1-weighted MR imaging revealed similar structural changes and distinct tissue loss with cystic cavitation across all injuries. Forelimb tests, including grip strength, F-WARP motor scale, Inclined Plane, and forelimb ladder walk, exhibited stratification between all groups and marked impairment with C5 and C6 injuries. Classic hindlimb tests including BBB, hindlimb ladder walk, bladder recovery, and mortality were not different between cervical and thoracic injuries. CatWalk multivariate gait analysis showed reciprocal and progressive changes in forelimb and hindlimb function with ascending level of injury. Electrophysiology revealed poor forelimb axonal conduction in cervical C5 and C6 groups alone. The cervical enlargement (C5-T2) showed progressive ventral horn atrophy and loss of specific motor neuron populations with ascending injury. 
Multivariate statistics revealed a robust dataset, rank-order contribution of outcomes, and allowed prediction of injury level with single-level discrimination using forelimb performance and neuron counts. Level-dependent models were generated using clip-compression SCI, with marked and reliable differences in forelimb performance and specific neuron pool loss. Copyright © 2017 Elsevier Inc. All rights reserved.
Wojcik, Pawel Jerzy; Pereira, Luís; Martins, Rodrigo; Fortunato, Elvira
2014-01-13
An efficient mathematical strategy in the field of solution processed electrochromic (EC) films is outlined as a combination of an experimental work, modeling, and information extraction from massive computational data via statistical software. Design of Experiment (DOE) was used for statistical multivariate analysis and prediction of mixtures through a multiple regression model, as well as the optimization of a five-component sol-gel precursor subjected to complex constraints. This approach significantly reduces the number of experiments to be realized, from 162 in the full factorial (L=3) and 72 in the extreme vertices (D=2) approach down to only 30 runs, while still maintaining a high accuracy of the analysis. By carrying out a finite number of experiments, the empirical modeling in this study shows reasonably good prediction ability in terms of the overall EC performance. An optimized ink formulation was employed in a prototype of a passive EC matrix fabricated in order to test and trial this optically active material system together with a solid-state electrolyte for the prospective application in EC displays. Coupling of DOE with chromogenic material formulation shows the potential to maximize the capabilities of these systems and ensures increased productivity in many potential solution-processed electrochemical applications.
Longobardi, F; Ventrella, A; Bianco, A; Catucci, L; Cafagna, I; Gallo, V; Mastrorilli, P; Agostiano, A
2013-12-01
In this study, non-targeted ¹H NMR fingerprinting was used in combination with multivariate statistical techniques for the classification of Italian sweet cherries based on their different geographical origins (Emilia Romagna and Puglia). As classification techniques, Soft Independent Modelling of Class Analogy (SIMCA), Partial Least Squares Discriminant Analysis (PLS-DA), and Linear Discriminant Analysis (LDA) were carried out and the results were compared. For LDA, before performing a refined selection of the number/combination of variables, two different strategies for a preliminary reduction of the variable number were tested. The best average recognition and CV prediction abilities (both 100.0%) were obtained for all the LDA models, although PLS-DA also showed remarkable performances (94.6%). All the statistical models were validated by observing the prediction abilities with respect to an external set of cherry samples. The best result (94.9%) was obtained with LDA by performing a best subset selection procedure on a set of 30 principal components previously selected by a stepwise decorrelation. The metabolites that contributed most to the classification performance of this LDA model were found to be malate, glucose, fructose, glutamine and succinate. Copyright © 2013 Elsevier Ltd. All rights reserved.
Gene set analysis using variance component tests.
Huang, Yen-Tsung; Lin, Xihong
2013-06-28
Gene set analyses have become increasingly important in genomic research, as many complex diseases are contributed jointly by alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analyses method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and global test in both simulation and a diabetes microarray data.
Time Series Model Identification by Estimating Information.
1982-11-01
principle, Applications of Statistics, P. R. Krishnaiah, ed., North-Holland: Amsterdam, 27-41. Anderson, T. W. (1971). The Statistical Analysis of Time Series...E. (1969). Multiple Time Series Modeling, Multivariate Analysis II, edited by P. Krishnaiah, Academic Press: New York, 389-409. Parzen, E. (1981...Newton, H. J. (1980). Multiple Time Series Modeling, II Multivariate Analysis - V, edited by P. Krishnaiah, North Holland: Amsterdam, 181-197. Shibata, R
Wartberg, Lutz; Kriston, Levente; Kammerl, Rudolf
2017-07-01
Internet Gaming Disorder (IGD) has been included in the current edition of the Diagnostic and Statistical Manual of Mental Disorders-Fifth Edition (DSM-5). In the present study, the relationship among social support, friends only known through the Internet, health-related quality of life, and IGD in adolescence was explored for the first time. For this purpose, 1,095 adolescents aged from 12 to 14 years were surveyed with a standardized questionnaire concerning IGD, self-perceived social support, proportion of friends only known through the Internet, and health-related quality of life. The authors conducted unpaired t-tests, a chi-square test, as well as correlation and logistic regression analyses. According to the statistical analyses, adolescents with IGD reported lower self-perceived social support, more friends only known through the Internet, and a lower health-related quality of life compared with the group without IGD. Both in bivariate and multivariate logistic regression models, statistically significant associations between IGD and male gender, a higher proportion of friends only known through the Internet, and a lower health-related quality of life (multivariate model: Nagelkerke's R² = 0.37) were revealed. Lower self-perceived social support was related to IGD in the bivariate model only. In summary, quality of life and social aspects seem to be important factors for IGD in adolescence and therefore should be incorporated in further (longitudinal) studies. The findings of the present survey may provide starting points for the development of prevention and intervention programs for adolescents affected by IGD.
Multivariate Welch t-test on distances
2016-01-01
Motivation: Permutational non-Euclidean analysis of variance, PERMANOVA, is routinely used in exploratory analysis of multivariate datasets to draw conclusions about the significance of patterns visualized through dimension reduction. This method recognizes that pairwise distance matrix between observations is sufficient to compute within and between group sums of squares necessary to form the (pseudo) F statistic. Moreover, not only Euclidean, but arbitrary distances can be used. This method, however, suffers from loss of power and type I error inflation in the presence of heteroscedasticity and sample size imbalances. Results: We develop a solution in the form of a distance-based Welch t-test, TW2, for two sample potentially unbalanced and heteroscedastic data. We demonstrate empirically the desirable type I error and power characteristics of the new test. We compare the performance of PERMANOVA and TW2 in reanalysis of two existing microbiome datasets, where the methodology has originated. Availability and Implementation: The source code for methods and analysis of this article is available at https://github.com/alekseyenko/Tw2. Further guidance on application of these methods can be obtained from the author. Contact: alekseye@musc.edu PMID:27515741
Multivariate Welch t-test on distances.
Alekseyenko, Alexander V
2016-12-01
Permutational non-Euclidean analysis of variance, PERMANOVA, is routinely used in exploratory analysis of multivariate datasets to draw conclusions about the significance of patterns visualized through dimension reduction. This method recognizes that pairwise distance matrix between observations is sufficient to compute within and between group sums of squares necessary to form the (pseudo) F statistic. Moreover, not only Euclidean, but arbitrary distances can be used. This method, however, suffers from loss of power and type I error inflation in the presence of heteroscedasticity and sample size imbalances. We develop a solution in the form of a distance-based Welch t-test, TW2, for two sample potentially unbalanced and heteroscedastic data. We demonstrate empirically the desirable type I error and power characteristics of the new test. We compare the performance of PERMANOVA and TW2 in reanalysis of two existing microbiome datasets, where the methodology has originated. The source code for methods and analysis of this article is available at https://github.com/alekseyenko/Tw2. Further guidance on application of these methods can be obtained from the author. Contact: alekseye@musc.edu. © The Author 2016. Published by Oxford University Press.
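The idea that a pairwise distance matrix is enough to form both the between-group and within-group terms can be sketched as follows. This is one reading of a distance-based Welch statistic, written so that it reduces to the squared univariate Welch t statistic for one-dimensional Euclidean distances; it is not copied from the linked repository, and the function names, the permutation scheme and the example data are assumptions made for illustration.

```python
import numpy as np


def tw2(dist: np.ndarray, labels: np.ndarray) -> float:
    """Distance-based Welch-type statistic for two groups (illustrative sketch)."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    if groups.size != 2:
        raise ValueError("exactly two groups are required")
    d2 = dist ** 2
    idx1 = np.flatnonzero(labels == groups[0])
    idx2 = np.flatnonzero(labels == groups[1])
    n1, n2 = idx1.size, idx2.size
    # Sums of squared distances over unordered pairs (half of the symmetric sum).
    ss_all = d2.sum() / 2.0
    ss1 = d2[np.ix_(idx1, idx1)].sum() / 2.0
    ss2 = d2[np.ix_(idx2, idx2)].sum() / 2.0
    # Between-group term: squared distance between group centroids.
    between = (n1 + n2) / (n1 * n2) * (ss_all / (n1 + n2) - ss1 / n1 - ss2 / n2)
    # Within-group term: each group's variance divided by its own sample size.
    within = ss1 / (n1 ** 2 * (n1 - 1)) + ss2 / (n2 ** 2 * (n2 - 1))
    return float(between / within)


def tw2_permutation_pvalue(dist, labels, n_perm=999, seed=0):
    """Permutation p-value obtained by shuffling group labels (assumed scheme)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = tw2(dist, labels)
    hits = sum(tw2(dist, rng.permutation(labels)) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


# Hypothetical unbalanced one-dimensional example with Euclidean distances.
x = np.array([1.0, 2.0, 4.0, 7.0, 10.0, 11.0, 15.0])
lab = np.array([0, 0, 0, 0, 1, 1, 1])
d = np.abs(x[:, None] - x[None, :])
print(tw2(d, lab), tw2_permutation_pvalue(d, lab, n_perm=199))
```

Because each group's dispersion is scaled by its own sample size rather than pooled, unequal variances and unequal group sizes enter the denominator separately, which is the property the abstract contrasts with the pooled pseudo-F of PERMANOVA.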
Takahara, Mitsuyoshi; Katakami, Naoto; Kaneto, Hideaki; Noguchi, Midori; Shimomura, Iichiro
2014-01-01
The aim of the current study was to develop a predictive model of insulin resistance using general health checkup data in Japanese employees with one or more metabolic risk factors. We used a database of 846 Japanese employees with one or more metabolic risk factors who underwent general health checkup and a 75-g oral glucose tolerance test (OGTT). Logistic regression models were developed to predict existing insulin resistance evaluated using the Matsuda index. The predictive performance of these models was assessed using the C statistic. The C statistics of body mass index (BMI), waist circumference and their combined use were 0.743, 0.732 and 0.749, with no significant differences. The multivariate backward selection model, in which BMI, the levels of plasma glucose, high-density lipoprotein (HDL) cholesterol, log-transformed triglycerides and log-transformed alanine aminotransferase and hypertension under treatment remained, had a C statistic of 0.816, with a significant difference compared to the combined use of BMI and waist circumference (p<0.01). The C statistic was not significantly reduced when the levels of log-transformed triglycerides and log-transformed alanine aminotransferase and hypertension under treatment were simultaneously excluded from the multivariate model (p=0.14). On the other hand, further exclusion of any of the remaining three variables significantly reduced the C statistic (all p<0.01). When predicting the presence of insulin resistance using general health checkup data in Japanese employees with metabolic risk factors, it is important to take into consideration the BMI and fasting plasma glucose and HDL cholesterol levels.
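For readers unfamiliar with the C statistic used above to compare models, the following sketch computes it as the proportion of (case, non-case) pairs in which the case receives the higher predicted score, counting ties as one half. The function name and the toy data are hypothetical; the study's own values came from fitted logistic regression models.

```python
import numpy as np


def c_statistic(y_true, scores) -> float:
    """Concordance (C) statistic for a binary outcome, equal to the ROC AUC."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both outcome classes must be present")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    concordant = (diff > 0).sum()
    ties = (diff == 0).sum()
    return float((concordant + 0.5 * ties) / (pos.size * neg.size))


# Toy example: 1 = insulin resistant, 0 = not; scores are model-predicted risks.
y = [0, 0, 1, 1]
risk = [0.10, 0.40, 0.35, 0.80]
print(c_statistic(y, risk))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so the reported increase from about 0.75 to 0.816 reflects a modest gain in ranking ability.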
Bayesian statistics and Monte Carlo methods
NASA Astrophysics Data System (ADS)
Koch, K. R.
2018-03-01
The Bayesian approach allows an intuitive way to derive the methods of statistics. Probability is defined as a measure of the plausibility of statements or propositions. Three rules are sufficient to obtain the laws of probability. If the statements refer to the numerical values of variables, the so-called random variables, univariate and multivariate distributions follow. They lead to the point estimation by which unknown quantities, i.e. unknown parameters, are computed from measurements. The unknown parameters are random variables; in traditional statistics, which is not founded on Bayes' theorem, they are fixed quantities. Bayesian statistics therefore recommends itself for Monte Carlo methods, which generate random variates from given distributions. Monte Carlo methods, of course, can also be applied in traditional statistics. The unknown parameters are introduced as functions of the measurements, and the Monte Carlo methods give the covariance matrix and the expectation of these functions. A confidence region is derived where the unknown parameters are situated with a given probability. Following a method of traditional statistics, hypotheses are tested by determining whether a value for an unknown parameter lies inside or outside the confidence region. The error propagation of a random vector by the Monte Carlo methods is presented as an application. If the random vector results from a nonlinearly transformed vector, its covariance matrix and its expectation follow from the Monte Carlo estimate. This saves a considerable amount of derivatives to be computed, and errors of the linearization are avoided. The Monte Carlo method is therefore efficient. If the functions of the measurements are given by a sum of two or more random vectors with different multivariate distributions, the resulting distribution is generally not known. The Monte Carlo methods are then needed to obtain the covariance matrix and the expectation of the sum.
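The Monte Carlo error propagation described above can be sketched in a few lines: draw random variates of the input vector, apply the nonlinear transformation to every draw, and take the sample mean and sample covariance of the results. The sketch assumes a multivariate normal input and a polar-to-Cartesian transformation purely for illustration; the function names and numerical values are hypothetical and not taken from the article.

```python
import numpy as np


def mc_propagate(mean, cov, transform, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the expectation and covariance of transform(x).

    `transform` must accept an (n_samples, d) array and return an
    (n_samples, m) array, i.e. it is applied row by row in vectorised form.
    Assumption: x is multivariate normal with the given mean and covariance.
    """
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mean, cov, size=n_samples)
    y = transform(x)
    return y.mean(axis=0), np.cov(y, rowvar=False)


def polar_to_cartesian(x):
    """Nonlinear example: (r, theta) -> (r*cos(theta), r*sin(theta))."""
    r, theta = x[:, 0], x[:, 1]
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])


# Hypothetical measurement: range 10 with sd 0.1, angle 0.5 rad with sd 0.05.
mean_in = np.array([10.0, 0.5])
cov_in = np.diag([0.1 ** 2, 0.05 ** 2])
mean_out, cov_out = mc_propagate(mean_in, cov_in, polar_to_cartesian)
print(mean_out)
print(cov_out)
```

No Jacobian of the transformation is needed, which is the saving in derivatives the abstract refers to; the price is sampling noise that shrinks with the square root of the number of draws.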
Soltani, Shahla; Asghari Moghaddam, Asghar; Barzegar, Rahim; Kazemian, Naeimeh; Tziritis, Evangelos
2017-08-18
Kordkandi-Duzduzan plain is one of the fertile plains of East Azarbaijan Province, NW of Iran. Groundwater is an important resource for drinking and agricultural purposes due to the lack of surface water resources in the region. The main objectives of the present study are to identify the hydrogeochemical processes and the potential sources of major, minor, and trace metals and metalloids such as Cr, Mn, Cd, Fe, Al, and As by using joint hydrogeochemical techniques and multivariate statistical analysis and to evaluate groundwater quality deterioration with the use of PoS environmental index. To achieve these objectives, 23 groundwater samples were collected in September 2015. Piper diagram shows that the mixed Ca-Mg-Cl is the dominant groundwater type, and some of the samples have Ca-HCO₃, Ca-Cl, and Na-Cl types. Multivariate statistical analyses indicate that weathering and dissolution of different rocks and minerals, e.g., silicates, gypsum, and halite, ion exchange, and agricultural activities influence the hydrogeochemistry of the study area. The cluster analysis divides the samples into two distinct clusters which are completely different in EC (and its dependent variables such as Na⁺, K⁺, Ca²⁺, Mg²⁺, SO₄²⁻, and Cl⁻), Cd, and Cr variables according to the ANOVA statistical test. Based on the median values, the concentrations of pH, NO₃⁻, SiO₂, and As in cluster 1 are elevated compared with those of cluster 2, while their maximum values occur in cluster 2. According to the PoS index, the dominant parameter that controls quality deterioration is As, with 60% of contribution. Samples of lowest PoS values are located in the southern and northern parts (recharge area) while samples of the highest values are located in the discharge area and the eastern part.
Alves, Darlan Daniel; Riegel, Roberta Plangg; de Quevedo, Daniela Müller; Osório, Daniela Montanari Migliavacca; da Costa, Gustavo Marques; do Nascimento, Carlos Augusto; Telöken, Franko
2018-06-08
Assessment of surface water quality is an issue of currently high importance, especially in polluted rivers which provide water for treatment and distribution as drinking water, as is the case of the Sinos River, southern Brazil. Multivariate statistical techniques allow a better understanding of the seasonal variations in water quality, as well as the source identification and source apportionment of water pollution. In this study, the multivariate statistical techniques of cluster analysis (CA), principal component analysis (PCA), and positive matrix factorization (PMF) were used, along with the Kruskal-Wallis test and Spearman's correlation analysis in order to interpret a water quality data set resulting from a monitoring program conducted over a period of almost two years (May 2013 to April 2015). The water samples were collected from the raw water inlet of the municipal water treatment plant (WTP) operated by the Water and Sewage Services of Novo Hamburgo (COMUSA). CA allowed the data to be grouped into three periods (autumn and summer (AUT-SUM); winter (WIN); spring (SPR)). Through the PCA, it was possible to identify that the most important parameters in contribution to water quality variations are total coliforms (TCOLI) in SUM-AUT, water level (WL), water temperature (WT), and electrical conductivity (EC) in WIN and color (COLOR) and turbidity (TURB) in SPR. PMF was applied to the complete data set and enabled the source apportionment water pollution through three factors, which are related to anthropogenic sources, such as the discharge of domestic sewage (mostly represented by Escherichia coli (ECOLI)), industrial wastewaters, and agriculture runoff. 
The results provided by this study demonstrate the contribution provided by the use of integrated statistical techniques in the interpretation and understanding of large data sets of water quality, showing also that this approach can be used as an efficient methodology to optimize indicators for water quality assessment.
Multivariate Statistical Analysis of Water Quality data in Indian River Lagoon, Florida
NASA Astrophysics Data System (ADS)
Sayemuzzaman, M.; Ye, M.
2015-12-01
The Indian River Lagoon, part of the longest barrier island complex in the United States, is a region of particular concern to environmental scientists because of the rapid rate of human development throughout the region and its geographical position between the colder temperate zone and the warmer sub-tropical zone. Thus, surface water quality analysis in this region continually yields new information. In the present study, multivariate statistical procedures were applied to analyze the spatial and temporal water quality in the Indian River Lagoon over the period 1998-2013. Twelve parameters were analyzed at twelve key water monitoring stations in and beside the lagoon using monthly datasets (a total of 27,648 observations). The dataset was treated using cluster analysis (CA), principal component analysis (PCA) and non-parametric trend analysis. The CA was used to cluster the twelve monitoring stations into four groups, with stations having similar surrounding characteristics being in the same group. The PCA was then applied to each group to find the important water quality parameters. The principal components (PCs) PC1 to PC5 were retained based on explained cumulative variances of 75% to 85% in each cluster group. Nutrient species (phosphorus and nitrogen), salinity, specific conductivity and erosion factors (TSS, turbidity) were the major variables involved in the construction of the PCs. Statistically significant positive or negative trends and abrupt trend shifts were detected by applying the Mann-Kendall trend test and the Sequential Mann-Kendall (SQMK) test to the important water quality parameters at each individual station. Land use and land cover change patterns, local anthropogenic activities and extreme climate events such as drought might be associated with these trends. This study presents the multivariate statistical assessment in order to get better information about the quality of surface water. 
Thus, effective pollution control/management of the surface waters can be undertaken.
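The Mann-Kendall trend test named above can be illustrated with a minimal sketch. It implements the basic form of the test with the normal approximation and no correction for ties or seasonality, both of which would likely matter for monthly water quality series; the function name and the example series are hypothetical.

```python
import math

import numpy as np


def mann_kendall(series):
    """Basic Mann-Kendall trend test (no tie or seasonal correction).

    Returns (S, Z, p) where S is the sum of signs over all ordered pairs,
    Z the continuity-corrected normal score and p the two-sided p-value.
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("at least three observations are required")
    # S counts later-minus-earlier increases minus decreases.
    s = 0.0
    for i in range(n - 1):
        s += np.sign(x[i + 1:] - x[i]).sum()
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return s, z, p


# Hypothetical annual mean concentrations with an upward drift.
values = [2.1, 2.0, 2.4, 2.6, 2.5, 2.9, 3.1, 3.0, 3.4, 3.6]
print(mann_kendall(values))
```

Because the statistic depends only on the signs of pairwise differences, it is insensitive to outliers and to the shape of the distribution, which is why it is a common choice for environmental monitoring data.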
Rivoirard, Romain; Duplay, Vianney; Oriol, Mathieu; Tinquaut, Fabien; Chauvin, Franck; Magne, Nicolas; Bourmaud, Aurelie
2016-01-01
Quality of reporting for Randomized Clinical Trials (RCTs) in oncology was analyzed in several systematic reviews, but, in this setting, there is paucity of data for the outcomes definitions and consistency of reporting for statistical tests in RCTs and Observational Studies (OBS). The objective of this review was to describe those two reporting aspects, for OBS and RCTs in oncology. From a list of 19 medical journals, three were retained for analysis, after a random selection: British Medical Journal (BMJ), Annals of Oncology (AoO) and British Journal of Cancer (BJC). All original articles published between March 2009 and March 2014 were screened. Only studies whose main outcome was accompanied by a corresponding statistical test were included in the analysis. Studies based on censored data were excluded. Primary outcome was to assess quality of reporting for description of primary outcome measure in RCTs and of variables of interest in OBS. A logistic regression was performed to identify covariates of studies potentially associated with concordance of tests between Methods and Results parts. 826 studies were included in the review, and 698 were OBS. Variables were described in Methods section for all OBS studies and primary endpoint was clearly detailed in Methods section for 109 RCTs (85.2%). 295 OBS (42.2%) and 43 RCTs (33.6%) had perfect agreement for reported statistical test between Methods and Results parts. In multivariable analysis, variable "number of included patients in study" was associated with test consistency: aOR (adjusted Odds Ratio) for third group compared to first group was equal to: aOR Grp3 = 0.52 [0.31-0.89] (P value = 0.009). Variables in OBS and primary endpoint in RCTs are reported and described with a high frequency. However, statistical tests consistency between methods and Results sections of OBS is not always noted. Therefore, we encourage authors and peer reviewers to verify consistency of statistical tests in oncology studies.
Rivoirard, Romain; Duplay, Vianney; Oriol, Mathieu; Tinquaut, Fabien; Chauvin, Franck; Magne, Nicolas; Bourmaud, Aurelie
2016-01-01
Background: Quality of reporting for Randomized Clinical Trials (RCTs) in oncology was analyzed in several systematic reviews, but, in this setting, there is a paucity of data for the outcomes definitions and consistency of reporting for statistical tests in RCTs and Observational Studies (OBS). The objective of this review was to describe those two reporting aspects, for OBS and RCTs in oncology. Methods: From a list of 19 medical journals, three were retained for analysis, after a random selection: British Medical Journal (BMJ), Annals of Oncology (AoO) and British Journal of Cancer (BJC). All original articles published between March 2009 and March 2014 were screened. Only studies whose main outcome was accompanied by a corresponding statistical test were included in the analysis. Studies based on censored data were excluded. Primary outcome was to assess quality of reporting for description of primary outcome measure in RCTs and of variables of interest in OBS. A logistic regression was performed to identify covariates of studies potentially associated with concordance of tests between Methods and Results parts. Results: 826 studies were included in the review, and 698 were OBS. Variables were described in Methods section for all OBS studies and primary endpoint was clearly detailed in Methods section for 109 RCTs (85.2%). 295 OBS (42.2%) and 43 RCTs (33.6%) had perfect agreement for reported statistical test between Methods and Results parts. In multivariable analysis, variable "number of included patients in study" was associated with test consistency: aOR (adjusted Odds Ratio) for third group compared to first group was equal to: aOR Grp3 = 0.52 [0.31–0.89] (P value = 0.009). Conclusion: Variables in OBS and primary endpoint in RCTs are reported and described with a high frequency. However, statistical tests consistency between Methods and Results sections of OBS is not always noted. Therefore, we encourage authors and peer reviewers to verify consistency of statistical tests in oncology studies. PMID:27716793
Grey matter volume patterns in thalamic nuclei are associated with familial risk for schizophrenia.
Pergola, Giulio; Trizio, Silvestro; Di Carlo, Pasquale; Taurisano, Paolo; Mancini, Marina; Amoroso, Nicola; Nettis, Maria Antonietta; Andriola, Ileana; Caforio, Grazia; Popolizio, Teresa; Rampino, Antonio; Di Giorgio, Annabella; Bertolino, Alessandro; Blasi, Giuseppe
2017-02-01
Previous evidence suggests reduced thalamic grey matter volume (GMV) in patients with schizophrenia (SCZ). However, it is not considered an intermediate phenotype for schizophrenia, possibly because previous studies did not assess the contribution of individual thalamic nuclei and employed univariate statistics. Here, we hypothesized that multivariate statistics would reveal an association of GMV in different thalamic nuclei with familial risk for schizophrenia. We also hypothesized that accounting for the heterogeneity of thalamic GMV in healthy controls would improve the detection of subjects at familial risk for the disorder. We acquired MRI scans for 96 clinically stable SCZ, 55 non-affected siblings of patients with schizophrenia (SIB), and 249 HC. The thalamus was parceled into seven regions of interest (ROIs). After a canonical univariate analysis, we used GMV estimates of thalamic ROIs, together with total thalamic GMV and premorbid intelligence, as features in Random Forests to classify HC, SIB, and SCZ. Then, we computed a Misclassification Index for each individual and tested the improvement in SIB detection after excluding a subsample of HC misclassified as patients. Random Forests discriminated SCZ from HC (accuracy=81%) and SIB from HC (accuracy=75%). Left anteromedial thalamic volumes were significantly associated with both multivariate classifications (p<0.05). Excluding HC misclassified as SCZ improved greatly HC vs. SIB classification (Cohen's d=1.39). These findings suggest that multivariate statistics identify a familial background associated with thalamic GMV reduction in SCZ. They also suggest the relevance of inter-individual variability of GMV patterns for the discrimination of individuals at familial risk for the disorder. Copyright © 2016 Elsevier B.V. All rights reserved.
Su, Zhong; Zhang, Lisha; Ramakrishnan, V; Hagan, Michael; Anscher, Mitchell
2011-05-01
To evaluate both the Calypso Systems' (Calypso Medical Technologies, Inc., Seattle, WA) localization accuracy in the presence of wireless metal-oxide-semiconductor field-effect transistor (MOSFET) dosimeters of dose verification system (DVS, Sicel Technologies, Inc., Morrisville, NC) and the dosimeters' reading accuracy in the presence of wireless electromagnetic transponders inside a phantom. A custom-made, solid-water phantom was fabricated with space for transponders and dosimeters. Two inserts were machined with positioning grooves precisely matching the dimensions of the transponders and dosimeters and were arranged in orthogonal and parallel orientations, respectively. To test the transponder localization accuracy with/without presence of dosimeters (hypothesis 1), multivariate analyses were performed on transponder-derived localization data with and without dosimeters at each preset distance to detect statistically significant localization differences between the control and test sets. To test dosimeter dose-reading accuracy with/without presence of transponders (hypothesis 2), an approach of alternating the transponder presence in seven identical fraction dose (100 cGy) deliveries and measurements was implemented. Two-way analysis of variance was performed to examine statistically significant dose-reading differences between the two groups and the different fractions. A relative-dose analysis method was also used to evaluate transponder impact on dose-reading accuracy after dose-fading effect was removed by a second-order polynomial fit. Multivariate analysis indicated that hypothesis 1 was false; there was a statistically significant difference between the localization data from the control and test sets. 
However, the upper and lower bounds of the 95% confidence intervals of the localized positional differences between the control and test sets were less than 0.1 mm, which was significantly smaller than the minimum clinical localization resolution of 0.5 mm. For hypothesis 2, analysis of variance indicated that there was no statistically significant difference between the dosimeter readings with and without the presence of transponders. Both orthogonal and parallel configurations had difference of polynomial-fit dose to measured dose values within 1.75%. The phantom study indicated that the Calypso System's localization accuracy was not affected clinically due to the presence of DVS wireless MOSFET dosimeters and the dosimeter-measured doses were not affected by the presence of transponders. Thus, the same patients could be implanted with both transponders and dosimeters to benefit from improved accuracy of radiotherapy treatments offered by conjunctional use of the two systems.
Souza, Iara da Costa; Morozesk, Mariana; Duarte, Ian Drumond; Bonomo, Marina Marques; Rocha, Lívia Dorsch; Furlan, Larissa Maria; Arrivabene, Hiulana Pereira; Monferrán, Magdalena Victoria; Matsumoto, Silvia Tamie; Milanez, Camilla Rozindo Dias; Wunderlin, Daniel Alberto; Fernandes, Marisa Narciso
2014-08-01
Roots of mangrove trees have an important role in depurating water and sediments by retaining metals that may accumulate in different plant tissues, affecting physiological processes and anatomy. The present study aimed to evaluate adaptive changes in root of Rhizophora mangle in response to different levels of chemical elements (metals/metalloids) in interstitial water and sediments from four neotropical mangroves in Brazil. What sets this study apart from other studies is that we not only investigate adaptive modifications in R. mangle but also changes in environments where this plant grows, evaluating correspondence between physical, chemical and biological issues by a combined set of multivariate statistical methods (pattern recognition). Thus, we looked to match changes in the environment with adaptations in plants. Multivariate statistics highlighted that the lignified periderm and the air gaps are directly related to the environmental contamination. Current results provide new evidence of root anatomical strategies to deal with contaminated environments. Multivariate statistics contribute greatly to extrapolating results from complex data matrices obtained when analyzing environmental issues, pointing out parameters involved in environmental changes and also evidencing the adaptive response of the exposed biota. Copyright © 2014 Elsevier Ltd. All rights reserved.
Karunathilaka, Sanjeewa R; Kia, Ali-Reza Fardin; Srigley, Cynthia; Chung, Jin Kyu; Mossoba, Magdi M
2016-10-01
A rapid tool for evaluating authenticity was developed and applied to the screening of extra virgin olive oil (EVOO) retail products by using Fourier-transform near infrared (FT-NIR) spectroscopy in combination with univariate and multivariate data analysis methods. Using disposable glass tubes, spectra for 62 reference EVOO, 10 edible oil adulterants, 20 blends consisting of EVOO spiked with adulterants, 88 retail EVOO products and other test samples were rapidly measured in the transmission mode without any sample preparation. The univariate conformity index (CI) and the multivariate supervised soft independent modeling of class analogy (SIMCA) classification tool were used to analyze the various olive oil products which were tested for authenticity against a library of reference EVOO. Better discrimination between the authentic EVOO and some commercial EVOO products was observed with SIMCA than with CI analysis. Approximately 61% of all EVOO commercial products were flagged by SIMCA analysis, suggesting that further analysis be performed to identify quality issues and/or potential adulterants. Due to its simplicity and speed, FT-NIR spectroscopy in combination with multivariate data analysis can be used as a complementary tool to conventional official methods of analysis to rapidly flag EVOO products that may not belong to the class of authentic EVOO. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
Estimating an Effect Size in One-Way Multivariate Analysis of Variance (MANOVA)
ERIC Educational Resources Information Center
Steyn, H. S., Jr.; Ellis, S. M.
2009-01-01
When two or more univariate population means are compared, the proportion of variation in the dependent variable accounted for by population group membership is eta-squared. This effect size can be generalized by using multivariate measures of association, based on the multivariate analysis of variance (MANOVA) statistics, to establish whether…
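As a rough illustration of the univariate starting point described above, the sample analogue of eta-squared can be computed as the between-group sum of squares divided by the total sum of squares. This is a minimal sketch, not the authors' multivariate generalization; the function name `eta_squared` and the toy data are assumptions made for this example.

```python
def eta_squared(groups):
    """Sample eta-squared: SS_between / SS_total for a one-way layout.

    `groups` is a list of lists, one list of observations per group.
    The name and the data below are illustrative, not from the source.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Total variation of every observation around the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


# Toy data: three groups with means 2, 3 and 7.
sample_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
eta2 = eta_squared(sample_groups)
print(f"eta-squared = {eta2:.3f}")
```

With these toy values the between-group sum of squares is 42 and the total is 48, so group membership accounts for 87.5% of the variation.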
Yan, Binjun; Fang, Zhonghua; Shen, Lijuan; Qu, Haibin
2015-01-01
The batch-to-batch quality consistency of herbal drugs has always been an important issue. To propose a methodology for batch-to-batch quality control based on HPLC-MS fingerprints and process knowledgebase. The extraction process of Compound E-jiao Oral Liquid was taken as a case study. After establishing the HPLC-MS fingerprint analysis method, the fingerprints of the extract solutions produced under normal and abnormal operation conditions were obtained. Multivariate statistical models were built for fault detection and a discriminant analysis model was built using the probabilistic discriminant partial-least-squares method for fault diagnosis. Based on multivariate statistical analysis, process knowledge was acquired and the cause-effect relationship between process deviations and quality defects was revealed. The quality defects were detected successfully by multivariate statistical control charts and the type of process deviations were diagnosed correctly by discriminant analysis. This work has demonstrated the benefits of combining HPLC-MS fingerprints, process knowledge and multivariate analysis for the quality control of herbal drugs. Copyright © 2015 John Wiley & Sons, Ltd.
Application of multivariate statistical techniques in microbial ecology.
Paliy, O; Shankar, V
2016-03-01
Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. In particular, noticeable effect has been attained in the field of microbial ecology, where new experimental approaches provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure. © 2016 John Wiley & Sons Ltd.
Attia, Khalid A M; Nassar, Mohammed W I; El-Zeiny, Mohamed B; Serag, Ahmed
2017-01-05
For the first time, a new variable selection method based on swarm intelligence namely firefly algorithm is coupled with three different multivariate calibration models namely, concentration residual augmented classical least squares, artificial neural network and support vector regression in UV spectral data. A comparative study between the firefly algorithm and the well-known genetic algorithm was developed. The discussion revealed the superiority of using this new powerful algorithm over the well-known genetic algorithm. Moreover, different statistical tests were performed and no significant differences were found between all the models regarding their predictabilities. This ensures that simpler and faster models were obtained without any deterioration of the quality of the calibration. Copyright © 2016 Elsevier B.V. All rights reserved.
A Test Strategy for High Resolution Image Scanners.
1983-10-01
... for multivariate analysis. Holt, Rinehart and Winston, Inc., New York. Graybill, F.A., 1961: An introduction to linear statistical models, Volume I ... (7) ... The linear estimation model for the polynomial coefficients can be set up as ... (8) ... Resolution Image Scanner: MTF; geometrical and radiometric performance; dynamic range, linearity, noise; dynamic scanning errors; response uniformity; skewness of ...
Multivariate Non-Symmetric Stochastic Models for Spatial Dependence Models
NASA Astrophysics Data System (ADS)
Haslauer, C. P.; Bárdossy, A.
2017-12-01
A copula based multivariate framework allows more flexibility to describe different kinds of dependence than is possible using models relying on the confining assumption of symmetric Gaussian models: different quantiles can be modelled with a different degree of dependence; it will be demonstrated how this can be expected given process understanding. Maximum likelihood based multivariate quantitative parameter estimation yields stable and reliable results. Not only are improved results in cross-validation based measures of uncertainty obtained, but also a more realistic spatial structure of uncertainty compared to second order models of dependence. As much information as is available is included in the parameter estimation: incorporation of censored measurements (e.g., below detection limit, or ones that are above the sensitive range of the measurement device) leads to more realistic spatial models; the proportion of true zeros can be jointly estimated with and distinguished from censored measurements, which allows estimates about the age of a contaminant in the system; secondary information (categorical and on the rational scale) has been used to improve the estimation of the primary variable. These copula based multivariate statistical techniques are demonstrated based on hydraulic conductivity observations at the Borden (Canada) site, the MADE site (USA), and a large regional groundwater quality data-set in south-west Germany. Fields of spatially distributed K were simulated with identical marginal simulation, identical second order spatial moments, yet substantially differing solute transport characteristics when numerical tracer tests were performed. A statistical methodology is shown that allows the delineation of a boundary layer separating homogenous parts of a spatial data-set. The effects of this boundary layer (macro structure) and the spatial dependence of K (micro structure) on solute transport behaviour are shown.
Jansson, Daniel; Lindström, Susanne Wiklund; Norlin, Rikard; Hok, Saphon; Valdez, Carlos A; Williams, Audrey M; Alcaraz, Armando; Nilsson, Calle; Åstot, Crister
2018-08-15
This work is part two of a three-part series in this issue of a Sweden-United States collaborative effort towards the understanding of the chemical attribution signatures of Russian VX (VR) in synthesized samples and complex food matrices. In this study, we describe the sourcing of VR present in food based on chemical analysis of attribution signatures by liquid chromatography-tandem mass spectrometry (LC-MS/MS) combined with multivariate data analysis. Analytical data was acquired from seven different foods spiked with VR batches that were synthesized via six different routes in two separate laboratories. The synthesis products were spiked at a lethal dose into seven food matrices: water, orange juice, apple purée, baby food, pea purée, liquid eggs and hot dog. After acetonitrile sample extraction, the samples were analyzed by LC-MS/MS operated in MRM mode. A multivariate statistical calibration model was built on the chemical attribution profiles from 118 VR spiked food samples. Using the model, an external test-set of the six synthesis routes employed for VR production was correctly identified with no observable major impact of the food matrices to the classification. The overall performance of the statistical models was found to be exceptional (94%) for the test set samples retrospectively classified to their synthesis routes. Copyright © 2018 Elsevier B.V. All rights reserved.
Giuca, Maria Rita; Cappè, Maria; Carli, Elisabetta; Lardani, Lisa
2018-01-01
Aim The purpose of the present study was to evaluate the clinical defects and etiological factors potentially involved in the onset of MIH in a pediatric sample. Methods 120 children, selected from the university dental clinic, were included: 60 children (25 boys and 35 girls; average age: 9.8 ± 1.8 years) with MIH formed the test group and 60 children (27 boys and 33 girls; average age: 10.1 ± 2 years) without MIH constituted the control group. Distribution and severity of MIH defects were evaluated, and a questionnaire was used to investigate the etiological variables; chi-square, univariate, and multivariate statistical tests were performed (significance level set at p < 0.05). Results A total of 186 molars and 98 incisors exhibited MIH defects: 55 molars and 75 incisors showed mild defects, 91 molars and 20 incisors had moderate lesions, and 40 molars and 3 incisors showed severe lesions. Univariate and multivariate statistical analysis showed a significant association (p < 0.05) between MIH and ear, nose, and throat (ENT) disorders and the antibiotics used during pregnancy (0.019). Conclusions Moderate defects were more frequent in the molars, while mild lesions were more frequent in the incisors. Antibiotics used during pregnancy and ENT may be directly involved in the etiology of MIH in children. PMID:29861729
Quality control for quantitative PCR based on amplification compatibility test.
Tichopad, Ales; Bar, Tzachi; Pecen, Ladislav; Kitchen, Robert R; Kubista, Mikael; Pfaffl, Michael W
2010-04-01
Quantitative PCR (qPCR) is a routinely used method for the accurate quantification of nucleic acids. Yet it may generate erroneous results if the amplification process is obscured by inhibition or generation of aberrant side-products such as primer dimers. Several methods have been established to control for pre-processing performance that rely on the introduction of a co-amplified reference sequence; however, there is currently no method to allow for reliable control of the amplification process without directly modifying the sample mix. Herein we present a statistical approach based on multivariate analysis of the amplification response data generated in real-time. The amplification trajectory in its most resolved and dynamic phase is fitted with a suitable model. Two parameters of this model, related to amplification efficiency, are then used for calculation of the Z-score statistics. Each studied sample is compared to a predefined reference set of reactions, typically calibration reactions. A probabilistic decision for each individual Z-score is then used to identify the majority of inhibited reactions in our experiments. We compare this approach to univariate methods using only the sample specific amplification efficiency as reporter of the compatibility. We demonstrate improved identification performance using the multivariate approach compared to the univariate approach. Finally we stress that the performance of the amplification compatibility test as a quality control procedure depends on the quality of the reference set. Copyright 2010 Elsevier Inc. All rights reserved.
Bonetti, Jennifer; Quarino, Lawrence
2014-05-01
This study has shown that the combination of simple techniques with the use of multivariate statistics offers the potential for the comparative analysis of soil samples. Five samples were obtained from each of twelve state parks across New Jersey in both the summer and fall seasons. Each sample was examined using particle-size distribution, pH analysis in both water and 1 M CaCl2 , and a loss on ignition technique. Data from each of the techniques were combined, and principal component analysis (PCA) and canonical discriminant analysis (CDA) were used for multivariate data transformation. Samples from different locations could be visually differentiated from one another using these multivariate plots. Hold-one-out cross-validation analysis showed error rates as low as 3.33%. Ten blind study samples were analyzed resulting in no misclassifications using Mahalanobis distance calculations and visual examinations of multivariate plots. Seasonal variation was minimal between corresponding samples, suggesting potential success in forensic applications. © 2014 American Academy of Forensic Sciences.
Ferreira, Fábio S; Pereira, João M S; Duarte, João V; Castelo-Branco, Miguel
2017-01-01
Although voxel based morphometry studies are still the standard for analyzing brain structure, their dependence on massive univariate inferential methods is a limiting factor. A better understanding of brain pathologies can be achieved by applying inferential multivariate methods, which allow the study of multiple dependent variables, e.g. different imaging modalities of the same subject. Given the widespread use of SPM software in the brain imaging community, the main aim of this work is the implementation of massive multivariate inferential analysis as a toolbox in this software package, applied to the use of T1 and T2 structural data from diabetic patients and controls. This implementation was compared with the traditional ANCOVA in SPM and a similar multivariate GLM toolbox (MRM). We implemented the new toolbox and tested it by investigating brain alterations on a cohort of twenty-eight type 2 diabetes patients and twenty-six matched healthy controls, using information from both T1 and T2 weighted structural MRI scans, both separately - using standard univariate VBM - and simultaneously, with multivariate analyses. Univariate VBM replicated predominantly bilateral changes in basal ganglia and insular regions in type 2 diabetes patients. On the other hand, multivariate analyses replicated key findings of univariate results, while also revealing the thalami as additional foci of pathology. While the presented algorithm must be further optimized, the proposed toolbox is the first implementation of multivariate statistics in SPM8 as a user-friendly toolbox, which shows great potential and is ready to be validated in other clinical cohorts and modalities.
Ferreira, Fábio S.; Pereira, João M.S.; Duarte, João V.; Castelo-Branco, Miguel
2017-01-01
Background: Although voxel based morphometry studies are still the standard for analyzing brain structure, their dependence on massive univariate inferential methods is a limiting factor. A better understanding of brain pathologies can be achieved by applying inferential multivariate methods, which allow the study of multiple dependent variables, e.g. different imaging modalities of the same subject. Objective: Given the widespread use of SPM software in the brain imaging community, the main aim of this work is the implementation of massive multivariate inferential analysis as a toolbox in this software package, applied to the use of T1 and T2 structural data from diabetic patients and controls. This implementation was compared with the traditional ANCOVA in SPM and a similar multivariate GLM toolbox (MRM). Method: We implemented the new toolbox and tested it by investigating brain alterations on a cohort of twenty-eight type 2 diabetes patients and twenty-six matched healthy controls, using information from both T1 and T2 weighted structural MRI scans, both separately - using standard univariate VBM - and simultaneously, with multivariate analyses. Results: Univariate VBM replicated predominantly bilateral changes in basal ganglia and insular regions in type 2 diabetes patients. On the other hand, multivariate analyses replicated key findings of univariate results, while also revealing the thalami as additional foci of pathology. Conclusion: While the presented algorithm must be further optimized, the proposed toolbox is the first implementation of multivariate statistics in SPM8 as a user-friendly toolbox, which shows great potential and is ready to be validated in other clinical cohorts and modalities. PMID:28761571
Whist, A C; Liland, K H; Jonsson, M E; Sæbø, S; Sviland, S; Østerås, O; Norström, M; Hopp, P
2014-11-01
Surveillance programs for animal diseases are critical to early disease detection and risk estimation and to documenting a population's disease status at a given time. The aim of this study was to describe a risk-based surveillance program for detecting Mycobacterium avium ssp. paratuberculosis (MAP) infection in Norwegian dairy cattle. The included risk factors for detecting MAP were purchase of cattle, combined cattle and goat farming, and location of the cattle farm in counties containing goats with MAP. The risk indicators included production data [culling of animals >3 yr of age, carcass conformation of animals >3 yr of age, milk production decrease in older lactating cows (lactations 3, 4, and 5)], and clinical data (diarrhea, enteritis, or both, in animals >3 yr of age). Except for combined cattle and goat farming and cattle farm location, all data were collected at the cow level and summarized at the herd level. Predefined risk factors and risk indicators were extracted from different national databases and combined in a multivariate statistical process control to obtain a risk assessment for each herd. The ordinary Hotelling's T² statistic was applied as a multivariate, standardized measure of difference between the current observed state and the average state of the risk factors for a given herd. To make the analysis more robust and adapt it to the slowly developing nature of MAP, monthly risk calculations were based on data accumulated during a 24-mo period. Monitoring of these variables was performed to identify outliers that may indicate deviance in one or more of the underlying processes. The highest-ranked herds were scattered all over Norway and clustered in high-density dairy cattle farm areas. The resulting rankings of herds are being used in the national surveillance program for MAP in 2014 to increase the sensitivity of the ongoing surveillance program in which 5 fecal samples for bacteriological examination are collected from 25 dairy herds. 
The use of multivariate statistical process control for selection of herds will be beneficial when a diagnostic test suitable for mass screening is available and validated on the Norwegian cattle population, thus making it possible to increase the number of sampled herds. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
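The Hotelling's T² measure mentioned in the abstract above can be sketched for two variables as the squared distance of a current observation from a reference mean, scaled by the inverse of the reference covariance matrix. This is only a hedged illustration: the two-variable restriction, the function names and the reference data are assumptions for this example, and the study's actual variables and accumulation window are not reproduced.

```python
def mean_vector(rows):
    """Column-wise mean of a list of (x, y) observations."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def covariance_2x2(rows, mean):
    """Unbiased sample covariance entries (s_xx, s_xy, s_yy)."""
    n = len(rows)
    s_xx = sum((r[0] - mean[0]) ** 2 for r in rows) / (n - 1)
    s_yy = sum((r[1] - mean[1]) ** 2 for r in rows) / (n - 1)
    s_xy = sum((r[0] - mean[0]) * (r[1] - mean[1]) for r in rows) / (n - 1)
    return s_xx, s_xy, s_yy


def hotelling_t2(x, mean, cov):
    """T^2 = d' S^{-1} d for a single observation x, with d = x - mean."""
    s_xx, s_xy, s_yy = cov
    det = s_xx * s_yy - s_xy ** 2  # determinant of the 2x2 covariance matrix
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Closed-form quadratic form using the explicit 2x2 inverse.
    return (s_yy * dx * dx - 2.0 * s_xy * dx * dy + s_xx * dy * dy) / det


# Hypothetical reference observations of two risk indicators.
reference_state = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center = mean_vector(reference_state)
cov = covariance_2x2(reference_state, center)

t2_typical = hotelling_t2((1.0, 1.0), center, cov)  # at the reference mean
t2_unusual = hotelling_t2((3.0, 1.0), center, cov)  # shifted in one variable
print(t2_typical, t2_unusual)
```

Ranking observations by this value is one simple way to flag candidates whose current state departs most from the reference; the threshold for calling something an outlier is a separate choice not shown here.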
Application of multivariable statistical techniques in plant-wide WWTP control strategies analysis.
Flores, X; Comas, J; Roda, I R; Jiménez, L; Gernaey, K V
2007-01-01
The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques allow i) to determine natural groups or clusters of control strategies with a similar behaviour, ii) to find and interpret hidden, complex and casual relation features in the data set and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation of the complex multicriteria data sets and allows an improved use of information for effective evaluation of control strategies.
Alkarkhi, Abbas F M; Ramli, Saifullah Bin; Easa, Azhar Mat
2009-01-01
Major (sodium, potassium, calcium, magnesium) and minor elements (iron, copper, zinc, manganese) and one heavy metal (lead) of Cavendish banana flour and Dream banana flour were determined, and data were analyzed using multivariate statistical techniques of factor analysis and discriminant analysis. Factor analysis yielded four factors explaining more than 81% of the total variance: the first factor explained 28.73%, comprising magnesium, sodium, and iron; the second factor explained 21.47%, comprising only manganese and copper; the third factor explained 15.66%, comprising zinc and lead; while the fourth factor explained 15.50%, comprising potassium. Discriminant analysis showed that magnesium and sodium exhibited a strong contribution in discriminating the two types of banana flour, affording 100% correct assignation. This study presents the usefulness of multivariate statistical techniques for analysis and interpretation of complex mineral content data from banana flour of different varieties.
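The statement that four factors explain "more than 81%" of the total variance follows directly from the per-factor percentages quoted in the abstract above. A small arithmetic check, using only the reported figures (the dictionary name is an assumption for this example):

```python
# Percent of total variance attributed to each factor, as reported above.
factor_variance = {
    "factor 1 (Mg, Na, Fe)": 28.73,
    "factor 2 (Mn, Cu)": 21.47,
    "factor 3 (Zn, Pb)": 15.66,
    "factor 4 (K)": 15.50,
}

cumulative = 0.0
for name, percent in factor_variance.items():
    cumulative += percent
    print(f"{name}: {percent:.2f}% (cumulative {cumulative:.2f}%)")

total_explained = round(cumulative, 2)
```

The four percentages sum to 81.36%, consistent with the figure quoted in the abstract.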
Self-testing of vaginal pH to prevent preterm delivery: a controlled trial.
Bitzer, Eva-Maria; Schneider, Andrea; Wenzlaff, Paul; Hoyme, Udo B; Siegmund-Schultze, Elisabeth
2011-02-01
From 2004 to 2006, in a model project carried out by four German health insurers, expectant mothers were offered self-testing of vaginal pH in order to prevent preterm delivery. They were given pH test gloves on request so that they could measure their vaginal pH twice a week from the 12th to the 32nd week of gestation. They were instructed to consult with a gynecologist after any positive result. All further diagnostic or therapeutic decisions were at the discretion of the treating gynecologist. We assessed the effectiveness of the screening intervention, using delivery before the 37th week of gestation as the primary endpoint. In this prospective, controlled trial, we collected data on deliveries from 2004 to 2006 that were covered by the four participating insurers in five German federal states. We compared the outcomes of pregnancy in women who did and did not request test gloves (intervention group, [IG], and control group, [CG]). The data were derived from claims data of the participating insurers, as well as from a nationwide quality assurance auditing program for obstetrics and perinatal care. Propensity score matching and multivariate adjustment were used to control for the expected self-selection bias. The study sample comprised 149 082 deliveries. 13% of the expectant mothers requested test gloves, about half of them up to the 16th week of gestation. As expected, women with an elevated risk of preterm birth requested test gloves more often. Delivery before the 37th week of gestation was slightly more common in the intervention group than in the control group (IG 7.97%, CG 7.52%, relative risk 1.06, 95% confidence interval 1.00-1.12). This result was of borderline statistical significance in the propensity score matched analysis, but it was not statistically significant in the multivariate model. This trial did not demonstrate the efficacy of self-testing of vaginal pH for the prevention of preterm delivery (< 37 weeks of gestation).
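The relative risk quoted in the abstract above is consistent with the simple ratio of the two reported preterm delivery rates. A minimal check using only those two percentages (the confidence interval is not recomputed, because exact group sizes are not given; variable names are assumptions for this example):

```python
# Reported rates of delivery before week 37, in percent.
rate_intervention = 7.97  # women who requested test gloves (IG)
rate_control = 7.52       # women who did not (CG)

# Relative risk as the ratio of the two event rates.
relative_risk = rate_intervention / rate_control
print(f"relative risk = {relative_risk:.2f}")
```

A ratio slightly above 1 means preterm delivery was marginally more frequent in the intervention group, which matches the abstract's reading of the result.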
Meteor localization via statistical analysis of spatially temporal fluctuations in image sequences
NASA Astrophysics Data System (ADS)
Kukal, Jaromír; Klimt, Martin; Šihlík, Jan; Fliegel, Karel
2015-09-01
Meteor detection is one of the most important procedures in astronomical imaging. The meteor path in Earth's atmosphere is traditionally reconstructed from a double-station video observation system that generates 2D image sequences. However, atmospheric turbulence and other factors cause spatio-temporal fluctuations of the image background, which make the localization of the meteor path more difficult. Our approach is based on nonlinear preprocessing of image intensity using the Box-Cox transform, with the logarithmic transform as a particular case. The transformed image sequences are then differentiated along discrete coordinates to obtain a statistical description of sky background fluctuations, which can be modeled by a multivariate normal distribution. After verification and hypothesis testing, we use the statistical model for outlier detection. While isolated outlier points are ignored, a compact cluster of outliers indicates the presence of meteoroids after ignition.
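The processing chain described in this abstract (Box-Cox transform, differencing, normal background model, outlier detection) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses a single frame, differences along the two spatial coordinates only, a bivariate normal model, and a hypothetical significance level `alpha`. The clustering of flagged pixels is not shown.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive value; lam = 0 gives the logarithm."""
    if x <= 0:
        raise ValueError("Box-Cox needs positive input")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # zero if the differences are collinear
    return [
        (syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my) + sxx * (y - my) ** 2) / det
        for x, y in points
    ]


def flag_outliers(image, lam=0.0, alpha=0.001):
    """Return (row, col) of pixels whose (dx, dy) differences are improbable
    under a bivariate normal background model (assumed, see lead-in)."""
    t = [[box_cox(v, lam) for v in row] for row in image]
    coords, diffs = [], []
    for r in range(len(t) - 1):
        for c in range(len(t[0]) - 1):
            coords.append((r, c))
            diffs.append((t[r][c + 1] - t[r][c], t[r + 1][c] - t[r][c]))
    # Chi-square quantile with 2 degrees of freedom has the closed form -2 ln(alpha)
    threshold = -2.0 * math.log(alpha)
    d2 = mahalanobis_sq_2d(diffs)
    return [rc for rc, d in zip(coords, d2) if d > threshold]
```

In a fuller version one would estimate the background model from frames known to be meteor-free, since a bright event in the same frame inflates the sample covariance and can mask itself.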
Stupák, Ivan; Pavloková, Sylvie; Vysloužil, Jakub; Dohnal, Jiří; Čulen, Martin
2017-11-23
Biorelevant dissolution instruments represent an important tool for pharmaceutical research and development. These instruments are designed to simulate the dissolution of drug formulations in conditions most closely mimicking the gastrointestinal tract. In this work, we focused on the optimization of dissolution compartments/vessels for an updated version of the biorelevant dissolution apparatus, Golem v2. We designed eight compartments of uniform size but different inner geometry. The dissolution performance of the compartments was tested using immediate release caffeine tablets and evaluated by standard statistical methods and principal component analysis. Based on two phases of dissolution testing (using 250 and 100 mL of dissolution medium), we selected two compartment types yielding the highest measurement reproducibility. We also confirmed a statistically significant effect of agitation rate and dissolution volume on the extent of drug dissolved and measurement reproducibility.
Smith, Ben J; Zehle, Katharina; Bauman, Adrian E; Chau, Josephine; Hawkshaw, Barbara; Frost, Steven; Thomas, Margaret
2006-04-01
This study examined the use of quantitative methods in Australian health promotion research in order to identify methodological trends and priorities for strengthening the evidence base for health promotion. Australian health promotion articles were identified by hand searching publications from 1992-2002 in six journals: Health Promotion Journal of Australia, Australian and New Zealand Journal of Public Health, Health Promotion International, Health Education Research, Health Education and Behavior and the American Journal of Health Promotion. The study designs and statistical methods used in articles presenting quantitative research were recorded. 591 (57.7%) of the 1,025 articles used quantitative methods. Cross-sectional designs were used in the majority (54.3%) of studies, with pre- and post-test (14.6%) and post-test only (9.5%) the next most common designs. Bivariate statistical methods were used in 45.9% of papers, multivariate methods in 27.1% and simple numbers and proportions in 25.4%. Few studies used higher-level statistical techniques. While most studies used quantitative methods, the majority were descriptive in nature. The study designs and statistical methods used provided limited scope for demonstrating intervention effects or understanding the determinants of change.
Self-Regulated Learning Strategies in Relation with Statistics Anxiety
ERIC Educational Resources Information Center
Kesici, Sahin; Baloglu, Mustafa; Deniz, M. Engin
2011-01-01
Dealing with students' attitudinal problems related to statistics is an important aspect of statistics instruction. Employing the appropriate learning strategies may have a relationship with anxiety during the process of statistics learning. Thus, the present study investigated multivariate relationships between self-regulated learning strategies…
Peters, L L; Boter, H; Burgerhof, J G M; Slaets, J P J; Buskens, E
2015-09-01
The primary objective of the present study was to evaluate the validity of the Groningen Frailty Indicator (GFI) in a sample of Dutch elderly persons participating in LifeLines, a large population-based cohort study. Additional aims were to assess differences between frail and non-frail elderly and examine which individual characteristics were associated with frailty. By December 2012, 5712 elderly persons were enrolled in LifeLines and complied with the inclusion criteria of the present study. Mann-Whitney U or Kruskal-Wallis tests were used to assess the variability of GFI-scores among elderly subgroups that differed in demographic characteristics, morbidity, obesity, and healthcare utilization. Within subgroups Kruskal-Wallis tests were also used to examine differences in GFI-scores across age groups. Multivariate logistic regression analyses were performed to assess associations between individual characteristics and frailty. The GFI discriminated between subgroups: statistically significantly higher GFI-median scores (interquartile range) were found in e.g. males (1 [0-2]), the oldest old (2 [1-3]), in elderly who were single (1 [0-2]), with lower socio economic status (1 [0-3]), with increasing co-morbidity (2 [1-3]), who were obese (2 [1-3]), and used more healthcare (2 [1-4]). Overall age had an independent and statistically significant association with GFI scores. Compared with the non-frail, frail elderly persons experienced statistically significantly more chronic stress and more social/psychological related problems. In the multivariate logistic regression model, psychological morbidity had the strongest association with frailty. The present study supports the construct validity of the GFI and provides an insight in the characteristics of (non)frail community-dwelling elderly persons participating in LifeLines. Copyright © 2015 Elsevier Inc. All rights reserved.
Analysis of risk factors for central venous port failure in cancer patients
Hsieh, Ching-Chuan; Weng, Hsu-Huei; Huang, Wen-Shih; Wang, Wen-Ke; Kao, Chiung-Lun; Lu, Ming-Shian; Wang, Chia-Siu
2009-01-01
AIM: To analyze the risk factors for central port failure in cancer patients administered chemotherapy, using univariate and multivariate analyses. METHODS: A total of 1348 totally implantable venous access devices (TIVADs) were implanted into 1280 cancer patients in this cohort study. A Cox proportional hazard model was applied to analyze risk factors for failure of TIVADs. The log-rank test was used to compare actuarial survival rates. Infection, thrombosis, and surgical complication rates (χ2 test or Fisher’s exact test) were compared in relation to the risk factors. RESULTS: Increasing age, male gender and open-ended catheter use were significant risk factors reducing survival of TIVADs as determined by univariate and multivariate analyses. Hematogenous malignancy decreased the survival time of TIVADs; this reduction was not statistically significant by univariate analysis [hazard ratio (HR) = 1.336, 95% CI: 0.966-1.849, P = 0.080]. However, it became a significant risk factor by multivariate analysis (HR = 1.499, 95% CI: 1.079-2.083, P = 0.016) when correlated with variables of age, sex and catheter type. Close-ended (Groshong) catheters had a lower thrombosis rate than open-ended catheters (2.5% vs 5%, P = 0.015). Hematogenous malignancy had higher infection rates than solid malignancy (10.5% vs 2.5%, P < 0.001). CONCLUSION: Increasing age, male gender, open-ended catheters and hematogenous malignancy were risk factors for TIVAD failure. Close-ended catheters had lower thrombosis rates and hematogenous malignancy had higher infection rates. PMID:19787834
DOE Office of Scientific and Technical Information (OSTI.GOV)
2015-09-14
This package contains statistical routines for extracting features from multivariate time-series data, which can then be used for subsequent multivariate statistical analysis to identify patterns and anomalous behavior. It calculates local linear or quadratic regression model fits to moving windows for each series and then summarizes the model coefficients across user-defined time intervals for each series. These methods are domain agnostic, but they have been successfully applied to a variety of domains, including commercial aviation and electric power grid data.
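The feature-extraction idea in this record (local regression fits on moving windows, then a summary of the coefficients per interval) can be illustrated with a short sketch. It is not the package's code or API; the function names, the window width, and the choice of the mean as the summary are assumptions made for illustration, and only the linear case for a single series is shown.

```python
from statistics import mean


def linear_fit(y):
    """Least-squares (intercept, slope) of y against 0, 1, ..., len(y) - 1."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two points")
    x_bar = (n - 1) / 2
    y_bar = mean(y)
    sxx = sum((i - x_bar) ** 2 for i in range(n))
    sxy = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def window_slopes(series, width):
    """Slope of the local linear fit in every moving window of the given width."""
    return [linear_fit(series[i:i + width])[1]
            for i in range(len(series) - width + 1)]


def summarize(values, interval):
    """Mean of the values in consecutive, non-overlapping intervals."""
    return [mean(values[i:i + interval]) for i in range(0, len(values), interval)]


series = [2 * i + 1 for i in range(10)]          # a perfectly linear toy series
features = summarize(window_slopes(series, 4), 3)
```

Applied to each series of a multivariate data set, the per-interval summaries form the feature matrix that a later multivariate analysis would consume.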
Kalegowda, Yogesh; Harmer, Sarah L
2012-03-20
Time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra of mineral samples are complex, comprised of large mass ranges and many peaks. Consequently, characterization and classification analysis of these systems is challenging. In this study, different chemometric and statistical data evaluation methods, based on monolayer sensitive TOF-SIMS data, have been tested for the characterization and classification of copper-iron sulfide minerals (chalcopyrite, chalcocite, bornite, and pyrite) at different flotation pulp conditions (feed, conditioned feed, and Eh modified). The complex mass spectral data sets were analyzed using the following chemometric and statistical techniques: principal component analysis (PCA); principal component-discriminant functional analysis (PC-DFA); soft independent modeling of class analogy (SIMCA); and k-Nearest Neighbor (k-NN) classification. PCA was found to be an important first step in multivariate analysis, providing insight into both the relative grouping of samples and the elemental/molecular basis for those groupings. For samples exposed to oxidative conditions (at Eh ~430 mV), each technique (PCA, PC-DFA, SIMCA, and k-NN) was found to produce excellent classification. For samples at reductive conditions (at Eh ~ -200 mV SHE), k-NN and SIMCA produced the most accurate classification. Phase identification of particles that contain the same elements but a different crystal structure in a mixed multimetal mineral system has been achieved.
NASA Astrophysics Data System (ADS)
Darvishzadeh, R.; Skidmore, A. K.; Mirzaie, M.; Atzberger, C.; Schlerf, M.
2014-12-01
Accurate estimation of grassland biomass at peak productivity can provide crucial information regarding the functioning and productivity of rangelands. Hyperspectral remote sensing has proved to be valuable for estimating vegetation biophysical parameters such as biomass using different statistical techniques. However, in statistical analysis of hyperspectral data, multicollinearity is a common problem due to the large number of correlated hyperspectral reflectance measurements. The aim of this study was to examine the prospect of above-ground biomass estimation in a heterogeneous Mediterranean rangeland employing multivariate calibration methods. Canopy spectral measurements were made in the field using a GER 3700 spectroradiometer, along with concomitant in situ measurements of above-ground biomass for 170 sample plots. Multivariate calibrations including partial least squares regression (PLSR), principal component regression (PCR), and Least-Squares Support Vector Machine (LS-SVM) were used to estimate the above-ground biomass. The prediction accuracy of the multivariate calibration methods was assessed using cross-validated R2 and RMSE. The best model performance was obtained using LS-SVM, followed by PLSR, both calibrated with the first-derivative reflectance dataset (R2cv = 0.88 and 0.86; RMSEcv = 1.15 and 1.07, respectively). The weakest prediction accuracy was obtained when PCR was used (R2cv = 0.31 and RMSEcv = 2.48). The obtained results highlight the importance of multivariate calibration methods for biomass estimation when hyperspectral data are used.
GAISE 2016 Promotes Statistical Literacy
ERIC Educational Resources Information Center
Schield, Milo
2017-01-01
In the 2005 Guidelines for Assessment and Instruction in Statistics Education (GAISE), statistical literacy featured as a primary goal. The 2016 revision eliminated statistical literacy as a stated goal. Although this looks like a rejection, this paper argues that by including multivariate thinking and--more importantly--confounding as recommended…
An R2 statistic for fixed effects in the linear mixed model.
Edwards, Lloyd J; Muller, Keith E; Wolfinger, Russell D; Qaqish, Bahjat F; Schabenberger, Oliver
2008-12-20
Statisticians most often use the linear mixed model to analyze Gaussian longitudinal data. The value and familiarity of the R² statistic in the linear univariate model naturally create great interest in extending it to the linear mixed model. We define and describe how to compute a model R² statistic for the linear mixed model by using only a single model. The proposed R² statistic measures multivariate association between the repeated outcomes and the fixed effects in the linear mixed model. The R² statistic arises as a 1-1 function of an appropriate F statistic for testing all fixed effects (except typically the intercept) in a full model. The statistic compares the full model with a null model with all fixed effects deleted (except typically the intercept) while retaining exactly the same covariance structure. Furthermore, the R² statistic leads immediately to a natural definition of a partial R² statistic. A mixed model in which ethnicity gives a very small p-value as a longitudinal predictor of blood pressure (BP) compellingly illustrates the value of the statistic. In sharp contrast to the extreme p-value, a very small R², a measure of statistical and scientific importance, indicates that ethnicity has an almost negligible association with the repeated BP outcomes for the study.
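The abstract describes the statistic as a one-to-one function of an F statistic. A sketch of the familiar univariate-model relation between F and R² is given below; it is assumed here that the mixed-model version has the same form, and the degrees of freedom passed in are hypothetical, since the abstract does not state how the authors obtain them.

```python
def r2_from_f(f_stat: float, df_num: float, df_den: float) -> float:
    """Map an F statistic with (df_num, df_den) degrees of freedom to R^2.

    Uses R^2 = (df_num * F / df_den) / (1 + df_num * F / df_den),
    which is monotone increasing in F for fixed degrees of freedom.
    """
    if f_stat < 0 or df_num <= 0 or df_den <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    ratio = df_num * f_stat / df_den
    return ratio / (1.0 + ratio)


# A large F on many denominator degrees of freedom can still give a tiny R^2
# (illustrative numbers, not taken from the paper).
small_r2 = r2_from_f(20.0, 1, 10_000)
```

This mirrors the contrast drawn in the abstract: with many observations, an effect can have an extreme p-value while explaining very little of the variation in the outcome.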
NASA Astrophysics Data System (ADS)
Fuchs, Julia; Cermak, Jan; Andersen, Hendrik
2017-04-01
This study aims at untangling the impacts of external dynamics and local conditions on cloud properties in the Southeast Atlantic (SEA) by combining satellite and reanalysis data using multivariate statistics. The understanding of clouds and their determinants at different scales is important for constraining the Earth's radiative budget, and thus prominent in climate-system research. In this study, SEA stratocumulus cloud properties are observed not only as the result of local environmental conditions but also as affected by external dynamics and spatial origins of air masses entering the study area. In order to assess to what extent cloud properties are impacted by aerosol concentration, air mass history, and meteorology, a multivariate approach is conducted using satellite observations of aerosol and cloud properties (MODIS, SEVIRI), information on aerosol species composition (MACC) and meteorological context (ERA-Interim reanalysis). To account for the often-neglected but important role of air mass origin, information on air mass history based on HYSPLIT modeling is included in the statistical model. This multivariate approach is intended to lead to a better understanding of the physical processes behind observed stratocumulus cloud properties in the SEA.
Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan
2017-12-01
Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0-0.10 m, or 0-0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km², with a median of 0.4 samples per km². The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized.
It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis (PCA) and cluster analysis (CA). Copyright © 2017 Elsevier Ltd. All rights reserved.
Learning investment indicators through data extension
NASA Astrophysics Data System (ADS)
Dvořák, Marek
2017-07-01
Stock prices in the form of time series were analysed using univariate and multivariate statistical methods. After simple data preprocessing in the form of logarithmic differences, we augmented this univariate time series to a multivariate representation. This method makes use of sliding windows to calculate several dozen new variables using simple statistical tools such as first and second moments, as well as more complicated statistics such as auto-regression coefficients and residual analysis, followed by an optional quadratic transformation that was further used for data extension. These were used as explanatory variables in a regularized logistic LASSO regression that tried to estimate the Buy-Sell Index (BSI) from real stock market data.
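The data-extension step described here can be sketched as follows: logarithmic differences of the prices, then sliding-window first and second moments, optionally extended with their squares. This is a simplified illustration with assumed function names and window width; the auto-regression coefficients, residual analysis, and the LASSO model from the abstract are not shown.

```python
import math
from statistics import mean, pvariance


def log_returns(prices):
    """Logarithmic differences log(p[t+1]) - log(p[t]) of a positive price series."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def window_features(x, width, quadratic=True):
    """One feature row per sliding window: mean and variance,
    optionally followed by their squares (the quadratic extension)."""
    rows = []
    for i in range(len(x) - width + 1):
        w = x[i:i + width]
        m, v = mean(w), pvariance(w)
        rows.append((m, v, m * m, v * v) if quadratic else (m, v))
    return rows


prices = [100.0, 110.0, 121.0, 133.1]   # toy series growing 10 % per step
rows = window_features(log_returns(prices), width=2)
```

Each row would serve as one observation of explanatory variables for a downstream classifier.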
Gaskin, Cadeyrn J; Happell, Brenda
2014-05-01
To (a) assess the statistical power of nursing research to detect small, medium, and large effect sizes; (b) estimate the experiment-wise Type I error rate in these studies; and (c) assess the extent to which (i) a priori power analyses, (ii) effect sizes (and interpretations thereof), and (iii) confidence intervals were reported. Statistical review. Papers published in the 2011 volumes of the 10 highest ranked nursing journals, based on their 5-year impact factors. Papers were assessed for statistical power, control of experiment-wise Type I error, reporting of a priori power analyses, reporting and interpretation of effect sizes, and reporting of confidence intervals. The analyses were based on 333 papers, from which 10,337 inferential statistics were identified. The median power to detect small, medium, and large effect sizes was .40 (interquartile range [IQR]=.24-.71), .98 (IQR=.85-1.00), and 1.00 (IQR=1.00-1.00), respectively. The median experiment-wise Type I error rate was .54 (IQR=.26-.80). A priori power analyses were reported in 28% of papers. Effect sizes were routinely reported for Spearman's rank correlations (100% of papers in which this test was used), Poisson regressions (100%), odds ratios (100%), Kendall's tau correlations (100%), Pearson's correlations (99%), logistic regressions (98%), structural equation modelling/confirmatory factor analyses/path analyses (97%), and linear regressions (83%), but were reported less often for two-proportion z tests (50%), analyses of variance/analyses of covariance/multivariate analyses of variance (18%), t tests (8%), Wilcoxon's tests (8%), Chi-squared tests (8%), and Fisher's exact tests (7%), and not reported for sign tests, Friedman's tests, McNemar's tests, multi-level models, and Kruskal-Wallis tests. Effect sizes were infrequently interpreted. Confidence intervals were reported in 28% of papers. 
The use, reporting, and interpretation of inferential statistics in nursing research need substantial improvement. Most importantly, researchers should abandon the misleading practice of interpreting the results from inferential tests based solely on whether they are statistically significant (or not) and, instead, focus on reporting and interpreting effect sizes, confidence intervals, and significance levels. Nursing researchers also need to conduct and report a priori power analyses, and to address the issue of Type I experiment-wise error inflation in their studies. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.
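The experiment-wise Type I error inflation discussed in this review can be made concrete with the standard formula for independent tests, 1 - (1 - alpha)^k. The sketch below assumes independence and a per-test level of .05; the abstract does not say exactly how the authors computed their rates, so this is an illustration rather than a reconstruction of their method.

```python
def experimentwise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests independent
    tests, each carried out at level alpha."""
    if not 0 < alpha < 1 or n_tests < 1:
        raise ValueError("alpha must be in (0, 1) and n_tests >= 1")
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test level that keeps the experiment-wise rate at or below alpha."""
    return alpha / n_tests


rate_15 = experimentwise_error(0.05, 15)
print(round(rate_15, 2))
```

Under these assumptions, about 15 uncorrected tests at the .05 level already produce an experiment-wise rate near the median of .54 reported above.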
Ji, Hong; Petro, Nathan M; Chen, Badong; Yuan, Zejian; Wang, Jianji; Zheng, Nanning; Keil, Andreas
2018-02-06
Over the past decade, the simultaneous recording of electroencephalogram (EEG) and functional magnetic resonance imaging (fMRI) data has garnered growing interest because it may provide an avenue towards combining the strengths of both imaging modalities. Given their pronounced differences in temporal and spatial statistics, the combination of EEG and fMRI data is, however, methodologically challenging. Here, we propose a novel screening approach that relies on a Cross Multivariate Correlation Coefficient (xMCC) framework. This approach accomplishes three tasks: (1) it provides a measure for testing multivariate correlation and multivariate uncorrelation of the two modalities; (2) it provides a criterion for the selection of EEG features; (3) it performs a screening of relevant EEG information by grouping the EEG channels into clusters to improve efficiency and to reduce computational load when searching for the best predictors of the BOLD signal. The present report applies this approach to a data set with concurrent recordings of steady-state visual evoked potentials (ssVEPs) and fMRI, recorded while observers viewed phase-reversing Gabor patches. We test the hypothesis that fluctuations in visuo-cortical mass potentials systematically covary with BOLD fluctuations not only in visual cortical, but also in anterior temporal and prefrontal areas. Results supported the hypothesis and showed that the xMCC-based analysis provides straightforward identification of neurophysiologically plausible brain regions with EEG-fMRI covariance. Furthermore, xMCC converged with other extant methods for EEG-fMRI analysis. © 2018 The Authors Journal of Neuroscience Research Published by Wiley Periodicals, Inc.
Pounds, Stan; Cheng, Cheng; Cao, Xueyuan; Crews, Kristine R; Plunkett, William; Gandhi, Varsha; Rubnitz, Jeffrey; Ribeiro, Raul C; Downing, James R; Lamba, Jatinder
2009-08-15
In some applications, prior biological knowledge can be used to define a specific pattern of association of multiple endpoint variables with a genomic variable that is biologically most interesting. However, to our knowledge, there is no statistical procedure designed to detect specific patterns of association with multiple endpoint variables. Projection onto the most interesting statistical evidence (PROMISE) is proposed as a general procedure to identify genomic variables that exhibit a specific biologically interesting pattern of association with multiple endpoint variables. Biological knowledge of the endpoint variables is used to define a vector that represents the biologically most interesting values for statistics that characterize the associations of the endpoint variables with a genomic variable. A test statistic is defined as the dot-product of the vector of the observed association statistics and the vector of the most interesting values of the association statistics. By definition, this test statistic is proportional to the length of the projection of the observed vector of correlations onto the vector of most interesting associations. Statistical significance is determined via permutation. In simulation studies and an example application, PROMISE shows greater statistical power to identify genes with the interesting pattern of associations than classical multivariate procedures, individual endpoint analyses or listing genes that have the pattern of interest and are significant in more than one individual endpoint analysis. Documented R routines are freely available from www.stjuderesearch.org/depts/biostats and will soon be available as a Bioconductor package from www.bioconductor.org.
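The test statistic described in this abstract, a dot product between the observed association statistics and a vector of "most interesting" values, with significance from permutation, can be sketched as below. This is not the authors' R code: Pearson correlation is used as the association statistic, the one-sided p-value and the function names are assumptions, and the toy data are invented for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def promise_stat(genomic, endpoints, pattern):
    """Dot product of observed correlations with the pattern of interest."""
    return sum(w * pearson(genomic, e) for e, w in zip(endpoints, pattern))


def permutation_p(genomic, endpoints, pattern, n_perm=2000, seed=0):
    """Observed statistic and one-sided permutation p-value.

    Only the genomic variable is shuffled, so the joint structure of the
    endpoint variables is preserved in every permutation."""
    rng = random.Random(seed)
    observed = promise_stat(genomic, endpoints, pattern)
    shuffled = list(genomic)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if promise_stat(shuffled, endpoints, pattern) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: the pattern of interest is "positive with endpoint 1, negative with endpoint 2"
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
endpoint_1 = [2.0 * v for v in expression]
endpoint_2 = [-v for v in expression]
stat, p_value = permutation_p(expression, [endpoint_1, endpoint_2], pattern=(1.0, -1.0))
```

Projecting onto a single pre-specified direction concentrates the test on one alternative, which is the reason the abstract gives for its power advantage over procedures that look for any association.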
DOE Office of Scientific and Technical Information (OSTI.GOV)
Maruyama, Mitsunari, E-mail: mitunari@med-shimane.u.ac.jp; Yoshizako, Takeshi, E-mail: yosizako@med.shimane-u.ac.jp; Nakamura, Tomonori, E-mail: t-naka@med.shimane-u.ac.jp
2016-03-15
Purpose: This study was performed to evaluate the accumulation of lipiodol emulsion (LE) and adverse events during our initial experience of balloon-occluded trans-catheter arterial chemoembolization (B-TACE) for hepatocellular carcinoma (HCC) compared with conventional TACE (C-TACE). Methods: The B-TACE group (50 cases) was compared with the C-TACE group (50 cases). The ratio of the LE concentration in the tumor to that in the surrounding embolized liver parenchyma (LE ratio) was calculated after each treatment. Adverse events were evaluated according to the Common Terminology Criteria for Adverse Events (CTCAE) version 4.0. Results: The LE ratio at the subsegmental level showed a statistically significant difference between the groups (t test: P < 0.05). Only elevation of alanine aminotransferase was more frequent in the B-TACE group, showing a statistically significant difference (Mann–Whitney test: P < 0.05). While B-TACE caused severe adverse events (liver abscess and infarction) in patients with bile duct dilatation, there was no statistically significant difference in incidence between the groups. Multivariate logistic regression analysis suggested that the significant risk factor for liver abscess/infarction was bile duct dilatation (P < 0.05). Conclusion: The LE ratio at the subsegmental level showed a statistically significant difference between the groups (t test: P < 0.05). B-TACE caused severe adverse events (liver abscess and infarction) in patients with bile duct dilatation.
Multivariate statistical analysis of low-voltage EDS spectrum images
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson, I.M.
1998-03-01
Whereas energy-dispersive X-ray spectrometry (EDS) has been used for compositional analysis in the scanning electron microscope for 30 years, the benefits of using low operating voltages for such analyses have been explored only during the last few years. This paper couples low-voltage EDS with two other emerging areas of characterization: spectrum imaging and multivariate statistical analysis. The specimen analyzed for this study was a finished Intel Pentium processor, with the polyimide protective coating stripped off to expose the final active layers.
Quick Overview Scout 2008 Version 1.0
The Scout 2008 version 1.0 statistical software package has been updated from past DOS and Windows versions to provide classical and robust univariate and multivariate graphical and statistical methods that are not typically available in commercial or freeware statistical softwar...
Hermes, Ilarraza-Lomelí; Marianna, García-Saldivia; Jessica, Rojano-Castillo; Carlos, Barrera-Ramírez; Rafael, Chávez-Domínguez; María Dolores, Rius-Suárez; Pedro, Iturralde
2016-10-01
Mortality due to cardiovascular disease is often associated with ventricular arrhythmias. Nowadays, patients with cardiovascular disease are more often encouraged to take part in physical training programs. Nevertheless, high-intensity exercise is associated with a higher risk of sudden death, even in apparently healthy people. During exercise testing (ET), health care professionals provide patients, in a controlled scenario, with an intense physiological stimulus that could precipitate cardiac arrhythmia in high-risk individuals. There is still no clinical or statistical tool to predict this incidence. The aim of this study was to develop a statistical model to predict the incidence of exercise-induced potentially life-threatening ventricular arrhythmia (PLVA) during high-intensity exercise. 6415 patients underwent a symptom-limited ET with a Balke ramp protocol. A multivariate logistic regression model in which the primary outcome was PLVA was performed. Incidence of PLVA was 548 cases (8.5%). In the bivariate analysis, thirty-one clinical or ergometric variables were statistically associated with PLVA and were included in the regression model. In the multivariate model, 13 of these variables were found to be statistically significant. A regression model (G) with a χ² of 283.987 and p<0.001 was constructed. Significant variables included: heart failure, antiarrhythmic drugs, myocardial lower-VD, age and use of digoxin, nitrates, among others. This study allows clinicians to identify patients at risk of ventricular tachycardia or couplets during exercise, and to take preventive measures or provide appropriate supervision. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
A multivariate test of disease risk reveals conditions leading to disease amplification.
Halliday, Fletcher W; Heckman, Robert W; Wilfahrt, Peter A; Mitchell, Charles E
2017-10-25
Theory predicts that increasing biodiversity will dilute the risk of infectious diseases under certain conditions and will amplify disease risk under others. Yet, few empirical studies demonstrate amplification. This contrast may occur because few studies have considered the multivariate nature of disease risk, which includes richness and abundance of parasites with different transmission modes. By combining a multivariate statistical model developed for biodiversity-ecosystem-multifunctionality with an extensive field manipulation of host (plant) richness, composition and resource supply to hosts, we reveal that (i) host richness alone could not explain most changes in disease risk, and (ii) shifting host composition allowed disease amplification, depending on parasite transmission mode. Specifically, as predicted from theory, the effect of host diversity on parasite abundance differed for microbes (more density-dependent transmission) and insects (more frequency-dependent transmission). Host diversity did not influence microbial parasite abundance, but nearly doubled insect parasite abundance, and this amplification effect was attributable to variation in host composition. Parasite richness was reduced by resource addition, but only in species-rich host communities. Overall, this study demonstrates that multiple drivers, related to both host community and parasite characteristics, can influence disease risk. Furthermore, it provides a framework for evaluating multivariate disease risk in other systems. © 2017 The Author(s).
Robust tests for multivariate factorial designs under heteroscedasticity.
Vallejo, Guillermo; Ato, Manuel
2012-06-01
The question of how to analyze several multivariate normal mean vectors when normality and covariance homogeneity assumptions are violated is considered in this article. For the two-way MANOVA layout, we address this problem adapting results presented by Brunner, Dette, and Munk (BDM; 1997) and Vallejo and Ato (modified Brown-Forsythe [MBF]; 2006) in the context of univariate factorial and split-plot designs and a multivariate version of the linear model (MLM) to accommodate heterogeneous data. Furthermore, we compare these procedures with the Welch-James (WJ) approximate degrees of freedom multivariate statistics based on ordinary least squares via Monte Carlo simulation. Our numerical studies show that of the methods evaluated, only the modified versions of the BDM and MBF procedures were robust to violations of underlying assumptions. The MLM approach was only occasionally liberal, and then by only a small amount, whereas the WJ procedure was often liberal if the interactive effects were involved in the design, particularly when the number of dependent variables increased and total sample size was small. On the other hand, it was also found that the MLM procedure was uniformly more powerful than its most direct competitors. The overall success rate was 22.4% for the BDM, 36.3% for the MBF, and 45.0% for the MLM.
A false dichotomy? Mental illness and lone-actor terrorism.
Corner, Emily; Gill, Paul
2015-02-01
We test whether significant differences in mental illness exist in a matched sample of lone- and group-based terrorists. We then test whether there are distinct behavioral differences between lone-actor terrorists with and without mental illness. We then stratify our sample across a range of diagnoses and again test whether significant differences exist. We conduct a series of bivariate, multivariate, and multinomial statistical tests using a unique dataset of 119 lone-actor terrorists and a matched sample of group-based terrorists. The odds of a lone-actor terrorist having a mental illness are 13.49 times higher than the odds of a group actor having a mental illness. Lone actors who were mentally ill were 18.07 times more likely to have a spouse or partner who was involved in a wider movement than those without a history of mental illness. Those with a mental illness were more likely to have a proximate upcoming life change, more likely to have been a recent victim of prejudice, and experienced proximate and chronic stress. The results identify behaviors and traits that security agencies can utilize to monitor and prevent lone-actor terrorism events. The correlated behaviors provide an image of how risk can crystallize within the individual offender and that our understanding of lone-actor terrorism should be multivariate in nature.
The use of multivariate statistics in studies of wildlife habitat
David E. Capen
1981-01-01
This report contains edited and reviewed versions of papers presented at a workshop held at the University of Vermont in April 1980. Topics include sampling avian habitats, multivariate methods, applications, examples, and new approaches to analysis and interpretation.
Rejection of Multivariate Outliers.
1983-05-01
available in Gnanadesikan (1977). 2 The motivation for the present investigation lies in a recent paper of Schwager and Margolin (1982) who derive a... Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate Observations. Wiley, New York. [7] Hawkins, D.M. (1980). Identification of
Multivariate analysis: greater insights into complex systems
USDA-ARS's Scientific Manuscript database
Many agronomic researchers measure and collect multiple response variables in an effort to understand the more complex nature of the system being studied. Multivariate (MV) statistical methods encompass the simultaneous analysis of all random variables (RV) measured on each experimental or sampling ...
Osterberg, T; Norinder, U
2001-01-01
A method of modelling and predicting biopharmaceutical properties using simple theoretically computed molecular descriptors and multivariate statistics has been investigated for several data sets related to solubility, IAM chromatography, permeability across Caco-2 cell monolayers, human intestinal perfusion, brain-blood partitioning, and P-glycoprotein ATPase activity. The molecular descriptors (e.g. molar refractivity, molar volume, index of refraction, surface tension and density) and logP were computed with ACD/ChemSketch and ACD/logP, respectively. Good statistical models were derived that permit simple computational prediction of biopharmaceutical properties. All final models derived had R(2) values ranging from 0.73 to 0.95 and Q(2) values ranging from 0.69 to 0.86. The RMSEP values for the external test sets ranged from 0.24 to 0.85 (log scale).
Optimal allocation of testing resources for statistical simulations
NASA Astrophysics Data System (ADS)
Quintana, Carolina; Millwater, Harry R.; Singh, Gulshan; Golden, Patrick
2015-07-01
Statistical estimates from simulation involve uncertainty caused by the variability in the input random variables due to limited data. Allocating resources to obtain more experimental data of the input variables to better characterize their probability distributions can reduce the variance of statistical estimates. The methodology proposed determines the optimal number of additional experiments required to minimize the variance of the output moments given single or multiple constraints. The method uses multivariate t-distribution and Wishart distribution to generate realizations of the population mean and covariance of the input variables, respectively, given an amount of available data. This method handles independent and correlated random variables. A particle swarm method is used for the optimization. The optimal number of additional experiments per variable depends on the number and variance of the initial data, the influence of the variable in the output function and the cost of each additional experiment. The methodology is demonstrated using a fretting fatigue example.
Santori, G; Fontana, I; Bertocchi, M; Gasloli, G; Magoni Rossi, A; Tagliamacco, A; Barocci, S; Nocera, A; Valente, U
2010-05-01
A useful approach to reduce the number of discarded marginal kidneys and to increase the nephron mass is double kidney transplantation (DKT). In this study, we retrospectively evaluated the potential predictors for patient and graft survival in a single-center series of 59 DKT procedures performed between April 21, 1999, and September 21, 2008. The kidney recipients of mean age 63.27 +/- 5.17 years included 16 women (27%) and 43 men (73%). The donors of mean age 69.54 +/- 7.48 years included 32 women (54%) and 27 men (46%). The mean posttransplant dialysis time was 2.37 +/- 3.61 days. The mean hospitalization was 20.12 +/- 13.65 days. Average serum creatinine (SCr) at discharge was 1.5 +/- 0.59 mg/dL. In view of the limited numbers of recipient deaths (n = 4) and graft losses (n = 8) that occurred in our series, the proportional hazards assumption for each Cox regression model with P < .05 was tested by using correlation coefficients between transformed survival times and scaled Schoenfeld residuals, and checked with smoothed plots of Schoenfeld residuals. For patient survival, the variables that reached statistical significance were donor SCr (P = .007), donor creatinine clearance (P = .023), and recipient age (P = .047). Each significant model passed the Schoenfeld test. By entering these variables into a multivariate Cox model for patient survival, no further significance was observed. In the univariate Cox models performed for graft survival, statistical significance was noted for donor SCr (P = .027), SCr 3 months post-DKT (P = .043), and SCr 6 months post-DKT (P = .017). All significant univariate models for graft survival passed the Schoenfeld test. A final multivariate model retained SCr at 6 months (beta = 1.746, P = .042) and donor SCr (beta = .767, P = .090). In our analysis, SCr at 6 months seemed to emerge from both univariate and multivariate Cox models as a potential predictor of graft survival among DKT. Multicenter studies with larger recipient populations and more graft losses should be performed to confirm our findings. Copyright (c) 2010 Elsevier Inc. All rights reserved.
The Bootstrap, the Jackknife, and the Randomization Test: A Sampling Taxonomy.
Rodgers, J L
1999-10-01
A simple sampling taxonomy is defined that shows the differences between and relationships among the bootstrap, the jackknife, and the randomization test. Each method has as its goal the creation of an empirical sampling distribution that can be used to test statistical hypotheses, estimate standard errors, and/or create confidence intervals. Distinctions between the methods can be made based on the sampling approach (with replacement versus without replacement) and the sample size (replacing the whole original sample versus replacing a subset of the original sample). The taxonomy is useful for teaching the goals and purposes of resampling schemes. An extension of the taxonomy implies other possible resampling approaches that have not previously been considered. Univariate and multivariate examples are presented.
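The two distinctions the taxonomy draws (with versus without replacement, whole sample versus subset) can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the article's own material: the function names, the toy data, and the choice of the mean as the statistic are all assumptions made for the example.

```python
import random
from statistics import mean

def bootstrap_means(sample, n_resamples, rng):
    """Bootstrap: draw n values WITH replacement, same size as the sample."""
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

def jackknife_means(sample):
    """Jackknife: leave one observation out, i.e. subsets of size n - 1
    taken WITHOUT replacement."""
    return [mean(sample[:i] + sample[i + 1:]) for i in range(len(sample))]

def randomization_diffs(group_a, group_b, n_resamples, rng):
    """Randomization test: reshuffle the pooled data WITHOUT replacement
    and recompute the difference in group means under the null."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diffs.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    return diffs

rng = random.Random(0)  # fixed seed so the sketch is reproducible
a = [4.1, 5.0, 5.6, 6.2, 7.3]  # hypothetical group A
b = [3.0, 3.9, 4.4, 5.1, 5.8]  # hypothetical group B

boot = bootstrap_means(a, 1000, rng)
jack = jackknife_means(a)
null = randomization_diffs(a, b, 1000, rng)

observed = mean(a) - mean(b)
# Two-sided Monte Carlo p-value; the +1 counts the observed arrangement.
p_value = (1 + sum(abs(d) >= abs(observed) for d in null)) / (1 + len(null))
```

Each function returns an empirical sampling distribution; standard errors or confidence intervals would then be read off `boot` or `jack`, and a hypothesis test off `null`.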
Statistical Analysis of Zebrafish Locomotor Response.
Liu, Yiwen; Carmer, Robert; Zhang, Gaonan; Venkatraman, Prahatha; Brown, Skye Ashton; Pang, Chi-Pui; Zhang, Mingzhi; Ma, Ping; Leung, Yuk Fai
2015-01-01
Zebrafish larvae display rich locomotor behaviour upon external stimulation. The movement can be simultaneously tracked from many larvae arranged in multi-well plates. The resulting time-series locomotor data have been used to reveal new insights into neurobiology and pharmacology. However, the data are of large scale, and the corresponding locomotor behavior is affected by multiple factors. These issues pose a statistical challenge for comparing larval activities. To address this gap, this study has analyzed a visually-driven locomotor behaviour named the visual motor response (VMR) by the Hotelling's T-squared test. This test is congruent with comparing locomotor profiles from a time period. Different wild-type (WT) strains were compared using the test, which shows that they responded differently to light change at different developmental stages. The performance of this test was evaluated by a power analysis, which shows that the test was sensitive for detecting differences between experimental groups with sample numbers that were commonly used in various studies. In addition, this study investigated the effects of various factors that might affect the VMR by multivariate analysis of variance (MANOVA). The results indicate that the larval activity was generally affected by stage, light stimulus, their interaction, and location in the plate. Nonetheless, different factors affected larval activity differently over time, as indicated by a dynamical analysis of the activity at each second. Intriguingly, this analysis also shows that biological and technical repeats had negligible effect on larval activity. This finding is consistent with that from the Hotelling's T-squared test, and suggests that experimental repeats can be combined to enhance statistical power. Together, these investigations have established a statistical framework for analyzing VMR data, a framework that should be generally applicable to other locomotor data with similar structure.
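For readers unfamiliar with the statistic, a two-sample Hotelling's T-squared can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the textbook pooled-covariance form, the data are simulated, and `hotelling_t2` is a hypothetical helper name; the abstract does not say which variant or software the study used.

```python
import numpy as np

def hotelling_t2(x, y):
    """Two-sample Hotelling's T-squared with a pooled covariance matrix.

    x, y: arrays of shape (n_samples, n_features), n_features >= 2.
    Returns (t2, f_stat, df1, df2).
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    y = np.atleast_2d(np.asarray(y, dtype=float))
    n1, p = x.shape
    n2 = y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled unbiased covariance estimate (assumes equal covariances).
    s_pooled = ((n1 - 1) * np.cov(x, rowvar=False)
                + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(s_pooled, diff)
    # Under multivariate normality the rescaled statistic is
    # F-distributed with (p, n1 + n2 - p - 1) degrees of freedom.
    df1, df2 = p, n1 + n2 - p - 1
    f_stat = t2 * df2 / (p * (n1 + n2 - 2))
    return t2, f_stat, df1, df2

# Simulated activity profiles: 24 larvae per group, 3 time bins each.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(24, 3))
y = rng.normal(1.0, 1.0, size=(24, 3))
t2, f_stat, df1, df2 = hotelling_t2(x, y)
```

A p-value would come from the upper tail of the F distribution with `df1` and `df2` degrees of freedom. The pooled covariance must be invertible, which requires more observations than time bins.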
Statistical Analysis of Zebrafish Locomotor Response
Zhang, Gaonan; Venkatraman, Prahatha; Brown, Skye Ashton; Pang, Chi-Pui; Zhang, Mingzhi; Ma, Ping; Leung, Yuk Fai
2015-01-01
Zebrafish larvae display rich locomotor behaviour upon external stimulation. The movement can be simultaneously tracked from many larvae arranged in multi-well plates. The resulting time-series locomotor data have been used to reveal new insights into neurobiology and pharmacology. However, the data are of large scale, and the corresponding locomotor behavior is affected by multiple factors. These issues pose a statistical challenge for comparing larval activities. To address this gap, this study has analyzed a visually-driven locomotor behaviour named the visual motor response (VMR) by the Hotelling’s T-squared test. This test is congruent with comparing locomotor profiles from a time period. Different wild-type (WT) strains were compared using the test, which shows that they responded differently to light change at different developmental stages. The performance of this test was evaluated by a power analysis, which shows that the test was sensitive for detecting differences between experimental groups with sample numbers that were commonly used in various studies. In addition, this study investigated the effects of various factors that might affect the VMR by multivariate analysis of variance (MANOVA). The results indicate that the larval activity was generally affected by stage, light stimulus, their interaction, and location in the plate. Nonetheless, different factors affected larval activity differently over time, as indicated by a dynamical analysis of the activity at each second. Intriguingly, this analysis also shows that biological and technical repeats had negligible effect on larval activity. This finding is consistent with that from the Hotelling’s T-squared test, and suggests that experimental repeats can be combined to enhance statistical power. Together, these investigations have established a statistical framework for analyzing VMR data, a framework that should be generally applicable to other locomotor data with similar structure. PMID:26437184
van der Ham, Joris L
2016-05-19
Forensic entomologists can use carrion communities' ecological succession data to estimate the postmortem interval (PMI). Permutation tests of hierarchical cluster analyses of these data provide a conceptual method to estimate part of the PMI, the post-colonization interval (post-CI). This multivariate approach produces a baseline of statistically distinct clusters that reflect changes in the carrion community composition during the decomposition process. Carrion community samples of unknown post-CIs are compared with these baseline clusters to estimate the post-CI. In this short communication, I use data from previously published studies to demonstrate the conceptual feasibility of this multivariate approach. Analyses of these data produce series of significantly distinct clusters, which represent carrion communities during 1- to 20-day periods of the decomposition process. For 33 carrion community samples, collected over an 11-day period, this approach correctly estimated the post-CI within an average range of 3.1 days. © The Authors 2016. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Su, Zhong; Zhang, Lisha; Ramakrishnan, V.; Hagan, Michael; Anscher, Mitchell
2011-01-01
Purpose: To evaluate both the Calypso System’s (Calypso Medical Technologies, Inc., Seattle, WA) localization accuracy in the presence of wireless metal–oxide–semiconductor field-effect transistor (MOSFET) dosimeters of dose verification system (DVS, Sicel Technologies, Inc., Morrisville, NC) and the dosimeters’ reading accuracy in the presence of wireless electromagnetic transponders inside a phantom. Methods: A custom-made, solid-water phantom was fabricated with space for transponders and dosimeters. Two inserts were machined with positioning grooves precisely matching the dimensions of the transponders and dosimeters and were arranged in orthogonal and parallel orientations, respectively. To test the transponder localization accuracy with/without presence of dosimeters (hypothesis 1), multivariate analyses were performed on transponder-derived localization data with and without dosimeters at each preset distance to detect statistically significant localization differences between the control and test sets. To test dosimeter dose-reading accuracy with/without presence of transponders (hypothesis 2), an approach of alternating the transponder presence in seven identical fraction dose (100 cGy) deliveries and measurements was implemented. Two-way analysis of variance was performed to examine statistically significant dose-reading differences between the two groups and the different fractions. A relative-dose analysis method was also used to evaluate transponder impact on dose-reading accuracy after dose-fading effect was removed by a second-order polynomial fit. Results: Multivariate analysis indicated that hypothesis 1 was false; there was a statistically significant difference between the localization data from the control and test sets. However, the upper and lower bounds of the 95% confidence intervals of the localized positional differences between the control and test sets were less than 0.1 mm, which was significantly smaller than the minimum clinical localization resolution of 0.5 mm. For hypothesis 2, analysis of variance indicated that there was no statistically significant difference between the dosimeter readings with and without the presence of transponders. Both orthogonal and parallel configurations had difference of polynomial-fit dose to measured dose values within 1.75%. Conclusions: The phantom study indicated that the Calypso System’s localization accuracy was not affected clinically due to the presence of DVS wireless MOSFET dosimeters and the dosimeter-measured doses were not affected by the presence of transponders. Thus, the same patients could be implanted with both transponders and dosimeters to benefit from improved accuracy of radiotherapy treatments offered by conjunctional use of the two systems. PMID:21776780
Badran, M; Morsy, R; Soliman, H; Elnimr, T
2016-01-01
Trace element metabolism has been reported to play specific roles in the pathogenesis and progress of diabetes mellitus. Due to the continuous increase in the population of patients with Type 2 diabetes (T2D), this study aims to assess the levels and inter-relationships of fasting blood glucose (FBG) and serum trace elements in Type 2 diabetic patients. This study was conducted on 40 Egyptian Type 2 diabetic patients and 36 healthy volunteers (Hospital of Tanta University, Tanta, Egypt). The blood serum was digested and then used to determine the levels of 24 trace elements using inductively coupled plasma mass spectrometry (ICP-MS). Multivariate statistical analyses based on correlation coefficients, cluster analysis (CA), and principal component analysis (PCA) were used to analyze the data. The results showed significant changes in the levels of FBG and of eight trace elements (Zn, Cu, Se, Fe, Mn, Cr, Mg, and As) in the blood serum of Type 2 diabetic patients relative to those of healthy controls. The multivariate statistical techniques reduced the number of experimental variables and grouped the trace elements in patients into three clusters. The application of PCA revealed a distinct difference in the associations of trace elements and their clustering patterns between the control and patient groups, in particular for Mg, Fe, Cu, and Zn, which appeared to be the factors most closely related to Type 2 diabetes. Therefore, on the basis of this study, the contributors to trace element content in Type 2 diabetic patients can be determined and specified with correlation and multivariate statistical analysis, which confirms that the alteration of some essential trace metals may play a role in the development of diabetes mellitus. Copyright © 2015 Elsevier GmbH. All rights reserved.
Gao, Yongnian; Gao, Junfeng; Yin, Hongbin; Liu, Chuansheng; Xia, Ting; Wang, Jing; Huang, Qi
2015-03-15
Remote sensing has been widely used for water quality monitoring, but most of these monitoring studies have only focused on a few water quality variables, such as chlorophyll-a, turbidity, and total suspended solids, which have typically been considered optically active variables. Remote sensing presents a challenge in estimating the phosphorus concentration in water. The total phosphorus (TP) in lakes has been estimated from remotely sensed observations, primarily using the simple individual band ratio or their natural logarithm and the statistical regression method based on the field TP data and the spectral reflectance. In this study, we investigated the possibility of establishing a spatial modeling scheme to estimate the TP concentration of a large lake from multi-spectral satellite imagery using band combinations and regional multivariate statistical modeling techniques, and we tested the applicability of the spatial modeling scheme. The results showed that HJ-1A CCD multi-spectral satellite imagery can be used to estimate the TP concentration in a lake. The correlation and regression analysis showed a highly significant positive relationship between the TP concentration and certain remotely sensed combination variables. The proposed modeling scheme had a higher accuracy for the TP concentration estimation in the large lake compared with the traditional individual band ratio method and the whole-lake scale regression-modeling scheme. The TP concentration values showed a clear spatial variability and were high in western Lake Chaohu and relatively low in eastern Lake Chaohu. The northernmost portion, the northeastern coastal zone and the southeastern portion of western Lake Chaohu had the highest TP concentrations, and the other regions had the lowest TP concentration values, except for the coastal zone of eastern Lake Chaohu. 
These results strongly suggested that the proposed modeling scheme, i.e., the band combinations and the regional multivariate statistical modeling techniques, demonstrated advantages for estimating the TP concentration in a large lake and had a strong potential for universal application for the TP concentration estimation in large lake waters worldwide. Copyright © 2014 Elsevier Ltd. All rights reserved.
Bostanov, Vladimir; Kotchoubey, Boris
2006-12-01
This study was aimed at developing a method for extraction and assessment of event-related brain potentials (ERP) from single-trials. This method should be applicable in the assessment of single persons' ERPs and should be able to handle both single ERP components and whole waveforms. We adopted a recently developed ERP feature extraction method, the t-CWT, for the purposes of hypothesis testing in the statistical assessment of ERPs. The t-CWT is based on the continuous wavelet transform (CWT) and Student's t-statistics. The method was tested in two ERP paradigms, oddball and semantic priming, by assessing individual-participant data on a single-trial basis, and testing the significance of selected ERP components, P300 and N400, as well as of whole ERP waveforms. The t-CWT was also compared to other univariate and multivariate ERP assessment methods: peak picking, area computation, discrete wavelet transform (DWT) and principal component analysis (PCA). The t-CWT produced better results than all of the other assessment methods it was compared with. The t-CWT can be used as a reliable and powerful method for ERP-component detection and testing of statistical hypotheses concerning both single ERP components and whole waveforms extracted from either single persons' or group data. The t-CWT is the first such method based explicitly on the criteria of maximal statistical difference between two average ERPs in the time-frequency domain and is particularly suitable for ERP assessment of individual data (e.g. in clinical settings), but also for the investigation of small and/or novel ERP effects from group data.
Application of two tests of multivariate discordancy to fisheries data sets
Stapanian, M.A.; Kocovsky, P.M.; Garner, F.C.
2008-01-01
The generalized (Mahalanobis) distance and multivariate kurtosis are two powerful tests of multivariate discordancies (outliers). Unlike the generalized distance test, the multivariate kurtosis test has not been applied as a test of discordancy to fisheries data heretofore. We applied both tests, along with published algorithms for identifying suspected causal variable(s) of discordant observations, to two fisheries data sets from Lake Erie: total length, mass, and age from 1,234 burbot, Lota lota; and 22 combinations of unique subsets of 10 morphometrics taken from 119 yellow perch, Perca flavescens. For the burbot data set, the generalized distance test identified six discordant observations and the multivariate kurtosis test identified 24 discordant observations. In contrast with the multivariate tests, the univariate generalized distance test identified no discordancies when applied separately to each variable. Removing discordancies had a substantial effect on length-versus-mass regression equations. For 500-mm burbot, the percent difference in estimated mass after removing discordancies in our study was greater than the percent difference in masses estimated for burbot of the same length in lakes that differed substantially in productivity. The number of discordant yellow perch detected ranged from 0 to 2 with the multivariate generalized distance test and from 6 to 11 with the multivariate kurtosis test. With the kurtosis test, 108 yellow perch (90.7%) were identified as discordant in zero to two combinations, and five (4.2%) were identified as discordant in either all or 21 of the 22 combinations. The relationship among the variables included in each combination determined which variables were identified as causal. The generalized distance test identified between zero and six discordancies when applied separately to each variable. 
Removing the discordancies found in at least one-half of the combinations (k=5) had a marked effect on a principal components analysis. In particular, the percent of the total variation explained by second and third principal components, which explain shape, increased by 52 and 44% respectively when the discordancies were removed. Multivariate applications of the tests have numerous ecological advantages over univariate applications, including improved management of fish stocks and interpretation of multivariate morphometric data. © 2007 Springer Science+Business Media B.V.
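The quantity behind the generalized distance test can be sketched as below. This is only an illustration of the squared Mahalanobis distance from the sample centroid. The data are simulated with one planted outlier, the function name is hypothetical, and the critical values and causal-variable algorithms used in the article are not reproduced.

```python
import numpy as np

def squared_mahalanobis(data):
    """Squared Mahalanobis distance of each row from the sample centroid,
    using the sample covariance matrix (ddof = 1)."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    # Solve a linear system rather than inverting the covariance explicitly.
    solved = np.linalg.solve(cov, centered.T).T
    return np.einsum("ij,ij->i", centered, solved)

# Simulated (length mm, mass g, age yr) records; values are made up.
rng = np.random.default_rng(1)
fish = rng.normal(loc=[500.0, 900.0, 8.0],
                  scale=[40.0, 150.0, 2.0],
                  size=(200, 3))
fish[0] = [500.0, 2500.0, 8.0]  # planted record with an implausible mass

d2 = squared_mahalanobis(fish)
most_extreme = int(np.argmax(d2))  # candidate discordant observation
```

In practice the largest distance would be compared against a critical value appropriate to the sample size and number of variables before an observation is declared discordant.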
Multivariate Analysis and Prediction of Dioxin-Furan ...
Peer Review Draft of Regional Methods Initiative Final Report Dioxins, which are bioaccumulative and environmentally persistent, pose an ongoing risk to human and ecosystem health. Fish constitute a significant source of dioxin exposure for humans and fish-eating wildlife. Current dioxin analytical methods are costly, time-consuming, and produce hazardous by-products. A Danish team developed a novel, multivariate statistical methodology based on the covariance of dioxin-furan congener Toxic Equivalences (TEQs) and fatty acid methyl esters (FAMEs) and applied it to North Atlantic Ocean fishmeal samples. The goal of the current study was to attempt to extend this Danish methodology to 77 whole and composite fish samples from three trophic groups: predator (whole largemouth bass), benthic (whole flathead and channel catfish) and forage fish (composite bluegill, pumpkinseed and green sunfish) from two dioxin contaminated rivers (Pocatalico R. and Kanawha R.) in West Virginia, USA. Multivariate statistical analyses, including Principal Components Analysis (PCA), Hierarchical Clustering, and Partial Least Squares Regression (PLS), were used to assess the relationship between the FAMEs and TEQs in these dioxin contaminated freshwater fish from the Kanawha and Pocatalico Rivers. These three multivariate statistical methods all confirm that the pattern of Fatty Acid Methyl Esters (FAMEs) in these freshwater fish covaries with and is predictive of the WHO TE
Testing novel patient financial incentives to increase breast cancer screening.
Merrick, Elizabeth Levy; Hodgkin, Dominic; Horgan, Constance M; Lorenz, Laura S; Panas, Lee; Ritter, Grant A; Kasuba, Paul; Poskanzer, Debra; Nefussy, Renee Altman
2015-11-01
To examine the effects of 3 types of low-cost financial incentives for patients, including a novel "person-centered" approach on breast cancer screening (mammogram) rates. Randomized controlled trial with 4 arms: 3 types of financial incentives ($15 gift card, entry into lottery for $250 gift card, and a person-centered incentive with choice of $15 gift card or lottery) and a control group. Sample included privately insured Tufts Health Plan members in Massachusetts who were women aged 42 to 69 years with no mammogram claim in ≥ 2.6 years. A sample of 4700 eligible members were randomized to 4 study arms. The control group received a standard reminder letter and the incentive groups received a reminder letter plus an incentive offer for obtaining a mammogram within the next 4 months. Bivariate tests and multivariate logistic regression were used to assess the incentives' impact on mammogram receipt. Data were analyzed for 4427 members (after exclusions such as undeliverable mail). The percent of members receiving a mammogram during the study was 11.7% (gift card), 12.1% (lottery), 13.4% (person-centered/choice), and 11.9% (controls). Differences were not statistically significant in bivariate or multivariate full-sample analyses. In exploratory subgroup analyses of members with a mammogram during the most recent year prior to the study-defined gap, person-centered incentives were associated with a higher likelihood of mammogram receipt. None of the low-cost incentives tested had a statistically significant effect on mammogram rates in the full sample. Exploratory findings for members who were more recently screened suggest that they may be more responsive to person-centered incentives.
NASA Technical Reports Server (NTRS)
Djorgovski, S. George
1994-01-01
We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology, in order to classify the detected objects objectively and uniformly, and facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* Decision Tree artificial induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complete database which may be easily queried and modified when new data or better methods of calibrating or classifying become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful, and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects for multivariate statistical analysis of small and moderate-size data sets, called STATPROG. The package was tested extensively on a number of real scientific applications, and has produced real, published results.
Multi-criteria evaluation of CMIP5 GCMs for climate change impact analysis
NASA Astrophysics Data System (ADS)
Ahmadalipour, Ali; Rana, Arun; Moradkhani, Hamid; Sharma, Ashish
2017-04-01
Climate change is expected to have severe impacts on the global hydrological cycle along with the food-water-energy nexus. Currently, there are many climate models used in predicting important climatic variables. Though there have been advances in the field, there are still many problems to be resolved related to reliability, uncertainty, and computing needs, among many others. In the present work, we have analyzed performance of 20 different global climate models (GCMs) from Climate Model Intercomparison Project Phase 5 (CMIP5) dataset over the Columbia River Basin (CRB) in the Pacific Northwest USA. We demonstrate a statistical multicriteria approach, using univariate and multivariate techniques, for selecting suitable GCMs to be used for climate change impact analysis in the region. Univariate methods include mean, standard deviation, coefficient of variation, relative change (variability), Mann-Kendall test, and Kolmogorov-Smirnov test (KS-test); whereas multivariate methods used were principal component analysis (PCA), singular value decomposition (SVD), canonical correlation analysis (CCA), and cluster analysis. The analysis is performed on raw GCM data, i.e., before bias correction, for precipitation and temperature climatic variables for all the 20 models to capture the reliability and nature of the particular model at regional scale. The analysis is based on spatially averaged datasets of GCMs and observation for the period of 1970 to 2000. Ranking is provided to each of the GCMs based on the performance evaluated against gridded observational data on various temporal scales (daily, monthly, and seasonal). Results have provided insight into each of the methods and various statistical properties addressed by them employed in ranking GCMs. Further, evaluation was also performed for raw GCM simulations against different sets of gridded observational dataset in the area.
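Of the univariate criteria listed, the Mann-Kendall trend test is simple enough to sketch directly. The version below is a minimal illustration using the standard library only. It assumes a series without tied values (so the simple variance formula applies), uses the usual normal approximation with a continuity correction, and runs on made-up numbers; it is not the study's implementation.

```python
import math

def mann_kendall(series):
    """Return (S, z, two-sided p) for the Mann-Kendall trend test.

    Assumes no tied values, so no tie correction is applied to the variance.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity correction: move S one unit toward zero before scaling.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return s, z, p

# Hypothetical annual temperature anomalies (made-up values, no ties).
anomalies = [0.02, 0.05, 0.11, 0.09, 0.18, 0.21, 0.19, 0.27, 0.33, 0.30]
s_stat, z_stat, p_val = mann_kendall(anomalies)
```

A positive `s_stat` with a small `p_val` indicates an upward monotonic trend. Applying the same function to a model series and to observations gives one way to compare their trend behavior.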
Borrowing of strength and study weights in multivariate and network meta-analysis
Jackson, Dan; White, Ian R; Price, Malcolm; Copas, John; Riley, Richard D
2016-01-01
Multivariate and network meta-analysis have the potential for the estimated mean of one effect to borrow strength from the data on other effects of interest. The extent of this borrowing of strength is usually assessed informally. We present new mathematical definitions of ‘borrowing of strength’. Our main proposal is based on a decomposition of the score statistic, which we show can be interpreted as comparing the precision of estimates from the multivariate and univariate models. Our definition of borrowing of strength therefore emulates the usual informal assessment. We also derive a method for calculating study weights, which we embed into the same framework as our borrowing of strength statistics, so that percentage study weights can accompany the results from multivariate and network meta-analyses as they do in conventional univariate meta-analyses. Our proposals are illustrated using three meta-analyses involving correlated effects for multiple outcomes, multiple risk factor associations and multiple treatments (network meta-analysis). PMID:26546254
McGuire, Thomas G; Ayanian, John Z; Ford, Daniel E; Henke, Rachel E M; Rost, Kathryn M; Zaslavsky, Alan M
2008-01-01
Objective: To test for discrimination by race/ethnicity arising from clinical uncertainty in treatment for depression, also known as “statistical discrimination.” Data Sources: We used survey data from 1,321 African-American, Hispanic, and white adults identified with depression in primary care. Surveys were administered every six months for two years in the Quality Improvement for Depression (QID) studies. Study Design: To examine whether and how change in depression severity affects change in treatment intensity by race/ethnicity, we used multivariate cross-sectional and change models that difference out unobserved time-invariant patient characteristics potentially correlated with race/ethnicity. Data Collection/Extraction Methods: Treatment intensity was operationalized as expenditures on drugs, primary care, and specialty services, weighted by national prices from the Medical Expenditure Panel Survey. Patient race/ethnicity was collected at baseline by self-report. Principal Findings: Change in depression severity is less associated with change in treatment intensity in minority patients than in whites, consistent with the hypothesis of statistical discrimination. The differential effect by racial/ethnic group was accounted for by use of mental health specialists. Conclusions: Enhanced physician–patient communication and use of standardized depression instruments may reduce statistical discrimination arising from clinical uncertainty and be useful in reducing racial/ethnic inequities in depression treatment. PMID:18370966
Epidemiologic methods in clinical trials.
Rothman, K J
1977-04-01
Epidemiologic methods developed to control confounding in non-experimental studies are equally applicable for experiments. In experiments, most confounding is usually controlled by random allocation of subjects to treatment groups, but randomization does not preclude confounding except for extremely large studies, the degree of confounding expected being inversely related to the size of the treatment groups. In experiments, as in non-experimental studies, the extent of confounding for each risk indicator should be assessed, and if sufficiently large, controlled. Confounding is properly assessed by comparing the unconfounded effect estimate to the crude effect estimate; a common error is to assess confounding by statistical tests of significance. Assessment of confounding involves its control as a prerequisite. Control is most readily and cogently achieved by stratification of the data, though with many factors to control simultaneously, multivariate analysis or a combination of multivariate analysis and stratification might be necessary.
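As a worked illustration of assessing confounding by comparing effect estimates rather than by a significance test, the sketch below contrasts a crude odds ratio with a stratum-adjusted one. The counts are hypothetical, and the choice of the odds ratio and of the Mantel-Haenszel estimator is an assumption made for illustration; the abstract itself does not prescribe a particular estimator.

```python
# Hypothetical counts per stratum of a suspected confounder:
# (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases)
STRATA = [(80, 20, 40, 10), (10, 40, 20, 80)]


def crude_odds_ratio(strata):
    """Odds ratio from the single 2x2 table obtained by pooling all strata."""
    a, b, c, d = (sum(s[i] for s in strata) for i in range(4))
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel summary odds ratio across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


crude = crude_odds_ratio(STRATA)
adjusted = mantel_haenszel_odds_ratio(STRATA)
# Size of confounding, expressed relative to the adjusted estimate.
relative_change = (crude - adjusted) / adjusted
print(crude, adjusted, relative_change)
```

In this constructed example both stratum-specific odds ratios equal 1 while the crude odds ratio is 2.25, so the entire crude association is attributable to the stratifying factor.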
Optimal False Discovery Rate Control for Dependent Data
Xie, Jichun; Cai, T. Tony; Maris, John; Li, Hongzhe
2013-01-01
This paper considers the problem of optimal false discovery rate control when the test statistics are dependent. An optimal joint oracle procedure, which minimizes the false non-discovery rate subject to a constraint on the false discovery rate is developed. A data-driven marginal plug-in procedure is then proposed to approximate the optimal joint procedure for multivariate normal data. It is shown that the marginal procedure is asymptotically optimal for multivariate normal data with a short-range dependent covariance structure. Numerical results show that the marginal procedure controls false discovery rate and leads to a smaller false non-discovery rate than several commonly used p-value based false discovery rate controlling methods. The procedure is illustrated by an application to a genome-wide association study of neuroblastoma and it identifies a few more genetic variants that are potentially associated with neuroblastoma than several p-value-based false discovery rate controlling procedures. PMID:23378870
Body mass index, waist circumference, and arterial hypertension in students.
Guilherme, Flávio Ricardo; Molena-Fernandes, Carlos Alexandre; Guilherme, Vânia Renata; Fávero, Maria Teresa Martins; dos Reis, Eliane Josefa Barbosa; Rinaldi, Wilson
2015-01-01
To investigate the best anthropometric predictor of arterial hypertension among private school students. This was a cross-sectional study with 286 students between the ages of 10 and 14 from two private schools in the city of Paranavaí, Paraná, Brazil. The following variables were analyzed: body mass index, waist circumference and blood pressure. Statistical analysis was conducted with Pearson's partial correlation test and multivariate logistic regression, with p < 0.05. Both anthropometric indicators displayed weak correlation with systolic and diastolic levels, with coefficients (r) ranging from 0.27 to 0.36 (p < 0.001). Multivariate analysis showed that the only anthropometric indicator associated with arterial hypertension was waist circumference (OR = 2.3; 95% CI: 1.1-4.5), regardless of age or gender. In this age group, waist circumference appeared to be a better predictor for arterial hypertension than body mass index.
Modeling multivariate time series on manifolds with skew radial basis functions.
Jamshidi, Arta A; Kirby, Michael J
2011-01-01
We present an approach for constructing nonlinear empirical mappings from high-dimensional domains to multivariate ranges. We employ radial basis functions and skew radial basis functions for constructing a model using data that are potentially scattered or sparse. The algorithm progresses iteratively, adding a new function at each step to refine the model. The placement of the functions is driven by a statistical hypothesis test that accounts for correlation in the multivariate range variables. The test is applied on training and validation data and reveals nonstatistical or geometric structure when it fails. At each step, the added function is fit to data contained in a spatiotemporally defined local region to determine the parameters--in particular, the scale of the local model. The scale of the function is determined by the zero crossings of the autocorrelation function of the residuals. The model parameters and the number of basis functions are determined automatically from the given data, and there is no need to initialize any ad hoc parameters save for the selection of the skew radial basis functions. Compactly supported skew radial basis functions are employed to improve model accuracy, order, and convergence properties. The extension of the algorithm to higher-dimensional ranges produces reduced-order models by exploiting the existence of correlation in the range variable data. Structure is tested not just in a single time series but between all pairs of time series. We illustrate the new methodologies using several illustrative problems, including modeling data on manifolds and the prediction of chaotic time series.
NASA Astrophysics Data System (ADS)
Vrac, Mathieu
2018-06-01
Climate simulations often suffer from statistical biases with respect to observations or reanalyses. It is therefore common to correct (or adjust) those simulations before using them as inputs into impact models. However, most bias correction (BC) methods are univariate and so do not account for the statistical dependences linking the different locations and/or physical variables of interest. In addition, they are often deterministic, and stochasticity is frequently needed to investigate climate uncertainty and to add constrained randomness to climate simulations that do not possess a realistic variability. This study presents a multivariate method of rank resampling for distributions and dependences (R2D2) bias correction allowing one to adjust not only the univariate distributions but also their inter-variable and inter-site dependence structures. Moreover, the proposed R2D2 method provides some stochasticity since it can generate as many multivariate corrected outputs as the number of statistical dimensions (i.e., number of grid cells × number of climate variables) of the simulations to be corrected. It is based on an assumption of stability in time of the dependence structure - making it possible to deal with a high number of statistical dimensions - that lets the climate model drive the temporal properties and their changes in time. R2D2 is applied on temperature and precipitation reanalysis time series with respect to high-resolution reference data over the southeast of France (1506 grid cells). Bivariate, 1506-dimensional and 3012-dimensional versions of R2D2 are tested over a historical period and compared to a univariate BC. How the different BC methods behave in a climate change context is also illustrated with an application to regional climate simulations over the 2071-2100 period.
The results indicate that the 1d-BC basically reproduces the climate model multivariate properties, 2d-R2D2 is only satisfying in the inter-variable context, 1506d-R2D2 strongly improves inter-site properties and 3012d-R2D2 is able to account for both. Applications of the proposed R2D2 method to various climate datasets are relevant for many impact studies. The perspectives of improvements are numerous, such as introducing stochasticity in the dependence itself, questioning its stability assumption, and accounting for temporal properties adjustment while including more physics in the adjustment procedures.
NASA Astrophysics Data System (ADS)
Brandmeier, M.; Wörner, G.
2016-10-01
Multivariate statistical and geospatial analyses based on a compilation of 890 geochemical and 1200 geochronological data for 194 mapped ignimbrites from the Central Andes document the compositional and temporal patterns of large-volume ignimbrites (so-called "ignimbrite flare-ups") during Neogene times. Rapid advances in computational science during the past decade led to a growing pool of algorithms for multivariate statistics for large datasets with many predictor variables. This study applies cluster analysis (CA) and linear discriminant analysis (LDA) on log-ratio transformed data with the aim of (1) testing a tool for ignimbrite correlation and (2) distinguishing compositional groups that reflect different processes and sources of ignimbrite magmatism during the geodynamic evolution of the Central Andes. CA on major and trace elements allows grouping of ignimbrites according to their geochemical characteristics into rhyolitic and dacitic "end-members" and to differentiate characteristic trace element signatures with respect to Eu anomaly, depletions in middle and heavy rare earth elements (REE) and variable enrichments in light REE. To highlight these distinct compositional signatures, we applied LDA to selected ignimbrites for which comprehensive datasets were available. In comparison to traditional geochemical parameters we found that the advantage of multivariate statistics is their capability of dealing with large datasets and many variables (elements) and to take advantage of this n-dimensional space to detect subtle compositional differences contained in the data. The most important predictors for discriminating ignimbrites are La, Yb, Eu, Al2O3, K2O, P2O5, MgO, FeOt, and TiO2. However, other REE such as Gd, Pr, Tm, Sm, Dy and Er also contribute to the discrimination functions. 
Significant compositional differences were found between (1) the older (> 13 Ma) large-volume plateau-forming ignimbrites in northernmost Chile and southern Peru and (2) the younger (< 10 Ma) Altiplano-Puna-Volcanic-Complex (APVC) ignimbrites that are of similar volumes. Older ignimbrites are less depleted in HREE and less radiogenic in Sr isotopes, indicating smaller crustal contributions during evolution in a thinner and thermally less evolved crust. These compositional variations indicate a relation to crustal thickening with a "transition" from plagioclase to amphibole and garnet residual mineralogy between 13 and 9 Ma. Compositional and volumetric variations correlate to the N-S passage of the Juan-Fernandéz-Ridge, crustal shortening and thickening, and increased average crustal temperatures during the past 26 Ma. Table DR2 Mapped ignimbrite sheets.
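The abstract above applies cluster and discriminant analysis to log-ratio transformed data but does not say which log-ratio transform was used. As a minimal sketch, assuming the centered log-ratio (clr) transform commonly used for compositional data, the function below maps a strictly positive composition to clr coordinates. The sample values are hypothetical and not taken from the study.

```python
import math


def clr(composition):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(v) for v in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical four-part composition for one sample (illustrative values only).
sample = [70.0, 14.0, 4.0, 3.0]
transformed = clr(sample)
print(transformed)
```

Two properties make this transform convenient before clustering or discriminant analysis: the coordinates sum to zero, and rescaling the whole composition (for example, percent versus fraction) leaves them unchanged.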
Katseanes, Chelsea K; Chappell, Mark A; Hopkins, Bryan G; Durham, Brian D; Price, Cynthia L; Porter, Beth E; Miller, Lesley F
2017-12-01
After nearly a century of use in numerous munition platforms, TNT and RDX contamination has turned up largely in the environment due to ammunition manufacturing or as part of releases from low-order detonations during training activities. Although the basic knowledge governing the environmental fate of TNT and RDX is known, accurate predictions of TNT and RDX persistence in soil remain elusive, particularly given the universal heterogeneity of pedomorphic soil types. In this work, we proposed overcoming this problem by considering the environmental persistence of these munition constituents (MC) as multivariate mathematical functions over a variety of taxonomically distinct soil types, instead of a single constant or parameter of a specific absolute value. To test this idea, we conducted experiments where the disappearance kinetics of TNT and RDX were measured over a >300 h period in taxonomically distinct soils. Classical fertility-based soil measurements were log-transformed, statistically decomposed, and correlated to TNT and RDX disappearance rates (k-TNT and k-RDX) using multivariate dimension-reduction and correlation techniques. From these efforts, we generated multivariate linear functions for k parameters across different soil types based on a statistically reduced set of their chemical and physical properties. Calculations showed that the soil properties exhibited strong covariance, with a prominent latent structure emerging as the basis for relative comparisons of the samples in reduced space. Loadings describing TNT degradation were largely driven by properties associated with alkaline/calcareous soil characteristics, while the degradation of RDX was attributed to the soil organic matter content, reflective of an important soil fertility characteristic.
In spite of the differing responses to the munitions, batch data suggested that the overall nutrient dynamics were consistent for each soil type, as well as readily distinguishable from the other soil types used in this study. Thus, we hypothesized that the latent structure arising from the strong covariance of full multivariate geochemical matrix describing taxonomically distinguished "soil types" may provide the means for potentially predicting complex phenomena in soils. Published by Elsevier Ltd.
Bias-Free Chemically Diverse Test Sets from Machine Learning.
Swann, Ellen T; Fernandez, Michael; Coote, Michelle L; Barnard, Amanda S
2017-08-14
Current benchmarking methods in quantum chemistry rely on databases that are built using a chemist's intuition. It is not fully understood how diverse or representative these databases truly are. Multivariate statistical techniques like archetypal analysis and K-means clustering have previously been used to summarize large sets of nanoparticles; however, molecules are more diverse and not as easily characterized by descriptors. In this work, we compare three sets of descriptors based on the one-, two-, and three-dimensional structure of a molecule. Using data from the NIST Computational Chemistry Comparison and Benchmark Database and machine learning techniques, we demonstrate the functional relationship between these structural descriptors and the electronic energy of molecules. Archetypes and prototypes found with topological or Coulomb matrix descriptors can be used to identify smaller, statistically significant test sets that better capture the diversity of chemical space. We apply this same method to find a diverse subset of organic molecules to demonstrate how the methods can easily be reapplied to individual research projects. Finally, we use our bias-free test sets to assess the performance of density functional theory and quantum Monte Carlo methods.
Nutrition Deficiencies in Children With Intestinal Failure Receiving Chronic Parenteral Nutrition.
Namjoshi, Shweta S; Muradian, Sarah; Bechtold, Hannah; Reyen, Laurie; Venick, Robert S; Marcus, Elizabeth A; Vargas, Jorge H; Wozniak, Laura J
2017-02-01
Home parenteral nutrition (PN) is a lifesaving therapy for children with intestinal failure (IF). Our aims were to describe the prevalence of micronutrient deficiencies (vitamin D, zinc, copper, iron, selenium) in a diverse population of children with IF receiving PN and to identify and characterize risk factors associated with micronutrient deficiencies, including hematologic abnormalities. Data were collected on 60 eligible patients through retrospective chart review between May 2012 and February 2015. Descriptive statistics included frequencies, medians, interquartile ranges (IQRs), and odds ratios (ORs). Statistical analyses included χ², Fisher's exact, t tests, and logistic, univariate, and multivariate regressions. Patients were primarily young (median age, 3.3 years; IQR, 0.7-8.4), Latino (62%), and male (56%), with short bowel syndrome (70%). Of 60 study patients, 88% had ≥1 deficiency and 90% were anemic for age. Of 51 patients who had all 5 markers checked, 59% had multiple deficiencies (defined as ≥3). Multivariate analysis shows multiple deficiencies were associated with nonwhite race (OR, 9.4; P = .012) and higher body mass index z score (OR, 2.2; P = .016). Children with severe anemia (hemoglobin <8.5 g/dL) made up 50% of the cohort. Nonwhite race (OR, 6.6; P = .037) and zinc deficiency (OR, 11; P = .003) were multivariate predictors of severe anemia. Micronutrient deficiency and anemia are overwhelmingly prevalent in children with IF using chronic PN. This emphasizes the importance of universal surveillance and supplementation to potentially improve quality of life and developmental outcomes. Future research should investigate how racial disparities might contribute to nutrition outcomes for children using chronic PN.
Multivariate Longitudinal Analysis with Bivariate Correlation Test
Adjakossa, Eric Houngla; Sadissou, Ibrahim; Hounkonnou, Mahouton Norbert; Nuel, Gregory
2016-01-01
In the context of multivariate multilevel data analysis, this paper focuses on the multivariate linear mixed-effects model, including all the correlations between the random effects when the dimensional residual terms are assumed uncorrelated. Using the EM algorithm, we suggest more general expressions of the model’s parameters estimators. These estimators can be used in the framework of the multivariate longitudinal data analysis as well as in the more general context of the analysis of multivariate multilevel data. By using a likelihood ratio test, we test the significance of the correlations between the random effects of two dependent variables of the model, in order to investigate whether or not it is useful to model these dependent variables jointly. Simulation studies are done to assess both the parameter recovery performance of the EM estimators and the power of the test. Using two empirical data sets which are of longitudinal multivariate type and multivariate multilevel type, respectively, the usefulness of the test is illustrated. PMID:27537692
Applying Sociocultural Theory to Teaching Statistics for Doctoral Social Work Students
ERIC Educational Resources Information Center
Mogro-Wilson, Cristina; Reeves, Michael G.; Charter, Mollie Lazar
2015-01-01
This article describes the development of two doctoral-level multivariate statistics courses utilizing sociocultural theory, an integrative pedagogical framework. In the first course, the implementation of sociocultural theory helps to support the students through a rigorous introduction to statistics. The second course involves students…
A review on the multivariate statistical methods for dimensional reduction studies
NASA Astrophysics Data System (ADS)
Aik, Lim Eng; Kiang, Lam Chee; Mohamed, Zulkifley Bin; Hong, Tan Wei
2017-05-01
In this study we review multivariate statistical methods for dimensionality reduction that have been applied by various researchers. Reducing dimensionality can speed up an algorithm and may also improve the final classification or clustering accuracy. Large amounts of noisy or faulty input data often lead to poor algorithm performance. Removing uninformative or misleading data columns may help the algorithm find more general classification regions and rules and, overall, achieve better performance on new data sets.
Generating an Empirical Probability Distribution for the Andrews-Pregibon Statistic.
ERIC Educational Resources Information Center
Jarrell, Michele G.
A probability distribution was developed for the Andrews-Pregibon (AP) statistic. The statistic, developed by D. F. Andrews and D. Pregibon (1978), identifies multivariate outliers. It is a ratio of the determinant of the data matrix with an observation deleted to the determinant of the entire data matrix. Although the AP statistic has been used…
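The determinant ratio described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's code: the determinants are taken of the cross-product matrix Z'Z, where Z is the data matrix (predictors and response side by side), which is the usual way the Andrews-Pregibon ratio is written. The data values and the helper names `gram`, `det` and `ap_ratios` are hypothetical.

```python
def gram(rows):
    """Cross-product matrix Z'Z for a data matrix given as a list of rows."""
    p = len(rows[0])
    return [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[pivot][c] == 0:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result  # a row swap flips the sign
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return result


def ap_ratios(rows):
    """For each observation: det with that row deleted / det of the full data."""
    full = det(gram(rows))
    return [det(gram(rows[:i] + rows[i + 1:])) / full for i in range(len(rows))]


# Hypothetical data: columns are (intercept, x, y); the last row is an outlier.
data = [[1, 1.0, 1.1], [1, 2.0, 1.9], [1, 3.0, 3.2],
        [1, 4.0, 3.8], [1, 5.0, 5.1], [1, 6.0, 12.0]]
ratios = ap_ratios(data)
print(ratios)
```

Each ratio lies between 0 and 1, and values near 0 flag observations whose removal shrinks the determinant most, which is why small ratios point to outlying or influential rows.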
Lepore, Natasha; Brun, Caroline A; Chiang, Ming-Chang; Chou, Yi-Yu; Dutton, Rebecca A; Hayashi, Kiralee M; Lopez, Oscar L; Aizenstein, Howard J; Toga, Arthur W; Becker, James T; Thompson, Paul M
2006-01-01
Tensor-based morphometry (TBM) is widely used in computational anatomy as a means to understand shape variation between structural brain images. A 3D nonlinear registration technique is typically used to align all brain images to a common neuroanatomical template, and the deformation fields are analyzed statistically to identify group differences in anatomy. However, the differences are usually computed solely from the determinants of the Jacobian matrices that are associated with the deformation fields computed by the registration procedure. Thus, much of the information contained within those matrices gets thrown out in the process. Only the magnitude of the expansions or contractions is examined, while the anisotropy and directional components of the changes are ignored. Here we remedy this problem by computing multivariate shape change statistics using the strain matrices. As the latter do not form a vector space, means and covariances are computed on the manifold of positive-definite matrices to which they belong. We study the brain morphology of 26 HIV/AIDS patients and 14 matched healthy control subjects using our method. The images are registered using a high-dimensional 3D fluid registration algorithm, which optimizes the Jensen-Rényi divergence, an information-theoretic measure of image correspondence. The anisotropy of the deformation is then computed. We apply a manifold version of Hotelling's T² test to the strain matrices. Our results complement those found from the determinants of the Jacobians alone and provide greater power in detecting group differences in brain structure.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mayer, B. P.; Mew, D. A.; DeHope, A.
Attribution of the origin of an illicit drug relies on identification of compounds indicative of its clandestine production and is a key component of many modern forensic investigations. The results of these studies can yield detailed information on method of manufacture, starting material source, and final product - all critical forensic evidence. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic fentanyl, N-(1-phenylethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods, all previously published fentanyl synthetic routes or hybrid versions thereof, were studied in an effort to identify and classify route-specific signatures. 160 distinct compounds and inorganic species were identified using gas and liquid chromatographies combined with mass spectrometric methods (GC-MS and LC-MS/MS-TOF) in conjunction with inductively coupled plasma mass spectrometry (ICPMS). The complexity of the resultant data matrix urged the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 87 route-specific CAS were classified and a statistical model capable of predicting the method of fentanyl synthesis was validated and tested against CAS profiles from crude fentanyl products deposited and later extracted from two operationally relevant surfaces: stainless steel and vinyl tile. This work provides the most detailed fentanyl CAS investigation to date by using orthogonal mass spectral data to identify CAS of forensic significance for illicit drug detection, profiling, and attribution.
Testing for qualitative heterogeneity: An application to composite endpoints in survival analysis.
Oulhaj, Abderrahim; El Ghouch, Anouar; Holman, Rury R
2017-01-01
Composite endpoints are frequently used in clinical outcome trials to provide more endpoints, thereby increasing statistical power. A key requirement for a composite endpoint to be meaningful is the absence of the so-called qualitative heterogeneity to ensure a valid overall interpretation of any treatment effect identified. Qualitative heterogeneity occurs when individual components of a composite endpoint exhibit differences in the direction of a treatment effect. In this paper, we develop a general statistical method to test for qualitative heterogeneity, that is to test whether a given set of parameters share the same sign. This method is based on the intersection-union principle and, provided that the sample size is large, is valid whatever the model used for parameters estimation. We propose two versions of our testing procedure, one based on a random sampling from a Gaussian distribution and another version based on bootstrapping. Our work covers both the case of completely observed data and the case where some observations are censored which is an important issue in many clinical trials. We evaluated the size and power of our proposed tests by carrying out some extensive Monte Carlo simulations in the case of multivariate time to event data. The simulations were designed under a variety of conditions on dimensionality, censoring rate, sample size and correlation structure. Our testing procedure showed very good performances in terms of statistical power and type I error. The proposed test was applied to a data set from a single-center, randomized, double-blind controlled trial in the area of Alzheimer's disease.
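The intersection-union idea can be illustrated with a simplified sketch. It does not reproduce the Gaussian-sampling or bootstrap calibration proposed in the paper; it assumes approximately normal estimates with known standard errors and uses plain normal critical values. "All parameters share the same sign" is declared only when every component passes a one-sided z-test in the same direction, and alpha is split across the two directions as a conservative assumption. The function name and inputs are hypothetical.

```python
from statistics import NormalDist


def same_sign_iut(estimates, std_errors, alpha=0.05):
    """Return True when the data support 'all parameters share one sign'.

    Intersection-union logic: the claim 'all positive' is accepted only if
    the weakest component still clears the critical value (min z > crit);
    'all negative' is handled symmetrically. alpha is halved per direction.
    """
    z = [est / se for est, se in zip(estimates, std_errors)]
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    all_positive = min(z) > crit
    all_negative = max(z) < -crit
    return all_positive or all_negative


# Hypothetical log hazard ratios for three components of a composite endpoint.
print(same_sign_iut([-0.5, -0.4, -0.6], [0.1, 0.1, 0.1]))  # consistent direction
print(same_sign_iut([-0.5, 0.4, -0.6], [0.1, 0.1, 0.1]))   # one component reversed
```

Because the decision rests on the weakest component, a single imprecisely estimated effect is enough to withhold the "same sign" conclusion, which is the conservative behaviour expected of an intersection-union test.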
Multivariate Analysis of Genotype-Phenotype Association.
Mitteroecker, Philipp; Cheverud, James M; Pavlicev, Mihaela
2016-04-01
With the advent of modern imaging and measurement technology, complex phenotypes are increasingly represented by large numbers of measurements, which may not bear biological meaning one by one. For such multivariate phenotypes, studying the pairwise associations between all measurements and all alleles is highly inefficient and prevents insight into the genetic pattern underlying the observed phenotypes. We present a new method for identifying patterns of allelic variation (genetic latent variables) that are maximally associated, in terms of effect size, with patterns of phenotypic variation (phenotypic latent variables). This multivariate genotype-phenotype mapping (MGP) separates phenotypic features under strong genetic control from less genetically determined features and thus permits an analysis of the multivariate structure of genotype-phenotype association, including its dimensionality and the clustering of genetic and phenotypic variables within this association. Different variants of MGP maximize different measures of genotype-phenotype association: genetic effect, genetic variance, or heritability. In an application to a mouse sample, scored for 353 SNPs and 11 phenotypic traits, the first dimension of genetic and phenotypic latent variables accounted for >70% of genetic variation present in all 11 measurements; 43% of variation in this phenotypic pattern was explained by the corresponding genetic latent variable. The first three dimensions together sufficed to account for almost 90% of genetic variation in the measurements and for all the interpretable genotype-phenotype association. Each dimension can be tested as a whole against the hypothesis of no association, thereby reducing the number of statistical tests from 7766 to 3, the maximal number of meaningful independent tests. Important alleles can be selected based on their effect size (additive or nonadditive effect on the phenotypic latent variable).
This low dimensionality of the genotype-phenotype map has important consequences for gene identification and may shed light on the evolvability of organisms. Copyright © 2016 by the Genetics Society of America.
Pounds, Stan; Cheng, Cheng; Cao, Xueyuan; Crews, Kristine R.; Plunkett, William; Gandhi, Varsha; Rubnitz, Jeffrey; Ribeiro, Raul C.; Downing, James R.; Lamba, Jatinder
2009-01-01
Motivation: In some applications, prior biological knowledge can be used to define a specific pattern of association of multiple endpoint variables with a genomic variable that is biologically most interesting. However, to our knowledge, there is no statistical procedure designed to detect specific patterns of association with multiple endpoint variables. Results: Projection onto the most interesting statistical evidence (PROMISE) is proposed as a general procedure to identify genomic variables that exhibit a specific biologically interesting pattern of association with multiple endpoint variables. Biological knowledge of the endpoint variables is used to define a vector that represents the biologically most interesting values for statistics that characterize the associations of the endpoint variables with a genomic variable. A test statistic is defined as the dot-product of the vector of the observed association statistics and the vector of the most interesting values of the association statistics. By definition, this test statistic is proportional to the length of the projection of the observed vector of correlations onto the vector of most interesting associations. Statistical significance is determined via permutation. In simulation studies and an example application, PROMISE shows greater statistical power to identify genes with the interesting pattern of associations than classical multivariate procedures, individual endpoint analyses or listing genes that have the pattern of interest and are significant in more than one individual endpoint analysis. Availability: Documented R routines are freely available from www.stjuderesearch.org/depts/biostats and will soon be available as a Bioconductor package from www.bioconductor.org. Contact: stanley.pounds@stjude.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19528086
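The dot-product statistic and permutation test described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' R implementation: using Pearson correlation as the association statistic, the one-sided p-value, and the names `pearson`, `projection_stat`, and `permutation_pvalue` are all assumptions made for illustration.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def projection_stat(genomic, endpoints, interest):
    """Dot product of observed correlations with the 'most interesting' vector."""
    return sum(w * pearson(genomic, e) for w, e in zip(interest, endpoints))


def permutation_pvalue(genomic, endpoints, interest, n_perm=2000, seed=0):
    """One-sided permutation p-value, shuffling the genomic variable (assumed scheme)."""
    rng = random.Random(seed)
    observed = projection_stat(genomic, endpoints, interest)
    shuffled = list(genomic)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if projection_stat(shuffled, endpoints, interest) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: one genomic variable, two endpoints.
genomic = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.9, 2.3]
endpoint_up = [1.0, 1.3, 1.6, 2.1, 2.2, 2.9, 3.1, 3.8]
endpoint_down = [5.0, 4.6, 4.7, 3.9, 3.5, 3.2, 2.6, 2.4]
# Interesting pattern: positive association with the first endpoint, negative with the second.
stat, p_value = permutation_pvalue(genomic, [endpoint_up, endpoint_down], [1, -1])
```

Shuffling the genomic variable while leaving the endpoints in place preserves the correlation among endpoints under the null, which is the usual reason for permuting in this direction.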
Liver Rapid Reference Set Application: Hemken - Abbott (2015) — EDRN Public Portal
The aim for this testing is to find a small panel of biomarkers (n=2-5) that can be tested on the Abbott ARCHITECT automated immunoassay platform for the early detection of hepatocellular carcinoma (HCC). This panel of biomarkers should perform significantly better than alpha-fetoprotein (AFP) alone based on multivariate statistical analysis. This testing of the EDRN reference set will help expedite the selection of a small panel of ARCHITECT biomarkers for the early detection of HCC. The panel of ARCHITECT biomarkers Abbott plans to test include: AFP, protein induced by vitamin K absence or antagonist-II (PIVKA-II), golgi protein 73 (GP73), hepatocellular growth factor (HGF), dipeptidyl peptidase 4 (DPP4) and DPP4/seprase (surface expressed protease) heterodimer hybrid. PIVKA-II is abnormal des-carboxylated prothrombin (DCP) present in vitamin K deficiency.
Pingault, Jean Baptiste; Côté, Sylvana M; Petitclerc, Amélie; Vitaro, Frank; Tremblay, Richard E
2015-01-01
Parental educational expectations have been associated with children's educational attainment in a number of long-term longitudinal studies, but whether this relationship is causal has long been debated. The aims of this prospective study were twofold: 1) test whether low maternal educational expectations contributed to failure to graduate from high school; and 2) compare the results obtained using different strategies for accounting for confounding variables (i.e. multivariate regression and propensity score matching). The study sample included 1,279 participants from the Quebec Longitudinal Study of Kindergarten Children. Maternal educational expectations were assessed when the participants were aged 12 years. High school graduation—measuring educational attainment—was determined through the Quebec Ministry of Education when the participants were aged 22-23 years. Findings show that when using the most common statistical approach (i.e. multivariate regressions to adjust for a restricted set of potential confounders) the contribution of low maternal educational expectations to failure to graduate from high school was statistically significant. However, when using propensity score matching, the contribution of maternal expectations was reduced and remained statistically significant only for males. The results of this study are consistent with the possibility that the contribution of parental expectations to educational attainment is overestimated in the available literature. This may be explained by the use of a restricted range of potential confounding variables as well as the dearth of studies using appropriate statistical techniques and study designs in order to minimize confounding. Each of these techniques and designs, including propensity score matching, has its strengths and limitations: A more comprehensive understanding of the causal role of parental expectations will stem from a convergence of findings from studies using different techniques and designs.
Han, Buhm; Kang, Hyun Min; Eskin, Eleazar
2009-01-01
With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose an accurate and efficient method for multiple testing correction in genome-wide association studies—SLIDE. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu. PMID:19381255
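For orientation, the permutation baseline that the abstract above calls the gold standard can be sketched as a max-statistic correction. This is not SLIDE itself, and the statistic (absolute difference in mean allele count between cases and controls), the function names, and the toy data are assumptions for illustration only.

```python
import random


def mean_diff_stat(marker, labels):
    """Absolute difference in mean allele count between cases (1) and controls (0)."""
    cases = [g for g, y in zip(marker, labels) if y == 1]
    controls = [g for g, y in zip(marker, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def max_stat_adjusted_pvalues(markers, labels, n_perm=1000, seed=0):
    """Multiple-testing-corrected p-values from the permutation null of the maximum statistic."""
    rng = random.Random(seed)
    observed = [mean_diff_stat(m, labels) for m in markers]
    exceed = [0] * len(markers)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # permuting labels keeps between-marker correlation intact
        max_stat = max(mean_diff_stat(m, perm) for m in markers)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]


# Hypothetical genotypes (allele counts 0/1/2) for 12 individuals: 6 cases, then 6 controls.
labels = [1] * 6 + [0] * 6
associated = [2, 2, 2, 1, 2, 2, 0, 0, 0, 1, 0, 0]
unrelated = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
adjusted = max_stat_adjusted_pvalues([associated, unrelated], labels)
```

Each permutation requires recomputing every marker's statistic, so the cost grows with both the number of permutations and the number of markers; that cost is the impracticality the abstract refers to for large datasets.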
NASA Technical Reports Server (NTRS)
Djorgovski, George
1993-01-01
The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.
NASA Technical Reports Server (NTRS)
Djorgovski, Stanislav
1992-01-01
The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.
Jeon, Sung-Hwan; Leem, Jong-Han; Park, Shin-Goo; Heo, Yong-Seok; Lee, Bum-Joon; Moon, So-Hyun; Jung, Dal-Young; Kim, Hwan-Cheol
2014-03-24
The purpose of the present study was to identify the association between presenteeism and long working hours, shiftwork, and occupational stress using representative national survey data on Korean workers. We analyzed data from the second Korean Working Conditions Survey (KWCS), which was conducted in 2010, in which a total of 6,220 wage workers were analyzed. The study population included the economically active population aged above 15 years, and living in the Republic of Korea. We used the chi-squared test and multivariate logistic regression to test the statistical association between presenteeism and working hours, shiftwork, and occupational stress. Approximately 19% of the workers experienced presenteeism during the previous 12 months. Women had higher rates of presenteeism than men. We found a statistically significant dose-response relationship between working hours and presenteeism. Shift workers had a slightly higher rate of presenteeism than non-shift workers, but the difference was not statistically significant. Occupational stress, such as high job demand, lack of rewards, and inadequate social support, had a significant association with presenteeism. The present study suggests that long working hours and occupational stress are significantly related to presenteeism.
R package MVR for Joint Adaptive Mean-Variance Regularization and Variance Stabilization
Dazard, Jean-Eudes; Xu, Hua; Rao, J. Sunil
2015-01-01
We present an implementation in the R language for statistical computing of our recent non-parametric joint adaptive mean-variance regularization and variance stabilization procedure. The method is specifically suited for handling difficult problems posed by high-dimensional multivariate datasets (p ≫ n paradigm), such as in ‘omics’-type data, among which are that the variance is often a function of the mean, variable-specific estimators of variances are not reliable, and test statistics have low power due to a lack of degrees of freedom. The implementation offers a complete set of features including: (i) normalization and/or variance stabilization function, (ii) computation of mean-variance-regularized t and F statistics, (iii) generation of diverse diagnostic plots, (iv) synthetic and real ‘omics’ test datasets, (v) computationally efficient implementation, using C interfacing, and an option for parallel computing, (vi) manual and documentation on how to set up a cluster. To make each feature as user-friendly as possible, only one subroutine per functionality is to be handled by the end-user. It is available as an R package, called MVR (‘Mean-Variance Regularization’), downloadable from CRAN. PMID:26819572
2014-01-01
Objectives The purpose of the present study was to identify the association between presenteeism and long working hours, shiftwork, and occupational stress using representative national survey data on Korean workers. Methods We analyzed data from the second Korean Working Conditions Survey (KWCS), which was conducted in 2010, in which a total of 6,220 wage workers were analyzed. The study population included the economically active population aged above 15 years, and living in the Republic of Korea. We used the chi-squared test and multivariate logistic regression to test the statistical association between presenteeism and working hours, shiftwork, and occupational stress. Results Approximately 19% of the workers experienced presenteeism during the previous 12 months. Women had higher rates of presenteeism than men. We found a statistically significant dose–response relationship between working hours and presenteeism. Shift workers had a slightly higher rate of presenteeism than non-shift workers, but the difference was not statistically significant. Occupational stress, such as high job demand, lack of rewards, and inadequate social support, had a significant association with presenteeism. Conclusions The present study suggests that long working hours and occupational stress are significantly related to presenteeism. PMID:24661575
Dangers in Using Analysis of Covariance Procedures.
ERIC Educational Resources Information Center
Campbell, Kathleen T.
Problems associated with the use of analysis of covariance (ANCOVA) as a statistical control technique are explained. Three problems relate to the use of "OVA" methods (analysis of variance, analysis of covariance, multivariate analysis of variance, and multivariate analysis of covariance) in general. These are: (1) the wasting of information when…
A Multivariate Solution of the Multivariate Ranking and Selection Problem
1980-02-01
Taneja (1972)), a ’a for a vector of constants c (Krishnaiah and Rizvi (1966)), the generalized variance (Gnanadesikan and Gupta (1970)), iegier (1976...Olkin, I. and Sobel, M. (1977). Selecting and Ordering Populations: A New Statistical Methodology, John Wiley & Sons, Inc., New York. Gnanadesikan
Evaluation of Meteorite Amino Acid Analysis Data Using Multivariate Techniques
NASA Technical Reports Server (NTRS)
McDonald, G.; Storrie-Lombardi, M.; Nealson, K.
1999-01-01
The amino acid distributions in the Murchison carbonaceous chondrite, Mars meteorite ALH84001, and ice from the Allan Hills region of Antarctica are shown, using a multivariate technique known as Principal Component Analysis (PCA), to be statistically distinct from the average amino acid composition of 101 terrestrial protein superfamilies.
Cross-validation of a dementia screening test in a heterogeneous population.
Ritchie, K A; Hallerman, E F
1989-09-01
Recognition of the increasing importance of early dementia screening for both research and clinical purposes has led to the development of numerous screening instruments. The most promising of these are based on neuropsychological measures which are able to focus on very specific cognitive functions. Of these tests the Iowa screening test is of particular interest to researchers and clinicians working with heterogeneous populations or wishing to make cross-cultural comparisons, as it is relatively culture-fair and does not assume literacy. A preliminary study of the performance of the Iowa in an Israeli sample of diverse ethnic origins and low education level suggests it to be a very sensitive measure even in such groups. The study also demonstrates the inadvisability of adopting item weights derived by multivariate statistical techniques from another population.
Hohn, M. Ed; Nuhfer, E.B.; Vinopal, R.J.; Klanderman, D.S.
1980-01-01
Classifying very fine-grained rocks through fabric elements provides information about depositional environments, but is subject to the biases of visual taxonomy. To evaluate the statistical significance of an empirical classification of very fine-grained rocks, samples from Devonian shales in four cored wells in West Virginia and Virginia were measured for 15 variables: quartz, illite, pyrite and expandable clays determined by X-ray diffraction; total sulfur, organic content, inorganic carbon, matrix density, bulk density, porosity, silt, as well as density, sonic travel time, resistivity, and γ-ray response measured from well logs. The four lithologic types comprised: (1) sharply banded shale, (2) thinly laminated shale, (3) lenticularly laminated shale, and (4) nonbanded shale. Univariate and multivariate analyses of variance showed that the lithologic classification reflects significant differences for the variables measured, differences that can be detected independently of stratigraphic effects. Little-known statistical methods found useful in this work included: the multivariate analysis of variance with more than one effect, simultaneous plotting of samples and variables on canonical variates, and the use of parametric ANOVA and MANOVA on ranked data. © 1980 Plenum Publishing Corporation.
Buttigieg, Pier Luigi; Ramette, Alban
2014-12-01
The application of multivariate statistical analyses has become a consistent feature in microbial ecology. However, many microbial ecologists are still in the process of developing a deep understanding of these methods and appreciating their limitations. As a consequence, staying abreast of progress and debate in this arena poses an additional challenge to many microbial ecologists. To address these issues, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME): a dynamic, web-based resource providing accessible descriptions of numerous multivariate techniques relevant to microbial ecologists. A combination of interactive elements allows users to discover and navigate between methods relevant to their needs and examine how they have been used by others in the field. We have designed GUSTA ME to become a community-led and -curated service, which we hope will provide a common reference and forum to discuss and disseminate analytical techniques relevant to the microbial ecology community. © 2014 The Authors. FEMS Microbiology Ecology published by John Wiley & Sons Ltd on behalf of Federation of European Microbiological Societies.
Ensembles of radial basis function networks for spectroscopic detection of cervical precancer
NASA Technical Reports Server (NTRS)
Tumer, K.; Ramanujam, N.; Ghosh, J.; Richards-Kortum, R.
1998-01-01
The mortality related to cervical cancer can be substantially reduced through early detection and treatment. However, current detection techniques, such as Pap smear and colposcopy, fail to achieve a concurrently high sensitivity and specificity. In vivo fluorescence spectroscopy is a technique which quickly, noninvasively and quantitatively probes the biochemical and morphological changes that occur in precancerous tissue. A multivariate statistical algorithm was used to extract clinically useful information from tissue spectra acquired from 361 cervical sites from 95 patients at 337-, 380-, and 460-nm excitation wavelengths. The multivariate statistical analysis was also employed to reduce the number of fluorescence excitation-emission wavelength pairs required to discriminate healthy tissue samples from precancerous tissue samples. The use of connectionist methods such as multilayered perceptrons, radial basis function (RBF) networks, and ensembles of such networks was investigated. RBF ensemble algorithms based on fluorescence spectra potentially provide automated and near real-time implementation of precancer detection in the hands of nonexperts. The results are more reliable, direct, and accurate than those achieved by either human experts or multivariate statistical algorithms.
Two-sample tests and one-way MANOVA for multivariate biomarker data with nondetects.
Thulin, M
2016-09-10
Testing whether the mean vector of a multivariate set of biomarkers differs between several populations is an increasingly common problem in medical research. Biomarker data is often left censored because some measurements fall below the laboratory's detection limit. We investigate how such censoring affects multivariate two-sample and one-way multivariate analysis of variance tests. Type I error rates, power and robustness to increasing censoring are studied, under both normality and non-normality. Parametric tests are found to perform better than non-parametric alternatives, indicating that the current recommendations for analysis of censored multivariate data may have to be revised. Copyright © 2016 John Wiley & Sons, Ltd.
Chen, Yong; Luo, Sheng; Chu, Haitao; Wei, Peng
2013-05-01
Multivariate meta-analysis is useful in combining evidence from independent studies which involve several comparisons among groups based on a single outcome. For binary outcomes, the commonly used statistical models for multivariate meta-analysis are multivariate generalized linear mixed effects models which assume risks, after some transformation, follow a multivariate normal distribution with possible correlations. In this article, we consider an alternative model for multivariate meta-analysis where the risks are modeled by the multivariate beta distribution proposed by Sarmanov (1966). This model has several attractive features compared to the conventional multivariate generalized linear mixed effects models, including simplicity of the likelihood function, no need to specify a link function, and a closed-form expression of distribution functions for study-specific risk differences. We investigate the finite sample performance of this model by simulation studies and illustrate its use with an application to multivariate meta-analysis of adverse events of tricyclic antidepressants treatment in clinical trials.
Blaine, Kevin P; Press, Christopher; Lau, Ken; Sliwa, Jan; Rao, Vidya K; Hill, Charles
2016-12-01
The aim of this study was to compare the effectiveness of epsilon-aminocaproic acid (εACA) and tranexamic acid (TXA) in contemporary clinical practice during a national medication shortage. A retrospective cohort study. The study was performed in all consecutive cardiac surgery patients (n=128) admitted to the cardiac-surgical intensive care unit after surgery at a single academic center immediately before and during a national medication shortage. Demographic, clinical, and outcomes data were compared by descriptive statistics using χ² and t tests. Surgical drainage and transfusions were compared by multivariate linear regression for patients receiving εACA before the shortage and TXA during the shortage. In multivariate analysis, no statistical difference was found for surgical drain output (OR 1.10, CI 0.97-1.26, P=.460) or red blood cell transfusion requirement (OR 1.79, CI 0.79-2.73, P=.176). Patients receiving εACA were more likely to receive rescue hemostatic medications (OR 1.62, CI 1.02-2.55, P=.041). Substitution of εACA with TXA during a national medication shortage produced equivalent postoperative bleeding and red cell transfusions, although patients receiving εACA were more likely to require supplemental hemostatic agents. Published by Elsevier Inc.
Li, Cen; Yang, Hongxia; Xiao, Yuancan; Zhandui; Sanglao; Wang, Zhang; Ladan, Duojie; Bi, Hongtao
2016-01-01
Zuotai (gTso thal) is one of the famous drugs containing mercury in Tibetan medicine. However, little is known about the chemical substance basis of its pharmacodynamics and the intrinsic link of different samples sources so far. Given this, energy dispersive spectrometry of X-ray (EDX), scanning electron microscopy (SEM), atomic force microscopy (AFM), and powder X-ray diffraction (XRD) were used to assay the elements, micromorphology, and phase composition of nine Zuotai samples from different regions, respectively; the XRD fingerprint features of Zuotai were analyzed by multivariate statistical analysis. EDX result shows that Zuotai contains Hg, S, O, Fe, Al, Cu, and other elements. SEM and AFM observations suggest that Zuotai is a kind of ancient nanodrug. Its particles are mainly in the range of 100–800 nm, which commonly further aggregate into 1–30 μm loosely amorphous particles. XRD test shows that β-HgS, S8, and α-HgS are its main phase compositions. XRD fingerprint analysis indicates that the similarity degrees of nine samples are very high, and the results of multivariate statistical analysis are broadly consistent with sample sources. The present research has revealed the physicochemical characteristics of Zuotai, and it would play a positive role in interpreting this mysterious Tibetan drug. PMID:27738409
Li, Cen; Yang, Hongxia; Du, Yuzhi; Xiao, Yuancan; Zhandui; Sanglao; Wang, Zhang; Ladan, Duojie; Bi, Hongtao; Wei, Lixin
2016-01-01
Zuotai (gTso thal) is one of the famous drugs containing mercury in Tibetan medicine. However, little is known about the chemical substance basis of its pharmacodynamics and the intrinsic link of different samples sources so far. Given this, energy dispersive spectrometry of X-ray (EDX), scanning electron microscopy (SEM), atomic force microscopy (AFM), and powder X-ray diffraction (XRD) were used to assay the elements, micromorphology, and phase composition of nine Zuotai samples from different regions, respectively; the XRD fingerprint features of Zuotai were analyzed by multivariate statistical analysis. EDX result shows that Zuotai contains Hg, S, O, Fe, Al, Cu, and other elements. SEM and AFM observations suggest that Zuotai is a kind of ancient nanodrug. Its particles are mainly in the range of 100-800 nm, which commonly further aggregate into 1-30 μm loosely amorphous particles. XRD test shows that β-HgS, S8, and α-HgS are its main phase compositions. XRD fingerprint analysis indicates that the similarity degrees of nine samples are very high, and the results of multivariate statistical analysis are broadly consistent with sample sources. The present research has revealed the physicochemical characteristics of Zuotai, and it would play a positive role in interpreting this mysterious Tibetan drug.
Chounlamany, Vanseng; Tanchuling, Maria Antonia; Inoue, Takanobu
2017-09-01
Payatas landfill in Quezon City, Philippines, releases leachate to the Marikina River through a creek. Multivariate statistical techniques were applied to study temporal and spatial variations in water quality of a segment of the Marikina River. The data set included 12 physico-chemical parameters for five monitoring stations over a year. Cluster analysis grouped the monitoring stations into four clusters and identified January-May as dry season and June-September as wet season. Principal components analysis showed that three latent factors are responsible for the data set explaining 83% of its total variance. The chemical oxygen demand, biochemical oxygen demand, total dissolved solids, Cl⁻ and PO₄³⁻ are influenced by anthropogenic impact/eutrophication pollution from point sources. Total suspended solids, turbidity and SO₄²⁻ are influenced by rain and soil erosion. The highest state of pollution is at the Payatas creek outfall from March to May, whereas at downstream stations it is in May. The current study indicates that the river monitoring requires only four stations, nine water quality parameters and testing over three specific months of the year. The findings of this study imply that Payatas landfill requires a proper leachate collection and treatment system to reduce its impact on the Marikina River.
Goode, C; LeRoy, J; Allen, D G
2007-01-01
This study reports on a multivariate analysis of the moving bed biofilm reactor (MBBR) wastewater treatment system at a Canadian pulp mill. The modelling approach involved a data overview by principal component analysis (PCA) followed by partial least squares (PLS) modelling with the objective of explaining and predicting changes in the BOD output of the reactor. Over two years of data with 87 process measurements were used to build the models. Variables were collected from the MBBR control scheme as well as upstream in the bleach plant and in digestion. To account for process dynamics, a variable lagging approach was used for variables with significant temporal correlations. It was found that wood type pulped at the mill was a significant variable governing reactor performance. Other important variables included flow parameters, faults in the temperature or pH control of the reactor, and some potential indirect indicators of biomass activity (residual nitrogen and pH out). The most predictive model was found to have an RMSEP value of 606 kgBOD/d, representing a 14.5% average error. This was a good fit, given the measurement error of the BOD test. Overall, the statistical approach was effective in describing and predicting MBBR treatment performance.
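Two ingredients named in the abstract above, variable lagging and the RMSEP figure of merit, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the toy numbers are assumptions, and the PCA/PLS modelling itself is not reproduced here.

```python
def lagged_pairs(predictor, response, lag):
    """Pair predictor[t - lag] with response[t], dropping rows without enough history."""
    if lag < 0 or lag >= len(predictor):
        raise ValueError("lag must satisfy 0 <= lag < len(predictor)")
    return list(zip(predictor[: len(predictor) - lag], response[lag:]))


def rmsep(observed, predicted):
    """Root mean squared error of prediction on held-out observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return (sum(squared) / len(squared)) ** 0.5


# Hypothetical upstream flow readings and downstream BOD loads, one value per day.
flow = [10, 20, 30, 40]
bod = [1, 2, 3, 4]
pairs = lagged_pairs(flow, bod, lag=1)   # flow from the previous day vs. BOD today
error = rmsep([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

If the reported 14.5% is the RMSEP divided by the mean observed BOD load (an assumption; the abstract does not define it), the implied mean load would be roughly 606 / 0.145 ≈ 4,180 kg BOD/d.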
Rollins, Derrick K; Teh, Ailing
2010-12-17
Microarray data sets provide relative expression levels for thousands of genes for a comparatively small number of different experimental conditions called assays. Data mining techniques are used to extract specific information of genes as they relate to the assays. The multivariate statistical technique of principal component analysis (PCA) has proven useful in providing effective data mining methods. This article extends the PCA approach of Rollins et al. to the ranking of genes in microarray data sets that are expressed most differently between two biologically different groupings of assays. This method is evaluated on real and simulated data and compared to a current approach on the basis of false discovery rate (FDR) and statistical power (SP), which is the ability to correctly identify important genes. This work developed and evaluated two new test statistics based on PCA and compared them to a popular method that is not PCA based. Both test statistics were found to be effective as evaluated in three case studies: (i) exposing E. coli cells to two different ethanol levels; (ii) application of myostatin to two groups of mice; and (iii) a simulated data study derived from the properties of (ii). The proposed method (PM) effectively identified critical genes in these studies based on comparison with the current method (CM). The simulation study supports higher identification accuracy for PM over CM for both proposed test statistics when the gene variance is constant and for one of the test statistics when the gene variance is non-constant. PM compares quite favorably to CM in terms of lower FDR and much higher SP. Thus, PM can be quite effective in producing accurate signatures from large microarray data sets for differential expression between assay groups identified in a preliminary step of the PCA procedure and is, therefore, recommended for use in these applications.
Kimmel, Lara A; Holland, Anne E; Edwards, Elton R; Cameron, Peter A; De Steiger, Richard; Page, Richard S; Gabbe, Belinda
2012-06-01
Accurate prediction of the likelihood of discharge to inpatient rehabilitation following lower limb fracture made on admission to hospital may assist patient discharge planning and decrease the burden on the hospital system caused by delays in decision making. To develop a prognostic model for discharge to inpatient rehabilitation. Isolated lower extremity fracture cases (excluding fractured neck of femur), captured by the Victorian Orthopaedic Trauma Outcomes Registry (VOTOR), were extracted for analysis. A training data set was created for model development and validation data set for evaluation. A multivariable logistic regression model was developed based on patient and injury characteristics. Models were assessed using measures of discrimination (C-statistic) and calibration (Hosmer-Lemeshow (H-L) statistic). A total of 1429 patients met the inclusion criteria and were randomly split into training and test data sets. Increasing age, more proximal fracture type, compensation or private fund source for the admission, metropolitan location of residence, not working prior to injury and having a self-reported pre-injury disability were included in the final prediction model. The C-statistic for the model was 0.92 (95% confidence interval (CI) 0.88, 0.95) with an H-L statistic of χ²=11.62, p=0.17. For the test data set, the C-statistic was 0.86 (95% CI 0.83, 0.90) with an H-L statistic of χ²=37.98, p<0.001. A model to predict discharge to inpatient rehabilitation following lower limb fracture was developed with excellent discrimination although the calibration was reduced in the test data set. This model requires prospective testing but could form an integral part of decision making in regards to discharge disposition to facilitate timely and accurate referral to rehabilitation and optimise resource allocation. Copyright © 2011 Elsevier Ltd. All rights reserved.
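The discrimination measure reported in the abstract above, the C-statistic, is the proportion of (event, non-event) pairs in which the event case received the higher predicted probability, with ties counted as one half. A minimal Python sketch follows; the function name and the four-patient example are hypothetical and are not taken from the registry data.

```python
def c_statistic(outcomes, predicted):
    """Concordance (area under the ROC curve) for binary outcomes coded 1/0."""
    events = [p for y, p in zip(outcomes, predicted) if y == 1]
    non_events = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(events) * len(non_events))


# Hypothetical predictions: two patients discharged to rehabilitation (1), two not (0).
auc = c_statistic([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # 3 of 4 pairs concordant -> 0.75
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The C-statistic says nothing about calibration, which is why the abstract reports the Hosmer-Lemeshow statistic separately.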
NASA Astrophysics Data System (ADS)
O'Shea, Bethany; Jankowski, Jerzy
2006-12-01
The major ion composition of Great Artesian Basin groundwater in the lower Namoi River valley is relatively homogeneous. Traditional graphical techniques have been combined with multivariate statistical methods to determine whether subtle differences in the chemical composition of these waters can be delineated. Hierarchical cluster analysis and principal components analysis were successful in delineating minor variations within the groundwaters of the study area that were not visually identified in the graphical techniques applied. Hydrochemical interpretation allowed geochemical processes to be identified in each statistically defined water type and illustrated how these groundwaters differ from one another. Three main geochemical processes were identified in the groundwaters: ion exchange, precipitation, and mixing between waters from different sources. Both statistical methods delineated an anomalous sample suspected of being influenced by magmatic CO2 input. The use of statistical methods to complement traditional graphical techniques for waters appearing homogeneous is emphasized for all investigations of this type.
Clustering Multivariate Time Series Using Hidden Markov Models
Ghassempour, Shima; Girosi, Federico; Maeder, Anthony
2014-01-01
In this paper we describe an algorithm for clustering multivariate time series with variables taking both categorical and continuous values. Time series of this type are frequent in health care, where they represent the health trajectories of individuals. The problem is challenging because categorical variables make it difficult to define a meaningful distance between trajectories. We propose an approach based on Hidden Markov Models (HMMs), where we first map each trajectory into an HMM, then define a suitable distance between HMMs and finally proceed to cluster the HMMs with a method based on a distance matrix. We test our approach on a simulated, but realistic, data set of 1,255 trajectories of individuals of age 45 and over, on a synthetic validation set with known clustering structure, and on a smaller set of 268 trajectories extracted from the longitudinal Health and Retirement Survey. The proposed method can be implemented quite simply using standard packages in R and Matlab and may be a good candidate for solving the difficult problem of clustering multivariate time series with categorical variables using tools that do not require advanced statistical knowledge, and therefore are accessible to a wide range of researchers. PMID:24662996
Chromatography methods and chemometrics for determination of milk fat adulterants
NASA Astrophysics Data System (ADS)
Trbović, D.; Petronijević, R.; Đorđević, V.
2017-09-01
Milk and milk-based products are among the leading food categories according to reported cases of food adulteration. Although many authentication problems exist in all areas of the food industry, adequate control methods are required to evaluate the authenticity of milk and milk products in the dairy industry. Moreover, gas chromatography (GC) analysis of triacylglycerols (TAGs) or fatty acid (FA) profiles of milk fat (MF) in combination with multivariate statistical data processing have been used to detect adulterations of milk and dairy products with foreign fats. The adulteration of milk and butter is a major issue for the dairy industry. The major adulterants of MF are vegetable oils (soybean, sunflower, groundnut, coconut, palm and peanut oil) and animal fat (cow tallow and pork lard). Multivariate analysis enables adulterated MF to be distinguished from authentic MF, while taking into account many analytical factors. Various multivariate analysis methods have been proposed to quantitatively detect levels of adulterant non-MFs, with multiple linear regression (MLR) seemingly the most suitable. There is a need for increased use of chemometric data analyses to detect adulterated MF in foods and for their expanded use in routine quality assurance testing.
Konukoglu, Ender; Coutu, Jean-Philippe; Salat, David H; Fischl, Bruce
2016-07-01
Diffusion magnetic resonance imaging (dMRI) is a unique technology that allows the noninvasive quantification of microstructural tissue properties of the human brain in healthy subjects as well as the probing of disease-induced variations. Population studies of dMRI data have been essential in identifying pathological structural changes in various conditions, such as Alzheimer's and Huntington's diseases (Salat et al., 2010; Rosas et al., 2006). The most common form of dMRI involves fitting a tensor to the underlying imaging data (known as diffusion tensor imaging, or DTI), then deriving parametric maps, each quantifying a different aspect of the underlying microstructure, e.g. fractional anisotropy and mean diffusivity. To date, the statistical methods utilized in most DTI population studies either analyzed only one such map or analyzed several of them, each in isolation. However, it is most likely that variations in the microstructure due to pathology or normal variability would affect several parameters simultaneously, with differing variations modulating the various parameters to differing degrees. Therefore, joint analysis of the available diffusion maps can be more powerful in characterizing histopathology and distinguishing between conditions than the widely used univariate analysis. In this article, we propose a multivariate approach for statistical analysis of diffusion parameters that uses partial least squares correlation (PLSC) analysis and permutation testing as building blocks in a voxel-wise fashion. Stemming from the common formulation, we present three different multivariate procedures for group analysis, regressing-out nuisance parameters and comparing effects of different conditions. We used the proposed procedures to study the effects of non-demented aging, Alzheimer's disease and mild cognitive impairment on the white matter. 
Here, we present results demonstrating that the proposed PLSC-based approach can differentiate between effects of different conditions in the same region as well as uncover spatial variations of effects across the white matter. The proposed procedures were able to answer questions on structural variations such as: "are there regions in the white matter where Alzheimer's disease has a different effect than aging or similar effect as aging?" and "are there regions in the white matter that are affected by both mild cognitive impairment and Alzheimer's disease but with differing multivariate effects?" Copyright © 2016 Elsevier Inc. All rights reserved.
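Permutation testing, named above as one building block of the approach, can be illustrated on a much simpler statistic. The sketch below is a generic two-group permutation test on a difference in means, not the PLSC-based voxel-wise procedure of the article; the function name, the default number of permutations, and the toy values are all assumptions made for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    exceed = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        stat = abs(sum(perm_a) / n_a - sum(perm_b) / len(perm_b))
        if stat >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (exceed + 1) / (n_permutations + 1)


# Hypothetical toy values for two clearly separated groups.
p_separated = permutation_p_value([1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16])
# Identical groups: every permuted statistic equals the observed one.
p_identical = permutation_p_value([1, 1, 1], [1, 1, 1])
```

The appeal of this design is that the null distribution is built from the data themselves by relabeling, so no distributional form has to be assumed for the test statistic.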
Konukoglu, Ender; Coutu, Jean-Philippe; Salat, David H.; Fischl, Bruce
2016-01-01
Diffusion magnetic resonance imaging (dMRI) is a unique technology that allows the noninvasive quantification of microstructural tissue properties of the human brain in healthy subjects as well as the probing of disease-induced variations. Population studies of dMRI data have been essential in identifying pathological structural changes in various conditions, such as Alzheimer’s and Huntington’s diseases1,2. The most common form of dMRI involves fitting a tensor to the underlying imaging data (known as Diffusion Tensor Imaging, or DTI), then deriving parametric maps, each quantifying a different aspect of the underlying microstructure, e.g. fractional anisotropy and mean diffusivity. To date, the statistical methods utilized in most DTI population studies either analyzed only one such map or analyzed several of them, each in isolation. However, it is most likely that variations in the microstructure due to pathology or normal variability would affect several parameters simultaneously, with differing variations modulating the various parameters to differing degrees. Therefore, joint analysis of the available diffusion maps can be more powerful in characterizing histopathology and distinguishing between conditions than the widely used univariate analysis. In this article, we propose a multivariate approach for statistical analysis of diffusion parameters that uses partial least squares correlation (PLSC) analysis and permutation testing as building blocks in a voxel-wise fashion. Stemming from the common formulation, we present three different multivariate procedures for group analysis, regressing-out nuisance parameters and comparing effects of different conditions. We used the proposed procedures to study the effects of non-demented aging, Alzheimer’s disease and mild cognitive impairment on the white matter. 
Here, we present results demonstrating that the proposed PLSC-based approach can differentiate between effects of different conditions in the same region as well as uncover spatial variations of effects across the white matter. The proposed procedures were able to answer questions on structural variations such as: “are there regions in the white matter where Alzheimer’s disease has a different effect than aging or similar effect as aging?” and “are there regions in the white matter that are affected by both mild cognitive impairment and Alzheimer’s disease but with differing multivariate effects?” PMID:27103138
Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes.
Harrington, Peter de Boves
2018-01-02
Validation of multivariate models is of current importance for a wide range of chemical applications. Although important, it is neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to the single validation set of data. In addition, no statistics are available regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes an efficient use of the data because each object is used once for validation. It was reviewed a decade earlier, but primarily for the optimization of chemometric models; this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported and powerful, matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.
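The partitioning idea described above can be sketched as follows: objects are split at random into validation partitions so that each object is used for validation exactly once, class proportions are kept roughly equal across partitions, and the whole split is repeated several times so that figures of merit can be averaged and given confidence intervals. This is a minimal illustration under those assumptions, not the author's implementation; the function names and parameters are hypothetical.

```python
import random
from collections import defaultdict


def latin_partitions(labels, n_partitions, rng):
    """Split object indices into validation folds with similar class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_partitions)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin across folds.
        for pos, idx in enumerate(members):
            folds[(offset + pos) % n_partitions].append(idx)
        offset += len(members)
    return folds


def bootstrapped_latin_partitions(labels, n_partitions, n_bootstraps, seed=0):
    """Repeat the random partitioning n_bootstraps times."""
    rng = random.Random(seed)
    return [latin_partitions(labels, n_partitions, rng) for _ in range(n_bootstraps)]


# Hypothetical example: 10 objects in two classes, 5 partitions, 3 repeats.
example_labels = ["a"] * 6 + ["b"] * 4
runs = bootstrapped_latin_partitions(example_labels, n_partitions=5, n_bootstraps=3)
```

Within each repeat, a model would be trained on all folds but one and evaluated on the held-out fold; pooling the held-out predictions gives one FOM per repeat, and the spread of FOMs across repeats gives the precision estimate that a single validation set cannot provide.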
Factors associated with abnormal eating attitudes among Greek adolescents.
Bilali, Aggeliki; Galanis, Petros; Velonakis, Emmanuel; Katostaras, Theofanis
2010-01-01
To estimate the prevalence of abnormal eating attitudes among Greek adolescents and identify possible risk factors associated with these attitudes. Cross-sectional, school-based study. Six randomly selected schools in Patras, southern Greece. The study population consisted of 540 Greek students aged 13-18 years, and the response rate was 97%. The dependent variable was scores on the Eating Attitudes Test-26, with scores ≥ 20 indicating abnormal eating attitudes. Bivariate analysis included independent Student t test, chi-square test, and Fisher's exact test. Multivariate logistic regression analysis was applied for the identification of the predictive factors, which were associated independently with abnormal eating attitudes. A 2-sided P value of less than .05 was considered statistically significant. The prevalence of abnormal eating attitudes was 16.7%. Multivariate logistic regression analysis demonstrated that females, urban residents, and those with a body mass index outside normal range, a perception of being overweight, body dissatisfaction, and a family member on a diet were independently related to abnormal eating attitudes. The results indicate that a proportion of Greek adolescents report abnormal eating attitudes and suggest that multiple factors contribute to the development of these attitudes. These findings are useful for further research into this topic and would be valuable in designing preventive interventions. Copyright 2010 Society for Nutrition Education. Published by Elsevier Inc. All rights reserved.
Hoenigl, Martin; Weibel, Nadir; Mehta, Sanjay R; Anderson, Christy M; Jenks, Jeffrey; Green, Nella; Gianella, Sara; Smith, Davey M; Little, Susan J
2015-08-01
Although men who have sex with men (MSM) represent a dominant risk group for human immunodeficiency virus (HIV), the risk of HIV infection within this population is not uniform. The objective of this study was to develop and validate a score to estimate incident HIV infection risk. Adult MSM who were tested for acute and early HIV (AEH) between 2008 and 2014 were retrospectively randomized 2:1 to a derivation and validation dataset, respectively. Using the derivation dataset, each predictor associated with an AEH outcome in the multivariate prediction model was assigned a point value that corresponded to its odds ratio. The score was validated on the validation dataset using C-statistics. Data collected at a single HIV testing encounter from 8326 unique MSM were analyzed, including 200 with AEH (2.4%). Four risk behavior variables were significantly associated with an AEH diagnosis (ie, incident infection) in multivariable analysis and were used to derive the San Diego Early Test (SDET) score: condomless receptive anal intercourse (CRAI) with an HIV-positive MSM (3 points), the combination of CRAI plus ≥5 male partners (3 points), ≥10 male partners (2 points), and diagnosis of bacterial sexually transmitted infection (2 points)-all as reported for the prior 12 months. The C-statistic for this risk score was >0.7 in both data sets. The SDET risk score may help to prioritize resources and target interventions, such as preexposure prophylaxis, to MSM at greatest risk of acquiring HIV infection. The SDET risk score is deployed as a freely available tool at http://sdet.ucsd.edu. © The Author 2015. Published by Oxford University Press on behalf of the Infectious Diseases Society of America. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
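The point values listed above can be turned into a small scoring function. The sketch below assumes that the four components are simply added together and that CRAI with an HIV-positive partner also counts as CRAI for the combination item; neither assumption is stated in the abstract, and the function and parameter names are hypothetical. It is not the tool deployed at the SDET website.

```python
def sdet_score(crai_with_hiv_positive, crai, male_partners, bacterial_sti):
    """Sum the four SDET point values; all inputs refer to the prior 12 months."""
    # Assumption: CRAI with an HIV-positive partner implies CRAI in general.
    any_crai = crai or crai_with_hiv_positive
    score = 0
    if crai_with_hiv_positive:
        score += 3  # CRAI with an HIV-positive MSM
    if any_crai and male_partners >= 5:
        score += 3  # combination of CRAI plus >=5 male partners
    if male_partners >= 10:
        score += 2  # >=10 male partners
    if bacterial_sti:
        score += 2  # diagnosis of bacterial sexually transmitted infection
    return score


# Hypothetical encounters.
highest = sdet_score(True, True, 12, True)
lowest = sdet_score(False, False, 3, False)
combination_only = sdet_score(False, True, 6, False)
```

Under the additive assumption the score ranges from 0 to 10; the abstract does not state the cut-offs used to define risk categories, so none are encoded here.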
NASA Astrophysics Data System (ADS)
Theodorakou, Chrysoula; Farquharson, Michael J.
2009-08-01
The motivation behind this study is to assess whether angular dispersive x-ray diffraction (ADXRD) data, processed using multivariate analysis techniques, can be used for classifying secondary colorectal liver cancer tissue and normal surrounding liver tissue in human liver biopsy samples. The ADXRD profiles from a total of 60 samples of normal liver tissue and colorectal liver metastases were measured using a synchrotron radiation source. The data were analysed for 56 samples using nonlinear peak-fitting software. Four peaks were fitted to all of the ADXRD profiles, and the amplitude, area, amplitude and area ratios for three of the four peaks were calculated and used for the statistical and multivariate analysis. The statistical analysis showed that there are significant differences between all the peak-fitting parameters and ratios between the normal and the diseased tissue groups. The technique of soft independent modelling of class analogy (SIMCA) was used to classify normal liver tissue and colorectal liver metastases resulting in 67% of the normal tissue samples and 60% of the secondary colorectal liver tissue samples being classified correctly. This study has shown that the ADXRD data of normal and secondary colorectal liver cancer are statistically different and x-ray diffraction data analysed using multivariate analysis have the potential to be used as a method of tissue classification.
Al-Aziz, Jameel; Christou, Nicolas; Dinov, Ivo D.
2011-01-01
The amount, complexity and provenance of data have dramatically increased in the past five years. Visualization of observed and simulated data is a critical component of any social, environmental, biomedical or scientific quest. Dynamic, exploratory and interactive visualization of multivariate data, without preprocessing by dimensionality reduction, remains a nearly insurmountable challenge. The Statistics Online Computational Resource (www.SOCR.ucla.edu) provides portable online aids for probability and statistics education, technology-based instruction and statistical computing. We have developed a new Java-based infrastructure, SOCR Motion Charts, for discovery-based exploratory analysis of multivariate data. This interactive data visualization tool enables the visualization of high-dimensional longitudinal data. SOCR Motion Charts allows mapping of ordinal, nominal and quantitative variables onto time, 2D axes, size, colors, glyphs and appearance characteristics, which facilitates the interactive display of multidimensional data. We validated this new visualization paradigm using several publicly available multivariate datasets including Ice-Thickness, Housing Prices, Consumer Price Index, and California Ozone Data. SOCR Motion Charts is designed using object-oriented programming, implemented as a Java Web-applet and is available to the entire community on the web at www.socr.ucla.edu/SOCR_MotionCharts. It can be used as an instructional tool for rendering and interrogating high-dimensional data in the classroom, as well as a research tool for exploratory data analysis. PMID:21479108
A Descriptive Study of Individual and Cross-Cultural Differences in Statistics Anxiety
ERIC Educational Resources Information Center
Baloglu, Mustafa; Deniz, M. Engin; Kesici, Sahin
2011-01-01
The present study investigated individual and cross-cultural differences in statistics anxiety among 223 Turkish and 237 American college students. A 2 x 2 between-subjects factorial multivariate analysis of covariance (MANCOVA) was performed on the six dependent variables which are the six subscales of the Statistical Anxiety Rating Scale.…
Daily Mean Temperature and Urolithiasis Presentation in Six Cities in Korea: Time-Series Analysis.
Chi, Byung Hoon; Chang, In Ho; Choi, Se Young; Suh, Dong Churl; Chang, Chong Won; Choi, Yun Jung; Lee, Seo Yeon
2017-06-01
Seasonal variation in urinary stone presentation is well described in the literature. However, previous studies have some limitations. To explore overall cumulative exposure-response and the heterogeneity in the relationships between daily meteorological factors and urolithiasis incidence in 6 major Korean cities, we analyzed data on 687,833 urolithiasis patients from 2009 to 2013 for 6 large cities in Korea: Seoul, Incheon, Daejeon, Gwangju, Daegu, and Busan. Using a time-series design and distributed lag nonlinear methods, we estimated the relative risk (RR) of mean daily urolithiasis incidence (MDUI) associated with mean daily meteorological factors, including the cumulative RR for a 20-day period. The estimated location-specific associations were then pooled using multivariate meta-regression models. A positive association was confirmed between MDUI and mean daily temperature (MDT), and a negative association was shown between MDUI and mean daily relative humidity (MDRH) in all cities. The lag effect was within 5 days. The multivariate Cochran Q test for heterogeneity at MDT was 12.35 (P = 0.136), and the related I² statistic accounted for 35.2% of the variability. Additionally, the Cochran Q test for heterogeneity and I² statistic at MDRH were 26.73 (P = 0.148) and 24.7% of variability in the total group. An association was confirmed between daily temperature, relative humidity and urolithiasis incidence, and the differences in urolithiasis incidence might have been partially attributable to the different frequencies and the ranges in temperature and humidity between cities in Korea. © 2017 The Korean Academy of Medical Sciences.
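The I² statistic quoted above is conventionally derived from Cochran's Q and its degrees of freedom as I² = max(0, (Q - df) / Q). The abstract does not report the degrees of freedom, so the sketch below uses df = 8 purely as an assumption, chosen because it reproduces the reported 35.2% for temperature; the function name is hypothetical.

```python
def i_squared(q, df):
    """Share of total variability attributed to heterogeneity rather than chance."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)


# Q = 12.35 is reported for mean daily temperature; df = 8 is an assumed value.
i2_temperature = i_squared(12.35, 8)
print(f"{100 * i2_temperature:.1f}%")
```

When Q is no larger than its degrees of freedom, the statistic is truncated at zero, meaning the observed spread between cities is no greater than chance alone would produce.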
Hayes, Don; Kopp, Benjamin T; Kirkby, Stephen E; Reynolds, Susan D; Mansour, Heidi M; Tobias, Joseph D; Tumin, Dmitry
2016-08-01
Donor PaO2 levels are used for assessing organs for lung transplantation (LTx), but survival implications of PaO2 levels in adult cystic fibrosis (CF) patients receiving LTx are unclear. UNOS registry data spanning 2005-2013 were used to test for associations of donor PaO2 with patient survival and bronchiolitis obliterans syndrome (BOS) in adult (age ≥ 18 years) first-time LTx recipients diagnosed with CF. The analysis included 1587 patients, of whom 1420 had complete data for multivariable Cox models. No statistically significant differences among donor PaO2 categories of ≤200, 201-300, 301-400, or >400 mmHg were found in univariate survival analysis (log-rank test p = 0.290). BOS onset did not significantly differ across donor PaO2 categories (Chi-square p = 0.480). Multivariable Cox models of patient survival supported the lack of difference across donor PaO2 categories. Interaction analysis found a modest difference in survival between the two top categories of donor PaO2 when examining patients with body mass index (BMI) in the lowest decile (≤16.5 kg/m²). Donor PaO2 was not associated with survival or BOS onset in adult CF patients undergoing LTx. Notwithstanding statistically significant interactions between donor PaO2 and BMI, there was no evidence of post-LTx survival risk associated with donor PaO2 below conventional thresholds in any subgroup of adults with CF.
Characteristics of Inpatient Units Associated With Sustained Hand Hygiene Compliance.
Wolfe, Jonathan D; Domenico, Henry J; Hickson, Gerald B; Wang, Deede; Dubree, Marilyn; Feistritzer, Nancye; Wells, Nancy; Talbot, Thomas R
2018-04-20
Following institution of a hand hygiene (HH) program at an academic medical center, HH compliance increased from 58% to 92% for 3 years. Some inpatient units modeled early, sustained increases, and others exhibited protracted improvement rates. We examined the association between patterns of HH compliance improvement and unit characteristics. Adult inpatient units (N = 35) were categorized into the following three tiers based on their pattern of HH compliance: early adopters, nonsustained and late adopters, and laggards. Unit-based culture measures were collected, including nursing practice environment scores (National Database of Nursing Quality Indicators [NDNQI]), patient rated quality and teamwork (Hospital Consumer Assessment of Healthcare Provider and Systems), patient complaint rates, case mix index, staff turnover rates, and patient volume. Associations between variables and the binary outcome of laggard (n = 18) versus nonlaggard (n = 17) were tested using a Mann-Whitney U test. Multivariate analysis was performed using an ordinal regression model. In direct comparison, laggard units had clinically relevant differences in NDNQI scores, Hospital Consumer Assessment of Healthcare Provider and Systems scores, case mix index, patient complaints, patient volume, and staff turnover. The results were not statistically significant. In the multivariate model, the predictor variables explained a significant proportion of the variability associated with laggard status, (R = 0.35, P = 0.0481) and identified NDNQI scores and patient complaints as statistically significant. Uptake of an HH program was associated with factors related to a unit's safety culture. In particular, NDNQI scores and patient complaint rates might be used to assist in identifying units that may require additional attention during implementation of an HH quality improvement program.
EEMD-based multiscale ICA method for slewing bearing fault detection and diagnosis
NASA Astrophysics Data System (ADS)
Žvokelj, Matej; Zupan, Samo; Prebil, Ivan
2016-05-01
A novel multivariate and multiscale statistical process monitoring method is proposed with the aim of detecting incipient failures in large slewing bearings, where subjective influence plays a minor role. The proposed method integrates the strengths of the Independent Component Analysis (ICA) multivariate monitoring approach with the benefits of Ensemble Empirical Mode Decomposition (EEMD), which adaptively decomposes signals into different time scales and can thus cope with multiscale system dynamics. The method, which was named EEMD-based multiscale ICA (EEMD-MSICA), not only enables bearing fault detection but also offers a mechanism of multivariate signal denoising and, in combination with the Envelope Analysis (EA), a diagnostic tool. The multiscale nature of the proposed approach makes the method convenient to cope with data which emanate from bearings in complex real-world rotating machinery and frequently represent the cumulative effect of many underlying phenomena occupying different regions in the time-frequency plane. The efficiency of the proposed method was tested on simulated as well as real vibration and Acoustic Emission (AE) signals obtained through conducting an accelerated run-to-failure lifetime experiment on a purpose-built laboratory slewing bearing test stand. The ability to detect and locate the early-stage rolling-sliding contact fatigue failure of the bearing indicates that AE and vibration signals carry sufficient information on the bearing condition and that the developed EEMD-MSICA method is able to effectively extract it, thereby representing a reliable bearing fault detection and diagnosis strategy.
NASA Astrophysics Data System (ADS)
Bressan, Lucas P.; do Nascimento, Paulo Cícero; Schmidt, Marcella E. P.; Faccin, Henrique; de Machado, Leandro Carvalho; Bohrer, Denise
2017-02-01
A novel method was developed to determine low molecular weight polycyclic aromatic hydrocarbons in aqueous leachates from soils and sediments using a salting-out assisted liquid-liquid extraction, synchronous fluorescence spectrometry and a multivariate calibration technique. Several experimental parameters were controlled and the optimum conditions were: sodium carbonate as the salting-out agent at concentration of 2 mol L⁻¹, 3 mL of acetonitrile as extraction solvent, 6 mL of aqueous leachate, vortexing for 5 min and centrifuging at 4000 rpm for 5 min. The partial least squares calibration was optimized to the lowest values of root mean squared error and five latent variables were chosen for each of the targeted compounds. The regression coefficients for the true versus predicted concentrations were higher than 0.99. Figures of merit for the multivariate method were calculated, namely sensitivity, multivariate detection limit and multivariate quantification limit. The selectivity was also evaluated and other polycyclic aromatic hydrocarbons did not interfere in the analysis. Likewise, high performance liquid chromatography was used as a comparative methodology, and the regression analysis between the methods showed no statistical difference (t-test). The proposed methodology was applied to soils and sediments of a Brazilian river and the recoveries ranged from 74.3% to 105.8%. Overall, the proposed methodology was suitable for the targeted compounds, showing that the extraction method can be applied to spectrofluorometric analysis and that the multivariate calibration is also suitable for these compounds in leachates from real samples.
Richard D. Wood-Smith; John M. Buffington
1996-01-01
Multivariate statistical analyses of geomorphic variables from 23 forest stream reaches in southeast Alaska result in successful discrimination between pristine streams and those disturbed by land management, specifically timber harvesting and associated road building. Results of discriminant function analysis indicate that a three-variable model discriminates 10...
Parametric Cost Models for Space Telescopes
NASA Technical Reports Server (NTRS)
Stahl, H. Philip
2010-01-01
A study is in process to develop a multivariable parametric cost model for space telescopes. Cost and engineering parametric data have been collected on 30 different space telescopes. Statistical correlations have been developed among 19 of the 59 variables sampled. Single Variable and Multi-Variable Cost Estimating Relationships have been developed. Results are being published.
Facilitating the Transition from Bright to Dim Environments
2016-03-04
For the parametric data, a multivariate ANOVA was used in determining the systematic presence of any statistically significant performance differences...performed. All significance levels were p < 0.05, and statistical analyses were performed with the Statistical Package for Social Sciences ( SPSS ...1950. Age changes in rate and level of visual dark adaptation. Journal of Applied Physiology, 2, 407–411. Field, A. 2009. Discovering statistics
Statistical Model of Dynamic Markers of the Alzheimer's Pathological Cascade.
Balsis, Steve; Geraci, Lisa; Benge, Jared; Lowe, Deborah A; Choudhury, Tabina K; Tirso, Robert; Doody, Rachelle S
2018-05-05
Alzheimer's disease (AD) is a progressive disease reflected in markers across assessment modalities, including neuroimaging, cognitive testing, and evaluation of adaptive function. Identifying a single continuum of decline across assessment modalities in a single sample is statistically challenging because of the multivariate nature of the data. To address this challenge, we implemented advanced statistical analyses designed specifically to model complex data across a single continuum. We analyzed data from the Alzheimer's Disease Neuroimaging Initiative (ADNI; N = 1,056), focusing on indicators from the assessments of magnetic resonance imaging (MRI) volume, fluorodeoxyglucose positron emission tomography (FDG-PET) metabolic activity, cognitive performance, and adaptive function. Item response theory was used to identify the continuum of decline. Then, through a process of statistical scaling, indicators across all modalities were linked to that continuum and analyzed. Findings revealed that measures of MRI volume, FDG-PET metabolic activity, and adaptive function added measurement precision beyond that provided by cognitive measures, particularly in the relatively mild range of disease severity. More specifically, MRI volume, and FDG-PET metabolic activity become compromised in the very mild range of severity, followed by cognitive performance and finally adaptive function. Our statistically derived models of the AD pathological cascade are consistent with existing theoretical models.
Mapping Quantitative Traits in Unselected Families: Algorithms and Examples
Dupuis, Josée; Shi, Jianxin; Manning, Alisa K.; Benjamin, Emelia J.; Meigs, James B.; Cupples, L. Adrienne; Siegmund, David
2009-01-01
Linkage analysis has been widely used to identify, from family data, genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic, which, in contrast to the likelihood ratio statistic, can use nonparametric estimators of variability to achieve robustness of the false positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity-by-descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study. PMID:19278016
Improved estimation of PM2.5 using Lagrangian satellite-measured aerosol optical depth
NASA Astrophysics Data System (ADS)
Olivas Saunders, Rolando
Suspended particulate matter (aerosols) with aerodynamic diameters less than 2.5 μm (PM2.5) has negative effects on human health, plays an important role in climate change and also causes the corrosion of structures by acid deposition. Accurate estimates of PM2.5 concentrations are thus relevant in air quality, epidemiology, cloud microphysics and climate forcing studies. Aerosol optical depth (AOD) retrieved by the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite instrument has been used as an empirical predictor to estimate ground-level concentrations of PM2.5. These estimates usually have large uncertainties and errors. The main objective of this work is to assess the value of using upwind (Lagrangian) MODIS-AOD as predictors in empirical models of PM2.5. The upwind locations of the Lagrangian AOD were estimated using modeled backward air trajectories. Since the specification of an arrival elevation is somewhat arbitrary, trajectories were calculated to arrive at four different elevations at ten measurement sites within the continental United States. A systematic examination revealed trajectory model calculations to be sensitive to starting elevation. With a 500 m difference in starting elevation, the 48-hr mean horizontal separation of trajectory endpoints was 326 km. When the difference in starting elevation was doubled and tripled to 1000 m and 1500 m, the mean horizontal separation of trajectory endpoints approximately doubled and tripled to 627 km and 886 km, respectively. A seasonal dependence of this sensitivity was also found: the smallest mean horizontal separation of trajectory endpoints was exhibited during the summer and the largest separations during the winter. A daily average AOD product was generated and coupled to the trajectory model in order to determine AOD values upwind of the measurement sites during the period 2003-2007.
Empirical models that included in situ AOD and upwind AOD as predictors of PM2.5 were generated by multivariate linear regressions using the least squares method. The multivariate models showed improved performance over the single variable regression (PM2.5 and in situ AOD) models. The statistical significance of the improvement of the multivariate models over the single variable regression models was tested using the extra sum of squares principle. In many cases, even when the R-squared was high for the multivariate models, the improvement over the single models was not statistically significant. The R-squared of these multivariate models varied with respect to seasons, with the best performance occurring during the summer months. A set of seasonal categorical variables was included in the regressions to exploit this variability. The multivariate regression models that included these categorical seasonal variables performed better than the models that didn't account for seasonal variability. Furthermore, 71% of these regressions exhibited improvement over the single variable models that was statistically significant at a 95% confidence level.
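The extra sum of squares comparison between the single variable and multivariate regressions can be sketched as a partial F statistic for nested least-squares models. The function name, the residual sums of squares and the sample size below are illustrative assumptions, not values from the study.

```python
def extra_ss_f(rss_reduced, rss_full, n_obs, p_reduced, p_full):
    """Partial F statistic for two nested least-squares models.

    p_reduced and p_full count fitted coefficients, intercept included.
    """
    extra_df = p_full - p_reduced   # number of predictors added to the reduced model
    resid_df = n_obs - p_full       # residual degrees of freedom of the full model
    if extra_df <= 0 or resid_df <= 0:
        raise ValueError("full model must add predictors and leave residual df")
    # Reduction in residual SS per added predictor, relative to full-model noise
    return ((rss_reduced - rss_full) / extra_df) / (rss_full / resid_df)


# Hypothetical numbers: in situ AOD only (2 coefficients) versus
# in situ AOD plus one upwind AOD predictor (3 coefficients), 53 days of data.
f_stat = extra_ss_f(rss_reduced=120.0, rss_full=100.0,
                    n_obs=53, p_reduced=2, p_full=3)
print(f_stat)
```

The statistic would then be compared against an F distribution with (extra_df, resid_df) degrees of freedom; a high R-squared for the larger model does not by itself make this statistic large.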
Glass-Kaastra, Shiona K.; Pearl, David L.; Reid-Smith, Richard J.; McEwen, Beverly; Slavic, Durda; McEwen, Scott A.; Fairles, Jim
2014-01-01
Antimicrobial susceptibility data on Escherichia coli F4, Pasteurella multocida, and Streptococcus suis isolates from Ontario swine (January 1998 to October 2010) were acquired from a comprehensive diagnostic veterinary laboratory in Ontario, Canada. In relation to the possible development of a surveillance system for antimicrobial resistance, data were assessed for ease of management, completeness, consistency, and applicability for temporal and spatial statistical analyses. Limited farm location data precluded spatial analyses and missing demographic data limited their use as predictors within multivariable statistical models. Changes in the standard panel of antimicrobials used for susceptibility testing reduced the number of antimicrobials available for temporal analyses. Data consistency and quality could improve over time in this and similar diagnostic laboratory settings by encouraging complete reporting with sample submission and by modifying database systems to limit free-text data entry. These changes could make more statistical methods available for disease surveillance and cluster detection. PMID:24688133
NASA Technical Reports Server (NTRS)
Szuch, J. R.; Soeder, J. F.; Seldner, K.; Cwynar, D. S.
1977-01-01
The design, evaluation, and testing of a practical, multivariable, linear quadratic regulator control for the F100 turbofan engine were accomplished. NASA evaluation of the multivariable control logic and implementation are covered. The evaluation utilized a real time, hybrid computer simulation of the engine. Results of the evaluation are presented, and recommendations concerning future engine testing of the control are made. Results indicated that the engine testing of the control should be conducted as planned.
Lu, Tsui-Shan; Longnecker, Matthew P.; Zhou, Haibo
2016-01-01
Outcome-dependent sampling (ODS) is a cost-effective sampling scheme in which one observes the exposure with a probability that depends on the outcome. Well-known designs of this kind are the case-control design for binary responses, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been done for the univariate response case, statistical inference and design for ODS with multivariate cases remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (Multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the Multivariate-ODS design is semiparametric: all the underlying distributions of covariates are modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and establish its asymptotic normality. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the Multivariate-ODS or the estimator from a simple random sample with the same sample size. The Multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of PCB exposure with hearing loss in children born in the Collaborative Perinatal Study. PMID:27966260
Human immunotoxicologic markers of chemical exposures: preliminary validation studies.
Wartenberg, D; Laskin, D; Kipen, H
1993-01-01
The circulating cells of the immune system are sensitive to environmental contaminants, and effects are often manifested as changes in the cell surface differentiation antigens of affected populations of cells, particularly lymphocytes. In this investigation, we explore the likelihood that variation in the expression of the surface markers of immune cells can be used as an index of exposure to toxic chemicals. We recruited 38 healthy New Jersey men to study pesticide effects: 19 orchard farmers (high exposure); 13 berry farmers (low exposure); and 6 hardware store owners (no exposure). Immunophenotyping was performed by assaying the following cell surface antigens: CD2, CD4, CD8, CD14, CD20, CD26, CD29, CD45R, CD56, and PMN. Data were analyzed using univariate and multivariate methods. There were no significant differences among the groups with respect to routine medical histories, physical examinations, or routine laboratory parameters. No striking differences between groups were seen in univariate tests. Multivariate tests suggested some differences among groups and limited ability to correctly classify individuals based on immunophenotyping results. Immunophenotyping represents a fruitful area of research for improved exposure classification. Work is needed both on mechanistic understanding of the patterns observed and on the statistical interpretation of these patterns.
Alamilla, Francisco; Calcerrada, Matías; García-Ruiz, Carmen; Torre, Mercedes
2013-05-10
The differentiation of blue ballpoint pen inks written on documents through an LA-ICP-MS methodology is proposed. Small portions of common office paper containing ink strokes from 21 blue pens of known origin were cut and measured without any sample preparation. In a first step, Mg, Ca and Sr were proposed as internal standards (ISs) and used to normalize elemental intensities and subtract background signals from the paper. Then, specific criteria were designed and employed to identify target elements (Li, V, Mn, Co, Ni, Cu, Zn, Zr, Sn, W and Pb), which proved independent of the IS chosen in 98% of the cases and allowed a qualitative clustering of the samples. In a second step, an elemental-related ratio (ink ratio) based on the targets previously identified was used to obtain mass-independent intensities and perform pairwise comparisons by means of multivariate statistical analyses (MANOVA, Tukey's HSD and Hotelling's T2). This treatment improved the discrimination power (DP) and provided objective results, achieving a complete differentiation among different brands and a partial differentiation among pen inks from the same brand. The designed data treatment, together with the use of multivariate statistical tools, represents an easy and useful tool for differentiating among blue ballpoint pen inks, with hardly any sample destruction and without the need for methodological calibrations, which makes its use potentially advantageous from a forensic-practice standpoint. To test the procedure, it was applied to real handwritten questioned contracts previously studied by the Department of Forensic Document Exams of the Criminalistics Service of Civil Guard (Spain). The results showed that all questioned ink entries were clustered in the same group and differed from the remaining ink on the document. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
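As a rough illustration of the pairwise-comparison bookkeeping, the sketch below counts the pairs that 21 pens generate and computes a discrimination power as the fraction of pairs told apart. This definition of DP is a common forensic convention assumed here (the abstract does not spell out its formula), and the count of undiscriminated pairs is hypothetical.

```python
from math import comb


def discriminating_power(n_samples, n_undiscriminated_pairs):
    """Fraction of all sample pairs that the method can tell apart."""
    total_pairs = comb(n_samples, 2)
    if not 0 <= n_undiscriminated_pairs <= total_pairs:
        raise ValueError("undiscriminated pairs must lie between 0 and total pairs")
    return (total_pairs - n_undiscriminated_pairs) / total_pairs


total = comb(21, 2)                  # pairwise comparisons among 21 pens
dp = discriminating_power(21, 6)     # 6 undiscriminated pairs is a made-up figure
print(total, round(dp, 4))
```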
Boente, C; Matanzas, N; García-González, N; Rodríguez-Valdés, E; Gallego, J R
2017-09-01
The urban and peri-urban soils used for agriculture could be contaminated by atmospheric deposition or industrial releases, thus raising concerns about the potential risk to public health. Here we propose a method to evaluate potential soil pollution based on multivariate statistics, geostatistics (kriging), a novel soil pollution index, and bioavailability assessments. This approach was tested in two districts of a highly populated and industrialized city (Gijón, Spain). The soils showed anomalous content of several trace elements, such as As and Pb (up to 80 and 585 mg kg⁻¹, respectively). In addition, factor analyses associated these elements with anthropogenic activity, whereas other elements were attributed to natural sources. Subsequent clustering also facilitated the differentiation between the northern area studied (only limited Pb pollution found) and the southern area (pattern of coal combustion, including simultaneous anomalies of trace elements and benzo(a)pyrene). A normalized soil pollution index (SPI) was calculated by kriging, using only the elements falling above threshold levels; point-source polluted zones in the northern area and diffuse contamination in the south were thereby identified. In addition, in the six mapping units with the highest SPIs of the fifty studied, we observed low bioavailability for most of the elements that surpassed the threshold levels. However, some anomalies of Pb contents and the pollution fingerprint in the central area of the southern grid call for further site-specific studies. On the whole, the combination of a multivariate (geo)statistical approach and a bioavailability assessment allowed us to efficiently identify sources of contamination and potential risks. Copyright © 2017 Elsevier Ltd. All rights reserved.
Li, Jinling; He, Ming; Han, Wei; Gu, Yifan
2009-05-30
An investigation of the sources of heavy metals (Cu, Zn, Ni, Pb, Cr, and Cd) in the coastal soils of Shanghai, China, was conducted using multivariate statistical methods (principal component analysis, clustering analysis, and correlation analysis). All the results of the multivariate analysis showed that: (i) Cu, Ni, Pb, and Cd had anthropogenic sources (e.g., overuse of chemical fertilizers and pesticides, industrial and municipal discharges, animal wastes, sewage irrigation, etc.); (ii) Zn and Cr were associated with parent materials and therefore had natural sources (e.g., the weathering process of parent materials and subsequent pedogenesis due to the alluvial deposits). The heavy metal content of the soils was strongly influenced by soil formation, atmospheric deposition, and human activities. These findings provide essential information on the possible sources of heavy metals, which would contribute to the monitoring and assessment of agricultural soils worldwide.
Analysis/forecast experiments with a multivariate statistical analysis scheme using FGGE data
NASA Technical Reports Server (NTRS)
Baker, W. E.; Bloom, S. C.; Nestler, M. S.
1985-01-01
A three-dimensional, multivariate, statistical analysis method, optimal interpolation (OI) is described for modeling meteorological data from widely dispersed sites. The model was developed to analyze FGGE data at the NASA-Goddard Laboratory of Atmospherics. The model features a multivariate surface analysis over the oceans, including maintenance of the Ekman balance and a geographically dependent correlation function. Preliminary comparisons are made between the OI model and similar schemes employed at the European Center for Medium Range Weather Forecasts and the National Meteorological Center. The OI scheme is used to provide input to a GCM, and model error correlations are calculated for forecasts of 500 mb vertical water mixing ratios and the wind profiles. Comparisons are made between the predictions and measured data. The model is shown to be as accurate as a successive corrections model out to 4.5 days.
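The core idea of optimal interpolation can be illustrated with a single variable and a single observation: the analysis is the background value corrected by a gain that weighs background error against observation error. This is a minimal textbook sketch, not the three-dimensional multivariate scheme described above; all names and numbers are illustrative assumptions.

```python
def oi_update(background, observation, var_background, var_observation):
    """Scalar optimal-interpolation analysis for one observed variable."""
    # Gain: share of the innovation (observation minus background) to trust
    gain = var_background / (var_background + var_observation)
    analysis = background + gain * (observation - background)
    # Analysis error variance is never larger than the background variance
    var_analysis = (1.0 - gain) * var_background
    return analysis, var_analysis


# Hypothetical values: background 10.0 with error variance 4.0,
# observation 12.0 with error variance 1.0.
analysis, var_analysis = oi_update(10.0, 12.0, 4.0, 1.0)
print(analysis, var_analysis)
```

In a multivariate scheme the scalar variances become error covariance matrices, which is where a geographically dependent correlation function enters.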
A Randomized Study of Incentivizing HIV Testing for Parolees in Community Aftercare.
Saxena, Preeta; Hall, Elizabeth A; Prendergast, Michael
2016-04-01
HIV risk-behaviors are high in criminal justice populations and more efforts are necessary to address them among criminal justice-involved substance abusers. This study examines the role of incentives in promoting HIV testing among parolees. Participants were randomly assigned to either an incentive (n = 104) or education group (control; n = 98), where the incentive group received a voucher for testing for HIV. Bivariate comparisons showed that a larger proportion of those in the incentive group received HIV testing (59% versus 47%), but this was not statistically significant (p = .09). However, in a multivariate logistic regression model controlling for covariates likely to influence HIV-testing behavior, those in the incentive group had increased odds of HIV testing in comparison to those in the education group (OR = 1.99, p < .05, CI [1.05, 3.78]). As a first of its kind, this study provides a foundation for further research on the utility of incentives in promoting HIV testing and other healthy behaviors in criminal justice populations.
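The reported adjusted odds ratio and its confidence interval can be checked for internal consistency, assuming the interval is a 95% Wald interval that is symmetric on the log scale (an assumption; the abstract does not state how the interval was computed). The helper names are hypothetical.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR) implied by a Wald confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def two_sided_p(z_value):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z_value) / math.sqrt(2.0))


or_point, lower, upper = 1.99, 1.05, 3.78    # values reported in the abstract
midpoint = math.sqrt(lower * upper)          # geometric mean of the CI limits
se_log_or = se_from_ci(lower, upper)
z_value = math.log(or_point) / se_log_or
p_value = two_sided_p(z_value)
print(round(midpoint, 3), round(se_log_or, 3), round(p_value, 3))
```

Under that assumption the geometric mean of the limits falls very close to the reported point estimate, and the implied p-value lies below .05, in line with the abstract.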
Predicting trauma patient mortality: ICD [or ICD-10-AM] versus AIS based approaches.
Willis, Cameron D; Gabbe, Belinda J; Jolley, Damien; Harrison, James E; Cameron, Peter A
2010-11-01
The International Classification of Diseases Injury Severity Score (ICISS) has been proposed as an International Classification of Diseases (ICD)-10-based alternative to mortality prediction tools that use Abbreviated Injury Scale (AIS) data, including the Trauma and Injury Severity Score (TRISS). To date, studies have not examined the performance of ICISS using Australian trauma registry data. This study aimed to compare the performance of ICISS with other mortality prediction tools in an Australian trauma registry. This was a retrospective review of prospectively collected data from the Victorian State Trauma Registry. A training dataset was created for model development and a validation dataset for evaluation. The multiplicative ICISS model was compared with a worst injury ICISS approach, Victorian TRISS (V-TRISS, using local coefficients), maximum AIS severity and a multivariable model including ICD-10-AM codes as predictors. Models were investigated for discrimination (C-statistic) and calibration (Hosmer-Lemeshow statistic). The multivariable approach had the highest level of discrimination (C-statistic 0.90) and calibration (H-L 7.65, P = 0.468). Worst injury ICISS, V-TRISS and maximum AIS had similar performance. The multiplicative ICISS produced the lowest level of discrimination (C-statistic 0.80) and poorest calibration (H-L 50.23, P < 0.001). The performance of ICISS may be affected by the data used to develop estimates, the ICD version employed, the methods for deriving estimates and the inclusion of covariates. In this analysis, a multivariable approach using ICD-10-AM codes was the best-performing method. A multivariable ICISS approach may therefore be a useful alternative to AIS-based methods and may have comparable predictive performance to locally derived TRISS models. © 2010 The Authors. ANZ Journal of Surgery © 2010 Royal Australasian College of Surgeons.
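Discrimination as measured by the C-statistic can be sketched as the share of (death, survivor) pairs in which the patient who died was assigned the higher predicted risk, with ties counted as half. The predicted risks and outcomes below are invented for illustration and are unrelated to the registry data.

```python
def c_statistic(predicted_risk, died):
    """Concordance (C) statistic for a binary outcome via pairwise comparison."""
    concordant = 0.0
    pairs = 0
    for risk_i, died_i in zip(predicted_risk, died):
        if not died_i:
            continue                      # outer loop walks over deaths only
        for risk_j, died_j in zip(predicted_risk, died):
            if died_j:
                continue                  # inner loop walks over survivors only
            pairs += 1
            if risk_i > risk_j:
                concordant += 1.0
            elif risk_i == risk_j:
                concordant += 0.5         # ties count as half-concordant
    if pairs == 0:
        raise ValueError("need at least one death and one survivor")
    return concordant / pairs


# Hypothetical predicted mortality risks and observed outcomes (1 = died)
risks = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]
outcomes = [1, 1, 1, 0, 0, 0]
c_value = c_statistic(risks, outcomes)
print(round(c_value, 3))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the reported 0.80 and 0.90.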
Latent structure of the Wisconsin Card Sorting Test: a confirmatory factor analytic study.
Greve, Kevin W; Stickle, Timothy R; Love, Jeffrey M; Bianchini, Kevin J; Stanford, Matthew S
2005-05-01
The present study represents the first large scale confirmatory factor analysis of the Wisconsin Card Sorting Test (WCST). The results generally support the three factor solutions reported in the exploratory factor analysis literature. However, only the first factor, which reflects general executive functioning, is statistically sound. The secondary factors, while likely reflecting meaningful cognitive abilities, are less stable except when all subjects complete all 128 cards. It is likely that having two discontinuation rules for the WCST has contributed to the varied factor analytic solutions reported in the literature and early discontinuation may result in some loss of useful information. Continued multivariate research will be necessary to better clarify the processes underlying WCST performance and their relationships to one another.
Blind, P-J; Eriksson, S.
1991-01-01
The probability that routine hematological laboratory tests of liver and pancreatic function can discriminate between malignant and benign pancreatic tumours, incidentally detected during operation, was investigated. The records of 53 patients with a verified diagnosis of pancreatic carcinoma and 19 patients with chronic pancreatitis were reviewed with regard to preoperative total bilirubin, direct reacting bilirubin, alkaline phosphatase, glutamyltranspeptidase, aminotransferases, lactic dehydrogenase and amylase. Multivariate and discriminant analyses were performed to calculate the predictive value for cancer, using the SYSTAT statistical package on a Macintosh II computer. Total and direct reacting bilirubin and glutamyltranspeptidase were significantly higher in patients with pancreatic carcinoma. However, only considerably increased levels of direct reacting bilirubin were predictive of pancreatic carcinoma. PMID:1931781
A model of the human observer and decision maker
NASA Technical Reports Server (NTRS)
Wewerinke, P. H.
1981-01-01
The decision process is described in terms of classical sequential decision theory by testing the hypothesis that an abnormal condition has occurred by means of a generalized likelihood ratio test. For this, a sufficient statistic is provided by the innovation sequence, which is the result of the perception and information processing submodel of the human observer. On the basis of only two model parameters, the model predicts the decision speed/accuracy trade-off and various attentional characteristics. A preliminary test of the model for single variable failure detection tasks resulted in a very good fit of the experimental data. In a formal validation program, a variety of multivariable failure detection tasks was investigated and the predictive capability of the model was demonstrated.
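The speed/accuracy trade-off of a sequential decision rule can be illustrated with Wald's classical sequential probability ratio test applied to a stream of innovations. This is a simplified stand-in, assuming Gaussian innovations with known standard deviation and a known mean shift under failure; it is not the generalized likelihood ratio test of the model itself, and all parameter values are hypothetical.

```python
import math


def sprt(innovations, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald SPRT: H0 mean 0 (normal) versus H1 mean mu1 (failure)."""
    upper = math.log((1.0 - beta) / alpha)    # crossing it accepts H1
    lower = math.log(beta / (1.0 - alpha))    # crossing it accepts H0
    llr = 0.0
    for step, x in enumerate(innovations, start=1):
        # Log-likelihood ratio increment for one Gaussian sample
        llr += (mu1 * x - 0.5 * mu1 ** 2) / sigma ** 2
        if llr >= upper:
            return "failure", step
        if llr <= lower:
            return "normal", step
    return "undecided", len(innovations)


# Hypothetical innovation sequences with mu1 = 1 and sigma = 1
print(sprt([1.0] * 10, mu1=1.0, sigma=1.0))   # evidence accumulates toward failure
print(sprt([0.0] * 10, mu1=1.0, sigma=1.0))   # evidence accumulates toward normal
```

Tightening alpha and beta moves both thresholds outward, so decisions become more accurate but take more samples, which is the trade-off the abstract refers to.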
Orthostatic function during a stand test before and after head-up or head-down bedrest
NASA Technical Reports Server (NTRS)
Lathers, Claire M.; Diamandis, Peter H.; Riddle, Jeanne M.; Mukai, Chiaki; Elton, Kay F.; Bungo, Michael W.; Charles, John B.
1991-01-01
The effects of head-down or head-up bedrest at -5, +10, +20, or +42 deg (simulating 0, 1/6, 1/3, and 2/3 g, respectively) for 6 hrs on four different days on orthostatic tolerance were investigated by measuring relevant physiological reactions to an orthostatic test taken before and after the bedrest sessions. Multivariate analysis of variance indicated that there was no angle effect on any of the cardiovascular parameters monitored during the last 3 min of the stand test, suggesting that partial gravity loads would have no effect on the cardiovascular deconditioning exhibited postflight. There was, however, a significant elevation in the heart rate post-bedrest, and the heart rate increased on standing. Results from the stand test pre- and post-bedrest at -5 deg (but not at +10, +20, and +42 deg) were similar to those observed after space flight.
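The pairing of head-up tilt angles with fractional g levels follows from simple geometry if one assumes that the head-to-foot gravity component scales as the sine of the tilt angle. That assumption is ours; the abstract only lists the pairs. The short check below compares the sine of each head-up angle with the stated fraction.

```python
import math

# Head-up angles (degrees) and the g fractions quoted in the abstract
pairs = [(10, 1 / 6), (20, 1 / 3), (42, 2 / 3)]

# Head-to-foot component of gravity, in units of g, for each tilt angle
components = {angle: math.sin(math.radians(angle)) for angle, _ in pairs}

for angle, target in pairs:
    print(angle, round(components[angle], 3), round(target, 3))
```

The -5 deg head-down position does not follow this rule; it is paired with 0 g by convention as a weightlessness analog.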
Differentiating clinical groups using the serial color-word test (S-CWT).
Hentschel, Uwe; Rubino, I Alex; Bijleveld, Catrien
2011-04-01
The present study attempted to differentiate 11 diagnostic groups by means of the Serial Color-Word Test (S-CWT), using multivariate discriminant analysis. Two alternative scoring systems of the S-CWT were outlined. The sample comprised 514 individuals who had clinical diagnoses of various types and 397 controls who had no diagnostic findings. The first discriminant analysis failed to differentiate the groups adequately. The groups were consequently reduced to four (schizophrenia, bipolar disorders, temporo-mandibular joint pain dysfunction syndrome, and eating disturbances), which gave better reclassification findings for a clinical application of the test. This classification gave over 55% correct assignments. The final four groups had a statistically significant discrimination on the test, which also remained stable in a bootstrap procedure. Implications for treatment indications and outcomes as well as strategies for further studies using the S-CWT are discussed.
Forcino, Frank L; Leighton, Lindsey R; Twerdy, Pamela; Cahill, James F
2015-01-01
Community ecologists commonly perform multivariate techniques (e.g., ordination, cluster analysis) to assess patterns and gradients of taxonomic variation. A critical requirement for a meaningful statistical analysis is accurate information on the taxa found within an ecological sample. However, oversampling (too many individuals counted per sample) also comes at a cost, particularly for ecological systems in which identification and quantification is substantially more resource consuming than the field expedition itself. In such systems, an increasingly larger sample size will eventually result in diminishing returns in improving any pattern or gradient revealed by the data, but will also lead to continually increasing costs. Here, we examine 396 datasets: 44 previously published and 352 created datasets. Using meta-analytic and simulation-based approaches, the research within the present paper seeks (1) to determine minimal sample sizes required to produce robust multivariate statistical results when conducting abundance-based, community ecology research. Furthermore, we seek (2) to determine the dataset parameters (i.e., evenness, number of taxa, number of samples) that require larger sample sizes, regardless of resource availability. We found that in the 44 previously published and the 220 created datasets with randomly chosen abundances, a conservative estimate of a sample size of 58 produced the same multivariate results as all larger sample sizes. However, this minimal number varies as a function of evenness, where increased evenness resulted in increased minimal sample sizes. Sample sizes as small as 58 individuals are sufficient for a broad range of multivariate abundance-based research. In cases when resource availability is the limiting factor for conducting a project (e.g., small university, time to conduct the research project), statistically viable results can still be obtained with less of an investment.
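Evenness, the dataset parameter that drives the minimal sample size, can be quantified in several ways. The sketch below uses Pielou's J (Shannon entropy divided by its maximum for the observed number of taxa), which is an assumed choice since the abstract does not name its measure; the abundance counts are invented.

```python
import math


def pielou_evenness(abundances):
    """Pielou's J: Shannon entropy divided by log(number of taxa present)."""
    counts = [c for c in abundances if c > 0]    # ignore absent taxa
    if len(counts) < 2:
        raise ValueError("evenness needs at least two taxa with nonzero counts")
    total = sum(counts)
    shannon = -sum((c / total) * math.log(c / total) for c in counts)
    return shannon / math.log(len(counts))


even_sample = [25, 25, 25, 25]     # hypothetical, perfectly even community
skewed_sample = [97, 1, 1, 1]      # hypothetical, one dominant taxon
j_even = pielou_evenness(even_sample)
j_skewed = pielou_evenness(skewed_sample)
print(round(j_even, 3), round(j_skewed, 3))
```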
Monitoring of an antigen manufacturing process.
Zavatti, Vanessa; Budman, Hector; Legge, Raymond; Tamer, Melih
2016-06-01
Fluorescence spectroscopy in combination with multivariate statistical methods was employed as a tool for monitoring the manufacturing process of pertactin (PRN), one of the virulence factors of Bordetella pertussis utilized in whooping cough vaccines. Fluorophores such as amino acids and co-enzymes were detected throughout the process. The fluorescence data collected at different stages of the fermentation and purification process were treated employing principal component analysis (PCA). Through PCA, it was feasible to identify sources of variability in PRN production. Then, partial least squares (PLS) was employed to correlate the fluorescence spectra obtained from pure PRN samples and the final protein content measured by a Kjeldahl test from these samples. Given that a statistically significant correlation was found between fluorescence and PRN levels, this approach could be further used as a method to predict the final protein content.
Attitudes toward Advanced and Multivariate Statistics When Using Computers.
ERIC Educational Resources Information Center
Kennedy, Robert L.; McCallister, Corliss Jean
This study investigated the attitudes toward statistics of graduate students who studied advanced statistics in a course in which the focus of instruction was the use of a computer program in class. The use of the program made it possible to provide an individualized, self-paced, student-centered, and activity-based course. The three sections…
ERIC Educational Resources Information Center
Williams, Amanda S.
2015-01-01
Statistics anxiety is a common problem for graduate students. This study explores the multivariate relationship between a set of worry-related variables and six types of statistics anxiety. Canonical correlation analysis indicates a significant relationship between the two sets of variables. Findings suggest that students who are more intolerant…
Mayer, Brian P.; DeHope, Alan J.; Mew, Daniel A.; ...
2016-03-24
Attribution of the origin of an illicit drug relies on identification of compounds indicative of its clandestine production and is a key component of many modern forensic investigations. Here, the results of these studies can yield detailed information on method of manufacture, starting material source, and final product, all critical forensic evidence. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic fentanyl, N-(1-phenylethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods, all previously published fentanyl synthetic routes or hybrid versions thereof, were studied in an effort to identify and classify route-specific signatures. A total of 160 distinct compounds and inorganic species were identified using gas and liquid chromatographies combined with mass spectrometric methods (gas chromatography/mass spectrometry (GC/MS) and liquid chromatography–tandem mass spectrometry-time-of-flight (LC–MS/MS-TOF)) in conjunction with inductively coupled plasma mass spectrometry (ICPMS). The complexity of the resultant data matrix urged the use of multivariate statistical analysis. Using partial least-squares-discriminant analysis (PLS-DA), 87 route-specific CAS were classified and a statistical model capable of predicting the method of fentanyl synthesis was validated and tested against CAS profiles from crude fentanyl products deposited and later extracted from two operationally relevant surfaces: stainless steel and vinyl tile. Finally, this work provides the most detailed fentanyl CAS investigation to date by using orthogonal mass spectral data to identify CAS of forensic significance for illicit drug detection, profiling, and attribution.
Fruit and vegetable intake and risk of breast cancer by hormone receptor status.
Jung, Seungyoun; Spiegelman, Donna; Baglietto, Laura; Bernstein, Leslie; Boggs, Deborah A; van den Brandt, Piet A; Buring, Julie E; Cerhan, James R; Gaudet, Mia M; Giles, Graham G; Goodman, Gary; Hakansson, Niclas; Hankinson, Susan E; Helzlsouer, Kathy; Horn-Ross, Pamela L; Inoue, Manami; Krogh, Vittorio; Lof, Marie; McCullough, Marjorie L; Miller, Anthony B; Neuhouser, Marian L; Palmer, Julie R; Park, Yikyung; Robien, Kim; Rohan, Thomas E; Scarmo, Stephanie; Schairer, Catherine; Schouten, Leo J; Shikany, James M; Sieri, Sabina; Tsugane, Schoichiro; Visvanathan, Kala; Weiderpass, Elisabete; Willett, Walter C; Wolk, Alicja; Zeleniuch-Jacquotte, Anne; Zhang, Shumin M; Zhang, Xuehong; Ziegler, Regina G; Smith-Warner, Stephanie A
2013-02-06
Estrogen receptor-negative (ER(-)) breast cancer has few known or modifiable risk factors. Because ER(-) tumors account for only 15% to 20% of breast cancers, large pooled analyses are necessary to evaluate precisely the suspected inverse association between fruit and vegetable intake and risk of ER(-) breast cancer. Among 993 466 women followed for 11 to 20 years in 20 cohort studies, we documented 19 869 estrogen receptor positive (ER(+)) and 4821 ER(-) breast cancers. We calculated study-specific multivariable relative risks (RRs) and 95% confidence intervals (CIs) using Cox proportional hazards regression analyses and then combined them using a random-effects model. All statistical tests were two-sided. Total fruit and vegetable intake was statistically significantly inversely associated with risk of ER(-) breast cancer but not with risk of breast cancer overall or of ER(+) tumors. The inverse association for ER(-) tumors was observed primarily for vegetable consumption. The pooled relative risks comparing the highest vs lowest quintile of total vegetable consumption were 0.82 (95% CI = 0.74 to 0.90) for ER(-) breast cancer and 1.04 (95% CI = 0.97 to 1.11) for ER(+) breast cancer (P (common-effects) by ER status < .001). Total fruit consumption was non-statistically significantly associated with risk of ER(-) breast cancer (pooled multivariable RR comparing the highest vs lowest quintile = 0.94, 95% CI = 0.85 to 1.04). We observed no association between total fruit and vegetable intake and risk of overall breast cancer. However, vegetable consumption was inversely associated with risk of ER(-) breast cancer in our large pooled analyses.
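Combining study-specific relative risks with a random-effects model can be sketched with the DerSimonian-Laird estimator. The abstract does not name the estimator used, so this choice is an assumption, and the three study inputs are invented for illustration.

```python
import math


def dersimonian_laird(log_rr, se):
    """Random-effects pooled RR, 95% CI limits, and between-study variance."""
    w = [1.0 / s ** 2 for s in se]                       # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))  # Cochran's Q
    df = len(log_rr) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-study variance
    w_star = [1.0 / (s ** 2 + tau2) for s in se]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_rr)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se_pooled),
            math.exp(pooled + 1.96 * se_pooled),
            tau2)


# Hypothetical study-specific relative risks and standard errors of log(RR)
study_rr = [0.75, 0.85, 0.90]
study_se = [0.10, 0.08, 0.12]
rr, ci_low, ci_high, tau2 = dersimonian_laird(
    [math.log(r) for r in study_rr], study_se)
print(round(rr, 3), round(ci_low, 3), round(ci_high, 3), round(tau2, 4))
```

When the studies agree closely, the between-study variance is estimated as zero and the result coincides with the fixed-effect inverse-variance pooled estimate.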
Li Destri, Giovanni; Rubino, Antonio Salvatore; Latino, Rosalia; Giannone, Fabio; Lanteri, Raffaele; Scilletta, Beniamino; Di Cataldo, Antonio
2015-01-01
To evaluate whether, in a sample of patients radically treated for colorectal carcinoma, the preoperative determination of the carcinoembryonic antigen (p-CEA) may have a prognostic value and constitute an independent risk factor in relation to disease-free survival. The preoperative CEA seems to be related both to the staging of colorectal neoplasia and to the patient's prognosis, although to date this has not been conclusively demonstrated and is still a matter of intense debate in the scientific community. This is a retrospective analysis of prospectively collected data. A total of 395 patients were radically treated for colorectal carcinoma. The preoperative CEA was statistically compared with the 2010 American Joint Committee on Cancer (AJCC) staging, the T and N parameters, and grading. All parameters recorded in our database were tested for an association with disease-free survival (DFS). Only factors significantly associated (P < 0.05) with DFS were used to build multivariate stepwise forward logistic regression models to establish the independent predictors of DFS. A statistically significant relationship was found between p-CEA and tumor staging (P < 0.001), T (P < 0.001) and N parameters (P = 0.006). In a multivariate analysis, the independent prognostic factors found were: p-CEA, stages N1 and N2 according to AJCC, and G3 grading. A statistically significant difference (P < 0.001) was evident between the DFS of patients with normal and high p-CEA levels. Preoperative CEA thus allows a preoperative selection of those patients who are likely to have more advanced staging. PMID:25875542
Li Destri, Giovanni; Rubino, Antonio Salvatore; Latino, Rosalia; Giannone, Fabio; Lanteri, Raffaele; Scilletta, Beniamino; Di Cataldo, Antonio
2015-04-01
To evaluate whether, in a sample of patients radically treated for colorectal carcinoma, the preoperative determination of the carcinoembryonic antigen (p-CEA) may have a prognostic value and constitute an independent risk factor in relation to disease-free survival. The preoperative CEA seems to be related both to the staging of colorectal neoplasia and to the patient's prognosis, although this-to date-has not been conclusively demonstrated and is still a matter of intense debate in the scientific community. This is a retrospective analysis of prospectively collected data. A total of 395 patients were radically treated for colorectal carcinoma. The preoperative CEA was statistically compared with the 2010 American Joint Committee on Cancer (AJCC) staging, the T and N parameters, and grading. All parameters recorded in our database were tested for an association with disease-free survival (DFS). Only factors significantly associated (P < 0.05) with the DFS were used to build multivariate stepwise forward logistic regression models to establish their independent predictors. A statistically significant relationship was found between p-CEA and tumor staging (P < 0.001), T (P < 0.001) and N parameters (P = 0.006). In a multivariate analysis, the independent prognostic factors found were: p-CEA, stages N1 and N2 according to AJCC, and G3 grading (grade). A statistically significant difference (P < 0.001) was evident between the DFS of patients with normal and high p-CEA levels. Preoperative CEA makes a pre-operative selection possible of those patients for whom it is likely to be able to predict a more advanced staging.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mayer, Brian P.; DeHope, Alan J.; Mew, Daniel A.
Attribution of the origin of an illicit drug relies on identification of compounds indicative of its clandestine production and is a key component of many modern forensic investigations. Here, the results of these studies can yield detailed information on method of manufacture, starting material source, and final product, all critical forensic evidence. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic fentanyl, N-(1-phenylethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods, all previously published fentanyl synthetic routes or hybrid versions thereof, were studied in an effort to identify and classify route-specific signatures. A total of 160 distinct compounds and inorganic species were identified using gas and liquid chromatographies combined with mass spectrometric methods (gas chromatography/mass spectrometry (GC/MS) and liquid chromatography–tandem mass spectrometry-time-of-flight (LC–MS/MS-TOF)) in conjunction with inductively coupled plasma mass spectrometry (ICPMS). The complexity of the resultant data matrix called for the use of multivariate statistical analysis. Using partial least-squares-discriminant analysis (PLS-DA), 87 route-specific CAS were classified and a statistical model capable of predicting the method of fentanyl synthesis was validated and tested against CAS profiles from crude fentanyl products deposited and later extracted from two operationally relevant surfaces: stainless steel and vinyl tile. Finally, this work provides the most detailed fentanyl CAS investigation to date by using orthogonal mass spectral data to identify CAS of forensic significance for illicit drug detection, profiling, and attribution.
Test-retest stability of the Task and Ego Orientation Questionnaire.
Lane, Andrew M; Nevill, Alan M; Bowes, Neal; Fox, Kenneth R
2005-09-01
Establishing stability, defined as observing minimal measurement error in a test-retest assessment, is vital to validating psychometric tools. Correlational methods, such as Pearson product-moment, intraclass, and kappa are tests of association or consistency, whereas stability or reproducibility (regarded here as synonymous) assesses the agreement between test-retest scores. Indexes of reproducibility using the Task and Ego Orientation in Sport Questionnaire (TEOSQ; Duda & Nicholls, 1992) were investigated using correlational (Pearson product-moment, intraclass, and kappa) methods, repeated measures multivariate analysis of variance, and calculating the proportion of agreement within a referent value of +/-1 as suggested by Nevill, Lane, Kilgour, Bowes, and Whyte (2001). Two hundred thirteen soccer players completed the TEOSQ on two occasions, 1 week apart. Correlation analyses indicated a stronger test-retest correlation for the Ego subscale than the Task subscale. Multivariate analysis of variance indicated stability for ego items but with significant increases in four task items. The proportion of test-retest agreement scores indicated that all ego items reported relatively poor stability statistics with test-retest scores within a range of +/-1, ranging from 82.7-86.9%. By contrast, all task items showed test-retest difference scores ranging from 92.5-99%, although further analysis indicated that four task subscale items increased significantly. Findings illustrated that correlational methods (Pearson product-moment, intraclass, and kappa) are influenced by the range in scores, and calculating the proportion of agreement of test-retest differences with a referent value of +/-1 could provide additional insight into the stability of the questionnaire. It is suggested that the item-by-item proportion of agreement method proposed by Nevill et al. (2001) should be used to supplement existing methods and could be especially helpful in identifying rogue items in the initial stages of psychometric questionnaire validation.
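The item-by-item proportion of agreement within a referent value of +/-1 is simple to compute, and a short sketch may make the idea concrete. The function name and the eight paired responses below are hypothetical; they are not TEOSQ data, and the 5-point scale is assumed for illustration only.

```python
def proportion_within(test, retest, tolerance=1):
    """Share of paired scores whose test-retest difference lies within +/-tolerance."""
    if len(test) != len(retest) or not test:
        raise ValueError("need two non-empty score lists of equal length")
    hits = sum(1 for a, b in zip(test, retest) if abs(a - b) <= tolerance)
    return hits / len(test)

# Hypothetical responses to one questionnaire item on two occasions
week1 = [4, 5, 3, 2, 4, 1, 5, 3]
week2 = [4, 4, 5, 2, 3, 4, 5, 3]

agreement = proportion_within(week1, week2)
print(f"{agreement:.1%} of respondents scored within +/-1 on retest")
```

Unlike a correlation coefficient, this proportion does not depend on how widely the scores are spread across respondents, which is the property the abstract highlights.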
Reflectance of vegetation, soil, and water
NASA Technical Reports Server (NTRS)
Wiegand, C. L. (Principal Investigator)
1973-01-01
There are no author-identified significant results in this report. This report deals with the selection of the best channels from the 24-channel aircraft data to represent crop and soil conditions. A three-step procedure has been developed that involves using univariate statistics and an F-ratio test to indicate the best 14 channels. From the 14, the 10 best channels are selected by a multivariate stochastic process. The third step involves the pattern recognition procedures developed in the data analysis plan. Indications are that the procedures in use are satisfactory and will extract the desired information from the data.
Statistical methods and neural network approaches for classification of data from multiple sources
NASA Technical Reports Server (NTRS)
Benediktsson, Jon Atli; Swain, Philip H.
1990-01-01
Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A problem with using conventional multivariate statistical approaches for classification of data of multiple types is in general that a multivariate distribution cannot be assumed for the classes in the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable. This means that the data sources need to be weighted according to their reliability but most statistical classification methods do not have a mechanism for this. This research focuses on statistical methods which can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Secondly, this research focuses on neural network models. The neural networks are distribution free since no prior knowledge of the statistical distribution of the data is needed. This is an obvious advantage over most statistical classification methods. The neural networks also automatically take care of the problem involving how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.
Hypothesis testing for differentially correlated features.
Sheng, Elisa; Witten, Daniela; Zhou, Xiao-Hua
2016-10-01
In a multivariate setting, we consider the task of identifying features whose correlations with the other features differ across conditions. Such correlation shifts may occur independently of mean shifts, or differences in the means of the individual features across conditions. Previous approaches for detecting correlation shifts consider features simultaneously, by computing a correlation-based test statistic for each feature. However, since correlations involve two features, such approaches do not lend themselves to identifying which feature is the culprit. In this article, we instead consider a serial testing approach, by comparing columns of the sample correlation matrix across two conditions, and removing one feature at a time. Our method provides a novel perspective and favorable empirical results compared with competing approaches. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Interfaces between statistical analysis packages and the ESRI geographic information system
NASA Technical Reports Server (NTRS)
Masuoka, E.
1980-01-01
Interfaces between ESRI's geographic information system (GIS) data files and real valued data files written to facilitate statistical analysis and display of spatially referenced multivariable data are described. An example of data analysis which utilized the GIS and the statistical analysis system is presented to illustrate the utility of combining the analytic capability of a statistical package with the data management and display features of the GIS.
A generalized association test based on U statistics.
Wei, Changshuai; Lu, Qing
2017-07-01
Second generation sequencing technologies are being increasingly used for genetic association studies, where the main research interest is to identify sets of genetic variants that contribute to various phenotypes. The phenotype can be univariate disease status, multivariate responses and even high-dimensional outcomes. Considering the genotype and phenotype as two complex objects, this also poses a general statistical problem of testing association between complex objects. We here proposed a similarity-based test, generalized similarity U (GSU), that can test the association between complex objects. We first studied the theoretical properties of the test in a general setting and then focused on the application of the test to sequencing association studies. Based on theoretical analysis, we proposed to use Laplacian Kernel-based similarity for GSU to boost power and enhance robustness. Through simulation, we found that GSU did have advantages over existing methods in terms of power and robustness. We further performed a whole genome sequencing (WGS) scan for Alzheimer's Disease Neuroimaging Initiative data, identifying three genes, APOE, APOC1 and TOMM40, associated with imaging phenotype. We developed a C++ package for analysis of WGS data using GSU. The source codes can be downloaded at https://github.com/changshuaiwei/gsu. Contact: weichangshuai@gmail.com; qlu@epi.msu.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Denis Valle; Benjamin Baiser; Christopher W. Woodall; Robin Chazdon; Jerome Chave
2014-01-01
We propose a novel multivariate method to analyse biodiversity data based on the Latent Dirichlet Allocation (LDA) model. LDA, a probabilistic model, reduces assemblages to sets of distinct component communities. It produces easily interpretable results, can represent abrupt and gradual changes in composition, accommodates missing data and allows for coherent estimates...
Johnson, Karen A.
2013-01-01
Background and Aims Convergent floral traits hypothesized as attracting particular pollinators are known as pollination syndromes. Floral diversity suggests that the Australian epacrid flora may be adapted to pollinator type. Currently there are empirical data on the pollination systems for 87 species (approx. 15 % of Australian epacrids). This provides an opportunity to test for pollination syndromes and their important morphological traits in an iconic element of the Australian flora. Methods Data on epacrid–pollinator relationships were obtained from published literature and field observation. A multivariate approach was used to test whether epacrid floral attributes related to pollinator profiles. Statistical classification was then used to rank floral attributes according to their predictive value. Data sets excluding mixed pollination systems were used to test the predictive power of statistical classification to identify pollination models. Key Results Floral attributes are correlated with bird, fly and bee pollination. Using floral attributes identified as correlating with pollinator type, bird pollination is classified with 86 % accuracy, red flowers being the most important predictor. Fly and bee pollination are classified with 78 and 69 % accuracy, but have a lack of individually important floral predictors. Excluding mixed pollination systems improved the accuracy of the prediction of both bee and fly pollination systems. Conclusions Although most epacrids have generalized pollination systems, a correlation between bird pollination and red, long-tubed epacrids is found. Statistical classification highlights the relative importance of each floral attribute in relation to pollinator type and proves useful in classifying epacrids to bird, fly and bee pollination systems. PMID:23681546
Applied statistics in agricultural, biological, and environmental sciences.
USDA-ARS's Scientific Manuscript database
Agronomic research often involves measurement and collection of multiple response variables in an effort to understand the more complex nature of the system being studied. Multivariate statistical methods encompass the simultaneous analysis of all random variables measured on each experimental or s...
Selvarasu, Suresh; Kim, Do Yun; Karimi, Iftekhar A; Lee, Dong-Yup
2010-10-01
We present an integrated framework for characterizing fed-batch cultures of mouse hybridoma cells producing monoclonal antibody (mAb). This framework systematically combines data preprocessing, elemental balancing and statistical analysis technique. Initially, specific rates of cell growth, glucose/amino acid consumptions and mAb/metabolite productions were calculated via curve fitting using logistic equations, with subsequent elemental balancing of the preprocessed data indicating the presence of experimental measurement errors. Multivariate statistical analysis was then employed to understand physiological characteristics of the cellular system. The results from principal component analysis (PCA) revealed three major clusters of amino acids with similar trends in their consumption profiles: (i) arginine, threonine and serine, (ii) glycine, tyrosine, phenylalanine, methionine, histidine and asparagine, and (iii) lysine, valine and isoleucine. Further analysis using partial least square (PLS) regression identified key amino acids which were positively or negatively correlated with the cell growth, mAb production and the generation of lactate and ammonia. Based on these results, the optimal concentrations of key amino acids in the feed medium can be inferred, potentially leading to an increase in cell viability and productivity, as well as a decrease in toxic waste production. The study demonstrated how the current methodological framework using multivariate statistical analysis techniques can serve as a potential tool for deriving rational medium design strategies. Copyright © 2010 Elsevier B.V. All rights reserved.
Multivariate model of female black bear habitat use for a Geographic Information System
Clark, Joseph D.; Dunn, James E.; Smith, Kimberly G.
1993-01-01
Simple univariate statistical techniques may not adequately assess the multidimensional nature of habitats used by wildlife. Thus, we developed a multivariate method to model habitat-use potential using a set of female black bear (Ursus americanus) radio locations and habitat data consisting of forest cover type, elevation, slope, aspect, distance to roads, distance to streams, and forest cover type diversity score in the Ozark Mountains of Arkansas. The model is based on the Mahalanobis distance statistic coupled with Geographic Information System (GIS) technology. That statistic is a measure of dissimilarity and represents a standardized squared distance between a set of sample variates and an ideal based on the mean of variates associated with animal observations. Calculations were made with the GIS to produce a map containing Mahalanobis distance values within each cell on a 60- × 60-m grid. The model identified areas of high habitat use potential that could not otherwise be identified by independent perusal of any single map layer. This technique avoids many pitfalls that commonly affect typical multivariate analyses of habitat use and is a useful tool for habitat manipulation or mitigation to favor terrestrial vertebrates that use habitats on a landscape scale.
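The Mahalanobis distance statistic described above, a standardized squared distance between a cell's habitat variates and the mean of the variates at animal locations, can be sketched for two variables. This is an illustrative simplification and not the authors' GIS implementation: the study used seven habitat variables, whereas the function below hard-codes the 2 × 2 case, and the function name and all numbers are hypothetical.

```python
from statistics import mean

def mahalanobis_sq_2d(point, observations):
    """Squared Mahalanobis distance from `point` to the mean of 2-D `observations`."""
    xs = [o[0] for o in observations]
    ys = [o[1] for o in observations]
    mx, my = mean(xs), mean(ys)
    n = len(observations)
    # Sample covariance matrix entries (n - 1 denominator)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in observations) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mx, point[1] - my
    # d' S^-1 d with the 2x2 inverse written out explicitly
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

# Hypothetical (elevation in m, slope in degrees) at radio locations
bear_sites = [(520, 12), (610, 18), (580, 15), (650, 22), (540, 10)]

# Hypothetical grid cells to score; smaller values are more similar to used habitat
cells = {"cell_a": (590, 16), "cell_b": (300, 35)}
scores = {name: mahalanobis_sq_2d(xy, bear_sites) for name, xy in cells.items()}
print(scores)
```

Because the distance is scaled by the covariance of the observed locations, variables measured in different units (metres, degrees) contribute on a comparable footing, and correlated variables are not double-counted.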
NASA Astrophysics Data System (ADS)
Lee, An-Sheng; Lu, Wei-Li; Huang, Jyh-Jaan; Chang, Queenie; Wei, Kuo-Yen; Lin, Chin-Jung; Liou, Sofia Ya Hsuan
2016-04-01
Because of Taiwan's geology and climate, its rivers generally carry large loads of suspended particles. Once these particles settle, they form sediments that are good sorbents for heavy metals in the river system. Consequently, sediments in regions of low flow energy, such as estuaries, can record the footprint of contamination. Seven sediment cores were collected along the Nankan River, northern Taiwan, which is seriously contaminated by factory, household and agricultural inputs. Physico-chemical properties of these cores were derived from an Itrax-XRF Core Scanner and grain size analysis. To interpret these complex data matrices, multivariate statistical techniques (cluster analysis, factor analysis and discriminant analysis) were applied. The statistical results indicate four types of sediment. One of them represents a contamination event and shows high concentrations of Cu, Zn, Pb, Ni and Fe, and low concentrations of Si and Zr. Furthermore, factor analysis revealed three possible contamination sources for this type of sediment. The combination of sediment analysis and multivariate statistical techniques provides new insights into the contamination depositional history of the Nankan River and could be similarly applied to other river systems to determine the scale of anthropogenic contamination.
Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques
NASA Astrophysics Data System (ADS)
Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein
2017-10-01
Groundwater samples were collected from different sites in the Rapur area to evaluate the major ion chemistry. The large volume of data can make the integration, interpretation, and representation of the results difficult. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness in classifying and identifying the geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations. The analysis yielded two important clusters of variables, viz., cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS), which are released into the study area from different sources. The application of different multivariate statistical techniques, such as principal component analysis (PCA), assists in the interpretation of complex data matrices for a better understanding of the water quality of a study area. From PCA, it is clear that the first factor (factor 1), which accounted for 36.2% of the total variance, had high positive loadings on EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of the similarity of their water quality.
Sepehrband, Farshid; Lynch, Kirsten M; Cabeen, Ryan P; Gonzalez-Zacarias, Clio; Zhao, Lu; D'Arcy, Mike; Kesselman, Carl; Herting, Megan M; Dinov, Ivo D; Toga, Arthur W; Clark, Kristi A
2018-05-15
Exploring neuroanatomical sex differences using a multivariate statistical learning approach can yield insights that cannot be derived with univariate analysis. While gross differences in total brain volume are well-established, uncovering the more subtle, regional sex-related differences in neuroanatomy requires a multivariate approach that can accurately model spatial complexity as well as the interactions between neuroanatomical features. Here, we developed a multivariate statistical learning model using a support vector machine (SVM) classifier to predict sex from MRI-derived regional neuroanatomical features from a single-site study of 967 healthy youth from the Philadelphia Neurodevelopmental Cohort (PNC). Then, we validated the multivariate model on an independent dataset of 682 healthy youth from the multi-site Pediatric Imaging, Neurocognition and Genetics (PING) cohort study. The trained model exhibited an 83% cross-validated prediction accuracy, and correctly predicted the sex of 77% of the subjects from the independent multi-site dataset. Results showed that cortical thickness of the middle occipital lobes and the angular gyri are major predictors of sex. Results also demonstrated the inferential benefits of going beyond classical regression approaches to capture the interactions among brain features in order to better characterize sex differences in male and female youths. We also identified specific cortical morphological measures and parcellation techniques, such as cortical thickness as derived from the Destrieux atlas, that are better able to discriminate between males and females in comparison to other brain atlases (Desikan-Killiany, Brodmann and subcortical atlases). Copyright © 2018 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Valder, J.; Kenner, S.; Long, A.
2008-12-01
Portions of the Cheyenne River are characterized as impaired by the U.S. Environmental Protection Agency because of water-quality exceedances. The Cheyenne River watershed includes the Black Hills National Forest and part of the Badlands National Park. Preliminary analysis indicates that the Badlands National Park is a major contributor to the exceedances of the water-quality constituents for total dissolved solids and total suspended solids. Water-quality data have been collected continuously since 2007, and in the second year of collection (2008), monthly grab and passive sediment samplers are being used to collect total suspended sediment and total dissolved solids in both base-flow and runoff-event conditions. In addition, sediment samples from the river channel, including bed, bank, and floodplain, have been collected. These samples are being analyzed at the South Dakota School of Mines and Technology's X-Ray Diffraction Lab to quantify the mineralogy of the sediments. A multivariate statistical approach (including principal components, least squares, and maximum likelihood techniques) is applied to the mineral percentages that were characterized for each site to identify the contributing source areas that are causing exceedances of sediment transport in the Cheyenne River watershed. Results of the multivariate analysis demonstrate the likely sources of solids found in the Cheyenne River samples. A further refinement of the methods is in progress that utilizes a conceptual model which, when applied with the multivariate statistical approach, provides a better estimate for sediment sources.
Rakotonirina, Jean Claude; Csősz, Sándor; Fisher, Brian L
2016-01-01
The Malagasy Camponotus edmondi species group is revised based on both qualitative morphological traits and multivariate analysis of continuous morphometric data. To minimize the effect of the scaling properties of diverse traits due to worker caste polymorphism, and to achieve the desired near-linearity of data, morphometric analyses were done only on minor workers. The majority of traits exhibit broken scaling on head size, dividing Camponotus workers into two discrete subcastes, minors and majors. This broken scaling prevents the application of algorithms that use linear combinations of data to the entire dataset, hence only minor workers were analyzed statistically. The elimination of major workers resulted in linearity and the data meet required assumptions. However, morphometric ratios for the subsets of minor and major workers were used in species descriptions and redefinitions. Prior species hypotheses and the goodness of clusters were tested on raw data by confirmatory linear discriminant analysis. Due to the small sample size available for some species, a factor known to reduce statistical reliability, hypotheses generated by exploratory analyses were tested with extreme care and species delimitations were inferred via the combined evidence of both qualitative (morphology and biology) and quantitative data. Altogether, fifteen species are recognized, of which 11 are new to science: Camponotus alamaina sp. n., Camponotus androy sp. n., Camponotus bevohitra sp. n., Camponotus galoko sp. n., Camponotus matsilo sp. n., Camponotus mifaka sp. n., Camponotus orombe sp. n., Camponotus tafo sp. n., Camponotus tratra sp. n., Camponotus varatra sp. n., and Camponotus zavo sp. n. Four species are redescribed: Camponotus echinoploides Forel, Camponotus edmondi André, Camponotus ethicus Forel, and Camponotus robustus Roger. Camponotus edmondi ernesti Forel, syn. n. is synonymized under Camponotus edmondi.
This revision also includes an identification key to species for both minor and major castes, information on geographic distribution and biology, taxonomic discussions, and descriptions of intraspecific variation. Traditional taxonomy and multivariate morphometric analysis are independent sources of information which, in combination, allow more precise species delimitation. Moreover, quantitative characters included in identification keys improve accuracy of determination in difficult cases.
Rakotonirina, Jean Claude; Csősz, Sándor; Fisher, Brian L.
2016-01-01
Abstract The Malagasy Camponotus edmondi species group is revised based on both qualitative morphological traits and multivariate analysis of continuous morphometric data. To minimize the effect of the scaling properties of diverse traits due to worker caste polymorphism, and to achieve the desired near-linearity of data, morphometric analyses were done only on minor workers. The majority of traits exhibit broken scaling on head size, dividing Camponotus workers into two discrete subcastes, minors and majors. This broken scaling prevents the application of algorithms that use linear combinations of data to the entire dataset, hence only minor workers were analyzed statistically. The elimination of major workers resulted in linearity and the data meet required assumptions. However, morphometric ratios for the subsets of minor and major workers were used in species descriptions and redefinitions. Prior species hypotheses and the goodness of clusters were tested on raw data by confirmatory linear discriminant analysis. Due to the small sample size available for some species, a factor known to reduce statistical reliability, hypotheses generated by exploratory analyses were tested with extreme care and species delimitations were inferred via the combined evidence of both qualitative (morphology and biology) and quantitative data. Altogether, fifteen species are recognized, of which 11 are new to science: Camponotus alamaina sp. n., Camponotus androy sp. n., Camponotus bevohitra sp. n., Camponotus galoko sp. n., Camponotus matsilo sp. n., Camponotus mifaka sp. n., Camponotus orombe sp. n., Camponotus tafo sp. n., Camponotus tratra sp. n., Camponotus varatra sp. n., and Camponotus zavo sp. n. Four species are redescribed: Camponotus echinoploides Forel, Camponotus edmondi André, Camponotus ethicus Forel, and Camponotus robustus Roger. Camponotus edmondi ernesti Forel, syn. n. is synonymized under Camponotus edmondi.
This revision also includes an identification key to species for both minor and major castes, information on geographic distribution and biology, taxonomic discussions, and descriptions of intraspecific variation. Traditional taxonomy and multivariate morphometric analysis are independent sources of information which, in combination, allow more precise species delimitation. Moreover, quantitative characters included in identification keys improve accuracy of determination in difficult cases. PMID:28050160
Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu
2015-01-01
Abstract Flow cytometry (FCM) is a fluorescence‐based single‐cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap‐FR, a novel method for cell population mapping across FCM samples. FlowMap‐FR is based on the Friedman–Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap‐FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap‐FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap‐FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap‐FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap‐FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback–Leibler divergence measure used in a previous population matching method with both simulated and real data. 
The FR statistic outperforms the symmetric version of KL‐distance in distinguishing equivalent from nonequivalent cell populations. FlowMap‐FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F‐measure of 0.88 was obtained, indicating high precision and recall of the FR‐based population matching results. FlowMap‐FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © 2015 International Society for Advancement of Cytometry PMID:26274018
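The core quantity behind the Friedman-Rafsky test can be sketched in a few lines: pool the two samples, build a minimum spanning tree over the pooled points, and count the tree edges that join points from different samples. Many cross-sample edges suggest the two populations overlap; few suggest they are separated. The sketch below computes only that raw count using Prim's algorithm with Euclidean distance. It does not standardize the count or compute a p-value, it is not the FlowMap-FR package code, and the function name and toy points are hypothetical.

```python
import math

def cross_sample_mst_edges(sample_a, sample_b):
    """Count MST edges linking a point of sample_a to a point of sample_b."""
    points = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    n = len(points)
    if n < 2:
        raise ValueError("need at least two pooled points")
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection to the growing tree
    best_from = [-1] * n         # tree vertex providing that connection
    best_dist[0] = 0.0
    cross = 0
    for _ in range(n):
        # Prim's algorithm: add the closest vertex not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0 and labels[u] != labels[best_from[u]]:
            cross += 1
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return cross

# Hypothetical two-marker measurements for two well-separated populations
pop_a = [(0, 0), (0, 1), (1, 0)]
pop_b = [(10, 10), (10, 11), (11, 10)]
separated = cross_sample_mst_edges(pop_a, pop_b)

# Hypothetical one-marker measurements that interleave completely
mixed = cross_sample_mst_edges([(0,), (2,), (4,)], [(1,), (3,), (5,)])
print(separated, mixed)
```

The quadratic-time loop is adequate for a toy example; flow cytometry populations with many thousands of events would need subsampling or a more efficient spanning-tree routine.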
Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu; Scheuermann, Richard H
2016-01-01
Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman-Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback-Leibler divergence measure used in a previous population matching method with both simulated and real data. 
The FR statistic outperforms the symmetric version of KL-distance in distinguishing equivalent from nonequivalent cell populations. FlowMap-FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F-measure of 0.88 was obtained, indicating high precision and recall of the FR-based population matching results. FlowMap-FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © The Authors. Published by Wiley Periodicals, Inc. on behalf of ISAC.
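The core idea behind the Friedman-Rafsky statistic described above can be sketched in a few lines. This is a minimal illustration, not the FlowMap-FR implementation: the function names `mst_edges` and `fr_runs` are hypothetical. It assumes Euclidean distance, and it computes only the raw multivariate runs count (cross-sample edges of the minimum spanning tree plus one). The standardization of that count under the permutation null is omitted.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the minimum spanning tree as a list of (i, j) index pairs.
    """
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link from the tree to each vertex
    best_from = [-1] * n         # tree vertex providing that link
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the vertex outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges

def fr_runs(sample_a, sample_b):
    """Raw runs count: MST edges joining the two samples, plus one."""
    pooled = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    cross = sum(1 for i, j in mst_edges(pooled) if labels[i] != labels[j])
    return cross + 1

# Hypothetical toy data: two well-separated clusters versus interleaved points.
separated = fr_runs([(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)])
interleaved = fr_runs([(0, 0), (2, 0), (4, 0)], [(1, 0), (3, 0), (5, 0)])
print(separated, interleaved)
```

A small runs count suggests that the two samples occupy different regions of the feature space, while a large count suggests that they are well mixed and therefore similar.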
Corron, Louise; Marchal, François; Condemi, Silvana; Telmon, Norbert; Chaumoitre, Kathia; Adalian, Pascal
2018-05-31
Subadult age estimation should rely on sampling and statistical protocols capturing development variability for more accurate age estimates. In this perspective, measurements were taken on the fifth lumbar vertebrae and/or clavicles of 534 French males and females aged 0-19 years and the ilia of 244 males and females aged 0-12 years. These variables were fitted in nonparametric multivariate adaptive regression splines (MARS) models with 95% prediction intervals (PIs) of age. The models were tested on two independent samples from Marseille and the Luis Lopes reference collection from Lisbon. Models using ilium width and module, maximum clavicle length, and lateral vertebral body heights were more than 92% accurate. Precision was lower for postpubertal individuals. Integrating punctual nonlinearities of the relationship between age and the variables and dynamic prediction intervals incorporated the normal increase in interindividual growth variability (heteroscedasticity of variance) with age for more biologically accurate predictions. © 2018 American Academy of Forensic Sciences.
Craters on Earth, Moon, and Mars: Multivariate classification and mode of origin
Pike, R.J.
1974-01-01
Testing extraterrestrial craters and candidate terrestrial analogs for morphologic similitude is treated as a problem in numerical taxonomy. According to a principal-components solution and a cluster analysis, 402 representative craters on the Earth, the Moon, and Mars divide into two major classes of contrasting shapes and modes of origin. Craters of net accumulation of material (cratered lunar domes, Martian "calderas," and all terrestrial volcanoes except maars and tuff rings) group apart from craters of excavation (terrestrial meteorite impact and experimental explosion craters, typical Martian craters, and all other lunar craters). Maars and tuff rings belong to neither group but are transitional. The classification criteria are four independent attributes of topographic geometry derived from seven descriptive variables by the principal-components transformation. Morphometric differences between crater bowl and raised rim constitute the strongest of the four components. Although single topographic variables cannot confidently predict the genesis of individual extraterrestrial craters, multivariate statistical models constructed from several variables can distinguish consistently between large impact craters and volcanoes. © 1974.
Chiu, Chi-yang; Jung, Jeesun; Wang, Yifan; Weeks, Daniel E.; Wilson, Alexander F.; Bailey-Wilson, Joan E.; Amos, Christopher I.; Mills, James L.; Boehnke, Michael; Xiong, Momiao; Fan, Ruzong
2016-01-01
In this paper, extensive simulations are performed to compare two statistical methods to analyze multiple correlated quantitative phenotypes: (1) approximate F-distributed tests of multivariate functional linear models (MFLM) and additive models of multivariate analysis of variance (MANOVA), and (2) Gene Association with Multiple Traits (GAMuT) for association testing of high-dimensional genotype data. It is shown that approximate F-distributed tests of MFLM and MANOVA have higher power and are more appropriate for major gene association analysis (i.e., scenarios in which some genetic variants have relatively large effects on the phenotypes); GAMuT has higher power and is more appropriate for analyzing polygenic effects (i.e., effects from a large number of genetic variants each of which contributes a small amount to the phenotypes). MFLM and MANOVA are very flexible and can be used to perform association analysis for: (i) rare variants, (ii) common variants, and (iii) a combination of rare and common variants. Although GAMuT was designed to analyze rare variants, it can be applied to analyze a combination of rare and common variants and it performs well when (1) the number of genetic variants is large and (2) each variant contributes a small amount to the phenotypes (i.e., polygenes). MFLM and MANOVA are fixed effect models which perform well for major gene association analysis. GAMuT can be viewed as an extension of sequence kernel association tests (SKAT). Both GAMuT and SKAT are more appropriate for analyzing polygenic effects and they perform well not only in the rare variant case, but also in the case of a combination of rare and common variants. Data analyses of European cohorts and the Trinity Students Study are presented to compare the performance of the two methods. PMID:27917525
A new multivariate zero-adjusted Poisson model with applications to biomedicine.
Liu, Yin; Tian, Guo-Liang; Tang, Man-Lai; Yuen, Kam Chuen
2018-05-25
Recently, although advances were made on modeling multivariate count data, existing models still have several limitations: (i) The multivariate Poisson log-normal model (Aitchison and Ho, ) cannot be used to fit multivariate count data with excess zero-vectors; (ii) The multivariate zero-inflated Poisson (ZIP) distribution (Li et al., 1999) cannot be used to model zero-truncated/deflated count data and it is difficult to apply to high-dimensional cases; (iii) The Type I multivariate zero-adjusted Poisson (ZAP) distribution (Tian et al., 2017) can only model multivariate count data with a special correlation structure in which the correlations between random components are all positive or all negative. In this paper, we first introduce a new multivariate ZAP distribution, based on a multivariate Poisson distribution, which allows a more flexible dependency structure between components; that is, some of the correlation coefficients can be positive while others are negative. We then develop its important distributional properties, and provide efficient statistical inference methods for the multivariate ZAP model with or without covariates. Two real data examples in biomedicine are used to illustrate the proposed methods. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
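To make the "zero-adjusted" idea concrete, the following sketch shows the univariate analogue only; it is not the multivariate distribution proposed in the paper. It assumes the common hurdle-style form in which a free parameter `phi` gives the probability of zero and the positive counts follow a zero-truncated Poisson. The function name `zap_pmf` and the parameter values are hypothetical.

```python
import math

def zap_pmf(k, phi, lam):
    """Univariate zero-adjusted Poisson probability mass at k.

    phi : probability of observing zero (0 <= phi < 1)
    lam : rate of the underlying Poisson (lam > 0)
    """
    if k < 0:
        return 0.0
    if k == 0:
        return phi
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    # Rescale the positive part so that it carries total mass (1 - phi).
    return (1.0 - phi) * poisson_k / (1.0 - math.exp(-lam))

lam = 2.0
for phi, label in [(0.0, "zero-truncated"), (0.05, "zero-deflated"), (0.40, "zero-inflated")]:
    total = sum(zap_pmf(k, phi, lam) for k in range(60))
    print(label, round(zap_pmf(0, phi, lam), 3), round(total, 6))
```

Under this parameterization, `phi` above `exp(-lam)` corresponds to zero inflation, `phi` below it to zero deflation, and `phi = 0` to zero truncation, which is why a single family can cover all three cases.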
ERIC Educational Resources Information Center
Yuan, Ke-Hai
2008-01-01
In the literature of mean and covariance structure analysis, noncentral chi-square distribution is commonly used to describe the behavior of the likelihood ratio (LR) statistic under alternative hypothesis. Due to the inaccessibility of the rather technical literature for the distribution of the LR statistic, it is widely believed that the…
Statistical polarization in greenhouse gas emissions: Theory and evidence.
Remuzgo, Lorena; Trueba, Carmen
2017-11-01
The current debate on climate change is over whether global warming can be limited in order to lessen its impacts. In this sense, evidence of a decrease in the statistical polarization in greenhouse gas (GHG) emissions could encourage countries to establish a stronger multilateral climate change agreement. Based on the interregional and intraregional components of the multivariate generalised entropy measures (Maasoumi, 1986), Gigliarano and Mosler (2009) proposed to study the statistical polarization concept from a multivariate view. In this paper, we apply this approach to study the evolution of this phenomenon in the global distribution of the main GHGs. The empirical analysis has been carried out for the time period 1990-2011, considering an endogenous grouping of countries (Aghevli and Mehran, 1981; Davies and Shorrocks, 1989). Most of the statistical polarization indices showed a slightly increasing pattern that was similar regardless of the number of groups considered. Finally, some policy implications are discussed. Copyright © 2017 Elsevier Ltd. All rights reserved.
Lu, Tsui-Shan; Longnecker, Matthew P; Zhou, Haibo
2017-03-15
The outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme where one observes the exposure with a probability that depends on the outcome. Well-known examples of such designs are the case-control design for binary response, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for ODS with multivariate cases remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric, where all the underlying distributions of covariates are modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and develop its asymptotic normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample with the same sample size. The multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of polychlorinated biphenyl exposure with hearing loss in children born to the Collaborative Perinatal Study. Copyright © 2016 John Wiley & Sons, Ltd.
Willard, Melissa A Bodnar; McGuffin, Victoria L; Smith, Ruth Waddell
2012-01-01
Salvia divinorum is a hallucinogenic herb that is internationally regulated. In this study, salvinorin A, the active compound in S. divinorum, was extracted from S. divinorum plant leaves using a 5-min extraction with dichloromethane. Four additional Salvia species (Salvia officinalis, Salvia guaranitica, Salvia splendens, and Salvia nemorosa) were extracted using this procedure, and all extracts were analyzed by gas chromatography-mass spectrometry. Differentiation of S. divinorum from other Salvia species was successful based on visual assessment of the resulting chromatograms. To provide a more objective comparison, the total ion chromatograms (TICs) were subjected to principal components analysis (PCA). Prior to PCA, the TICs were subjected to a series of data pretreatment procedures to minimize non-chemical sources of variance in the data set. Successful discrimination of S. divinorum from the other four Salvia species was possible based on visual assessment of the PCA scores plot. To provide a numerical assessment of the discrimination, a series of statistical procedures such as Euclidean distance measurement, hierarchical cluster analysis, Student's t tests, Wilcoxon rank-sum tests, and Pearson product moment correlation were also applied to the PCA scores. The statistical procedures were then compared to determine the advantages and disadvantages for forensic applications.
On Restructurable Control System Theory
NASA Technical Reports Server (NTRS)
Athans, M.
1983-01-01
The state of stochastic system and control theory as it impacts restructurable control issues is addressed. The multivariable characteristics of the control problem are addressed. The failure detection/identification problem is discussed as a multi-hypothesis testing problem. Control strategy reconfiguration, static multivariable controls, static failure hypothesis testing, dynamic multivariable controls, fault-tolerant control theory, dynamic hypothesis testing, generalized likelihood ratio (GLR) methods, and adaptive control are discussed.
Nojima, Masanori; Tokunaga, Mutsumi; Nagamura, Fumitaka
2018-05-05
To investigate under what circumstances inappropriate use of 'multivariate analysis' is likely to occur and to identify the population that needs more support with medical statistics. The frequency of inappropriate regression model construction in multivariate analysis and related factors were investigated in observational medical research publications. The inappropriate algorithm of using only variables that were significant in univariate analysis was estimated to occur in 6.4% of publications (95% CI 4.8% to 8.5%). This was observed in 1.1% of the publications with a medical statistics expert (hereinafter 'expert') as the first author, in 3.5% if an expert was included as coauthor and in 12.2% if experts were not involved. In the publications where the number of cases was 50 or less and the study did not include experts, inappropriate algorithm usage was observed in a high proportion, 20.2%. The OR of the involvement of experts for this outcome was 0.28 (95% CI 0.15 to 0.53). A further nation-level analysis showed that the involvement of experts and the implementation of unfavourable multivariate analysis are associated (R=-0.652). Based on the results of this study, the benefit of participation of medical statistics experts is obvious. Experts should be involved for proper confounding adjustment and interpretation of statistical models. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
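For readers unfamiliar with how an odds ratio and its confidence interval are obtained, the sketch below computes a crude OR with a Wald 95% CI from a 2x2 table. The counts are hypothetical and are not the study's data. The abstract does not state how its OR of 0.28 was estimated, so this shows only the simplest unadjusted calculation. The function name `odds_ratio_ci` is an illustrative choice.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval for the table

                    outcome   no outcome
        exposed        a          b
        unexposed      c          d
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) by the usual large-sample formula.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: inappropriate model building with and without an expert.
or_, lo, hi = odds_ratio_ci(a=10, b=90, c=30, d=70)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An interval that lies entirely below 1, as in this toy table, is what supports a statement that the exposure (here, expert involvement) is associated with lower odds of the outcome.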
A direct-gradient multivariate index of biotic condition
Miranda, Leandro E.; Aycock, J.N.; Killgore, K. J.
2012-01-01
Multimetric indexes constructed by summing metric scores have been criticized despite many of their merits. A leading criticism is the potential for investigator bias involved in metric selection and scoring. Often there is a large number of competing metrics equally well correlated with environmental stressors, requiring a judgment call by the investigator to select the most suitable metrics to include in the index and how to score them. Data-driven procedures for multimetric index formulation published during the last decade have reduced this limitation, yet apprehension remains. Multivariate approaches that select metrics with statistical algorithms may reduce the level of investigator bias and alleviate a weakness of multimetric indexes. We investigated the suitability of a direct-gradient multivariate procedure to derive an index of biotic condition for fish assemblages in oxbow lakes in the Lower Mississippi Alluvial Valley. Although this multivariate procedure also requires that the investigator identify a set of suitable metrics potentially associated with a set of environmental stressors, it is different from multimetric procedures because it limits investigator judgment in selecting a subset of biotic metrics to include in the index and because it produces metric weights suitable for computation of index scores. The procedure, applied to a sample of 35 competing biotic metrics measured at 50 oxbow lakes distributed over a wide geographical region in the Lower Mississippi Alluvial Valley, selected 11 metrics that adequately indexed the biotic condition of five test lakes. Because the multivariate index includes only metrics that explain the maximum variability in the stressor variables rather than a balanced set of metrics chosen to reflect various fish assemblage attributes, it is fundamentally different from multimetric indexes of biotic integrity with advantages and disadvantages. As such, it provides an alternative to multimetric procedures.
[Cardiovascular diseases in the population of industrial towns and environmental factors].
Ibraeva, L K; Azhimetova, G N; Amanbekova, A U; Bakirova, R E
2015-01-01
To study the influence of environmental factors (EFs) on the development of cardiovascular diseases in the population of industrial towns of the Republic of Kazakhstan. The investigation covered an 18-59-year-old adult population who had been living in the urbanized areas of the Republic of Kazakhstan for at least 10 years, who worked in harmless conditions and were unregistered as having chronic diseases. At Stage 1, screening (a therapist's examination, blood general and immunological tests, and electrocardiography) was carried out for risk group persons who underwent in-depth clinical examination (blood biochemical test) at Stage 2. Multivariate statistical analysis has revealed that the development of hypertension is associated with the high concentration of sulfur dioxide in atmospheric air, copper in dust sediments, and zinc in soil and that of coronary heart disease (CHD) is related to the high levels of nitrogen dioxide in atmospheric air and zinc in dust sediments. Based on pathogenetic and statistical data and information available in the literature, hypertension and CHD are referred to as the diseases that may result from the influence of EFs.
Choline and betaine intake and the risk of colorectal cancer in men.
Lee, Jung Eun; Giovannucci, Edward; Fuchs, Charles S; Willett, Walter C; Zeisel, Steven H; Cho, Eunyoung
2010-03-01
Dietary choline and betaine have been hypothesized to decrease the risk of cancer because of their role as methyl donors in the one-carbon metabolism. However, it remains unknown whether dietary intake of choline and betaine is associated with colorectal cancer risk. We prospectively examined the associations between dietary choline and betaine intake and risk of colorectal cancer in men in the Health Professionals Follow-up Study. We followed 47,302 men and identified a total of 987 incident colorectal cancer cases from 1986 to 2004. We assessed dietary and supplemental choline and betaine intake every 4 years using a validated semiquantitative food frequency questionnaire. The Cox proportional hazards model was used to estimate multivariate relative risks and 95% confidence intervals. All statistical tests were two-sided. We did not find any statistically significant associations between choline intake or betaine intake and risk of colorectal cancer. Comparing the top quintile with bottom quintile, multivariate relative risks (95% confidence interval) were 0.97 (0.79-1.20; P(trend) = 0.87) for choline intake and 0.94 (0.77-1.16; P(trend) = 0.79) for betaine intake. Similarly, we observed no associations between colorectal cancer risk and choline from free choline, glycerophosphocholine, phosphocholine, phosphatidylcholine, or sphingomyelin. Our data do not support the hypothesis that choline and betaine intake is inversely associated with colorectal cancer risk.
Choline and betaine intake and the risk of colorectal cancer in men
Lee, Jung Eun; Giovannucci, Edward; Fuchs, Charles S.; Willett, Walter C.; Zeisel, Steven H.; Cho, Eunyoung
2010-01-01
Dietary choline and betaine have been hypothesized to decrease the risk of cancer because of their role as methyl donors in the one-carbon metabolism. However, it remains unknown whether dietary intake of choline and betaine is associated with colorectal cancer risk. We prospectively examined the associations between dietary choline and betaine intake and risk of colorectal cancer in men in the Health Professionals Follow-up Study. We followed 47,302 men and identified a total of 987 incident colorectal cancer cases from 1986 to 2004. We assessed dietary and supplemental choline and betaine intake every four years using a validated semi-quantitative food frequency questionnaire. The Cox proportional hazards model was used to estimate multivariate relative risks (RRs) and 95% confidence intervals (95% CIs). All statistical tests were two-sided. We did not find any statistically significant associations between choline intake or betaine intake and risk of colorectal cancer. Comparing the top quintile with bottom quintile, multivariate RRs (95% CI) were 0.97 (0.79-1.20; Ptrend = 0.87) for choline intake and 0.94 (0.77-1.16; Ptrend = 0.79) for betaine intake. Similarly, we observed no associations between colorectal cancer risk and choline from free choline, glycerophosphocholine, phosphocholine, phosphatidylcholine, or sphingomyelin. Our data do not support that choline and betaine intake is inversely associated with colorectal cancer risk. PMID:20160273
NASA Astrophysics Data System (ADS)
Zhou, Chao; Yin, Kunlong; Cao, Ying; Ahmed, Bayes; Li, Yuanyao; Catani, Filippo; Pourghasemi, Hamid Reza
2018-03-01
Landslides are a common natural hazard and are responsible for extensive damage and losses in mountainous areas. In this study, Longju in the Three Gorges Reservoir area in China was taken as a case study for landslide susceptibility assessment in order to develop effective risk prevention and mitigation strategies. To begin, 202 landslides were identified, including 95 colluvial landslides and 107 rockfalls. Twelve landslide causal factor maps were prepared initially, and the relationship between these factors and each landslide type was analyzed using the information value model. Later, the unimportant factors were identified and eliminated using the information gain ratio technique. The landslide locations were randomly divided into two groups: 70% for training and 30% for verification. Two machine learning models, the support vector machine (SVM) and the artificial neural network (ANN), and a multivariate statistical model, logistic regression (LR), were applied for landslide susceptibility modeling (LSM) for each type. The LSM index maps, obtained by combining the assessment results of the two landslide types, were classified into five levels. The performance of the LSMs was evaluated using the receiver operating characteristic curve and the Friedman test. Results show that the elimination of noise-generating factors and the separate modeling of each landslide type significantly increased the prediction accuracy. The machine learning models outperformed the multivariate statistical model, and the SVM model was found to be ideal for the case study area.
Devarajan, Karthik; Parsons, Theodore; Wang, Qiong; O'Neill, Raymond; Solomides, Charalambos; Peiper, Stephen C.; Testa, Joseph R.; Uzzo, Robert; Yang, Haifeng
2017-01-01
Intratumoral heterogeneity (ITH) is a prominent feature of kidney cancer. It is not known whether it has utility in finding associations between protein expression and clinical parameters. We used ITH that is detected by immunohistochemistry (IHC) to aid the association analysis between the loss of SWI/SNF components and clinical parameters. 160 ccRCC tumors (40 per tumor stage) were used to generate a tissue microarray (TMA). Four foci from different regions of each tumor were selected. IHC was performed against PBRM1, ARID1A, SETD2, SMARCA4, and SMARCA2. Statistical analyses were performed to correlate biomarker losses with patho-clinical parameters. Categorical variables were compared between groups using Fisher's exact tests. Univariate and multivariable analyses were used to correlate biomarker changes and patient survival. Multivariable analyses were performed by constructing decision trees using the classification and regression trees (CART) methodology. IHC detected widespread ITH in ccRCC tumors. The statistical analysis of the “Truncal loss” (root loss) found more correlations between biomarker losses and tumor stages than the traditional “Loss in tumor (total)”. Losses of SMARCA4 or SMARCA2 significantly improved prognosis for overall survival (OS). Losses of PBRM1, ARID1A or SETD2 had the opposite effect. Thus “Truncal Loss” analysis revealed hidden links between protein losses and patient survival in ccRCC. PMID:28445125
A Baseline for the Multivariate Comparison of Resting-State Networks
Allen, Elena A.; Erhardt, Erik B.; Damaraju, Eswar; Gruner, William; Segall, Judith M.; Silva, Rogers F.; Havlicek, Martin; Rachakonda, Srinivas; Fries, Jill; Kalyanam, Ravi; Michael, Andrew M.; Caprihan, Arvind; Turner, Jessica A.; Eichele, Tom; Adelsheim, Steven; Bryan, Angela D.; Bustillo, Juan; Clark, Vincent P.; Feldstein Ewing, Sarah W.; Filbey, Francesca; Ford, Corey C.; Hutchison, Kent; Jung, Rex E.; Kiehl, Kent A.; Kodituwakku, Piyadasa; Komesu, Yuko M.; Mayer, Andrew R.; Pearlson, Godfrey D.; Phillips, John P.; Sadek, Joseph R.; Stevens, Michael; Teuscher, Ursina; Thoma, Robert J.; Calhoun, Vince D.
2011-01-01
As the size of functional and structural MRI datasets expands, it becomes increasingly important to establish a baseline from which diagnostic relevance may be determined, a processing strategy that efficiently prepares data for analysis, and a statistical approach that identifies important effects in a manner that is both robust and reproducible. In this paper, we introduce a multivariate analytic approach that optimizes sensitivity and reduces unnecessary testing. We demonstrate the utility of this mega-analytic approach by identifying the effects of age and gender on the resting-state networks (RSNs) of 603 healthy adolescents and adults (mean age: 23.4 years, range: 12–71 years). Data were collected on the same scanner, preprocessed using an automated analysis pipeline based in SPM, and studied using group independent component analysis. RSNs were identified and evaluated in terms of three primary outcome measures: time course spectral power, spatial map intensity, and functional network connectivity. Results revealed robust effects of age on all three outcome measures, largely indicating decreases in network coherence and connectivity with increasing age. Gender effects were of smaller magnitude but suggested stronger intra-network connectivity in females and more inter-network connectivity in males, particularly with regard to sensorimotor networks. These findings, along with the analysis approach and statistical framework described here, provide a useful baseline for future investigations of brain networks in health and disease. PMID:21442040
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
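The model this abstract describes, one outcome and several predictors, can be fitted by ordinary least squares. The sketch below is a minimal pure-Python illustration using the normal equations; it is not taken from the article, the helper names `solve_linear_system` and `fit_ols` are hypothetical, and the data are made up so that the true coefficients are known.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x

def fit_ols(predictors, outcome):
    """Return [intercept, b1, b2, ...] minimizing the squared error."""
    rows = [[1.0] + list(r) for r in predictors]   # prepend intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
print([round(b, 6) for b in fit_ols(X, y)])
```

If two predictors are strongly collinear, the matrix of the normal equations becomes nearly singular and the fitted coefficients become unstable, which is one way to see why the abstract stresses the relationships among the predictor variables.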
ERIC Educational Resources Information Center
Joo, Soohyung; Kipp, Margaret E. I.
2015-01-01
Introduction: This study examines the structure of Web space in the field of library and information science using multivariate analysis of social tags from the Website, Delicious.com. A few studies have examined mathematical modelling of tags, mainly examining tagging in terms of tripartite graphs, pattern tracing and descriptive statistics. This…
ERIC Educational Resources Information Center
Magis, David; De Boeck, Paul
2011-01-01
We focus on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is a quite natural case. A robust statistics technique is…
ERIC Educational Resources Information Center
Arbaugh, J. B.; Hwang, Alvin
2013-01-01
Seeking to assess the analytical rigor of empirical research in management education, this article reviews the use of multivariate statistical techniques in 85 studies of online and blended management education over the past decade and compares them with prescriptions offered by both the organization studies and educational research communities.…
On Some Multiple Decision Problems
1976-08-01
parameter space. Some recent results in the area of subset selection formulation are Gnanadesikan and Gupta [28], Gupta and Studden [43], Gupta and...York, pp. 363-376. [27] Gnanadesikan, M. (1966). Some Selection and Ranking Procedures for Multivariate Normal Populations. Ph.D. Thesis. Dept. of...Statist., Purdue Univ., West Lafayette, Indiana 47907. [28] Gnanadesikan, M. and Gupta, S. S. (1970). Selection procedures for multivariate normal
A lower-extremities kinematic comparison of deep-water running styles and treadmill running.
Killgore, Garry L; Wilcox, Anthony R; Caster, Brian L; Wood, Terry M
2006-11-01
The purpose of this investigation was to identify a deep-water running (DWR) style that most closely approximates terrestrial running, particularly relative to the lower extremities. Twenty intercollegiate distance runners (women, N = 12; men, N = 8) were videotaped from the right sagittal view while running on a treadmill (TR) and in deep water at 55-60% of their TR VO(2)max using 2 DWR styles: cross-country (CC) and high-knee (HK). Variables of interest were horizontal (X) and vertical (Y) displacement of the knee and ankle, stride rate (SR), VO(2), heart rate (HR), and rating of perceived exertion (RPE). Multivariate omnibus tests revealed statistically significant differences for RPE (p < 0.001). The post hoc pairwise comparisons revealed significant differences between TR and both DWR styles (p < 0.001). The kinematic variables multivariate omnibus tests were found to be statistically significant (p < 0.001 to p < 0.019). The post hoc pairwise comparisons revealed significant differences in SR (p < 0.001) between TR (1.25 +/- 0.08 Hz) and both DWR styles and also between the CC (0.81 +/- 0.08 Hz) and HK (1.14 +/- 0.10 Hz) styles of DWR. The CC style of DWR was found to be similar to TR with respect to linear ankle displacement, whereas the HK style was significantly different from TR in all comparisons made for ankle and knee displacement. The CC style of DWR is recommended as an adjunct to distance running training if the goal is to mimic the specificity of the ankle linear horizontal displacement of land-based running, but the SR will be slower at a comparable percentage of VO(2)max.
Gavini, S; Borges, L F; Finn, R T; Lo, W-K; Goldberg, H J; Burakoff, R; Feldman, N; Chan, W W
2017-05-01
Gastroesophageal reflux (GER) has been associated with idiopathic pulmonary fibrosis (IPF). Pathogenesis may be related to chronic micro-aspiration. We aimed to assess objective measures of GER on multichannel intraluminal impedance and pH study (MII-pH) and their relationship with pulmonary function testing (PFT) results, and to compare the performance of pH/acid reflux parameters vs corresponding MII/bolus parameters in predicting pulmonary dysfunction in IPF. This was a retrospective cohort study of IPF patients undergoing prelung transplant evaluation with MII-pH off acid suppression, and having received PFT within 3 months. Patients with prior fundoplication were excluded. Severe pulmonary dysfunction was defined using diffusion capacity of the lung for carbon monoxide (DLCO) ≤40%. Six pH/acid reflux parameters with corresponding MII/bolus reflux measures were specified a priori. Multivariate analyses were applied using forward stepwise logistic regression. Predictive value of each parameter for severe pulmonary dysfunction was calculated by area-under-the-receiver-operating-characteristic-curve or c-statistic. Forty-five subjects (67% M, age 59, 15 mild-moderate vs 30 severe) met criteria for inclusion. Patient demographics and clinical characteristics were similar between pulmonary dysfunction groups. Abnormal total reflux episodes and prolonged bolus clearance time were significantly associated with pulmonary dysfunction severity on univariate and multivariate analyses. No pH parameters were significant. The c-statistic of each pH parameter was lower than its MII counterpart in predicting pulmonary dysfunction. MII/bolus reflux, but not pH/acid reflux, was associated with pulmonary dysfunction in prelung transplant patients with IPF. MII-pH may be more valuable than pH testing alone in characterizing GER in IPF. © 2016 John Wiley & Sons Ltd.
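The c-statistic used above to compare pH and MII parameters can be read as the probability that a randomly chosen patient with severe dysfunction has a higher parameter value than a randomly chosen patient without it. The following is a minimal sketch of that pairwise definition, not the study's analysis code; the function name `c_statistic` and the example values are hypothetical and chosen only for illustration.

```python
def c_statistic(scores_pos, scores_neg):
    """Concordance probability (equivalent to the area under the ROC curve).

    scores_pos: predictor values for cases with the outcome (e.g. severe dysfunction)
    scores_neg: predictor values for cases without the outcome
    Ties between a positive and a negative case count as half-concordant.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one value")
    concordant = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(scores_pos) * len(scores_neg))


# Hypothetical predictor values, not data from the study.
severe = [0.9, 0.8, 0.6, 0.55]
mild_moderate = [0.7, 0.4, 0.3]
print(c_statistic(severe, mild_moderate))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is why a lower c-statistic for a pH parameter than for its MII counterpart indicates weaker prediction.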
Zhou, Jia; Ma, Yinghua; Ma, Jun; Zou, Zhiyong; Meng, Xiangkun; Tao, Fangbiao; Luo, Chunyan; Jing, Jin; Pan, Dehong; Luo, Jiayou; Zhang, Xin; Wang, Hong; Zhao, Haiping
2016-01-01
To understand the prevalence of myopia in primary and middle school students in 6 provinces and the possible influencing factors. Primary and middle school students were selected through multistage cluster sampling in 60 primary and middle schools in 6 provinces in China. A questionnaire survey and eyesight test were conducted among all the selected students according to the national student physique and health survey protocol. Pearson chi-square tests and binary multivariate logistic regression analysis were used to identify the influencing factors for myopia in students. The prevalence of myopia among the primary and middle school students surveyed was 55.7%, and the gender-specific difference was statistically significant (59.7% for girls, 51.9% for boys) (P<0.01). The prevalence of myopia increased markedly with age: it was 35.8% in age group 6-8 years, 58.9% in age group 10-12 years, 73.4% in age group 13-15 years and 81.2% in age group 16-18 years, and the differences were statistically significant (P<0.001). Single factor and multivariate analysis showed that parents' myopia, distance between computer screen and eyes, distance less than 30 cm between eyes and book while reading, distance less than 10 cm between chest and the table edge while studying, distance less than 3 cm between fingers and pen tip, sleep time, average outdoor activity time during the last week, school sport activities in the afternoon, the size of the television set at home, and time spent watching TV and playing on the computer were the influencing factors for myopia. The prevalence of myopia is still high in primary and middle school students. Myopia is associated with both genetic factors and individual eye health related behaviors.
A prospective study of periodontal disease and pancreatic cancer in US male health professionals.
Michaud, Dominique S; Joshipura, Kaumudi; Giovannucci, Edward; Fuchs, Charles S
2007-01-17
Two previous cohort studies reported positive associations between tooth loss or periodontitis and pancreatic cancer risk. Data on periodontal disease were obtained at baseline and every other year thereafter in a cohort of 51,529 male health professionals aged 40-75 years. A total of 216 patients were diagnosed with incident pancreatic cancer during 16 years of follow-up. Multivariable relative risks (RRs) and 95% confidence intervals (CIs) were estimated using Cox proportional hazards models controlling for potential confounders, including detailed smoking history. All statistical tests were two-sided. Compared with no periodontal disease, history of periodontal disease was associated with increased pancreatic cancer risk (overall, multivariable RR = 1.64, 95% CI = 1.19 to 2.26; P = .002; crude incidence rates: 61 versus 25 per 100,000 person-years; among never smokers, multivariable RR = 2.09, 95% CI = 1.18 to 3.71; P = .01; crude incidence rates: 61 versus 19 per 100,000 person-years). In contrast, baseline number of natural teeth and cumulative tooth loss during follow-up were not strongly associated with pancreatic cancer. The association between periodontal disease and increased risk of pancreatic cancer may occur through plausible biologic mechanisms, but confirmation of this association is necessary.
Buttini, Francesca; Pasquali, Irene; Brambilla, Gaetano; Copelli, Diego; Alberi, Massimiliano Dagli; Balducci, Anna Giulia; Bettini, Ruggero; Sisti, Viviana
2016-03-01
The aim of this work was to evaluate the effect of two different dry powder inhalers, of the NGI induction port and Alberta throat, and of the actual inspiratory profiles of asthmatic patients on in-vitro drug inhalation performance. The two devices considered were a reservoir multidose inhaler and a capsule-based inhaler. The formulation used to test the inhalers was a combination of formoterol fumarate and beclomethasone dipropionate. A breath simulator was used to mimic inhalatory patterns previously determined in vivo. A multivariate approach was adopted to estimate the significance of the effect of the investigated variables in the explored domain. The breath simulator was a useful tool to mimic in vitro the in vivo inspiratory profiles of asthmatic patients. The type of throat coupled with the impactor did not affect the aerodynamic distribution of the investigated formulation. However, the type of inhaler and the inspiratory profiles affected the respirable dose of drugs. The multivariate statistical approach demonstrated that the multidose inhaler efficiently released a high fine particle mass independently of the inspiratory profile adopted. In contrast, the single-dose capsule inhaler showed a significant decrease in fine particle mass of both drugs when the device was activated using the minimum inspiratory volume (592 mL).
Ting, Hui-Min; Chang, Liyun; Huang, Yu-Jie; Wu, Jia-Ming; Wang, Hung-Yu; Horng, Mong-Fong; Chang, Chun-Ming; Lan, Jen-Hong; Huang, Ya-Yu; Fang, Fu-Min; Leung, Stephen Wan
2014-01-01
Purpose The aim of this study was to develop a multivariate logistic regression model with least absolute shrinkage and selection operator (LASSO) to make valid predictions about the incidence of moderate-to-severe patient-rated xerostomia among head and neck cancer (HNC) patients treated with IMRT. Methods and Materials Quality of life questionnaire datasets from 206 patients with HNC were analyzed. The European Organization for Research and Treatment of Cancer QLQ-H&N35 and QLQ-C30 questionnaires were used as the endpoint evaluation. The primary endpoint (grade 3+ xerostomia) was defined as moderate-to-severe xerostomia at 3 (XER3m) and 12 months (XER12m) after the completion of IMRT. Normal tissue complication probability (NTCP) models were developed. The optimal and suboptimal numbers of prognostic factors for a multivariate logistic regression model were determined using the LASSO with bootstrapping technique. Statistical analysis was performed using the scaled Brier score, Nagelkerke R2, chi-squared test, Omnibus, Hosmer-Lemeshow test, and the AUC. Results Eight prognostic factors were selected by LASSO for the 3-month time point: Dmean-c, Dmean-i, age, financial status, T stage, AJCC stage, smoking, and education. Nine prognostic factors were selected for the 12-month time point: Dmean-i, education, Dmean-c, smoking, T stage, baseline xerostomia, alcohol abuse, family history, and node classification. In the selection of the suboptimal number of prognostic factors by LASSO, three suboptimal prognostic factors were fine-tuned by Hosmer-Lemeshow test and AUC, i.e., Dmean-c, Dmean-i, and age for the 3-month time point. Five suboptimal prognostic factors were also selected for the 12-month time point, i.e., Dmean-i, education, Dmean-c, smoking, and T stage. The overall performance for both time points of the NTCP model in terms of scaled Brier score, Omnibus, and Nagelkerke R2 was satisfactory and corresponded well with the expected values. 
Conclusions Multivariate NTCP models with LASSO can be used to predict patient-rated xerostomia after IMRT. PMID:24586971
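The abstract above reports a scaled Brier score without defining it. A common definition, assumed here, rescales the Brier score (mean squared difference between predicted probability and observed 0/1 outcome) by the score of a non-informative model that predicts the observed prevalence for every patient. The sketch below uses that definition; the function names are hypothetical and no study data are reproduced.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not outcomes:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def scaled_brier(probs, outcomes):
    """Scaled Brier score: 1 - Brier / Brier_max.

    Brier_max is the Brier score obtained by predicting the observed
    prevalence for everyone, which equals prevalence * (1 - prevalence).
    This definition is an assumption; the abstract does not state one.
    """
    prevalence = sum(outcomes) / len(outcomes)
    brier_max = prevalence * (1.0 - prevalence)
    if brier_max == 0.0:
        raise ValueError("outcomes must contain both events and non-events")
    return 1.0 - brier_score(probs, outcomes) / brier_max


# Hypothetical predictions for four patients (1 = xerostomia, 0 = none).
outcomes = [1, 0, 1, 0]
print(scaled_brier([0.8, 0.3, 0.6, 0.1], outcomes))
```

Under this definition a score of 0 means the model does no better than predicting the prevalence, and 1 means perfect prediction.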
Lee, Tsair-Fwu; Chao, Pei-Ju; Ting, Hui-Min; Chang, Liyun; Huang, Yu-Jie; Wu, Jia-Ming; Wang, Hung-Yu; Horng, Mong-Fong; Chang, Chun-Ming; Lan, Jen-Hong; Huang, Ya-Yu; Fang, Fu-Min; Leung, Stephen Wan
2014-01-01
The aim of this study was to develop a multivariate logistic regression model with least absolute shrinkage and selection operator (LASSO) to make valid predictions about the incidence of moderate-to-severe patient-rated xerostomia among head and neck cancer (HNC) patients treated with IMRT. Quality of life questionnaire datasets from 206 patients with HNC were analyzed. The European Organization for Research and Treatment of Cancer QLQ-H&N35 and QLQ-C30 questionnaires were used as the endpoint evaluation. The primary endpoint (grade 3(+) xerostomia) was defined as moderate-to-severe xerostomia at 3 (XER3m) and 12 months (XER12m) after the completion of IMRT. Normal tissue complication probability (NTCP) models were developed. The optimal and suboptimal numbers of prognostic factors for a multivariate logistic regression model were determined using the LASSO with bootstrapping technique. Statistical analysis was performed using the scaled Brier score, Nagelkerke R(2), chi-squared test, Omnibus, Hosmer-Lemeshow test, and the AUC. Eight prognostic factors were selected by LASSO for the 3-month time point: Dmean-c, Dmean-i, age, financial status, T stage, AJCC stage, smoking, and education. Nine prognostic factors were selected for the 12-month time point: Dmean-i, education, Dmean-c, smoking, T stage, baseline xerostomia, alcohol abuse, family history, and node classification. In the selection of the suboptimal number of prognostic factors by LASSO, three suboptimal prognostic factors were fine-tuned by Hosmer-Lemeshow test and AUC, i.e., Dmean-c, Dmean-i, and age for the 3-month time point. Five suboptimal prognostic factors were also selected for the 12-month time point, i.e., Dmean-i, education, Dmean-c, smoking, and T stage. The overall performance for both time points of the NTCP model in terms of scaled Brier score, Omnibus, and Nagelkerke R(2) was satisfactory and corresponded well with the expected values. 
Multivariate NTCP models with LASSO can be used to predict patient-rated xerostomia after IMRT.
Milic, Natasa M.; Trajkovic, Goran Z.; Bukumiric, Zoran M.; Cirkovic, Andja; Nikolic, Ivan M.; Milin, Jelena S.; Milic, Nikola V.; Savic, Marko D.; Corac, Aleksandar M.; Marinkovic, Jelena M.; Stanisavljevic, Dejana M.
2016-01-01
Background Although recent studies report on the benefits of blended learning in improving medical student education, there is still no empirical evidence on the relative effectiveness of blended over traditional learning approaches in medical statistics. We implemented blended along with on-site (i.e. face-to-face) learning to further assess the potential value of web-based learning in medical statistics. Methods This was a prospective study conducted with third year medical undergraduate students attending the Faculty of Medicine, University of Belgrade, who passed (440 of 545) the final exam of the obligatory introductory statistics course during 2013–14. Student statistics achievements were stratified based on the two methods of education delivery: blended learning and on-site learning. Blended learning included a combination of face-to-face and distance learning methodologies integrated into a single course. Results Mean exam scores for the blended learning student group were higher than for the on-site student group for both final statistics score (89.36±6.60 vs. 86.06±8.48; p = 0.001) and knowledge test score (7.88±1.30 vs. 7.51±1.36; p = 0.023) with a medium effect size. There were no differences in sex or study duration between the groups. Current grade point average (GPA) was higher in the blended group. In a multivariable regression model, current GPA and knowledge test scores were associated with the final statistics score after adjusting for study duration and learning modality (p<0.001). Conclusion This study provides empirical evidence to support educator decisions to implement different learning environments for teaching medical statistics to undergraduate medical students. Blended and on-site training formats led to similar knowledge acquisition; however, students with higher GPA preferred the technology assisted learning format. 
Implementation of blended learning approaches can be considered an attractive, cost-effective, and efficient alternative to traditional classroom training in medical statistics. PMID:26859832
[Analysis on willingness to pay for HIV antibody saliva rapid test and related factors].
Li, Junjie; Huo, Junli; Cui, Wenqing; Zhang, Xiujie; Hu, Yi; Su, Xingfang; Zhang, Wanyue; Li, Youfang; Shi, Yuhua; Jia, Manhong
2015-02-01
To understand the willingness to pay for the HIV antibody saliva rapid test and its influencing factors among people seeking counseling and HIV testing, STD clinic patients, university students, migrants, female sex workers (FSWs), men who have sex with men (MSM) and injecting drug users (IDUs). An anonymous questionnaire survey was conducted among 511 subjects in the 7 groups selected by different sampling methods, and 509 valid questionnaires were collected. The majority of subjects were males (54.8%) and aged 20-29 years (41.5%). Among the subjects, 60.3% had an education level of high school or above, 55.4% were unmarried, 37.3% were unemployed, 73.3% had monthly expenditure <2,000 Yuan RMB, 44.2% had received an HIV test, 28.3% knew of the HIV saliva test, 21.0% were willing to receive the HIV saliva test, 2.0% had received the HIV saliva test, only 1.0% had bought an HIV test kit for self-testing, and 84.1% were willing to pay for the HIV antibody saliva rapid test. Univariate logistic regression analysis indicated that subject group, age, education level, employment status, monthly expenditure level, HIV test experience and willingness to receive the HIV saliva test were statistically correlated with willingness to pay for the HIV antibody saliva rapid test. Multivariate logistic regression analysis showed that subject group and monthly expenditure level were statistically correlated with willingness to pay for the HIV antibody saliva rapid test. The willingness to pay for the HIV antibody saliva rapid test and its acceptable price varied across areas and populations. Different populations may have different willingness to pay for the HIV antibody saliva rapid test; the affordability of the test could influence the willingness to pay for it.
Barton, Mitch; Yeatts, Paul E; Henson, Robin K; Martin, Scott B
2016-12-01
There has been a recent call to improve data reporting in kinesiology journals, including the appropriate use of univariate and multivariate analysis techniques. For example, a multivariate analysis of variance (MANOVA) with univariate post hocs and a Bonferroni correction is frequently used to investigate group differences on multiple dependent variables. However, this univariate approach decreases power, increases the risk for Type 1 error, and contradicts the rationale for conducting multivariate tests in the first place. The purpose of this study was to provide a user-friendly primer on conducting descriptive discriminant analysis (DDA), which is a post-hoc strategy to MANOVA that takes into account the complex relationships among multiple dependent variables. A real-world example using the Statistical Package for the Social Sciences syntax and data from 1,095 middle school students on their body composition and body image are provided to explain and interpret the results from DDA. While univariate post hocs increased the risk for Type 1 error to 76%, the DDA identified which dependent variables contributed to group differences and which groups were different from each other. For example, students in the very lean and Healthy Fitness Zone categories for body mass index experienced less pressure to lose weight, more satisfaction with their body, and higher physical self-concept than the Needs Improvement Zone groups. However, perceived pressure to gain weight did not contribute to group differences because it was a suppressor variable. Researchers are encouraged to use DDA when investigating group differences on multiple correlated dependent variables to determine which variables contributed to group differences.
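The inflation of Type 1 error from repeated univariate post hoc tests described above follows from the familywise error rate, which for k independent tests at level alpha is 1 - (1 - alpha)^k. The sketch below is illustrative only: the abstract does not say how many tests produced the 76% figure, so the count of 28 is a hypothetical value that happens to give a similar rate under the independence assumption.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one false positive across n_tests independent
    tests, each run at significance level alpha."""
    if not 0.0 <= alpha <= 1.0 or n_tests < 0:
        raise ValueError("alpha must be in [0, 1] and n_tests non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


# Hypothetical number of uncorrected post hoc tests (not stated in the abstract).
n_tests = 28
print(round(familywise_error(0.05, n_tests), 3))            # about 0.762
# Bonferroni-style per-test level keeps the familywise rate at or below 0.05.
print(familywise_error(0.05 / n_tests, n_tests) <= 0.05)
```

Dependent variables in a MANOVA are usually correlated, so the independence assumption makes this an approximation of the true inflation rather than an exact figure.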
NASA Astrophysics Data System (ADS)
DSouza, Adora M.; Abidin, Anas Z.; Leistritz, Lutz; Wismüller, Axel
2017-02-01
We investigate the applicability of large-scale Granger Causality (lsGC) for extracting a measure of multivariate information flow between pairs of regional brain activities from resting-state functional MRI (fMRI) and test the effectiveness of these measures for predicting a disease state. Such pairwise multivariate measures of interaction provide high-dimensional representations of connectivity profiles for each subject and are used in a machine learning task to distinguish between healthy controls and individuals presenting with symptoms of HIV Associated Neurocognitive Disorder (HAND). Cognitive impairment in several domains can occur as a result of HIV infection of the central nervous system. The current paradigm for assessing such impairment is through neuropsychological testing. With fMRI data analysis, we aim at non-invasively capturing differences in brain connectivity patterns between healthy subjects and subjects presenting with symptoms of HAND. To classify the extracted interaction patterns among brain regions, we use a prototype-based learning algorithm called Generalized Matrix Learning Vector Quantization (GMLVQ). Our approach to characterize connectivity using lsGC followed by GMLVQ for subsequent classification yields good prediction results with an accuracy of 87% and an area under the ROC curve (AUC) of up to 0.90. We obtain a statistically significant improvement (p<0.01) over a conventional Granger causality approach (accuracy = 0.76, AUC = 0.74). The high accuracy and AUC values obtained with our multivariate approach to connectivity analysis suggest that it captures changes in interaction patterns between different brain regions better than the conventional Granger causality analysis known from the literature.
Prognostic value of stromal decorin expression in patients with breast cancer: a meta-analysis.
Li, Shuang-Jiang; Chen, Da-Li; Zhang, Wen-Biao; Shen, Cheng; Che, Guo-Wei
2015-11-01
A number of studies have investigated the biological functions of decorin (DCN) in oncogenesis, tumor progression, angiogenesis and metastasis. Although many of them aim to highlight the prognostic value of stromal DCN expression in breast cancer, some controversial results still exist and a consensus has not yet been reached. Therefore, our meta-analysis aims to determine the prognostic significance of stromal DCN expression in breast cancer patients. PubMed, EMBASE, the Web of Science and China National Knowledge Infrastructure (CNKI) databases were searched for full-text articles that met our inclusion criteria. We applied the hazard ratio (HR) with 95% confidence interval (CI) as the appropriate summary statistic. The Q-test and I(2) statistic were employed to estimate the level of heterogeneity across the included studies. Sensitivity analysis was conducted to further identify the possible origins of heterogeneity. Publication bias was assessed by Begg's test and Egger's test. Three English-language articles (involving 6 studies) were included in our meta-analysis. On the one hand, the summarized outcomes based on both univariate analysis (HR: 0.513; 95% CI: 0.406-0.648; P<0.001) and multivariate analysis (HR: 0.544; 95% CI: 0.388-0.763; P<0.001) indicated that stromal DCN expression was associated with high cancer-specific survival (CSS) in breast cancer patients. On the other hand, the summarized outcomes based on both univariate analysis (HR: 0.504; 95% CI: 0.389-0.651; P<0.001) and multivariate analysis (HR: 0.568; 95% CI: 0.400-0.806; P=0.002) also indicated that stromal DCN expression was positively associated with high disease-free survival (DFS) in breast cancer patients. No significant heterogeneity or publication bias was observed within this meta-analysis. The present evidence indicates that high stromal DCN expression can significantly predict a good prognosis in patients with breast cancer. 
The findings of our meta-analysis should be confirmed in an updated review pooling more relevant investigations in the future.
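The pooling and heterogeneity steps named in this abstract (summary HR, Q-test, I(2)) can be sketched as follows. This is a generic fixed-effect, inverse-variance illustration and not the authors' procedure: the standard error of each log HR is backed out from its 95% CI assuming a normal approximation, the function name `pooled_hr_and_i2` is hypothetical, and the study values in the usage example are invented for illustration.

```python
import math


def pooled_hr_and_i2(studies):
    """Fixed-effect inverse-variance pooling of hazard ratios.

    studies: list of (hr, ci_low, ci_high) tuples with 95% confidence limits.
    Returns (pooled_hr, q_statistic, i2_percent).
    """
    if len(studies) < 2:
        raise ValueError("need at least two studies to assess heterogeneity")
    log_hrs = [math.log(hr) for hr, _, _ in studies]
    # A 95% CI spans 2 * 1.96 standard errors on the log scale.
    ses = [(math.log(hi) - math.log(lo)) / (2 * 1.96) for _, lo, hi in studies]
    weights = [1.0 / se ** 2 for se in ses]
    pooled_log = sum(w * y for w, y in zip(weights, log_hrs)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled_log) ** 2 for w, y in zip(weights, log_hrs))
    df = len(studies) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return math.exp(pooled_log), q, i2


# Hypothetical per-study hazard ratios with 95% CIs (not from the meta-analysis).
example = [(0.50, 0.35, 0.71), (0.55, 0.40, 0.76), (0.48, 0.30, 0.77)]
pooled, q, i2 = pooled_hr_and_i2(example)
print(round(pooled, 3), round(q, 3), round(i2, 1))
```

An I(2) near 0% is consistent with the abstract's statement that no significant heterogeneity was observed; a random-effects model would normally be considered if I(2) were large.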
NASA Technical Reports Server (NTRS)
Djorgovski, S. G.
1994-01-01
We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology, in order to classify the detected objects objectively and uniformly, and facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* Decision Tree artificial induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complex database which may be easily queried and modified when new data or better methods of calibrating or classifying become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects of the SKICAT system, and of some of the scientific results achieved to date. We also developed a user-friendly package for multivariate statistical analysis of small and moderate-size data sets, called STATPROG. 
The package was tested extensively on a number of real scientific applications and has produced real, published results.
Mendell, M J; Eliseeva, E A; Davies, M M; Lobscheid, A
2016-08-01
Limited evidence has associated lower ventilation rates (VRs) in schools with reduced student learning or achievement. We analyzed longitudinal data collected over two school years from 150 classrooms in 28 schools within three California school districts. We estimated daily classroom VRs from real-time indoor carbon dioxide measured by web-connected sensors. School districts provided individual-level scores on standard tests in Math and English, and classroom-level demographic data. Analyses assessing learning effects used two VR metrics: average VRs for 30 days prior to tests, and proportion of prior daily VRs above specified thresholds during the year. We estimated relationships between scores and VR metrics in multivariate models with generalized estimating equations. All school districts had median school-year VRs below the California VR standard. Most models showed some positive associations of VRs with test scores; however, estimates varied in magnitude and few 95% confidence intervals excluded the null. Combined-district models estimated statistically significant increases of 0.6 points (P = 0.01) on English tests for each 10% increase in prior 30-day VRs. Estimated increases in Math were of similar magnitude but not statistically significant. Findings suggest potential small positive associations between classroom VRs and learning. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.
NASA Technical Reports Server (NTRS)
Dimitri, P. S.; Wall, C. 3rd; Oas, J. G.; Rauch, S. D.
2001-01-01
Meniere's disease (MD) and migraine associated dizziness (MAD) are two disorders that can have similar symptomatologies, but differ vastly in treatment. Vestibular testing is sometimes used to help differentiate between these disorders, but the inefficiency of a human interpreter analyzing a multitude of variables independently decreases its utility. Our hypothesis was that we could objectively discriminate between patients with MD and those with MAD using select variables from the vestibular test battery. Sinusoidal harmonic acceleration test variables were reduced to three vestibulo-ocular reflex physiologic parameters: gain, time constant, and asymmetry. A combination of these parameters plus a measurement of reduced vestibular response from caloric testing allowed us to achieve a joint classification rate of 91% with an independent quadratic classification algorithm. Data from posturography were not useful for this type of differentiation. Overall, our classification function can be used as an unbiased assistant to discriminate between MD and MAD and gave us insight into the pathophysiologic differences between the two disorders.
Bogart, Laura M; Howerton, Devery; Lange, James; Setodji, Claude Messan; Becker, Kirsten; Klein, David J; Asch, Steven M
2010-06-01
We examined provider-reported barriers to rapid HIV testing in U.S. urban non-profit community clinics, community-based organizations (CBOs), and hospitals. 12 primary metropolitan statistical areas (PMSAs; three per region) were sampled randomly, with sampling weights proportional to AIDS case reports. Across PMSAs, all 671 hospitals and a random sample of 738 clinics/CBOs were telephoned for a survey on rapid HIV test availability. Of the 671 hospitals, 172 hospitals were randomly selected for barriers questions, for which 158 laboratory and 136 department staff were eligible and interviewed in 2005. Of the 738 clinics/CBOs, 276 were randomly selected for barriers questions, 206 were reached, and 118 were eligible and interviewed in 2005-2006. In multivariate models, barriers regarding translation of administrative/quality assurance policies into practice were significantly associated with rapid HIV testing availability. For greater rapid testing diffusion, policies are needed to reduce administrative barriers and provide quality assurance training to non-laboratory staff.
Newell, John D; Fuld, Matthew K; Allmendinger, Thomas; Sieren, Jered P; Chan, Kung-Sik; Guo, Junfeng; Hoffman, Eric A
2015-01-01
The purpose of this study was to evaluate the impact of ultralow radiation dose single-energy computed tomographic (CT) acquisitions with Sn prefiltration and third-generation iterative reconstruction on density-based quantitative measures of growing interest in phenotyping pulmonary disease. The effects of both decreasing dose and different body habitus on the accuracy of the mean CT attenuation measurements and the level of image noise (SD) were evaluated using the COPDGene 2 test object, containing 8 different materials of interest ranging from air to acrylic and including various density foams. A third-generation dual-source multidetector CT scanner (Siemens SOMATOM FORCE; Siemens Healthcare AG, Erlangen, Germany) running advanced modeled iterative reconstruction (ADMIRE) software (Siemens Healthcare AG) was used. We used normal and very large body habitus rings at dose levels varying from 1.5 to 0.15 mGy using a spectral-shaped (0.6-mm Sn) tube output of 100 kV(p). Three CT scans were obtained at each dose level using both rings. Regions of interest for each material in the test object scans were automatically extracted. The Hounsfield unit values of each material using weighted filtered back projection (WFBP) at 1.5 mGy were used as the reference values to evaluate shifts in CT attenuation at lower dose levels using either WFBP or ADMIRE. Statistical analysis included basic statistics, Welch t tests, and a multivariable covariant model, including reconstruction method, that used the F test to assess the significance of the explanatory (independent) variables on the response (dependent) variable, mean CT attenuation. Multivariable regression analysis of the mean CT attenuation values showed a significant difference with decreasing dose between ADMIRE and WFBP. ADMIRE had reduced noise and more stable CT attenuation compared with WFBP. 
There was a strong effect on the mean CT attenuation values of the scanned materials for ring size (P < 0.0001) and dose level (P < 0.0001). The number of voxels in the region of interest for the particular material studied did not demonstrate a significant effect (P > 0.05). The SD was lower with ADMIRE compared with WFBP at all dose levels and ring sizes (P < 0.05). The third-generation dual-source CT scanners using third-generation iterative reconstruction methods can acquire accurate quantitative CT images with acceptable image noise at very low-dose levels (0.15 mGy). This opens up new diagnostic and research opportunities in CT phenotyping of the lung for developing new treatments and increased understanding of pulmonary disease.
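The Welch t test named in the statistical analysis above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function name `welch_t` and the Hounsfield unit samples are hypothetical, and only the test statistic and the Welch-Satterthwaite degrees of freedom are computed.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch's unequal-variance t statistic and its approximate degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Hypothetical ROI mean attenuation values in HU (NOT data from the study)
wfbp = [-856.1, -859.4, -853.0, -861.2]
admire = [-857.0, -857.9, -856.4, -858.3]
t_stat, dof = welch_t(wfbp, admire)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

A p-value would then come from the t distribution with `dof` degrees of freedom; `scipy.stats.ttest_ind(x, y, equal_var=False)` performs the same test in one call if SciPy is available.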
Haile, Demewoz; Nigatu, Dabere; Gashaw, Ketema; Demelash, Habtamu
2016-01-01
Academic achievement of school age children can be affected by several factors such as nutritional status, demographics, and socioeconomic factors. Though evidence about the magnitude of malnutrition is well established in Ethiopia, there is a paucity of evidence about the association of nutritional status with academic performance among the nation's school age children. Hence, this study aimed to determine how nutritional status and cognitive function are associated with academic performance of school children in Goba town, South East Ethiopia. An institution based cross-sectional study was conducted among 131 school age students from primary schools in Goba town enrolled during the 2013/2014 academic year. The nutritional status of students was assessed by anthropometric measurement, while cognitive function was measured by the Kaufman Assessment Battery for Children (KABC-II) and Raven's Colored Progressive Matrices (Raven's CPM) tests. The academic performance of the school children was measured by collecting the preceding semester academic result from the school record. Descriptive statistics, bivariate and multivariable linear regression were used in the statistical analysis. This study found a statistically significant positive association between all cognitive test scores and average academic performance except for number recall (p = 0.12) and hand movements (p = 0.08). The correlation between all cognitive test scores and mathematics score was positive and statistically significant (p < 0.05). In the multivariable linear regression model, better wealth index was significantly associated with higher mathematics score (β = 0.63; 95 % CI: 0.12-0.74). Similarly, a unit change in height for age z score resulted in a 2.11 unit change in mathematics score (β = 2.11; 95 % CI: 0.002-4.21). A single unit change of wealth index resulted in a 0.53 unit change in average score of all academic subjects among school age children (β = 0.53; 95 % CI: 0.11-0.95).
A single unit change of age resulted in a 3.23 unit change in average score of all academic subjects among school age children (β = 3.23; 95 % CI: 1.20-5.27). Nutritional status (height for age Z score) and wealth could be modifiable factors to improve academic performance of school age children. Moreover, interventions to improve nutrition for mothers and children may be an important contributor to academic success and national economic growth in Ethiopia. Further study with a strong design and a large sample size is needed.
Sheehan, D V; Sheehan, K H
1982-08-01
The history of the classification of anxiety, hysterical, and hypochondriacal disorders is reviewed. Problems in the ability of current classification schemes to predict, control, and describe the relationship between the symptoms and other phenomena are outlined. Existing classification schemes failed the first test of a good classification model--that of providing categories that are mutually exclusive. The independence of these diagnostic categories from each other does not appear to hold up on empirical testing. In the absence of inherently mutually exclusive categories, further empirical investigation of these classes is obstructed since statistically valid analysis of the nominal data and any useful multivariate analysis would be difficult if not impossible. It is concluded that the existing classifications are unsatisfactory and require some fundamental reconceptualization.
The impact of moderate wine consumption on the risk of developing prostate cancer.
Vartolomei, Mihai Dorin; Kimura, Shoji; Ferro, Matteo; Foerster, Beat; Abufaraj, Mohammad; Briganti, Alberto; Karakiewicz, Pierre I; Shariat, Shahrokh F
2018-01-01
To investigate the impact of moderate wine consumption on the risk of prostate cancer (PCa). We focused on the differential effect of moderate consumption of red versus white wine. This study was a meta-analysis that includes data from case-control and cohort studies. A systematic search of Web of Science, Medline/PubMed, and Cochrane library was performed on December 1, 2017. Studies were deemed eligible if they assessed the risk of PCa due to red, white, or any wine using multivariable logistic regression analysis. We performed a formal meta-analysis for the risk of PCa according to moderate wine and wine type consumption (white or red). Heterogeneity between studies was assessed using Cochran's Q test and I² statistics. Publication bias was assessed using Egger's regression test. A total of 930 abstracts and titles were initially identified. After removal of duplicates, reviews, and conference abstracts, 83 full-text original articles were screened. Seventeen studies (611,169 subjects) fulfilled the inclusion criteria and were included for final evaluation. In the case of moderate wine consumption, the pooled risk ratio (RR) for the risk of PCa was 0.98 (95% CI 0.92-1.05, p = 0.57) in the multivariable analysis. Moderate white wine consumption increased the risk of PCa with a pooled RR of 1.26 (95% CI 1.10-1.43, p = 0.001) in the multivariable analysis. Meanwhile, moderate red wine consumption had a protective role, reducing the risk by 12% (RR 0.88, 95% CI 0.78-0.999, p = 0.047) in the multivariable analysis that comprised 222,447 subjects. In this meta-analysis, moderate wine consumption did not impact the risk of PCa. Interestingly, regarding the type of wine, moderate consumption of white wine increased the risk of PCa, whereas moderate consumption of red wine had a protective effect. Further analyses are needed to assess the differential molecular effect of white and red wine conferring their impact on PCa risk.
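The heterogeneity measures mentioned in this abstract (Cochran's Q and I²) can be illustrated with a small sketch. It assumes inverse-variance fixed-effect pooling of log risk ratios, which the abstract does not specify; the function name `pooled_fixed_effect` and all study-level numbers are hypothetical and are not taken from the meta-analysis.

```python
import math

def pooled_fixed_effect(log_rr, se):
    """Inverse-variance pooled estimate, Cochran's Q, and I^2 (in percent)."""
    weights = [1.0 / s ** 2 for s in se]
    pooled = sum(w * y for w, y in zip(weights, log_rr)) / sum(weights)
    # Cochran's Q: weighted squared deviations of each study from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rr))
    df = len(log_rr) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at zero
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2

# Hypothetical study-level risk ratios and standard errors of log(RR)
rr = [0.90, 1.10, 0.95, 1.30]
se = [0.10, 0.15, 0.08, 0.20]
pooled, q, i2 = pooled_fixed_effect([math.log(r) for r in rr], se)
print(f"pooled RR = {math.exp(pooled):.3f}, Q = {q:.2f}, I^2 = {i2:.1f}%")
```

Under the null hypothesis of homogeneity, Q is compared with a chi-square distribution on k - 1 degrees of freedom, where k is the number of studies.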
NASA Astrophysics Data System (ADS)
Belianinov, Alex; Ganesh, Panchapakesan; Lin, Wenzhi; Sales, Brian C.; Sefat, Athena S.; Jesse, Stephen; Pan, Minghu; Kalinin, Sergei V.
2014-12-01
Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1-xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.
NASA Astrophysics Data System (ADS)
Brizzi, S.; Sandri, L.; Funiciello, F.; Corbi, F.; Piromallo, C.; Heuret, A.
2018-03-01
The observed maximum magnitude of subduction megathrust earthquakes is highly variable worldwide. One key question is which conditions, if any, favor the occurrence of giant earthquakes (Mw ≥ 8.5). Here we carry out a multivariate statistical study in order to investigate the factors affecting the maximum magnitude of subduction megathrust earthquakes. We find that the trench-parallel extent of subduction zones and the thickness of trench sediments provide the largest discriminating capability between subduction zones that have experienced giant earthquakes and those having significantly lower maximum magnitude. Monte Carlo simulations show that the observed spatial distribution of giant earthquakes cannot be explained by pure chance to a statistically significant level. We suggest that the combination of a long subduction zone with thick trench sediments likely promotes a great lateral rupture propagation, characteristic of almost all giant earthquakes.
Testing the causality of Hawkes processes with time reversal
NASA Astrophysics Data System (ADS)
Cordi, Marcus; Challet, Damien; Muni Toke, Ioane
2018-03-01
We show that univariate and symmetric multivariate Hawkes processes are only weakly causal: the true log-likelihoods of real and reversed event time vectors are almost equal, thus parameter estimation via maximum likelihood only weakly depends on the direction of the arrow of time. In ideal (synthetic) conditions, tests of goodness of parametric fit unambiguously reject backward event times, which implies that inferring kernels from time-symmetric quantities, such as the autocovariance of the event rate, only rarely produce statistically significant fits. Finally, we find that fitting financial data with many-parameter kernels may yield significant fits for both arrows of time for the same event time vector, sometimes favouring the backward time direction. This goes to show that a significant fit of Hawkes processes to real data with flexible kernels does not imply a definite arrow of time unless one tests it.
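The forward-versus-reversed comparison described above can be sketched for the univariate case. This is only an illustration under assumptions not stated in the abstract: an exponential kernel alpha * exp(-beta * (t - t_i)), parameters held fixed rather than fitted by maximum likelihood, and hypothetical event times. The function names `hawkes_loglik` and `reverse_times` are invented for the example.

```python
import math

def hawkes_loglik(times, T, mu, alpha, beta):
    """Log-likelihood on [0, T] of a univariate Hawkes process with intensity
    mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i)); times must be sorted."""
    loglik = -mu * T  # compensator of the baseline rate
    a = 0.0           # recursive sum of exp(-beta * (t_i - t_j)) over j < i
    prev = None
    for t in times:
        if prev is not None:
            a = math.exp(-beta * (t - prev)) * (1.0 + a)
        loglik += math.log(mu + alpha * a)  # log-intensity at the event
        # compensator contribution of the kernel started by this event
        loglik -= (alpha / beta) * (1.0 - math.exp(-beta * (T - t)))
        prev = t
    return loglik

def reverse_times(times, T):
    """Reflect event times about the observation window: t -> T - t."""
    return sorted(T - t for t in times)

# Hypothetical clustered event times on [0, 10] (NOT data from the paper)
events = [0.5, 0.6, 0.7, 2.0, 5.0, 5.1, 5.15, 9.0]
T = 10.0
forward = hawkes_loglik(events, T, mu=0.4, alpha=0.8, beta=2.0)
backward = hawkes_loglik(reverse_times(events, T), T, mu=0.4, alpha=0.8, beta=2.0)
print(f"forward = {forward:.4f}, backward = {backward:.4f}")
```

A fuller comparison would maximize the likelihood separately for each direction and then apply a goodness-of-fit test, as the abstract indicates.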
Comparative Research of Navy Voluntary Education at Operational Commands
2017-03-01
Subject terms: return on investment, ROI, logistic regression, multivariate analysis, descriptive statistics, Markov, time-series, linear programming. Contents listed include B. Descriptive Statistics Tables and C. Privacy Considerations; Table 1, Variables and Descriptions, is adapted from NETC (2016).
ERIC Educational Resources Information Center
Henry, Gary T.; And Others
1992-01-01
A statistical technique is presented for developing performance standards based on benchmark groups. The benchmark groups are selected using a multivariate technique that relies on a squared Euclidean distance method. For each observation unit (a school district in the example), a unique comparison group is selected. (SLD)
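Selecting a unique comparison group for each unit by squared Euclidean distance might look like the sketch below. The abstract does not give the variables, the scaling, or the group size, so the z-score standardization, the district data, and the names `standardize` and `benchmark_group` are all assumptions made for illustration.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def standardize(rows):
    """z-score each column so that no single variable dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / (len(c) - 1)) ** 0.5
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def benchmark_group(units, target, k):
    """Return the k units closest to `target`, nearest first."""
    names = list(units)
    z = dict(zip(names, standardize([units[n] for n in names])))
    others = [n for n in names if n != target]
    others.sort(key=lambda n: squared_euclidean(z[target], z[n]))
    return others[:k]

# Hypothetical districts: enrollment, share of low-income pupils, spending per pupil
districts = {
    "A": [1000, 0.30, 8000],
    "B": [1100, 0.32, 8200],
    "C": [5000, 0.60, 6000],
    "D": [950, 0.28, 7900],
    "E": [5200, 0.65, 6100],
}
print(benchmark_group(districts, "A", k=2))
```

Because the group is recomputed for every target, each district receives its own set of peers rather than membership in a fixed cluster.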
Most analyses of daily time series epidemiology data relate mortality or morbidity counts to PM and other air pollutants by means of single-outcome regression models using multiple predictors, without taking into account the complex statistical structure of the predictor variable...
Challenging Conventional Wisdom for Multivariate Statistical Models with Small Samples
ERIC Educational Resources Information Center
McNeish, Daniel
2017-01-01
In education research, small samples are common because of financial limitations, logistical challenges, or exploratory studies. With small samples, statistical principles on which researchers rely do not hold, leading to trust issues with model estimates and possible replication issues when scaling up. Researchers are generally aware of such…
Yan, Zhengbing; Kuang, Te-Hui; Yao, Yuan
2017-09-01
In recent years, multivariate statistical monitoring of batch processes has become a popular research topic, wherein multivariate fault isolation is an important step aiming at the identification of the faulty variables contributing most to the detected process abnormality. Although contribution plots have been commonly used in statistical fault isolation, such methods suffer from the smearing effect between correlated variables. In particular, in batch process monitoring, the high autocorrelations and cross-correlations that exist in variable trajectories make the smearing effect unavoidable. To address such a problem, a variable selection-based fault isolation method is proposed in this research, which transforms the fault isolation problem into a variable selection problem in partial least squares discriminant analysis and solves it by calculating a sparse partial least squares model. As different from the traditional methods, the proposed method emphasizes the relative importance of each process variable. Such information may help process engineers in conducting root-cause diagnosis. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Odgaard, Eric C; Fowler, Robert L
2010-06-01
In 2005, the Journal of Consulting and Clinical Psychology (JCCP) became the first American Psychological Association (APA) journal to require statistical measures of clinical significance, plus effect sizes (ESs) and associated confidence intervals (CIs), for primary outcomes (La Greca, 2005). As this represents the single largest editorial effort to improve statistical reporting practices in any APA journal in at least a decade, in this article we investigate the efficacy of that change. All intervention studies published in JCCP in 2003, 2004, 2007, and 2008 were reviewed. Each article was coded for method of clinical significance, type of ES, and type of associated CI, broken down by statistical test (F, t, chi-square, r/R(2), and multivariate modeling). By 2008, clinical significance compliance was 75% (up from 31%), with 94% of studies reporting some measure of ES (reporting improved for individual statistical tests ranging from eta(2) = .05 to .17, with reasonable CIs). Reporting of CIs for ESs also improved, although only to 40%. Also, the vast majority of reported CIs used approximations, which become progressively less accurate for smaller sample sizes and larger ESs (cf. Algina & Kessleman, 2003). Changes are near asymptote for ESs and clinical significance, but CIs lag behind. As CIs for ESs are required for primary outcomes, we show how to compute CIs for the vast majority of ESs reported in JCCP, with an example of how to use CIs for ESs as a method to assess clinical significance.
Yeo, Heather; Niland, Joyce; Milne, Dana; ter Veer, Anna; Bekaii-Saab, Tanios; Farma, Jeffrey M.; Lai, Lily; Skibber, John M.; Small, William; Wilkinson, Neal; Schrag, Deborah
2015-01-01
Background: Laparoscopic colectomy has been shown to have equivalent oncologic outcomes to open colectomy for the management of colon cancer, but its adoption nationally has been slow. This study investigates the prevalence and factors associated with laparoscopic colorectal resection at National Comprehensive Cancer Network (NCCN) centers. Methods: Data on patients undergoing surgery for colon and rectal cancer at NCCN centers from 2005 to 2010 were obtained from chart review of medical records for the NCCN Outcomes Project and included information on socioeconomic status, insurance coverage, comorbidity, and physician-reported Eastern Cooperative Oncology Group (ECOG) performance status. Associations between receipt of minimally invasive surgery and patient and clinical variables were analyzed with univariate and multivariable logistic regression. All statistical tests were two-sided. Results: A total of 4032 patients, diagnosed between September 2005 and December 2010, underwent elective colon or rectal resection for cancer at NCCN centers. Median age of colon cancer patients was 62.6 years, and 49% were men. The percent of colon cancer patients treated with minimally invasive surgery (MIS) increased from 35% in 2006 to 51% in 2010 across all centers but varied statistically significantly between centers. On multivariable analysis, factors associated with minimally invasive surgery for colon cancer patients who had surgery at an NCCN institution were older age (P = .02), male sex (P = .006), fewer comorbidities (P ≤ .001), lower final T-stage (P < .001), median household income greater than or equal to $80000 (P < .001), ECOG performance status = 0 (P = .02), and NCCN institution (P ≤ .001). Conclusions: The use of MIS increased at NCCN centers. However, there was statistically significant variation in adoption of MIS technique among centers. PMID:25527640
Péron, Julien; Pond, Gregory R; Gan, Hui K; Chen, Eric X; Almufti, Roula; Maillet, Denis; You, Benoit
2012-07-03
The Consolidated Standards of Reporting Trials (CONSORT) guidelines were developed in the mid-1990s for the explicit purpose of improving clinical trial reporting. However, there is little information regarding the adherence to CONSORT guidelines of recent publications of randomized controlled trials (RCTs) in oncology. All phase III RCTs published between 2005 and 2009 were reviewed using an 18-point overall quality score for reporting based on the 2001 CONSORT statement. Multivariable linear regression was used to identify features associated with improved reporting quality. To provide baseline data for future evaluations of reporting quality, RCTs were also assessed according to the 2010 revised CONSORT statement. All statistical tests were two-sided. A total of 357 RCTs were reviewed. The mean 2001 overall quality score was 13.4 on a scale of 0-18, whereas the mean 2010 overall quality score was 19.3 on a scale of 0-27. The overall RCT reporting quality score improved by 0.21 points per year from 2005 to 2009. Poorly reported items included method used to generate the random allocation (adequately reported in 29% of trials), whether and how blinding was applied (41%), method of allocation concealment (51%), and participant flow (59%). High impact factor (IF, P = .003), recent publication date (P = .008), and geographic origin of RCTs (P = .003) were independent factors statistically significantly associated with higher reporting quality in a multivariable regression model. Sample size, tumor type, and positivity of trial results were not associated with higher reporting quality, whereas funding source and treatment type had a borderline statistically significant impact. The results show that numerous items remained unreported for many trials. Thus, given the potential impact of poorly reported trials, oncology journals should require even stricter adherence to the CONSORT guidelines.
Pingault, Jean Baptiste; Côté, Sylvana M.; Petitclerc, Amélie; Vitaro, Frank; Tremblay, Richard E.
2015-01-01
Background: Parental educational expectations have been associated with children’s educational attainment in a number of long-term longitudinal studies, but whether this relationship is causal has long been debated. The aims of this prospective study were twofold: 1) test whether low maternal educational expectations contributed to failure to graduate from high school; and 2) compare the results obtained using different strategies for accounting for confounding variables (i.e. multivariate regression and propensity score matching). Methodology/Principal Findings: The study sample included 1,279 participants from the Quebec Longitudinal Study of Kindergarten Children. Maternal educational expectations were assessed when the participants were aged 12 years. High school graduation – measuring educational attainment – was determined through the Quebec Ministry of Education when the participants were aged 22–23 years. Findings show that when using the most common statistical approach (i.e. multivariate regressions to adjust for a restricted set of potential confounders) the contribution of low maternal educational expectations to failure to graduate from high school was statistically significant. However, when using propensity score matching, the contribution of maternal expectations was reduced and remained statistically significant only for males. Conclusions/Significance: The results of this study are consistent with the possibility that the contribution of parental expectations to educational attainment is overestimated in the available literature. This may be explained by the use of a restricted range of potential confounding variables as well as the dearth of studies using appropriate statistical techniques and study designs in order to minimize confounding.
Each of these techniques and designs, including propensity score matching, has its strengths and limitations: A more comprehensive understanding of the causal role of parental expectations will stem from a convergence of findings from studies using different techniques and designs. PMID:25803867
Lake bed classification using acoustic data
Yin, Karen K.; Li, Xing; Bonde, John; Richards, Carl; Cholwek, Gary
1998-01-01
As part of our effort to identify the lake bed surficial substrates using remote sensing data, this work designs pattern classifiers by multivariate statistical methods. The probability distribution of the preprocessed acoustic signal is analyzed first. A confidence region approach is then adopted to improve the design of the existing classifier. A technique for further isolation is proposed which minimizes the expected loss from misclassification. The devices constructed are applicable for real-time lake bed categorization. A minimax approach is suggested to treat more general cases where the a priori probability distribution of the substrate types is unknown. Comparison of the suggested methods with the traditional likelihood ratio tests is discussed.
Smith, Joseph M.; Mather, Martha E.
2012-01-01
Ecological indicators are science-based tools used to assess how human activities have impacted environmental resources. For monitoring and environmental assessment, existing species assemblage data can be used to make these comparisons through time or across sites. An impediment to using assemblage data, however, is that these data are complex and need to be simplified in an ecologically meaningful way. Because multivariate statistics are mathematical relationships, statistical groupings may not make ecological sense and will not have utility as indicators. Our goal was to define a process to select defensible and ecologically interpretable statistical simplifications of assemblage data in which researchers and managers can have confidence. For this, we chose a suite of statistical methods, compared the groupings that resulted from these analyses, identified convergence among groupings, then we interpreted the groupings using species and ecological guilds. When we tested this approach using a statewide stream fish dataset, not all statistical methods worked equally well. For our dataset, logistic regression (Log), detrended correspondence analysis (DCA), cluster analysis (CL), and non-metric multidimensional scaling (NMDS) provided consistent, simplified output. Specifically, the Log, DCA, CL-1, and NMDS-1 groupings were ≥60% similar to each other, overlapped with the fluvial-specialist ecological guild, and contained a common subset of species. Groupings based on number of species (e.g., Log, DCA, CL and NMDS) outperformed groupings based on abundance [e.g., principal components analysis (PCA) and Poisson regression]. Although the specific methods that worked on our test dataset have generality, here we are advocating a process (e.g., identifying convergent groupings with redundant species composition that are ecologically interpretable) rather than the automatic use of any single statistical tool. 
We summarize this process in step-by-step guidance for the future use of these commonly available ecological and statistical methods in preparing assemblage data for use in ecological indicators.
Huynh-Thu, Vân Anh; Saeys, Yvan; Wehenkel, Louis; Geurts, Pierre
2012-07-01
Univariate statistical tests are widely used for biomarker discovery in bioinformatics. These procedures are simple and fast, and their output is easily interpretable by biologists, but they can only identify variables that provide a significant amount of information in isolation from the other variables. As biological processes are expected to involve complex interactions between variables, univariate methods thus potentially miss some informative biomarkers. Variable relevance scores provided by machine learning techniques, however, are potentially able to highlight multivariate interacting effects, but unlike the p-values returned by univariate tests, these relevance scores are usually not statistically interpretable. This lack of interpretability hampers the determination of a relevance threshold for extracting a feature subset from the rankings and also prevents the wide adoption of these methods by practitioners. We evaluated several existing and novel procedures that extract relevant features from rankings derived from machine learning approaches. These procedures replace the relevance scores with measures that can be interpreted in a statistical way, such as p-values, false discovery rates, or family wise error rates, for which it is easier to determine a significance level. Experiments were performed on several artificial problems as well as on real microarray datasets. Although the methods differ in terms of computing times and the tradeoff they achieve between false positives and false negatives, some of them greatly help in the extraction of truly relevant biomarkers and should thus be of great practical interest for biologists and physicians. As a side conclusion, our experiments also clearly highlight that using model performance as a criterion for feature selection is often counter-productive. Python source code for all tested methods, as well as the MATLAB scripts used for data simulation, can be found in the Supplementary Material.
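One generic way to turn a relevance score into a p-value is a label-permutation test, sketched below. This is not claimed to be any of the procedures evaluated in the paper: the score used here (absolute Pearson correlation) is a simple stand-in for a machine learning relevance score, and the names `abs_corr` and `permutation_pvalue` as well as the data are hypothetical.

```python
import random

def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a stand-in relevance score."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5

def permutation_pvalue(feature, labels, score=abs_corr, n_perm=999, seed=0):
    """Estimate P(score >= observed) under random relabeling of the samples."""
    rng = random.Random(seed)
    observed = score(feature, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between feature and label
        if score(feature, shuffled) >= observed:
            hits += 1
    # add-one correction keeps the estimate strictly positive
    return (hits + 1) / (n_perm + 1)

# Hypothetical expression values for one feature and binary class labels
feature = [float(i) for i in range(20)]
labels = [0] * 10 + [1] * 10
print(permutation_pvalue(feature, labels))
```

Repeating this for every feature yields one p-value per feature, to which a false discovery rate or family-wise error rate correction could then be applied.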
FADTTS: functional analysis of diffusion tensor tract statistics.
Zhu, Hongtu; Kong, Linglong; Li, Runze; Styner, Martin; Gerig, Guido; Lin, Weili; Gilmore, John H
2011-06-01
The aim of this paper is to present a functional analysis of diffusion tensor tract statistics (FADTTS) pipeline for delineating the association between multiple diffusion properties along major white matter fiber bundles with a set of covariates of interest, such as age, diagnostic status and gender, and the structure of the variability of these white matter tract properties in various diffusion tensor imaging studies. The FADTTS integrates five statistical tools: (i) a multivariate varying coefficient model for allowing the varying coefficient functions in terms of arc length to characterize the varying associations between fiber bundle diffusion properties and a set of covariates, (ii) a weighted least squares estimation of the varying coefficient functions, (iii) a functional principal component analysis to delineate the structure of the variability in fiber bundle diffusion properties, (iv) a global test statistic to test hypotheses of interest, and (v) a simultaneous confidence band to quantify the uncertainty in the estimated coefficient functions. Simulated data are used to evaluate the finite sample performance of FADTTS. We apply FADTTS to investigate the development of white matter diffusivities along the splenium of the corpus callosum tract and the right internal capsule tract in a clinical study of neurodevelopment. FADTTS can be used to facilitate the understanding of normal brain development, the neural bases of neuropsychiatric disorders, and the joint effects of environmental and genetic factors on white matter fiber bundles. The advantages of FADTTS compared with the other existing approaches are that it is capable of modeling the structured inter-subject variability, testing the joint effects, and constructing their simultaneous confidence bands. However, FADTTS is not crucial for estimation and reduces to the functional analysis method for the single measure. Copyright © 2011 Elsevier Inc. All rights reserved.
Anger Expression Types and Interpersonal Problems in Nurses.
Han, Aekyung; Won, Jongsoon; Kim, Oksoo; Lee, Sang E
2015-06-01
The purpose of this study was to investigate the anger expression types in nurses and to analyze the differences between the anger expression types and interpersonal problems. The data were collected from 149 nurses working in general hospitals with 300 beds or more in Seoul or Gyeonggi province, Korea. For anger expression type, the anger expression scale from the Korean State-Trait Anger Expression Inventory was used. For interpersonal problems, the short form of the Korean Inventory of Interpersonal Problems Circumplex Scales was used. Data were analyzed using descriptive statistics, cluster analysis, multivariate analysis of variance, and Duncan's multiple comparisons test. Three anger expression types in nurses were found: low-anger expression, anger-in, and anger-in/control type. From the results of multivariate analysis of variance, there were significant differences between anger expression types and interpersonal problems (Wilks lambda F = 3.52, p < .001). Additionally, anger-in/control type was found to have the most difficulty with interpersonal problems by Duncan's post hoc test (p < .050). Based on this research, the development of an anger expression intervention program for nurses is recommended to establish the means of expressing the suppressed emotions, which would help the nurses experience fewer interpersonal problems. Copyright © 2015. Published by Elsevier B.V.
Kernel canonical-correlation Granger causality for multiple time series
NASA Astrophysics Data System (ADS)
Wu, Guorong; Duan, Xujun; Liao, Wei; Gao, Qing; Chen, Huafu
2011-04-01
Canonical-correlation analysis as a multivariate statistical technique has been applied to multivariate Granger causality analysis to infer information flow in complex systems. It shows unique appeal and great superiority over the traditional vector autoregressive method, due to the simplified procedure that detects causal interaction between multiple time series, and the avoidance of potential model estimation problems. However, it is limited to the linear case. Here, we extend the framework of canonical correlation to include the estimation of multivariate nonlinear Granger causality for drawing inference about directed interaction. Its feasibility and effectiveness are verified on simulated data.
Narayanan, Roshni; Nugent, Rebecca; Nugent, Kenneth
2015-10-01
Accreditation Council for Graduate Medical Education guidelines require internal medicine residents to develop skills in the interpretation of medical literature and to understand the principles of research. A necessary component is the ability to understand the statistical methods used and their results, material that is not an in-depth focus of most medical school curricula and residency programs. Given the breadth and depth of the current medical literature and an increasing emphasis on complex, sophisticated statistical analyses, the statistical foundation and education necessary for residents are uncertain. We reviewed the statistical methods and terms used in 49 articles discussed at the journal club in the Department of Internal Medicine residency program at Texas Tech University between January 1, 2013 and June 30, 2013. We collected information on the study type and on the statistical methods used for summarizing and comparing samples, determining the relations between independent variables and dependent variables, and estimating models. We then identified the typical statistics education level at which each term or method is learned. A total of 14 articles came from the Journal of the American Medical Association Internal Medicine, 11 from the New England Journal of Medicine, 6 from the Annals of Internal Medicine, 5 from the Journal of the American Medical Association, and 13 from other journals. Twenty reported randomized controlled trials. Summary statistics included mean values (39 articles), category counts (38), and medians (28). Group comparisons were based on t tests (14 articles), χ2 tests (21), and nonparametric ranking tests (10). The relations between dependent and independent variables were analyzed with simple regression (6 articles), multivariate regression (11), and logistic regression (8). Nine studies reported odds ratios with 95% confidence intervals, and seven analyzed test performance using sensitivity and specificity calculations. 
These papers used 128 statistical terms and context-defined concepts, including some from data analysis (56), epidemiology-biostatistics (31), modeling (24), data collection (12), and meta-analysis (5). Ten different software programs were used in these articles. Based on usual undergraduate and graduate statistics curricula, 64.3% of the concepts and methods used in these papers required at least a master's degree-level statistics education. The interpretation of the current medical literature can require an extensive background in statistical methods at an education level exceeding the material and resources provided to most medical students and residents. Given the complexity and time pressure of medical education, these deficiencies will be hard to correct, but this project can serve as a basis for developing a curriculum in study design and statistical methods needed by physicians-in-training.
Defining the ecological hydrology of Taiwan Rivers using multivariate statistical methods
NASA Astrophysics Data System (ADS)
Chang, Fi-John; Wu, Tzu-Ching; Tsai, Wen-Ping; Herricks, Edwin E.
2009-09-01
Summary: The identification and verification of ecohydrologic flow indicators has found new support as the importance of ecological flow regimes is recognized in modern water resources management, particularly in river restoration and reservoir management. An ecohydrologic indicator system reflecting the unique characteristics of Taiwan's water resources and hydrology has been developed, the Taiwan ecohydrological indicator system (TEIS). A major challenge for the water resources community is using the TEIS to provide environmental flow rules that improve existing water resources management. This paper examines data from the extensive network of flow monitoring stations in Taiwan using TEIS statistics to define and refine environmental flow options in Taiwan. Multivariate statistical methods were used to examine TEIS statistics for 102 stations representing the geographic and land use diversity of Taiwan. The Pearson correlation coefficient showed high multicollinearity between the TEIS statistics. Watersheds were separated into upper and lower-watershed locations. An analysis of variance indicated significant differences between upstream, more natural, and downstream, more developed, locations in the same basin with hydrologic indicator redundancy in flow change and magnitude statistics. Issues of multicollinearity were examined using a Principal Component Analysis (PCA) with the first three components related to general flow and high/low flow statistics, frequency and time statistics, and quantity statistics. These principal components explain about 85% of the total variation. A major conclusion is that managers must be aware of differences among basins, as well as differences within basins that will require careful selection of management procedures to achieve needed flow regimes.
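The PCA step described in this abstract can be illustrated with a minimal sketch. The data below are simulated and purely hypothetical (102 rows to mirror the number of stations, nine made-up indicators driven by three latent factors); they are not the TEIS statistics, and the function name is an assumption for illustration only.

```python
import numpy as np

def pca_explained_variance(X):
    """Return the explained-variance ratio of each principal component of X."""
    # Standardize each indicator so that differences in scale do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of the standardized data give component variances.
    s = np.linalg.svd(Z, compute_uv=False)
    var = s**2 / (Z.shape[0] - 1)
    return var / var.sum()

# Hypothetical data: 102 stations, 9 correlated indicators from 3 latent factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(102, 3))
loadings = rng.normal(size=(3, 9))
X = factors @ loadings + 0.3 * rng.normal(size=(102, 9))

ratios = pca_explained_variance(X)
cumulative = np.cumsum(ratios)
print("cumulative variance explained by the first three components:",
      round(float(cumulative[2]), 3))
```

When indicators are strongly correlated, a few components typically carry most of the variance, which is the sense in which the abstract reports about 85% for the first three.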
Characterizations of linear sufficient statistics
NASA Technical Reports Server (NTRS)
Peters, B. C., Jr.; Reoner, R.; Decell, H. P., Jr.
1977-01-01
A surjective bounded linear operator T from a Banach space X to a Banach space Y must be a sufficient statistic for a dominated family of probability measures defined on the Borel sets of X. These results were applied, so that they characterize linear sufficient statistics for families of the exponential type, including as special cases the Wishart and multivariate normal distributions. The latter result was used to establish precisely which procedures for sampling from a normal population had the property that the sample mean was a sufficient statistic.
Integrated environmental monitoring and multivariate data analysis-A case study.
Eide, Ingvar; Westad, Frank; Nilssen, Ingunn; de Freitas, Felipe Sales; Dos Santos, Natalia Gomes; Dos Santos, Francisco; Cabral, Marcelo Montenegro; Bicego, Marcia Caruso; Figueira, Rubens; Johnsen, Ståle
2017-03-01
The present article describes integration of environmental monitoring and discharge data and interpretation using multivariate statistics, principal component analysis (PCA), and partial least squares (PLS) regression. The monitoring was carried out at the Peregrino oil field off the coast of Brazil. One sensor platform and 3 sediment traps were placed on the seabed. The sensors measured current speed and direction, turbidity, temperature, and conductivity. The sediment trap samples were used to determine suspended particulate matter that was characterized with respect to a number of chemical parameters (26 alkanes, 16 PAHs, N, C, calcium carbonate, and Ba). Data on discharges of drill cuttings and water-based drilling fluid were provided on a daily basis. The monitoring was carried out during 7 campaigns from June 2010 to October 2012, each lasting 2 to 3 months due to the capacity of the sediment traps. The data from the campaigns were preprocessed, combined, and interpreted using multivariate statistics. No systematic difference could be observed between campaigns or traps despite the fact that the first campaign was carried out before drilling, and 1 of 3 sediment traps was located in an area not expected to be influenced by the discharges. There was a strong covariation between suspended particulate matter and total N and organic C suggesting that the majority of the sediment samples had a natural and biogenic origin. Furthermore, the multivariate regression showed no correlation between discharges of drill cuttings and sediment trap or turbidity data taking current speed and direction into consideration. Because of this lack of correlation with discharges from the drilling location, a more detailed evaluation of chemical indicators providing information about origin was carried out in addition to numerical modeling of dispersion and deposition. 
The chemical indicators and the modeling of dispersion and deposition support the conclusions from the multivariate statistics. Integr Environ Assess Manag 2017;13:387-395. © 2016 SETAC.
Characterizing multivariate decoding models based on correlated EEG spectral features
McFarland, Dennis J.
2013-01-01
Objective: Multivariate decoding methods are popular techniques for analysis of neurophysiological data. The present study explored potential interpretative problems with these techniques when predictors are correlated. Methods: Data from sensorimotor rhythm-based cursor control experiments were analyzed offline with linear univariate and multivariate models. Features were derived from autoregressive (AR) spectral analysis of varying model order which produced predictors that varied in their degree of correlation (i.e., multicollinearity). Results: The use of multivariate regression models resulted in much better prediction of target position as compared to univariate regression models. However, with lower order AR features interpretation of the spectral patterns of the weights was difficult. This is likely to be due to the high degree of multicollinearity present with lower order AR features. Conclusions: Care should be exercised when interpreting the pattern of weights of multivariate models with correlated predictors. Comparison with univariate statistics is advisable. Significance: While multivariate decoding algorithms are very useful for prediction, their utility for interpretation may be limited when predictors are correlated. PMID:23466267
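The interpretive problem raised in this abstract can be sketched with ordinary least squares on simulated data. Everything here is an assumption for illustration: the two predictors, the noise levels, and the helper `ols` are hypothetical and are not the EEG features or models used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical data: two highly correlated predictors; only x1 drives the target.
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly a copy of x1
y = x1 + 0.5 * rng.normal(size=n)

def ols(X, y):
    """Least-squares fit with an intercept; returns (weights, in-sample R^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return beta[1:], r2

# Univariate model on x2 alone versus a multivariate model on both predictors.
w_uni_x2, r2_uni_x2 = ols(x2[:, None], y)
w_multi, r2_multi = ols(np.column_stack([x1, x2]), y)

print("univariate weight on x2:", w_uni_x2)
print("multivariate weights on (x1, x2):", w_multi)
print("R^2 univariate vs multivariate:", r2_uni_x2, r2_multi)
```

In a construction like this, x2 looks strongly predictive on its own, while its weight in the joint model can be small and unstable because x1 already carries the same information. That is one way the pattern of multivariate weights can mislead when predictors are correlated.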
Multivariate Methods for Meta-Analysis of Genetic Association Studies.
Dimou, Niki L; Pantavou, Katerina G; Braliou, Georgia G; Bagos, Pantelis G
2018-01-01
Multivariate meta-analysis of genetic association studies and genome-wide association studies has received remarkable attention as it improves the precision of the analysis. Here, we review, summarize and present in a unified framework methods for multivariate meta-analysis of genetic association studies and genome-wide association studies. Starting with the statistical methods used for robust analysis and genetic model selection, we present in brief univariate methods for meta-analysis and we then scrutinize multivariate methodologies. Multivariate models of meta-analysis for single gene-disease association studies, including models for haplotype association studies, multiple linked polymorphisms and multiple outcomes, are discussed. The popular Mendelian randomization approach and special cases of meta-analysis addressing issues such as the assumption of the mode of inheritance, deviation from Hardy-Weinberg Equilibrium and gene-environment interactions are also presented. All available methods are enriched with practical applications, and methodologies that could be developed in the future are discussed. Links for all available software implementing multivariate meta-analysis methods are also provided.
Multivariate pattern dependence
Saxe, Rebecca
2017-01-01
When we perform a cognitive task, multiple brain regions are engaged. Understanding how these regions interact is a fundamental step to uncover the neural bases of behavior. Most research on the interactions between brain regions has focused on the univariate responses in the regions. However, fine grained patterns of response encode important information, as shown by multivariate pattern analysis. In the present article, we introduce and apply multivariate pattern dependence (MVPD): a technique to study the statistical dependence between brain regions in humans in terms of the multivariate relations between their patterns of responses. MVPD characterizes the responses in each brain region as trajectories in region-specific multidimensional spaces, and models the multivariate relationship between these trajectories. We applied MVPD to the posterior superior temporal sulcus (pSTS) and to the fusiform face area (FFA), using a searchlight approach to reveal interactions between these seed regions and the rest of the brain. Across two different experiments, MVPD identified significant statistical dependence not detected by standard functional connectivity. Additionally, MVPD outperformed univariate connectivity in its ability to explain independent variance in the responses of individual voxels. In the end, MVPD uncovered different connectivity profiles associated with different representational subspaces of FFA: the first principal component of FFA shows differential connectivity with occipital and parietal regions implicated in the processing of low-level properties of faces, while the second and third components show differential connectivity with anterior temporal regions implicated in the processing of invariant representations of face identity. PMID:29155809
Statistical Knowledge for Teaching: Exploring it in the Classroom
ERIC Educational Resources Information Center
Burgess, Tim
2009-01-01
This paper first reports on the methodology of a study of teacher knowledge for statistics, conducted in a classroom at the primary school level. The methodology included videotaping of a sequence of lessons that involved students in investigating multivariate data sets, followed up by audiotaped interviews with each teacher. These stimulated…
2003-07-01
4, Gnanadesikan, 1977). An entity whose measured features fall into one of the regions is classified accordingly. For the approaches we discuss here... Gnanadesikan, R. 1977. Methods for Statistical Data Analysis of Multivariate Observations. John Wiley & Sons, New York. Hassig, N. L., O’Brien, R. F
Evaluation of statistical protocols for quality control of ecosystem carbon dioxide fluxes
Jorge F. Perez-Quezada; Nicanor Z. Saliendra; William E. Emmerich; Emilio A. Laca
2007-01-01
The process of quality control of micrometeorological and carbon dioxide (CO2) flux data can be subjective and may lack repeatability, which would undermine the results of many studies. Multivariate statistical methods and time series analysis were used together and independently to detect and replace outliers in CO2 flux...
Texture as a basis for acoustic classification of substrate in the nearshore region
NASA Astrophysics Data System (ADS)
Dennison, A.; Wattrus, N. J.
2016-12-01
Segmentation and classification of substrate type from two locations in Lake Superior are predicted using multivariate statistical processing of textural measures derived from shallow-water, high-resolution multibeam bathymetric data. During a multibeam sonar survey, both bathymetric and backscatter data are collected. It is well documented that the statistical characteristic of a sonar backscatter mosaic is dependent on substrate type. While classifying the bottom-type on the basis of backscatter alone can accurately predict and map bottom-type, it lacks the ability to resolve and capture fine textural details, an important factor in many habitat mapping studies. Statistical processing can capture the pertinent details about the bottom-type that are rich in textural information. Further multivariate statistical processing can then isolate characteristic features, and provide the basis for an accurate classification scheme. Preliminary results from an analysis of bathymetric data and ground-truth samples collected from the Amnicon River, Superior, Wisconsin, and the Lester River, Duluth, Minnesota, demonstrate the ability to process and develop a novel classification scheme of the bottom type in two geomorphologically distinct areas.
Raman spectroscopy-based screening of IgM positive and negative sera for dengue virus infection
NASA Astrophysics Data System (ADS)
Bilal, M.; Saleem, M.; Bilal, Maria; Ijaz, T.; Khan, Saranjam; Ullah, Rahat; Raza, A.; Khurram, M.; Akram, W.; Ahmed, M.
2016-11-01
A statistical method based on Raman spectroscopy for the screening of immunoglobulin M (IgM) in dengue virus (DENV) infected human sera is presented. In total, 108 sera samples were collected and their antibody indexes (AI) for IgM were determined through enzyme-linked immunosorbent assay (ELISA). Raman spectra of these samples were acquired using a 785 nm wavelength excitation laser. Seventy-eight Raman spectra were selected randomly and unbiasedly for the development of a statistical model using partial least squares (PLS) regression, while the remaining 30 were used for testing the developed model. An R-squared (R²) value of 0.929 was determined using the leave-one-sample-out (LOO) cross validation method, showing the validity of this model. It considers all molecular changes related to IgM concentration, and describes their role in infection. A graphical user interface (GUI) platform has been developed to run the developed multivariate model for the prediction of AI of IgM for blindly tested samples, and an excellent agreement has been found between model predicted and clinically determined values. Parameters like sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve for these tested samples are also reported to visualize model performance.
Measuring multivariate association and beyond
Josse, Julie; Holmes, Susan
2017-01-01
Simple correlation coefficients between two variables have been generalized to measure association between two matrices in many ways. Coefficients such as the RV coefficient, the distance covariance (dCov) coefficient and kernel based coefficients are being used by different research communities. Scientists use these coefficients to test whether two random vectors are linked. Once it has been ascertained that there is such association through testing, then a next step, often ignored, is to explore and uncover the association’s underlying patterns. This article provides a survey of various measures of dependence between random vectors and tests of independence and emphasizes the connections and differences between the various approaches. After providing definitions of the coefficients and associated tests, we present the recent improvements that enhance their statistical properties and ease of interpretation. We summarize multi-table approaches and provide scenarii where the indices can provide useful summaries of heterogeneous multi-block data. We illustrate these different strategies on several examples of real data and suggest directions for future research. PMID:29081877
Exploratory Multivariate Analysis. A Graphical Approach.
1981-01-01
Gnanadesikan, 1977) but we feel that these should be used with great caution unless one really has good reason to believe that the data came from such a...are referred to Gnanadesikan (1977). The present author hopes that the convenience of a single summary or significance level will not deter his readers...fit of a harmonic model to meteorological data. (In preparation). Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate
The intervals method: a new approach to analyse finite element outputs using multivariate statistics
De Esteban-Trivigno, Soledad; Püschel, Thomas A.; Fortuny, Josep
2017-01-01
Background: In this paper, we propose a new method, named the intervals’ method, to analyse data from finite element models in a comparative multivariate framework. As a case study, several armadillo mandibles are analysed, showing that the proposed method is useful to distinguish and characterise biomechanical differences related to diet/ecomorphology. Methods: The intervals’ method consists of generating a set of variables, each one defined by an interval of stress values. Each variable is expressed as a percentage of the area of the mandible occupied by those stress values. Afterwards these newly generated variables can be analysed using multivariate methods. Results: Applying this novel method to the biological case study of whether armadillo mandibles differ according to dietary groups, we show that the intervals’ method is a powerful tool to characterize biomechanical performance and how this relates to different diets. This allows us to positively discriminate between specialist and generalist species. Discussion: We show that the proposed approach is a useful methodology not affected by the characteristics of the finite element mesh. Additionally, the positive discriminating results obtained when analysing a difficult case study suggest that the proposed method could be a very useful tool for comparative studies in finite element analysis using multivariate statistical approaches. PMID:29043107
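The variable-generation step described under Methods can be sketched as an area-weighted binning of element stresses. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the cut points, the open-ended last interval, and the five-element toy model are all hypothetical.

```python
import numpy as np

def interval_variables(stress, area, cut_points):
    """Percentage of total area falling in each stress interval.

    Intervals are (-inf, c1), [c1, c2), ..., [ck, inf) for cut points c1 < ... < ck.
    """
    stress = np.asarray(stress, dtype=float)
    area = np.asarray(area, dtype=float)
    # Index of the interval each element's stress value belongs to.
    idx = np.digitize(stress, cut_points)
    # Sum element areas per interval, keeping empty intervals as zero.
    area_per_interval = np.bincount(idx, weights=area,
                                    minlength=len(cut_points) + 1)
    return 100.0 * area_per_interval / area.sum()

# Hypothetical model: five elements with a stress value and an area each.
stress = [0.5, 1.2, 2.7, 3.1, 8.0]
area = [2.0, 1.0, 3.0, 2.0, 2.0]
cut_points = [1.0, 3.0]   # intervals: below 1, from 1 up to 3, 3 and above

pct = interval_variables(stress, area, cut_points)
print(pct)  # one percentage per interval; the values sum to 100
```

Weighting by element area rather than counting elements is one plausible reading of why the abstract describes the approach as not affected by mesh characteristics: splitting an element into smaller ones leaves the area in each interval unchanged.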
Nonlinear multivariate and time series analysis by neural network methods
NASA Astrophysics Data System (ADS)
Hsieh, William W.
2004-03-01
Methods in multivariate statistical analysis are essential for working with large amounts of geophysical data, data from observational arrays, from satellites, or from numerical model output. In classical multivariate statistical analysis, there is a hierarchy of methods, starting with linear regression at the base, followed by principal component analysis (PCA) and finally canonical correlation analysis (CCA). A multivariate time series method, the singular spectrum analysis (SSA), has been a fruitful extension of the PCA technique. The common drawback of these classical methods is that only linear structures can be correctly extracted from the data. Since the late 1980s, neural network methods have become popular for performing nonlinear regression and classification. More recently, neural network methods have been extended to perform nonlinear PCA (NLPCA), nonlinear CCA (NLCCA), and nonlinear SSA (NLSSA). This paper presents a unified view of the NLPCA, NLCCA, and NLSSA techniques and their applications to various data sets of the atmosphere and the ocean (especially for the El Niño-Southern Oscillation and the stratospheric quasi-biennial oscillation). These data sets reveal that the linear methods are often too simplistic to describe real-world systems, with a tendency to scatter a single oscillatory phenomenon into numerous unphysical modes or higher harmonics, which can be largely alleviated in the new nonlinear paradigm.
A Multivariate Multilevel Approach to the Modeling of Accuracy and Speed of Test Takers
ERIC Educational Resources Information Center
Klein Entink, R. H.; Fox, J. P.; van der Linden, W. J.
2009-01-01
Response times on test items are easily collected in modern computerized testing. When collecting both (binary) responses and (continuous) response times on test items, it is possible to measure the accuracy and speed of test takers. To study the relationships between these two constructs, the model is extended with a multivariate multilevel…
Proton radius from electron scattering data
NASA Astrophysics Data System (ADS)
Higinbotham, Douglas W.; Kabir, Al Amin; Lin, Vincent; Meekins, David; Norum, Blaine; Sawatzky, Brad
2016-05-01
Background: The proton charge radius extracted from recent muonic hydrogen Lamb shift measurements is significantly smaller than that extracted from atomic hydrogen and electron scattering measurements. The discrepancy has become known as the proton radius puzzle. Purpose: In an attempt to understand the discrepancy, we review high-precision electron scattering results from Mainz, Jefferson Lab, Saskatoon, and Stanford. Methods: We make use of stepwise regression techniques using the F test as well as the Akaike information criterion to systematically determine the predictive variables to use for a given set and range of electron scattering data as well as to provide multivariate error estimates. Results: Starting with the precision, low four-momentum transfer ($Q^2$) data from Mainz (1980) and Saskatoon (1974), we find that a stepwise regression of the Maclaurin series using the F test as well as the Akaike information criterion justify using a linear extrapolation which yields a value for the proton radius that is consistent with the result obtained from muonic hydrogen measurements. Applying the same Maclaurin series and statistical criteria to the 2014 Rosenbluth results on $G_E$ from Mainz, we again find that the stepwise regression tends to favor a radius consistent with the muonic hydrogen radius but produces results that are extremely sensitive to the range of data included in the fit. Making use of the high-$Q^2$ data on $G_E$ to select functions which extrapolate to high $Q^2$, we find that a Padé ($N = M = 1$) statistical model works remarkably well, as does a dipole function with a 0.84 fm radius, $G_E(Q^2) = (1 + Q^2/0.66\ \mathrm{GeV}^2)^{-2}$.
Conclusions: Rigorous applications of stepwise regression techniques and multivariate error estimates result in the extraction of a proton charge radius that is consistent with the muonic hydrogen result of 0.84 fm, either from linear extrapolation of the extremely-low-$Q^2$ data or by use of the Padé approximant for extrapolation using a larger range of data. Thus, based on a purely statistical analysis of electron scattering data, we conclude that the electron scattering results and the muonic hydrogen results are consistent. It is the atomic hydrogen results that are the outliers.
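The link between the dipole parameter 0.66 GeV² and the quoted 0.84 fm radius can be checked numerically. The sketch below assumes the standard definition of the charge radius from the slope of the form factor at zero momentum transfer, r² = -6 dG_E/dQ² at Q² = 0, and the conversion constant ħc ≈ 0.1973 GeV·fm. Neither is stated in the abstract, and the function names are illustrative only.

```python
import math

HBARC_GEV_FM = 0.1973269804  # hbar * c in GeV * fm (unit conversion)

def dipole_ge(q2, lambda2=0.66):
    """Dipole form factor (1 + Q^2 / Lambda^2)^-2, with Q^2 and Lambda^2 in GeV^2."""
    return (1.0 + q2 / lambda2) ** -2

def radius_from_slope(ge, h=1e-6):
    """Charge radius in fm from r^2 = -6 dG_E/dQ^2 at Q^2 = 0 (forward difference)."""
    slope = (ge(h) - ge(0.0)) / h              # units: GeV^-2
    return math.sqrt(-6.0 * slope) * HBARC_GEV_FM

r = radius_from_slope(dipole_ge)
print(f"dipole radius: {r:.3f} fm")
```

Analytically, the dipole form gives r² = 12/Λ², so Λ² = 0.66 GeV² corresponds to a radius of roughly 0.84 fm, in line with the value quoted in the abstract.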
Muto, Satoru; Sugiura, Syo-Ichiro; Nakajima, Akiko; Horiuchi, Akira; Inoue, Masahiro; Saito, Keisuke; Isotani, Shuji; Yamaguchi, Raizo; Ide, Hisamitsu; Horie, Shigeo
2014-10-01
We aimed to identify patients with a chief complaint of hematuria who could safely avoid unnecessary radiation and instrumentation in the diagnosis of bladder cancer (BC), using automated urine flow cytometry to detect isomorphic red blood cells (RBCs) in urine. We acquired urine samples from 134 patients over the age of 35 years with a chief complaint of hematuria and a positive urine occult blood test or microhematuria. The data were analyzed using the UF-1000i® (Sysmex Co., Ltd., Kobe, Japan) automated urine flow cytometer to determine RBC morphology, which was classified as isomorphic or dysmorphic. The patients were divided into two groups (BC versus non-BC) for statistical analysis. Multivariate logistic regression analysis was used to determine the predictive value of flow cytometry versus urine cytology, the bladder tumor antigen test, occult blood in urine test, and microhematuria test. BC was confirmed in 26 of 134 patients (19.4%). The area under the curve for RBC count using the automated urine flow cytometer was 0.94, representing the highest reference value obtained in this study. Isomorphic RBCs were detected in all patients in the BC group. On multivariate logistic regression analysis, only isomorphic RBC morphology was significantly predictive for BC (p < 0.001). Analytical parameters such as sensitivity, specificity, positive predictive value, and negative predictive value of isomorphic RBCs in urine were 100.0, 91.7, 74.3, and 100.0%, respectively. Detection of urinary isomorphic RBCs using automated urine flow cytometry is a reliable method in the diagnosis of BC with hematuria.
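The four diagnostic parameters reported above follow from a 2×2 table. The abstract does not give the table itself, so the counts below are an assumption: they are inferred from 26 BC and 108 non-BC patients together with the reported 100% sensitivity and 91.7% specificity, which would imply 9 false positives. The function name is illustrative.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Assumed counts, inferred from the reported percentages (not given in the abstract):
# 26 true positives, 0 false negatives, 9 false positives, 99 true negatives.
m = diagnostic_metrics(tp=26, fp=9, fn=0, tn=99)
for name, value in m.items():
    print(f"{name}: {100 * value:.1f}%")
```

Under these assumed counts the four values come out consistent with the reported 100.0, 91.7, 74.3, and 100.0%, which suggests the inferred table is at least compatible with the abstract.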
Libiger, Ondrej; Schork, Nicholas J.
2015-01-01
It is now feasible to examine the composition and diversity of microbial communities (i.e., “microbiomes”) that populate different human organs and orifices using DNA sequencing and related technologies. To explore the potential links between changes in microbial communities and various diseases in the human body, it is essential to test associations involving different species within and across microbiomes, environmental settings and disease states. Although a number of statistical techniques exist for carrying out relevant analyses, it is unclear which of these techniques exhibit the greatest statistical power to detect associations given the complexity of most microbiome datasets. We compared the statistical power of principal component regression, partial least squares regression, regularized regression, distance-based regression, Hill's diversity measures, and a modified test implemented in the popular and widely used microbiome analysis methodology “Metastats” across a wide range of simulated scenarios involving changes in feature abundance between two sets of metagenomic samples. For this purpose, simulation studies were used to change the abundance of microbial species in a real dataset from a published study examining human hands. Each technique was applied to the same data, and its ability to detect the simulated change in abundance was assessed. We hypothesized that a small subset of methods would outperform the rest in terms of the statistical power. Indeed, we found that the Metastats technique modified to accommodate multivariate analysis and partial least squares regression yielded high power under the models and data sets we studied. The statistical power of diversity measure-based tests, distance-based regression and regularized regression was significantly lower. 
Our results provide insight into powerful analysis strategies that utilize information on species counts from large microbiome data sets exhibiting skewed frequency distributions obtained on a small to moderate number of samples. PMID:26734061
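The idea of estimating statistical power by simulation, as used in this study, can be sketched in a much simpler setting. The example below is not any of the methods compared in the abstract: it swaps in a plain two-sample permutation test on a difference in means, applied to a single simulated feature with skewed (gamma-Poisson) counts. All parameter values and function names are hypothetical.

```python
import numpy as np

def permutation_pvalue(a, b, n_perm, rng):
    """Two-sided permutation p-value for a difference in group means."""
    pooled = np.concatenate([a, b])
    observed = abs(a.mean() - b.mean())
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        diff = abs(perm[:len(a)].mean() - perm[len(a):].mean())
        count += int(diff >= observed)
    # Add-one correction keeps the p-value strictly positive.
    return (count + 1) / (n_perm + 1)

def estimate_power(fold_change, n_per_group=20, n_sim=100, n_perm=199,
                   alpha=0.05, seed=0):
    """Fraction of simulated datasets in which the test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        # Skewed counts: Poisson draws with gamma-distributed rates (mean 10).
        control = rng.poisson(rng.gamma(2.0, 5.0, size=n_per_group))
        # Simulated abundance change: rates scaled by the fold change.
        treated = rng.poisson(rng.gamma(2.0, 5.0 * fold_change, size=n_per_group))
        if permutation_pvalue(control, treated, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_sim

power_null = estimate_power(fold_change=1.0)  # no change: estimates Type I error
power_alt = estimate_power(fold_change=3.0)   # threefold change in abundance
print("rejection rate with no change:", power_null)
print("rejection rate with threefold change:", power_alt)
```

Running the same loop with different tests on identical simulated datasets is, in outline, how the power of competing methods can be compared.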
Salud, Margaret C; Marshak, Helen Hopp; Natto, Zuhair S; Montgomery, Susanne
2014-01-01
While HIV rates are low for Asian/Pacific Islanders (APIs), they have been increasing, especially for API women in the USA. We conducted a cross-sectional study with 299 young API women (18-24 years old) in the Inland Empire region of Southern California to better understand their intention for HIV testing and their perceptions about HIV/AIDS. Data analyses included descriptive statistics, bivariate exploration for model building and multivariate analyses to determine variables associated with HIV-testing intentions. Results suggest that more lifetime sexual partners, greater perceived gender susceptibility, higher HIV/AIDS knowledge, being sexually active, more positive attitudes about HIV testing and higher self-perceptions/experiences related to risk contribute to stronger intentions for HIV testing in young API women. Findings from this study will contribute to the limited literature on HIV/AIDS in API women and provide information that can be used for developing and implementing culturally appropriate programs that encourage HIV prevention and testing in this population.
Salud, Margaret C.; Marshak, Helen Hopp; Natto, Zuhair S.; Montgomery, Susanne
2015-01-01
While HIV rates are low for Asian/Pacific Islanders (APIs), they have been increasing, especially for API women in the USA. We conducted a cross-sectional study with 299 young API women (18–24 years old) in the Inland Empire region of Southern California to better understand their intention for HIV testing and their perceptions about HIV/AIDS. Data analyses included descriptive statistics, bivariate exploration for model building and multivariate analyses to determine variables associated with HIV-testing intentions. Results suggest that more lifetime sexual partners, greater perceived gender susceptibility, higher HIV/AIDS knowledge, being sexually active, more positive attitudes about HIV testing and higher self-perceptions/experiences related to risk contribute to stronger intentions for HIV testing in young API women. Findings from this study will contribute to the limited literature on HIV/AIDS in API women and provide information that can be used for developing and implementing culturally appropriate programs that encourage HIV prevention and testing in this population. PMID:24111859
Thiagarajah, Shankar; Wilkinson, J. Mark; Panoutsopoulou, Kalliope; Day‐Williams, Aaron G.; Cootes, Timothy F.; Wallis, Gillian A.; Loughlin, John; Arden, Nigel; Birrell, Fraser; Carr, Andrew; Chapman, Kay; Deloukas, Panos; Doherty, Michael; McCaskie, Andrew; Ollier, William E. R.; Rai, Ashok; Ralston, Stuart H.; Spector, Timothy D.; Valdes, Ana M.; Wallis, Gillian A.; Mark Wilkinson, J.; Zeggini, Eleftheria
2015-01-01
Objective: To test whether previously reported hip morphology or osteoarthritis (OA) susceptibility loci are associated with proximal femur shape as represented by statistical shape model (SSM) modes and as univariate or multivariate quantitative traits. Methods: We used pelvic radiographs and genotype data from 929 subjects with unilateral hip OA who had been recruited previously for the Arthritis Research UK Osteoarthritis Genetics Consortium genome‐wide association study. We built 3 SSMs capturing the shape variation of the OA‐unaffected proximal femur in the entire mixed‐sex cohort and for male/female‐stratified cohorts. We selected 41 candidate single‐nucleotide polymorphisms (SNPs) previously reported as being associated with hip morphology (for replication analysis) or OA (for discovery analysis) and for which genotype data were available. We performed 2 types of analysis for genotype–phenotype associations between these SNPs and the modes of the SSMs: 1) a univariate analysis using individual SSM modes and 2) a multivariate analysis using combinations of SSM modes. Results: The univariate analysis identified association between rs4836732 (within the ASTN2 gene) and mode 5 of the female SSM (P = 0.0016) and between rs6976 (within the GLT8D1 gene) and mode 7 of the mixed‐sex SSM (P = 0.0003). The multivariate analysis identified association between rs5009270 (near the IFRD1 gene) and a combination of modes 3, 4, and 9 of the mixed‐sex SSM (P = 0.0004). Evidence of associations remained significant following adjustment for multiple testing. All 3 SNPs had previously been associated with hip OA. Conclusion: These de novo findings suggest that rs4836732, rs6976, and rs5009270 may contribute to hip OA susceptibility by altering proximal femur shape. PMID:25939412
Ríos, A; López-Navas, A; Ayala-García, M A; Sebastián, M; Febrero, B; Ramírez, E J; Muñoz, G; Palacios, G; Rodríguez, J S; Martínez, M A; Nieto, A; Martínez-Alarcón, L; Ramis, G; Ramírez, P; Parrilla, P
2012-01-01
Healthcare assistants are an important group of workers who can influence public opinion. Their attitudes toward organ donation may influence public awareness of healthcare matters; negative attitudes toward donation and transplantation could have a negative impact on public attitudes. Our objective was to analyze the attitudes of healthcare assistants in Spanish and Mexican healthcare centers toward organ donation and to determine the factors affecting them using a multivariate analysis. As part of the "International Collaborative Donor Project," 32 primary care centers and 4 hospitals were selected in Spain and 5 hospitals in Mexico. A randomized sample of healthcare assistants was stratified according to healthcare services. Attitudes were evaluated using a validated questionnaire of the psychosocial aspects of donation, which was self-completed anonymously by the respondent. Statistical analysis used the chi-square test, Student t test, and logistic regression analysis. Of 532 respondents, 66% favored donation and 34% were against it or undecided. Upon multivariate analysis, the following variables had the most weight: 1) country of origin (Mexicans were more in favor than Spanish; odds ratio [OR] = 1.964; P = .014); 2) a partner with a favorable attitude (OR = 2.597; P = .013); 3) not being concerned about possible bodily mutilation after donation (OR = 2.631; P = .006); 4) preference for options apart from burial for handling the body after death (OR = 4.694; P < .001); and 5) accepting an autopsy if one was needed (OR = 3.584; P < .001). The attitudes of healthcare assistants toward organ donation varied considerably according to the respondent's country of origin. The psycho-social profile of a person with a positive attitude to donation was similar to that described within the general public. Copyright © 2012 Elsevier Inc. All rights reserved.
4-protein signature predicting tamoxifen treatment outcome in recurrent breast cancer.
De Marchi, Tommaso; Liu, Ning Qing; Stingl, Cristoph; Timmermans, Mieke A; Smid, Marcel; Look, Maxime P; Tjoa, Mila; Braakman, Rene B H; Opdam, Mark; Linn, Sabine C; Sweep, Fred C G J; Span, Paul N; Kliffen, Mike; Luider, Theo M; Foekens, John A; Martens, John W M; Umar, Arzu
2016-01-01
Estrogen receptor (ER) positive tumors represent the majority of breast malignancies, and are effectively treated with hormonal therapies, such as tamoxifen. However, in recurrent disease, resistance to tamoxifen therapy is common and a major cause of death. In recent years, in-depth proteome analyses have enabled identification of clinically useful biomarkers, particularly when heterogeneity in complex tumor tissue was reduced using laser capture microdissection (LCM). In the current study, we performed high resolution proteomic analysis on two cohorts of ER positive breast tumors derived from patients who manifested either good or poor outcome to tamoxifen treatment upon recurrence. A total of 112 fresh frozen tumors were collected from multiple medical centers and divided into two sets: an in-house training and a multi-center test set. Epithelial tumor cells were enriched with LCM and analyzed by nano-LC Orbitrap mass spectrometry (MS), which yielded >3000 and >4000 quantified proteins in the training and test sets, respectively. Raw data are available via ProteomeXchange with identifiers PXD000484 and PXD000485. Statistical analysis showed differential abundance of 99 proteins, of which a subset of 4 proteins was selected through a multivariate step-down to develop a predictor for tamoxifen treatment outcome. The 4-protein signature significantly predicted poor outcome patients in the test set, independent of predictive histopathological characteristics (hazard ratio [HR] = 2.17; 95% confidence interval [CI] = 1.15 to 4.17; multivariate Cox regression p value = 0.017). Immunohistochemical (IHC) staining of PDCD4, one of the signature proteins, on an independent set of formalin-fixed paraffin-embedded tumor tissues provided an independent technical validation (HR = 0.72; 95% CI = 0.57 to 0.92; multivariate Cox regression p value = 0.009).
We hereby report the first validated protein predictor for tamoxifen treatment outcome in recurrent ER-positive breast cancer. IHC further showed that PDCD4 is an independent marker. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
Romain, Ahmed Jerôme; Bernard, Paquito; Hokayem, Marie; Gernigon, Christophe; Avignon, Antoine
2016-03-01
This study aimed to test three factorial structures conceptualizing the processes of change (POC) from the transtheoretical model and to examine the relationships between the POC and stages of change (SOC) among overweight and obese adults. Cross-sectional study. This study was conducted at the University Hospital of Montpellier, France. A sample of 289 overweight or obese participants (199 women) was enrolled in the study. Participants completed the POC and SOC questionnaires during a 5-day hospitalization for weight management. Structural equation modeling was used to compare the different factorial structures. The unweighted least-squares method was used to identify the best-fit indices for the five fully correlated model (goodness-of-fit statistic = .96; adjusted goodness-of-fit statistic = .95; standardized root mean residual = .062; normed-fit index = .95; parsimonious normed-fit index = .83; parsimonious goodness-of-fit statistic = .78). The multivariate analysis of variance was significant (p < .001). A post hoc test showed that individuals in advanced SOC used more of both experiential and behavioral POC than those in preaction stages, with effect sizes ranging from .06 to .29. This study supports the validity of the factorial structure of POC concerning physical activity and confirms the assumption that, in this context, people with excess weight use both experiential and behavioral processes. These preliminary results should be confirmed in a longitudinal study. © The Author(s) 2016.
Multivariate Statistical Modelling of Drought and Heat Wave Events
NASA Astrophysics Data System (ADS)
Manning, Colin; Widmann, Martin; Vrac, Mathieu; Maraun, Douglas; Bevaqua, Emanuele
2016-04-01
C. Manning1,2, M. Widmann1, M. Vrac2, D. Maraun3, E. Bevaqua2,3 1. School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, Birmingham, UK 2. Laboratoire des Sciences du Climat et de l'Environnement, (LSCE-IPSL), Centre d'Etudes de Saclay, Gif-sur-Yvette, France 3. Wegener Center for Climate and Global Change, University of Graz, Brandhofgasse 5, 8010 Graz, Austria Compound extreme events are a combination of two or more contributing events which in themselves may not be extreme but through their joint occurrence produce an extreme impact. Compound events are noted in the latest IPCC report as an important type of extreme event that has been given little attention so far. As part of the CE:LLO project (Compound Events: muLtivariate statisticaL mOdelling) we are developing a multivariate statistical model to gain an understanding of the dependence structure of certain compound events. One focus of this project is on the interaction between drought and heat wave events. Soil moisture has both a local and non-local effect on the occurrence of heat waves where it strongly controls the latent heat flux affecting the transfer of sensible heat to the atmosphere. These processes can create a feedback whereby a heat wave may be amplified or suppressed by the soil moisture preconditioning, and vice versa, the heat wave may in turn have an effect on soil conditions. An aim of this project is to capture this dependence in order to correctly describe the joint probabilities of these conditions and the resulting probability of their compound impact. We will show an application of Pair Copula Constructions (PCCs) to study the aforementioned compound event. PCCs allow in theory for the formulation of multivariate dependence structures in any dimension where the PCC is a decomposition of a multivariate distribution into a product of bivariate components modelled using copulas.
A copula is a multivariate distribution function which allows one to model the dependence structure of given variables separately from the marginal behaviour. We firstly look at the structure of soil moisture drought over the entire of France using the SAFRAN dataset between 1959 and 2009. Soil moisture is represented using the Standardised Precipitation Evapotranspiration Index (SPEI). Drought characteristics are computed at grid point scale where drought conditions are identified as those with an SPEI value below -1.0. We model the multivariate dependence structure of drought events defined by certain characteristics and compute return levels of these events. We initially find that drought characteristics such as duration, mean SPEI and the maximum contiguous area to a grid point all have positive correlations, though the degree to which they are correlated can vary considerably spatially. A spatial representation of return levels then may provide insight into the areas most prone to drought conditions. As a next step, we analyse the dependence structure between soil moisture conditions preceding the onset of a heat wave and the heat wave itself.
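The drought definition in the abstract above (grid-point SPEI below -1.0) lends itself to a small illustration. The following is a minimal sketch, not the authors' code: it assumes an SPEI time series is already available as a plain list, and the function name `drought_events` and the sample values are hypothetical. It extracts two of the characteristics the abstract mentions, duration and mean SPEI, for each run of consecutive time steps below the threshold.

```python
def drought_events(spei, threshold=-1.0):
    """Return (start_index, duration, mean_spei) for each run of
    consecutive values strictly below the threshold."""
    events = []
    start = None
    for i, value in enumerate(spei):
        if value < threshold:
            if start is None:
                start = i  # a new drought run begins here
        elif start is not None:
            run = spei[start:i]
            events.append((start, len(run), sum(run) / len(run)))
            start = None
    if start is not None:  # series ends while still in drought
        run = spei[start:]
        events.append((start, len(run), sum(run) / len(run)))
    return events


# Hypothetical SPEI values for one grid point (illustration only).
series = [0.2, -1.2, -1.5, -0.8, -1.1, 0.3, -2.0, -1.4]
events = drought_events(series)
for start, duration, mean_spei in events:
    print(start, duration, round(mean_spei, 2))
```

Per-event characteristics of this kind would be the inputs to a dependence model such as a pair copula construction; the copula fitting itself is not sketched here.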
Huang, Jiongli; Tang, Tiantong; Hu, Guocheng; Zheng, Jing; Wang, Yuyu; Wang, Qiang; Su, Jing; Zou, Yunfeng; Peng, Xiaowu
2013-01-01
Background: Evidence for a possible causal relationship between exposure to electromagnetic fields (EMF) emitted by high voltage transmission (HVT) lines and neurobehavioral dysfunction in children is insufficient. The present study aims to investigate the association between EMF exposure from HVT lines and neurobehavioral function in children. Methods: Two primary schools were chosen based on monitoring data of ambient electromagnetic radiation. A cross-sectional study with 437 children (9 to 13 years old) was conducted. Exposure to EMF from HVT lines was monitored at each school. Information was collected on possible confounders and relevant exposure predictors using standardized questionnaires. Neurobehavioral function in children was evaluated using established computerized neurobehavioral tests. Data were analyzed using multivariable regression models adjusted for relevant confounders. Results: After controlling for potential confounding factors, multivariable regression revealed that children attending a school near 500 kV HVT lines had poorer performance on the computerized neurobehavioral tests for Visual Retention and Pursuit Aiming compared to children attending a school that was not in close proximity to HVT lines. Conclusions: The results suggest long-term low-level exposure to EMF from HVT lines might have a negative impact on neurobehavioral function in children. However, because differences reached statistical significance for only two of the four tests, and because of potential limitations, more studies are needed to explore the effects of exposure to extremely low frequency EMF on neurobehavioral function and development in children. PMID:23843999
Gavira Pavón, Alberto; Walker Chao, Carolina; Rodríguez Rodríguez, Nicomedes; Gavira Iglesias, Francisco Javier
2014-02-01
To estimate the prevalence and risk factors of urinary incontinence (UI) in women with low back pain (LBP) and to describe their sociodemographic and clinical features. Cross-sectional study. Two primary care health centres in the south of Cordoba and a private center in Madrid. 364 women of 20-65 years of age (of 466 who were contacted, 33 were excluded and 69 refused to participate) who had low back pain located between the twelfth rib and the gluteal fold. Medical questionnaire. Questionnaires (Oswestry Disability Index and UI questionnaires [International Consultation on Incontinence Questionnaire SF and Incontinence Impact Questionnaire-7]), functional test (ASLR Test) and comorbidities relevant to UI. Descriptive and multivariate statistical analysis. UI was detected in 155 women (43%, 95% CI: 37%-48%), mostly of the stress type (83%) and with minimal impact (60%). Compared with continent women, incontinent women showed significant differences in age, body mass index, marital status, level of education, cohabitation, consumption of drugs/day, number of vaginal and total deliveries, abdominal and pelvic surgery, asthma, constipation, hypertension, diabetes, percentage of disability and functional ASLR test. In multivariate analysis, the variables influencing the probability of being incontinent were asthma, hypertension, constipation, total parity, BMI and the percentage of disability. Prevalence of UI is higher than in women without low back pain. Asthma, constipation and parity are the most influential factors in the occurrence of UI. Copyright © 2013 Elsevier España, S.L. All rights reserved.
NASA Astrophysics Data System (ADS)
Lowe, David J.; Pearce, Nicholas J. G.; Jorgensen, Murray A.; Kuehn, Stephen C.; Tryon, Christian A.; Hayward, Chris L.
2017-11-01
We define tephras and cryptotephras and their components (mainly ash-sized particles of glass ± crystals in distal deposits) and summarize the basis of tephrochronology as a chronostratigraphic correlational and dating tool for palaeoenvironmental, geological, and archaeological research. We then document and appraise recent advances in analytical methods used to determine the major, minor, and trace elements of individual glass shards from tephra or cryptotephra deposits to aid their correlation and application. Protocols developed recently for the electron probe microanalysis of major elements in individual glass shards help to improve data quality and standardize reporting procedures. A narrow electron beam (diameter ∼3-5 μm) can now be used to analyze smaller glass shards than previously attainable. Reliable analyses of 'microshards' (defined here as glass shards <32 μm in diameter) using narrow beams are useful for fine-grained samples from distal or ultra-distal geographic locations, and for vesicular or microlite-rich glass shards or small melt inclusions. Caveats apply, however, in the microprobe analysis of very small microshards (≤∼5 μm in diameter), where particle geometry becomes important, and of microlite-rich glass shards where the potential problem of secondary fluorescence across phase boundaries needs to be recognised. Trace element analyses of individual glass shards using laser ablation inductively coupled plasma-mass spectrometry (LA-ICP-MS), with crater diameters of 20 μm and 10 μm, are now effectively routine, giving detection limits well below 1 ppm. Smaller ablation craters (<10 μm) can be subject to significant element fractionation during analysis, but the systematic relationship of such fractionation with glass composition suggests that analyses for some elements at these resolutions may be quantifiable. 
In undertaking analyses, either by microprobe or LA-ICP-MS, reference material data acquired using the same procedure, and preferably from the same analytical session, should be presented alongside new analytical data. In part 2 of the review, we describe, critically assess, and recommend ways in which tephras or cryptotephras can be correlated (in conjunction with other information) using numerical or statistical analyses of compositional data. Statistical methods provide a less subjective means of dealing with analytical data pertaining to tephra components (usually glass or crystals/phenocrysts) than heuristic alternatives. They enable a better understanding of relationships among the data from multiple viewpoints to be developed and help quantify the degree of uncertainty in establishing correlations. In common with other scientific hypothesis testing, it is easier to infer using such analysis that two or more tephras are different rather than the same. Adding stratigraphic, chronological, spatial, or palaeoenvironmental data (i.e. multiple criteria) is usually necessary and allows for more robust correlations to be made. A two-stage approach is useful, the first focussed on differences in the mean composition of samples, or their range, which can be visualised graphically via scatterplot matrices or bivariate plots coupled with the use of statistical tools such as distance measures, similarity coefficients, hierarchical cluster analysis (informed by distance measures or similarity or cophenetic coefficients), and principal components analysis (PCA). Some statistical methods (cluster analysis, discriminant analysis) are referred to as 'machine learning' in the computing literature. The second stage examines sample variance and the degree of compositional similarity so that sample equivalence or otherwise can be established on a statistical basis. 
This stage may involve discriminant function analysis (DFA), support vector machines (SVMs), canonical variates analysis (CVA), and ANOVA or MANOVA (or its two-sample special case, the Hotelling two-sample T2 test). Randomization tests can be used where distributional assumptions such as multivariate normality underlying parametric tests are doubtful. Compositional data may be transformed and scaled before being subjected to multivariate statistical procedures including calculation of distance matrices, hierarchical cluster analysis, and PCA. Such transformations may make the assumption of multivariate normality more appropriate. A sequential procedure using Mahalanobis distance and the Hotelling two-sample T2 test is illustrated using glass major element data from trachytic to phonolitic Kenyan tephras. All these methods require a broad range of high-quality compositional data which can be used to compare 'unknowns' with reference (training) sets that are sufficiently complete to account for all possible correlatives, including tephras with heterogeneous glasses that contain multiple compositional groups. Currently, incomplete databases are tending to limit correlation efficacy. The development of an open, online global database to facilitate progress towards integrated, high-quality tephrostratigraphic frameworks for different regions is encouraged.
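The sequential procedure named in the abstract above (Mahalanobis distance followed by the Hotelling two-sample T2 test) can be sketched for the simplest case of two compositional variables. This is an illustrative sketch under stated assumptions, not the review's implementation: the function names and the toy data are hypothetical, the data are assumed to be already transformed and scaled, and the two samples are assumed to share a common covariance matrix. The F conversion shown is the standard one, with p variables and n1 + n2 - p - 1 denominator degrees of freedom.

```python
def mean_vector(rows):
    """Column means of a list of (x, y) observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mean):
    """Sum of squared deviations and cross-products (2 x 2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def hotelling_two_sample(sample1, sample2):
    """Return (squared Mahalanobis distance, T2, F statistic) for two
    bivariate samples, using the pooled covariance matrix."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = mean_vector(sample1), mean_vector(sample2)
    w1, w2 = scatter_matrix(sample1, m1), scatter_matrix(sample2, m2)
    dof = n1 + n2 - 2
    pooled = [[(w1[a][b] + w2[a][b]) / dof for b in range(2)] for a in range(2)]
    det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
    inv = [[pooled[1][1] / det, -pooled[0][1] / det],
           [-pooled[1][0] / det, pooled[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    d2 = sum(diff[a] * inv[a][b] * diff[b] for a in range(2) for b in range(2))
    t2 = n1 * n2 / (n1 + n2) * d2
    p = 2  # number of variables in this sketch
    f_stat = (n1 + n2 - p - 1) / (p * dof) * t2
    return d2, t2, f_stat


# Toy data (hypothetical): two groups of four shards, two variables each.
group_a = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
group_b = [(3.0, 0.0), (5.0, 0.0), (3.0, 2.0), (5.0, 2.0)]
d2, t2, f_stat = hotelling_two_sample(group_a, group_b)
print(d2, t2, f_stat)
```

The F statistic would then be compared with an F distribution on (p, n1 + n2 - p - 1) degrees of freedom; where multivariate normality is doubtful, the abstract's suggestion of a randomization test applies instead.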
Association factor analysis between osteoporosis with cerebral artery disease: The STROBE study.
Jin, Eun-Sun; Jeong, Je Hoon; Lee, Bora; Im, Soo Bin
2017-03-01
The purpose of this study was to determine the clinical association factors between osteoporosis and cerebral artery disease in a Korean population. Two hundred nineteen postmenopausal women and men undergoing cerebral computed tomography angiography were enrolled in this cross-sectional study to evaluate cerebral artery disease. Cerebral artery disease was diagnosed if there was narrowing of 50% or more in diameter in one or more cerebral arteries or if vascular calcification was present. History of osteoporotic fracture was assessed using medical records and radiographic data such as simple radiography, MRI, and bone scan. Bone mineral density was checked by dual-energy x-ray absorptiometry. We reviewed clinical characteristics in all patients and also performed subgroup analysis for total or extracranial/intracranial cerebral artery disease group retrospectively. We performed statistical analysis by means of chi-square test or Fisher's exact test for categorical variables and Student's t-test or Wilcoxon's rank sum test for continuous variables. Univariate and multivariate logistic regression analyses were also conducted to assess the factors associated with the prevalence of cerebral artery disease. A two-tailed p-value of less than 0.05 was considered statistically significant. All statistical analyses were performed using R (version 3.1.3; The R Foundation for Statistical Computing, Vienna, Austria) and SPSS (version 14.0; SPSS, Inc, Chicago, Ill, USA). Of the 219 patients, 142 had cerebral artery disease. All vertebral fracture was observed in 29 (13.24%) patients. There was significant difference in hip fracture according to the presence or absence of cerebral artery disease. In logistic regression analysis, osteoporotic hip fracture was significantly associated with extracranial cerebral artery disease after adjusting for multiple risk factors.
Females with osteoporotic hip fracture were associated with total calcified cerebral artery disease. Some clinical factors such as age, hypertension, and osteoporotic hip fracture, smoking history and anti-osteoporosis drug use were associated with cerebral artery disease.
Predicting clinical diagnosis in Huntington's disease: An imaging polymarker
Daws, Richard E.; Soreq, Eyal; Johnson, Eileanoir B.; Scahill, Rachael I.; Tabrizi, Sarah J.; Barker, Roger A.; Hampshire, Adam
2018-01-01
Objective Huntington's disease (HD) gene carriers can be identified before clinical diagnosis; however, statistical models for predicting when overt motor symptoms will manifest are too imprecise to be useful at the level of the individual. Perfecting this prediction is integral to the search for disease modifying therapies. This study aimed to identify an imaging marker capable of reliably predicting real‐life clinical diagnosis in HD. Method A multivariate machine learning approach was applied to resting‐state and structural magnetic resonance imaging scans from 19 premanifest HD gene carriers (preHD, 8 of whom developed clinical disease in the 5 years postscanning) and 21 healthy controls. A classification model was developed using cross‐group comparisons between preHD and controls, and within the preHD group in relation to “estimated” and “actual” proximity to disease onset. Imaging measures were modeled individually, and combined, and permutation modeling robustly tested classification accuracy. Results Classification performance for preHDs versus controls was greatest when all measures were combined. The resulting polymarker predicted converters with high accuracy, including those who were not expected to manifest in that time scale based on the currently adopted statistical models. Interpretation We propose that a holistic multivariate machine learning treatment of brain abnormalities in the premanifest phase can be used to accurately identify those patients within 5 years of developing motor features of HD, with implications for prognostication and preclinical trials. Ann Neurol 2018;83:532–543 PMID:29405351
Prognostic value of cell cycle regulatory proteins in muscle-infiltrating bladder cancer.
Galmozzi, Fabia; Rubagotti, Alessandra; Romagnoli, Andrea; Carmignani, Giorgio; Perdelli, Luisa; Gatteschi, Beatrice; Boccardo, Francesco
2006-12-01
The aims of this study were to investigate the expression levels of proteins involved in cell cycle regulation in specimens of bladder cancer and to correlate them with the clinicopathological characteristics, proliferative activity and survival. Eighty-two specimens obtained from patients affected by muscle-invasive bladder cancer were evaluated immunohistochemically for p53, p21 and cyclin D1 expression, as well as for the tumour proliferation index, Ki-67. The statistical analysis included Kaplan-Meier curves with log-rank test and Cox proportional hazards models. In univariate analyses, low Ki-67 proliferation index (P = 0.045) and negative p21 immunoreactivity (P = 0.04) were associated to patient's overall survival (OS), but in multivariate models p21 did not reach statistical significance. When the combinations of the variables were assessed in two separate multivariate models that included tumour stage, grading, lymph node status, vascular invasion and perineural invasion, the combined variables p21/Ki-67 or p21/cyclin D1 expression were independent predictors for OS; in particular, patients with positive p21/high Ki-67 (P = 0.015) or positive p21/negative cyclin D1 (P = 0.04) showed the worst survival outcome. Important alterations in the cell cycle regulatory pathways occur in muscle-invasive bladder cancer and the combined use of cell cycle regulators appears to provide significant prognostic information that could be used to select the patients most suitable for multimodal therapeutic approaches.
Physiology declines prior to death in Drosophila melanogaster.
Shahrestani, Parvin; Tran, Xuan; Mueller, Laurence D
2012-10-01
For a period of 6-15 days prior to death, the fecundity and virility of Drosophila melanogaster fall significantly below those of same-aged flies that are not near death. It is likely that other aspects of physiology may decline during this period. This study attempts to document changes in two physiological characteristics prior to death: desiccation resistance and time-in-motion. Using individual fecundity estimates and previously described models, it is possible to accurately predict which flies in a population are near death at any given age; these flies are said to be in the "death spiral". In this study of approximately 7,600 females, we used cohort mortality data and individual fecundity estimates to dichotomize each of five replicate populations of same-aged D. melanogaster into "death spiral" and "non-spiral" groups. We then compared these groups for two physiological characteristics that decline during aging. We describe the statistical properties of a new multivariate test statistic that allows us to compare the desiccation resistance and time-in-motion for two populations chosen on the basis of their fecundity. This multivariate representation of the desiccation resistance and time-in-motion of spiral and non-spiral females was shown to be significantly different with the spiral females characterized by lower desiccation resistance and time spent in motion. Our results suggest that D. melanogaster may be used as a model organism to study physiological changes that occur when death is imminent.
Berg, Gregory D; Donnelly, Shawn; Warnick, Kathleen; Medina, Wendie; Miller, Mary
2014-07-03
The prevalence of schizophrenia and depression in the United States is far higher among Medicaid recipients than in the general population. Individuals suffering from mental illness, including schizophrenia and depression, also have higher rates of emergency department utilization, which is costly and may not generate the positive health outcomes desired. Disease management programs strive to help individuals suffering from chronic illnesses better manage their condition(s) and seek health care in the appropriate settings. The objective of this manuscript is to estimate a dose-response impact on hospital inpatient and emergency room utilizations for any reason by Medicaid recipients with depression or schizophrenia who received disease management contacts. Multivariate regression analysis of panel data taken from administrative claims was conducted to test the hypothesis that increased contacts lower the likelihood of all-cause inpatient admissions and emergency room visits. Subjects included 6,274 members of Illinois' non-institutionalized Medicaid-only aged, blind or disabled population diagnosed with depression or schizophrenia. The statistical measure is the odds ratio. The odds ratio association is between the monthly utilization indicators and the number of contacts (doses) a member had for each particular disease management intervention. Higher numbers of intervention contacts for Medicaid recipients diagnosed with depression or schizophrenia were associated with statistically significant reductions in all-cause inpatient admissions and emergency room utilizations. There is a high correlation between depression and schizophrenia disease management contacts and lowered all-cause hospital inpatient and emergency room utilizations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gunn, Andrew J., E-mail: agunn@uabmc.edu; Sheth, Rahul A.; Luber, Brandon
2017-01-15
Purpose: The purpose of this study was to evaluate the ability of various radiologic response criteria to predict patient outcomes after trans-arterial chemo-embolization with drug-eluting beads (DEB-TACE) in patients with advanced-stage (BCLC C) hepatocellular carcinoma (HCC). Materials and methods: Hospital records from 2005 to 2011 were retrospectively reviewed. Non-infiltrative lesions were measured at baseline and on follow-up scans after DEB-TACE according to various common radiologic response criteria, including guidelines of the World Health Organization (WHO), Response Evaluation Criteria in Solid Tumors (RECIST), the European Association for the Study of the Liver (EASL), and modified RECIST (mRECIST). Statistical analysis was performed to see which, if any, of the response criteria could be used as a predictor of overall survival (OS) or time-to-progression (TTP). Results: 75 patients met inclusion criteria. Median OS and TTP were 22.6 months (95% CI 11.6–24.8) and 9.8 months (95% CI 7.1–21.6), respectively. Univariate and multivariate Cox analyses revealed that none of the evaluated criteria had the ability to be used as a predictor for OS or TTP. Analysis of the C index in both univariate and multivariate models showed that the evaluated criteria were not accurate predictors of either OS (C-statistic range: 0.51–0.58 in the univariate model; range: 0.54–0.58 in the multivariate model) or TTP (C-statistic range: 0.55–0.59 in the univariate model; range: 0.57–0.61 in the multivariate model). Conclusion: Current response criteria are not accurate predictors of OS or TTP in patients with advanced-stage HCC after DEB-TACE.
Gunn, Andrew J; Sheth, Rahul A; Luber, Brandon; Huynh, Minh-Huy; Rachamreddy, Niranjan R; Kalva, Sanjeeva P
2017-01-01
The purpose of this study was to evaluate the ability of various radiologic response criteria to predict patient outcomes after trans-arterial chemo-embolization with drug-eluting beads (DEB-TACE) in patients with advanced-stage (BCLC C) hepatocellular carcinoma (HCC). Hospital records from 2005 to 2011 were retrospectively reviewed. Non-infiltrative lesions were measured at baseline and on follow-up scans after DEB-TACE according to various common radiologic response criteria, including guidelines of the World Health Organization (WHO), Response Evaluation Criteria in Solid Tumors (RECIST), the European Association for the Study of the Liver (EASL), and modified RECIST (mRECIST). Statistical analysis was performed to see which, if any, of the response criteria could be used as a predictor of overall survival (OS) or time-to-progression (TTP). 75 patients met inclusion criteria. Median OS and TTP were 22.6 months (95% CI 11.6-24.8) and 9.8 months (95% CI 7.1-21.6), respectively. Univariate and multivariate Cox analyses revealed that none of the evaluated criteria had the ability to be used as a predictor for OS or TTP. Analysis of the C index in both univariate and multivariate models showed that the evaluated criteria were not accurate predictors of either OS (C-statistic range: 0.51-0.58 in the univariate model; range: 0.54-0.58 in the multivariate model) or TTP (C-statistic range: 0.55-0.59 in the univariate model; range: 0.57-0.61 in the multivariate model). Current response criteria are not accurate predictors of OS or TTP in patients with advanced-stage HCC after DEB-TACE.
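The C-statistic values reported above (roughly 0.5 to 0.6) are easier to read with the definition in hand: a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The following is a minimal sketch of Harrell's concordance index for right-censored data, not the study's code. The function name and the four-patient example are hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplifying assumption.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: among usable pairs, the fraction in which the
    subject with the higher risk score has the shorter survival time.
    A pair (i, j) is usable when subject i had an observed event
    strictly before subject j's follow-up time."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical data: follow-up in months, event flag (1 = event
# observed, 0 = censored), and a model-assigned risk score.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risk)
print(c_index)
```

In this toy example four of the five usable pairs are ranked correctly, giving C = 0.8.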
Huang, X C; Guo, H X; Wu, Z H; Guo, C X; Wei, W J; Li, H C; Sun, Q; Zhang, C C; Li, Z Y; Chen, T; Zhong, Q; Zhou, L
2017-05-12
Objective: To understand the epidemiological characteristics and distribution of Mycobacterium tuberculosis (MTB) in Guangdong Province, and to explore the risk factors associated with drug resistance. Methods: A total of 225 clinical strains of MTB collected from 5 drug resistance monitoring sites in Guangdong Province in 2015 were tested with the Regions of Difference 105 (RD105) deletion test, and 15-locus mycobacterial interspersed repetitive unit (MIRU) typing was used for genotyping. Gene clustering was analyzed using BioNumerics 7.6. Drug susceptibility was tested by the proportion method. The statistical analysis used the chi-square test and multivariate logistic regression. Results: Of the 225 strains, 158 (70.2%) belonged to the Beijing family. The Hunter-Gaston index varied among the MIRU loci. The MTB strains from Guangdong Province were categorized into 2 gene clusters by clustering analysis, in which the clustering rate of complex I was significantly higher than that of complex II (χ² = 9.331, P = 0.020). Multivariate logistic regression showed that QUB11b was associated with resistance to rifampicin and isoniazid (P = 0.013 and 0.012, respectively), ETR F with resistance to isoniazid, streptomycin, ethambutol and ofloxacin (P = 0.039, 0.040, 0.023 and 0.003, respectively), Mtub21 with resistance to capreomycin (P = 0.040), and QUB26 with resistance to ethionamide (P = 0.047). Conclusions: The MTB strains from Guangdong Province were genetically polymorphic and the distribution of strains was stable. QUB11b, ETR F, Mtub21 and QUB26 could be potential biomarkers for predicting drug resistance.
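The Hunter-Gaston index mentioned above measures how well a typing locus discriminates between strains: it is the probability that two strains drawn at random without replacement have different types, 1 - [sum of n_j(n_j - 1)] / [N(N - 1)], where n_j is the number of strains with type j and N is the total. A minimal sketch follows; the function name and the repeat-number labels are hypothetical and are not data from the study.

```python
from collections import Counter


def hunter_gaston_index(type_labels):
    """Probability that two strains sampled without replacement
    carry different type labels at a locus."""
    n = len(type_labels)
    if n < 2:
        raise ValueError("need at least two strains")
    counts = Counter(type_labels).values()
    return 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


# Hypothetical repeat numbers observed at one MIRU locus in six strains.
locus_alleles = ["2", "2", "3", "3", "3", "4"]
hgdi = hunter_gaston_index(locus_alleles)
print(round(hgdi, 3))
```

A locus at which every strain carries the same allele gives an index of 0, and a locus at which every strain differs gives 1.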
Yang, Xiaowei; Nie, Kun
2008-03-15
Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply the approaches of functional data analysis, which directly target the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task seen in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend the strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since a FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data.
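The last of the three steps described above, testing coefficients in the transformed domain, can be illustrated with the maximization at the core of the adaptive Neyman statistic. This is a sketch under assumptions and not the paper's implementation: it assumes the transformed coefficients have already been standardized to unit variance under the null hypothesis and ordered from low to high frequency, it uses the form in which the statistic is commonly written (the maximum over m of the first m centered squared coefficients divided by sqrt(2m)), and it omits the final normalization and critical values. The function name and input values are hypothetical.

```python
import math


def adaptive_neyman_core(z):
    """Return (statistic, m_hat), where the statistic is
    max over m of sum_{j<=m} (z_j^2 - 1) / sqrt(2 m)
    and m_hat is the truncation point attaining the maximum.
    Assumes z holds standardized coefficients ordered by frequency."""
    best_value = -math.inf
    best_m = 0
    running = 0.0
    for m, coef in enumerate(z, start=1):
        running += coef * coef - 1.0  # centered squared coefficient
        value = running / math.sqrt(2.0 * m)
        if value > best_value:
            best_value, best_m = value, m
    return best_value, best_m


# Hypothetical standardized coefficients: signal in the first two
# (low-frequency) positions, noise afterwards.
coefficients = [3.0, 2.0, 0.1, -0.2]
statistic, m_hat = adaptive_neyman_core(coefficients)
print(round(statistic, 3), m_hat)
```

The data-driven truncation point is what makes the test adaptive: high-frequency coefficients that carry mostly noise are excluded from the sum when including them would lower the standardized total.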
Uddameri, Venkatesh; Singaraju, Sreeram; Hernandez, E Annette
2018-02-21
Seasonal and cyclic trends in nutrient concentrations at four agricultural drainage ditches were assessed using a dataset generated from a multivariate, multiscale, multiyear water quality monitoring effort in the agriculturally dominant Lower Rio Grande Valley (LRGV) River Watershed in South Texas. An innovative bootstrap sampling-based power analysis procedure was developed to evaluate the ability of Mann-Whitney and Noether tests to discern trends and to guide future monitoring efforts. The Mann-Whitney U test was able to detect significant changes between summer and winter nutrient concentrations at sites with lower depths and unimpeded flows. Pollutant dilution, non-agricultural loadings, and in-channel flow structures (weirs) masked the effects of seasonality. The detection of cyclical trends using the Noether test was highest in the presence of vegetation mainly for total phosphorus and oxidized nitrogen (nitrite + nitrate) compared to dissolved phosphorus and reduced nitrogen (total Kjeldahl nitrogen-TKN). Prospective power analysis indicated that while increased monitoring can lead to higher statistical power, the effect size (i.e., the total number of trend sequences within a time-series) had a greater influence on the Noether test. Both Mann-Whitney and Noether tests provide complementary information on seasonal and cyclic behavior of pollutant concentrations and are affected by different processes. The results from these statistical tests when evaluated in the context of flow, vegetation, and in-channel hydraulic alterations can help guide future data collection and monitoring efforts. The study highlights the need for long-term monitoring of agricultural drainage ditches to properly discern seasonal and cyclical trends.
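A bootstrap-based power analysis of the kind described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: the Mann-Whitney p-value uses the tie-corrected normal approximation without a continuity correction, the function names are hypothetical, and the concentrations are invented.

```python
import math
import random


def mann_whitney_p(x, y):
    """Two-sided p-value, normal approximation to Mann-Whitney U with midranks."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0              # average of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t                   # tie correction term
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))


def bootstrap_power(group_a, group_b, n_per_group, n_boot=500, alpha=0.05, seed=0):
    """Fraction of bootstrap resamples in which the test rejects at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        xs = rng.choices(group_a, k=n_per_group)  # resample with replacement
        ys = rng.choices(group_b, k=n_per_group)
        if mann_whitney_p(xs, ys) < alpha:
            hits += 1
    return hits / n_boot


# Hypothetical summer and winter concentrations (mg/L), not study data.
summer = [1.8, 2.1, 2.4, 2.9, 3.3, 3.6, 4.0, 4.2]
winter = [1.1, 1.5, 1.9, 2.0, 2.2, 2.6, 2.8, 3.1]
for n in (8, 16, 32):
    print(n, bootstrap_power(summer, winter, n_per_group=n))
```

Varying `n_per_group` shows how estimated power changes with a larger monitoring effort, which is the kind of question a prospective power analysis addresses.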
Forster, H.-J.; Davis, J.C.; Tischendorf, G.; Seltmann, R.
1999-01-01
High-precision major, minor and trace element analyses for 44 elements have been made of 329 Late Variscan granitic and rhyolitic rocks from the Erzgebirge metallogenic province of Germany. The intrusive histories of some of these granites are not completely understood and exposures of rock are not adequate to resolve relationships between what apparently are different plutons. Therefore, it is necessary to turn to chemical analyses to decipher the evolution of the plutons and their relationships. A new classification of Erzgebirge plutons into five major groups of granites, based on petrologic interpretations of geochemical and mineralogical relationships (low-F biotite granites; low-F two-mica granites; high-F, high-P2O5 Li-mica granites; high-F, low-P2O5 Li-mica granites; high-F, low-P2O5 biotite granites) was tested by multivariate techniques. Canonical analyses of major elements, minor elements, trace elements and ratio variables all distinguish the groups with differing amounts of success. Univariate ANOVAs, in combination with forward-stepwise and backward-elimination canonical analyses, were used to select the ten variables that were most effective in distinguishing groups. In a biplot, groups form distinct clusters roughly arranged along a quadratic path. Within groups, individual plutons tend to be arranged in patterns possibly reflecting granitic evolution. Canonical functions were used to classify samples of rhyolites of unknown association into the five groups. Another canonical analysis was based on ten elements that are traditionally used in petrology and were important in the new classification of granites. Their biplot pattern is similar to that from statistically chosen variables but less effective at distinguishing the five groups of granites. This study shows that multivariate statistical techniques can provide significant insight into problems of granitic petrogenesis and may be superior to conventional procedures for petrological interpretation.
The influence of dose distribution on treatment outcome in the SCOPE 1 oesophageal cancer trial.
Carrington, Rhys; Spezi, Emiliano; Gwynne, Sarah; Dutton, Peter; Hurt, Chris; Staffurth, John; Crosby, Thomas
2016-02-06
The first aim of this study was to assess plan quality using a conformity index (CI) and analyse its influence on patient outcome. The second aim was to identify whether clinical and technological factors including planning treatment volume (PTV) volume and treatment delivery method could be related to the CI value. By extending the original concept of the mean distance to conformity (MDC) index, the OverMDC and UnderMDC of the 95 % isodose line (50Gy prescribed dose) to the PTV was calculated for 97 patients from the UK SCOPE 1 trial (ISCRT47718479). Data preparation was carried out in CERR, with Kaplan-Meier and multivariate analysis undertaken in EUCLID and further tests in Microsoft Excel and IBM's SPSS. A statistically significant breakpoint in the overall survival data, independent of cetuximab, was found with OverMDC (4.4 mm, p < 0.05). This was not the case with UnderMDC. There was a statistically significant difference in PTV volume either side of the OverMDC breakpoint (Mann Whitney p < 0.001) and in OverMDC value dependent on the treatment delivery method (mean IMRT = 2.1 mm, mean 3D-CRT = 4.1 mm Mann Whitney p < 0.001). Re-planning the worst performing patients according to OverMDC from 3D-CRT to VMAT resulted in a mean reduction in OverMDC of 2.8 mm (1.6-4.0 mm). OverMDC was not significant in multivariate analysis that included age, sex, staging, tumour type, and position. Although not significant when included in multivariate analysis, we have shown in univariate analysis that a patient's OverMDC is correlated with overall survival. OverMDC is strongly related to IMRT and to a lesser extent with PTV volume. We recommend that VMAT planning should be used for oesophageal planning when available and that attention should be paid to the conformity of the 95 % to the PTV.
Does speed matter? The impact of operative time on outcome in laparoscopic surgery
Jackson, Timothy D.; Wannares, Jeffrey J.; Lancaster, R. Todd; Rattner, David W.
2012-01-01
Introduction Controversy exists concerning the importance of operative time on patient outcomes. It is unclear whether faster is better or haste makes waste or similarly whether slower procedures represent a safe, meticulous approach or inexperienced dawdling. The objective of the present study was to determine the effect of operative time on 30-day outcomes in laparoscopic surgery. Methods Patients who underwent laparoscopic general surgery procedures (colectomy, cholecystectomy, Nissen fundoplication, inguinal hernia, and gastric bypass) from the ACS-NSQIP 2005–2008 participant use file were identified. Exclusion criteria were defined a priori to identify same-day admission, elective procedures. Operative time was divided into deciles and summary statistics were analyzed. Univariate analyses using a Cochran-Armitage test for trend were completed. The effect of operative time on 30-day morbidity was further analyzed for each procedure type using multivariate regression controlling for case complexity and additional patient factors. Patients within the highest deciles were excluded to reduce outlier effect. Results A total of 76,748 elective general surgical patients who underwent laparoscopic procedures were analyzed. Univariate analyses of deciles of operative time demonstrated a statistically significant trend (p < 0.0001) toward increasing odds of complications with increasing operative time for laparoscopic colectomy (n = 10,135), cholecystectomy (n = 37,407), Nissen fundoplication (n = 4,934), and gastric bypass (n = 17,842). The trend was not found to be significant for laparoscopic inguinal hernia repair (n = 6,430; p = 0.14). Multivariate modeling revealed the effect of operative time to remain significant after controlling for additional patient factors. Conclusion Increasing operative time was associated with increased odds of complications and, therefore, it appears that speed may matter in laparoscopic surgery. 
These analyses are limited in their inability to adjust for all patient factors, potential confounders, and case complexities. Additional hierarchical multivariate analyses at the surgeon level would be important to examine this relationship further. PMID:21298533
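The univariate step above, a Cochran-Armitage test for trend across ordered groups such as operative-time deciles, can be sketched as below. The counts are hypothetical, the equally spaced scores are an assumption, and the function name is illustrative; this is not the study's code or data.

```python
import math


def cochran_armitage(events, totals, scores=None):
    """Cochran-Armitage trend test across ordered groups.

    events[i]: number of patients with a complication in group i.
    totals[i]: number of patients in group i.
    scores[i]: ordinal score of group i (defaults to 1, 2, ..., k).
    Returns (z, two_sided_p) using the normal approximation.
    """
    k = len(events)
    if scores is None:
        scores = list(range(1, k + 1))
    n = sum(totals)
    p_bar = sum(events) / n                       # pooled complication rate
    t = sum(s * (r - m * p_bar) for s, r, m in zip(scores, events, totals))
    s1 = sum(m * s for s, m in zip(scores, totals))
    s2 = sum(m * s * s for s, m in zip(scores, totals))
    var = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / n)
    z = t / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical complication counts in three ordered operative-time groups.
z, p = cochran_armitage(events=[5, 10, 15], totals=[100, 100, 100])
print(round(z, 3), round(p, 4))
```

A positive z indicates that the complication rate rises with the group score; the test does not adjust for case complexity, which is why the abstract follows it with multivariate regression.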
Does speed matter? The impact of operative time on outcome in laparoscopic surgery.
Jackson, Timothy D; Wannares, Jeffrey J; Lancaster, R Todd; Rattner, David W; Hutter, Matthew M
2011-07-01
Controversy exists concerning the importance of operative time on patient outcomes. It is unclear whether faster is better or haste makes waste or similarly whether slower procedures represent a safe, meticulous approach or inexperienced dawdling. The objective of the present study was to determine the effect of operative time on 30-day outcomes in laparoscopic surgery. Patients who underwent laparoscopic general surgery procedures (colectomy, cholecystectomy, Nissen fundoplication, inguinal hernia, and gastric bypass) from the ACS-NSQIP 2005-2008 participant use file were identified. Exclusion criteria were defined a priori to identify same-day admission, elective procedures. Operative time was divided into deciles and summary statistics were analyzed. Univariate analyses using a Cochran-Armitage test for trend were completed. The effect of operative time on 30-day morbidity was further analyzed for each procedure type using multivariate regression controlling for case complexity and additional patient factors. Patients within the highest deciles were excluded to reduce outlier effect. A total of 76,748 elective general surgical patients who underwent laparoscopic procedures were analyzed. Univariate analyses of deciles of operative time demonstrated a statistically significant trend (p<0.0001) toward increasing odds of complications with increasing operative time for laparoscopic colectomy (n=10,135), cholecystectomy (n=37,407), Nissen fundoplication (n=4,934), and gastric bypass (n=17,842). The trend was not found to be significant for laparoscopic inguinal hernia repair (n=6,430; p=0.14). Multivariate modeling revealed the effect of operative time to remain significant after controlling for additional patient factors. Increasing operative time was associated with increased odds of complications and, therefore, it appears that speed may matter in laparoscopic surgery. 
These analyses are limited in their inability to adjust for all patient factors, potential confounders, and case complexities. Additional hierarchical multivariate analyses at the surgeon level would be important to examine this relationship further.
NASA Astrophysics Data System (ADS)
Guillen, George; Rainey, Gail; Morin, Michelle
2004-04-01
Currently, the Minerals Management Service uses the Oil Spill Risk Analysis model (OSRAM) to predict the movement of potential oil spills greater than 1000 bbl originating from offshore oil and gas facilities. OSRAM generates oil spill trajectories using meteorological and hydrological data input from either actual physical measurements or estimates generated from other hydrological models. OSRAM and many other models produce output matrices of average, maximum and minimum contact probabilities to specific landfall or target segments (columns) from oil spills at specific points (rows). Analysts and managers are often interested in identifying geographic areas or groups of facilities that pose similar risks to specific targets or groups of targets if a spill occurred. Unfortunately, due to the potentially large matrix generated by many spill models, this question is difficult to answer without the use of data reduction and visualization methods. In our study we utilized a multivariate statistical method called cluster analysis to group areas of similar risk based on potential distribution of landfall target trajectory probabilities. We also utilized ArcView™ GIS to display spill launch point groupings. The combination of GIS and multivariate statistical techniques in the post-processing of trajectory model output is a powerful tool for identifying and delineating areas of similar risk from multiple spill sources. We strongly encourage modelers and statistical and GIS software programmers to collaborate closely to produce a more seamless integration of these technologies and approaches to analyzing data. They are complementary methods that strengthen the overall assessment of spill risks.
Steiner, John F.; Ho, P. Michael; Beaty, Brenda L.; Dickinson, L. Miriam; Hanratty, Rebecca; Zeng, Chan; Tavel, Heather M.; Havranek, Edward P.; Davidson, Arthur J.; Magid, David J.; Estacio, Raymond O.
2009-01-01
Background Although many studies have identified patient characteristics or chronic diseases associated with medication adherence, the clinical utility of such predictors has rarely been assessed. We attempted to develop clinical prediction rules for adherence with antihypertensive medications in two health care delivery systems. Methods and Results Retrospective cohort studies of hypertension registries in an inner-city health care delivery system (N = 17176) and a health maintenance organization (N = 94297) in Denver, Colorado. Adherence was defined by acquisition of 80% or more of antihypertensive medications. A multivariable model in the inner-city system found that adherent patients (36.3% of the total) were more likely than non-adherent patients to be older, white, married, and acculturated in US society, to have diabetes or cerebrovascular disease, not to abuse alcohol or controlled substances, and to be prescribed less than three antihypertensive medications. Although statistically significant, all multivariate odds ratios were 1.7 or less, and the model did not accurately discriminate adherent from non-adherent patients (C-statistic = 0.606). In the health maintenance organization, where 72.1% of patients were adherent, significant but weak associations existed between adherence and older age, white race, the lack of alcohol abuse, and fewer antihypertensive medications. The multivariate model again failed to accurately discriminate adherent from non-adherent individuals (C-statistic = 0.576). Conclusions Although certain socio-demographic characteristics or clinical diagnoses are statistically associated with adherence to refills of antihypertensive medications, a combination of these characteristics is not sufficiently accurate to allow clinicians to predict whether their patients will be adherent with treatment. PMID:20031876
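The C-statistic reported above is the probability that a randomly chosen adherent patient receives a higher predicted score than a randomly chosen non-adherent patient, with ties counted as one half. A minimal sketch of that definition follows; the scores and outcomes are hypothetical and the function name is illustrative.

```python
def c_statistic(scores, outcomes):
    """Concordance (C) statistic for a binary outcome.

    scores: predicted probabilities or risk scores, one per patient.
    outcomes: 1 for adherent, 0 for non-adherent.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0      # model ranks the pair correctly
            elif p == q:
                concordant += 0.5      # tie counts as half
    return concordant / (len(pos) * len(neg))


# Hypothetical predicted adherence probabilities and observed outcomes.
print(c_statistic([0.9, 0.4, 0.6, 0.5], [1, 0, 0, 1]))  # 0.75
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, so the reported values of 0.606 and 0.576 are only modestly better than chance.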
Constructing networks from a dynamical system perspective for multivariate nonlinear time series.
Nakamura, Tomomichi; Tanizawa, Toshihiro; Small, Michael
2016-03-01
We describe a method for constructing networks for multivariate nonlinear time series. We approach the interaction between the various scalar time series from a deterministic dynamical system perspective and provide a generic and algorithmic test for whether the interaction between two measured time series is statistically significant. The method can be applied even when the data exhibit no obvious qualitative similarity: a situation in which the naive method utilizing the cross correlation function directly cannot correctly identify connectivity. To establish the connectivity between nodes we apply the previously proposed small-shuffle surrogate (SSS) method, which can investigate whether there are correlation structures in short-term variabilities (irregular fluctuations) between two data sets from the viewpoint of deterministic dynamical systems. The procedure to construct networks based on this idea is composed of three steps: (i) each time series is considered as a basic node of a network, (ii) the SSS method is applied to verify the connectivity between each pair of time series taken from the whole multivariate time series, and (iii) the pair of nodes is connected with an undirected edge when the null hypothesis cannot be rejected. The network constructed by the proposed method indicates the intrinsic (essential) connectivity of the elements included in the system or the underlying (assumed) system. The method is demonstrated for numerical data sets generated by known systems and applied to several experimental time series.
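The surrogate-generation step used in (ii) can be sketched as follows. This is a hedged illustration of small-shuffle surrogate generation as it is usually described: perturb the time indices with Gaussian noise, then reorder the data by the rank of the perturbed indices. The function name, the default amplitude, and the example series are assumptions; the test statistic and the edge decision rule are left to the method as stated in the abstract.

```python
import random


def small_shuffle_surrogate(x, amplitude=1.0, rng=None):
    """Return one small-shuffle surrogate of the series x.

    Each index t is perturbed to t + amplitude * g(t), with g(t) standard
    Gaussian noise; the data are then reordered by the rank of the perturbed
    indices. Values move only a few positions, so slow trends are roughly
    preserved while local (short-term) structure is destroyed.
    """
    if rng is None:
        rng = random.Random()
    perturbed = [t + amplitude * rng.gauss(0.0, 1.0) for t in range(len(x))]
    order = sorted(range(len(x)), key=perturbed.__getitem__)
    return [x[i] for i in order]


# Hypothetical series: a linear trend plus a small alternating fluctuation.
series = [0.1 * t + (0.05 if t % 2 else -0.05) for t in range(20)]
surrogate = small_shuffle_surrogate(series, amplitude=1.0, rng=random.Random(1))
print(surrogate[:5])
```

In a pairwise test, a statistic such as the cross correlation would be computed for the original pair and for many surrogate pairs, and the original value compared against the surrogate distribution.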
Madaniyazi, Lina; Guo, Yuming; Chen, Renjie; Kan, Haidong; Tong, Shilu
2016-01-01
Estimating the burden of mortality associated with particulates requires knowledge of exposure-response associations. However, the evidence on exposure-response associations is limited in many cities, especially in developing countries. In this study, we predicted associations of particulates smaller than 10 μm in aerodynamic diameter (PM10) with mortality in 73 Chinese cities. The meta-regression model was used to test and quantify which city-specific characteristics contributed significantly to the heterogeneity of PM10-mortality associations for 16 Chinese cities. Then, those city-specific characteristics with statistically significant regression coefficients were treated as independent variables to build multivariate meta-regression models. The model with the best fitness was used to predict PM10-mortality associations in 73 Chinese cities in 2010. Mean temperature, PM10 concentration and green space per capita could best explain the heterogeneity in PM10-mortality associations. Based on city-specific characteristics, we were able to develop multivariate meta-regression models to predict associations between air pollutants and health outcomes reasonably well. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Rish, Irina; Bashivan, Pouya; Cecchi, Guillermo A.; Goldstein, Rita Z.
2016-03-01
The objective of this study is to investigate effects of methylphenidate on brain activity in individuals with cocaine use disorder (CUD) using functional MRI (fMRI). Methylphenidate hydrochloride (MPH) is an indirect dopamine agonist commonly used for treating attention deficit/hyperactivity disorders; it was also shown to have some positive effects on CUD subjects, such as improved stop signal reaction times associated with better control/inhibition [1], as well as normalized task-related brain activity [2] and resting-state functional connectivity in specific areas [3]. While prior fMRI studies of MPH in CUDs have focused on mass-univariate statistical hypothesis testing, this paper evaluates multivariate, whole-brain effects of MPH as captured by the generalization (prediction) accuracy of different classification techniques applied to features extracted from resting-state functional networks (e.g., node degrees). Our multivariate predictive results based on resting-state data from [3] suggest that MPH tends to normalize network properties such as voxel degrees in CUD subjects, thus providing additional evidence for potential benefits of MPH in treating cocaine addiction.
Nurses' decision making in heart failure management based on heart failure certification status.
Albert, Nancy M; Bena, James F; Buxbaum, Denise; Martensen, Linda; Morrison, Shannon L; Prasun, Marilyn A; Stamp, Kelly D
Research findings on the value of nurse certification were based on subjective perceptions or biased by correlations of certification status and global clinical factors. In heart failure, the value of certification is unknown. To examine the value of certification based on nurses' decision-making. Cross-sectional study of nurses who completed heart failure clinical vignettes that reflected decision-making in clinical heart failure scenarios. Statistical tests included multivariable linear, logistic and proportional odds logistic regression models. Of nurses (N = 605), 29.1% were heart failure certified, 35.0% were certified in another specialty/job role and 35.9% were not certified. In multivariable modeling, nurses certified in heart failure (versus not heart failure certified) had higher clinical vignette scores (p = 0.002), reflecting higher evidence-based decision making; nurses with another specialty/role certification (versus no certification) did not (p = 0.62). Heart failure certification, but not certification in other specialties/job roles, was associated with decisions that reflected delivery of high-quality care. Copyright © 2018 Elsevier Inc. All rights reserved.
Papageorgiou, Spyridon N; Kloukos, Dimitrios; Petridis, Haralampos; Pandis, Nikolaos
2015-10-01
To assess the hypothesis that there is excessive reporting of statistically significant studies published in prosthodontic and implantology journals, which could indicate selective publication. The last 30 issues of 9 journals in prosthodontics and implant dentistry were hand-searched for articles with statistical analyses. The percentages of significant and non-significant results were tabulated by parameter of interest. Univariable/multivariable logistic regression analyses were applied to identify possible predictors of reporting statistically significant findings. The results of this study were compared with similar studies in dentistry with random-effects meta-analyses. Of the 2323 included studies, 71% reported statistically significant results, with the proportion of significant results ranging from 47% to 86%. Multivariable modeling identified geographical area and involvement of a statistician as predictors of statistically significant results. Compared to interventional studies, the odds that in vitro and observational studies would report statistically significant results were increased by 1.20 times (OR: 2.20, 95% CI: 1.66-2.92) and 0.35 times (OR: 1.35, 95% CI: 1.05-1.73), respectively. The probability of statistically significant results from randomized controlled trials was significantly lower compared to various study designs (difference: 30%, 95% CI: 11-49%). Likewise the probability of statistically significant results in prosthodontics and implant dentistry was lower compared to other dental specialties, but this result did not reach statistical significance (P>0.05). The majority of studies identified in the fields of prosthodontics and implant dentistry presented statistically significant results. The same trend existed in publications of other specialties in dentistry. Copyright © 2015 Elsevier Ltd. All rights reserved.
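The relation between an odds ratio and the "increased by" wording above (an OR of 2.20 means the odds are 1.20 times, or 120%, higher than in the reference group) can be made concrete with a small calculation. The sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2x2 table. The counts are hypothetical, the function name is illustrative, and the abstract's own odds ratios come from logistic regression models rather than from a single table.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a, b: significant / non-significant studies in the design of interest.
    c, d: significant / non-significant studies in the reference design.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not study data): 20 vs 10 and 10 vs 10.
odds_ratio, lower, upper = odds_ratio_ci(20, 10, 10, 10)
relative_increase = odds_ratio - 1.0   # OR of 2.0 -> odds higher by 1.0 times
print(odds_ratio, round(lower, 2), round(upper, 2), relative_increase)
```

An interval that excludes 1 corresponds to a statistically significant difference in odds at the chosen level.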
Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang
2010-07-01
We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root-n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided.
Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang
2013-01-01
We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root-n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided. PMID:24790286
... Another project used multivariate statistics to develop a novel device to non-invasively measure hydrogen ... Cellulosic Ethanol Production due to Experimental Measurement Uncertainty," Biotechnology for Biofuels
Statistics anxiety, state anxiety during an examination, and academic achievement.
Macher, Daniel; Paechter, Manuela; Papousek, Ilona; Ruggeri, Kai; Freudenthaler, H Harald; Arendasy, Martin
2013-12-01
A large proportion of students identify statistics courses as the most anxiety-inducing courses in their curriculum. Many students feel impaired by feelings of state anxiety in the examination and therefore probably show lower achievements. The study investigates how statistics anxiety, attitudes (e.g., interest, mathematical self-concept) and trait anxiety, as a general disposition to anxiety, influence experiences of anxiety as well as achievement in an examination. Participants were 284 undergraduate psychology students, 225 females and 59 males. Two weeks prior to the examination, participants completed a demographic questionnaire and measures of the STARS, the STAI, self-concept in mathematics, and interest in statistics. At the beginning of the statistics examination, students assessed their present state anxiety by the KUSTA scale. After 25 min, all examination participants gave another assessment of their anxiety at that moment. Students' examination scores were recorded. Structural equation modelling techniques were used to test relationships between the variables in a multivariate context. Statistics anxiety was the only variable related to state anxiety in the examination. Via state anxiety experienced before and during the examination, statistics anxiety had a negative influence on achievement. However, statistics anxiety also had a direct positive influence on achievement. This result may be explained by students' motivational goals in the specific educational setting. The results provide insight into the relationship between students' attitudes, dispositions, experiences of anxiety in the examination, and academic achievement, and give recommendations to instructors on how to support students prior to and in the examination. © 2012 The British Psychological Society.
2014-09-01
approaches. Ecological Modelling Volume 200, Issues 1–2, 10, pp 1–19. Buhlmann, Kurt A., Thomas S.B. Akre, John B. Iverson, Deno Karapatakis, Russell A ... statistical multivariate analysis to define the current and projected future range probability for species of interest to Army land managers. A software ... 15 Figure 4. RCW omission rate and predicted area as a function of the cumulative threshold