Jiang, Xuejun; Guo, Xu; Zhang, Ning; Wang, Bo
2018-01-01
This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV. PMID:29672555
J. Grabinsky; A. Aldama; A. Chacalo; H. J. Vazquez
2000-01-01
Inventory data of Mexico City's street trees were studied using classical statistical arboricultural and ecological statistical approaches. Multivariate techniques were applied to both. Results did not differ substantially and were complementary. It was possible to reduce inventory data and to group species, boroughs, blocks, and variables.
Quantifying the impact of between-study heterogeneity in multivariate meta-analyses
Jackson, Dan; White, Ian R; Riley, Richard D
2012-01-01
Measures that quantify the impact of heterogeneity in univariate meta-analysis, including the very popular I2 statistic, are now well established. Multivariate meta-analysis, where studies provide multiple outcomes that are pooled in a single analysis, is also becoming more commonly used. The question of how to quantify heterogeneity in the multivariate setting is therefore raised. It is the univariate R2 statistic, the ratio of the variance of the estimated treatment effect under the random and fixed effects models, that generalises most naturally, so this statistic provides our basis. This statistic is then used to derive a multivariate analogue of I2, which we call . We also provide a multivariate H2 statistic, the ratio of a generalisation of Cochran's heterogeneity statistic and its associated degrees of freedom, with an accompanying generalisation of the usual I2 statistic, . Our proposed heterogeneity statistics can be used alongside all the usual estimates and inferential procedures used in multivariate meta-analysis. We apply our methods to some real datasets and show how our statistics are equally appropriate in the context of multivariate meta-regression, where study level covariate effects are included in the model. Our heterogeneity statistics may be used when applying any procedure for fitting the multivariate random effects model. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22763950
Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J.; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T.; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti
2016-01-01
Motivation: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. Results: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco Contacts: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153689
Taylor, Sandra L; Ruhaak, L Renee; Weiss, Robert H; Kelly, Karen; Kim, Kyoungmi
2017-01-01
High through-put mass spectrometry (MS) is now being used to profile small molecular compounds across multiple biological sample types from the same subjects with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data and imputation can impact between-biospecimen correlation and multivariate analysis results. We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, we showed that multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds are differentially regulated in multiple biospecimens but univariate methods can be more powerful if compounds are differentially regulated in only one biospecimen. We provide R functions to implement and illustrate our method as supplementary information CONTACT: sltaylor@ucdavis.eduSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Almeida, Tiago P; Chu, Gavin S; Li, Xin; Dastagir, Nawshin; Tuan, Jiun H; Stafford, Peter J; Schlindwein, Fernando S; Ng, G André
2017-01-01
Purpose: Complex fractionated atrial electrograms (CFAE)-guided ablation after pulmonary vein isolation (PVI) has been used for persistent atrial fibrillation (persAF) therapy. This strategy has shown suboptimal outcomes due to, among other factors, undetected changes in the atrial tissue following PVI. In the present work, we investigate CFAE distribution before and after PVI in patients with persAF using a multivariate statistical model. Methods: 207 pairs of atrial electrograms (AEGs) were collected before and after PVI respectively, from corresponding LA regions in 18 persAF patients. Twelve attributes were measured from the AEGs, before and after PVI. Statistical models based on multivariate analysis of variance (MANOVA) and linear discriminant analysis (LDA) have been used to characterize the atrial regions and AEGs. Results: PVI significantly reduced CFAEs in the LA (70 vs. 40%; P < 0.0001). Four types of LA regions were identified, based on the AEGs characteristics: (i) fractionated before PVI that remained fractionated after PVI (31% of the collected points); (ii) fractionated that converted to normal (39%); (iii) normal prior to PVI that became fractionated (9%) and; (iv) normal that remained normal (21%). Individually, the attributes failed to distinguish these LA regions, but multivariate statistical models were effective in their discrimination ( P < 0.0001). Conclusion: Our results have unveiled that there are LA regions resistant to PVI, while others are affected by it. Although, traditional methods were unable to identify these different regions, the proposed multivariate statistical model discriminated LA regions resistant to PVI from those affected by it without prior ablation information.
Multivariate meta-analysis: potential and promise.
Jackson, Dan; Riley, Richard; White, Ian R
2011-09-10
The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day 'Multivariate meta-analysis' event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various view points and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd.
Multivariate meta-analysis: Potential and promise
Jackson, Dan; Riley, Richard; White, Ian R
2011-01-01
The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day ‘Multivariate meta-analysis’ event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various view points and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd. PMID:21268052
Zhi, Ruicong; Zhao, Lei; Xie, Nan; Wang, Houyin; Shi, Bolin; Shi, Jingye
2016-01-13
A framework of establishing standard reference scale (texture) is proposed by multivariate statistical analysis according to instrumental measurement and sensory evaluation. Multivariate statistical analysis is conducted to rapidly select typical reference samples with characteristics of universality, representativeness, stability, substitutability, and traceability. The reasonableness of the framework method is verified by establishing standard reference scale of texture attribute (hardness) with Chinese well-known food. More than 100 food products in 16 categories were tested using instrumental measurement (TPA test), and the result was analyzed with clustering analysis, principal component analysis, relative standard deviation, and analysis of variance. As a result, nine kinds of foods were determined to construct the hardness standard reference scale. The results indicate that the regression coefficient between the estimated sensory value and the instrumentally measured value is significant (R(2) = 0.9765), which fits well with Stevens's theory. The research provides reliable a theoretical basis and practical guide for quantitative standard reference scale establishment on food texture characteristics.
Souza, Iara da Costa; Morozesk, Mariana; Duarte, Ian Drumond; Bonomo, Marina Marques; Rocha, Lívia Dorsch; Furlan, Larissa Maria; Arrivabene, Hiulana Pereira; Monferrán, Magdalena Victoria; Matsumoto, Silvia Tamie; Milanez, Camilla Rozindo Dias; Wunderlin, Daniel Alberto; Fernandes, Marisa Narciso
2014-08-01
Roots of mangrove trees have an important role in depurating water and sediments by retaining metals that may accumulate in different plant tissues, affecting physiological processes and anatomy. The present study aimed to evaluate adaptive changes in root of Rhizophora mangle in response to different levels of chemical elements (metals/metalloids) in interstitial water and sediments from four neotropical mangroves in Brazil. What sets this study apart from other studies is that we not only investigate adaptive modifications in R. mangle but also changes in environments where this plant grows, evaluating correspondence between physical, chemical and biological issues by a combined set of multivariate statistical methods (pattern recognition). Thus, we looked to match changes in the environment with adaptations in plants. Multivariate statistics highlighted that the lignified periderm and the air gaps are directly related to the environmental contamination. Current results provide new evidences of root anatomical strategies to deal with contaminated environments. Multivariate statistics greatly contributes to extrapolate results from complex data matrixes obtained when analyzing environmental issues, pointing out parameters involved in environmental changes and also evidencing the adaptive response of the exposed biota. Copyright © 2014 Elsevier Ltd. All rights reserved.
Richard. D. Wood-Smith; John M. Buffington
1996-01-01
Multivariate statistical analyses of geomorphic variables from 23 forest stream reaches in southeast Alaska result in successful discrimination between pristine streams and those disturbed by land management, specifically timber harvesting and associated road building. Results of discriminant function analysis indicate that a three-variable model discriminates 10...
Forcino, Frank L; Leighton, Lindsey R; Twerdy, Pamela; Cahill, James F
2015-01-01
Community ecologists commonly perform multivariate techniques (e.g., ordination, cluster analysis) to assess patterns and gradients of taxonomic variation. A critical requirement for a meaningful statistical analysis is accurate information on the taxa found within an ecological sample. However, oversampling (too many individuals counted per sample) also comes at a cost, particularly for ecological systems in which identification and quantification is substantially more resource consuming than the field expedition itself. In such systems, an increasingly larger sample size will eventually result in diminishing returns in improving any pattern or gradient revealed by the data, but will also lead to continually increasing costs. Here, we examine 396 datasets: 44 previously published and 352 created datasets. Using meta-analytic and simulation-based approaches, the research within the present paper seeks (1) to determine minimal sample sizes required to produce robust multivariate statistical results when conducting abundance-based, community ecology research. Furthermore, we seek (2) to determine the dataset parameters (i.e., evenness, number of taxa, number of samples) that require larger sample sizes, regardless of resource availability. We found that in the 44 previously published and the 220 created datasets with randomly chosen abundances, a conservative estimate of a sample size of 58 produced the same multivariate results as all larger sample sizes. However, this minimal number varies as a function of evenness, where increased evenness resulted in increased minimal sample sizes. Sample sizes as small as 58 individuals are sufficient for a broad range of multivariate abundance-based research. In cases when resource availability is the limiting factor for conducting a project (e.g., small university, time to conduct the research project), statistically viable results can still be obtained with less of an investment.
Exploring the Replicability of a Study's Results: Bootstrap Statistics for the Multivariate Case.
ERIC Educational Resources Information Center
Thompson, Bruce
Conventional statistical significance tests do not inform the researcher regarding the likelihood that results will replicate. One strategy for evaluating result replication is to use a "bootstrap" resampling of a study's data so that the stability of results across numerous configurations of the subjects can be explored. This paper…
Kilborn, Joshua P; Jones, David L; Peebles, Ernst B; Naar, David F
2017-04-01
Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis testing-based approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine the performance variability based on the algorithm's assumptions or any underlying data structures. Here, we use simulation studies to estimate the statistical error rates for the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA) to estimate the proficiency of clustering with DISPROF as a decision criterion. We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, and grouped data with increasing overlap, overdispersion for ecological data, and correlation among descriptors within groups. Using simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference grouping partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls during the application of methods and the interpretation of results.
McArtor, Daniel B.; Lubke, Gitta H.; Bergeman, C. S.
2017-01-01
Person-centered methods are useful for studying individual differences in terms of (dis)similarities between response profiles on multivariate outcomes. Multivariate distance matrix regression (MDMR) tests the significance of associations of response profile (dis)similarities and a set of predictors using permutation tests. This paper extends MDMR by deriving and empirically validating the asymptotic null distribution of its test statistic, and by proposing an effect size for individual outcome variables, which is shown to recover true associations. These extensions alleviate the computational burden of permutation tests currently used in MDMR and render more informative results, thus making MDMR accessible to new research domains. PMID:27738957
McArtor, Daniel B; Lubke, Gitta H; Bergeman, C S
2017-12-01
Person-centered methods are useful for studying individual differences in terms of (dis)similarities between response profiles on multivariate outcomes. Multivariate distance matrix regression (MDMR) tests the significance of associations of response profile (dis)similarities and a set of predictors using permutation tests. This paper extends MDMR by deriving and empirically validating the asymptotic null distribution of its test statistic, and by proposing an effect size for individual outcome variables, which is shown to recover true associations. These extensions alleviate the computational burden of permutation tests currently used in MDMR and render more informative results, thus making MDMR accessible to new research domains.
Deconstructing multivariate decoding for the study of brain function.
Hebart, Martin N; Baker, Chris I
2017-08-04
Multivariate decoding methods were developed originally as tools to enable accurate predictions in real-world applications. The realization that these methods can also be employed to study brain function has led to their widespread adoption in the neurosciences. However, prior to the rise of multivariate decoding, the study of brain function was firmly embedded in a statistical philosophy grounded on univariate methods of data analysis. In this way, multivariate decoding for brain interpretation grew out of two established frameworks: multivariate decoding for predictions in real-world applications, and classical univariate analysis based on the study and interpretation of brain activation. We argue that this led to two confusions, one reflecting a mixture of multivariate decoding for prediction or interpretation, and the other a mixture of the conceptual and statistical philosophies underlying multivariate decoding and classical univariate analysis. Here we attempt to systematically disambiguate multivariate decoding for the study of brain function from the frameworks it grew out of. After elaborating these confusions and their consequences, we describe six, often unappreciated, differences between classical univariate analysis and multivariate decoding. We then focus on how the common interpretation of what is signal and noise changes in multivariate decoding. Finally, we use four examples to illustrate where these confusions may impact the interpretation of neuroimaging data. We conclude with a discussion of potential strategies to help resolve these confusions in interpreting multivariate decoding results, including the potential departure from multivariate decoding methods for the study of brain function. Copyright © 2017. Published by Elsevier Inc.
Bonetti, Jennifer; Quarino, Lawrence
2014-05-01
This study has shown that the combination of simple techniques with the use of multivariate statistics offers the potential for the comparative analysis of soil samples. Five samples were obtained from each of twelve state parks across New Jersey in both the summer and fall seasons. Each sample was examined using particle-size distribution, pH analysis in both water and 1 M CaCl2 , and a loss on ignition technique. Data from each of the techniques were combined, and principal component analysis (PCA) and canonical discriminant analysis (CDA) were used for multivariate data transformation. Samples from different locations could be visually differentiated from one another using these multivariate plots. Hold-one-out cross-validation analysis showed error rates as low as 3.33%. Ten blind study samples were analyzed resulting in no misclassifications using Mahalanobis distance calculations and visual examinations of multivariate plots. Seasonal variation was minimal between corresponding samples, suggesting potential success in forensic applications. © 2014 American Academy of Forensic Sciences.
Papageorgiou, Spyridon N; Kloukos, Dimitrios; Petridis, Haralampos; Pandis, Nikolaos
2015-10-01
To assess the hypothesis that there is excessive reporting of statistically significant studies published in prosthodontic and implantology journals, which could indicate selective publication. The last 30 issues of 9 journals in prosthodontics and implant dentistry were hand-searched for articles with statistical analyses. The percentages of significant and non-significant results were tabulated by parameter of interest. Univariable/multivariable logistic regression analyses were applied to identify possible predictors of reporting statistically significance findings. The results of this study were compared with similar studies in dentistry with random-effects meta-analyses. From the 2323 included studies 71% of them reported statistically significant results, with the significant results ranging from 47% to 86%. Multivariable modeling identified that geographical area and involvement of statistician were predictors of statistically significant results. Compared to interventional studies, the odds that in vitro and observational studies would report statistically significant results was increased by 1.20 times (OR: 2.20, 95% CI: 1.66-2.92) and 0.35 times (OR: 1.35, 95% CI: 1.05-1.73), respectively. The probability of statistically significant results from randomized controlled trials was significantly lower compared to various study designs (difference: 30%, 95% CI: 11-49%). Likewise the probability of statistically significant results in prosthodontics and implant dentistry was lower compared to other dental specialties, but this result did not reach statistical significant (P>0.05). The majority of studies identified in the fields of prosthodontics and implant dentistry presented statistically significant results. The same trend existed in publications of other specialties in dentistry. Copyright © 2015 Elsevier Ltd. All rights reserved.
Parametric Cost Models for Space Telescopes
NASA Technical Reports Server (NTRS)
Stahl, H. Philip
2010-01-01
A study is in-process to develop a multivariable parametric cost model for space telescopes. Cost and engineering parametric data has been collected on 30 different space telescopes. Statistical correlations have been developed between 19 variables of 59 variables sampled. Single Variable and Multi-Variable Cost Estimating Relationships have been developed. Results are being published.
Preliminary Multi-Variable Parametric Cost Model for Space Telescopes
NASA Technical Reports Server (NTRS)
Stahl, H. Philip; Hendrichs, Todd
2010-01-01
This slide presentation reviews creating a preliminary multi-variable cost model for the contract costs of making a space telescope. There is discussion of the methodology for collecting the data, definition of the statistical analysis methodology, single variable model results, testing of historical models and an introduction of the multi variable models.
A Primer on Multivariate Analysis of Variance (MANOVA) for Behavioral Scientists
ERIC Educational Resources Information Center
Warne, Russell T.
2014-01-01
Reviews of statistical procedures (e.g., Bangert & Baumberger, 2005; Kieffer, Reese, & Thompson, 2001; Warne, Lazo, Ramos, & Ritter, 2012) show that one of the most common multivariate statistical methods in psychological research is multivariate analysis of variance (MANOVA). However, MANOVA and its associated procedures are often not…
NASA Technical Reports Server (NTRS)
Wolf, S. F.; Lipschutz, M. E.
1993-01-01
Multivariate statistical analysis techniques (linear discriminant analysis and logistic regression) can provide powerful discrimination tools which are generally unfamiliar to the planetary science community. Fall parameters were used to identify a group of 17 H chondrites (Cluster 1) that were part of a coorbital stream which intersected Earth's orbit in May, from 1855 - 1895, and can be distinguished from all other H chondrite falls. Using multivariate statistical techniques, it was demonstrated that a totally different criterion, labile trace element contents - hence thermal histories - or 13 Cluster 1 meteorites are distinguishable from those of 45 non-Cluster 1 H chondrites. Here, we focus upon the principles of multivariate statistical techniques and illustrate their application using non-meteoritic and meteoritic examples.
Characterizations of linear sufficient statistics
NASA Technical Reports Server (NTRS)
Peters, B. C., Jr.; Reoner, R.; Decell, H. P., Jr.
1977-01-01
A surjective bounded linear operator T from a Banach space X to a Banach space Y must be a sufficient statistic for a dominated family of probability measures defined on the Borel sets of X. These results were applied, so that they characterize linear sufficient statistics for families of the exponential type, including as special cases the Wishart and multivariate normal distributions. The latter result was used to establish precisely which procedures for sampling from a normal population had the property that the sample mean was a sufficient statistic.
NASA Astrophysics Data System (ADS)
Fuchs, Julia; Cermak, Jan; Andersen, Hendrik
2017-04-01
This study aims at untangling the impacts of external dynamics and local conditions on cloud properties in the Southeast Atlantic (SEA) by combining satellite and reanalysis data using multivariate statistics. The understanding of clouds and their determinants at different scales is important for constraining the Earth's radiative budget, and thus prominent in climate-system research. In this study, SEA stratocumulus cloud properties are observed not only as the result of local environmental conditions but also as affected by external dynamics and spatial origins of air masses entering the study area. In order to assess to what extent cloud properties are impacted by aerosol concentration, air mass history, and meteorology, a multivariate approach is conducted using satellite observations of aerosol and cloud properties (MODIS, SEVIRI), information on aerosol species composition (MACC) and meteorological context (ERA-Interim reanalysis). To account for the often-neglected but important role of air mass origin, information on air mass history based on HYSPLIT modeling is included in the statistical model. This multivariate approach is intended to lead to a better understanding of the physical processes behind observed stratocumulus cloud properties in the SEA.
Multivariate statistical model for 3D image segmentation with application to medical images.
John, Nigel M; Kabuka, Mansur R; Ibrahim, Mohamed O
2003-12-01
In this article we describe a statistical model that was developed to segment brain magnetic resonance images. The statistical segmentation algorithm was applied after a pre-processing stage involving the use of a 3D anisotropic filter along with histogram equalization techniques. The segmentation algorithm makes use of prior knowledge and a probability-based multivariate model designed to semi-automate the process of segmentation. The algorithm was applied to images obtained from the Center for Morphometric Analysis at Massachusetts General Hospital as part of the Internet Brain Segmentation Repository (IBSR). The developed algorithm showed improved accuracy over the k-means, adaptive Maximum Apriori Probability (MAP), biased MAP, and other algorithms. Experimental results showing the segmentation and the results of comparisons with other algorithms are provided. Results are based on an overlap criterion against expertly segmented images from the IBSR. The algorithm produced average results of approximately 80% overlap with the expertly segmented images (compared with 85% for manual segmentation and 55% for other algorithms).
Denis Valle; Benjamin Baiser; Christopher W. Woodall; Robin Chazdon; Jerome Chave
2014-01-01
We propose a novel multivariate method to analyse biodiversity data based on the Latent Dirichlet Allocation (LDA) model. LDA, a probabilistic model, reduces assemblages to sets of distinct component communities. It produces easily interpretable results, can represent abrupt and gradual changes in composition, accommodates missing data and allows for coherent estimates...
Problems with Multivariate Normality: Can the Multivariate Bootstrap Help?
ERIC Educational Resources Information Center
Thompson, Bruce
Multivariate normality is required for some statistical tests. This paper explores the implications of violating the assumption of multivariate normality and illustrates a graphical procedure for evaluating multivariate normality. The logic for using the multivariate bootstrap is presented. The multivariate bootstrap can be used when distribution…
Analyzing Faculty Salaries When Statistics Fail.
ERIC Educational Resources Information Center
Simpson, William A.
The role played by nonstatistical procedures, in contrast to multivariant statistical approaches, in analyzing faculty salaries is discussed. Multivariant statistical methods are usually used to establish or defend against prima facia cases of gender and ethnic discrimination with respect to faculty salaries. These techniques are not applicable,…
NASA Astrophysics Data System (ADS)
Darvishzadeh, R.; Skidmore, A. K.; Mirzaie, M.; Atzberger, C.; Schlerf, M.
2014-12-01
Accurate estimation of grassland biomass at their peak productivity can provide crucial information regarding the functioning and productivity of the rangelands. Hyperspectral remote sensing has proved to be valuable for estimation of vegetation biophysical parameters such as biomass using different statistical techniques. However, in statistical analysis of hyperspectral data, multicollinearity is a common problem due to large amount of correlated hyper-spectral reflectance measurements. The aim of this study was to examine the prospect of above ground biomass estimation in a heterogeneous Mediterranean rangeland employing multivariate calibration methods. Canopy spectral measurements were made in the field using a GER 3700 spectroradiometer, along with concomitant in situ measurements of above ground biomass for 170 sample plots. Multivariate calibrations including partial least squares regression (PLSR), principal component regression (PCR), and Least-Squared Support Vector Machine (LS-SVM) were used to estimate the above ground biomass. The prediction accuracy of the multivariate calibration methods were assessed using cross validated R2 and RMSE. The best model performance was obtained using LS_SVM and then PLSR both calibrated with first derivative reflectance dataset with R2cv = 0.88 & 0.86 and RMSEcv= 1.15 & 1.07 respectively. The weakest prediction accuracy was appeared when PCR were used (R2cv = 0.31 and RMSEcv= 2.48). The obtained results highlight the importance of multivariate calibration methods for biomass estimation when hyperspectral data are used.
Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques
NASA Astrophysics Data System (ADS)
Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein
2017-10-01
The groundwater samples from Rapur area were collected from different sites to evaluate the major ion chemistry. The large number of data can lead to difficulties in the integration, interpretation, and representation of the results. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness to classify and identify geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations. This has resulted two important clusters viz., cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS) which are released to the study area from different sources. The application of different multivariate statistical techniques, such as principal component analysis (PCA), assists in the interpretation of complex data matrices for a better understanding of water quality of a study area. From PCA, it is clear that the first factor (factor 1), accounted for 36.2% of the total variance, was high positive loading in EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of similarity of their water quality.
Multivariate Relationships between Statistics Anxiety and Motivational Beliefs
ERIC Educational Resources Information Center
Baloglu, Mustafa; Abbassi, Amir; Kesici, Sahin
2017-01-01
In general, anxiety has been found to be associated with motivational beliefs and the current study investigated multivariate relationships between statistics anxiety and motivational beliefs among 305 college students (60.0% women). The Statistical Anxiety Rating Scale, the Motivated Strategies for Learning Questionnaire, and a set of demographic…
Borrowing of strength and study weights in multivariate and network meta-analysis.
Jackson, Dan; White, Ian R; Price, Malcolm; Copas, John; Riley, Richard D
2017-12-01
Multivariate and network meta-analysis have the potential for the estimated mean of one effect to borrow strength from the data on other effects of interest. The extent of this borrowing of strength is usually assessed informally. We present new mathematical definitions of 'borrowing of strength'. Our main proposal is based on a decomposition of the score statistic, which we show can be interpreted as comparing the precision of estimates from the multivariate and univariate models. Our definition of borrowing of strength therefore emulates the usual informal assessment. We also derive a method for calculating study weights, which we embed into the same framework as our borrowing of strength statistics, so that percentage study weights can accompany the results from multivariate and network meta-analyses as they do in conventional univariate meta-analyses. Our proposals are illustrated using three meta-analyses involving correlated effects for multiple outcomes, multiple risk factor associations and multiple treatments (network meta-analysis).
Borrowing of strength and study weights in multivariate and network meta-analysis
Jackson, Dan; White, Ian R; Price, Malcolm; Copas, John; Riley, Richard D
2016-01-01
Multivariate and network meta-analysis have the potential for the estimated mean of one effect to borrow strength from the data on other effects of interest. The extent of this borrowing of strength is usually assessed informally. We present new mathematical definitions of ‘borrowing of strength’. Our main proposal is based on a decomposition of the score statistic, which we show can be interpreted as comparing the precision of estimates from the multivariate and univariate models. Our definition of borrowing of strength therefore emulates the usual informal assessment. We also derive a method for calculating study weights, which we embed into the same framework as our borrowing of strength statistics, so that percentage study weights can accompany the results from multivariate and network meta-analyses as they do in conventional univariate meta-analyses. Our proposals are illustrated using three meta-analyses involving correlated effects for multiple outcomes, multiple risk factor associations and multiple treatments (network meta-analysis). PMID:26546254
Statistical methods and neural network approaches for classification of data from multiple sources
NASA Technical Reports Server (NTRS)
Benediktsson, Jon Atli; Swain, Philip H.
1990-01-01
Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A problem with using conventional multivariate statistical approaches for classification of data of multiple types is in general that a multivariate distribution cannot be assumed for the classes in the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable. This means that the data sources need to be weighted according to their reliability but most statistical classification methods do not have a mechanism for this. This research focuses on statistical methods which can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Secondly, this research focuses on neural network models. The neural networks are distribution free since no prior knowledge of the statistical distribution of the data is needed. This is an obvious advantage over most statistical classification methods. The neural networks also automatically take care of the problem involving how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.
Selvarasu, Suresh; Kim, Do Yun; Karimi, Iftekhar A; Lee, Dong-Yup
2010-10-01
We present an integrated framework for characterizing fed-batch cultures of mouse hybridoma cells producing monoclonal antibody (mAb). This framework systematically combines data preprocessing, elemental balancing and statistical analysis technique. Initially, specific rates of cell growth, glucose/amino acid consumptions and mAb/metabolite productions were calculated via curve fitting using logistic equations, with subsequent elemental balancing of the preprocessed data indicating the presence of experimental measurement errors. Multivariate statistical analysis was then employed to understand physiological characteristics of the cellular system. The results from principal component analysis (PCA) revealed three major clusters of amino acids with similar trends in their consumption profiles: (i) arginine, threonine and serine, (ii) glycine, tyrosine, phenylalanine, methionine, histidine and asparagine, and (iii) lysine, valine and isoleucine. Further analysis using partial least square (PLS) regression identified key amino acids which were positively or negatively correlated with the cell growth, mAb production and the generation of lactate and ammonia. Based on these results, the optimal concentrations of key amino acids in the feed medium can be inferred, potentially leading to an increase in cell viability and productivity, as well as a decrease in toxic waste production. The study demonstrated how the current methodological framework using multivariate statistical analysis techniques can serve as a potential tool for deriving rational medium design strategies. Copyright © 2010 Elsevier B.V. All rights reserved.
On Some Multiple Decision Problems
1976-08-01
parameter space. Some recent results in the area of subset selection formulation are Gnanadesikan and Gupta [28], Gupta and Studden [43], Gupta and...York, pp. 363-376. [27) Gnanadesikan , M. (1966). Some Selection and Ranking Procedures for Multivariate Normal Populations. Ph.D. Thesis. Dept. of...Statist., Purdue Univ., West Lafayette, Indiana 47907. [28) Gnanadesikan , M. and Gupta, S. S. (1970). Selection procedures for multivariate normal
Exploring the Replicability of a Study's Results: Bootstrap Statistics for the Multivariate Case.
ERIC Educational Resources Information Center
Thompson, Bruce
1995-01-01
Use of the bootstrap method in a canonical correlation analysis to evaluate the replicability of a study's results is illustrated. More confidence may be vested in research results that replicate. (SLD)
NONPARAMETRIC MANOVA APPROACHES FOR NON-NORMAL MULTIVARIATE OUTCOMES WITH MISSING VALUES
He, Fanyin; Mazumdar, Sati; Tang, Gong; Bhatia, Triptish; Anderson, Stewart J.; Dew, Mary Amanda; Krafty, Robert; Nimgaonkar, Vishwajit; Deshpande, Smita; Hall, Martica; Reynolds, Charles F.
2017-01-01
Between-group comparisons often entail many correlated response variables. The multivariate linear model, with its assumption of multivariate normality, is the accepted standard tool for these tests. When this assumption is violated, the nonparametric multivariate Kruskal-Wallis (MKW) test is frequently used. However, this test requires complete cases with no missing values in response variables. Deletion of cases with missing values likely leads to inefficient statistical inference. Here we extend the MKW test to retain information from partially-observed cases. Results of simulated studies and analysis of real data show that the proposed method provides adequate coverage and superior power to complete-case analyses. PMID:29416225
Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti
2016-07-01
A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness.Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Sepehrband, Farshid; Lynch, Kirsten M; Cabeen, Ryan P; Gonzalez-Zacarias, Clio; Zhao, Lu; D'Arcy, Mike; Kesselman, Carl; Herting, Megan M; Dinov, Ivo D; Toga, Arthur W; Clark, Kristi A
2018-05-15
Exploring neuroanatomical sex differences using a multivariate statistical learning approach can yield insights that cannot be derived with univariate analysis. While gross differences in total brain volume are well-established, uncovering the more subtle, regional sex-related differences in neuroanatomy requires a multivariate approach that can accurately model spatial complexity as well as the interactions between neuroanatomical features. Here, we developed a multivariate statistical learning model using a support vector machine (SVM) classifier to predict sex from MRI-derived regional neuroanatomical features from a single-site study of 967 healthy youth from the Philadelphia Neurodevelopmental Cohort (PNC). Then, we validated the multivariate model on an independent dataset of 682 healthy youth from the multi-site Pediatric Imaging, Neurocognition and Genetics (PING) cohort study. The trained model exhibited an 83% cross-validated prediction accuracy, and correctly predicted the sex of 77% of the subjects from the independent multi-site dataset. Results showed that cortical thickness of the middle occipital lobes and the angular gyri are major predictors of sex. Results also demonstrated the inferential benefits of going beyond classical regression approaches to capture the interactions among brain features in order to better characterize sex differences in male and female youths. We also identified specific cortical morphological measures and parcellation techniques, such as cortical thickness as derived from the Destrieux atlas, that are better able to discriminate between males and females in comparison to other brain atlases (Desikan-Killiany, Brodmann and subcortical atlases). Copyright © 2018 Elsevier Inc. All rights reserved.
Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes.
Harrington, Peter de Boves
2018-01-02
Validation of multivariate models is of current importance for a wide range of chemical applications. Although important, it is neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to the single validation set of data. In addition, no statistics are available regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes an efficient use of the data because each object is used once for validation. It was reviewed a decade earlier but primarily for the optimization of chemometric models this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported and powerful, matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.
Multivariate assessment of event-related potentials with the t-CWT method.
Bostanov, Vladimir
2015-11-05
Event-related brain potentials (ERPs) are usually assessed with univariate statistical tests although they are essentially multivariate objects. Brain-computer interface applications are a notable exception to this practice, because they are based on multivariate classification of single-trial ERPs. Multivariate ERP assessment can be facilitated by feature extraction methods. One such method is t-CWT, a mathematical-statistical algorithm based on the continuous wavelet transform (CWT) and Student's t-test. This article begins with a geometric primer on some basic concepts of multivariate statistics as applied to ERP assessment in general and to the t-CWT method in particular. Further, it presents for the first time a detailed, step-by-step, formal mathematical description of the t-CWT algorithm. A new multivariate outlier rejection procedure based on principal component analysis in the frequency domain is presented as an important pre-processing step. The MATLAB and GNU Octave implementation of t-CWT is also made publicly available for the first time as free and open source code. The method is demonstrated on some example ERP data obtained in a passive oddball paradigm. Finally, some conceptually novel applications of the multivariate approach in general and of the t-CWT method in particular are suggested and discussed. Hopefully, the publication of both the t-CWT source code and its underlying mathematical algorithm along with a didactic geometric introduction to some basic concepts of multivariate statistics would make t-CWT more accessible to both users and developers in the field of neuroscience research.
Instrumental Neutron Activation Analysis and Multivariate Statistics for Pottery Provenance
NASA Astrophysics Data System (ADS)
Glascock, M. D.; Neff, H.; Vaughn, K. J.
2004-06-01
The application of instrumental neutron activation analysis and multivariate statistics to archaeological studies of ceramics and clays is described. A small pottery data set from the Nasca culture in southern Peru is presented for illustration.
Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan
2017-12-01
Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0-0.10 m, or 0-0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km 2 , with a median of 0.4 samples per km 2 . The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis (PCA) and cluster analysis (CA). Copyright © 2017 Elsevier Ltd. All rights reserved.
Characterizing multivariate decoding models based on correlated EEG spectral features
McFarland, Dennis J.
2013-01-01
Objective Multivariate decoding methods are popular techniques for analysis of neurophysiological data. The present study explored potential interpretative problems with these techniques when predictors are correlated. Methods Data from sensorimotor rhythm-based cursor control experiments was analyzed offline with linear univariate and multivariate models. Features were derived from autoregressive (AR) spectral analysis of varying model order which produced predictors that varied in their degree of correlation (i.e., multicollinearity). Results The use of multivariate regression models resulted in much better prediction of target position as compared to univariate regression models. However, with lower order AR features interpretation of the spectral patterns of the weights was difficult. This is likely to be due to the high degree of multicollinearity present with lower order AR features. Conclusions Care should be exercised when interpreting the pattern of weights of multivariate models with correlated predictors. Comparison with univariate statistics is advisable. Significance While multivariate decoding algorithms are very useful for prediction their utility for interpretation may be limited when predictors are correlated. PMID:23466267
Ensembles of radial basis function networks for spectroscopic detection of cervical precancer
NASA Technical Reports Server (NTRS)
Tumer, K.; Ramanujam, N.; Ghosh, J.; Richards-Kortum, R.
1998-01-01
The mortality related to cervical cancer can be substantially reduced through early detection and treatment. However, current detection techniques, such as Pap smear and colposcopy, fail to achieve a concurrently high sensitivity and specificity. In vivo fluorescence spectroscopy is a technique which quickly, noninvasively and quantitatively probes the biochemical and morphological changes that occur in precancerous tissue. A multivariate statistical algorithm was used to extract clinically useful information from tissue spectra acquired from 361 cervical sites from 95 patients at 337-, 380-, and 460-nm excitation wavelengths. The multivariate statistical analysis was also employed to reduce the number of fluorescence excitation-emission wavelength pairs required to discriminate healthy tissue samples from precancerous tissue samples. The use of connectionist methods such as multilayered perceptrons, radial basis function (RBF) networks, and ensembles of such networks was investigated. RBF ensemble algorithms based on fluorescence spectra potentially provide automated and near real-time implementation of precancer detection in the hands of nonexperts. The results are more reliable, direct, and accurate than those achieved by either human experts or multivariate statistical algorithms.
SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression *
Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.
2014-01-01
The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised to specifically address the low statistical power issue of many standard statistical approaches, such as the Hotelling’s T2 test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPREM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis have shown that SPReM out-performs other state-of-the-art methods. PMID:26527844
Evaluation of statistical protocols for quality control of ecosystem carbon dioxide fluxes
Jorge F. Perez-Quezada; Nicanor Z. Saliendra; William E. Emmerich; Emilio A. Laca
2007-01-01
The process of quality control of micrometeorological and carbon dioxide (CO2) flux data can be subjective and may lack repeatability, which would undermine the results of many studies. Multivariate statistical methods and time series analysis were used together and independently to detect and replace outliers in CO2 flux...
Li, Jinling; He, Ming; Han, Wei; Gu, Yifan
2009-05-30
An investigation on heavy metal sources, i.e., Cu, Zn, Ni, Pb, Cr, and Cd in the coastal soils of Shanghai, China, was conducted using multivariate statistical methods (principal component analysis, clustering analysis, and correlation analysis). All the results of the multivariate analysis showed that: (i) Cu, Ni, Pb, and Cd had anthropogenic sources (e.g., overuse of chemical fertilizers and pesticides, industrial and municipal discharges, animal wastes, sewage irrigation, etc.); (ii) Zn and Cr were associated with parent materials and therefore had natural sources (e.g., the weathering process of parent materials and subsequent pedo-genesis due to the alluvial deposits). The effect of heavy metals in the soils was greatly affected by soil formation, atmospheric deposition, and human activities. These findings provided essential information on the possible sources of heavy metals, which would contribute to the monitoring and assessment process of agricultural soils in worldwide regions.
Badran, M; Morsy, R; Soliman, H; Elnimr, T
2016-01-01
The trace elements metabolism has been reported to possess specific roles in the pathogenesis and progress of diabetes mellitus. Due to the continuous increase in the population of patients with Type 2 diabetes (T2D), this study aims to assess the levels and inter-relationships of fast blood glucose (FBG) and serum trace elements in Type 2 diabetic patients. This study was conducted on 40 Egyptian Type 2 diabetic patients and 36 healthy volunteers (Hospital of Tanta University, Tanta, Egypt). The blood serum was digested and then used to determine the levels of 24 trace elements using an inductive coupled plasma mass spectroscopy (ICP-MS). Multivariate statistical analysis depended on correlation coefficient, cluster analysis (CA) and principal component analysis (PCA), were used to analysis the data. The results exhibited significant changes in FBG and eight of trace elements, Zn, Cu, Se, Fe, Mn, Cr, Mg, and As, levels in the blood serum of Type 2 diabetic patients relative to those of healthy controls. The statistical analyses using multivariate statistical techniques were obvious in the reduction of the experimental variables, and grouping the trace elements in patients into three clusters. The application of PCA revealed a distinct difference in associations of trace elements and their clustering patterns in control and patients group in particular for Mg, Fe, Cu, and Zn that appeared to be the most crucial factors which related with Type 2 diabetes. Therefore, on the basis of this study, the contributors of trace elements content in Type 2 diabetic patients can be determine and specify with correlation relationship and multivariate statistical analysis, which confirm that the alteration of some essential trace metals may play a role in the development of diabetes mellitus. Copyright © 2015 Elsevier GmbH. All rights reserved.
2017-09-01
efficacy of statistical post-processing methods downstream of these dynamical model components with a hierarchical multivariate Bayesian approach to...Bayesian hierarchical modeling, Markov chain Monte Carlo methods , Metropolis algorithm, machine learning, atmospheric prediction 15. NUMBER OF PAGES...scale processes. However, this dissertation explores the efficacy of statistical post-processing methods downstream of these dynamical model components
The Statistical Consulting Center for Astronomy (SCCA)
NASA Technical Reports Server (NTRS)
Akritas, Michael
2001-01-01
The process by which raw astronomical data acquisition is transformed into scientifically meaningful results and interpretation typically involves many statistical steps. Traditional astronomy limits itself to a narrow range of old and familiar statistical methods: means and standard deviations; least-squares methods like chi(sup 2) minimization; and simple nonparametric procedures such as the Kolmogorov-Smirnov tests. These tools are often inadequate for the complex problems and datasets under investigations, and recent years have witnessed an increased usage of maximum-likelihood, survival analysis, multivariate analysis, wavelet and advanced time-series methods. The Statistical Consulting Center for Astronomy (SCCA) assisted astronomers with the use of sophisticated tools, and to match these tools with specific problems. The SCCA operated with two professors of statistics and a professor of astronomy working together. Questions were received by e-mail, and were discussed in detail with the questioner. Summaries of those questions and answers leading to new approaches were posted on the Web (www.state.psu.edu/ mga/SCCA). In addition to serving individual astronomers, the SCCA established a Web site for general use that provides hypertext links to selected on-line public-domain statistical software and services. The StatCodes site (www.astro.psu.edu/statcodes) provides over 200 links in the areas of: Bayesian statistics; censored and truncated data; correlation and regression, density estimation and smoothing, general statistics packages and information; image analysis; interactive Web tools; multivariate analysis; multivariate clustering and classification; nonparametric analysis; software written by astronomers; spatial statistics; statistical distributions; time series analysis; and visualization tools. StatCodes has received a remarkable high and constant hit rate of 250 hits/week (over 10,000/year) since its inception in mid-1997. It is of interest to scientists both within and outside of astronomy. The most popular sections are multivariate techniques, image analysis, and time series analysis. Hundreds of copies of the ASURV, SLOPES and CENS-TAU codes developed by SCCA scientists were also downloaded from the StatCodes site. In addition to formal SCCA duties, SCCA scientists continued a variety of related activities in astrostatistics, including refereeing of statistically oriented papers submitted to the Astrophysical Journal, talks in meetings including Feigelson's talk to science journalists entitled "The reemergence of astrostatistics" at the American Association for the Advancement of Science meeting, and published papers of astrostatistical content.
The intervals method: a new approach to analyse finite element outputs using multivariate statistics
De Esteban-Trivigno, Soledad; Püschel, Thomas A.; Fortuny, Josep
2017-01-01
Background In this paper, we propose a new method, named the intervals’ method, to analyse data from finite element models in a comparative multivariate framework. As a case study, several armadillo mandibles are analysed, showing that the proposed method is useful to distinguish and characterise biomechanical differences related to diet/ecomorphology. Methods The intervals’ method consists of generating a set of variables, each one defined by an interval of stress values. Each variable is expressed as a percentage of the area of the mandible occupied by those stress values. Afterwards these newly generated variables can be analysed using multivariate methods. Results Applying this novel method to the biological case study of whether armadillo mandibles differ according to dietary groups, we show that the intervals’ method is a powerful tool to characterize biomechanical performance and how this relates to different diets. This allows us to positively discriminate between specialist and generalist species. Discussion We show that the proposed approach is a useful methodology not affected by the characteristics of the finite element mesh. Additionally, the positive discriminating results obtained when analysing a difficult case study suggest that the proposed method could be a very useful tool for comparative studies in finite element analysis using multivariate statistical approaches. PMID:29043107
USDA-ARS?s Scientific Manuscript database
Characterizing population genetic structure across geographic space is a fundamental challenge in population genetics. Multivariate statistical analyses are powerful tools for summarizing genetic variability, but geographic information and accompanying metadata is not always easily integrated into t...
Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Alejandro Q; Musolf, Anthony; Matise, Tara C; Finch, Stephen J; Gordon, Derek
2012-01-01
As with any new technology, next-generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to those data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power for all three methods is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have lower power than the corresponding single-variant simulation results, most probably due to our specification of multi-variant SNP correlation values. In conclusion, our LTTae,NGS addresses two key challenges with NGS disease studies; first, it allows for differential misclassification when computing the statistic; and second, it addresses the multiple-testing issue in that there is a multi-variant form of the statistic that has only one degree of freedom, and provides a single p value, no matter how many loci. Copyright © 2013 S. Karger AG, Basel.
Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Andrew; Musolf, Anthony; Matise, Tara C.; Finch, Stephen J.; Gordon, Derek
2013-01-01
As with any new technology, next generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model, based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to that data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power for all three methods is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have lower power than the corresponding single variant simulation results, most probably due to our specification of multi-variant SNP correlation values. In conclusion, our LTTae,NGS addresses two key challenges with NGS disease studies; first, it allows for differential misclassification when computing the statistic; and second, it addresses the multiple-testing issue in that there is a multi-variant form of the statistic that has only one degree of freedom, and provides a single p-value, no matter how many loci. PMID:23594495
NASA Astrophysics Data System (ADS)
Theodorakou, Chrysoula; Farquharson, Michael J.
2009-08-01
The motivation behind this study is to assess whether angular dispersive x-ray diffraction (ADXRD) data, processed using multivariate analysis techniques, can be used for classifying secondary colorectal liver cancer tissue and normal surrounding liver tissue in human liver biopsy samples. The ADXRD profiles from a total of 60 samples of normal liver tissue and colorectal liver metastases were measured using a synchrotron radiation source. The data were analysed for 56 samples using nonlinear peak-fitting software. Four peaks were fitted to all of the ADXRD profiles, and the amplitude, area, amplitude and area ratios for three of the four peaks were calculated and used for the statistical and multivariate analysis. The statistical analysis showed that there are significant differences between all the peak-fitting parameters and ratios between the normal and the diseased tissue groups. The technique of soft independent modelling of class analogy (SIMCA) was used to classify normal liver tissue and colorectal liver metastases resulting in 67% of the normal tissue samples and 60% of the secondary colorectal liver tissue samples being classified correctly. This study has shown that the ADXRD data of normal and secondary colorectal liver cancer are statistically different and x-ray diffraction data analysed using multivariate analysis have the potential to be used as a method of tissue classification.
Multivariate Strategies in Functional Magnetic Resonance Imaging
ERIC Educational Resources Information Center
Hansen, Lars Kai
2007-01-01
We discuss aspects of multivariate fMRI modeling, including the statistical evaluation of multivariate models and means for dimensional reduction. In a case study we analyze linear and non-linear dimensional reduction tools in the context of a "mind reading" predictive multivariate fMRI model.
A Civilian/Military Trauma Institute: National Trauma Coordinating Center
2015-12-01
zip codes was used in “proximity to violence” analysis. Data were analyzed using SPSS (version 20.0, SPSS Inc., Chicago, IL). Multivariable linear...number of adverse events and serious events was not statistically higher in one group, the incidence of deep venous thrombosis (DVT) was statistically ...subjects the lack of statistical difference on multivariate analysis may be related to an underpowered sample size. It was recommended that the
A new test of multivariate nonlinear causality
Bai, Zhidong; Jiang, Dandan; Lv, Zhihui; Wong, Wing-Keung; Zheng, Shurong
2018-01-01
The multivariate nonlinear Granger causality developed by Bai et al. (2010) (Mathematics and Computers in simulation. 2010; 81: 5-17) plays an important role in detecting the dynamic interrelationships between two groups of variables. Following the idea of Hiemstra-Jones (HJ) test proposed by Hiemstra and Jones (1994) (Journal of Finance. 1994; 49(5): 1639-1664), they attempt to establish a central limit theorem (CLT) of their test statistic by applying the asymptotical property of multivariate U-statistic. However, Bai et al. (2016) (2016; arXiv: 1701.03992) revisit the HJ test and find that the test statistic given by HJ is NOT a function of U-statistics which implies that the CLT neither proposed by Hiemstra and Jones (1994) nor the one extended by Bai et al. (2010) is valid for statistical inference. In this paper, we re-estimate the probabilities and reestablish the CLT of the new test statistic. Numerical simulation shows that our new estimates are consistent and our new test performs decent size and power. PMID:29304085
A new test of multivariate nonlinear causality.
Bai, Zhidong; Hui, Yongchang; Jiang, Dandan; Lv, Zhihui; Wong, Wing-Keung; Zheng, Shurong
2018-01-01
The multivariate nonlinear Granger causality developed by Bai et al. (2010) (Mathematics and Computers in simulation. 2010; 81: 5-17) plays an important role in detecting the dynamic interrelationships between two groups of variables. Following the idea of Hiemstra-Jones (HJ) test proposed by Hiemstra and Jones (1994) (Journal of Finance. 1994; 49(5): 1639-1664), they attempt to establish a central limit theorem (CLT) of their test statistic by applying the asymptotical property of multivariate U-statistic. However, Bai et al. (2016) (2016; arXiv: 1701.03992) revisit the HJ test and find that the test statistic given by HJ is NOT a function of U-statistics which implies that the CLT neither proposed by Hiemstra and Jones (1994) nor the one extended by Bai et al. (2010) is valid for statistical inference. In this paper, we re-estimate the probabilities and reestablish the CLT of the new test statistic. Numerical simulation shows that our new estimates are consistent and our new test performs decent size and power.
Applied Statistics: From Bivariate through Multivariate Techniques [with CD-ROM
ERIC Educational Resources Information Center
Warner, Rebecca M.
2007-01-01
This book provides a clear introduction to widely used topics in bivariate and multivariate statistics, including multiple regression, discriminant analysis, MANOVA, factor analysis, and binary logistic regression. The approach is applied and does not require formal mathematics; equations are accompanied by verbal explanations. Students are asked…
NASA Astrophysics Data System (ADS)
Guimarães Nobre, Gabriela; Arnbjerg-Nielsen, Karsten; Rosbjerg, Dan; Madsen, Henrik
2016-04-01
Traditionally, flood risk assessment studies have been carried out from a univariate frequency analysis perspective. However, statistical dependence between hydrological variables, such as extreme rainfall and extreme sea surge, is plausible to exist, since both variables to some extent are driven by common meteorological conditions. Aiming to overcome this limitation, multivariate statistical techniques has the potential to combine different sources of flooding in the investigation. The aim of this study was to apply a range of statistical methodologies for analyzing combined extreme hydrological variables that can lead to coastal and urban flooding. The study area is the Elwood Catchment, which is a highly urbanized catchment located in the city of Port Phillip, Melbourne, Australia. The first part of the investigation dealt with the marginal extreme value distributions. Two approaches to extract extreme value series were applied (Annual Maximum and Partial Duration Series), and different probability distribution functions were fit to the observed sample. Results obtained by using the Generalized Pareto distribution demonstrate the ability of the Pareto family to model the extreme events. Advancing into multivariate extreme value analysis, first an investigation regarding the asymptotic properties of extremal dependence was carried out. As a weak positive asymptotic dependence between the bivariate extreme pairs was found, the Conditional method proposed by Heffernan and Tawn (2004) was chosen. This approach is suitable to model bivariate extreme values, which are relatively unlikely to occur together. The results show that the probability of an extreme sea surge occurring during a one-hour intensity extreme precipitation event (or vice versa) can be twice as great as what would occur when assuming independent events. Therefore, presuming independence between these two variables would result in severe underestimation of the flooding risk in the study area.
NASA Astrophysics Data System (ADS)
Lee, An-Sheng; Lu, Wei-Li; Huang, Jyh-Jaan; Chang, Queenie; Wei, Kuo-Yen; Lin, Chin-Jung; Liou, Sofia Ya Hsuan
2016-04-01
Through the geology and climate characteristic in Taiwan, generally rivers carry a lot of suspended particles. After these particles settled, they become sediments which are good sorbent for heavy metals in river system. Consequently, sediments can be found recording contamination footprint at low flow energy region, such as estuary. Seven sediment cores were collected along Nankan River, northern Taiwan, which is seriously contaminated by factory, household and agriculture input. Physico-chemical properties of these cores were derived from Itrax-XRF Core Scanner and grain size analysis. In order to interpret these complex data matrices, the multivariate statistical techniques (cluster analysis, factor analysis and discriminant analysis) were introduced to this study. Through the statistical determination, the result indicates four types of sediment. One of them represents contamination event which shows high concentration of Cu, Zn, Pb, Ni and Fe, and low concentration of Si and Zr. Furthermore, three possible contamination sources of this type of sediment were revealed by Factor Analysis. The combination of sediment analysis and multivariate statistical techniques used provides new insights into the contamination depositional history of Nankan River and could be similarly applied to other river systems to determine the scale of anthropogenic contamination.
Giordano, Bruno L.; Kayser, Christoph; Rousselet, Guillaume A.; Gross, Joachim; Schyns, Philippe G.
2016-01-01
Abstract We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open‐source Matlab and Python code implementing the new methods accompanies this article. Hum Brain Mapp 38:1541–1573, 2017. © 2016 Wiley Periodicals, Inc. PMID:27860095
NASA Astrophysics Data System (ADS)
Schwartz, Craig R.; Thelen, Brian J.; Kenton, Arthur C.
1995-06-01
A statistical parametric multispectral sensor performance model was developed by ERIM to support mine field detection studies, multispectral sensor design/performance trade-off studies, and target detection algorithm development. The model assumes target detection algorithms and their performance models which are based on data assumed to obey multivariate Gaussian probability distribution functions (PDFs). The applicability of these algorithms and performance models can be generalized to data having non-Gaussian PDFs through the use of transforms which convert non-Gaussian data to Gaussian (or near-Gaussian) data. An example of one such transform is the Box-Cox power law transform. In practice, such a transform can be applied to non-Gaussian data prior to the introduction of a detection algorithm that is formally based on the assumption of multivariate Gaussian data. This paper presents an extension of these techniques to the case where the joint multivariate probability density function of the non-Gaussian input data is known, and where the joint estimate of the multivariate Gaussian statistics, under the Box-Cox transform, is desired. The jointly estimated multivariate Gaussian statistics can then be used to predict the performance of a target detection algorithm which has an associated Gaussian performance model.
Prognostic Significance of POLE Proofreading Mutations in Endometrial Cancer
Church, David N.; Stelloo, Ellen; Nout, Remi A.; Valtcheva, Nadejda; Depreeuw, Jeroen; ter Haar, Natalja; Noske, Aurelia; Amant, Frederic; Wild, Peter J.; Lambrechts, Diether; Jürgenliemk-Schulz, Ina M.; Jobsen, Jan J.; Smit, Vincent T. H. B. M.; Creutzberg, Carien L.; Bosse, Tjalling
2015-01-01
Background: Current risk stratification in endometrial cancer (EC) results in frequent over- and underuse of adjuvant therapy, and may be improved by novel biomarkers. We examined whether POLE proofreading mutations, recently reported in about 7% of ECs, predict prognosis. Methods: We performed targeted POLE sequencing in ECs from the PORTEC-1 and -2 trials (n = 788), and analyzed clinical outcome according to POLE status. We combined these results with those from three additional series (n = 628) by meta-analysis to generate multivariable-adjusted, pooled hazard ratios (HRs) for recurrence-free survival (RFS) and cancer-specific survival (CSS) of POLE-mutant ECs. All statistical tests were two-sided. Results: POLE mutations were detected in 48 of 788 (6.1%) ECs from PORTEC-1 and-2 and were associated with high tumor grade (P < .001). Women with POLE-mutant ECs had fewer recurrences (6.2% vs 14.1%) and EC deaths (2.3% vs 9.7%), though, in the total PORTEC cohort, differences in RFS and CSS were not statistically significant (multivariable-adjusted HR = 0.43, 95% CI = 0.13 to 1.37, P = .15; HR = 0.19, 95% CI = 0.03 to 1.44, P = .11 respectively). However, of 109 grade 3 tumors, 0 of 15 POLE-mutant ECs recurred, compared with 29 of 94 (30.9%) POLE wild-type cancers; reflected in statistically significantly greater RFS (multivariable-adjusted HR = 0.11, 95% CI = 0.001 to 0.84, P = .03). In the additional series, there were no EC-related events in any of 33 POLE-mutant ECs, resulting in a multivariable-adjusted, pooled HR of 0.33 for RFS (95% CI = 0.12 to 0.91, P = .03) and 0.26 for CSS (95% CI = 0.06 to 1.08, P = .06). Conclusion: POLE proofreading mutations predict favorable EC prognosis, independently of other clinicopathological variables, with the greatest effect seen in high-grade tumors. This novel biomarker may help to reduce overtreatment in EC. PMID:25505230
NASA Astrophysics Data System (ADS)
Ye, M.; Pacheco Castro, R. B.; Pacheco Avila, J.; Cabrera Sansores, A.
2014-12-01
The karstic aquifer of Yucatan is a vulnerable and complex system. The first fifteen meters of this aquifer have been polluted, due to this the protection of this resource is important because is the only source of potable water of the entire State. Through the assessment of groundwater quality we can gain some knowledge about the main processes governing water chemistry as well as spatial patterns which are important to establish protection zones. In this work multivariate statistical techniques are used to assess the groundwater quality of the supply wells (30 to 40 meters deep) in the hidrogeologic region of the Ring of Cenotes, located in Yucatan, Mexico. Cluster analysis and principal component analysis are applied in groundwater chemistry data of the study area. Results of principal component analysis show that the main sources of variation in the data are due sea water intrusion and the interaction of the water with the carbonate rocks of the system and some pollution processes. The cluster analysis shows that the data can be divided in four clusters. The spatial distribution of the clusters seems to be random, but is consistent with sea water intrusion and pollution with nitrates. The overall results show that multivariate statistical analysis can be successfully applied in the groundwater quality assessment of this karstic aquifer.
NASA Technical Reports Server (NTRS)
Park, Steve
1990-01-01
A large and diverse number of computational techniques are routinely used to process and analyze remotely sensed data. These techniques include: univariate statistics; multivariate statistics; principal component analysis; pattern recognition and classification; other multivariate techniques; geometric correction; registration and resampling; radiometric correction; enhancement; restoration; Fourier analysis; and filtering. Each of these techniques will be considered, in order.
Xu, Min; Zhang, Lei; Yue, Hong-Shui; Pang, Hong-Wei; Ye, Zheng-Liang; Ding, Li
2017-10-01
To establish an on-line monitoring method for extraction process of Schisandrae Chinensis Fructus, the formula medicinal material of Yiqi Fumai lyophilized injection by combining near infrared spectroscopy with multi-variable data analysis technology. The multivariate statistical process control (MSPC) model was established based on 5 normal batches in production and 2 test batches were monitored by PC scores, DModX and Hotelling T2 control charts. The results showed that MSPC model had a good monitoring ability for the extraction process. The application of the MSPC model to actual production process could effectively achieve on-line monitoring for extraction process of Schisandrae Chinensis Fructus, and can reflect the change of material properties in the production process in real time. This established process monitoring method could provide reference for the application of process analysis technology in the process quality control of traditional Chinese medicine injections. Copyright© by the Chinese Pharmaceutical Association.
Yang, James J; Williams, L Keoki; Buu, Anne
2017-08-24
A multivariate genome-wide association test is proposed for analyzing data on multivariate quantitative phenotypes collected from related subjects. The proposed method is a two-step approach. The first step models the association between the genotype and marginal phenotype using a linear mixed model. The second step uses the correlation between residuals of the linear mixed model to estimate the null distribution of the Fisher combination test statistic. The simulation results show that the proposed method controls the type I error rate and is more powerful than the marginal tests across different population structures (admixed or non-admixed) and relatedness (related or independent). The statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that applying the multivariate association test may facilitate identification of the pleiotropic genes contributing to the risk for alcohol dependence commonly expressed by four correlated phenotypes. This study proposes a multivariate method for identifying pleiotropic genes while adjusting for cryptic relatedness and population structure between subjects. The two-step approach is not only powerful but also computationally efficient even when the number of subjects and the number of phenotypes are both very large.
Keenan, Michael R; Smentkowski, Vincent S; Ulfig, Robert M; Oltman, Edward; Larson, David J; Kelly, Thomas F
2011-06-01
We demonstrate for the first time that multivariate statistical analysis techniques can be applied to atom probe tomography data to estimate the chemical composition of a sample at the full spatial resolution of the atom probe in three dimensions. Whereas the raw atom probe data provide the specific identity of an atom at a precise location, the multivariate results can be interpreted in terms of the probabilities that an atom representing a particular chemical phase is situated there. When aggregated to the size scale of a single atom (∼0.2 nm), atom probe spectral-image datasets are huge and extremely sparse. In fact, the average spectrum will have somewhat less than one total count per spectrum due to imperfect detection efficiency. These conditions, under which the variance in the data is completely dominated by counting noise, test the limits of multivariate analysis, and an extensive discussion of how to extract the chemical information is presented. Efficient numerical approaches to performing principal component analysis (PCA) on these datasets, which may number hundreds of millions of individual spectra, are put forward, and it is shown that PCA can be computed in a few seconds on a typical laptop computer.
The Effect of the Multivariate Box-Cox Transformation on the Power of MANOVA.
ERIC Educational Resources Information Center
Kirisci, Levent; Hsu, Tse-Chi
Most of the multivariate statistical techniques rely on the assumption of multivariate normality. The effects of non-normality on multivariate tests are assumed to be negligible when variance-covariance matrices and sample sizes are equal. Therefore, in practice, investigators do not usually attempt to remove non-normality. In this simulation…
Ringham, Brandy M; Kreidler, Sarah M; Muller, Keith E; Glueck, Deborah H
2016-07-30
Multilevel and longitudinal studies are frequently subject to missing data. For example, biomarker studies for oral cancer may involve multiple assays for each participant. Assays may fail, resulting in missing data values that can be assumed to be missing completely at random. Catellier and Muller proposed a data analytic technique to account for data missing at random in multilevel and longitudinal studies. They suggested modifying the degrees of freedom for both the Hotelling-Lawley trace F statistic and its null case reference distribution. We propose parallel adjustments to approximate power for this multivariate test in studies with missing data. The power approximations use a modified non-central F statistic, which is a function of (i) the expected number of complete cases, (ii) the expected number of non-missing pairs of responses, or (iii) the trimmed sample size, which is the planned sample size reduced by the anticipated proportion of missing data. The accuracy of the method is assessed by comparing the theoretical results to the Monte Carlo simulated power for the Catellier and Muller multivariate test. Over all experimental conditions, the closest approximation to the empirical power of the Catellier and Muller multivariate test is obtained by adjusting power calculations with the expected number of complete cases. The utility of the method is demonstrated with a multivariate power analysis for a hypothetical oral cancer biomarkers study. We describe how to implement the method using standard, commercially available software products and give example code. Copyright © 2015 John Wiley & Sons, Ltd. Copyright © 2015 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Valder, J.; Kenner, S.; Long, A.
2008-12-01
Portions of the Cheyenne River are characterized as impaired by the U.S. Environmental Protection Agency because of water-quality exceedences. The Cheyenne River watershed includes the Black Hills National Forest and part of the Badlands National Park. Preliminary analysis indicates that the Badlands National Park is a major contributor to the exceedances of the water-quality constituents for total dissolved solids and total suspended solids. Water-quality data have been collected continuously since 2007, and in the second year of collection (2008), monthly grab and passive sediment samplers are being used to collect total suspended sediment and total dissolved solids in both base-flow and runoff-event conditions. In addition, sediment samples from the river channel, including bed, bank, and floodplain, have been collected. These samples are being analyzed at the South Dakota School of Mines and Technology's X-Ray Diffraction Lab to quantify the mineralogy of the sediments. A multivariate statistical approach (including principal components, least squares, and maximum likelihood techniques) is applied to the mineral percentages that were characterized for each site to identify the contributing source areas that are causing exceedances of sediment transport in the Cheyenne River watershed. Results of the multivariate analysis demonstrate the likely sources of solids found in the Cheyenne River samples. A further refinement of the methods is in progress that utilizes a conceptual model which, when applied with the multivariate statistical approach, provides a better estimate for sediment sources.
Multivariate Regression Analysis and Slaughter Livestock,
AGRICULTURE, *ECONOMICS), (*MEAT, PRODUCTION), MULTIVARIATE ANALYSIS, REGRESSION ANALYSIS , ANIMALS, WEIGHT, COSTS, PREDICTIONS, STABILITY, MATHEMATICAL MODELS, STORAGE, BEEF, PORK, FOOD, STATISTICAL DATA, ACCURACY
Statistical analysis of multivariate atmospheric variables. [cloud cover
NASA Technical Reports Server (NTRS)
Tubbs, J. D.
1979-01-01
Topics covered include: (1) estimation in discrete multivariate distributions; (2) a procedure to predict cloud cover frequencies in the bivariate case; (3) a program to compute conditional bivariate normal parameters; (4) the transformation of nonnormal multivariate to near-normal; (5) test of fit for the extreme value distribution based upon the generalized minimum chi-square; (6) test of fit for continuous distributions based upon the generalized minimum chi-square; (7) effect of correlated observations on confidence sets based upon chi-square statistics; and (8) generation of random variates from specified distributions.
NASA Astrophysics Data System (ADS)
Yan, Ying; Zhang, Shen; Tang, Jinjun; Wang, Xiaofei
2017-07-01
Discovering dynamic characteristics in traffic flow is the significant step to design effective traffic managing and controlling strategy for relieving traffic congestion in urban cities. A new method based on complex network theory is proposed to study multivariate traffic flow time series. The data were collected from loop detectors on freeway during a year. In order to construct complex network from original traffic flow, a weighted Froenius norm is adopt to estimate similarity between multivariate time series, and Principal Component Analysis is implemented to determine the weights. We discuss how to select optimal critical threshold for networks at different hour in term of cumulative probability distribution of degree. Furthermore, two statistical properties of networks: normalized network structure entropy and cumulative probability of degree, are utilized to explore hourly variation in traffic flow. The results demonstrate these two statistical quantities express similar pattern to traffic flow parameters with morning and evening peak hours. Accordingly, we detect three traffic states: trough, peak and transitional hours, according to the correlation between two aforementioned properties. The classifying results of states can actually represent hourly fluctuation in traffic flow by analyzing annual average hourly values of traffic volume, occupancy and speed in corresponding hours.
USDA-ARS?s Scientific Manuscript database
The mixed linear model (MLM) is currently among the most advanced and flexible statistical modeling techniques and its use in tackling problems in plant pathology has begun surfacing in the literature. The longitudinal MLM is a multivariate extension that handles repeatedly measured data, such as r...
ERIC Educational Resources Information Center
Martin, James L.
This paper reports on attempts by the author to construct a theoretical framework of adult education participation using a theory development process and the corresponding multivariate statistical techniques. Two problems are identified: the lack of theoretical framework in studying problems, and the limiting of statistical analysis to univariate…
MIDAS: Regionally linear multivariate discriminative statistical mapping.
Varol, Erdem; Sotiras, Aristeidis; Davatzikos, Christos
2018-07-01
Statistical parametric maps formed via voxel-wise mass-univariate tests, such as the general linear model, are commonly used to test hypotheses about regionally specific effects in neuroimaging cross-sectional studies where each subject is represented by a single image. Despite being informative, these techniques remain limited as they ignore multivariate relationships in the data. Most importantly, the commonly employed local Gaussian smoothing, which is important for accounting for registration errors and making the data follow Gaussian distributions, is usually chosen in an ad hoc fashion. Thus, it is often suboptimal for the task of detecting group differences and correlations with non-imaging variables. Information mapping techniques, such as searchlight, which use pattern classifiers to exploit multivariate information and obtain more powerful statistical maps, have become increasingly popular in recent years. However, existing methods may lead to important interpretation errors in practice (i.e., misidentifying a cluster as informative, or failing to detect truly informative voxels), while often being computationally expensive. To address these issues, we introduce a novel efficient multivariate statistical framework for cross-sectional studies, termed MIDAS, seeking highly sensitive and specific voxel-wise brain maps, while leveraging the power of regional discriminant analysis. In MIDAS, locally linear discriminative learning is applied to estimate the pattern that best discriminates between two groups, or predicts a variable of interest. This pattern is equivalent to local filtering by an optimal kernel whose coefficients are the weights of the linear discriminant. By composing information from all neighborhoods that contain a given voxel, MIDAS produces a statistic that collectively reflects the contribution of the voxel to the regional classifiers as well as the discriminative power of the classifiers. Critically, MIDAS efficiently assesses the statistical significance of the derived statistic by analytically approximating its null distribution without the need for computationally expensive permutation tests. The proposed framework was extensively validated using simulated atrophy in structural magnetic resonance imaging (MRI) and further tested using data from a task-based functional MRI study as well as a structural MRI study of cognitive performance. The performance of the proposed framework was evaluated against standard voxel-wise general linear models and other information mapping methods. The experimental results showed that MIDAS achieves relatively higher sensitivity and specificity in detecting group differences. Together, our results demonstrate the potential of the proposed approach to efficiently map effects of interest in both structural and functional data. Copyright © 2018. Published by Elsevier Inc.
NASA Astrophysics Data System (ADS)
Ngan Nguyen, Thi To; Liu, Cheng-Chien
2013-04-01
How landslides occurred and which factors triggered and sped up landslide occurrences were usually asked by researchers in the past decades. Many investigations carried out in many places in the world to finding out methods that predict and prevent damages from landslides phenomena. Chen-Yu-Lan River watershed is reputed as a 'hot pot' of landslide researches in Taiwan by its complicated geological structures with the significant tectonic fault systems and steeply mountainous terrain. Beside annual high precipitation concentration and the abrupt slopes, some natural disaster, as typhoons (Sinlaku-2008, Kalmaegi-2008, and Marakot-2009) and earthquake (Chi-Chi earthquake-1999) are also the triggered factors cause landslides with serious damages in this place. This research expresses the quantitative approaches to generate landslide susceptible map for Chen-Yu-Lan watershed, a mountainous area in the central Taiwan. Landslide inventories data, which were detected from the Formosat-2 imageries for eight years from 2004 to 2011, were applied to carry out landslide susceptibility mapping. Bivariate statistics analysis and multivariate statistics analysis would be applied to calculate susceptible index of landslides. The weights of parameters were computed based on landslide data for eight years from 2004 to 2011. To validate effective levels of factors to landslide occurrences, this method built some multivariate algorithms and compared these results with real landslide occurrences. Besides this method, the historical data of landslides were also used to assess and classify landslide susceptibility levels. From long-term landslide data, relation between landslide susceptibility levels and landslide repetition was assigned. The results demonstrated differently effective levels of potential factors, such as, slope gradient, drainage density, lithology and land use to landslide phenomena. The results also showed logical relationship between weights and characteristics of factors' classes. Depending on these results be able to help planning managers localize the high risk areas of landslide or safely areas by building and human activities.
Nojima, Masanori; Tokunaga, Mutsumi; Nagamura, Fumitaka
2018-05-05
To investigate under what circumstances inappropriate use of 'multivariate analysis' is likely to occur and to identify the population that needs more support with medical statistics. The frequency of inappropriate regression model construction in multivariate analysis and related factors were investigated in observational medical research publications. The inappropriate algorithm of using only variables that were significant in univariate analysis was estimated to occur at 6.4% (95% CI 4.8% to 8.5%). This was observed in 1.1% of the publications with a medical statistics expert (hereinafter 'expert') as the first author, 3.5% if an expert was included as coauthor and in 12.2% if experts were not involved. In the publications where the number of cases was 50 or less and the study did not include experts, inappropriate algorithm usage was observed with a high proportion of 20.2%. The OR of the involvement of experts for this outcome was 0.28 (95% CI 0.15 to 0.53). A further, nation-level, analysis showed that the involvement of experts and the implementation of unfavourable multivariate analysis are associated at the nation-level analysis (R=-0.652). Based on the results of this study, the benefit of participation of medical statistics experts is obvious. Experts should be involved for proper confounding adjustment and interpretation of statistical models. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Ince, Robin A A; Giordano, Bruno L; Kayser, Christoph; Rousselet, Guillaume A; Gross, Joachim; Schyns, Philippe G
2017-03-01
We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open-source Matlab and Python code implementing the new methods accompanies this article. Hum Brain Mapp 38:1541-1573, 2017. © 2016 Wiley Periodicals, Inc. 2016 The Authors Human Brain Mapping Published by Wiley Periodicals, Inc.
A multivariate model and statistical method for validating tree grade lumber yield equations
Donald W. Seegrist
1975-01-01
Lumber yields within lumber grades can be described by a multivariate linear model. A method for validating lumber yield prediction equations when there are several tree grades is presented. The method is based on multivariate simultaneous test procedures.
TATES: Efficient Multivariate Genotype-Phenotype Analysis for Genome-Wide Association Studies
van der Sluis, Sophie; Posthuma, Danielle; Dolan, Conor V.
2013-01-01
To date, the genome-wide association study (GWAS) is the primary tool to identify genetic variants that cause phenotypic variation. As GWAS analyses are generally univariate in nature, multivariate phenotypic information is usually reduced to a single composite score. This practice often results in loss of statistical power to detect causal variants. Multivariate genotype–phenotype methods do exist but attain maximal power only in special circumstances. Here, we present a new multivariate method that we refer to as TATES (Trait-based Association Test that uses Extended Simes procedure), inspired by the GATES procedure proposed by Li et al (2011). For each component of a multivariate trait, TATES combines p-values obtained in standard univariate GWAS to acquire one trait-based p-value, while correcting for correlations between components. Extensive simulations, probing a wide variety of genotype–phenotype models, show that TATES's false positive rate is correct, and that TATES's statistical power to detect causal variants explaining 0.5% of the variance can be 2.5–9 times higher than the power of univariate tests based on composite scores and 1.5–2 times higher than the power of the standard MANOVA. Unlike other multivariate methods, TATES detects both genetic variants that are common to multiple phenotypes and genetic variants that are specific to a single phenotype, i.e. TATES provides a more complete view of the genetic architecture of complex traits. As the actual causal genotype–phenotype model is usually unknown and probably phenotypically and genetically complex, TATES, available as an open source program, constitutes a powerful new multivariate strategy that allows researchers to identify novel causal variants, while the complexity of traits is no longer a limiting factor. PMID:23359524
Texture as a basis for acoustic classification of substrate in the nearshore region
NASA Astrophysics Data System (ADS)
Dennison, A.; Wattrus, N. J.
2016-12-01
Segmentation and classification of substrate type from two locations in Lake Superior, are predicted using multivariate statistical processing of textural measures derived from shallow-water, high-resolution multibeam bathymetric data. During a multibeam sonar survey, both bathymetric and backscatter data are collected. It is well documented that the statistical characteristic of a sonar backscatter mosaic is dependent on substrate type. While classifying the bottom-type on the basis on backscatter alone can accurately predict and map bottom-type, it lacks the ability to resolve and capture fine textural details, an important factor in many habitat mapping studies. Statistical processing can capture the pertinent details about the bottom-type that are rich in textural information. Further multivariate statistical processing can then isolate characteristic features, and provide the basis for an accurate classification scheme. Preliminary results from an analysis of bathymetric data and ground-truth samples collected from the Amnicon River, Superior, Wisconsin, and the Lester River, Duluth, Minnesota, demonstrate the ability to process and develop a novel classification scheme of the bottom type in two geomorphologically distinct areas.
Paixão, Paulo; Gouveia, Luís F; Silva, Nuno; Morais, José A G
2017-03-01
A simulation study is presented, evaluating the performance of the f 2 , the model-independent multivariate statistical distance and the f 2 bootstrap methods in the ability to conclude similarity between two dissolution profiles. Different dissolution profiles, based on the Noyes-Whitney equation and ranging from theoretical f 2 values between 100 and 40, were simulated. Variability was introduced in the dissolution model parameters in an increasing order, ranging from a situation complying with the European guidelines requirements for the use of the f 2 metric to several situations where the f 2 metric could not be used anymore. Results have shown that the f 2 is an acceptable metric when used according to the regulatory requirements, but loses its applicability when variability increases. The multivariate statistical distance presented contradictory results in several of the simulation scenarios, which makes it an unreliable metric for dissolution profile comparisons. The bootstrap f 2 , although conservative in its conclusions is an alternative suitable method. Overall, as variability increases, all of the discussed methods reveal problems that can only be solved by increasing the number of dosage form units used in the comparison, which is usually not practical or feasible. Additionally, experimental corrective measures may be undertaken in order to reduce the overall variability, particularly when it is shown that it is mainly due to the dissolution assessment instead of being intrinsic to the dosage form. Copyright © 2016. Published by Elsevier B.V.
Application of multivariate statistical techniques in microbial ecology
Paliy, O.; Shankar, V.
2016-01-01
Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large scale ecological datasets. Especially noticeable effect has been attained in the field of microbial ecology, where new experimental approaches provided in-depth assessments of the composition, functions, and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces large amounts of data, powerful statistical techniques of multivariate analysis are well suited to analyze and interpret these datasets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular dataset. In this review we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive, and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and dataset structure. PMID:26786791
Multivariate analysis in thoracic research.
Mengual-Macenlle, Noemí; Marcos, Pedro J; Golpe, Rafael; González-Rivas, Diego
2015-03-01
Multivariate analysis is based in observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest. The development of multivariate methods emerged to analyze large databases and increasingly complex data. Since the best way to represent the knowledge of reality is the modeling, we should use multivariate statistical methods. Multivariate methods are designed to simultaneously analyze data sets, i.e., the analysis of different variables for each person or object studied. Keep in mind at all times that all variables must be treated accurately reflect the reality of the problem addressed. There are different types of multivariate analysis and each one should be employed according to the type of variables to analyze: dependent, interdependence and structural methods. In conclusion, multivariate methods are ideal for the analysis of large data sets and to find the cause and effect relationships between variables; there is a wide range of analysis types that we can use.
Avalappampatty Sivasamy, Aneetha; Sundan, Bose
2015-01-01
The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T2 method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T2 statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better. PMID:26357668
Sivasamy, Aneetha Avalappampatty; Sundan, Bose
2015-01-01
The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T(2) method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T(2) statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better.
Multivariate analysis: A statistical approach for computations
NASA Astrophysics Data System (ADS)
Michu, Sachin; Kaushik, Vandana
2014-10-01
Multivariate analysis is a type of multivariate statistical approach commonly used in, automotive diagnosis, education evaluating clusters in finance etc and more recently in the health-related professions. The objective of the paper is to provide a detailed exploratory discussion about factor analysis (FA) in image retrieval method and correlation analysis (CA) of network traffic. Image retrieval methods aim to retrieve relevant images from a collected database, based on their content. The problem is made more difficult due to the high dimension of the variable space in which the images are represented. Multivariate correlation analysis proposes an anomaly detection and analysis method based on the correlation coefficient matrix. Anomaly behaviors in the network include the various attacks on the network like DDOs attacks and network scanning.
Processes and subdivisions in diogenites, a multivariate statistical analysis
NASA Technical Reports Server (NTRS)
Harriott, T. A.; Hewins, R. H.
1984-01-01
Multivariate statistical techniques used on diogenite orthopyroxene analyses show the relationships that occur within diogenites and the two orthopyroxenite components (class I and II) in the polymict diogenite Garland. Cluster analysis shows that only Peckelsheim is similar to Garland class I (Fe-rich) and the other diogenites resemble Garland class II. The unique diogenite Y 75032 may be related to type I by fractionation. Factor analysis confirms the subdivision and shows that Fe does not correlate with the weakly incompatible elements across the entire pyroxene composition range, indicating that igneous fractionation is not the process controlling total diogenite composition variation. The occurrence of two groups of diogenites is interpreted as the result of sampling or mixing of two main sequences of orthopyroxene cumulates with slightly different compositions.
Jack, John; Havener, Tammy M; McLeod, Howard L; Motsinger-Reif, Alison A; Foster, Matthew
2015-01-01
Aim: We investigate the role of ethnicity and admixture in drug response across a broad group of chemotherapeutic drugs. Also, we generate hypotheses on the genetic variants driving differential drug response through multivariate genome-wide association studies. Methods: Immortalized lymphoblastoid cell lines from 589 individuals (Hispanic or non-Hispanic/Caucasian) were used to investigate dose-response for 28 chemotherapeutic compounds. Univariate and multivariate statistical models were used to elucidate associations between genetic variants and differential drug response as well as the role of ethnicity in drug potency and efficacy. Results & Conclusion: For many drugs, the variability in drug response appears to correlate with self-reported race and estimates of genetic ancestry. Additionally, multivariate genome-wide association analyses offered interesting hypotheses governing these differential responses. PMID:26314407
Liu, Ya-Juan; André, Silvère; Saint Cristau, Lydia; Lagresle, Sylvain; Hannas, Zahia; Calvosa, Éric; Devos, Olivier; Duponchel, Ludovic
2017-02-01
Multivariate statistical process control (MSPC) is increasingly popular as the challenge provided by large multivariate datasets from analytical instruments such as Raman spectroscopy for the monitoring of complex cell cultures in the biopharmaceutical industry. However, Raman spectroscopy for in-line monitoring often produces unsynchronized data sets, resulting in time-varying batches. Moreover, unsynchronized data sets are common for cell culture monitoring because spectroscopic measurements are generally recorded in an alternate way, with more than one optical probe parallelly connecting to the same spectrometer. Synchronized batches are prerequisite for the application of multivariate analysis such as multi-way principal component analysis (MPCA) for the MSPC monitoring. Correlation optimized warping (COW) is a popular method for data alignment with satisfactory performance; however, it has never been applied to synchronize acquisition time of spectroscopic datasets in MSPC application before. In this paper we propose, for the first time, to use the method of COW to synchronize batches with varying durations analyzed with Raman spectroscopy. In a second step, we developed MPCA models at different time intervals based on the normal operation condition (NOC) batches synchronized by COW. New batches are finally projected considering the corresponding MPCA model. We monitored the evolution of the batches using two multivariate control charts based on Hotelling's T 2 and Q. As illustrated with results, the MSPC model was able to identify abnormal operation condition including contaminated batches which is of prime importance in cell culture monitoring We proved that Raman-based MSPC monitoring can be used to diagnose batches deviating from the normal condition, with higher efficacy than traditional diagnosis, which would save time and money in the biopharmaceutical industry. Copyright © 2016 Elsevier B.V. All rights reserved.
Tourkmani, Abdo Karim; Sánchez-Huerta, Valeria; De Wit, Guillermo; Martínez, Jaime D.; Mingo, David; Mahillo-Fernández, Ignacio; Jiménez-Alfaro, Ignacio
2017-01-01
AIM To analyze the relationship between the score obtained in the Risk Score System (RSS) proposed by Hicks et al with penetrating keratoplasty (PKP) graft failure at 1y postoperatively and among each factor in the RSS with the risk of PKP graft failure using univariate and multivariate analysis. METHODS The retrospective cohort study had 152 PKPs from 152 patients. Eighteen cases were excluded from our study due to primary failure (10 cases), incomplete medical notes (5 cases) and follow-up less than 1y (3 cases). We included 134 PKPs from 134 patients stratified by preoperative risk score. Spearman coefficient was calculated for the relationship between the score obtained and risk of failure at 1y. Univariate and multivariate analysis were calculated for the impact of every single risk factor included in the RSS over graft failure at 1y. RESULTS Spearman coefficient showed statistically significant correlation between the score in the RSS and graft failure (P<0.05). Multivariate logistic regression analysis showed no statistically significant relationship (P>0.05) between diagnosis and lens status with graft failure. The relationship between the other risk factors studied and graft failure was significant (P<0.05), although the results for previous grafts and graft failure was unreliable. None of our patients had previous blood transfusion, thus, it had no impact. CONCLUSION After the application of multivariate analysis techniques, some risk factors do not show the expected impact over graft failure at 1y. PMID:28393027
Use of Multivariate Linkage Analysis for Dissection of a Complex Cognitive Trait
Marlow, Angela J.; Fisher, Simon E.; Francks, Clyde; MacPhie, I. Laurence; Cherny, Stacey S.; Richardson, Alex J.; Talcott, Joel B.; Stein, John F.; Monaco, Anthony P.; Cardon, Lon R.
2003-01-01
Replication of linkage results for complex traits has been exceedingly difficult, owing in part to the inability to measure the precise underlying phenotype, small sample sizes, genetic heterogeneity, and statistical methods employed in analysis. Often, in any particular study, multiple correlated traits have been collected, yet these have been analyzed independently or, at most, in bivariate analyses. Theoretical arguments suggest that full multivariate analysis of all available traits should offer more power to detect linkage; however, this has not yet been evaluated on a genomewide scale. Here, we conduct multivariate genomewide analyses of quantitative-trait loci that influence reading- and language-related measures in families affected with developmental dyslexia. The results of these analyses are substantially clearer than those of previous univariate analyses of the same data set, helping to resolve a number of key issues. These outcomes highlight the relevance of multivariate analysis for complex disorders for dissection of linkage results in correlated traits. The approach employed here may aid positional cloning of susceptibility genes in a wide spectrum of complex traits. PMID:12587094
Extracting chemical information from high-resolution Kβ X-ray emission spectroscopy
NASA Astrophysics Data System (ADS)
Limandri, S.; Robledo, J.; Tirao, G.
2018-06-01
High-resolution X-ray emission spectroscopy allows studying the chemical environment of a wide variety of materials. Chemical information can be obtained by fitting the X-ray spectra and observing the behavior of some spectral features. Spectral changes can also be quantified by means of statistical parameters calculated by considering the spectrum as a probability distribution. Another possibility is to perform statistical multivariate analysis, such as principal component analysis. In this work the performance of these procedures for extracting chemical information in X-ray emission spectroscopy spectra for mixtures of Mn2+ and Mn4+ oxides are studied. A detail analysis of the parameters obtained, as well as the associated uncertainties is shown. The methodologies are also applied for Mn oxidation state characterization of double perovskite oxides Ba1+xLa1-xMnSbO6 (with 0 ≤ x ≤ 0.7). The results show that statistical parameters and multivariate analysis are the most suitable for the analysis of this kind of spectra.
Sun, Gang; Hoff, Steven J; Zelle, Brian C; Nelson, Minda A
2008-12-01
It is vital to forecast gas and particle matter concentrations and emission rates (GPCER) from livestock production facilities to assess the impact of airborne pollutants on human health, ecological environment, and global warming. Modeling source air quality is a complex process because of abundant nonlinear interactions between GPCER and other factors. The objective of this study was to introduce statistical methods and radial basis function (RBF) neural network to predict daily source air quality in Iowa swine deep-pit finishing buildings. The results show that four variables (outdoor and indoor temperature, animal units, and ventilation rates) were identified as relative important model inputs using statistical methods. It can be further demonstrated that only two factors, the environment factor and the animal factor, were capable of explaining more than 94% of the total variability after performing principal component analysis. The introduction of fewer uncorrelated variables to the neural network would result in the reduction of the model structure complexity, minimize computation cost, and eliminate model overfitting problems. The obtained results of RBF network prediction were in good agreement with the actual measurements, with values of the correlation coefficient between 0.741 and 0.995 and very low values of systemic performance indexes for all the models. The good results indicated the RBF network could be trained to model these highly nonlinear relationships. Thus, the RBF neural network technology combined with multivariate statistical methods is a promising tool for air pollutant emissions modeling.
Multivariate meta-analysis: a robust approach based on the theory of U-statistic.
Ma, Yan; Mazumdar, Madhu
2011-10-30
Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously taking into account the correlation between the outcomes. Likelihood-based approaches, in particular restricted maximum likelihood (REML) method, are commonly utilized in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for meta-analysis with small number of component studies. The use of REML also requires iterative estimation between parameters, needing moderately high computation time, especially when the dimension of outcomes is large. A multivariate method of moments (MMM) is available and is shown to perform equally well to REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normality. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis on the basis of the theory of U-statistic and compare the properties of these three procedures under both normal and skewed data through simulation studies. It is shown that the effect on estimates from REML because of non-normal data distribution is marginal and that the estimates from MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. Easy implementation of all three methods are illustrated by their application to data from two published meta-analysis from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on U-statistic for testing significance of between-study heterogeneity and for extending the work to meta-regression setting. Copyright © 2011 John Wiley & Sons, Ltd.
Multivariate postprocessing techniques for probabilistic hydrological forecasting
NASA Astrophysics Data System (ADS)
Hemri, Stephan; Lisniak, Dmytro; Klein, Bastian
2016-04-01
Hydrologic ensemble forecasts driven by atmospheric ensemble prediction systems need statistical postprocessing in order to account for systematic errors in terms of both mean and spread. Runoff is an inherently multivariate process with typical events lasting from hours in case of floods to weeks or even months in case of droughts. This calls for multivariate postprocessing techniques that yield well calibrated forecasts in univariate terms and ensure a realistic temporal dependence structure at the same time. To this end, the univariate ensemble model output statistics (EMOS; Gneiting et al., 2005) postprocessing method is combined with two different copula approaches that ensure multivariate calibration throughout the entire forecast horizon. These approaches comprise ensemble copula coupling (ECC; Schefzik et al., 2013), which preserves the dependence structure of the raw ensemble, and a Gaussian copula approach (GCA; Pinson and Girard, 2012), which estimates the temporal correlations from training observations. Both methods are tested in a case study covering three subcatchments of the river Rhine that represent different sizes and hydrological regimes: the Upper Rhine up to the gauge Maxau, the river Moselle up to the gauge Trier, and the river Lahn up to the gauge Kalkofen. The results indicate that both ECC and GCA are suitable for modelling the temporal dependences of probabilistic hydrologic forecasts (Hemri et al., 2015). References Gneiting, T., A. E. Raftery, A. H. Westveld, and T. Goldman (2005), Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation, Monthly Weather Review, 133(5), 1098-1118, DOI: 10.1175/MWR2904.1. Hemri, S., D. Lisniak, and B. Klein, Multivariate postprocessing techniques for probabilistic hydrological forecasting, Water Resources Research, 51(9), 7436-7451, DOI: 10.1002/2014WR016473. Pinson, P., and R. Girard (2012), Evaluating the quality of scenarios of short-term wind power generation, Applied Energy, 96, 12-20, DOI: 10.1016/j.apenergy.2011.11.004. Schefzik, R., T. L. Thorarinsdottir, and T. Gneiting (2013), Uncertainty quantification in complex simulation models using ensemble copula coupling, Statistical Science, 28, 616-640, DOI: 10.1214/13-STS443.
Time Series Model Identification by Estimating Information.
1982-11-01
principle, Applications of Statistics, P. R. Krishnaiah , ed., North-Holland: Amsterdam, 27-41. Anderson, T. W. (1971). The Statistical Analysis of Time Series...E. (1969). Multiple Time Series Modeling, Multivariate Analysis II, edited by P. Krishnaiah , Academic Press: New York, 389-409. Parzen, E. (1981...Newton, H. J. (1980). Multiple Time Series Modeling, II Multivariate Analysis - V, edited by P. Krishnaiah , North Holland: Amsterdam, 181-197. Shibata, R
Applications of modern statistical methods to analysis of data in physical science
NASA Astrophysics Data System (ADS)
Wicker, James Eric
Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970's, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960's, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plagues this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960's and 1970's respectively, formed the basis of multivariate cluster analysis methodology for many years. However, several shortcomings of these methods include strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcomes the strong dependence on initial seed values. In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic algorithm based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We will showcase how these algorithms can be used to process multivariate data from astronomical observations.
Adams, Dean C
2014-09-01
Phylogenetic signal is the tendency for closely related species to display similar trait values due to their common ancestry. Several methods have been developed for quantifying phylogenetic signal in univariate traits and for sets of traits treated simultaneously, and the statistical properties of these approaches have been extensively studied. However, methods for assessing phylogenetic signal in high-dimensional multivariate traits like shape are less well developed, and their statistical performance is not well characterized. In this article, I describe a generalization of the K statistic of Blomberg et al. that is useful for quantifying and evaluating phylogenetic signal in highly dimensional multivariate data. The method (K(mult)) is found from the equivalency between statistical methods based on covariance matrices and those based on distance matrices. Using computer simulations based on Brownian motion, I demonstrate that the expected value of K(mult) remains at 1.0 as trait variation among species is increased or decreased, and as the number of trait dimensions is increased. By contrast, estimates of phylogenetic signal found with a squared-change parsimony procedure for multivariate data change with increasing trait variation among species and with increasing numbers of trait dimensions, confounding biological interpretations. I also evaluate the statistical performance of hypothesis testing procedures based on K(mult) and find that the method displays appropriate Type I error and high statistical power for detecting phylogenetic signal in high-dimensional data. Statistical properties of K(mult) were consistent for simulations using bifurcating and random phylogenies, for simulations using different numbers of species, for simulations that varied the number of trait dimensions, and for different underlying models of trait covariance structure. Overall these findings demonstrate that K(mult) provides a useful means of evaluating phylogenetic signal in high-dimensional multivariate traits. Finally, I illustrate the utility of the new approach by evaluating the strength of phylogenetic signal for head shape in a lineage of Plethodon salamanders. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
NASA Astrophysics Data System (ADS)
Hao, Zengchao; Hao, Fanghua; Singh, Vijay P.
2016-08-01
Drought is among the costliest natural hazards worldwide and extreme drought events in recent years have caused huge losses to various sectors. Drought prediction is therefore critically important for providing early warning information to aid decision making to cope with drought. Due to the complicated nature of drought, it has been recognized that the univariate drought indicator may not be sufficient for drought characterization and hence multivariate drought indices have been developed for drought monitoring. Alongside the substantial effort in drought monitoring with multivariate drought indices, it is of equal importance to develop a drought prediction method with multivariate drought indices to integrate drought information from various sources. This study proposes a general framework for multivariate multi-index drought prediction that is capable of integrating complementary prediction skills from multiple drought indices. The Multivariate Ensemble Streamflow Prediction (MESP) is employed to sample from historical records for obtaining statistical prediction of multiple variables, which is then used as inputs to achieve multivariate prediction. The framework is illustrated with a linearly combined drought index (LDI), which is a commonly used multivariate drought index, based on climate division data in California and New York in the United States with different seasonality of precipitation. The predictive skill of LDI (represented with persistence) is assessed by comparison with the univariate drought index and results show that the LDI prediction skill is less affected by seasonality than the meteorological drought prediction based on SPI. Prediction results from the case study show that the proposed multivariate drought prediction outperforms the persistence prediction, implying a satisfactory performance of multivariate drought prediction. The proposed method would be useful for drought prediction to integrate drought information from various sources for early drought warning.
Characterizing multivariate decoding models based on correlated EEG spectral features.
McFarland, Dennis J
2013-07-01
Multivariate decoding methods are popular techniques for analysis of neurophysiological data. The present study explored potential interpretative problems with these techniques when predictors are correlated. Data from sensorimotor rhythm-based cursor control experiments was analyzed offline with linear univariate and multivariate models. Features were derived from autoregressive (AR) spectral analysis of varying model order which produced predictors that varied in their degree of correlation (i.e., multicollinearity). The use of multivariate regression models resulted in much better prediction of target position as compared to univariate regression models. However, with lower order AR features interpretation of the spectral patterns of the weights was difficult. This is likely to be due to the high degree of multicollinearity present with lower order AR features. Care should be exercised when interpreting the pattern of weights of multivariate models with correlated predictors. Comparison with univariate statistics is advisable. While multivariate decoding algorithms are very useful for prediction their utility for interpretation may be limited when predictors are correlated. Copyright © 2013 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
Meta-analysis of gene-level associations for rare variants based on single-variant statistics.
Hu, Yi-Juan; Berndt, Sonja I; Gustafsson, Stefan; Ganna, Andrea; Hirschhorn, Joel; North, Kari E; Ingelsson, Erik; Lin, Dan-Yu
2013-08-08
Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Gap Shape Classification using Landscape Indices and Multivariate Statistics
Wu, Chih-Da; Cheng, Chi-Chuan; Chang, Che-Chang; Lin, Chinsu; Chang, Kun-Cheng; Chuang, Yung-Chung
2016-01-01
This study proposed a novel methodology to classify the shape of gaps using landscape indices and multivariate statistics. Patch-level indices were used to collect the qualified shape and spatial configuration characteristics for canopy gaps in the Lienhuachih Experimental Forest in Taiwan in 1998 and 2002. Non-hierarchical cluster analysis was used to assess the optimal number of gap clusters and canonical discriminant analysis was used to generate the discriminant functions for canopy gap classification. The gaps for the two periods were optimally classified into three categories. In general, gap type 1 had a more complex shape, gap type 2 was more elongated and gap type 3 had the largest gaps that were more regular in shape. The results were evaluated using Wilks’ lambda as satisfactory (p < 0.001). The agreement rate of confusion matrices exceeded 96%. Differences in gap characteristics between the classified gap types that were determined using a one-way ANOVA showed a statistical significance in all patch indices (p = 0.00), except for the Euclidean nearest neighbor distance (ENN) in 2002. Taken together, these results demonstrated the feasibility and applicability of the proposed methodology to classify the shape of a gap. PMID:27901127
Gap Shape Classification using Landscape Indices and Multivariate Statistics.
Wu, Chih-Da; Cheng, Chi-Chuan; Chang, Che-Chang; Lin, Chinsu; Chang, Kun-Cheng; Chuang, Yung-Chung
2016-11-30
This study proposed a novel methodology to classify the shape of gaps using landscape indices and multivariate statistics. Patch-level indices were used to collect the qualified shape and spatial configuration characteristics for canopy gaps in the Lienhuachih Experimental Forest in Taiwan in 1998 and 2002. Non-hierarchical cluster analysis was used to assess the optimal number of gap clusters and canonical discriminant analysis was used to generate the discriminant functions for canopy gap classification. The gaps for the two periods were optimally classified into three categories. In general, gap type 1 had a more complex shape, gap type 2 was more elongated and gap type 3 had the largest gaps that were more regular in shape. The results were evaluated using Wilks' lambda as satisfactory (p < 0.001). The agreement rate of confusion matrices exceeded 96%. Differences in gap characteristics between the classified gap types that were determined using a one-way ANOVA showed a statistical significance in all patch indices (p = 0.00), except for the Euclidean nearest neighbor distance (ENN) in 2002. Taken together, these results demonstrated the feasibility and applicability of the proposed methodology to classify the shape of a gap.
A Statistical Discrimination Experiment for Eurasian Events Using a Twenty-Seven-Station Network
1980-07-08
to test the effectiveness of a multivariate method of analysis for distinguishing earthquakes from explosions. The data base for the experiment...to test the effectiveness of a multivariate method of analysis for distinguishing earthquakes from explosions. The data base for the experiment...the weight assigned to each variable whenever a new one is added. Jennrich, R. I. (1977). Stepwise discriminant analysis , in Statistical Methods for
Data exploration systems for databases
NASA Technical Reports Server (NTRS)
Greene, Richard J.; Hield, Christopher
1992-01-01
Data exploration systems apply machine learning techniques, multivariate statistical methods, information theory, and database theory to databases to identify significant relationships among the data and summarize information. The result of applying data exploration systems should be a better understanding of the structure of the data and a perspective of the data enabling an analyst to form hypotheses for interpreting the data. This paper argues that data exploration systems need a minimum amount of domain knowledge to guide both the statistical strategy and the interpretation of the resulting patterns discovered by these systems.
Can multivariate models based on MOAKS predict OA knee pain? Data from the Osteoarthritis Initiative
NASA Astrophysics Data System (ADS)
Luna-Gómez, Carlos D.; Zanella-Calzada, Laura A.; Galván-Tejada, Jorge I.; Galván-Tejada, Carlos E.; Celaya-Padilla, José M.
2017-03-01
Osteoarthritis is the most common rheumatic disease in the world. Knee pain is the most disabling symptom in the disease, the prediction of pain is one of the targets in preventive medicine, this can be applied to new therapies or treatments. Using the magnetic resonance imaging and the grading scales, a multivariate model based on genetic algorithms is presented. Using a predictive model can be useful to associate minor structure changes in the joint with the future knee pain. Results suggest that multivariate models can be predictive with future knee chronic pain. All models; T0, T1 and T2, were statistically significant, all p values were < 0.05 and all AUC > 0.60.
Estimating an Effect Size in One-Way Multivariate Analysis of Variance (MANOVA)
ERIC Educational Resources Information Center
Steyn, H. S., Jr.; Ellis, S. M.
2009-01-01
When two or more univariate population means are compared, the proportion of variation in the dependent variable accounted for by population group membership is eta-squared. This effect size can be generalized by using multivariate measures of association, based on the multivariate analysis of variance (MANOVA) statistics, to establish whether…
Docking and multivariate methods to explore HIV-1 drug-resistance: a comparative analysis
NASA Astrophysics Data System (ADS)
Almerico, Anna Maria; Tutone, Marco; Lauria, Antonino
2008-05-01
In this paper we describe a comparative analysis between multivariate and docking methods in the study of the drug resistance to the reverse transcriptase and the protease inhibitors. In our early papers we developed a simple but efficient method to evaluate the features of compounds that are less likely to trigger resistance or are effective against mutant HIV strains, using the multivariate statistical procedures PCA and DA. In the attempt to create a more solid background for the prediction of susceptibility or resistance, we carried out a comparative analysis between our previous multivariate approach and molecular docking study. The intent of this paper is not only to find further support to the results obtained by the combined use of PCA and DA, but also to evidence the structural features, in terms of molecular descriptors, similarity, and energetic contributions, derived from docking, which can account for the arising of drug-resistance against mutant strains.
Yan, Binjun; Fang, Zhonghua; Shen, Lijuan; Qu, Haibin
2015-01-01
The batch-to-batch quality consistency of herbal drugs has always been an important issue. To propose a methodology for batch-to-batch quality control based on HPLC-MS fingerprints and process knowledgebase. The extraction process of Compound E-jiao Oral Liquid was taken as a case study. After establishing the HPLC-MS fingerprint analysis method, the fingerprints of the extract solutions produced under normal and abnormal operation conditions were obtained. Multivariate statistical models were built for fault detection and a discriminant analysis model was built using the probabilistic discriminant partial-least-squares method for fault diagnosis. Based on multivariate statistical analysis, process knowledge was acquired and the cause-effect relationship between process deviations and quality defects was revealed. The quality defects were detected successfully by multivariate statistical control charts and the type of process deviations were diagnosed correctly by discriminant analysis. This work has demonstrated the benefits of combining HPLC-MS fingerprints, process knowledge and multivariate analysis for the quality control of herbal drugs. Copyright © 2015 John Wiley & Sons, Ltd.
Application of multivariate statistical techniques in microbial ecology.
Paliy, O; Shankar, V
2016-03-01
Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. In particular, noticeable effect has been attained in the field of microbial ecology, where new experimental approaches provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure. © 2016 John Wiley & Sons Ltd.
Applying the multivariate time-rescaling theorem to neural population models
Gerhard, Felipe; Haslinger, Robert; Pipa, Gordon
2011-01-01
Statistical models of neural activity are integral to modern neuroscience. Recently, interest has grown in modeling the spiking activity of populations of simultaneously recorded neurons to study the effects of correlations and functional connectivity on neural information processing. However any statistical model must be validated by an appropriate goodness-of-fit test. Kolmogorov-Smirnov tests based upon the time-rescaling theorem have proven to be useful for evaluating point-process-based statistical models of single-neuron spike trains. Here we discuss the extension of the time-rescaling theorem to the multivariate (neural population) case. We show that even in the presence of strong correlations between spike trains, models which neglect couplings between neurons can be erroneously passed by the univariate time-rescaling test. We present the multivariate version of the time-rescaling theorem, and provide a practical step-by-step procedure for applying it towards testing the sufficiency of neural population models. Using several simple analytically tractable models and also more complex simulated and real data sets, we demonstrate that important features of the population activity can only be detected using the multivariate extension of the test. PMID:21395436
Time management in acute vertebrobasilar occlusion.
Kamper, Lars; Rybacki, Konrad; Mansour, Michael; Winkler, Sven B; Kempkes, Udo; Haage, Patrick
2009-03-01
Acute vertebrobasilar occlusion (VBO) is associated with a high risk of stroke and death. Although local thrombolysis may achieve recanalization and improve outcome, mortality is still between 35% and 75%. However, without recanalization the chance of a good outcome is extremely poor, with mortality rates of 80-90%. Early treatment is a fundamental factor, but detailed studies of the exact time management of the diagnostic and interventional workflow are still lacking. Data on 18 patients were retrospectively evaluated. Time periods between symptom onset, admission to hospital, time of diagnosis, and beginning of intervention were correlated with postinterventional neurological status. The Glasgow Coma Scale and National Institute of Health Stroke Scale (NIHSS) were used to examine patients before and after local thrombolysis. Additionally, multivariate statistics were applied to reveal similarities between patients with neurological improvement. Primary recanalization was achieved in 77% of patients. The overall mortality was 55%. Major complications were intracranial hemorrhage and peripheral embolism. The time period from symptom onset to intervention showed a strong correlation with the postinterventional NIHSS as well as the patient's age, with the best results in a 4-h interval. Multivariate statistics revealed similarities among the patients. Evaluation of time management in acute VBO by multivariate statistics is a helpful tool for definition of similarities in this patient group. Similarly to the door-to-balloon time for acute coronary interventions, the chances for a good outcome depend on a short time interval between symptom onset and intervention. While the only manipulable time period starts with hospital admission, our results emphasize the necessity of efficient intrahospital workflow.
Assessment of water quality parameters using multivariate analysis for Klang River basin, Malaysia.
Mohamed, Ibrahim; Othman, Faridah; Ibrahim, Adriana I N; Alaa-Eldin, M E; Yunus, Rossita M
2015-01-01
This case study uses several univariate and multivariate statistical techniques to evaluate and interpret a water quality data set obtained from the Klang River basin located within the state of Selangor and the Federal Territory of Kuala Lumpur, Malaysia. The river drains an area of 1,288 km(2), from the steep mountain rainforests of the main Central Range along Peninsular Malaysia to the river mouth in Port Klang, into the Straits of Malacca. Water quality was monitored at 20 stations, nine of which are situated along the main river and 11 along six tributaries. Data was collected from 1997 to 2007 for seven parameters used to evaluate the status of the water quality, namely dissolved oxygen, biochemical oxygen demand, chemical oxygen demand, suspended solids, ammoniacal nitrogen, pH, and temperature. The data were first investigated using descriptive statistical tools, followed by two practical multivariate analyses that reduced the data dimensions for better interpretation. The analyses employed were factor analysis and principal component analysis, which explain 60 and 81.6% of the total variation in the data, respectively. We found that the resulting latent variables from the factor analysis are interpretable and beneficial for describing the water quality in the Klang River. This study presents the usefulness of several statistical methods in evaluating and interpreting water quality data for the purpose of monitoring the effectiveness of water resource management. The results should provide more straightforward data interpretation as well as valuable insight for managers to conceive optimum action plans for controlling pollution in river water.
Harnessing Multivariate Statistics for Ellipsoidal Data in Structural Geology
NASA Astrophysics Data System (ADS)
Roberts, N.; Davis, J. R.; Titus, S.; Tikoff, B.
2015-12-01
Most structural geology articles do not state significance levels, report confidence intervals, or perform regressions to find trends. This is, in part, because structural data tend to include directions, orientations, ellipsoids, and tensors, which are not treatable by elementary statistics. We describe a full procedural methodology for the statistical treatment of ellipsoidal data. We use a reconstructed dataset of deformed ooids in Maryland from Cloos (1947) to illustrate the process. Normalized ellipsoids have five degrees of freedom and can be represented by a second order tensor. This tensor can be permuted into a five dimensional vector that belongs to a vector space and can be treated with standard multivariate statistics. Cloos made several claims about the distribution of deformation in the South Mountain fold, Maryland, and we reexamine two particular claims using hypothesis testing: 1) octahedral shear strain increases towards the axial plane of the fold; 2) finite strain orientation varies systematically along the trend of the axial trace as it bends with the Appalachian orogen. We then test the null hypothesis that the southern segment of South Mountain is the same as the northern segment. This test illustrates the application of ellipsoidal statistics, which combine both orientation and shape. We report confidence intervals for each test, and graphically display our results with novel plots. This poster illustrates the importance of statistics in structural geology, especially when working with noisy or small datasets.
Steiner, John F.; Ho, P. Michael; Beaty, Brenda L.; Dickinson, L. Miriam; Hanratty, Rebecca; Zeng, Chan; Tavel, Heather M.; Havranek, Edward P.; Davidson, Arthur J.; Magid, David J.; Estacio, Raymond O.
2009-01-01
Background Although many studies have identified patient characteristics or chronic diseases associated with medication adherence, the clinical utility of such predictors has rarely been assessed. We attempted to develop clinical prediction rules for adherence with antihypertensive medications in two health care delivery systems. Methods and Results Retrospective cohort studies of hypertension registries in an inner-city health care delivery system (N = 17176) and a health maintenance organization (N = 94297) in Denver, Colorado. Adherence was defined by acquisition of 80% or more of antihypertensive medications. A multivariable model in the inner-city system found that adherent patients (36.3% of the total) were more likely than non-adherent patients to be older, white, married, and acculturated in US society, to have diabetes or cerebrovascular disease, not to abuse alcohol or controlled substances, and to be prescribed less than three antihypertensive medications. Although statistically significant, all multivariate odds ratios were 1.7 or less, and the model did not accurately discriminate adherent from non-adherent patients (C-statistic = 0.606). In the health maintenance organization, where 72.1% of patients were adherent, significant but weak associations existed between adherence and older age, white race, the lack of alcohol abuse, and fewer antihypertensive medications. The multivariate model again failed to accurately discriminate adherent from non-adherent individuals (C-statistic = 0.576). Conclusions Although certain socio-demographic characteristics or clinical diagnoses are statistically associated with adherence to refills of antihypertensive medications, a combination of these characteristics is not sufficiently accurate to allow clinicians to predict whether their patients will be adherent with treatment. PMID:20031876
Multivariate Phylogenetic Comparative Methods: Evaluations, Comparisons, and Recommendations.
Adams, Dean C; Collyer, Michael L
2018-01-01
Recent years have seen increased interest in phylogenetic comparative analyses of multivariate data sets, but to date the varied proposed approaches have not been extensively examined. Here we review the mathematical properties required of any multivariate method, and specifically evaluate existing multivariate phylogenetic comparative methods in this context. Phylogenetic comparative methods based on the full multivariate likelihood are robust to levels of covariation among trait dimensions and are insensitive to the orientation of the data set, but display increasing model misspecification as the number of trait dimensions increases. This is because the expected evolutionary covariance matrix (V) used in the likelihood calculations becomes more ill-conditioned as trait dimensionality increases, and as evolutionary models become more complex. Thus, these approaches are only appropriate for data sets with few traits and many species. Methods that summarize patterns across trait dimensions treated separately (e.g., SURFACE) incorrectly assume independence among trait dimensions, resulting in nearly a 100% model misspecification rate. Methods using pairwise composite likelihood are highly sensitive to levels of trait covariation, the orientation of the data set, and the number of trait dimensions. The consequences of these debilitating deficiencies are that a user can arrive at differing statistical conclusions, and therefore biological inferences, simply from a dataspace rotation, like principal component analysis. By contrast, algebraic generalizations of the standard phylogenetic comparative toolkit that use the trace of covariance matrices are insensitive to levels of trait covariation, the number of trait dimensions, and the orientation of the data set. Further, when appropriate permutation tests are used, these approaches display acceptable Type I error and statistical power. We conclude that methods summarizing information across trait dimensions, as well as pairwise composite likelihood methods should be avoided, whereas algebraic generalizations of the phylogenetic comparative toolkit provide a useful means of assessing macroevolutionary patterns in multivariate data. Finally, we discuss areas in which multivariate phylogenetic comparative methods are still in need of future development; namely highly multivariate Ornstein-Uhlenbeck models and approaches for multivariate evolutionary model comparisons. © The Author(s) 2017. Published by Oxford University Press on behalf of the Systematic Biology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
A multi-analyte serum test for the detection of non-small cell lung cancer
Farlow, E C; Vercillo, M S; Coon, J S; Basu, S; Kim, A W; Faber, L P; Warren, W H; Bonomi, P; Liptay, M J; Borgia, J A
2010-01-01
Background: In this study, we appraised a wide assortment of biomarkers previously shown to have diagnostic or prognostic value for non-small cell lung cancer (NSCLC) with the intent of establishing a multi-analyte serum test capable of identifying patients with lung cancer. Methods: Circulating levels of 47 biomarkers were evaluated against patient cohorts consisting of 90 NSCLC and 43 non-cancer controls using commercial immunoassays. Multivariate statistical methods were used on all biomarkers achieving statistical relevance to define an optimised panel of diagnostic biomarkers for NSCLC. The resulting biomarkers were fashioned into a classification algorithm and validated against serum from a second patient cohort. Results: A total of 14 analytes achieved statistical relevance upon evaluation. Multivariate statistical methods then identified a panel of six biomarkers (tumour necrosis factor-α, CYFRA 21-1, interleukin-1ra, matrix metalloproteinase-2, monocyte chemotactic protein-1 and sE-selectin) as being the most efficacious for diagnosing early stage NSCLC. When tested against a second patient cohort, the panel successfully classified 75 of 88 patients. Conclusions: Here, we report the development of a serum algorithm with high specificity for classifying patients with NSCLC against cohorts of various ‘high-risk' individuals. A high rate of false positives was observed within the cohort in which patients had non-neoplastic lung nodules, possibly as a consequence of the inflammatory nature of these conditions. PMID:20859284
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gunn, Andrew J., E-mail: agunn@uabmc.edu; Sheth, Rahul A.; Luber, Brandon
2017-01-15
PurposeThe purpse of this study was to evaluate the ability of various radiologic response criteria to predict patient outcomes after trans-arterial chemo-embolization with drug-eluting beads (DEB-TACE) in patients with advanced-stage (BCLC C) hepatocellular carcinoma (HCC).Materials and methodsHospital records from 2005 to 2011 were retrospectively reviewed. Non-infiltrative lesions were measured at baseline and on follow-up scans after DEB-TACE according to various common radiologic response criteria, including guidelines of the World Health Organization (WHO), Response Evaluation Criteria in Solid Tumors (RECIST), the European Association for the Study of the Liver (EASL), and modified RECIST (mRECIST). Statistical analysis was performed to see which,more » if any, of the response criteria could be used as a predictor of overall survival (OS) or time-to-progression (TTP).Results75 patients met inclusion criteria. Median OS and TTP were 22.6 months (95 % CI 11.6–24.8) and 9.8 months (95 % CI 7.1–21.6), respectively. Univariate and multivariate Cox analyses revealed that none of the evaluated criteria had the ability to be used as a predictor for OS or TTP. Analysis of the C index in both univariate and multivariate models showed that the evaluated criteria were not accurate predictors of either OS (C-statistic range: 0.51–0.58 in the univariate model; range: 0.54–0.58 in the multivariate model) or TTP (C-statistic range: 0.55–0.59 in the univariate model; range: 0.57–0.61 in the multivariate model).ConclusionCurrent response criteria are not accurate predictors of OS or TTP in patients with advanced-stage HCC after DEB-TACE.« less
Clinical Trials With Large Numbers of Variables: Important Advantages of Canonical Analysis.
Cleophas, Ton J
2016-01-01
Canonical analysis assesses the combined effects of a set of predictor variables on a set of outcome variables, but it is little used in clinical trials despite the omnipresence of multiple variables. The aim of this study was to assess the performance of canonical analysis as compared with traditional multivariate methods using multivariate analysis of covariance (MANCOVA). As an example, a simulated data file with 12 gene expression levels and 4 drug efficacy scores was used. The correlation coefficient between the 12 predictor and 4 outcome variables was 0.87 (P = 0.0001) meaning that 76% of the variability in the outcome variables was explained by the 12 covariates. Repeated testing after the removal of 5 unimportant predictor and 1 outcome variable produced virtually the same overall result. The MANCOVA identified identical unimportant variables, but it was unable to provide overall statistics. (1) Canonical analysis is remarkable, because it can handle many more variables than traditional multivariate methods such as MANCOVA can. (2) At the same time, it accounts for the relative importance of the separate variables, their interactions and differences in units. (3) Canonical analysis provides overall statistics of the effects of sets of variables, whereas traditional multivariate methods only provide the statistics of the separate variables. (4) Unlike other methods for combining the effects of multiple variables such as factor analysis/partial least squares, canonical analysis is scientifically entirely rigorous. (5) Limitations include that it is less flexible than factor analysis/partial least squares, because only 2 sets of variables are used and because multiple solutions instead of one is offered. We do hope that this article will stimulate clinical investigators to start using this remarkable method.
Application of multivariable statistical techniques in plant-wide WWTP control strategies analysis.
Flores, X; Comas, J; Roda, I R; Jiménez, L; Gernaey, K V
2007-01-01
The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques allow i) to determine natural groups or clusters of control strategies with a similar behaviour, ii) to find and interpret hidden, complex and casual relation features in the data set and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation of the complex multicriteria data sets and allows an improved use of information for effective evaluation of control strategies.
Alkarkhi, Abbas F M; Ramli, Saifullah Bin; Easa, Azhar Mat
2009-01-01
Major (sodium, potassium, calcium, magnesium) and minor elements (iron, copper, zinc, manganese) and one heavy metal (lead) of Cavendish banana flour and Dream banana flour were determined, and data were analyzed using multivariate statistical techniques of factor analysis and discriminant analysis. Factor analysis yielded four factors explaining more than 81% of the total variance: the first factor explained 28.73%, comprising magnesium, sodium, and iron; the second factor explained 21.47%, comprising only manganese and copper; the third factor explained 15.66%, comprising zinc and lead; while the fourth factor explained 15.50%, comprising potassium. Discriminant analysis showed that magnesium and sodium exhibited a strong contribution in discriminating the two types of banana flour, affording 100% correct assignation. This study presents the usefulness of multivariate statistical techniques for analysis and interpretation of complex mineral content data from banana flour of different varieties.
Optimal moment determination in POME-copula based hydrometeorological dependence modelling
NASA Astrophysics Data System (ADS)
Liu, Dengfeng; Wang, Dong; Singh, Vijay P.; Wang, Yuankun; Wu, Jichun; Wang, Lachun; Zou, Xinqing; Chen, Yuanfang; Chen, Xi
2017-07-01
Copula has been commonly applied in multivariate modelling in various fields where marginal distribution inference is a key element. To develop a flexible, unbiased mathematical inference framework in hydrometeorological multivariate applications, the principle of maximum entropy (POME) is being increasingly coupled with copula. However, in previous POME-based studies, determination of optimal moment constraints has generally not been considered. The main contribution of this study is the determination of optimal moments for POME for developing a coupled optimal moment-POME-copula framework to model hydrometeorological multivariate events. In this framework, margins (marginals, or marginal distributions) are derived with the use of POME, subject to optimal moment constraints. Then, various candidate copulas are constructed according to the derived margins, and finally the most probable one is determined, based on goodness-of-fit statistics. This optimal moment-POME-copula framework is applied to model the dependence patterns of three types of hydrometeorological events: (i) single-site streamflow-water level; (ii) multi-site streamflow; and (iii) multi-site precipitation, with data collected from Yichang and Hankou in the Yangtze River basin, China. Results indicate that the optimal-moment POME is more accurate in margin fitting and the corresponding copulas reflect a good statistical performance in correlation simulation. Also, the derived copulas, capturing more patterns which traditional correlation coefficients cannot reflect, provide an efficient way in other applied scenarios concerning hydrometeorological multivariate modelling.
Wang, Yalin; Zhang, Jie; Gutman, Boris; Chan, Tony F.; Becker, James T.; Aizenstein, Howard J.; Lopez, Oscar L.; Tamburo, Robert J.; Toga, Arthur W.; Thompson, Paul M.
2010-01-01
Here we developed a new method, called multivariate tensor-based surface morphometry (TBM), and applied it to study lateral ventricular surface differences associated with HIV/AIDS. Using concepts from differential geometry and the theory of differential forms, we created mathematical structures known as holomorphic one-forms, to obtain an efficient and accurate conformal parameterization of the lateral ventricular surfaces in the brain. The new meshing approach also provides a natural way to register anatomical surfaces across subjects, and improves on prior methods as it handles surfaces that branch and join at complex 3D junctions. To analyze anatomical differences, we computed new statistics from the Riemannian surface metrics - these retain multivariate information on local surface geometry. We applied this framework to analyze lateral ventricular surface morphometry in 3D MRI data from 11 subjects with HIV/AIDS and 8 healthy controls. Our method detected a 3D profile of surface abnormalities even in this small sample. Multivariate statistics on the local tensors gave better effect sizes for detecting group differences, relative to other TBM-based methods including analysis of the Jacobian determinant, the largest and smallest eigenvalues of the surface metric, and the pair of eigenvalues of the Jacobian matrix. The resulting analysis pipeline may improve the power of surface-based morphometry studies of the brain. PMID:19900560
Self-Regulated Learning Strategies in Relation with Statistics Anxiety
ERIC Educational Resources Information Center
Kesici, Sahin; Baloglu, Mustafa; Deniz, M. Engin
2011-01-01
Dealing with students' attitudinal problems related to statistics is an important aspect of statistics instruction. Employing the appropriate learning strategies may have a relationship with anxiety during the process of statistics learning. Thus, the present study investigated multivariate relationships between self-regulated learning strategies…
Lü, Yiran; Hao, Shuxin; Zhang, Guoqing; Liu, Jie; Liu, Yue; Xu, Dongqun
2018-01-01
To implement the online statistical analysis function in information system of air pollution and health impact monitoring, and obtain the data analysis information real-time. Using the descriptive statistical method as well as time-series analysis and multivariate regression analysis, SQL language and visual tools to implement online statistical analysis based on database software. Generate basic statistical tables and summary tables of air pollution exposure and health impact data online; Generate tendency charts of each data part online and proceed interaction connecting to database; Generate butting sheets which can lead to R, SAS and SPSS directly online. The information system air pollution and health impact monitoring implements the statistical analysis function online, which can provide real-time analysis result to its users.
Guo, Jing; Yuan, Yahong; Dou, Pei; Yue, Tianli
2017-10-01
Fifty-one kiwifruit juice samples of seven kiwifruit varieties from five regions in China were analyzed to determine their polyphenols contents and to trace fruit varieties and geographical origins by multivariate statistical analysis. Twenty-one polyphenols belonging to four compound classes were determined by ultra-high-performance liquid chromatography coupled with ultra-high-resolution TOF mass spectrometry. (-)-Epicatechin, (+)-catechin, procyanidin B1 and caffeic acid derivatives were the predominant phenolic compounds in the juices. Principal component analysis (PCA) allowed a clear separation of the juices according to kiwifruit varieties. Stepwise linear discriminant analysis (SLDA) yielded satisfactory categorization of samples, provided 100% success rate according to kiwifruit varieties and 92.2% success rate according to geographical origins. The result showed that polyphenolic profiles of kiwifruit juices contain enough information to trace fruit varieties and geographical origins. Copyright © 2017 Elsevier Ltd. All rights reserved.
Jamshidi-Zanjani, Ahmad; Saeedi, Mohsen
2017-07-01
Vertical distribution of metals (Cu, Zn, Cr, Fe, Mn, Pb, Ni, Cd, and Li) in four sediment core samples (C 1 , C 2 , C 3 , and C 4 ) from Anzali international wetland located southwest of the Caspian Sea was examined. Background concentration of each metal was calculated according to different statistical approaches. The results of multivariate statistical analysis showed that Fe and Mn might have significant role in the fate of Ni and Zn in sediment core samples. Different sediment quality indexes were utilized to assess metal pollution in sediment cores. Moreover, a new sediment quality index named aggregative toxicity index (ATI) based on sediment quality guidelines (SQGs) was developed to assess the degree of metal toxicity in an aggregative manner. The increasing pattern of metal pollution and their toxicity degree in upper layers of core samples indicated increasing effects of anthropogenic sources in the study area.
Multivariate space - time analysis of PRE-STORM precipitation
NASA Technical Reports Server (NTRS)
Polyak, Ilya; North, Gerald R.; Valdes, Juan B.
1994-01-01
This paper presents the methodologies and results of the multivariate modeling and two-dimensional spectral and correlation analysis of PRE-STORM rainfall gauge data. Estimated parameters of the models for the specific spatial averages clearly indicate the eastward and southeastward wave propagation of rainfall fluctuations. A relationship between the coefficients of the diffusion equation and the parameters of the stochastic model of rainfall fluctuations is derived that leads directly to the exclusive use of rainfall data to estimate advection speed (about 12 m/s) as well as other coefficients of the diffusion equation of the corresponding fields. The statistical methodology developed here can be used for confirmation of physical models by comparison of the corresponding second-moment statistics of the observed and simulated data, for generating multiple samples of any size, for solving the inverse problem of the hydrodynamic equations, and for application in some other areas of meteorological and climatological data analysis and modeling.
NASA Astrophysics Data System (ADS)
Panagopoulos, George P.
2014-10-01
The multivariate statistical techniques conducted on quarterly water consumption data in Mytilene reveal valuable tools that could help the local authorities in assigning strategies aimed at the sustainable development of urban water resources. The proposed methodology is an innovative approach, applied for the first time in the international literature, to handling urban water consumption data in order to analyze statistically the interrelationships among the determinants of urban water use. Factor analysis of demographic, socio-economic and hydrological variables shows that total water consumption in Mytilene is the combined result of increases in (a) income, (b) population, (c) connections and (d) climate parameters. On the other hand, the per connection water demand is influenced by variations in water prices but with different consequences in each consumption class. Increases in water prices are faced by large consumers; they then reduce their consumption rates and transfer to lower consumption blocks. These shifts are responsible for the increase in the average consumption values in the lower blocks despite the increase in the marginal prices.
Method of identifying clusters representing statistical dependencies in multivariate data
NASA Technical Reports Server (NTRS)
Borucki, W. J.; Card, D. H.; Lyle, G. C.
1975-01-01
Approach is first to cluster and then to compute spatial boundaries for resulting clusters. Next step is to compute, from set of Monte Carlo samples obtained from scrambled data, estimates of probabilities of obtaining at least as many points within boundaries as were actually observed in original data.
Analysis of Developmental Data: Comparison Among Alternative Methods
ERIC Educational Resources Information Center
Wilson, Ronald S.
1975-01-01
To examine the ability of the correction factor epsilon to counteract statistical bias in univariate analysis, an analysis of variance (adjusted by epsilon) and a multivariate analysis of variance were performed on the same data. The results indicated that univariate analysis is a fully protected design when used with epsilon. (JMB)
Corporal Punishment and Student Outcomes in Rural Schools
ERIC Educational Resources Information Center
Han, Seunghee
2014-01-01
This study examined the effects of corporal punishment on student outcomes in rural schools by analyzing 1,067 samples from the School Survey on Crime and Safety 2007-2008. Results of descriptive statistics and multivariate regression analyses indicated that schools with corporal punishment may decrease students' violent behaviors and…
Academic Departments and Student Attitudes toward Different Dimensions of Web-based Education.
ERIC Educational Resources Information Center
Federico, Pat-Anthony
2001-01-01
Describes research at the Naval Postgraduate School that investigated student attitudes toward various aspects of Web-based instruction. Results of a survey, which were analyzed using a variety of multivariate and univariate statistical techniques, showed significantly different attitudes toward different dimensions of Web-based education…
DOE Office of Scientific and Technical Information (OSTI.GOV)
2015-09-14
This package contains statistical routines for extracting features from multivariate time-series data which can then be used for subsequent multivariate statistical analysis to identify patterns and anomalous behavior. It calculates local linear or quadratic regression model fits to moving windows for each series and then summarizes the model coefficients across user-defined time intervals for each series. These methods are domain agnostic-but they have been successfully applied to a variety of domains, including commercial aviation and electric power grid data.
1981-08-01
RATIO TEST STATISTIC FOR SPHERICITY OF COMPLEX MULTIVARIATE NORMAL DISTRIBUTION* C. Fang P. R. Krishnaiah B. N. Nagarsenker** August 1981 Technical...and their applications in time sEries, the reader is referred to Krishnaiah (1976). Motivated by the applications in the area of inference on multiple...for practical purposes. Here, we note that Krishnaiah , Lee and Chang (1976) approxi- mated the null distribution of certain power of the likeli
[Analysis of variance of repeated data measured by water maze with SPSS].
Qiu, Hong; Jin, Guo-qin; Jin, Ru-feng; Zhao, Wei-kang
2007-01-01
To introduce the method of analyzing repeated data measured by water maze with SPSS 11.0, and offer a reference statistical method to clinical and basic medicine researchers who take the design of repeated measures. Using repeated measures and multivariate analysis of variance (ANOVA) process of the general linear model in SPSS and giving comparison among different groups and different measure time pairwise. Firstly, Mauchly's test of sphericity should be used to judge whether there were relations among the repeatedly measured data. If any (P
A model-based approach to wildland fire reconstruction using sediment charcoal records
Itter, Malcolm S.; Finley, Andrew O.; Hooten, Mevin B.; Higuera, Philip E.; Marlon, Jennifer R.; Kelly, Ryan; McLachlan, Jason S.
2017-01-01
Lake sediment charcoal records are used in paleoecological analyses to reconstruct fire history, including the identification of past wildland fires. One challenge of applying sediment charcoal records to infer fire history is the separation of charcoal associated with local fire occurrence and charcoal originating from regional fire activity. Despite a variety of methods to identify local fires from sediment charcoal records, an integrated statistical framework for fire reconstruction is lacking. We develop a Bayesian point process model to estimate the probability of fire associated with charcoal counts from individual-lake sediments and estimate mean fire return intervals. A multivariate extension of the model combines records from multiple lakes to reduce uncertainty in local fire identification and estimate a regional mean fire return interval. The univariate and multivariate models are applied to 13 lakes in the Yukon Flats region of Alaska. Both models resulted in similar mean fire return intervals (100–350 years) with reduced uncertainty under the multivariate model due to improved estimation of regional charcoal deposition. The point process model offers an integrated statistical framework for paleofire reconstruction and extends existing methods to infer regional fire history from multiple lake records with uncertainty following directly from posterior distributions.
STAMATE, MIRELA CRISTINA; TODOR, NICOLAE; COSGAREA, MARCEL
2015-01-01
Background and aim The clinical utility of otoacoustic emissions as a noninvasive objective test of cochlear function has been long studied. Both transient otoacoustic emissions and distorsion products can be used to identify hearing loss, but to what extent they can be used as predictors for hearing loss is still debated. Most studies agree that multivariate analyses have better test performances than univariate analyses. The aim of the study was to determine transient otoacoustic emissions and distorsion products performance in identifying normal and impaired hearing loss, using the pure tone audiogram as a gold standard procedure and different multivariate statistical approaches. Methods The study included 105 adult subjects with normal hearing and hearing loss who underwent the same test battery: pure-tone audiometry, tympanometry, otoacoustic emission tests. We chose to use the logistic regression as a multivariate statistical technique. Three logistic regression models were developed to characterize the relations between different risk factors (age, sex, tinnitus, demographic features, cochlear status defined by otoacoustic emissions) and hearing status defined by pure-tone audiometry. The multivariate analyses allow the calculation of the logistic score, which is a combination of the inputs, weighted by coefficients, calculated within the analyses. The accuracy of each model was assessed using receiver operating characteristics curve analysis. We used the logistic score to generate receivers operating curves and to estimate the areas under the curves in order to compare different multivariate analyses. Results We compared the performance of each otoacoustic emission (transient, distorsion product) using three different multivariate analyses for each ear, when multi-frequency gold standards were used. We demonstrated that all multivariate analyses provided high values of the area under the curve proving the performance of the otoacoustic emissions. Each otoacoustic emission test presented high values of area under the curve, suggesting that implementing a multivariate approach to evaluate the performances of each otoacoustic emission test would serve to increase the accuracy in identifying the normal and impaired ears. We encountered the highest area under the curve value for the combined multivariate analysis suggesting that both otoacoustic emission tests should be used in assessing hearing status. Our multivariate analyses revealed that age is a constant predictor factor of the auditory status for both ears, but the presence of tinnitus was the most important predictor for the hearing level, only for the left ear. Age presented similar coefficients, but tinnitus coefficients, by their high value, produced the highest variations of the logistic scores, only for the left ear group, thus increasing the risk of hearing loss. We did not find gender differences between ears for any otoacoustic emission tests, but studies still debate this question as the results are contradictory. Neither gender, nor environment origin had any predictive value for the hearing status, according to the results of our study. Conclusion Like any other audiological test, using otoacoustic emissions to identify hearing loss is not without error. Even when applying multivariate analysis, perfect test performance is never achieved. Although most studies demonstrated the benefit of using the multivariate analysis, it has not been incorporated into clinical decisions maybe because of the idiosyncratic nature of multivariate solutions or because of the lack of the validation studies. PMID:26733749
Alves, Darlan Daniel; Riegel, Roberta Plangg; de Quevedo, Daniela Müller; Osório, Daniela Montanari Migliavacca; da Costa, Gustavo Marques; do Nascimento, Carlos Augusto; Telöken, Franko
2018-06-08
Assessment of surface water quality is an issue of currently high importance, especially in polluted rivers which provide water for treatment and distribution as drinking water, as is the case of the Sinos River, southern Brazil. Multivariate statistical techniques allow a better understanding of the seasonal variations in water quality, as well as the source identification and source apportionment of water pollution. In this study, the multivariate statistical techniques of cluster analysis (CA), principal component analysis (PCA), and positive matrix factorization (PMF) were used, along with the Kruskal-Wallis test and Spearman's correlation analysis in order to interpret a water quality data set resulting from a monitoring program conducted over a period of almost two years (May 2013 to April 2015). The water samples were collected from the raw water inlet of the municipal water treatment plant (WTP) operated by the Water and Sewage Services of Novo Hamburgo (COMUSA). CA allowed the data to be grouped into three periods (autumn and summer (AUT-SUM); winter (WIN); spring (SPR)). Through the PCA, it was possible to identify that the most important parameters in contribution to water quality variations are total coliforms (TCOLI) in SUM-AUT, water level (WL), water temperature (WT), and electrical conductivity (EC) in WIN and color (COLOR) and turbidity (TURB) in SPR. PMF was applied to the complete data set and enabled the source apportionment water pollution through three factors, which are related to anthropogenic sources, such as the discharge of domestic sewage (mostly represented by Escherichia coli (ECOLI)), industrial wastewaters, and agriculture runoff. The results provided by this study demonstrate the contribution provided by the use of integrated statistical techniques in the interpretation and understanding of large data sets of water quality, showing also that this approach can be used as an efficient methodology to optimize indicators for water quality assessment.
GAISE 2016 Promotes Statistical Literacy
ERIC Educational Resources Information Center
Schield, Milo
2017-01-01
In the 2005 Guidelines for Assessment and Instruction in Statistics Education (GAISE), statistical literacy featured as a primary goal. The 2016 revision eliminated statistical literacy as a stated goal. Although this looks like a rejection, this paper argues that by including multivariate thinking and--more importantly--confounding as recommended…
Moya, Claudio E; Raiber, Matthias; Taulis, Mauricio; Cox, Malcolm E
2015-03-01
The Galilee and Eromanga basins are sub-basins of the Great Artesian Basin (GAB). In this study, a multivariate statistical approach (hierarchical cluster analysis, principal component analysis and factor analysis) is carried out to identify hydrochemical patterns and assess the processes that control hydrochemical evolution within key aquifers of the GAB in these basins. The results of the hydrochemical assessment are integrated into a 3D geological model (previously developed) to support the analysis of spatial patterns of hydrochemistry, and to identify the hydrochemical and hydrological processes that control hydrochemical variability. In this area of the GAB, the hydrochemical evolution of groundwater is dominated by evapotranspiration near the recharge area resulting in a dominance of the Na-Cl water types. This is shown conceptually using two selected cross-sections which represent discrete groundwater flow paths from the recharge areas to the deeper parts of the basins. With increasing distance from the recharge area, a shift towards a dominance of carbonate (e.g. Na-HCO3 water type) has been observed. The assessment of hydrochemical changes along groundwater flow paths highlights how aquifers are separated in some areas, and how mixing between groundwater from different aquifers occurs elsewhere controlled by geological structures, including between GAB aquifers and coal bearing strata of the Galilee Basin. The results of this study suggest that distinct hydrochemical differences can be observed within the previously defined Early Cretaceous-Jurassic aquifer sequence of the GAB. A revision of the two previously recognised hydrochemical sequences is being proposed, resulting in three hydrochemical sequences based on systematic differences in hydrochemistry, salinity and dominant hydrochemical processes. The integrated approach presented in this study which combines different complementary multivariate statistical techniques with a detailed assessment of the geological framework of these sedimentary basins, can be adopted in other complex multi-aquifer systems to assess hydrochemical evolution and its geological controls. Copyright © 2014 Elsevier B.V. All rights reserved.
Cain, Meghan K; Zhang, Zhiyong; Yuan, Ke-Hai
2017-10-01
Nonnormality of univariate data has been extensively examined previously (Blanca et al., Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 9(2), 78-84, 2013; Miceeri, Psychological Bulletin, 105(1), 156, 1989). However, less is known of the potential nonnormality of multivariate data although multivariate analysis is commonly used in psychological and educational research. Using univariate and multivariate skewness and kurtosis as measures of nonnormality, this study examined 1,567 univariate distriubtions and 254 multivariate distributions collected from authors of articles published in Psychological Science and the American Education Research Journal. We found that 74 % of univariate distributions and 68 % multivariate distributions deviated from normal distributions. In a simulation study using typical values of skewness and kurtosis that we collected, we found that the resulting type I error rates were 17 % in a t-test and 30 % in a factor analysis under some conditions. Hence, we argue that it is time to routinely report skewness and kurtosis along with other summary statistics such as means and variances. To facilitate future report of skewness and kurtosis, we provide a tutorial on how to compute univariate and multivariate skewness and kurtosis by SAS, SPSS, R and a newly developed Web application.
Ramdani, Sofiane; Bonnet, Vincent; Tallon, Guillaume; Lagarde, Julien; Bernard, Pierre Louis; Blain, Hubert
2016-08-01
Entropy measures are often used to quantify the regularity of postural sway time series. Recent methodological developments provided both multivariate and multiscale approaches allowing the extraction of complexity features from physiological signals; see "Dynamical complexity of human responses: A multivariate data-adaptive framework," in Bulletin of Polish Academy of Science and Technology, vol. 60, p. 433, 2012. The resulting entropy measures are good candidates for the analysis of bivariate postural sway signals exhibiting nonstationarity and multiscale properties. These methods are dependant on several input parameters such as embedding parameters. Using two data sets collected from institutionalized frail older adults, we numerically investigate the behavior of a recent multivariate and multiscale entropy estimator; see "Multivariate multiscale entropy: A tool for complexity analysis of multichannel data," Physics Review E, vol. 84, p. 061918, 2011. We propose criteria for the selection of the input parameters. Using these optimal parameters, we statistically compare the multivariate and multiscale entropy values of postural sway data of non-faller subjects to those of fallers. These two groups are discriminated by the resulting measures over multiple time scales. We also demonstrate that the typical parameter settings proposed in the literature lead to entropy measures that do not distinguish the two groups. This last result confirms the importance of the selection of appropriate input parameters.
ERIC Educational Resources Information Center
Thissen, David; Wainer, Howard
Simulation studies of the performance of (potentially) robust statistical estimation produce large quantities of numbers in the form of performance indices of the various estimators under various conditions. This report presents a multivariate graphical display used to aid in the digestion of the plentiful results in a current study of Item…
Bayesian statistics and Monte Carlo methods
NASA Astrophysics Data System (ADS)
Koch, K. R.
2018-03-01
The Bayesian approach allows an intuitive way to derive the methods of statistics. Probability is defined as a measure of the plausibility of statements or propositions. Three rules are sufficient to obtain the laws of probability. If the statements refer to the numerical values of variables, the so-called random variables, univariate and multivariate distributions follow. They lead to the point estimation by which unknown quantities, i.e. unknown parameters, are computed from measurements. The unknown parameters are random variables, they are fixed quantities in traditional statistics which is not founded on Bayes' theorem. Bayesian statistics therefore recommends itself for Monte Carlo methods, which generate random variates from given distributions. Monte Carlo methods, of course, can also be applied in traditional statistics. The unknown parameters, are introduced as functions of the measurements, and the Monte Carlo methods give the covariance matrix and the expectation of these functions. A confidence region is derived where the unknown parameters are situated with a given probability. Following a method of traditional statistics, hypotheses are tested by determining whether a value for an unknown parameter lies inside or outside the confidence region. The error propagation of a random vector by the Monte Carlo methods is presented as an application. If the random vector results from a nonlinearly transformed vector, its covariance matrix and its expectation follow from the Monte Carlo estimate. This saves a considerable amount of derivatives to be computed, and errors of the linearization are avoided. The Monte Carlo method is therefore efficient. If the functions of the measurements are given by a sum of two or more random vectors with different multivariate distributions, the resulting distribution is generally not known. TheMonte Carlo methods are then needed to obtain the covariance matrix and the expectation of the sum.
Learning investment indicators through data extension
NASA Astrophysics Data System (ADS)
Dvořák, Marek
2017-07-01
Stock prices in the form of time series were analysed using single and multivariate statistical methods. After simple data preprocessing in the form of logarithmic differences, we augmented this single variate time series to a multivariate representation. This method makes use of sliding windows to calculate several dozen of new variables using simple statistic tools like first and second moments as well as more complicated statistic, like auto-regression coefficients and residual analysis, followed by an optional quadratic transformation that was further used for data extension. These were used as a explanatory variables in a regularized logistic LASSO regression which tried to estimate Buy-Sell Index (BSI) from real stock market data.
Wang, Yong; Yao, Xiaomei; Parthasarathy, Ranganathan
2008-01-01
Fourier transform infrared (FTIR) chemical imaging can be used to investigate molecular chemical features of the adhesive/dentin interfaces. However, the information is not straightforward, and is not easily extracted. The objective of this study was to use multivariate analysis methods, principal component analysis and fuzzy c-means clustering, to analyze spectral data in comparison with univariate analysis. The spectral imaging data collected from both the adhesive/healthy dentin and adhesive/caries-affected dentin specimens were used and compared. The univariate statistical methods such as mapping of intensities of specific functional group do not always accurately identify functional group locations and concentrations due to more or less band overlapping in adhesive and dentin. Apart from the ease with which information can be extracted, multivariate methods highlight subtle and often important changes in the spectra that are difficult to observe using univariate methods. The results showed that the multivariate methods gave more satisfactory, interpretable results than univariate methods and were conclusive in showing that they can discriminate and classify differences between healthy dentin and caries-affected dentin within the interfacial regions. It is demonstrated that the multivariate FTIR imaging approaches can be used in the rapid characterization of heterogeneous, complex structure. PMID:18980198
Motegi, Hiromi; Tsuboi, Yuuri; Saga, Ayako; Kagami, Tomoko; Inoue, Maki; Toki, Hideaki; Minowa, Osamu; Noda, Tetsuo; Kikuchi, Jun
2015-11-04
There is an increasing need to use multivariate statistical methods for understanding biological functions, identifying the mechanisms of diseases, and exploring biomarkers. In addition to classical analyses such as hierarchical cluster analysis, principal component analysis, and partial least squares discriminant analysis, various multivariate strategies, including independent component analysis, non-negative matrix factorization, and multivariate curve resolution, have recently been proposed. However, determining the number of components is problematic. Despite the proposal of several different methods, no satisfactory approach has yet been reported. To resolve this problem, we implemented a new idea: classifying a component as "reliable" or "unreliable" based on the reproducibility of its appearance, regardless of the number of components in the calculation. Using the clustering method for classification, we applied this idea to multivariate curve resolution-alternating least squares (MCR-ALS). Comparisons between conventional and modified methods applied to proton nuclear magnetic resonance ((1)H-NMR) spectral datasets derived from known standard mixtures and biological mixtures (urine and feces of mice) revealed that more plausible results are obtained by the modified method. In particular, clusters containing little information were detected with reliability. This strategy, named "cluster-aided MCR-ALS," will facilitate the attainment of more reliable results in the metabolomics datasets.
Mujica Ascencio, Saul; Choe, ChunSik; Meinke, Martina C; Müller, Rainer H; Maksimov, George V; Wigger-Alberti, Walter; Lademann, Juergen; Darvin, Maxim E
2016-07-01
Propylene glycol is one of the known substances added in cosmetic formulations as a penetration enhancer. Recently, nanocrystals have been employed also to increase the skin penetration of active components. Caffeine is a component with many applications and its penetration into the epidermis is controversially discussed in the literature. In the present study, the penetration ability of two components - caffeine nanocrystals and propylene glycol, applied topically on porcine ear skin in the form of a gel, was investigated ex vivo using two confocal Raman microscopes operated at different excitation wavelengths (785nm and 633nm). Several depth profiles were acquired in the fingerprint region and different spectral ranges, i.e., 526-600cm(-1) and 810-880cm(-1) were chosen for independent analysis of caffeine and propylene glycol penetration into the skin, respectively. Multivariate statistical methods such as principal component analysis (PCA) and linear discriminant analysis (LDA) combined with Student's t-test were employed to calculate the maximum penetration depths of each substance (caffeine and propylene glycol). The results show that propylene glycol penetrates significantly deeper than caffeine (20.7-22.0μm versus 12.3-13.0μm) without any penetration enhancement effect on caffeine. The results confirm that different substances, even if applied onto the skin as a mixture, can penetrate differently. The penetration depths of caffeine and propylene glycol obtained using two different confocal Raman microscopes are comparable showing that both types of microscopes are well suited for such investigations and that multivariate statistical PCA-LDA methods combined with Student's t-test are very useful for analyzing the penetration of different substances into the skin. Copyright © 2016 Elsevier B.V. All rights reserved.
Longobardi, F; Ventrella, A; Bianco, A; Catucci, L; Cafagna, I; Gallo, V; Mastrorilli, P; Agostiano, A
2013-12-01
In this study, non-targeted (1)H NMR fingerprinting was used in combination with multivariate statistical techniques for the classification of Italian sweet cherries based on their different geographical origins (Emilia Romagna and Puglia). As classification techniques, Soft Independent Modelling of Class Analogy (SIMCA), Partial Least Squares Discriminant Analysis (PLS-DA), and Linear Discriminant Analysis (LDA) were carried out and the results were compared. For LDA, before performing a refined selection of the number/combination of variables, two different strategies for a preliminary reduction of the variable number were tested. The best average recognition and CV prediction abilities (both 100.0%) were obtained for all the LDA models, although PLS-DA also showed remarkable performances (94.6%). All the statistical models were validated by observing the prediction abilities with respect to an external set of cherry samples. The best result (94.9%) was obtained with LDA by performing a best subset selection procedure on a set of 30 principal components previously selected by a stepwise decorrelation. The metabolites that mostly contributed to the classification performances of such LDA model, were found to be malate, glucose, fructose, glutamine and succinate. Copyright © 2013 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Flach, Milan; Mahecha, Miguel; Gans, Fabian; Rodner, Erik; Bodesheim, Paul; Guanche-Garcia, Yanira; Brenning, Alexander; Denzler, Joachim; Reichstein, Markus
2016-04-01
The number of available Earth observations (EOs) is currently substantially increasing. Detecting anomalous patterns in these multivariate time series is an important step in identifying changes in the underlying dynamical system. Likewise, data quality issues might result in anomalous multivariate data constellations and have to be identified before corrupting subsequent analyses. In industrial application a common strategy is to monitor production chains with several sensors coupled to some statistical process control (SPC) algorithm. The basic idea is to raise an alarm when these sensor data depict some anomalous pattern according to the SPC, i.e. the production chain is considered 'out of control'. In fact, the industrial applications are conceptually similar to the on-line monitoring of EOs. However, algorithms used in the context of SPC or process monitoring are rarely considered for supervising multivariate spatio-temporal Earth observations. The objective of this study is to exploit the potential and transferability of SPC concepts to Earth system applications. We compare a range of different algorithms typically applied by SPC systems and evaluate their capability to detect e.g. known extreme events in land surface processes. Specifically two main issues are addressed: (1) identifying the most suitable combination of data pre-processing and detection algorithm for a specific type of event and (2) analyzing the limits of the individual approaches with respect to the magnitude, spatio-temporal size of the event as well as the data's signal to noise ratio. Extensive artificial data sets that represent the typical properties of Earth observations are used in this study. Our results show that the majority of the algorithms used can be considered for the detection of multivariate spatiotemporal events and directly transferred to real Earth observation data as currently assembled in different projects at the European scale, e.g. http://baci-h2020.eu/index.php/ and http://earthsystemdatacube.net/. Known anomalies such as the Russian heatwave are detected as well as anomalies which are not detectable with univariate methods.
Moseson, Heidi; Gerdts, Caitlin; Dehlendorf, Christine; Hiatt, Robert A; Vittinghoff, Eric
2017-12-21
The list experiment is a promising measurement tool for eliciting truthful responses to stigmatized or sensitive health behaviors. However, investigators may be hesitant to adopt the method due to previously untestable assumptions and the perceived inability to conduct multivariable analysis. With a recently developed statistical test that can detect the presence of a design effect - the absence of which is a central assumption of the list experiment method - we sought to test the validity of a list experiment conducted on self-reported abortion in Liberia. We also aim to introduce recently developed multivariable regression estimators for the analysis of list experiment data, to explore relationships between respondent characteristics and having had an abortion - an important component of understanding the experiences of women who have abortions. To test the null hypothesis of no design effect in the Liberian list experiment data, we calculated the percentage of each respondent "type," characterized by response to the control items, and compared these percentages across treatment and control groups with a Bonferroni-adjusted alpha criterion. We then implemented two least squares and two maximum likelihood models (four total), each representing different bias-variance trade-offs, to estimate the association between respondent characteristics and abortion. We find no clear evidence of a design effect in list experiment data from Liberia (p = 0.18), affirming the first key assumption of the method. Multivariable analyses suggest a negative association between education and history of abortion. The retrospective nature of measuring lifetime experience of abortion, however, complicates interpretation of results, as the timing and safety of a respondent's abortion may have influenced her ability to pursue an education. Our work demonstrates that multivariable analyses, as well as statistical testing of a key design assumption, are possible with list experiment data, although with important limitations when considering lifetime measures. We outline how to implement this methodology with list experiment data in future research.
Gordon, Derek; Londono, Douglas; Patel, Payal; Kim, Wonkuk; Finch, Stephen J; Heiman, Gary A
2016-01-01
Our motivation here is to calculate the power of 3 statistical tests used when there are genetic traits that operate under a pleiotropic mode of inheritance and when qualitative phenotypes are defined by use of thresholds for the multiple quantitative phenotypes. Specifically, we formulate a multivariate function that provides the probability that an individual has a vector of specific quantitative trait values conditional on having a risk locus genotype, and we apply thresholds to define qualitative phenotypes (affected, unaffected) and compute penetrances and conditional genotype frequencies based on the multivariate function. We extend the analytic power and minimum-sample-size-necessary (MSSN) formulas for 2 categorical data-based tests (genotype, linear trend test [LTT]) of genetic association to the pleiotropic model. We further compare the MSSN of the genotype test and the LTT with that of a multivariate ANOVA (Pillai). We approximate the MSSN for statistics by linear models using a factorial design and ANOVA. With ANOVA decomposition, we determine which factors most significantly change the power/MSSN for all statistics. Finally, we determine which test statistics have the smallest MSSN. In this work, MSSN calculations are for 2 traits (bivariate distributions) only (for illustrative purposes). We note that the calculations may be extended to address any number of traits. Our key findings are that the genotype test usually has lower MSSN requirements than the LTT. More inclusive thresholds (top/bottom 25% vs. top/bottom 10%) have higher sample size requirements. The Pillai test has a much larger MSSN than both the genotype test and the LTT, as a result of sample selection. With these formulas, researchers can specify how many subjects they must collect to localize genes for pleiotropic phenotypes. © 2017 S. Karger AG, Basel.
Multivariate statistical analysis of low-voltage EDS spectrum images
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson, I.M.
1998-03-01
Whereas energy-dispersive X-ray spectrometry (EDS) has been used for compositional analysis in the scanning electron microscope for 30 years, the benefits of using low operating voltages for such analyses have been explored only during the last few years. This paper couples low-voltage EDS with two other emerging areas of characterization: spectrum imaging and multivariate statistical analysis. The specimen analyzed for this study was a finished Intel Pentium processor, with the polyimide protective coating stripped off to expose the final active layers.
Quick Overview Scout 2008 Version 1.0
The Scout 2008 version 1.0 statistical software package has been updated from past DOS and Windows versions to provide classical and robust univariate and multivariate graphical and statistical methods that are not typically available in commercial or freeware statistical softwar...
NASA Astrophysics Data System (ADS)
Wang, Yunzhi; Qiu, Yuchen; Thai, Theresa; More, Kathleen; Ding, Kai; Liu, Hong; Zheng, Bin
2016-03-01
How to rationally identify epithelial ovarian cancer (EOC) patients who will benefit from bevacizumab or other antiangiogenic therapies is a critical issue in EOC treatments. The motivation of this study is to quantitatively measure adiposity features from CT images and investigate the feasibility of predicting potential benefit of EOC patients with or without receiving bevacizumab-based chemotherapy treatment using multivariate statistical models built based on quantitative adiposity image features. A dataset involving CT images from 59 advanced EOC patients were included. Among them, 32 patients received maintenance bevacizumab after primary chemotherapy and the remaining 27 patients did not. We developed a computer-aided detection (CAD) scheme to automatically segment subcutaneous fat areas (VFA) and visceral fat areas (SFA) and then extracted 7 adiposity-related quantitative features. Three multivariate data analysis models (linear regression, logistic regression and Cox proportional hazards regression) were performed respectively to investigate the potential association between the model-generated prediction results and the patients' progression-free survival (PFS) and overall survival (OS). The results show that using all 3 statistical models, a statistically significant association was detected between the model-generated results and both of the two clinical outcomes in the group of patients receiving maintenance bevacizumab (p<0.01), while there were no significant association for both PFS and OS in the group of patients without receiving maintenance bevacizumab. Therefore, this study demonstrated the feasibility of using quantitative adiposity-related CT image features based statistical prediction models to generate a new clinical marker and predict the clinical outcome of EOC patients receiving maintenance bevacizumab-based chemotherapy.
The use of multivariate statistics in studies of wildlife habitat
David E. Capen
1981-01-01
This report contains edited and reviewed versions of papers presented at a workshop held at the University of Vermont in April 1980. Topics include sampling avian habitats, multivariate methods, applications, examples, and new approaches to analysis and interpretation.
Rejection of Multivariate Outliers.
1983-05-01
available in Gnanadesikan (1977). 2 The motivation for the present investigation lies in a recent paper of Schvager and Margolin (1982) who derive a... Gnanadesikan , R. (1977). Methods for Statistical Data Analysis of Multivariate Observations. Wiley, New York. [7] Hawkins, D.M. (1980). Identification of
Multivariate analysis: greater insights into complex systems
USDA-ARS?s Scientific Manuscript database
Many agronomic researchers measure and collect multiple response variables in an effort to understand the more complex nature of the system being studied. Multivariate (MV) statistical methods encompass the simultaneous analysis of all random variables (RV) measured on each experimental or sampling ...
Multivariate meta-analysis for non-linear and other multi-parameter associations
Gasparrini, A; Armstrong, B; Kenward, M G
2012-01-01
In this paper, we formalize the application of multivariate meta-analysis and meta-regression to synthesize estimates of multi-parameter associations obtained from different studies. This modelling approach extends the standard two-stage analysis used to combine results across different sub-groups or populations. The most straightforward application is for the meta-analysis of non-linear relationships, described for example by regression coefficients of splines or other functions, but the methodology easily generalizes to any setting where complex associations are described by multiple correlated parameters. The modelling framework of multivariate meta-analysis is implemented in the package mvmeta within the statistical environment R. As an illustrative example, we propose a two-stage analysis for investigating the non-linear exposure–response relationship between temperature and non-accidental mortality using time-series data from multiple cities. Multivariate meta-analysis represents a useful analytical tool for studying complex associations through a two-stage procedure. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22807043
Multivariate Non-Symmetric Stochastic Models for Spatial Dependence Models
NASA Astrophysics Data System (ADS)
Haslauer, C. P.; Bárdossy, A.
2017-12-01
A copula based multivariate framework allows more flexibility to describe different kind of dependences than what is possible using models relying on the confining assumption of symmetric Gaussian models: different quantiles can be modelled with a different degree of dependence; it will be demonstrated how this can be expected given process understanding. maximum likelihood based multivariate quantitative parameter estimation yields stable and reliable results; not only improved results in cross-validation based measures of uncertainty are obtained but also a more realistic spatial structure of uncertainty compared to second order models of dependence; as much information as is available is included in the parameter estimation: incorporation of censored measurements (e.g., below detection limit, or ones that are above the sensitive range of the measurement device) yield to more realistic spatial models; the proportion of true zeros can be jointly estimated with and distinguished from censored measurements which allow estimates about the age of a contaminant in the system; secondary information (categorical and on the rational scale) has been used to improve the estimation of the primary variable; These copula based multivariate statistical techniques are demonstrated based on hydraulic conductivity observations at the Borden (Canada) site, the MADE site (USA), and a large regional groundwater quality data-set in south-west Germany. Fields of spatially distributed K were simulated with identical marginal simulation, identical second order spatial moments, yet substantially differing solute transport characteristics when numerical tracer tests were performed. A statistical methodology is shown that allows the delineation of a boundary layer separating homogenous parts of a spatial data-set. The effects of this boundary layer (macro structure) and the spatial dependence of K (micro structure) on solute transport behaviour is shown.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mayer, B. P.; Valdez, C. A.; DeHope, A. J.
Critical to many modern forensic investigations is the chemical attribution of the origin of an illegal drug. This process greatly relies on identification of compounds indicative of its clandestine or commercial production. The results of these studies can yield detailed information on method of manufacture, sophistication of the synthesis operation, starting material source, and final product. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic 3- methylfentanyl, N-(3-methyl-1-phenethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods were studied in an effort to identify and classify route-specific signatures. These methods were chosen to minimize the use of scheduledmore » precursors, complicated laboratory equipment, number of overall steps, and demanding reaction conditions. Using gas and liquid chromatographies combined with mass spectrometric methods (GC-QTOF and LC-QTOF) in conjunction with inductivelycoupled plasma mass spectrometry (ICP-MS), over 240 distinct compounds and elements were monitored. As seen in our previous work with CAS of fentanyl synthesis the complexity of the resultant data matrix necessitated the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 62 statistically significant, route-specific CAS were identified. Statistical classification models using a variety of machine learning techniques were then developed with the ability to predict the method of 3-methylfentanyl synthesis from three blind crude samples generated by synthetic chemists without prior experience with these methods.« less
NASA Technical Reports Server (NTRS)
Morrissey, L. A.; Weinstock, K. J.; Mouat, D. A.; Card, D. H.
1984-01-01
An evaluation of Thematic Mapper Simulator (TMS) data for the geobotanical discrimination of rock types based on vegetative cover characteristics is addressed in this research. A methodology for accomplishing this evaluation utilizing univariate and multivariate techniques is presented. TMS data acquired with a Daedalus DEI-1260 multispectral scanner were integrated with vegetation and geologic information for subsequent statistical analyses, which included a chi-square test, an analysis of variance, stepwise discriminant analysis, and Duncan's multiple range test. Results indicate that ultramafic rock types are spectrally separable from nonultramafics based on vegetative cover through the use of statistical analyses.
ERIC Educational Resources Information Center
Baker, Bruce D.; Richards, Craig E.
1999-01-01
Applies neural network methods for forecasting 1991-95 per-pupil expenditures in U.S. public elementary and secondary schools. Forecasting models included the National Center for Education Statistics' multivariate regression model and three neural architectures. Regarding prediction accuracy, neural network results were comparable or superior to…
ERIC Educational Resources Information Center
Hamaker, Ellen L.; Dolan, Conor V.; Molenaar, Peter C. M.
2005-01-01
Results obtained with interindividual techniques in a representative sample of a population are not necessarily generalizable to the individual members of this population. In this article the specific condition is presented that must be satisfied to generalize from the interindividual level to the intraindividual level. A way to investigate…
Predictors of effects of lifestyle intervention on diabetes mellitus type 2 patients.
Jacobsen, Ramune; Vadstrup, Eva; Røder, Michael; Frølich, Anne
2012-01-01
The main aim of the study was to identify predictors of the effects of lifestyle intervention on diabetes mellitus type 2 patients by means of multivariate analysis. Data from a previously published randomised clinical trial, which compared the effects of a rehabilitation programme including standardised education and physical training sessions in the municipality's health care centre with the same duration of individual counseling in the diabetes outpatient clinic, were used. Data from 143 diabetes patients were analysed. The merged lifestyle intervention resulted in statistically significant improvements in patients' systolic blood pressure, waist circumference, exercise capacity, glycaemic control, and some aspects of general health-related quality of life. The linear multivariate regression models explained 45% to 80% of the variance in these improvements. The baseline outcomes in accordance to the logic of the regression to the mean phenomenon were the only statistically significant and robust predictors in all regression models. These results are important from a clinical point of view as they highlight the more urgent need for and better outcomes following lifestyle intervention for those patients who have worse general and disease-specific health.
Ferreira, Fábio S.; Pereira, João M.S.; Duarte, João V.; Castelo-Branco, Miguel
2017-01-01
Background: Although voxel based morphometry studies are still the standard for analyzing brain structure, their dependence on massive univariate inferential methods is a limiting factor. A better understanding of brain pathologies can be achieved by applying inferential multivariate methods, which allow the study of multiple dependent variables, e.g. different imaging modalities of the same subject. Objective: Given the widespread use of SPM software in the brain imaging community, the main aim of this work is the implementation of massive multivariate inferential analysis as a toolbox in this software package. applied to the use of T1 and T2 structural data from diabetic patients and controls. This implementation was compared with the traditional ANCOVA in SPM and a similar multivariate GLM toolbox (MRM). Method: We implemented the new toolbox and tested it by investigating brain alterations on a cohort of twenty-eight type 2 diabetes patients and twenty-six matched healthy controls, using information from both T1 and T2 weighted structural MRI scans, both separately – using standard univariate VBM - and simultaneously, with multivariate analyses. Results: Univariate VBM replicated predominantly bilateral changes in basal ganglia and insular regions in type 2 diabetes patients. On the other hand, multivariate analyses replicated key findings of univariate results, while also revealing the thalami as additional foci of pathology. Conclusion: While the presented algorithm must be further optimized, the proposed toolbox is the first implementation of multivariate statistics in SPM8 as a user-friendly toolbox, which shows great potential and is ready to be validated in other clinical cohorts and modalities. PMID:28761571
Dong, Chunjiao; Clarke, David B; Yan, Xuedong; Khattak, Asad; Huang, Baoshan
2014-09-01
Crash data are collected through police reports and integrated with road inventory data for further analysis. Integrated police reports and inventory data yield correlated multivariate data for roadway entities (e.g., segments or intersections). Analysis of such data reveals important relationships that can help focus on high-risk situations and coming up with safety countermeasures. To understand relationships between crash frequencies and associated variables, while taking full advantage of the available data, multivariate random-parameters models are appropriate since they can simultaneously consider the correlation among the specific crash types and account for unobserved heterogeneity. However, a key issue that arises with correlated multivariate data is the number of crash-free samples increases, as crash counts have many categories. In this paper, we describe a multivariate random-parameters zero-inflated negative binomial (MRZINB) regression model for jointly modeling crash counts. The full Bayesian method is employed to estimate the model parameters. Crash frequencies at urban signalized intersections in Tennessee are analyzed. The paper investigates the performance of MZINB and MRZINB regression models in establishing the relationship between crash frequencies, pavement conditions, traffic factors, and geometric design features of roadway intersections. Compared to the MZINB model, the MRZINB model identifies additional statistically significant factors and provides better goodness of fit in developing the relationships. The empirical results show that MRZINB model possesses most of the desirable statistical properties in terms of its ability to accommodate unobserved heterogeneity and excess zero counts in correlated data. Notably, in the random-parameters MZINB model, the estimated parameters vary significantly across intersections for different crash types. Copyright © 2014 Elsevier Ltd. All rights reserved.
Tourkmani, Abdo Karim; Sánchez-Huerta, Valeria; De Wit, Guillermo; Martínez, Jaime D; Mingo, David; Mahillo-Fernández, Ignacio; Jiménez-Alfaro, Ignacio
2017-01-01
To analyze the relationship between the score obtained in the Risk Score System (RSS) proposed by Hicks et al with penetrating keratoplasty (PKP) graft failure at 1y postoperatively and among each factor in the RSS with the risk of PKP graft failure using univariate and multivariate analysis. The retrospective cohort study had 152 PKPs from 152 patients. Eighteen cases were excluded from our study due to primary failure (10 cases), incomplete medical notes (5 cases) and follow-up less than 1y (3 cases). We included 134 PKPs from 134 patients stratified by preoperative risk score. Spearman coefficient was calculated for the relationship between the score obtained and risk of failure at 1y. Univariate and multivariate analysis were calculated for the impact of every single risk factor included in the RSS over graft failure at 1y. Spearman coefficient showed statistically significant correlation between the score in the RSS and graft failure ( P <0.05). Multivariate logistic regression analysis showed no statistically significant relationship ( P >0.05) between diagnosis and lens status with graft failure. The relationship between the other risk factors studied and graft failure was significant ( P <0.05), although the results for previous grafts and graft failure was unreliable. None of our patients had previous blood transfusion, thus, it had no impact. After the application of multivariate analysis techniques, some risk factors do not show the expected impact over graft failure at 1y.
Multivariate Statistical Modelling of Drought and Heat Wave Events
NASA Astrophysics Data System (ADS)
Manning, Colin; Widmann, Martin; Vrac, Mathieu; Maraun, Douglas; Bevaqua, Emanuele
2016-04-01
Multivariate Statistical Modelling of Drought and Heat Wave Events C. Manning1,2, M. Widmann1, M. Vrac2, D. Maraun3, E. Bevaqua2,3 1. School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, Birmingham, UK 2. Laboratoire des Sciences du Climat et de l'Environnement, (LSCE-IPSL), Centre d'Etudes de Saclay, Gif-sur-Yvette, France 3. Wegener Center for Climate and Global Change, University of Graz, Brandhofgasse 5, 8010 Graz, Austria Compound extreme events are a combination of two or more contributing events which in themselves may not be extreme but through their joint occurrence produce an extreme impact. Compound events are noted in the latest IPCC report as an important type of extreme event that have been given little attention so far. As part of the CE:LLO project (Compound Events: muLtivariate statisticaL mOdelling) we are developing a multivariate statistical model to gain an understanding of the dependence structure of certain compound events. One focus of this project is on the interaction between drought and heat wave events. Soil moisture has both a local and non-local effect on the occurrence of heat waves where it strongly controls the latent heat flux affecting the transfer of sensible heat to the atmosphere. These processes can create a feedback whereby a heat wave maybe amplified or suppressed by the soil moisture preconditioning, and vice versa, the heat wave may in turn have an effect on soil conditions. An aim of this project is to capture this dependence in order to correctly describe the joint probabilities of these conditions and the resulting probability of their compound impact. We will show an application of Pair Copula Constructions (PCCs) to study the aforementioned compound event. PCCs allow in theory for the formulation of multivariate dependence structures in any dimension where the PCC is a decomposition of a multivariate distribution into a product of bivariate components modelled using copulas. A copula is a multivariate distribution function which allows one to model the dependence structure of given variables separately from the marginal behaviour. We firstly look at the structure of soil moisture drought over the entire of France using the SAFRAN dataset between 1959 and 2009. Soil moisture is represented using the Standardised Precipitation Evapotranspiration Index (SPEI). Drought characteristics are computed at grid point scale where drought conditions are identified as those with an SPEI value below -1.0. We model the multivariate dependence structure of drought events defined by certain characteristics and compute return levels of these events. We initially find that drought characteristics such as duration, mean SPEI and the maximum contiguous area to a grid point all have positive correlations, though the degree to which they are correlated can vary considerably spatially. A spatial representation of return levels then may provide insight into the areas most prone to drought conditions. As a next step, we analyse the dependence structure between soil moisture conditions preceding the onset of a heat wave and the heat wave itself.
Multivariate methods to visualise colour-space and colour discrimination data.
Hastings, Gareth D; Rubin, Alan
2015-01-01
Despite most modern colour spaces treating colour as three-dimensional (3-D), colour data is usually not visualised in 3-D (and two-dimensional (2-D) projection-plane segments and multiple 2-D perspective views are used instead). The objectives of this article are firstly, to introduce a truly 3-D percept of colour space using stereo-pairs, secondly to view colour discrimination data using that platform, and thirdly to apply formal statistics and multivariate methods to analyse the data in 3-D. This is the first demonstration of the software that generated stereo-pairs of RGB colour space, as well as of a new computerised procedure that investigated colour discrimination by measuring colour just noticeable differences (JND). An initial pilot study and thorough investigation of instrument repeatability were performed. Thereafter, to demonstrate the capabilities of the software, five colour-normal and one colour-deficient subject were examined using the JND procedure and multivariate methods of data analysis. Scatter plots of responses were meaningfully examined in 3-D and were useful in evaluating multivariate normality as well as identifying outliers. The extent and direction of the difference between each JND response and the stimulus colour point was calculated and appreciated in 3-D. Ellipsoidal surfaces of constant probability density (distribution ellipsoids) were fitted to response data; the volumes of these ellipsoids appeared useful in differentiating the colour-deficient subject from the colour-normals. Hypothesis tests of variances and covariances showed many statistically significant differences between the results of the colour-deficient subject and those of the colour-normals, while far fewer differences were found when comparing within colour-normals. The 3-D visualisation of colour data using stereo-pairs, as well as the statistics and multivariate methods of analysis employed, were found to be unique and useful tools in the representation and study of colour. Many additional studies using these methods along with the JND and other procedures have been identified and will be reported in future publications. © 2014 The Authors Ophthalmic & Physiological Optics © 2014 The College of Optometrists.
Interpreting support vector machine models for multivariate group wise analysis in neuroimaging
Gaonkar, Bilwaj; Shinohara, Russell T; Davatzikos, Christos
2015-01-01
Machine learning based classification algorithms like support vector machines (SVMs) have shown great promise for turning a high dimensional neuroimaging data into clinically useful decision criteria. However, tracing imaging based patterns that contribute significantly to classifier decisions remains an open problem. This is an issue of critical importance in imaging studies seeking to determine which anatomical or physiological imaging features contribute to the classifier’s decision, thereby allowing users to critically evaluate the findings of such machine learning methods and to understand disease mechanisms. The majority of published work addresses the question of statistical inference for support vector classification using permutation tests based on SVM weight vectors. Such permutation testing ignores the SVM margin, which is critical in SVM theory. In this work we emphasize the use of a statistic that explicitly accounts for the SVM margin and show that the null distributions associated with this statistic are asymptotically normal. Further, our experiments show that this statistic is a lot less conservative as compared to weight based permutation tests and yet specific enough to tease out multivariate patterns in the data. Thus, we can better understand the multivariate patterns that the SVM uses for neuroimaging based classification. PMID:26210913
NASA Technical Reports Server (NTRS)
Torres-Pomales, Wilfredo
2014-01-01
This report describes a modeling and simulation approach for disturbance patterns representative of the environment experienced by a digital system in an electromagnetic reverberation chamber. The disturbance is modeled by a multi-variate statistical distribution based on empirical observations. Extended versions of the Rejection Samping and Inverse Transform Sampling techniques are developed to generate multi-variate random samples of the disturbance. The results show that Inverse Transform Sampling returns samples with higher fidelity relative to the empirical distribution. This work is part of an ongoing effort to develop a resilience assessment methodology for complex safety-critical distributed systems.
Multivariate Analysis and Prediction of Dioxin-Furan ...
Peer Review Draft of Regional Methods Initiative Final Report Dioxins, which are bioaccumulative and environmentally persistent, pose an ongoing risk to human and ecosystem health. Fish constitute a significant source of dioxin exposure for humans and fish-eating wildlife. Current dioxin analytical methods are costly, time-consuming, and produce hazardous by-products. A Danish team developed a novel, multivariate statistical methodology based on the covariance of dioxin-furan congener Toxic Equivalences (TEQs) and fatty acid methyl esters (FAMEs) and applied it to North Atlantic Ocean fishmeal samples. The goal of the current study was to attempt to extend this Danish methodology to 77 whole and composite fish samples from three trophic groups: predator (whole largemouth bass), benthic (whole flathead and channel catfish) and forage fish (composite bluegill, pumpkinseed and green sunfish) from two dioxin contaminated rivers (Pocatalico R. and Kanawha R.) in West Virginia, USA. Multivariate statistical analyses, including, Principal Components Analysis (PCA), Hierarchical Clustering, and Partial Least Squares Regression (PLS), were used to assess the relationship between the FAMEs and TEQs in these dioxin contaminated freshwater fish from the Kanawha and Pocatalico Rivers. These three multivariate statistical methods all confirm that the pattern of Fatty Acid Methyl Esters (FAMEs) in these freshwater fish covaries with and is predictive of the WHO TE
Williams, L. Keoki; Buu, Anne
2017-01-01
We propose a multivariate genome-wide association test for mixed continuous, binary, and ordinal phenotypes. A latent response model is used to estimate the correlation between phenotypes with different measurement scales so that the empirical distribution of the Fisher’s combination statistic under the null hypothesis is estimated efficiently. The simulation study shows that our proposed correlation estimation methods have high levels of accuracy. More importantly, our approach conservatively estimates the variance of the test statistic so that the type I error rate is controlled. The simulation also shows that the proposed test maintains the power at the level very close to that of the ideal analysis based on known latent phenotypes while controlling the type I error. In contrast, conventional approaches–dichotomizing all observed phenotypes or treating them as continuous variables–could either reduce the power or employ a linear regression model unfit for the data. Furthermore, the statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that conducting a multivariate test on multiple phenotypes can increase the power of identifying markers that may not be, otherwise, chosen using marginal tests. The proposed method also offers a new approach to analyzing the Fagerström Test for Nicotine Dependence as multivariate phenotypes in genome-wide association studies. PMID:28081206
Comparison of connectivity analyses for resting state EEG data
NASA Astrophysics Data System (ADS)
Olejarczyk, Elzbieta; Marzetti, Laura; Pizzella, Vittorio; Zappasodi, Filippo
2017-06-01
Objective. In the present work, a nonlinear measure (transfer entropy, TE) was used in a multivariate approach for the analysis of effective connectivity in high density resting state EEG data in eyes open and eyes closed. Advantages of the multivariate approach in comparison to the bivariate one were tested. Moreover, the multivariate TE was compared to an effective linear measure, i.e. directed transfer function (DTF). Finally, the existence of a relationship between the information transfer and the level of brain synchronization as measured by phase synchronization value (PLV) was investigated. Approach. The comparison between the connectivity measures, i.e. bivariate versus multivariate TE, TE versus DTF, TE versus PLV, was performed by means of statistical analysis of indexes based on graph theory. Main results. The multivariate approach is less sensitive to false indirect connections with respect to the bivariate estimates. The multivariate TE differentiated better between eyes closed and eyes open conditions compared to DTF. Moreover, the multivariate TE evidenced non-linear phenomena in information transfer, which are not evidenced by the use of DTF. We also showed that the target of information flow, in particular the frontal region, is an area of greater brain synchronization. Significance. Comparison of different connectivity analysis methods pointed to the advantages of nonlinear methods, and indicated a relationship existing between the flow of information and the level of synchronization of the brain.
Friedman, David B
2012-01-01
All quantitative proteomics experiments measure variation between samples. When performing large-scale experiments that involve multiple conditions or treatments, the experimental design should include the appropriate number of individual biological replicates from each condition to enable the distinction between a relevant biological signal from technical noise. Multivariate statistical analyses, such as principal component analysis (PCA), provide a global perspective on experimental variation, thereby enabling the assessment of whether the variation describes the expected biological signal or the unanticipated technical/biological noise inherent in the system. Examples will be shown from high-resolution multivariable DIGE experiments where PCA was instrumental in demonstrating biologically significant variation as well as sample outliers, fouled samples, and overriding technical variation that would not be readily observed using standard univariate tests.
Applying Sociocultural Theory to Teaching Statistics for Doctoral Social Work Students
ERIC Educational Resources Information Center
Mogro-Wilson, Cristina; Reeves, Michael G.; Charter, Mollie Lazar
2015-01-01
This article describes the development of two doctoral-level multivariate statistics courses utilizing sociocultural theory, an integrative pedagogical framework. In the first course, the implementation of sociocultural theory helps to support the students through a rigorous introduction to statistics. The second course involves students…
A review on the multivariate statistical methods for dimensional reduction studies
NASA Astrophysics Data System (ADS)
Aik, Lim Eng; Kiang, Lam Chee; Mohamed, Zulkifley Bin; Hong, Tan Wei
2017-05-01
In this research study we have discussed multivariate statistical methods for dimensional reduction, which has been done by various researchers. The reduction of dimensionality is valuable to accelerate algorithm progression, as well as really may offer assistance with the last grouping/clustering precision. A lot of boisterous or even flawed info information regularly prompts a not exactly alluring algorithm progression. Expelling un-useful or dis-instructive information segments may for sure help the algorithm discover more broad grouping locales and principles and generally speaking accomplish better exhibitions on new data set.
Extending local canonical correlation analysis to handle general linear contrasts for FMRI data.
Jin, Mingwu; Nandy, Rajesh; Curran, Tim; Cordes, Dietmar
2012-01-01
Local canonical correlation analysis (CCA) is a multivariate method that has been proposed to more accurately determine activation patterns in fMRI data. In its conventional formulation, CCA has several drawbacks that limit its usefulness in fMRI. A major drawback is that, unlike the general linear model (GLM), a test of general linear contrasts of the temporal regressors has not been incorporated into the CCA formalism. To overcome this drawback, a novel directional test statistic was derived using the equivalence of multivariate multiple regression (MVMR) and CCA. This extension will allow CCA to be used for inference of general linear contrasts in more complicated fMRI designs without reparameterization of the design matrix and without reestimating the CCA solutions for each particular contrast of interest. With the proper constraints on the spatial coefficients of CCA, this test statistic can yield a more powerful test on the inference of evoked brain regional activations from noisy fMRI data than the conventional t-test in the GLM. The quantitative results from simulated and pseudoreal data and activation maps from fMRI data were used to demonstrate the advantage of this novel test statistic.
Gauging Skills of Hospital Security Personnel: a Statistically-driven, Questionnaire-based Approach
Rinkoo, Arvind Vashishta; Mishra, Shubhra; Rahesuddin; Nabi, Tauqeer; Chandra, Vidha; Chandra, Hem
2013-01-01
Objectives This study aims to gauge the technical and soft skills of the hospital security personnel so as to enable prioritization of their training needs. Methodology A cross sectional questionnaire based study was conducted in December 2011. Two separate predesigned and pretested questionnaires were used for gauging soft skills and technical skills of the security personnel. Extensive statistical analysis, including Multivariate Analysis (Pillai-Bartlett trace along with Multi-factorial ANOVA) and Post-hoc Tests (Bonferroni Test) was applied. Results The 143 participants performed better on the soft skills front with an average score of 6.43 and standard deviation of 1.40. The average technical skills score was 5.09 with a standard deviation of 1.44. The study avowed a need for formal hands on training with greater emphasis on technical skills. Multivariate analysis of the available data further helped in identifying 20 security personnel who should be prioritized for soft skills training and a group of 36 security personnel who should receive maximum attention during technical skills training. Conclusion This statistically driven approach can be used as a prototype by healthcare delivery institutions worldwide, after situation specific customizations, to identify the training needs of any category of healthcare staff. PMID:23559904
Extending Local Canonical Correlation Analysis to Handle General Linear Contrasts for fMRI Data
Jin, Mingwu; Nandy, Rajesh; Curran, Tim; Cordes, Dietmar
2012-01-01
Local canonical correlation analysis (CCA) is a multivariate method that has been proposed to more accurately determine activation patterns in fMRI data. In its conventional formulation, CCA has several drawbacks that limit its usefulness in fMRI. A major drawback is that, unlike the general linear model (GLM), a test of general linear contrasts of the temporal regressors has not been incorporated into the CCA formalism. To overcome this drawback, a novel directional test statistic was derived using the equivalence of multivariate multiple regression (MVMR) and CCA. This extension will allow CCA to be used for inference of general linear contrasts in more complicated fMRI designs without reparameterization of the design matrix and without reestimating the CCA solutions for each particular contrast of interest. With the proper constraints on the spatial coefficients of CCA, this test statistic can yield a more powerful test on the inference of evoked brain regional activations from noisy fMRI data than the conventional t-test in the GLM. The quantitative results from simulated and pseudoreal data and activation maps from fMRI data were used to demonstrate the advantage of this novel test statistic. PMID:22461786
Silva, A F; Sarraguça, M C; Fonteyne, M; Vercruysse, J; De Leersnyder, F; Vanhoorne, V; Bostijn, N; Verstraeten, M; Vervaet, C; Remon, J P; De Beer, T; Lopes, J A
2017-08-07
A multivariate statistical process control (MSPC) strategy was developed for the monitoring of the ConsiGma™-25 continuous tablet manufacturing line. Thirty-five logged variables encompassing three major units, being a twin screw high shear granulator, a fluid bed dryer and a product control unit, were used to monitor the process. The MSPC strategy was based on principal component analysis of data acquired under normal operating conditions using a series of four process runs. Runs with imposed disturbances in the dryer air flow and temperature, in the granulator barrel temperature, speed and liquid mass flow and in the powder dosing unit mass flow were utilized to evaluate the model's monitoring performance. The impact of the imposed deviations to the process continuity was also evaluated using Hotelling's T 2 and Q residuals statistics control charts. The influence of the individual process variables was assessed by analyzing contribution plots at specific time points. Results show that the imposed disturbances were all detected in both control charts. Overall, the MSPC strategy was successfully developed and applied. Additionally, deviations not associated with the imposed changes were detected, mainly in the granulator barrel temperature control. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Roy, P. K.; Pal, S.; Banerjee, G.; Biswas Roy, M.; Ray, D.; Majumder, A.
2014-12-01
River is considered as one of the main sources of freshwater all over the world. Hence analysis and maintenance of this water resource is globally considered a matter of major concern. This paper deals with the assessment of surface water quality of the Ichamati river using multivariate statistical techniques. Eight distinct surface water quality observation stations were located and samples were collected. For the samples collected statistical techniques were applied to the physico-chemical parameters and depth of siltation. In this paper cluster analysis is done to determine the relations between surface water quality and siltation depth of river Ichamati. Multiple regressions and mathematical equation modeling have been done to characterize surface water quality of Ichamati river on the basis of physico-chemical parameters. It was found that surface water quality of the downstream river was different from the water quality of the upstream. The analysis of the water quality parameters of the Ichamati river clearly indicate high pollution load on the river water which can be accounted to agricultural discharge, tidal effect and soil erosion. The results further reveal that with the increase in depth of siltation, water quality degraded.
NASA Astrophysics Data System (ADS)
WANG, P. T.
2015-12-01
Groundwater modeling requires to assign hydrogeological properties to every numerical grid. Due to the lack of detailed information and the inherent spatial heterogeneity, geological properties can be treated as random variables. Hydrogeological property is assumed to be a multivariate distribution with spatial correlations. By sampling random numbers from a given statistical distribution and assigning a value to each grid, a random field for modeling can be completed. Therefore, statistics sampling plays an important role in the efficiency of modeling procedure. Latin Hypercube Sampling (LHS) is a stratified random sampling procedure that provides an efficient way to sample variables from their multivariate distributions. This study combines the the stratified random procedure from LHS and the simulation by using LU decomposition to form LULHS. Both conditional and unconditional simulations of LULHS were develpoed. The simulation efficiency and spatial correlation of LULHS are compared to the other three different simulation methods. The results show that for the conditional simulation and unconditional simulation, LULHS method is more efficient in terms of computational effort. Less realizations are required to achieve the required statistical accuracy and spatial correlation.
Generating an Empirical Probability Distribution for the Andrews-Pregibon Statistic.
ERIC Educational Resources Information Center
Jarrell, Michele G.
A probability distribution was developed for the Andrews-Pregibon (AP) statistic. The statistic, developed by D. F. Andrews and D. Pregibon (1978), identifies multivariate outliers. It is a ratio of the determinant of the data matrix with an observation deleted to the determinant of the entire data matrix. Although the AP statistic has been used…
NASA Astrophysics Data System (ADS)
Hanrieder, Jörg; Ewing, Andrew G.
2014-06-01
Amyotrophic lateral sclerosis (ALS) is a devastating, rapidly progressing disease of the central nervous system that is characterized by motor neuron degeneration in the brain stem and the spinal cord. We employed time of flight secondary ion mass spectrometry (ToF-SIMS) to profile spatial lipid- and metabolite- regulations in post mortem human spinal cord tissue from ALS patients to investigate chemical markers of ALS pathogenesis. ToF-SIMS scans and multivariate analysis of image and spectral data were performed on thoracic human spinal cord sections. Multivariate statistics of the image data allowed delineation of anatomical regions of interest based on their chemical identity. Spectral data extracted from these regions were compared using two different approaches for multivariate statistics, for investigating ALS related lipid and metabolite changes. The results show a significant decrease for cholesterol, triglycerides, and vitamin E in the ventral horn of ALS samples, which is presumably a consequence of motor neuron degeneration. Conversely, the biogenic mediator lipid lysophosphatidylcholine and its fragments were increased in ALS ventral spinal cord, pointing towards neuroinflammatory mechanisms associated with neuronal cell death. ToF-SIMS imaging is a promising approach for chemical histology and pathology for investigating the subcellular mechanisms underlying motor neuron degeneration in amyotrophic lateral sclerosis.
2013-01-01
Background Cognitive complaints are reported frequently after breast cancer treatments. Their association with neuropsychological (NP) test performance is not well-established. Methods Early-stage, posttreatment breast cancer patients were enrolled in a prospective, longitudinal, cohort study prior to starting endocrine therapy. Evaluation included an NP test battery and self-report questionnaires assessing symptoms, including cognitive complaints. Multivariable regression models assessed associations among cognitive complaints, mood, treatment exposures, and NP test performance. Results One hundred eighty-nine breast cancer patients, aged 21–65 years, completed the evaluation; 23.3% endorsed higher memory complaints and 19.0% reported higher executive function complaints (>1 SD above the mean for healthy control sample). Regression modeling demonstrated a statistically significant association of higher memory complaints with combined chemotherapy and radiation treatments (P = .01), poorer NP verbal memory performance (P = .02), and higher depressive symptoms (P < .001), controlling for age and IQ. For executive functioning complaints, multivariable modeling controlling for age, IQ, and other confounds demonstrated statistically significant associations with better NP visual memory performance (P = .03) and higher depressive symptoms (P < .001), whereas combined chemotherapy and radiation treatment (P = .05) approached statistical significance. Conclusions About one in five post–adjuvant treatment breast cancer patients had elevated memory and/or executive function complaints that were statistically significantly associated with domain-specific NP test performances and depressive symptoms; combined chemotherapy and radiation treatment was also statistically significantly associated with memory complaints. These results and other emerging studies suggest that subjective cognitive complaints in part reflect objective NP performance, although their etiology and biology appear to be multifactorial, motivating further transdisciplinary research. PMID:23606729
Stamate, Mirela Cristina; Todor, Nicolae; Cosgarea, Marcel
2015-01-01
The clinical utility of otoacoustic emissions as a noninvasive objective test of cochlear function has been long studied. Both transient otoacoustic emissions and distorsion products can be used to identify hearing loss, but to what extent they can be used as predictors for hearing loss is still debated. Most studies agree that multivariate analyses have better test performances than univariate analyses. The aim of the study was to determine transient otoacoustic emissions and distorsion products performance in identifying normal and impaired hearing loss, using the pure tone audiogram as a gold standard procedure and different multivariate statistical approaches. The study included 105 adult subjects with normal hearing and hearing loss who underwent the same test battery: pure-tone audiometry, tympanometry, otoacoustic emission tests. We chose to use the logistic regression as a multivariate statistical technique. Three logistic regression models were developed to characterize the relations between different risk factors (age, sex, tinnitus, demographic features, cochlear status defined by otoacoustic emissions) and hearing status defined by pure-tone audiometry. The multivariate analyses allow the calculation of the logistic score, which is a combination of the inputs, weighted by coefficients, calculated within the analyses. The accuracy of each model was assessed using receiver operating characteristics curve analysis. We used the logistic score to generate receivers operating curves and to estimate the areas under the curves in order to compare different multivariate analyses. We compared the performance of each otoacoustic emission (transient, distorsion product) using three different multivariate analyses for each ear, when multi-frequency gold standards were used. We demonstrated that all multivariate analyses provided high values of the area under the curve proving the performance of the otoacoustic emissions. Each otoacoustic emission test presented high values of area under the curve, suggesting that implementing a multivariate approach to evaluate the performances of each otoacoustic emission test would serve to increase the accuracy in identifying the normal and impaired ears. We encountered the highest area under the curve value for the combined multivariate analysis suggesting that both otoacoustic emission tests should be used in assessing hearing status. Our multivariate analyses revealed that age is a constant predictor factor of the auditory status for both ears, but the presence of tinnitus was the most important predictor for the hearing level, only for the left ear. Age presented similar coefficients, but tinnitus coefficients, by their high value, produced the highest variations of the logistic scores, only for the left ear group, thus increasing the risk of hearing loss. We did not find gender differences between ears for any otoacoustic emission tests, but studies still debate this question as the results are contradictory. Neither gender, nor environment origin had any predictive value for the hearing status, according to the results of our study. Like any other audiological test, using otoacoustic emissions to identify hearing loss is not without error. Even when applying multivariate analysis, perfect test performance is never achieved. Although most studies demonstrated the benefit of using the multivariate analysis, it has not been incorporated into clinical decisions maybe because of the idiosyncratic nature of multivariate solutions or because of the lack of the validation studies.
ERIC Educational Resources Information Center
Fouladi, Rachel T.
2000-01-01
Provides an overview of standard and modified normal theory and asymptotically distribution-free covariance and correlation structure analysis techniques and details Monte Carlo simulation results on Type I and Type II error control. Demonstrates through the simulation that robustness and nonrobustness of structure analysis techniques vary as a…
ERIC Educational Resources Information Center
Muslihah, Oleh Eneng
2015-01-01
The research examines the correlation between the understanding of school-based management, emotional intelligences and headmaster performance. Data was collected, using quantitative methods. The statistical analysis used was the Pearson Correlation, and multivariate regression analysis. The results of this research suggest firstly that there is…
Corron, Louise; Marchal, François; Condemi, Silvana; Chaumoître, Kathia; Adalian, Pascal
2017-01-01
Juvenile age estimation methods used in forensic anthropology generally lack methodological consistency and/or statistical validity. Considering this, a standard approach using nonparametric Multivariate Adaptive Regression Splines (MARS) models were tested to predict age from iliac biometric variables of male and female juveniles from Marseilles, France, aged 0-12 years. Models using unidimensional (length and width) and bidimensional iliac data (module and surface) were constructed on a training sample of 176 individuals and validated on an independent test sample of 68 individuals. Results show that MARS prediction models using iliac width, module and area give overall better and statistically valid age estimates. These models integrate punctual nonlinearities of the relationship between age and osteometric variables. By constructing valid prediction intervals whose size increases with age, MARS models take into account the normal increase of individual variability. MARS models can qualify as a practical and standardized approach for juvenile age estimation. © 2016 American Academy of Forensic Sciences.
Handwriting Examination: Moving from Art to Science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jarman, K.H.; Hanlen, R.C.; Manzolillo, P.A.
In this document, we present a method for validating the premises and methodology of forensic handwriting examination. This method is intuitively appealing because it relies on quantitative measurements currently used qualitatively by FDE's in making comparisons, and it is scientifically rigorous because it exploits the power of multivariate statistical analysis. This approach uses measures of both central tendency and variation to construct a profile for a given individual. (Central tendency and variation are important for characterizing an individual's writing and both are currently used by FDE's in comparative analyses). Once constructed, different profiles are then compared for individuality using clustermore » analysis; they are grouped so that profiles within a group cannot be differentiated from one another based on the measured characteristics, whereas profiles between groups can. The cluster analysis procedure used here exploits the power of multivariate hypothesis testing. The result is not only a profile grouping but also an indication of statistical significance of the groups generated.« less
Effect of environment and genotype on commercial maize hybrids using LC/MS-based metabolomics.
Baniasadi, Hamid; Vlahakis, Chris; Hazebroek, Jan; Zhong, Cathy; Asiago, Vincent
2014-02-12
We recently applied gas chromatography coupled to time-of-flight mass spectrometry (GC/TOF-MS) and multivariate statistical analysis to measure biological variation of many metabolites due to environment and genotype in forage and grain samples collected from 50 genetically diverse nongenetically modified (non-GM) DuPont Pioneer commercial maize hybrids grown at six North American locations. In the present study, the metabolome coverage was extended using a core subset of these grain and forage samples employing ultra high pressure liquid chromatography (uHPLC) mass spectrometry (LC/MS). A total of 286 and 857 metabolites were detected in grain and forage samples, respectively, using LC/MS. Multivariate statistical analysis was utilized to compare and correlate the metabolite profiles. Environment had a greater effect on the metabolome than genetic background. The results of this study support and extend previously published insights into the environmental and genetic associated perturbations to the metabolome that are not associated with transgenic modification.
SPICE: exploration and analysis of post-cytometric complex multivariate datasets.
Roederer, Mario; Nozzi, Joshua L; Nason, Martha C
2011-02-01
Polychromatic flow cytometry results in complex, multivariate datasets. To date, tools for the aggregate analysis of these datasets across multiple specimens grouped by different categorical variables, such as demographic information, have not been optimized. Often, the exploration of such datasets is accomplished by visualization of patterns with pie charts or bar charts, without easy access to statistical comparisons of measurements that comprise multiple components. Here we report on algorithms and a graphical interface we developed for these purposes. In particular, we discuss thresholding necessary for accurate representation of data in pie charts, the implications for display and comparison of normalized versus unnormalized data, and the effects of averaging when samples with significant background noise are present. Finally, we define a statistic for the nonparametric comparison of complex distributions to test for difference between groups of samples based on multi-component measurements. While originally developed to support the analysis of T cell functional profiles, these techniques are amenable to a broad range of datatypes. Published 2011 Wiley-Liss, Inc.
McConkey, Roy; Bunting, Brendan; Keogh, Fiona; Garcia Iriarte, Edurne
2017-01-01
A natural experiment contrasted the social relationships of people with intellectual disabilities ( n = 110) before and after they moved from congregated settings to either personalized accommodation or group homes. Contrasts could also be drawn with individuals who had enduring mental health problems ( n = 46) and who experienced similar moves. Face-to-face interviews were conducted in each person's residence on two occasions approximately 24 months apart. Multivariate statistical analyses were used to determine significant effects. Greater proportions of people living in personalized settings scored higher on the five chosen indicators of social relationships than did persons living in grouped accommodation. However, multivariate statistical analyses identified that only one in five persons increased their social relationships as a result of changes in their accommodation, particularly persons with an intellectual disability and high support needs. These findings reinforce the extent of social isolation experienced by people with disabilities and mental health problems that changes in their accommodation only partially counter.
Filipiak, Katarzyna; Klein, Daniel; Roy, Anuradha
2017-01-01
The problem of testing the separability of a covariance matrix against an unstructured variance-covariance matrix is studied in the context of multivariate repeated measures data using Rao's score test (RST). The RST statistic is developed with the first component of the separable structure as a first-order autoregressive (AR(1)) correlation matrix or an unstructured (UN) covariance matrix under the assumption of multivariate normality. It is shown that the distribution of the RST statistic under the null hypothesis of any separability does not depend on the true values of the mean or the unstructured components of the separable structure. A significant advantage of the RST is that it can be performed for small samples, even smaller than the dimension of the data, where the likelihood ratio test (LRT) cannot be used, and it outperforms the standard LRT in a number of contexts. Monte Carlo simulations are then used to study the comparative behavior of the null distribution of the RST statistic, as well as that of the LRT statistic, in terms of sample size considerations, and for the estimation of the empirical percentiles. Our findings are compared with existing results where the first component of the separable structure is a compound symmetry (CS) correlation matrix. It is also shown by simulations that the empirical null distribution of the RST statistic converges faster than the empirical null distribution of the LRT statistic to the limiting χ 2 distribution. The tests are implemented on a real dataset from medical studies. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Voss, Jesse S; Iqbal, Seher; Jenkins, Sarah M; Henry, Michael R; Clayton, Amy C; Jett, James R; Kipp, Benjamin R; Halling, Kevin C; Maldonado, Fabien
2014-01-01
Studies have shown that fluorescence in situ hybridization (FISH) testing increases lung cancer detection on cytology specimens in peripheral nodules. The goal of this study was to determine whether a predictive model using clinical features and routine cytology with FISH results could predict lung malignancy after a nondiagnostic bronchoscopic evaluation. Patients with an indeterminate peripheral lung nodule that had a nondiagnostic bronchoscopic evaluation were included in this study (N = 220). FISH was performed on residual bronchial brushing cytology specimens diagnosed as negative (n = 195), atypical (n = 16), or suspicious (n = 9). FISH results included hypertetrasomy (n = 30) and negative (n = 190). Primary study end points included lung cancer status along with time to diagnosis of lung cancer or date of last clinical follow-up. Hazard ratios (HRs) were calculated using Cox proportional hazards regression model analyses, and P values < .05 were considered statistically significant. The mean age of the 220 patients was 66.7 years (range, 35-91), and most (58%) were men. Most patients (79%) were current or former smokers with a mean pack year history of 43.2 years (median, 40; range, 1-200). After multivariate analysis, hypertetrasomy FISH (HR = 2.96, P < .001), pack years (HR = 1.03 per pack year up to 50, P = .001), age (HR = 1.04 per year, P = .02), atypical or suspicious cytology (HR = 2.02, P = .04), and nodule spiculation (HR = 2.36, P = .003) were independent predictors of malignancy over time and were used to create a prediction model (C-statistic = 0.78). These results suggest that this multivariate model including test results and clinical features may be useful following a nondiagnostic bronchoscopic examination. © 2013.
Yoshida, Hiroyuki; Shibata, Hiroko; Izutsu, Ken-Ichi; Goda, Yukihiro
2017-01-01
The current Japanese Ministry of Health Labour and Welfare (MHLW)'s Guideline for Bioequivalence Studies of Generic Products uses averaged dissolution rates for the assessment of dissolution similarity between test and reference formulations. This study clarifies how the application of model-independent multivariate confidence region procedure (Method B), described in the European Medical Agency and U.S. Food and Drug Administration guidelines, affects similarity outcomes obtained empirically from dissolution profiles with large variations in individual dissolution rates. Sixty-one datasets of dissolution profiles for immediate release, oral generic, and corresponding innovator products that showed large variation in individual dissolution rates in generic products were assessed on their similarity by using the f 2 statistics defined in the MHLW guidelines (MHLW f 2 method) and two different Method B procedures, including a bootstrap method applied with f 2 statistics (BS method) and a multivariate analysis method using the Mahalanobis distance (MV method). The MHLW f 2 and BS methods provided similar dissolution similarities between reference and generic products. Although a small difference in the similarity assessment may be due to the decrease in the lower confidence interval for expected f 2 values derived from the large variation in individual dissolution rates, the MV method provided results different from those obtained through MHLW f 2 and BS methods. Analysis of actual dissolution data for products with large individual variations would provide valuable information towards an enhanced understanding of these methods and their possible incorporation in the MHLW guidelines.
Endpoint in plasma etch process using new modified w-multivariate charts and windowed regression
NASA Astrophysics Data System (ADS)
Zakour, Sihem Ben; Taleb, Hassen
2017-09-01
Endpoint detection is very important undertaking on the side of getting a good understanding and figuring out if a plasma etching process is done in the right way, especially if the etched area is very small (0.1%). It truly is a crucial part of supplying repeatable effects in every single wafer. When the film being etched has been completely cleared, the endpoint is reached. To ensure the desired device performance on the produced integrated circuit, the high optical emission spectroscopy (OES) sensor is employed. The huge number of gathered wavelengths (profiles) is then analyzed and pre-processed using a new proposed simple algorithm named Spectra peak selection (SPS) to select the important wavelengths, then we employ wavelet analysis (WA) to enhance the performance of detection by suppressing noise and redundant information. The selected and treated OES wavelengths are then used in modified multivariate control charts (MEWMA and Hotelling) for three statistics (mean, SD and CV) and windowed polynomial regression for mean. The employ of three aforementioned statistics is motivated by controlling mean shift, variance shift and their ratio (CV) if both mean and SD are not stable. The control charts show their performance in detecting endpoint especially W-mean Hotelling chart and the worst result is given by CV statistic. As the best detection of endpoint is given by the W-Hotelling mean statistic, this statistic will be used to construct a windowed wavelet Hotelling polynomial regression. This latter can only identify the window containing endpoint phenomenon.
Yokota, Miyo
2005-05-01
In the United States, the biologically admixed population is increasing. Such demographic changes may affect the distribution of anthropometric characteristics, which are incorporated into the design of equipment and clothing for the US Army and other large organizations. The purpose of this study was to examine multivariate craniofacial anthropometric distributions between biologically admixed male populations and single racial groups of Black and White males. Multivariate statistical results suggested that nose breadth and lip length were different between Blacks and Whites. Such differences may be considered for adjustments to respirators and chemical-biological protective masks. However, based on this pilot study, multivariate anthropometric distributions of admixed individuals were within the distributions of single racial groups. Based on the sample reported, sizing and designing for the admixed groups are not necessary if anthropometric distributions of single racial groups comprising admixed groups are known.
NASA Technical Reports Server (NTRS)
Djorgovski, George
1993-01-01
The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resource.
NASA Technical Reports Server (NTRS)
Djorgovski, Stanislav
1992-01-01
The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multi parameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.
NASA Astrophysics Data System (ADS)
Mercer, Gary J.
This quantitative study examined the relationship between secondary students with math anxiety and physics performance in an inquiry-based constructivist classroom. The Revised Math Anxiety Rating Scale was used to evaluate math anxiety levels. The results were then compared to the performance on a physics standardized final examination. A simple correlation was performed, followed by a multivariate regression analysis to examine effects based on gender and prior math background. The correlation showed statistical significance between math anxiety and physics performance. The regression analysis showed statistical significance for math anxiety, physics performance, and prior math background, but did not show statistical significance for math anxiety, physics performance, and gender.
Dangers in Using Analysis of Covariance Procedures.
ERIC Educational Resources Information Center
Campbell, Kathleen T.
Problems associated with the use of analysis of covariance (ANCOVA) as a statistical control technique are explained. Three problems relate to the use of "OVA" methods (analysis of variance, analysis of covariance, multivariate analysis of variance, and multivariate analysis of covariance) in general. These are: (1) the wasting of information when…
A Multivariate Solution of the Multivariate Ranking and Selection Problem
1980-02-01
Taneja (1972)), a ’a for a vector of constants c (Krishnaiah and Rizvi (1966)), the generalized variance ( Gnanadesikan and Gupta (1970)), iegier (1976...Olk-in, I. and Sobel, M. (1977). Selecting and Ordering Populations: A New Statistical Methodology, John Wiley & Sons, Inc., New York. Gnanadesikan
Evaluation of Meterorite Amono Acid Analysis Data Using Multivariate Techniques
NASA Technical Reports Server (NTRS)
McDonald, G.; Storrie-Lombardi, M.; Nealson, K.
1999-01-01
The amino acid distributions in the Murchison carbonaceous chondrite, Mars meteorite ALH84001, and ice from the Allan Hills region of Antarctica are shown, using a multivariate technique known as Principal Component Analysis (PCA), to be statistically distinct from the average amino acid compostion of 101 terrestrial protein superfamilies.
Zhang, Yong; Zhong, Miner; Geng, Nana; Jiang, Yunjian
2017-01-01
The market demand for electric vehicles (EVs) has increased in recent years. Suitable models are necessary to understand and forecast EV sales. This study presents a singular spectrum analysis (SSA) as a univariate time-series model and vector autoregressive model (VAR) as a multivariate model. Empirical results suggest that SSA satisfactorily indicates the evolving trend and provides reasonable results. The VAR model, which comprised exogenous parameters related to the market on a monthly basis, can significantly improve the prediction accuracy. The EV sales in China, which are categorized into battery and plug-in EVs, are predicted in both short term (up to December 2017) and long term (up to 2020), as statistical proofs of the growth of the Chinese EV industry.
Zhang, Yong; Zhong, Miner; Geng, Nana; Jiang, Yunjian
2017-01-01
The market demand for electric vehicles (EVs) has increased in recent years. Suitable models are necessary to understand and forecast EV sales. This study presents a singular spectrum analysis (SSA) as a univariate time-series model and vector autoregressive model (VAR) as a multivariate model. Empirical results suggest that SSA satisfactorily indicates the evolving trend and provides reasonable results. The VAR model, which comprised exogenous parameters related to the market on a monthly basis, can significantly improve the prediction accuracy. The EV sales in China, which are categorized into battery and plug-in EVs, are predicted in both short term (up to December 2017) and long term (up to 2020), as statistical proofs of the growth of the Chinese EV industry. PMID:28459872
Hohn, M. Ed; Nuhfer, E.B.; Vinopal, R.J.; Klanderman, D.S.
1980-01-01
Classifying very fine-grained rocks through fabric elements provides information about depositional environments, but is subject to the biases of visual taxonomy. To evaluate the statistical significance of an empirical classification of very fine-grained rocks, samples from Devonian shales in four cored wells in West Virginia and Virginia were measured for 15 variables: quartz, illite, pyrite and expandable clays determined by X-ray diffraction; total sulfur, organic content, inorganic carbon, matrix density, bulk density, porosity, silt, as well as density, sonic travel time, resistivity, and ??-ray response measured from well logs. The four lithologic types comprised: (1) sharply banded shale, (2) thinly laminated shale, (3) lenticularly laminated shale, and (4) nonbanded shale. Univariate and multivariate analyses of variance showed that the lithologic classification reflects significant differences for the variables measured, difference that can be detected independently of stratigraphic effects. Little-known statistical methods found useful in this work included: the multivariate analysis of variance with more than one effect, simultaneous plotting of samples and variables on canonical variates, and the use of parametric ANOVA and MANOVA on ranked data. ?? 1980 Plenum Publishing Corporation.
Del Giudice, G; Padulano, R; Siciliano, D
2016-01-01
The lack of geometrical and hydraulic information about sewer networks often excludes the adoption of in-deep modeling tools to obtain prioritization strategies for funds management. The present paper describes a novel statistical procedure for defining the prioritization scheme for preventive maintenance strategies based on a small sample of failure data collected by the Sewer Office of the Municipality of Naples (IT). Novelty issues involve, among others, considering sewer parameters as continuous statistical variables and accounting for their interdependences. After a statistical analysis of maintenance interventions, the most important available factors affecting the process are selected and their mutual correlations identified. Then, after a Box-Cox transformation of the original variables, a methodology is provided for the evaluation of a vulnerability map of the sewer network by adopting a joint multivariate normal distribution with different parameter sets. The goodness-of-fit is eventually tested for each distribution by means of a multivariate plotting position. The developed methodology is expected to assist municipal engineers in identifying critical sewers, prioritizing sewer inspections in order to fulfill rehabilitation requirements.
Buttigieg, Pier Luigi; Ramette, Alban
2014-12-01
The application of multivariate statistical analyses has become a consistent feature in microbial ecology. However, many microbial ecologists are still in the process of developing a deep understanding of these methods and appreciating their limitations. As a consequence, staying abreast of progress and debate in this arena poses an additional challenge to many microbial ecologists. To address these issues, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME): a dynamic, web-based resource providing accessible descriptions of numerous multivariate techniques relevant to microbial ecologists. A combination of interactive elements allows users to discover and navigate between methods relevant to their needs and examine how they have been used by others in the field. We have designed GUSTA ME to become a community-led and -curated service, which we hope will provide a common reference and forum to discuss and disseminate analytical techniques relevant to the microbial ecology community. © 2014 The Authors. FEMS Microbiology Ecology published by John Wiley & Sons Ltd on behalf of Federation of European Microbiological Societies.
Demanuele, Charmaine; Bähner, Florian; Plichta, Michael M; Kirsch, Peter; Tost, Heike; Meyer-Lindenberg, Andreas; Durstewitz, Daniel
2015-01-01
Multivariate pattern analysis can reveal new information from neuroimaging data to illuminate human cognition and its disturbances. Here, we develop a methodological approach, based on multivariate statistical/machine learning and time series analysis, to discern cognitive processing stages from functional magnetic resonance imaging (fMRI) blood oxygenation level dependent (BOLD) time series. We apply this method to data recorded from a group of healthy adults whilst performing a virtual reality version of the delayed win-shift radial arm maze (RAM) task. This task has been frequently used to study working memory and decision making in rodents. Using linear classifiers and multivariate test statistics in conjunction with time series bootstraps, we show that different cognitive stages of the task, as defined by the experimenter, namely, the encoding/retrieval, choice, reward and delay stages, can be statistically discriminated from the BOLD time series in brain areas relevant for decision making and working memory. Discrimination of these task stages was significantly reduced during poor behavioral performance in dorsolateral prefrontal cortex (DLPFC), but not in the primary visual cortex (V1). Experimenter-defined dissection of time series into class labels based on task structure was confirmed by an unsupervised, bottom-up approach based on Hidden Markov Models. Furthermore, we show that different groupings of recorded time points into cognitive event classes can be used to test hypotheses about the specific cognitive role of a given brain region during task execution. We found that whilst the DLPFC strongly differentiated between task stages associated with different memory loads, but not between different visual-spatial aspects, the reverse was true for V1. Our methodology illustrates how different aspects of cognitive information processing during one and the same task can be separated and attributed to specific brain regions based on information contained in multivariate patterns of voxel activity.
Calypso: a user-friendly web-server for mining and visualizing microbiome-environment interactions.
Zakrzewski, Martha; Proietti, Carla; Ellis, Jonathan J; Hasan, Shihab; Brion, Marie-Jo; Berger, Bernard; Krause, Lutz
2017-03-01
Calypso is an easy-to-use online software suite that allows non-expert users to mine, interpret and compare taxonomic information from metagenomic or 16S rDNA datasets. Calypso has a focus on multivariate statistical approaches that can identify complex environment-microbiome associations. The software enables quantitative visualizations, statistical testing, multivariate analysis, supervised learning, factor analysis, multivariable regression, network analysis and diversity estimates. Comprehensive help pages, tutorials and videos are provided via a wiki page. The web-interface is accessible via http://cgenome.net/calypso/ . The software is programmed in Java, PERL and R and the source code is available from Zenodo ( https://zenodo.org/record/50931 ). The software is freely available for non-commercial users. l.krause@uq.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Chen, Yong; Luo, Sheng; Chu, Haitao; Wei, Peng
2013-05-01
Multivariate meta-analysis is useful in combining evidence from independent studies which involve several comparisons among groups based on a single outcome. For binary outcomes, the commonly used statistical models for multivariate meta-analysis are multivariate generalized linear mixed effects models which assume risks, after some transformation, follow a multivariate normal distribution with possible correlations. In this article, we consider an alternative model for multivariate meta-analysis where the risks are modeled by the multivariate beta distribution proposed by Sarmanov (1966). This model have several attractive features compared to the conventional multivariate generalized linear mixed effects models, including simplicity of likelihood function, no need to specify a link function, and has a closed-form expression of distribution functions for study-specific risk differences. We investigate the finite sample performance of this model by simulation studies and illustrate its use with an application to multivariate meta-analysis of adverse events of tricyclic antidepressants treatment in clinical trials.
Real, Jordi; Forné, Carles; Roso-Llorach, Albert; Martínez-Sánchez, Jose M
2016-05-01
Controlling for confounders is a crucial step in analytical observational studies, and multivariable models are widely used as statistical adjustment techniques. However, the validation of the assumptions of the multivariable regression models (MRMs) should be made clear in scientific reporting. The objective of this study is to review the quality of statistical reporting of the most commonly used MRMs (logistic, linear, and Cox regression) that were applied in analytical observational studies published between 2003 and 2014 by journals indexed in MEDLINE.Review of a representative sample of articles indexed in MEDLINE (n = 428) with observational design and use of MRMs (logistic, linear, and Cox regression). We assessed the quality of reporting about: model assumptions and goodness-of-fit, interactions, sensitivity analysis, crude and adjusted effect estimate, and specification of more than 1 adjusted model.The tests of underlying assumptions or goodness-of-fit of the MRMs used were described in 26.2% (95% CI: 22.0-30.3) of the articles and 18.5% (95% CI: 14.8-22.1) reported the interaction analysis. Reporting of all items assessed was higher in articles published in journals with a higher impact factor.A low percentage of articles indexed in MEDLINE that used multivariable techniques provided information demonstrating rigorous application of the model selected as an adjustment method. Given the importance of these methods to the final results and conclusions of observational studies, greater rigor is required in reporting the use of MRMs in the scientific literature.
NASA Astrophysics Data System (ADS)
O'Shea, Bethany; Jankowski, Jerzy
2006-12-01
The major ion composition of Great Artesian Basin groundwater in the lower Namoi River valley is relatively homogeneous in chemical composition. Traditional graphical techniques have been combined with multivariate statistical methods to determine whether subtle differences in the chemical composition of these waters can be delineated. Hierarchical cluster analysis and principal components analysis were successful in delineating minor variations within the groundwaters of the study area that were not visually identified in the graphical techniques applied. Hydrochemical interpretation allowed geochemical processes to be identified in each statistically defined water type and illustrated how these groundwaters differ from one another. Three main geochemical processes were identified in the groundwaters: ion exchange, precipitation, and mixing between waters from different sources. Both statistical methods delineated an anomalous sample suspected of being influenced by magmatic CO2 input. The use of statistical methods to complement traditional graphical techniques for waters appearing homogeneous is emphasized for all investigations of this type. Copyright
Gao, Yongnian; Gao, Junfeng; Yin, Hongbin; Liu, Chuansheng; Xia, Ting; Wang, Jing; Huang, Qi
2015-03-15
Remote sensing has been widely used for ater quality monitoring, but most of these monitoring studies have only focused on a few water quality variables, such as chlorophyll-a, turbidity, and total suspended solids, which have typically been considered optically active variables. Remote sensing presents a challenge in estimating the phosphorus concentration in water. The total phosphorus (TP) in lakes has been estimated from remotely sensed observations, primarily using the simple individual band ratio or their natural logarithm and the statistical regression method based on the field TP data and the spectral reflectance. In this study, we investigated the possibility of establishing a spatial modeling scheme to estimate the TP concentration of a large lake from multi-spectral satellite imagery using band combinations and regional multivariate statistical modeling techniques, and we tested the applicability of the spatial modeling scheme. The results showed that HJ-1A CCD multi-spectral satellite imagery can be used to estimate the TP concentration in a lake. The correlation and regression analysis showed a highly significant positive relationship between the TP concentration and certain remotely sensed combination variables. The proposed modeling scheme had a higher accuracy for the TP concentration estimation in the large lake compared with the traditional individual band ratio method and the whole-lake scale regression-modeling scheme. The TP concentration values showed a clear spatial variability and were high in western Lake Chaohu and relatively low in eastern Lake Chaohu. The northernmost portion, the northeastern coastal zone and the southeastern portion of western Lake Chaohu had the highest TP concentrations, and the other regions had the lowest TP concentration values, except for the coastal zone of eastern Lake Chaohu. These results strongly suggested that the proposed modeling scheme, i.e., the band combinations and the regional multivariate statistical modeling techniques, demonstrated advantages for estimating the TP concentration in a large lake and had a strong potential for universal application for the TP concentration estimation in large lake waters worldwide. Copyright © 2014 Elsevier Ltd. All rights reserved.
Reporting Practices and Use of Quantitative Methods in Canadian Journal Articles in Psychology.
Counsell, Alyssa; Harlow, Lisa L
2017-05-01
With recent focus on the state of research in psychology, it is essential to assess the nature of the statistical methods and analyses used and reported by psychological researchers. To that end, we investigated the prevalence of different statistical procedures and the nature of statistical reporting practices in recent articles from the four major Canadian psychology journals. The majority of authors evaluated their research hypotheses through the use of analysis of variance (ANOVA), t -tests, and multiple regression. Multivariate approaches were less common. Null hypothesis significance testing remains a popular strategy, but the majority of authors reported a standardized or unstandardized effect size measure alongside their significance test results. Confidence intervals on effect sizes were infrequently employed. Many authors provided minimal details about their statistical analyses and less than a third of the articles presented on data complications such as missing data and violations of statistical assumptions. Strengths of and areas needing improvement for reporting quantitative results are highlighted. The paper concludes with recommendations for how researchers and reviewers can improve comprehension and transparency in statistical reporting.
A multivariable model for predicting the frictional behaviour and hydration of the human skin.
Veijgen, N K; van der Heide, E; Masen, M A
2013-08-01
The frictional characteristics of skin-object interactions are important when handling objects, in the assessment of perception and comfort of products and materials and in the origins and prevention of skin injuries. In this study, based on statistical methods, a quantitative model is developed that describes the friction behaviour of human skin as a function of the subject characteristics, contact conditions, the properties of the counter material as well as environmental conditions. Although the frictional behaviour of human skin is a multivariable problem, in literature the variables that are associated with skin friction have been studied using univariable methods. In this work, multivariable models for the static and dynamic coefficients of friction as well as for the hydration of the skin are presented. A total of 634 skin-friction measurements were performed using a recently developed tribometer. Using a statistical analysis, previously defined potential influential variables were linked to the static and dynamic coefficient of friction and to the hydration of the skin, resulting in three predictive quantitative models that descibe the friction behaviour and the hydration of human skin respectively. Increased dynamic coefficients of friction were obtained from older subjects, on the index finger, with materials with a higher surface energy at higher room temperatures, whereas lower dynamic coefficients of friction were obtained at lower skin temperatures, on the temple with rougher contact materials. The static coefficient of friction increased with higher skin hydration, increasing age, on the index finger, with materials with a higher surface energy and at higher ambient temperatures. The hydration of the skin was associated with the skin temperature, anatomical location, presence of hair on the skin and the relative air humidity. Predictive models have been derived for the static and dynamic coefficient of friction using a multivariable approach. These two coefficients of friction show a strong correlation. Consequently the two multivariable models resemble, with the static coefficient of friction being on average 18% lower than the dynamic coefficient of friction. The multivariable models in this study can be used to describe the data set that was the basis for this study. Care should be taken when generalising these results. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Al-Aziz, Jameel; Christou, Nicolas; Dinov, Ivo D.
2011-01-01
The amount, complexity and provenance of data have dramatically increased in the past five years. Visualization of observed and simulated data is a critical component of any social, environmental, biomedical or scientific quest. Dynamic, exploratory and interactive visualization of multivariate data, without preprocessing by dimensionality reduction, remains a nearly insurmountable challenge. The Statistics Online Computational Resource (www.SOCR.ucla.edu) provides portable online aids for probability and statistics education, technology-based instruction and statistical computing. We have developed a new Java-based infrastructure, SOCR Motion Charts, for discovery-based exploratory analysis of multivariate data. This interactive data visualization tool enables the visualization of high-dimensional longitudinal data. SOCR Motion Charts allows mapping of ordinal, nominal and quantitative variables onto time, 2D axes, size, colors, glyphs and appearance characteristics, which facilitates the interactive display of multidimensional data. We validated this new visualization paradigm using several publicly available multivariate datasets including Ice-Thickness, Housing Prices, Consumer Price Index, and California Ozone Data. SOCR Motion Charts is designed using object-oriented programming, implemented as a Java Web-applet and is available to the entire community on the web at www.socr.ucla.edu/SOCR_MotionCharts. It can be used as an instructional tool for rendering and interrogating high-dimensional data in the classroom, as well as a research tool for exploratory data analysis. PMID:21479108
A Descriptive Study of Individual and Cross-Cultural Differences in Statistics Anxiety
ERIC Educational Resources Information Center
Baloglu, Mustafa; Deniz, M. Engin; Kesici, Sahin
2011-01-01
The present study investigated individual and cross-cultural differences in statistics anxiety among 223 Turkish and 237 American college students. A 2 x 2 between-subjects factorial multivariate analysis of covariance (MANCOVA) was performed on the six dependent variables which are the six subscales of the Statistical Anxiety Rating Scale.…
Multivariate approach to matrimonial mobility in Catalonia.
Calafell, F; Hernández, M
1993-10-01
Matrimonial mobility in Catalonia was studied using 1986 census data. Comarca (a geographic division) of birth was used as the population unit, and a measure of affinity (a statistical distance) between comarques in spouse geographic origin was defined. This distance was analyzed with multivariate methods drawn from numerical taxonomy to detect any discontinuities in matrimonial mobility and gene flow between comarques. Results show a three-level pattern of gene flow in Catalonia: (1) a strong endogamy within comarques; (2) a 100-km matrimonial circle around every comarca; and (3) the capital, Barcelona, which attracts migrants from all over Catalonia. The regionalization in matrimonial mobility follows the geographically clear-cut groups of comarques almost exactly.
Kuselman, Ilya; Pennecchi, Francesca R; da Silva, Ricardo J N B; Hibbert, D Brynn
2017-11-01
The probability of a false decision on conformity of a multicomponent material due to measurement uncertainty is discussed when test results are correlated. Specification limits of the components' content of such a material generate a multivariate specification interval/domain. When true values of components' content and corresponding test results are modelled by multivariate distributions (e.g. by multivariate normal distributions), a total global risk of a false decision on the material conformity can be evaluated based on calculation of integrals of their joint probability density function. No transformation of the raw data is required for that. A total specific risk can be evaluated as the joint posterior cumulative function of true values of a specific batch or lot lying outside the multivariate specification domain, when the vector of test results, obtained for the lot, is inside this domain. It was shown, using a case study of four components under control in a drug, that the correlation influence on the risk value is not easily predictable. To assess this influence, the evaluated total risk values were compared with those calculated for independent test results and also with those assuming much stronger correlation than that observed. While the observed statistically significant correlation did not lead to a visible difference in the total risk values in comparison to the independent test results, the stronger correlation among the variables caused either the total risk decreasing or its increasing, depending on the actual values of the test results. Copyright © 2017 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
National Association of Independent Colleges and Universities, Washington, DC.
This report seeks to explain the complex relationship between "cost" and "price" at independent private colleges and universities, and to demonstrate that federal student financial aid does not contribute to tuition and fee increases at such institutions. To support these results, 10 facts, drawn from multivariate statistical analyses of data from…
Ielpo, Pierina; Leardi, Riccardo; Pappagallo, Giuseppe; Uricchio, Vito Felice
2017-06-01
In this paper, the results obtained from multivariate statistical techniques such as PCA (Principal component analysis) and LDA (Linear discriminant analysis) applied to a wide soil data set are presented. The results have been compared with those obtained on a groundwater data set, whose samples were collected together with soil ones, within the project "Improvement of the Regional Agro-meteorological Monitoring Network (2004-2007)". LDA, applied to soil data, has allowed to distinguish the geographical origin of the sample from either one of the two macroaeras: Bari and Foggia provinces vs Brindisi, Lecce e Taranto provinces, with a percentage of correct prediction in cross validation of 87%. In the case of the groundwater data set, the best classification was obtained when the samples were grouped into three macroareas: Foggia province, Bari province and Brindisi, Lecce and Taranto provinces, by reaching a percentage of correct predictions in cross validation of 84%. The obtained information can be very useful in supporting soil and water resource management, such as the reduction of water consumption and the reduction of energy and chemical (nutrients and pesticides) inputs in agriculture.
Babamoradi, Hamid; van den Berg, Frans; Rinnan, Åsmund
2016-02-18
In Multivariate Statistical Process Control, when a fault is expected or detected in the process, contribution plots are essential for operators and optimization engineers in identifying those process variables that were affected by or might be the cause of the fault. The traditional way of interpreting a contribution plot is to examine the largest contributing process variables as the most probable faulty ones. This might result in false readings purely due to the differences in natural variation, measurement uncertainties, etc. It is more reasonable to compare variable contributions for new process runs with historical results achieved under Normal Operating Conditions, where confidence limits for contribution plots estimated from training data are used to judge new production runs. Asymptotic methods cannot provide confidence limits for contribution plots, leaving re-sampling methods as the only option. We suggest bootstrap re-sampling to build confidence limits for all contribution plots in online PCA-based MSPC. The new strategy to estimate CLs is compared to the previously reported CLs for contribution plots. An industrial batch process dataset was used to illustrate the concepts. Copyright © 2016 Elsevier B.V. All rights reserved.
The analysis of morphometric data on rocky mountain wolves and artic wolves using statistical method
NASA Astrophysics Data System (ADS)
Ammar Shafi, Muhammad; Saifullah Rusiman, Mohd; Hamzah, Nor Shamsidah Amir; Nor, Maria Elena; Ahmad, Noor’ani; Azia Hazida Mohamad Azmi, Nur; Latip, Muhammad Faez Ab; Hilmi Azman, Ahmad
2018-04-01
Morphometrics is a quantitative analysis depending on the shape and size of several specimens. Morphometric quantitative analyses are commonly used to analyse fossil record, shape and size of specimens and others. The aim of the study is to find the differences between rocky mountain wolves and arctic wolves based on gender. The sample utilised secondary data which included seven variables as independent variables and two dependent variables. Statistical modelling was used in the analysis such was the analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA). The results showed there exist differentiating results between arctic wolves and rocky mountain wolves based on independent factors and gender.
Facilitating the Transition from Bright to Dim Environments
2016-03-04
For the parametric data, a multivariate ANOVA was used in determining the systematic presence of any statistically significant performance differences...performed. All significance levels were p < 0.05, and statistical analyses were performed with the Statistical Package for Social Sciences ( SPSS ...1950. Age changes in rate and level of visual dark adaptation. Journal of Applied Physiology, 2, 407–411. Field, A. 2009. Discovering statistics
2012-01-01
Background The metals bioavailability in soils is commonly assessed by chemical extractions; however a generally accepted method is not yet established. In this study, the effectiveness of Diffusive Gradients in Thin-films (DGT) technique and single extractions in the assessment of metals bioaccumulation in vegetables, and the influence of soil parameters on phytoavailability were evaluated using multivariate statistics. Soil and plants grown in vegetable gardens from mining-affected rural areas, NW Romania, were collected and analysed. Results Pseudo-total metal content of Cu, Zn and Cd in soil ranged between 17.3-146 mg kg-1, 141–833 mg kg-1 and 0.15-2.05 mg kg-1, respectively, showing enriched contents of these elements. High degrees of metals extractability in 1M HCl and even in 1M NH4Cl were observed. Despite the relatively high total metal concentrations in soil, those found in vegetables were comparable to values typically reported for agricultural crops, probably due to the low concentrations of metals in soil solution (Csoln) and low effective concentrations (CE), assessed by DGT technique. Among the analysed vegetables, the highest metal concentrations were found in carrots roots. By applying multivariate statistics, it was found that CE, Csoln and extraction in 1M NH4Cl, were better predictors for metals bioavailability than the acid extractions applied in this study. Copper transfer to vegetables was strongly influenced by soil organic carbon (OC) and cation exchange capacity (CEC), while pH had a higher influence on Cd transfer from soil to plants. Conclusions The results showed that DGT can be used for general evaluation of the risks associated to soil contamination with Cu, Zn and Cd in field conditions. Although quantitative information on metals transfer from soil to vegetables was not observed. PMID:23079133
NASA Astrophysics Data System (ADS)
Wolf, S. F.; Lipschutz, M. E.
1993-07-01
Dodd et al. [1] found that, from their circumstances of fall, 17 H chondrites ("H Cluster 1") which fell in May, from 1855 to 1895, are distinguishable from other H chondrite falls and apparently derive from a co-orbital stream of meteoroids. From data for 10 moderately to highly labile trace elements (Rb, Ag, Se, Cs, Te, Zn, Cd, Bi, Tl, In), they used two multivariate statistical techniques--linear discriminant analysis and logistic regression--to demonstrate that 1. 13 H Cluster 1 chondrites are compositionally distinguishable from 45 other H chondrite falls, probably because of differences in thermal histories of the meteorites' parent materials; 2. The reality of the compositional differences between the populations of falls are beyond any reasonable statistical doubt. 3. The compositional differences are inconsistent with the notion that the results reflect analytical bias. We have used these techniques to assess analogous data for various H chondrite populations [2-4] with results that are listed in Table 1. These data indicate that 1. There is no statistical reason to believe that random populations from Victoria Land, Antarctica, differ compositionally from each other. 2. There is significant statistical reason to believe that the H chondrite population recovered from Victoria Land, Antarctica, differs compositionally from that from Queen Maud Land, Antarctica, and from falls. 3. There is no reason to believe that the H chondrite population recovered from Queen Maud Land, Antarctica, differs compositionally from falls. 4. These observations can be made either by data obtained by one analyst or several. These results, coupled with earlier ones [5], demonstrate that trivial explanations cannot explain compositional differences involving labile trace elements in pairs of H chondrite populations. These differences must then reflect differences of preterrestrial thermal histories of the meteorites' parent materials. Acceptance of these differences as preterrestrial has led to predictions subsequently verified by others (meteoroid and asteroid stream discoveries, differencesin thermoluminescence or TL). We predict that a TL difference will be seen between the populations of falls defined by Dodd et al. [1]. References: [1] Dodd R. T. et al. (1993) JGR, submitted. [2] Lingner D. W. et al. (1987) GCA, 51, 727-739. [3] Dennison J. E. and Lipschutz M. E. (1987) GCA, 51, 741-754. [4] Wolf S. F. and Lipschutz M. E. (1993) in Advances in Analytical Geochemistry (M. Hyman and M. Rowe, eds.), in press. [5] Wang M.-S. et al. (1992) Meteoritics, 27, 303. [6] Lipschutz M. E. and Samuels S. M. (1991) GCA, 55, 19-47. Table 1, which appears in the hard copy, shows a multivariate statistical analysis of H chondrite population pairs using 10 labile trace elements (number of meteorites in population in parentheses).
Alamilla, Francisco; Calcerrada, Matías; García-Ruiz, Carmen; Torre, Mercedes
2013-05-10
The differentiation of blue ballpoint pen inks written on documents through an LA-ICP-MS methodology is proposed. Small common office paper portions containing ink strokes from 21 blue pens of known origin were cut and measured without any sample preparation. In a first step, Mg, Ca and Sr were proposed as internal standards (ISs) and used in order to normalize elemental intensities and subtract background signals from the paper. Then, specific criteria were designed and employed to identify target elements (Li, V, Mn, Co, Ni, Cu, Zn, Zr, Sn, W and Pb) which resulted independent of the IS chosen in a 98% of the cases and allowed a qualitative clustering of the samples. In a second step, an elemental-related ratio (ink ratio) based on the targets previously identified was used to obtain mass independent intensities and perform pairwise comparisons by means of multivariate statistical analyses (MANOVA, Tukey's HSD and T2 Hotelling). This treatment improved the discrimination power (DP) and provided objective results, achieving a complete differentiation among different brands and a partial differentiation within pen inks from the same brands. The designed data treatment, together with the use of multivariate statistical tools, represents an easy and useful tool for differentiating among blue ballpoint pen inks, with hardly sample destruction and without the need for methodological calibrations, being its use potentially advantageous from a forensic-practice standpoint. To test the procedure, it was applied to analyze real handwritten questioned contracts, previously studied by the Department of Forensic Document Exams of the Criminalistics Service of Civil Guard (Spain). The results showed that all questioned ink entries were clustered in the same group, being those different from the remaining ink on the document. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Lu, Tsui-Shan; Longnecker, Matthew P.; Zhou, Haibo
2016-01-01
Outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme where one observes the exposure with a probability that depends on the outcome. The well-known such design is the case-control design for binary response, the case-cohort design for the failure time data and the general ODS design for a continuous response. While substantial work has been done for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies for taking the advantage of the available responses for subjects in a cluster, we propose a multivariate outcome dependent sampling (Multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the Multivariate-ODS design is semiparametric where all the underlying distributions of covariates are modeled nonparametrically using the empirical likelihood methods. We show that the proposed estimator is consistent and developed the asymptotically normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the Multivariate-ODS or the estimator from a simple random sample with the same sample size. The Multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of association of PCB exposure to hearing loss in children born to the Collaborative Perinatal Study. PMID:27966260
Analysis/forecast experiments with a multivariate statistical analysis scheme using FGGE data
NASA Technical Reports Server (NTRS)
Baker, W. E.; Bloom, S. C.; Nestler, M. S.
1985-01-01
A three-dimensional, multivariate, statistical analysis method, optimal interpolation (OI) is described for modeling meteorological data from widely dispersed sites. The model was developed to analyze FGGE data at the NASA-Goddard Laboratory of Atmospherics. The model features a multivariate surface analysis over the oceans, including maintenance of the Ekman balance and a geographically dependent correlation function. Preliminary comparisons are made between the OI model and similar schemes employed at the European Center for Medium Range Weather Forecasts and the National Meteorological Center. The OI scheme is used to provide input to a GCM, and model error correlations are calculated for forecasts of 500 mb vertical water mixing ratios and the wind profiles. Comparisons are made between the predictions and measured data. The model is shown to be as accurate as a successive corrections model out to 4.5 days.
Predicting trauma patient mortality: ICD [or ICD-10-AM] versus AIS based approaches.
Willis, Cameron D; Gabbe, Belinda J; Jolley, Damien; Harrison, James E; Cameron, Peter A
2010-11-01
The International Classification of Diseases Injury Severity Score (ICISS) has been proposed as an International Classification of Diseases (ICD)-10-based alternative to mortality prediction tools that use Abbreviated Injury Scale (AIS) data, including the Trauma and Injury Severity Score (TRISS). To date, studies have not examined the performance of ICISS using Australian trauma registry data. This study aimed to compare the performance of ICISS with other mortality prediction tools in an Australian trauma registry. This was a retrospective review of prospectively collected data from the Victorian State Trauma Registry. A training dataset was created for model development and a validation dataset for evaluation. The multiplicative ICISS model was compared with a worst injury ICISS approach, Victorian TRISS (V-TRISS, using local coefficients), maximum AIS severity and a multivariable model including ICD-10-AM codes as predictors. Models were investigated for discrimination (C-statistic) and calibration (Hosmer-Lemeshow statistic). The multivariable approach had the highest level of discrimination (C-statistic 0.90) and calibration (H-L 7.65, P= 0.468). Worst injury ICISS, V-TRISS and maximum AIS had similar performance. The multiplicative ICISS produced the lowest level of discrimination (C-statistic 0.80) and poorest calibration (H-L 50.23, P < 0.001). The performance of ICISS may be affected by the data used to develop estimates, the ICD version employed, the methods for deriving estimates and the inclusion of covariates. In this analysis, a multivariable approach using ICD-10-AM codes was the best-performing method. A multivariable ICISS approach may therefore be a useful alternative to AIS-based methods and may have comparable predictive performance to locally derived TRISS models. © 2010 The Authors. ANZ Journal of Surgery © 2010 Royal Australasian College of Surgeons.
Whist, A C; Liland, K H; Jonsson, M E; Sæbø, S; Sviland, S; Østerås, O; Norström, M; Hopp, P
2014-11-01
Surveillance programs for animal diseases are critical to early disease detection and risk estimation and to documenting a population's disease status at a given time. The aim of this study was to describe a risk-based surveillance program for detecting Mycobacterium avium ssp. paratuberculosis (MAP) infection in Norwegian dairy cattle. The included risk factors for detecting MAP were purchase of cattle, combined cattle and goat farming, and location of the cattle farm in counties containing goats with MAP. The risk indicators included production data [culling of animals >3 yr of age, carcass conformation of animals >3 yr of age, milk production decrease in older lactating cows (lactations 3, 4, and 5)], and clinical data (diarrhea, enteritis, or both, in animals >3 yr of age). Except for combined cattle and goat farming and cattle farm location, all data were collected at the cow level and summarized at the herd level. Predefined risk factors and risk indicators were extracted from different national databases and combined in a multivariate statistical process control to obtain a risk assessment for each herd. The ordinary Hotelling's T(2) statistic was applied as a multivariate, standardized measure of difference between the current observed state and the average state of the risk factors for a given herd. To make the analysis more robust and adapt it to the slowly developing nature of MAP, monthly risk calculations were based on data accumulated during a 24-mo period. Monitoring of these variables was performed to identify outliers that may indicate deviance in one or more of the underlying processes. The highest-ranked herds were scattered all over Norway and clustered in high-density dairy cattle farm areas. The resulting rankings of herds are being used in the national surveillance program for MAP in 2014 to increase the sensitivity of the ongoing surveillance program in which 5 fecal samples for bacteriological examination are collected from 25 dairy herds. The use of multivariate statistical process control for selection of herds will be beneficial when a diagnostic test suitable for mass screening is available and validated on the Norwegian cattle population, thus making it possible to increase the number of sampled herds. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Attitudes toward Advanced and Multivariate Statistics When Using Computers.
ERIC Educational Resources Information Center
Kennedy, Robert L.; McCallister, Corliss Jean
This study investigated the attitudes toward statistics of graduate students who studied advanced statistics in a course in which the focus of instruction was the use of a computer program in class. The use of the program made it possible to provide an individualized, self-paced, student-centered, and activity-based course. The three sections…
ERIC Educational Resources Information Center
Williams, Amanda S.
2015-01-01
Statistics anxiety is a common problem for graduate students. This study explores the multivariate relationship between a set of worry-related variables and six types of statistics anxiety. Canonical correlation analysis indicates a significant relationship between the two sets of variables. Findings suggest that students who are more intolerant…
Phung, Dung; Huang, Cunrui; Rutherford, Shannon; Dwirahmadi, Febi; Chu, Cordia; Wang, Xiaoming; Nguyen, Minh; Nguyen, Nga Huy; Do, Cuong Manh; Nguyen, Trung Hieu; Dinh, Tuan Anh Diep
2015-05-01
The present study is an evaluation of temporal/spatial variations of surface water quality using multivariate statistical techniques, comprising cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA). Eleven water quality parameters were monitored at 38 different sites in Can Tho City, a Mekong Delta area of Vietnam from 2008 to 2012. Hierarchical cluster analysis grouped the 38 sampling sites into three clusters, representing mixed urban-rural areas, agricultural areas and industrial zone. FA/PCA resulted in three latent factors for the entire research location, three for cluster 1, four for cluster 2, and four for cluster 3 explaining 60, 60.2, 80.9, and 70% of the total variance in the respective water quality. The varifactors from FA indicated that the parameters responsible for water quality variations are related to erosion from disturbed land or inflow of effluent from sewage plants and industry, discharges from wastewater treatment plants and domestic wastewater, agricultural activities and industrial effluents, and contamination by sewage waste with faecal coliform bacteria through sewer and septic systems. Discriminant analysis (DA) revealed that nephelometric turbidity units (NTU), chemical oxygen demand (COD) and NH₃ are the discriminating parameters in space, affording 67% correct assignation in spatial analysis; pH and NO₂ are the discriminating parameters according to season, assigning approximately 60% of cases correctly. The findings suggest a possible revised sampling strategy that can reduce the number of sampling sites and the indicator parameters responsible for large variations in water quality. This study demonstrates the usefulness of multivariate statistical techniques for evaluation of temporal/spatial variations in water quality assessment and management.
Quantification of proportions of different water sources in a mining operation.
Scheiber, Laura; Ayora, Carlos; Vázquez-Suñé, Enric
2018-04-01
The water drained in mining operations (galleries, shafts, open pits) usually comes from different sources. Evaluating the contribution of these sources is very often necessary for water management. To determine mixing ratios, a conventional mass balance is often used. However, the presence of more than two sources creates uncertainties in mass balance applications. Moreover, the composition of the end-members is not commonly known with certainty and/or can vary in space and time. In this paper, we propose a powerful tool for solving such problems and managing groundwater in mining sites based on multivariate statistical analysis. This approach was applied to the Cobre Las Cruces mining complex, the largest copper mine in Europe. There, the open pit water is a mixture of three end-members: runoff (RO), basal Miocene (Mb) and Paleozoic (PZ) groundwater. The volume of water drained from the Miocene base aquifer must be determined and compensated via artificial recharging to comply with current regulations. Through multivariate statistical analysis of samples from a regional field campaign, the compositions of PZ and Mb end-members were firstly estimated, and then used for mixing calculations at the open pit scale. The runoff end-member was directly determined from samples collected in interception trenches inside the open pit. The application of multivariate statistical methods allowed the estimation of mixing ratios for the hydrological years 2014-2015 and 2015-2016. Open pit water proportions have changed from 15% to 7%, 41% to 36%, and 44% to 57% for runoff, Mb and PZ end-members, respectively. An independent estimation of runoff based on the curve method yielded comparable results. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Li, Xiaohui; Yang, Sibo; Fan, Rongwei; Yu, Xin; Chen, Deying
2018-06-01
In this paper, discrimination of soft tissues using laser-induced breakdown spectroscopy (LIBS) in combination with multivariate statistical methods is presented. Fresh pork fat, skin, ham, loin and tenderloin muscle tissues are manually cut into slices and ablated using a 1064 nm pulsed Nd:YAG laser. Discrimination analyses between fat, skin and muscle tissues, and further between highly similar ham, loin and tenderloin muscle tissues, are performed based on the LIBS spectra in combination with multivariate statistical methods, including principal component analysis (PCA), k nearest neighbors (kNN) classification, and support vector machine (SVM) classification. Performances of the discrimination models, including accuracy, sensitivity and specificity, are evaluated using 10-fold cross validation. The classification models are optimized to achieve best discrimination performances. The fat, skin and muscle tissues can be definitely discriminated using both kNN and SVM classifiers, with accuracy of over 99.83%, sensitivity of over 0.995 and specificity of over 0.998. The highly similar ham, loin and tenderloin muscle tissues can also be discriminated with acceptable performances. The best performances are achieved with SVM classifier using Gaussian kernel function, with accuracy of 76.84%, sensitivity of over 0.742 and specificity of over 0.869. The results show that the LIBS technique assisted with multivariate statistical methods could be a powerful tool for online discrimination of soft tissues, even for tissues of high similarity, such as muscles from different parts of the animal body. This technique could be used for discrimination of tissues suffering minor clinical changes, thus may advance the diagnosis of early lesions and abnormalities.
NASA Astrophysics Data System (ADS)
Mfumu Kihumba, Antoine; Ndembo Longo, Jean; Vanclooster, Marnik
2016-03-01
A multivariate statistical modelling approach was applied to explain the anthropogenic pressure of nitrate pollution on the Kinshasa groundwater body (Democratic Republic of Congo). Multiple regression and regression tree models were compared and used to identify major environmental factors that control the groundwater nitrate concentration in this region. The analyses were made in terms of physical attributes related to the topography, land use, geology and hydrogeology in the capture zone of different groundwater sampling stations. For the nitrate data, groundwater datasets from two different surveys were used. The statistical models identified the topography, the residential area, the service land (cemetery), and the surface-water land-use classes as major factors explaining nitrate occurrence in the groundwater. Also, groundwater nitrate pollution depends not on one single factor but on the combined influence of factors representing nitrogen loading sources and aquifer susceptibility characteristics. The groundwater nitrate pressure was better predicted with the regression tree model than with the multiple regression model. Furthermore, the results elucidated the sensitivity of the model performance towards the method of delineation of the capture zones. For pollution modelling at the monitoring points, therefore, it is better to identify capture-zone shapes based on a conceptual hydrogeological model rather than to adopt arbitrary circular capture zones.
Dependence of drivers affects risks associated with compound events
NASA Astrophysics Data System (ADS)
Zscheischler, Jakob; Seneviratne, Sonia I.
2017-04-01
Compound climate extremes are receiving increasing attention because of their disproportionate impacts on humans and ecosystems. Risks assessments, however, generally focus on univariate statistics even when multiple stressors are considered. Concurrent extreme droughts and heatwaves have been observed to cause a suite of extreme impacts on natural and human systems alike. For example, they can substantially affect vegetation health, prompting tree mortality, and thereby facilitating insect outbreaks and fires. In addition, hot droughts have the potential to trigger and intensify fires and can cause severe economical damage. By promoting disease spread, extremely hot and dry conditions also strongly affect human health. We analyse the co-occurrence of dry and hot summers and show that these are strongly correlated for many regions, inducing a much higher frequency of concurrent hot and dry summers than what would be assumed from the independent combination of the univariate statistics. Our results demonstrate how the dependence structure between variables affects the occurrence frequency of multivariate extremes. Assessments based on univariate statistics can thus strongly underestimate risks associated with given extremes, if impacts depend on multiple (dependent) variables. We conclude that a multivariate perspective is necessary in order to appropriately assess changes in climate extremes and their impacts, and to design adaptation strategies.
LSST Astroinformatics And Astrostatistics: Data-oriented Astronomical Research
NASA Astrophysics Data System (ADS)
Borne, Kirk D.; Stassun, K.; Brunner, R. J.; Djorgovski, S. G.; Graham, M.; Hakkila, J.; Mahabal, A.; Paegert, M.; Pesenson, M.; Ptak, A.; Scargle, J.; Informatics, LSST; Statistics Team
2011-01-01
The LSST Informatics and Statistics Science Collaboration (ISSC) focuses on research and scientific discovery challenges posed by the very large and complex data collection that LSST will generate. Application areas include astroinformatics, machine learning, data mining, astrostatistics, visualization, scientific data semantics, time series analysis, and advanced signal processing. Research problems to be addressed with these methodologies include transient event characterization and classification, rare class discovery, correlation mining, outlier/anomaly/surprise detection, improved estimators (e.g., for photometric redshift or early onset supernova classification), exploration of highly dimensional (multivariate) data catalogs, and more. We present sample science results from these data-oriented approaches to large-data astronomical research. We present results from LSST ISSC team members, including the EB (Eclipsing Binary) Factory, the environmental variations in the fundamental plane of elliptical galaxies, and outlier detection in multivariate catalogs.
NASA Astrophysics Data System (ADS)
Moura, Ricardo; Sinha, Bimal; Coelho, Carlos A.
2017-06-01
The recent popularity of the use of synthetic data as a Statistical Disclosure Control technique has enabled the development of several methods of generating and analyzing such data, but almost always relying in asymptotic distributions and in consequence being not adequate for small sample datasets. Thus, a likelihood-based exact inference procedure is derived for the matrix of regression coefficients of the multivariate regression model, for multiply imputed synthetic data generated via Posterior Predictive Sampling. Since it is based in exact distributions this procedure may even be used in small sample datasets. Simulation studies compare the results obtained from the proposed exact inferential procedure with the results obtained from an adaptation of Reiters combination rule to multiply imputed synthetic datasets and an application to the 2000 Current Population Survey is discussed.
Interfaces between statistical analysis packages and the ESRI geographic information system
NASA Technical Reports Server (NTRS)
Masuoka, E.
1980-01-01
Interfaces between ESRI's geographic information system (GIS) data files and real valued data files written to facilitate statistical analysis and display of spatially referenced multivariable data are described. An example of data analysis which utilized the GIS and the statistical analysis system is presented to illustrate the utility of combining the analytic capability of a statistical package with the data management and display features of the GIS.
Mental disorder and violence: is there a relationship beyond substance use?
Van Dorn, Richard; Volavka, Jan; Johnson, Norman
2012-03-01
A general consensus exists that severe mental illness (SMI) increases violence risk. However, a recent report claimed that SMI "alone was not statistically related to future violence in bivariate or multivariate analyses." We reanalyze the data used to make this claim with a focus on causal relationships between SMI and violence, rather than the statistical prediction of violence. Data are from the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC), a two-wave study (N = 34,653: Wave 1: 2001-2003; Wave 2: 2004-2005). Indicators of mental disorder in the year prior to Wave 1 were used to examine violence between Waves 1 and 2. Those with SMI, irrespective of substance abuse status, were significantly more likely to be violent than those with no mental or substance use disorders. This finding held in both bivariate and multivariable models. Those with comorbid mental and substance use disorders had the highest risk of violence. Historical and current conditions were also associated with violence, including childhood abuse and neglect, household antisocial behavior, binge drinking and stressful life events. These results, in contrast to a recently published report, show that the NESARC data are consistent with the consensus view on mental disorder and violence: there is a statistically significant, yet modest relationship between SMI (within 12 months) and violence, and a stronger relationship between SMI with substance use disorder and violence. These results also highlight the importance of premorbid conditions, and other contemporaneous clinical factors, in violent behavior.
The extension of total gain (TG) statistic in survival models: properties and applications.
Choodari-Oskooei, Babak; Royston, Patrick; Parmar, Mahesh K B
2015-07-01
The results of multivariable regression models are usually summarized in the form of parameter estimates for the covariates, goodness-of-fit statistics, and the relevant p-values. These statistics do not inform us about whether covariate information will lead to any substantial improvement in prediction. Predictive ability measures can be used for this purpose since they provide important information about the practical significance of prognostic factors. R (2)-type indices are the most familiar forms of such measures in survival models, but they all have limitations and none is widely used. In this paper, we extend the total gain (TG) measure, proposed for a logistic regression model, to survival models and explore its properties using simulations and real data. TG is based on the binary regression quantile plot, otherwise known as the predictiveness curve. Standardised TG ranges from 0 (no explanatory power) to 1 ('perfect' explanatory power). The results of our simulations show that unlike many of the other R (2)-type predictive ability measures, TG is independent of random censoring. It increases as the effect of a covariate increases and can be applied to different types of survival models, including models with time-dependent covariate effects. We also apply TG to quantify the predictive ability of multivariable prognostic models developed in several disease areas. Overall, TG performs well in our simulation studies and can be recommended as a measure to quantify the predictive ability in survival models.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mayer, B. P.; Mew, D. A.; DeHope, A.
Attribution of the origin of an illicit drug relies on identification of compounds indicative of its clandestine production and is a key component of many modern forensic investigations. The results of these studies can yield detailed information on method of manufacture, starting material source, and final product - all critical forensic evidence. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic fentanyl, N-(1-phenylethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods, all previously published fentanyl synthetic routes or hybrid versions thereof, were studied in an effort to identify and classify route-specific signatures. 160 distinct compounds and inorganicmore » species were identified using gas and liquid chromatographies combined with mass spectrometric methods (GC-MS and LCMS/ MS-TOF) in conjunction with inductively coupled plasma mass spectrometry (ICPMS). The complexity of the resultant data matrix urged the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 87 route-specific CAS were classified and a statistical model capable of predicting the method of fentanyl synthesis was validated and tested against CAS profiles from crude fentanyl products deposited and later extracted from two operationally relevant surfaces: stainless steel and vinyl tile. This work provides the most detailed fentanyl CAS investigation to date by using orthogonal mass spectral data to identify CAS of forensic significance for illicit drug detection, profiling, and attribution.« less
Metabolic profiling of human lung cancer blood plasma using 1H NMR spectroscopy
NASA Astrophysics Data System (ADS)
Kokova, Daria; Dementeva, Natalia; Kotelnikov, Oleg; Ponomaryova, Anastasia; Cherdyntseva, Nadezhda; Kzhyshkowska, Juliya
2017-11-01
Lung cancer (both small cell and non-small cell) is the second most common cancer in both men and women. The article represents results of evaluating of the plasma metabolic profiles of 100 lung cancer patients and 100 controls to investigate significant metabolites using 400 MHz 1H NMR spectrometer. The results of multivariate statistical analysis show that a medium-field NMR spectrometer can obtain the data which are already sufficient for clinical metabolomics.
ERIC Educational Resources Information Center
Grasman, Raoul P. P. P.; Huizenga, Hilde M.; Geurts, Hilde M.
2010-01-01
Crawford and Howell (1998) have pointed out that the common practice of z-score inference on cognitive disability is inappropriate if a patient's performance on a task is compared with relatively few typical control individuals. Appropriate univariate and multivariate statistical tests have been proposed for these studies, but these are only valid…
Applied statistics in agricultural, biological, and environmental sciences.
USDA-ARS?s Scientific Manuscript database
Agronomic research often involves measurement and collection of multiple response variables in an effort to understand the more complex nature of the system being studied. Multivariate statistical methods encompass the simultaneous analysis of all random variables measured on each experimental or s...
Toppi, J; Petti, M; Vecchiato, G; Cincotti, F; Salinari, S; Mattia, D; Babiloni, F; Astolfi, L
2013-01-01
Partial Directed Coherence (PDC) is a spectral multivariate estimator for effective connectivity, relying on the concept of Granger causality. Even if its original definition derived directly from information theory, two modifies were introduced in order to provide better physiological interpretations of the estimated networks: i) normalization of the estimator according to rows, ii) squared transformation. In the present paper we investigated the effect of PDC normalization on the performances achieved by applying the statistical validation process on investigated connectivity patterns under different conditions of Signal to Noise ratio (SNR) and amount of data available for the analysis. Results of the statistical analysis revealed an effect of PDC normalization only on the percentages of type I and type II errors occurred by using Shuffling procedure for the assessment of connectivity patterns. No effects of the PDC formulation resulted on the performances achieved during the validation process executed instead by means of Asymptotic Statistic approach. Moreover, the percentages of both false positives and false negatives committed by Asymptotic Statistic are always lower than those achieved by Shuffling procedure for each type of normalization.
Engineering Design Handbook. Army Weapon Systems Analysis. Part 2
1979-10-01
EXPERIMENTAL DESIGN ............................... ............ 41-3 41-5 RESULTS OF THE ASARS lIX SIMULATIONS ........................... 41-4 41-6 LATIN...sciences and human factors engineering fields utilizing experimental methodology and multi-variable statistical techniques drawn from experimental ...randomly to grenades for the test design . The nine experimental types of hand grenades (first’ nine in Table 33-2) had a "pip" on their spherical
Pingault, Jean Baptiste; Côté, Sylvana M; Petitclerc, Amélie; Vitaro, Frank; Tremblay, Richard E
2015-01-01
Parental educational expectations have been associated with children's educational attainment in a number of long-term longitudinal studies, but whether this relationship is causal has long been debated. The aims of this prospective study were twofold: 1) test whether low maternal educational expectations contributed to failure to graduate from high school; and 2) compare the results obtained using different strategies for accounting for confounding variables (i.e. multivariate regression and propensity score matching). The study sample included 1,279 participants from the Quebec Longitudinal Study of Kindergarten Children. Maternal educational expectations were assessed when the participants were aged 12 years. High school graduation—measuring educational attainment—was determined through the Quebec Ministry of Education when the participants were aged 22-23 years. Findings show that when using the most common statistical approach (i.e. multivariate regressions to adjust for a restricted set of potential confounders) the contribution of low maternal educational expectations to failure to graduate from high school was statistically significant. However, when using propensity score matching, the contribution of maternal expectations was reduced and remained statistically significant only for males. The results of this study are consistent with the possibility that the contribution of parental expectations to educational attainment is overestimated in the available literature. This may be explained by the use of a restricted range of potential confounding variables as well as the dearth of studies using appropriate statistical techniques and study designs in order to minimize confounding. Each of these techniques and designs, including propensity score matching, has its strengths and limitations: A more comprehensive understanding of the causal role of parental expectations will stem from a convergence of findings from studies using different techniques and designs.
Multivariate model of female black bear habitat use for a Geographic Information System
Clark, Joseph D.; Dunn, James E.; Smith, Kimberly G.
1993-01-01
Simple univariate statistical techniques may not adequately assess the multidimensional nature of habitats used by wildlife. Thus, we developed a multivariate method to model habitat-use potential using a set of female black bear (Ursus americanus) radio locations and habitat data consisting of forest cover type, elevation, slope, aspect, distance to roads, distance to streams, and forest cover type diversity score in the Ozark Mountains of Arkansas. The model is based on the Mahalanobis distance statistic coupled with Geographic Information System (GIS) technology. That statistic is a measure of dissimilarity and represents a standardized squared distance between a set of sample variates and an ideal based on the mean of variates associated with animal observations. Calculations were made with the GIS to produce a map containing Mahalanobis distance values within each cell on a 60- × 60-m grid. The model identified areas of high habitat use potential that could not otherwise be identified by independent perusal of any single map layer. This technique avoids many pitfalls that commonly affect typical multivariate analyses of habitat use and is a useful tool for habitat manipulation or mitigation to favor terrestrial vertebrates that use habitats on a landscape scale.
Kragelj, Borut
2016-03-01
Aiming at improving treatment individualization in patients with prostate cancer treated with combination of external beam radiotherapy and high-dose-rate brachytherapy to boost the dose to prostate (HDRB-B), the objective was to evaluate factors that have potential impact on obstructive urination problems (OUP) after HDRB-B. In the follow-up study 88 patients consecutively treated with HDRB-B at the Institute of Oncology Ljubljana in the period 2006-2011 were included. The observed outcome was deterioration of OUP (DOUP) during the follow-up period longer than 1 year. Univariate and multivariate relationship analysis between DOUP and potential risk factors (treatment factors, patients' characteristics) was carried out by using binary logistic regression. ROC curve was constructed on predicted values and the area under the curve (AUC) calculated to assess the performance of the multivariate model. Analysis was carried out on 71 patients who completed 3 years of follow-up. DOUP was noted in 13/71 (18.3%) of them. The results of multivariate analysis showed statistically significant relationship between DOUP and anti-coagulation treatment (OR 4.86, 95% C.I. limits: 1.21-19.61, p = 0.026). Also minimal dose received by 90% of the urethra volume was close to statistical significance (OR = 1.23; 95% C.I. limits: 0.98-1.07, p = 0.099). The value of AUC was 0.755. The study emphasized the relationship between DOUP and anticoagulation treatment, and suggested the multivariate model with fair predictive performance. This model potentially enables a reduction of DOUP after HDRB-B. It supports the belief that further research should be focused on urethral sphincter as a critical structure for OUP.
Eigenvalue statistics for the sum of two complex Wishart matrices
NASA Astrophysics Data System (ADS)
Kumar, Santosh
2014-09-01
The sum of independent Wishart matrices, taken from distributions with unequal covariance matrices, plays a crucial role in multivariate statistics, and has applications in the fields of quantitative finance and telecommunication. However, analytical results concerning the corresponding eigenvalue statistics have remained unavailable, even for the sum of two Wishart matrices. This can be attributed to the complicated and rotationally noninvariant nature of the matrix distribution that makes extracting the information about eigenvalues a nontrivial task. Using a generalization of the Harish-Chandra-Itzykson-Zuber integral, we find exact solution to this problem for the complex Wishart case when one of the covariance matrices is proportional to the identity matrix, while the other is arbitrary. We derive exact and compact expressions for the joint probability density and marginal density of eigenvalues. The analytical results are compared with numerical simulations and we find perfect agreement.
MANCOVA for one way classification with homogeneity of regression coefficient vectors
NASA Astrophysics Data System (ADS)
Mokesh Rayalu, G.; Ravisankar, J.; Mythili, G. Y.
2017-11-01
The MANOVA and MANCOVA are the extensions of the univariate ANOVA and ANCOVA techniques to multidimensional or vector valued observations. The assumption of a Gaussian distribution has been replaced with the Multivariate Gaussian distribution for the vectors data and residual term variables in the statistical models of these techniques. The objective of MANCOVA is to determine if there are statistically reliable mean differences that can be demonstrated between groups later modifying the newly created variable. When randomization assignment of samples or subjects to groups is not possible, multivariate analysis of covariance (MANCOVA) provides statistical matching of groups by adjusting dependent variables as if all subjects scored the same on the covariates. In this research article, an extension has been made to the MANCOVA technique with more number of covariates and homogeneity of regression coefficient vectors is also tested.
Application of Multivariate Statistical Analysis to Biomarkers in Se-Turkey Crude Oils
NASA Astrophysics Data System (ADS)
Gürgey, K.; Canbolat, S.
2017-11-01
Twenty-four crude oil samples were collected from the 24 oil fields distributed in different districts of SE-Turkey. API and Sulphur content (%), Stable Carbon Isotope, Gas Chromatography (GC), and Gas Chromatography-Mass Spectrometry (GC-MS) data were used to construct a geochemical data matrix. The aim of this study is to examine the genetic grouping or correlations in the crude oil samples, hence the number of source rocks present in the SE-Turkey. To achieve these aims, two of the multivariate statistical analysis techniques (Principle Component Analysis [PCA] and Cluster Analysis were applied to data matrix of 24 samples and 8 source specific biomarker variables/parameters. The results showed that there are 3 genetically different oil groups: Batman-Nusaybin Oils, Adıyaman-Kozluk Oils and Diyarbakir Oils, in addition to a one mixed group. These groupings imply that at least, three different source rocks are present in South-Eastern (SE) Turkey. Grouping of the crude oil samples appears to be consistent with the geographic locations of the oils fields, subsurface stratigraphy as well as geology of the area.
Use of the Analysis of the Volatile Faecal Metabolome in Screening for Colorectal Cancer
2015-01-01
Diagnosis of colorectal cancer is an invasive and expensive colonoscopy, which is usually carried out after a positive screening test. Unfortunately, existing screening tests lack specificity and sensitivity, hence many unnecessary colonoscopies are performed. Here we report on a potential new screening test for colorectal cancer based on the analysis of volatile organic compounds (VOCs) in the headspace of faecal samples. Faecal samples were obtained from subjects who had a positive faecal occult blood sample (FOBT). Subjects subsequently had colonoscopies performed to classify them into low risk (non-cancer) and high risk (colorectal cancer) groups. Volatile organic compounds were analysed by selected ion flow tube mass spectrometry (SIFT-MS) and then data were analysed using both univariate and multivariate statistical methods. Ions most likely from hydrogen sulphide, dimethyl sulphide and dimethyl disulphide are statistically significantly higher in samples from high risk rather than low risk subjects. Results using multivariate methods show that the test gives a correct classification of 75% with 78% specificity and 72% sensitivity on FOBT positive samples, offering a potentially effective alternative to FOBT. PMID:26086914
Ferreira, Fábio S; Pereira, João M S; Duarte, João V; Castelo-Branco, Miguel
2017-01-01
Although voxel based morphometry studies are still the standard for analyzing brain structure, their dependence on massive univariate inferential methods is a limiting factor. A better understanding of brain pathologies can be achieved by applying inferential multivariate methods, which allow the study of multiple dependent variables, e.g. different imaging modalities of the same subject. Given the widespread use of SPM software in the brain imaging community, the main aim of this work is the implementation of massive multivariate inferential analysis as a toolbox in this software package. applied to the use of T1 and T2 structural data from diabetic patients and controls. This implementation was compared with the traditional ANCOVA in SPM and a similar multivariate GLM toolbox (MRM). We implemented the new toolbox and tested it by investigating brain alterations on a cohort of twenty-eight type 2 diabetes patients and twenty-six matched healthy controls, using information from both T1 and T2 weighted structural MRI scans, both separately - using standard univariate VBM - and simultaneously, with multivariate analyses. Univariate VBM replicated predominantly bilateral changes in basal ganglia and insular regions in type 2 diabetes patients. On the other hand, multivariate analyses replicated key findings of univariate results, while also revealing the thalami as additional foci of pathology. While the presented algorithm must be further optimized, the proposed toolbox is the first implementation of multivariate statistics in SPM8 as a user-friendly toolbox, which shows great potential and is ready to be validated in other clinical cohorts and modalities.
A new multivariate zero-adjusted Poisson model with applications to biomedicine.
Liu, Yin; Tian, Guo-Liang; Tang, Man-Lai; Yuen, Kam Chuen
2018-05-25
Recently, although advances were made on modeling multivariate count data, existing models really has several limitations: (i) The multivariate Poisson log-normal model (Aitchison and Ho, ) cannot be used to fit multivariate count data with excess zero-vectors; (ii) The multivariate zero-inflated Poisson (ZIP) distribution (Li et al., 1999) cannot be used to model zero-truncated/deflated count data and it is difficult to apply to high-dimensional cases; (iii) The Type I multivariate zero-adjusted Poisson (ZAP) distribution (Tian et al., 2017) could only model multivariate count data with a special correlation structure for random components that are all positive or negative. In this paper, we first introduce a new multivariate ZAP distribution, based on a multivariate Poisson distribution, which allows the correlations between components with a more flexible dependency structure, that is some of the correlation coefficients could be positive while others could be negative. We then develop its important distributional properties, and provide efficient statistical inference methods for multivariate ZAP model with or without covariates. Two real data examples in biomedicine are used to illustrate the proposed methods. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
ERIC Educational Resources Information Center
Yuan, Ke-Hai
2008-01-01
In the literature of mean and covariance structure analysis, noncentral chi-square distribution is commonly used to describe the behavior of the likelihood ratio (LR) statistic under alternative hypothesis. Due to the inaccessibility of the rather technical literature for the distribution of the LR statistic, it is widely believed that the…
Some Tests of Randomness with Applications
1981-02-01
freedom. For further details, the reader is referred to Gnanadesikan (1977, p. 169) wherein other relevant tests are also given, Graphical tests, as...sample from a gamma distri- bution. J. Am. Statist. Assoc. 71, 480-7. Gnanadesikan , R. (1977). Methods for Statistical Data Analysis of Multivariate
Statistical polarization in greenhouse gas emissions: Theory and evidence.
Remuzgo, Lorena; Trueba, Carmen
2017-11-01
The current debate on climate change is over whether global warming can be limited in order to lessen its impacts. In this sense, evidence of a decrease in the statistical polarization in greenhouse gas (GHG) emissions could encourage countries to establish a stronger multilateral climate change agreement. Based on the interregional and intraregional components of the multivariate generalised entropy measures (Maasoumi, 1986), Gigliarano and Mosler (2009) proposed to study the statistical polarization concept from a multivariate view. In this paper, we apply this approach to study the evolution of such phenomenon in the global distribution of the main GHGs. The empirical analysis has been carried out for the time period 1990-2011, considering an endogenous grouping of countries (Aghevli and Mehran, 1981; Davies and Shorrocks, 1989). Most of the statistical polarization indices showed a slightly increasing pattern that was similar regardless of the number of groups considered. Finally, some policy implications are commented. Copyright © 2017 Elsevier Ltd. All rights reserved.
Lu, Tsui-Shan; Longnecker, Matthew P; Zhou, Haibo
2017-03-15
Outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme where one observes the exposure with a probability that depends on the outcome. The well-known such design is the case-control design for binary response, the case-cohort design for the failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies for taking the advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric where all the underlying distributions of covariates are modeled nonparametrically using the empirical likelihood methods. We show that the proposed estimator is consistent and developed the asymptotically normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample with the same sample size. The multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of association of polychlorinated biphenyl exposure to hearing loss in children born to the Collaborative Perinatal Study. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Gourdol, L.; Hissler, C.; Pfister, L.
2012-04-01
The Luxembourg sandstone aquifer is of major relevance for the national supply of drinking water in Luxembourg. The city of Luxembourg (20% of the country's population) gets almost 2/3 of its drinking water from this aquifer. As a consequence, the study of both the groundwater hydrochemistry, as well as its spatial and temporal variations, are considered as of highest priority. Since 2005, a monitoring network has been implemented by the Water Department of Luxembourg City, with a view to a more sustainable management of this strategic water resource. The data collected to date forms a large and complex dataset, describing spatial and temporal variations of many hydrochemical parameters. The data treatment issue is tightly connected to this kind of water monitoring programs and complex databases. Standard multivariate statistical techniques, such as principal components analysis and hierarchical cluster analysis, have been widely used as unbiased methods for extracting meaningful information from groundwater quality data and are now classically used in many hydrogeological studies, in particular to characterize temporal or spatial hydrochemical variations induced by natural and anthropogenic factors. But these classical multivariate methods deal with two-way matrices, usually parameters/sites or parameters/time, while often the dataset resulting from qualitative water monitoring programs should be seen as a datacube parameters/sites/time. Three-way matrices, such as the one we propose here, are difficult to handle and to analyse by classical multivariate statistical tools and thus should be treated with approaches dealing with three-way data structures. One possible analysis approach consists in the use of partial triadic analysis (PTA). The PTA was previously used with success in many ecological studies but never to date in the domain of hydrogeology. Applied to the dataset of the Luxembourg Sandstone aquifer, the PTA appears as a new promising statistical instrument for hydrogeologists, in particular to characterize temporal and spatial hydrochemical variations induced by natural and anthropogenic factors. This new approach for groundwater management offers potential for 1) identifying a common multivariate spatial structure, 2) untapping the different hydrochemical patterns and explaining their controlling factors and 3) analysing the temporal variability of this structure and grasping hydrochemical changes.
Quirós, Elia; Felicísimo, Angel M; Cuartero, Aurora
2009-01-01
This work proposes a new method to classify multi-spectral satellite images based on multivariate adaptive regression splines (MARS) and compares this classification system with the more common parallelepiped and maximum likelihood (ML) methods. We apply the classification methods to the land cover classification of a test zone located in southwestern Spain. The basis of the MARS method and its associated procedures are explained in detail, and the area under the ROC curve (AUC) is compared for the three methods. The results show that the MARS method provides better results than the parallelepiped method in all cases, and it provides better results than the maximum likelihood method in 13 cases out of 17. These results demonstrate that the MARS method can be used in isolation or in combination with other methods to improve the accuracy of soil cover classification. The improvement is statistically significant according to the Wilcoxon signed rank test.
ERIC Educational Resources Information Center
Joo, Soohyung; Kipp, Margaret E. I.
2015-01-01
Introduction: This study examines the structure of Web space in the field of library and information science using multivariate analysis of social tags from the Website, Delicious.com. A few studies have examined mathematical modelling of tags, mainly examining tagging in terms of tripartite graphs, pattern tracing and descriptive statistics. This…
ERIC Educational Resources Information Center
Magis, David; De Boeck, Paul
2011-01-01
We focus on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is a quite natural case. A robust statistics technique is…
ERIC Educational Resources Information Center
Arbaugh, J. B.; Hwang, Alvin
2013-01-01
Seeking to assess the analytical rigor of empirical research in management education, this article reviews the use of multivariate statistical techniques in 85 studies of online and blended management education over the past decade and compares them with prescriptions offered by both the organization studies and educational research communities.…
Multiple Hypothesis Testing for Experimental Gingivitis Based on Wilcoxon Signed Rank Statistics
Preisser, John S.; Sen, Pranab K.; Offenbacher, Steven
2011-01-01
Dental research often involves repeated multivariate outcomes on a small number of subjects for which there is interest in identifying outcomes that exhibit change in their levels over time as well as to characterize the nature of that change. In particular, periodontal research often involves the analysis of molecular mediators of inflammation for which multivariate parametric methods are highly sensitive to outliers and deviations from Gaussian assumptions. In such settings, nonparametric methods may be favored over parametric ones. Additionally, there is a need for statistical methods that control an overall error rate for multiple hypothesis testing. We review univariate and multivariate nonparametric hypothesis tests and apply them to longitudinal data to assess changes over time in 31 biomarkers measured from the gingival crevicular fluid in 22 subjects whereby gingivitis was induced by temporarily withholding tooth brushing. To identify biomarkers that can be induced to change, multivariate Wilcoxon signed rank tests for a set of four summary measures based upon area under the curve are applied for each biomarker and compared to their univariate counterparts. Multiple hypothesis testing methods with choice of control of the false discovery rate or strong control of the family-wise error rate are examined. PMID:21984957
Fragility of Results in Ophthalmology Randomized Controlled Trials: A Systematic Review.
Shen, Carl; Shamsudeen, Isabel; Farrokhyar, Forough; Sabri, Kourosh
2018-05-01
Evidence-based medicine is guided by our interpretation of randomized controlled trials (RCTs) that address important clinical questions. Evaluation of the robustness of statistically significant outcomes adds a crucial element to the global assessment of trial findings. The purpose of this systematic review was to determine the robustness of ophthalmology RCTs through application of the Fragility Index (FI), a novel metric of the robustness of statistically significant outcomes. Systematic review. A literature search (MEDLINE) was performed for all RCTs published in top ophthalmology journals and ophthalmology-related RCTs published in high-impact journals in the past 10 years. Two reviewers independently screened 1811 identified articles for inclusion if they (1) were a human ophthalmology-related trial, (2) had a 1:1 prospective study design, and (3) reported a statistically significant dichotomous outcome in the abstract. All relevant data, including outcome, P value, number of patients in each group, number of events in each group, number of patients lost to follow-up, and trial characteristics, were extracted. The FI of each RCT was calculated and multivariate regression applied to determine predictive factors. The 156 trials had a median sample size of 91.5 (range, 13-2593) patients/eyes, and a median of 28 (range, 4-2217) events. The median FI of the included trials was 2 (range, 0-48), meaning that if 2 non-events were switched to events in the treatment group, the result would lose its statistical significance. A quarter of all trials had an FI of 1 or less, and 75% of trials had an FI of 6 or less. The FI was less than the number of missing data points in 52.6% of trials. Predictive factors for FI by multivariate regression included smaller P value (P < 0.001), larger sample size (P = 0.001), larger number of events (P = 0.011), and journal impact factor (P = 0.029). In ophthalmology trials, statistically significant dichotomous results are often fragile, meaning that a difference of only a couple of events can change the statistical significance. An application of the FI in RCTs may aid in the interpretation of results and assessment of quality of evidence. Copyright © 2017 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Ding, Hao; Cao, Ming; DuPont, Andrew W.; Scott, Larry D.; Guha, Sushovan; Singhal, Shashideep; Younes, Mamoun; Pence, Isaac; Herline, Alan; Schwartz, David; Xu, Hua; Mahadevan-Jansen, Anita; Bi, Xiaohong
2016-03-01
Inflammatory bowel disease (IBD) is an idiopathic disease that is typically characterized by chronic inflammation of the gastrointestinal tract. Recently much effort has been devoted to the development of novel diagnostic tools that can assist physicians for fast, accurate, and automated diagnosis of the disease. Previous research based on Raman spectroscopy has shown promising results in differentiating IBD patients from normal screening cases. In the current study, we examined IBD patients in vivo through a colonoscope-coupled Raman system. Optical diagnosis for IBD discrimination was conducted based on full-range spectra using multivariate statistical methods. Further, we incorporated several feature selection methods in machine learning into the classification model. The diagnostic performance for disease differentiation was significantly improved after feature selection. Our results showed that improved IBD diagnosis can be achieved using Raman spectroscopy in combination with multivariate analysis and feature selection.
Statistical Significance and Baseline Monitoring.
1984-07-01
impacted at once........................... 24 6 Observed versus nominal a levels for multivariate tests of data sets (50 runs of 4 groups each...cumulative proportion of the observations found for each nominal level. The results of the comparisons of the observed versus nominal a levels for the...a values are always higher than nominal levels. Virtual- . .,ly all nominal a levels are below 0.20. In other words, the discriminant analysis models
B. Baker; Henry Diaz; William Hargrove; Forrest Hoffman
2010-01-01
Changes in climate as projected by state-of-the-art climate models are likely to result in novel combinations of climate and topo-edaphic factors that will have substantial impacts on the distribution and persistence of natural vegetation and animal species. We have used multivariate techniques to quantify some of these changes; the...
Cost Modeling for Space Telescope
NASA Technical Reports Server (NTRS)
Stahl, H. Philip
2011-01-01
Parametric cost models are an important tool for planning missions, compare concepts and justify technology investments. This paper presents on-going efforts to develop single variable and multi-variable cost models for space telescope optical telescope assembly (OTA). These models are based on data collected from historical space telescope missions. Standard statistical methods are used to derive CERs for OTA cost versus aperture diameter and mass. The results are compared with previously published models.
Method for factor analysis of GC/MS data
Van Benthem, Mark H; Kotula, Paul G; Keenan, Michael R
2012-09-11
The method of the present invention provides a fast, robust, and automated multivariate statistical analysis of gas chromatography/mass spectroscopy (GC/MS) data sets. The method can involve systematic elimination of undesired, saturated peak masses to yield data that follow a linear, additive model. The cleaned data can then be subjected to a combination of PCA and orthogonal factor rotation followed by refinement with MCR-ALS to yield highly interpretable results.
Li, Cen; Yang, Hongxia; Xiao, Yuancan; Zhandui; Sanglao; Wang, Zhang; Ladan, Duojie; Bi, Hongtao
2016-01-01
Zuotai (gTso thal) is one of the famous drugs containing mercury in Tibetan medicine. However, little is known about the chemical substance basis of its pharmacodynamics and the intrinsic link of different samples sources so far. Given this, energy dispersive spectrometry of X-ray (EDX), scanning electron microscopy (SEM), atomic force microscopy (AFM), and powder X-ray diffraction (XRD) were used to assay the elements, micromorphology, and phase composition of nine Zuotai samples from different regions, respectively; the XRD fingerprint features of Zuotai were analyzed by multivariate statistical analysis. EDX result shows that Zuotai contains Hg, S, O, Fe, Al, Cu, and other elements. SEM and AFM observations suggest that Zuotai is a kind of ancient nanodrug. Its particles are mainly in the range of 100–800 nm, which commonly further aggregate into 1–30 μm loosely amorphous particles. XRD test shows that β-HgS, S8, and α-HgS are its main phase compositions. XRD fingerprint analysis indicates that the similarity degrees of nine samples are very high, and the results of multivariate statistical analysis are broadly consistent with sample sources. The present research has revealed the physicochemical characteristics of Zuotai, and it would play a positive role in interpreting this mysterious Tibetan drug. PMID:27738409
Li, Cen; Yang, Hongxia; Du, Yuzhi; Xiao, Yuancan; Zhandui; Sanglao; Wang, Zhang; Ladan, Duojie; Bi, Hongtao; Wei, Lixin
2016-01-01
Zuotai ( gTso thal ) is one of the famous drugs containing mercury in Tibetan medicine. However, little is known about the chemical substance basis of its pharmacodynamics and the intrinsic link of different samples sources so far. Given this, energy dispersive spectrometry of X-ray (EDX), scanning electron microscopy (SEM), atomic force microscopy (AFM), and powder X-ray diffraction (XRD) were used to assay the elements, micromorphology, and phase composition of nine Zuotai samples from different regions, respectively; the XRD fingerprint features of Zuotai were analyzed by multivariate statistical analysis. EDX result shows that Zuotai contains Hg, S, O, Fe, Al, Cu, and other elements. SEM and AFM observations suggest that Zuotai is a kind of ancient nanodrug. Its particles are mainly in the range of 100-800 nm, which commonly further aggregate into 1-30 μ m loosely amorphous particles. XRD test shows that β -HgS, S 8 , and α -HgS are its main phase compositions. XRD fingerprint analysis indicates that the similarity degrees of nine samples are very high, and the results of multivariate statistical analysis are broadly consistent with sample sources. The present research has revealed the physicochemical characteristics of Zuotai , and it would play a positive role in interpreting this mysterious Tibetan drug.
Are studies reporting significant results more likely to be published?
Koletsi, Despina; Karagianni, Anthi; Pandis, Nikolaos; Makou, Margarita; Polychronopoulou, Argy; Eliades, Theodore
2009-11-01
Our objective was to assess the hypothesis that there are variations of the proportion of articles reporting a significant effect, with a higher percentage of those articles published in journals with impact factors. The contents of 5 orthodontic journals (American Journal of Orthodontics and Dentofacial Orthopedics, Angle Orthodontist, European Journal of Orthodontics, Journal of Orthodontics, and Orthodontics and Craniofacial Research), published between 2004 and 2008, were hand-searched. Articles with statistical analysis of data were included in the study and classified into 4 categories: behavior and psychology, biomaterials and biomechanics, diagnostic procedures and treatment, and craniofacial growth, morphology, and genetics. In total, 2622 articles were examined, with 1785 included in the analysis. Univariate and multivariate logistic regression analyses were applied with statistical significance as the dependent variable, and whether the journal had an impact factor, the subject, and the year were the independent predictors. A higher percentage of articles showed significant results relative to those without significant associations (on average, 88% vs 12%) for those journals. Overall, these journals published significantly more studies with significant results, ranging from 75% to 90% (P = 0.02). Multivariate modeling showed that journals with impact factors had a 100% increased probability of publishing a statistically significant result compared with journals with no impact factor (odds ratio [OR], 1.99; 95% CI, 1.19-3.31). Compared with articles on biomaterials and biomechanics, all other subject categories showed lower probabilities of significant results. Nonsignificant findings in behavior and psychology and diagnosis and treatment were 1.8 (OR, 1.75; 95% CI, 1.51-2.67) and 3.5 (OR, 3.50; 95% CI, 2.27-5.37) times more likely to be published, respectively. Journals seem to prefer reporting significant results; this might be because of authors' perceptions of the importance of their findings and editors' and reviewers' preferences for significant results. The implication of this factor in the reliability of systematic reviews is discussed.
MGAS: a powerful tool for multivariate gene-based genome-wide association analysis.
Van der Sluis, Sophie; Dolan, Conor V; Li, Jiang; Song, Youqiang; Sham, Pak; Posthuma, Danielle; Li, Miao-Xin
2015-04-01
Standard genome-wide association studies, testing the association between one phenotype and a large number of single nucleotide polymorphisms (SNPs), are limited in two ways: (i) traits are often multivariate, and analysis of composite scores entails loss in statistical power and (ii) gene-based analyses may be preferred, e.g. to decrease the multiple testing problem. Here we present a new method, multivariate gene-based association test by extended Simes procedure (MGAS), that allows gene-based testing of multivariate phenotypes in unrelated individuals. Through extensive simulation, we show that under most trait-generating genotype-phenotype models MGAS has superior statistical power to detect associated genes compared with gene-based analyses of univariate phenotypic composite scores (i.e. GATES, multiple regression), and multivariate analysis of variance (MANOVA). Re-analysis of metabolic data revealed 32 False Discovery Rate controlled genome-wide significant genes, and 12 regions harboring multiple genes; of these 44 regions, 30 were not reported in the original analysis. MGAS allows researchers to conduct their multivariate gene-based analyses efficiently, and without the loss of power that is often associated with an incorrectly specified genotype-phenotype models. MGAS is freely available in KGG v3.0 (http://statgenpro.psychiatry.hku.hk/limx/kgg/download.php). Access to the metabolic dataset can be requested at dbGaP (https://dbgap.ncbi.nlm.nih.gov/). The R-simulation code is available from http://ctglab.nl/people/sophie_van_der_sluis. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
Appolloni, L; Sandulli, R; Vetrano, G; Russo, G F
2018-05-15
Marine Protected Areas are considered key tools for conservation of coastal ecosystems. However, many reserves are characterized by several problems mainly related to inadequate zonings that often do not protect high biodiversity and propagule supply areas precluding, at the same time, economic important zones for local interests. The Gulf of Naples is here employed as a study area to assess the effects of inclusion of different conservation features and costs in reserve design process. In particular eight scenarios are developed using graph theory to identify propagule source patches and fishing and exploitation activities as costs-in-use for local population. Scenarios elaborated by MARXAN, software commonly used for marine conservation planning, are compared using multivariate analyses (MDS, PERMANOVA and PERMDISP) in order to assess input data having greatest effects on protected areas selection. MARXAN is heuristic software able to give a number of different correct results, all of them near to the best solution. Its outputs show that the most important areas to be protected, in order to ensure long-term habitat life and adequate propagule supply, are mainly located around the Gulf islands. In addition through statistical analyses it allowed us to prove that different choices on conservation features lead to statistically different scenarios. The presence of propagule supply patches forces MARXAN to select almost the same areas to protect decreasingly different MARXAN results and, thus, choices for reserves area selection. The multivariate analyses applied here to marine spatial planning proved to be very helpful allowing to identify i) how different scenario input data affect MARXAN and ii) what features have to be taken into account in study areas characterized by peculiar biological and economic interests. Copyright © 2018 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Belianinov, Alex; Ganesh, Panchapakesan; Lin, Wenzhi; Sales, Brian C.; Sefat, Athena S.; Jesse, Stephen; Pan, Minghu; Kalinin, Sergei V.
2014-12-01
Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1-xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.
NASA Astrophysics Data System (ADS)
Brizzi, S.; Sandri, L.; Funiciello, F.; Corbi, F.; Piromallo, C.; Heuret, A.
2018-03-01
The observed maximum magnitude of subduction megathrust earthquakes is highly variable worldwide. One key question is which conditions, if any, favor the occurrence of giant earthquakes (Mw ≥ 8.5). Here we carry out a multivariate statistical study in order to investigate the factors affecting the maximum magnitude of subduction megathrust earthquakes. We find that the trench-parallel extent of subduction zones and the thickness of trench sediments provide the largest discriminating capability between subduction zones that have experienced giant earthquakes and those having significantly lower maximum magnitude. Monte Carlo simulations show that the observed spatial distribution of giant earthquakes cannot be explained by pure chance to a statistically significant level. We suggest that the combination of a long subduction zone with thick trench sediments likely promotes a great lateral rupture propagation, characteristic of almost all giant earthquakes.
Comparative Research of Navy Voluntary Education at Operational Commands
2017-03-01
return on investment, ROI, logistic regression, multivariate analysis, descriptive statistics, Markov, time-series, linear programming 15. NUMBER...21 B. DESCRIPTIVE STATISTICS TABLES ...............................................25 C. PRIVACY CONSIDERATIONS...THIS PAGE INTENTIONALLY LEFT BLANK xi LIST OF TABLES Table 1. Variables and Descriptions . Adapted from NETC (2016). .......................21
Spatial Dynamics and Determinants of County-Level Education Expenditure in China
ERIC Educational Resources Information Center
Gu, Jiafeng
2012-01-01
In this paper, a multivariate spatial autoregressive model of local public education expenditure determination with autoregressive disturbance is developed and estimated. The existence of spatial interdependence is tested using Moran's I statistic and Lagrange multiplier test statistics for both the spatial error and spatial lag models. The full…
ERIC Educational Resources Information Center
Henry, Gary T.; And Others
1992-01-01
A statistical technique is presented for developing performance standards based on benchmark groups. The benchmark groups are selected using a multivariate technique that relies on a squared Euclidean distance method. For each observation unit (a school district in the example), a unique comparison group is selected. (SLD)
Most analyses of daily time series epidemiology data relate mortality or morbidity counts to PM and other air pollutants by means of single-outcome regression models using multiple predictors, without taking into account the complex statistical structure of the predictor variable...
Challenging Conventional Wisdom for Multivariate Statistical Models with Small Samples
ERIC Educational Resources Information Center
McNeish, Daniel
2017-01-01
In education research, small samples are common because of financial limitations, logistical challenges, or exploratory studies. With small samples, statistical principles on which researchers rely do not hold, leading to trust issues with model estimates and possible replication issues when scaling up. Researchers are generally aware of such…
Cardiovascular reactivity patterns and pathways to hypertension: a multivariate cluster analysis.
Brindle, R C; Ginty, A T; Jones, A; Phillips, A C; Roseboom, T J; Carroll, D; Painter, R C; de Rooij, S R
2016-12-01
Substantial evidence links exaggerated mental stress induced blood pressure reactivity to future hypertension, but the results for heart rate reactivity are less clear. For this reason multivariate cluster analysis was carried out to examine the relationship between heart rate and blood pressure reactivity patterns and hypertension in a large prospective cohort (age range 55-60 years). Four clusters emerged with statistically different systolic and diastolic blood pressure and heart rate reactivity patterns. Cluster 1 was characterised by a relatively exaggerated blood pressure and heart rate response while the blood pressure and heart rate responses of cluster 2 were relatively modest and in line with the sample mean. Cluster 3 was characterised by blunted cardiovascular stress reactivity across all variables and cluster 4, by an exaggerated blood pressure response and modest heart rate response. Membership to cluster 4 conferred an increased risk of hypertension at 5-year follow-up (hazard ratio=2.98 (95% CI: 1.50-5.90), P<0.01) that survived adjustment for a host of potential confounding variables. These results suggest that the cardiac reactivity plays a potentially important role in the link between blood pressure reactivity and hypertension and support the use of multivariate approaches to stress psychophysiology.
Yan, Zhengbing; Kuang, Te-Hui; Yao, Yuan
2017-09-01
In recent years, multivariate statistical monitoring of batch processes has become a popular research topic, wherein multivariate fault isolation is an important step aiming at the identification of the faulty variables contributing most to the detected process abnormality. Although contribution plots have been commonly used in statistical fault isolation, such methods suffer from the smearing effect between correlated variables. In particular, in batch process monitoring, the high autocorrelations and cross-correlations that exist in variable trajectories make the smearing effect unavoidable. To address such a problem, a variable selection-based fault isolation method is proposed in this research, which transforms the fault isolation problem into a variable selection problem in partial least squares discriminant analysis and solves it by calculating a sparse partial least squares model. As different from the traditional methods, the proposed method emphasizes the relative importance of each process variable. Such information may help process engineers in conducting root-cause diagnosis. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Bludau, Sebastian; Bzdok, Danilo; Gruber, Oliver; Kohn, Nils; Riedl, Valentin; Sorg, Christian; Palomero-Gallagher, Nicola; Müller, Veronika I.; Hoffstaedter, Felix; Amunts, Katrin; Eickhoff, Simon B.
2017-01-01
Objective The heterogeneous human frontal pole has been identified as a node in the dysfunctional network of major depressive disorder. The contribution of the medial (socio-affective) versus lateral (cognitive) frontal pole to major depression pathogenesis is currently unclear. The present study performs morphometric comparison of the microstructurally informed subdivisions of human frontal pole between depressed patients and controls using both uni- and multivariate statistics. Methods Multi-site voxel- and region-based morphometric MRI analysis of 73 depressed patients and 73 matched controls without psychiatric history. Frontal pole volume was first compared between depressed patients and controls by subdivision-wise classical morphometric analysis. In a second approach, frontal pole volume was compared by subdivision-naive multivariate searchlight analysis based on support vector machines. Results Subdivision-wise morphometric analysis found a significantly smaller medial frontal pole in depressed patients with a negative correlation of disease severity and duration. Histologically uninformed multivariate voxel-wise statistics provided converging evidence for structural aberrations specific to the microstructurally defined medial area of the frontal pole in depressed patients. Conclusions Across disparate methods, we demonstrated subregion specificity in the left medial frontal pole volume in depressed patients. Indeed, the frontal pole was shown to structurally and functionally connect to other key regions in major depression pathology like the anterior cingulate cortex and the amygdala via the uncinate fasciculus. Present and previous findings consolidate the left medial portion of the frontal pole as particularly altered in major depression. PMID:26621569
Feinauer, Christoph; Procaccini, Andrea; Zecchina, Riccardo; Weigt, Martin; Pagnani, Andrea
2014-01-01
In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variable (amino-acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to the one achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partner in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code. PMID:24663061
Catelani, Tiago A; Santos, João Rodrigo; Páscoa, Ricardo N M J; Pezza, Leonardo; Pezza, Helena R; Lopes, João A
2018-03-01
This work proposes the use of near infrared (NIR) spectroscopy in diffuse reflectance mode and multivariate statistical process control (MSPC) based on principal component analysis (PCA) for real-time monitoring of the coffee roasting process. The main objective was the development of a MSPC methodology able to early detect disturbances to the roasting process resourcing to real-time acquisition of NIR spectra. A total of fifteen roasting batches were defined according to an experimental design to develop the MSPC models. This methodology was tested on a set of five batches where disturbances of different nature were imposed to simulate real faulty situations. Some of these batches were used to optimize the model while the remaining was used to test the methodology. A modelling strategy based on a time sliding window provided the best results in terms of distinguishing batches with and without disturbances, resourcing to typical MSPC charts: Hotelling's T 2 and squared predicted error statistics. A PCA model encompassing a time window of four minutes with three principal components was able to efficiently detect all disturbances assayed. NIR spectroscopy combined with the MSPC approach proved to be an adequate auxiliary tool for coffee roasters to detect faults in a conventional roasting process in real-time. Copyright © 2017 Elsevier B.V. All rights reserved.
Mayer, Brian P.; DeHope, Alan J.; Mew, Daniel A.; ...
2016-03-24
Attribution of the origin of an illicit drug relies on identification of compounds indicative of its clandestine production and is a key component of many modern forensic investigations. Here, the results of these studies can yield detailed information on method of manufacture, starting material source, and final product, all critical forensic evidence. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic fentanyl, N-(1-phenylethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods, all previously published fentanyl synthetic routes or hybrid versions thereof, were studied in an effort to identify and classify route-specific signatures. A total of 160 distinctmore » compounds and inorganic species were identified using gas and liquid chromatographies combined with mass spectrometric methods (gas chromatography/mass spectrometry (GC/MS) and liquid chromatography–tandem mass spectrometry-time of-flight (LC–MS/MS-TOF)) in conjunction with inductively coupled plasma mass spectrometry (ICPMS). The complexity of the resultant data matrix urged the use of multivariate statistical analysis. Using partial least-squares-discriminant analysis (PLS-DA), 87 route-specific CAS were classified and a statistical model capable of predicting the method of fentanyl synthesis was validated and tested against CAS profiles from crude fentanyl products deposited and later extracted from two operationally relevant surfaces: stainless steel and vinyl tile. Finally, this work provides the most detailed fentanyl CAS investigation to date by using orthogonal mass spectral data to identify CAS of forensic significance for illicit drug detection, profiling, and attribution.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mayer, Brian P.; DeHope, Alan J.; Mew, Daniel A.
Attribution of the origin of an illicit drug relies on identification of compounds indicative of its clandestine production and is a key component of many modern forensic investigations. Here, the results of these studies can yield detailed information on method of manufacture, starting material source, and final product, all critical forensic evidence. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic fentanyl, N-(1-phenylethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods, all previously published fentanyl synthetic routes or hybrid versions thereof, were studied in an effort to identify and classify route-specific signatures. A total of 160 distinctmore » compounds and inorganic species were identified using gas and liquid chromatographies combined with mass spectrometric methods (gas chromatography/mass spectrometry (GC/MS) and liquid chromatography–tandem mass spectrometry-time of-flight (LC–MS/MS-TOF)) in conjunction with inductively coupled plasma mass spectrometry (ICPMS). The complexity of the resultant data matrix urged the use of multivariate statistical analysis. Using partial least-squares-discriminant analysis (PLS-DA), 87 route-specific CAS were classified and a statistical model capable of predicting the method of fentanyl synthesis was validated and tested against CAS profiles from crude fentanyl products deposited and later extracted from two operationally relevant surfaces: stainless steel and vinyl tile. Finally, this work provides the most detailed fentanyl CAS investigation to date by using orthogonal mass spectral data to identify CAS of forensic significance for illicit drug detection, profiling, and attribution.« less
Kernel canonical-correlation Granger causality for multiple time series
NASA Astrophysics Data System (ADS)
Wu, Guorong; Duan, Xujun; Liao, Wei; Gao, Qing; Chen, Huafu
2011-04-01
Canonical-correlation analysis as a multivariate statistical technique has been applied to multivariate Granger causality analysis to infer information flow in complex systems. It shows unique appeal and great superiority over the traditional vector autoregressive method, due to the simplified procedure that detects causal interaction between multiple time series, and the avoidance of potential model estimation problems. However, it is limited to the linear case. Here, we extend the framework of canonical correlation to include the estimation of multivariate nonlinear Granger causality for drawing inference about directed interaction. Its feasibility and effectiveness are verified on simulated data.
Indelicato, Serena; Bongiorno, David; Tuzzolino, Nicola; Mannino, Maria Rosaria; Muscarella, Rosalia; Fradella, Pasquale; Gargano, Maria Elena; Nicosia, Salvatore; Ceraulo, Leopoldo
2018-03-14
Multivariate analysis was performed on a large data set of groundwater and leachate samples collected during 9 years of operation of the Bellolampo municipal solid waste landfill (located above Palermo, Italy). The aim was to obtain the most likely correlations among the data. The analysis results are presented. Groundwater samples were collected in the period 2004-2013, whereas the leachate analysis refers to the period 2006-2013. For groundwater, statistical data evaluation revealed notable differences among the samples taken from the numerous wells located around the landfill. Characteristic parameters revealed by principal component analysis (PCA) were more deeply investigated, and corresponding thematic maps were drawn. The composition of the leachate was also thoroughly investigated. Several chemical macro-descriptors were calculated, and the results are presented. A comparison of PCA results for the leachate and groundwater data clearly reveals that the groundwater's main components substantially differ from those of the leachate. This outcome strongly suggests excluding leachate permeation through the multiple landfill lining.
Defining the ecological hydrology of Taiwan Rivers using multivariate statistical methods
NASA Astrophysics Data System (ADS)
Chang, Fi-John; Wu, Tzu-Ching; Tsai, Wen-Ping; Herricks, Edwin E.
2009-09-01
SummaryThe identification and verification of ecohydrologic flow indicators has found new support as the importance of ecological flow regimes is recognized in modern water resources management, particularly in river restoration and reservoir management. An ecohydrologic indicator system reflecting the unique characteristics of Taiwan's water resources and hydrology has been developed, the Taiwan ecohydrological indicator system (TEIS). A major challenge for the water resources community is using the TEIS to provide environmental flow rules that improve existing water resources management. This paper examines data from the extensive network of flow monitoring stations in Taiwan using TEIS statistics to define and refine environmental flow options in Taiwan. Multivariate statistical methods were used to examine TEIS statistics for 102 stations representing the geographic and land use diversity of Taiwan. The Pearson correlation coefficient showed high multicollinearity between the TEIS statistics. Watersheds were separated into upper and lower-watershed locations. An analysis of variance indicated significant differences between upstream, more natural, and downstream, more developed, locations in the same basin with hydrologic indicator redundancy in flow change and magnitude statistics. Issues of multicollinearity were examined using a Principal Component Analysis (PCA) with the first three components related to general flow and high/low flow statistics, frequency and time statistics, and quantity statistics. These principle components would explain about 85% of the total variation. A major conclusion is that managers must be aware of differences among basins, as well as differences within basins that will require careful selection of management procedures to achieve needed flow regimes.
NASA Astrophysics Data System (ADS)
Most, Sebastian; Nowak, Wolfgang; Bijeljic, Branko
2015-04-01
Fickian transport in groundwater flow is the exception rather than the rule. Transport in porous media is frequently simulated via particle methods (i.e. particle tracking random walk (PTRW) or continuous time random walk (CTRW)). These methods formulate transport as a stochastic process of particle position increments. At the pore scale, geometry and micro-heterogeneities prohibit the commonly made assumption of independent and normally distributed increments to represent dispersion. Many recent particle methods seek to loosen this assumption. Hence, it is important to get a better understanding of the processes at pore scale. For our analysis we track the positions of 10.000 particles migrating through the pore space over time. The data we use come from micro CT scans of a homogeneous sandstone and encompass about 10 grain sizes. Based on those images we discretize the pore structure and simulate flow at the pore scale based on the Navier-Stokes equation. This flow field realistically describes flow inside the pore space and we do not need to add artificial dispersion during the transport simulation. Next, we use particle tracking random walk and simulate pore-scale transport. Finally, we use the obtained particle trajectories to do a multivariate statistical analysis of the particle motion at the pore scale. Our analysis is based on copulas. Every multivariate joint distribution is a combination of its univariate marginal distributions. The copula represents the dependence structure of those univariate marginals and is therefore useful to observe correlation and non-Gaussian interactions (i.e. non-Fickian transport). The first goal of this analysis is to better understand the validity regions of commonly made assumptions. We are investigating three different transport distances: 1) The distance where the statistical dependence between particle increments can be modelled as an order-one Markov process. This would be the Markovian distance for the process, where the validity of yet-unexplored non-Gaussian-but-Markovian random walks start. 2) The distance where bivariate statistical dependence simplifies to a multi-Gaussian dependence based on simple linear correlation (validity of correlated PTRW/CTRW). 3) The distance of complete statistical independence (validity of classical PTRW/CTRW). The second objective is to reveal characteristic dependencies influencing transport the most. Those dependencies can be very complex. Copulas are highly capable of representing linear dependence as well as non-linear dependence. With that tool we are able to detect persistent characteristics dominating transport even across different scales. The results derived from our experimental data set suggest that there are many more non-Fickian aspects of pore-scale transport than the univariate statistics of longitudinal displacements. Non-Fickianity can also be found in transverse displacements, and in the relations between increments at different time steps. Also, the found dependence is non-linear (i.e. beyond simple correlation) and persists over long distances. Thus, our results strongly support the further refinement of techniques like correlated PTRW or correlated CTRW towards non-linear statistical relations.
Mathematical background and attitudes toward statistics in a sample of Spanish college students.
Carmona, José; Martínez, Rafael J; Sánchez, Manuel
2005-08-01
To examine the relation of mathematical background and initial attitudes toward statistics of Spanish college students in social sciences the Survey of Attitudes Toward Statistics was given to 827 students. Multivariate analyses tested the effects of two indicators of mathematical background (amount of exposure and achievement in previous courses) on the four subscales. Analysis suggested grades in previous courses are more related to initial attitudes toward statistics than the number of mathematics courses taken. Mathematical background was related with students' affective responses to statistics but not with their valuing of statistics. Implications of possible research are discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ladd-Lively, Jennifer L
2014-01-01
The objective of this work was to determine the feasibility of using on-line multivariate statistical process control (MSPC) for safeguards applications in natural uranium conversion plants. Multivariate statistical process control is commonly used throughout industry for the detection of faults. For safeguards applications in uranium conversion plants, faults could include the diversion of intermediate products such as uranium dioxide, uranium tetrafluoride, and uranium hexafluoride. This study was limited to a 100 metric ton of uranium (MTU) per year natural uranium conversion plant (NUCP) using the wet solvent extraction method for the purification of uranium ore concentrate. A key component inmore » the multivariate statistical methodology is the Principal Component Analysis (PCA) approach for the analysis of data, development of the base case model, and evaluation of future operations. The PCA approach was implemented through the use of singular value decomposition of the data matrix where the data matrix represents normal operation of the plant. Component mole balances were used to model each of the process units in the NUCP. However, this approach could be applied to any data set. The monitoring framework developed in this research could be used to determine whether or not a diversion of material has occurred at an NUCP as part of an International Atomic Energy Agency (IAEA) safeguards system. This approach can be used to identify the key monitoring locations, as well as locations where monitoring is unimportant. Detection limits at the key monitoring locations can also be established using this technique. Several faulty scenarios were developed to test the monitoring framework after the base case or normal operating conditions of the PCA model were established. In all of the scenarios, the monitoring framework was able to detect the fault. Overall this study was successful at meeting the stated objective.« less
Integrated environmental monitoring and multivariate data analysis-A case study.
Eide, Ingvar; Westad, Frank; Nilssen, Ingunn; de Freitas, Felipe Sales; Dos Santos, Natalia Gomes; Dos Santos, Francisco; Cabral, Marcelo Montenegro; Bicego, Marcia Caruso; Figueira, Rubens; Johnsen, Ståle
2017-03-01
The present article describes integration of environmental monitoring and discharge data and interpretation using multivariate statistics, principal component analysis (PCA), and partial least squares (PLS) regression. The monitoring was carried out at the Peregrino oil field off the coast of Brazil. One sensor platform and 3 sediment traps were placed on the seabed. The sensors measured current speed and direction, turbidity, temperature, and conductivity. The sediment trap samples were used to determine suspended particulate matter that was characterized with respect to a number of chemical parameters (26 alkanes, 16 PAHs, N, C, calcium carbonate, and Ba). Data on discharges of drill cuttings and water-based drilling fluid were provided on a daily basis. The monitoring was carried out during 7 campaigns from June 2010 to October 2012, each lasting 2 to 3 months due to the capacity of the sediment traps. The data from the campaigns were preprocessed, combined, and interpreted using multivariate statistics. No systematic difference could be observed between campaigns or traps despite the fact that the first campaign was carried out before drilling, and 1 of 3 sediment traps was located in an area not expected to be influenced by the discharges. There was a strong covariation between suspended particulate matter and total N and organic C suggesting that the majority of the sediment samples had a natural and biogenic origin. Furthermore, the multivariate regression showed no correlation between discharges of drill cuttings and sediment trap or turbidity data taking current speed and direction into consideration. Because of this lack of correlation with discharges from the drilling location, a more detailed evaluation of chemical indicators providing information about origin was carried out in addition to numerical modeling of dispersion and deposition. The chemical indicators and the modeling of dispersion and deposition support the conclusions from the multivariate statistics. Integr Environ Assess Manag 2017;13:387-395. © 2016 SETAC. © 2016 SETAC.
Multivariate Methods for Meta-Analysis of Genetic Association Studies.
Dimou, Niki L; Pantavou, Katerina G; Braliou, Georgia G; Bagos, Pantelis G
2018-01-01
Multivariate meta-analysis of genetic association studies and genome-wide association studies has received a remarkable attention as it improves the precision of the analysis. Here, we review, summarize and present in a unified framework methods for multivariate meta-analysis of genetic association studies and genome-wide association studies. Starting with the statistical methods used for robust analysis and genetic model selection, we present in brief univariate methods for meta-analysis and we then scrutinize multivariate methodologies. Multivariate models of meta-analysis for a single gene-disease association studies, including models for haplotype association studies, multiple linked polymorphisms and multiple outcomes are discussed. The popular Mendelian randomization approach and special cases of meta-analysis addressing issues such as the assumption of the mode of inheritance, deviation from Hardy-Weinberg Equilibrium and gene-environment interactions are also presented. All available methods are enriched with practical applications and methodologies that could be developed in the future are discussed. Links for all available software implementing multivariate meta-analysis methods are also provided.
NASA Astrophysics Data System (ADS)
Baker, Paul T.; Caudill, Sarah; Hodge, Kari A.; Talukder, Dipongkar; Capano, Collin; Cornish, Neil J.
2015-03-01
Searches for gravitational waves produced by coalescing black hole binaries with total masses ≳25 M⊙ use matched filtering with templates of short duration. Non-Gaussian noise bursts in gravitational wave detector data can mimic short signals and limit the sensitivity of these searches. Previous searches have relied on empirically designed statistics incorporating signal-to-noise ratio and signal-based vetoes to separate gravitational wave candidates from noise candidates. We report on sensitivity improvements achieved using a multivariate candidate ranking statistic derived from a supervised machine learning algorithm. We apply the random forest of bagged decision trees technique to two separate searches in the high mass (≳25 M⊙ ) parameter space. For a search which is sensitive to gravitational waves from the inspiral, merger, and ringdown of binary black holes with total mass between 25 M⊙ and 100 M⊙ , we find sensitive volume improvements as high as 70±13%-109±11% when compared to the previously used ranking statistic. For a ringdown-only search which is sensitive to gravitational waves from the resultant perturbed intermediate mass black hole with mass roughly between 10 M⊙ and 600 M⊙ , we find sensitive volume improvements as high as 61±4%-241±12% when compared to the previously used ranking statistic. We also report how sensitivity improvements can differ depending on mass regime, mass ratio, and available data quality information. Finally, we describe the techniques used to tune and train the random forest classifier that can be generalized to its use in other searches for gravitational waves.
NASA Astrophysics Data System (ADS)
Vrac, Mathieu
2018-06-01
Climate simulations often suffer from statistical biases with respect to observations or reanalyses. It is therefore common to correct (or adjust) those simulations before using them as inputs into impact models. However, most bias correction (BC) methods are univariate and so do not account for the statistical dependences linking the different locations and/or physical variables of interest. In addition, they are often deterministic, and stochasticity is frequently needed to investigate climate uncertainty and to add constrained randomness to climate simulations that do not possess a realistic variability. This study presents a multivariate method of rank resampling for distributions and dependences (R2D2) bias correction allowing one to adjust not only the univariate distributions but also their inter-variable and inter-site dependence structures. Moreover, the proposed R2D2 method provides some stochasticity since it can generate as many multivariate corrected outputs as the number of statistical dimensions (i.e., number of grid cell × number of climate variables) of the simulations to be corrected. It is based on an assumption of stability in time of the dependence structure - making it possible to deal with a high number of statistical dimensions - that lets the climate model drive the temporal properties and their changes in time. R2D2 is applied on temperature and precipitation reanalysis time series with respect to high-resolution reference data over the southeast of France (1506 grid cell). Bivariate, 1506-dimensional and 3012-dimensional versions of R2D2 are tested over a historical period and compared to a univariate BC. How the different BC methods behave in a climate change context is also illustrated with an application to regional climate simulations over the 2071-2100 period. The results indicate that the 1d-BC basically reproduces the climate model multivariate properties, 2d-R2D2 is only satisfying in the inter-variable context, 1506d-R2D2 strongly improves inter-site properties and 3012d-R2D2 is able to account for both. Applications of the proposed R2D2 method to various climate datasets are relevant for many impact studies. The perspectives of improvements are numerous, such as introducing stochasticity in the dependence itself, questioning its stability assumption, and accounting for temporal properties adjustment while including more physics in the adjustment procedures.
Warton, David I; Thibaut, Loïc; Wang, Yi Alice
2017-01-01
Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)-common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of "model-free bootstrap", adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods.
Thibaut, Loïc; Wang, Yi Alice
2017-01-01
Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)—common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of “model-free bootstrap”, adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods. PMID:28738071
Multivariate pattern dependence
Saxe, Rebecca
2017-01-01
When we perform a cognitive task, multiple brain regions are engaged. Understanding how these regions interact is a fundamental step to uncover the neural bases of behavior. Most research on the interactions between brain regions has focused on the univariate responses in the regions. However, fine grained patterns of response encode important information, as shown by multivariate pattern analysis. In the present article, we introduce and apply multivariate pattern dependence (MVPD): a technique to study the statistical dependence between brain regions in humans in terms of the multivariate relations between their patterns of responses. MVPD characterizes the responses in each brain region as trajectories in region-specific multidimensional spaces, and models the multivariate relationship between these trajectories. We applied MVPD to the posterior superior temporal sulcus (pSTS) and to the fusiform face area (FFA), using a searchlight approach to reveal interactions between these seed regions and the rest of the brain. Across two different experiments, MVPD identified significant statistical dependence not detected by standard functional connectivity. Additionally, MVPD outperformed univariate connectivity in its ability to explain independent variance in the responses of individual voxels. In the end, MVPD uncovered different connectivity profiles associated with different representational subspaces of FFA: the first principal component of FFA shows differential connectivity with occipital and parietal regions implicated in the processing of low-level properties of faces, while the second and third components show differential connectivity with anterior temporal regions implicated in the processing of invariant representations of face identity. PMID:29155809
Statistical Knowledge for Teaching: Exploring it in the Classroom
ERIC Educational Resources Information Center
Burgess, Tim
2009-01-01
This paper first reports on the methodology of a study of teacher knowledge for statistics, conducted in a classroom at the primary school level. The methodology included videotaping of a sequence of lessons that involved students in investigating multivariate data sets, followed up by audiotaped interviews with each teacher. These stimulated…
Performance of the S - [chi][squared] Statistic for Full-Information Bifactor Models
ERIC Educational Resources Information Center
Li, Ying; Rupp, Andre A.
2011-01-01
This study investigated the Type I error rate and power of the multivariate extension of the S - [chi][squared] statistic using unidimensional and multidimensional item response theory (UIRT and MIRT, respectively) models as well as full-information bifactor (FI-bifactor) models through simulation. Manipulated factors included test length, sample…
2003-07-01
4, Gnanadesikan , 1977). An entity whose measured features fall into one of the regions is classified accordingly. For the approaches we discuss here... Gnanadesikan , R. 1977. Methods for Statistical Data Analysis of Multivariate Observations. John Wiley & Sons, New York. Hassig, N. L., O’Brien, R. F
Conceptual and statistical problems associated with the use of diversity indices in ecology.
Barrantes, Gilbert; Sandoval, Luis
2009-09-01
Diversity indices, particularly the Shannon-Wiener index, have extensively been used in analyzing patterns of diversity at different geographic and ecological scales. These indices have serious conceptual and statistical problems which make comparisons of species richness or species abundances across communities nearly impossible. There is often no a single statistical method that retains all information needed to answer even a simple question. However, multivariate analyses could be used instead of diversity indices, such as cluster analyses or multiple regressions. More complex multivariate analyses, such as Canonical Correspondence Analysis, provide very valuable information on environmental variables associated to the presence and abundance of the species in a community. In addition, particular hypotheses associated to changes in species richness across localities, or change in abundance of one, or a group of species can be tested using univariate, bivariate, and/or rarefaction statistical tests. The rarefaction method has proved to be robust to standardize all samples to a common size. Even the simplest method as reporting the number of species per taxonomic category possibly provides more information than a diversity index value.
Exploratory Multivariate Analysis. A Graphical Approach.
1981-01-01
Gnanadesikan , 1977) but we feel that these should be used with great caution unless one really has good reason to believe that the data came from such a...are referred to Gnanadesikan (1977). The present author hopes that the convenience of a single summary or significance level will not deter his readers...fit of a harmonic model to meteorological data. (In preparation). Gnanadesikan , R. (1977). Methods for Statistical Data Analysis of Multivariate
Nonlinear multivariate and time series analysis by neural network methods
NASA Astrophysics Data System (ADS)
Hsieh, William W.
2004-03-01
Methods in multivariate statistical analysis are essential for working with large amounts of geophysical data, data from observational arrays, from satellites, or from numerical model output. In classical multivariate statistical analysis, there is a hierarchy of methods, starting with linear regression at the base, followed by principal component analysis (PCA) and finally canonical correlation analysis (CCA). A multivariate time series method, the singular spectrum analysis (SSA), has been a fruitful extension of the PCA technique. The common drawback of these classical methods is that only linear structures can be correctly extracted from the data. Since the late 1980s, neural network methods have become popular for performing nonlinear regression and classification. More recently, neural network methods have been extended to perform nonlinear PCA (NLPCA), nonlinear CCA (NLCCA), and nonlinear SSA (NLSSA). This paper presents a unified view of the NLPCA, NLCCA, and NLSSA techniques and their applications to various data sets of the atmosphere and the ocean (especially for the El Niño-Southern Oscillation and the stratospheric quasi-biennial oscillation). These data sets reveal that the linear methods are often too simplistic to describe real-world systems, with a tendency to scatter a single oscillatory phenomenon into numerous unphysical modes or higher harmonics, which can be largely alleviated in the new nonlinear paradigm.
Multivariate analysis of cytokine profiles in pregnancy complications.
Azizieh, Fawaz; Dingle, Kamaludin; Raghupathy, Raj; Johnson, Kjell; VanderPlas, Jacob; Ansari, Ali
2018-03-01
The immunoregulation to tolerate the semiallogeneic fetus during pregnancy includes a harmonious dynamic balance between anti- and pro-inflammatory cytokines. Several earlier studies reported significantly different levels and/or ratios of several cytokines in complicated pregnancy as compared to normal pregnancy. However, as cytokines operate in networks with potentially complex interactions, it is also interesting to compare groups with multi-cytokine data sets, with multivariate analysis. Such analysis will further examine how great the differences are, and which cytokines are more different than others. Various multivariate statistical tools, such as Cramer test, classification and regression trees, partial least squares regression figures, 2-dimensional Kolmogorov-Smirmov test, principal component analysis and gap statistic, were used to compare cytokine data of normal vs anomalous groups of different pregnancy complications. Multivariate analysis assisted in examining if the groups were different, how strongly they differed, in what ways they differed and further reported evidence for subgroups in 1 group (pregnancy-induced hypertension), possibly indicating multiple causes for the complication. This work contributes to a better understanding of cytokines interaction and may have important implications on targeting cytokine balance modulation or design of future medications or interventions that best direct management or prevention from an immunological approach. © 2018 The Authors. American Journal of Reproductive Immunology Published by John Wiley & Sons Ltd.
Is fertility falling in Zimbabwe?
Udjo, E O
1996-01-01
With an unequalled contraceptive prevalence rate in sub-Saharan Africa, of 43% among currently married women in Zimbabwe, the Central Statistical Office (1989) observed that fertility has declined sharply in recent years. Using data from several surveys on Zimbabwe, especially the birth histories of the Zimbabwe Demographic and Health Survey, this study examines fertility trends in Zimbabwe. The results show that the fertility decline in Zimbabwe is modest and that the decline is concentrated among high order births. Multivariate analysis did not show a statistically significant effect of contraception on fertility, partly because a high proportion of Zimbabwean women in the reproductive age group never use contraception due to prevailing pronatalist attitudes in the country.
Pepi, Salvatore; Coletta, Antonio; Crupi, Pasquale; Leis, Marilena; Russo, Sabrina; Sansone, Luigi; Tassinari, Renzo; Chicca, Milvia; Vaccaro, Carmela
2016-04-01
The present geochemical study concerns the impact of viticultural practices in the chemical composition of the grape cultivar "Negroamaro" in Apulia, a southern Italian region renowned for its quality wine. Three types of soil management (SM), two cover cropping with different mixtures, and a soil tillage were considered. For each SM, the vines were irrigated according to two irrigation levels. Chemical composition of soil and of berries of Vitis vinifera cultivar "Negroamaro" were analyzed by X-ray fluorescence, inductively coupled plasma-mass spectrometry and multivariate statistics (linear discrimination analysis). In detail, we investigated major and trace elements behavior in the soil according to irrigation levels, the related index of bioaccumulation (BA) and the relationship between trace element concentration and soil management in "Negroamaro" grapes. The results indicate that soil management affects the mobility of major and trace elements. A specific assimilation of these elements in grapes from vines grown under different soil management was confirmed by BA. Multivariate statistics allowed to associate the vines to the type of soil management. This geochemical characterization of elements could be useful to develop fingerprints of vines of the cultivar "Negroamaro" according to soil management and geographical origin.
1H NMR-based metabolic profiling for evaluating poppy seed rancidity and brewing.
Jawień, Ewa; Ząbek, Adam; Deja, Stanisław; Łukaszewicz, Marcin; Młynarz, Piotr
2015-12-01
Poppy seeds are widely used in household and commercial confectionery. The aim of this study was to demonstrate the application of metabolic profiling for industrial monitoring of the molecular changes which occur during minced poppy seed rancidity and brewing processes performed on raw seeds. Both forms of poppy seeds were obtained from a confectionery company. Proton nuclear magnetic resonance (1H NMR) was applied as the analytical method of choice together with multivariate statistical data analysis. Metabolic fingerprinting was applied as a bioprocess control tool to monitor rancidity with the trajectory of change and brewing progressions. Low molecular weight compounds were found to be statistically significant biomarkers of these bioprocesses. Changes in concentrations of chemical compounds were explained relative to the biochemical processes and external conditions. The obtained results provide valuable and comprehensive information to gain a better understanding of the biology of rancidity and brewing processes, while demonstrating the potential for applying NMR spectroscopy combined with multivariate data analysis tools for quality control in food industries involved in the processing of oilseeds. This precious and versatile information gives a better understanding of the biology of these processes.
Beyond Reading Alone: The Relationship Between Aural Literacy And Asthma Management
Rosenfeld, Lindsay; Rudd, Rima; Emmons, Karen M.; Acevedo-García, Dolores; Martin, Laurie; Buka, Stephen
2010-01-01
Objectives To examine the relationship between literacy and asthma management with a focus on the oral exchange. Methods Study participants, all of whom reported asthma, were drawn from the New England Family Study (NEFS), an examination of links between education and health. NEFS data included reading, oral (speaking), and aural (listening) literacy measures. An additional survey was conducted with this group of study participants related to asthma issues, particularly asthma management. Data analysis focused on bivariate and multivariable logistic regression. Results In bivariate logistic regression models exploring aural literacy, there was a statistically significant association between those participants with lower aural literacy skills and less successful asthma management (OR:4.37, 95%CI:1.11, 17.32). In multivariable logistic regression analyses, controlling for gender, income, and race in separate models (one-at-a-time), there remained a statistically significant association between those participants with lower aural literacy skills and less successful asthma management. Conclusion Lower aural literacy skills seem to complicate asthma management capabilities. Practice Implications Greater attention to the oral exchange, in particular the listening skills highlighted by aural literacy, as well as other related literacy skills may help us develop strategies for clear communication related to asthma management. PMID:20399060
Valverde-Som, Lucia; Ruiz-Samblás, Cristina; Rodríguez-García, Francisco P; Cuadros-Rodríguez, Luis
2018-02-09
Virgin olive oil is the only food product for which sensory analysis is regulated to classify it in different quality categories. To harmonize the results of the sensorial method, the use of standards or reference materials is crucial. The stability of sensory reference materials is required to enable their suitable control, aiming to confirm that their specific target values are maintained on an ongoing basis. Currently, such stability is monitored by means of sensory analysis and the sensory panels are in the paradoxical situation of controlling the standards that are devoted to controlling the panels. In the present study, several approaches based on similarity analysis are exploited. For each approach, the specific methodology to build a proper multivariate control chart to monitor the stability of the sensory properties is explained and discussed. The normalized Euclidean and Mahalanobis distances, the so-called nearness and hardiness indices respectively, have been defined as new similarity indices to range the values from 0 to 1. Also, the squared mean from Hotelling's T 2 -statistic and Q 2 -statistic has been proposed as another similarity index. © 2018 Society of Chemical Industry. © 2018 Society of Chemical Industry.
Rathi, Monika; Ahrenkiel, S P; Carapella, J J; Wanlass, M W
2013-02-01
Given an unknown multicomponent alloy, and a set of standard compounds or alloys of known composition, can one improve upon popular standards-based methods for energy dispersive X-ray (EDX) spectrometry to quantify the elemental composition of the unknown specimen? A method is presented here for determining elemental composition of alloys using transmission electron microscopy-based EDX with appropriate standards. The method begins with a discrete set of related reference standards of known composition, applies multivariate statistical analysis to those spectra, and evaluates the compositions with a linear matrix algebra method to relate the spectra to elemental composition. By using associated standards, only limited assumptions about the physical origins of the EDX spectra are needed. Spectral absorption corrections can be performed by providing an estimate of the foil thickness of one or more reference standards. The technique was applied to III-V multicomponent alloy thin films: composition and foil thickness were determined for various III-V alloys. The results were then validated by comparing with X-ray diffraction and photoluminescence analysis, demonstrating accuracy of approximately 1% in atomic fraction.
Stekolnikov, Alexandr A; Klimov, Pavel B
2010-09-01
We revise chiggers belonging to the minuta-species group (genus Neotrombicula Hirst, 1925) from the Palaearctic using size-free multivariate morphometrics. This approach allowed us to resolve several diagnostic problems. We show that the widely distributed Neotrombicula scrupulosa Kudryashova, 1993 forms three spatially and ecologically isolated groups different from each other in size or shape (morphometric property) only: specimens from the Caucasus are distinct from those from Asia in shape, whereas the Asian specimens from plains and mountains are different from each other in size. We developed a multivariate classification model to separate three closely related species: N. scrupulosa, N. lubrica Kudryashova, 1993 and N. minuta Schluger, 1966. This model is based on five shape variables selected from an initial 17 variables by a best subset analysis using a custom size-correction subroutine. The variable selection procedure slightly improved the predictive power of the model, suggesting that it not only removed redundancy but also reduced 'noise' in the dataset. The overall classification accuracy of this model is 96.2, 96.2 and 95.5%, as estimated by internal validation, external validation and jackknife statistics, respectively. Our analyses resulted in one new synonymy: N. dimidiata Stekolnikov, 1995 is considered to be a synonym of N. lubrica. Both N. scrupulosa and N. lubrica are recorded from new localities. A key to species of the minuta-group incorporating results from our multivariate analyses is presented.
Robust tests for multivariate factorial designs under heteroscedasticity.
Vallejo, Guillermo; Ato, Manuel
2012-06-01
The question of how to analyze several multivariate normal mean vectors when normality and covariance homogeneity assumptions are violated is considered in this article. For the two-way MANOVA layout, we address this problem adapting results presented by Brunner, Dette, and Munk (BDM; 1997) and Vallejo and Ato (modified Brown-Forsythe [MBF]; 2006) in the context of univariate factorial and split-plot designs and a multivariate version of the linear model (MLM) to accommodate heterogeneous data. Furthermore, we compare these procedures with the Welch-James (WJ) approximate degrees of freedom multivariate statistics based on ordinary least squares via Monte Carlo simulation. Our numerical studies show that of the methods evaluated, only the modified versions of the BDM and MBF procedures were robust to violations of underlying assumptions. The MLM approach was only occasionally liberal, and then by only a small amount, whereas the WJ procedure was often liberal if the interactive effects were involved in the design, particularly when the number of dependent variables increased and total sample size was small. On the other hand, it was also found that the MLM procedure was uniformly more powerful than its most direct competitors. The overall success rate was 22.4% for the BDM, 36.3% for the MBF, and 45.0% for the MLM.
Linear Least Squares for Correlated Data
NASA Technical Reports Server (NTRS)
Dean, Edwin B.
1988-01-01
Throughout the literature authors have consistently discussed the suspicion that regression results were less than satisfactory when the independent variables were correlated. Camm, Gulledge, and Womer, and Womer and Marcotte provide excellent applied examples of these concerns. Many authors have obtained partial solutions for this problem as discussed by Womer and Marcotte and Wonnacott and Wonnacott, which result in generalized least squares algorithms to solve restrictive cases. This paper presents a simple but relatively general multivariate method for obtaining linear least squares coefficients which are free of the statistical distortion created by correlated independent variables.
Lotan, Tamara L.; Wei, Wei; Morais, Carlos L.; Hawley, Sarah T.; Fazli, Ladan; Hurtado-Coll, Antonio; Troyer, Dean; McKenney, Jesse K.; Simko, Jeffrey; Carroll, Peter R.; Gleave, Martin; Lance, Raymond; Lin, Daniel W.; Nelson, Peter S.; Thompson, Ian M.; True, Lawrence D.; Feng, Ziding; Brooks, James D.
2015-01-01
Background PTEN is the most commonly deleted tumor suppressor gene in primary prostate cancer (PCa) and its loss is associated with poor clinical outcomes and ERG gene rearrangement. Objective We tested whether PTEN loss is associated with shorter recurrence-free survival (RFS) in surgically treated PCa patients with known ERG status. Design, setting, and participants A genetically validated, automated PTEN immunohistochemistry (IHC) protocol was used for 1275 primary prostate tumors from the Canary Foundation retrospective PCa tissue microarray cohort to assess homogeneous (in all tumor tissue sampled) or heterogeneous (in a subset of tumor tissue sampled) PTEN loss. ERG status as determined by a genetically validated IHC assay was available for a subset of 938 tumors. Outcome measurements and statistical analysis Associations between PTEN and ERG status were assessed using Fisher’s exact test. Kaplan-Meier and multivariate weighted Cox proportional models for RFS were constructed. Results and limitations When compared to intact PTEN, homogeneous (hazard ratio [HR] 1.66, p = 0.001) but not heterogeneous (HR 1.24, p = 0.14) PTEN loss was significantly associated with shorter RFS in multivariate models. Among ERG-positive tumors, homogeneous (HR 3.07, p < 0.0001) but not heterogeneous (HR 1.46, p = 0.10) PTEN loss was significantly associated with shorter RFS. Among ERG-negative tumors, PTEN did not reach significance for inclusion in the final multivariate models. The interaction term for PTEN and ERG status with respect to RFS did not reach statistical significance (p = 0.11) for the current sample size. Conclusions These data suggest that PTEN is a useful prognostic biomarker and that there is no statistically significant interaction between PTEN and ERG status for RFS. Patient summary We found that loss of the PTEN tumor suppressor gene in prostate tumors as assessed by tissue staining is correlated with shorter time to prostate cancer recurrence after radical prostatectomy. PMID:27617307
Hamchevici, Carmen; Udrea, Ion
2013-11-01
The concept of basin-wide Joint Danube Survey (JDS) was launched by the International Commission for the Protection of the Danube River (ICPDR) as a tool for investigative monitoring under the Water Framework Directive (WFD), with a frequency of 6 years. The first JDS was carried out in 2001 and its success in providing key information for characterisation of the Danube River Basin District as required by WFD lead to the organisation of the second JDS in 2007, which was the world's biggest river research expedition in that year. The present paper presents an approach for improving the survey strategy for the next planned survey JDS3 (2013) by means of several multivariate statistical techniques. In order to design the optimum structure in terms of parameters and sampling sites, principal component analysis (PCA), factor analysis (FA) and cluster analysis were applied on JDS2 data for 13 selected physico-chemical and one biological element measured in 78 sampling sites located on the main course of the Danube. Results from PCA/FA showed that most of the dataset variance (above 75%) was explained by five varifactors loaded with 8 out of 14 variables: physical (transparency and total suspended solids), relevant nutrients (N-nitrates and P-orthophosphates), feedback effects of primary production (pH, alkalinity and dissolved oxygen) and algal biomass. Taking into account the representation of the factor scores given by FA versus sampling sites and the major groups generated by the clustering procedure, the spatial network of the next survey could be carefully tailored, leading to a decreasing of sampling sites by more than 30%. The approach of target oriented sampling strategy based on the selected multivariate statistics can provide a strong reduction in dimensionality of the original data and corresponding costs as well, without any loss of information.
Wang, Longfei; Lee, Sungyoung; Gim, Jungsoo; Qiao, Dandi; Cho, Michael; Elston, Robert C; Silverman, Edwin K; Won, Sungho
2016-09-01
Family-based designs have been repeatedly shown to be powerful in detecting the significant rare variants associated with human diseases. Furthermore, human diseases are often defined by the outcomes of multiple phenotypes, and thus we expect multivariate family-based analyses may be very efficient in detecting associations with rare variants. However, few statistical methods implementing this strategy have been developed for family-based designs. In this report, we describe one such implementation: the multivariate family-based rare variant association tool (mFARVAT). mFARVAT is a quasi-likelihood-based score test for rare variant association analysis with multiple phenotypes, and tests both homogeneous and heterogeneous effects of each variant on multiple phenotypes. Simulation results show that the proposed method is generally robust and efficient for various disease models, and we identify some promising candidate genes associated with chronic obstructive pulmonary disease. The software of mFARVAT is freely available at http://healthstat.snu.ac.kr/software/mfarvat/, implemented in C++ and supported on Linux and MS Windows. © 2016 WILEY PERIODICALS, INC.
Proton radius from electron scattering data
NASA Astrophysics Data System (ADS)
Higinbotham, Douglas W.; Kabir, Al Amin; Lin, Vincent; Meekins, David; Norum, Blaine; Sawatzky, Brad
2016-05-01
Background: The proton charge radius extracted from recent muonic hydrogen Lamb shift measurements is significantly smaller than that extracted from atomic hydrogen and electron scattering measurements. The discrepancy has become known as the proton radius puzzle. Purpose: In an attempt to understand the discrepancy, we review high-precision electron scattering results from Mainz, Jefferson Lab, Saskatoon, and Stanford. Methods: We make use of stepwise regression techniques using the F test as well as the Akaike information criterion to systematically determine the predictive variables to use for a given set and range of electron scattering data as well as to provide multivariate error estimates. Results: Starting with the precision, low four-momentum transfer (Q2) data from Mainz (1980) and Saskatoon (1974), we find that a stepwise regression of the Maclaurin series using the F test as well as the Akaike information criterion justify using a linear extrapolation which yields a value for the proton radius that is consistent with the result obtained from muonic hydrogen measurements. Applying the same Maclaurin series and statistical criteria to the 2014 Rosenbluth results on GE from Mainz, we again find that the stepwise regression tends to favor a radius consistent with the muonic hydrogen radius but produces results that are extremely sensitive to the range of data included in the fit. Making use of the high-Q2 data on GE to select functions which extrapolate to high Q2, we find that a Padé (N =M =1 ) statistical model works remarkably well, as does a dipole function with a 0.84 fm radius, GE(Q2) =(1+Q2/0.66 GeV2) -2 . Conclusions: Rigorous applications of stepwise regression techniques and multivariate error estimates result in the extraction of a proton charge radius that is consistent with the muonic hydrogen result of 0.84 fm; either from linear extrapolation of the extremely-low-Q2 data or by use of the Padé approximant for extrapolation using a larger range of data. Thus, based on a purely statistical analysis of electron scattering data, we conclude that the electron scattering results and the muonic hydrogen results are consistent. It is the atomic hydrogen results that are the outliers.
Gunn, Andrew J; Sheth, Rahul A; Luber, Brandon; Huynh, Minh-Huy; Rachamreddy, Niranjan R; Kalva, Sanjeeva P
2017-01-01
The purpse of this study was to evaluate the ability of various radiologic response criteria to predict patient outcomes after trans-arterial chemo-embolization with drug-eluting beads (DEB-TACE) in patients with advanced-stage (BCLC C) hepatocellular carcinoma (HCC). Hospital records from 2005 to 2011 were retrospectively reviewed. Non-infiltrative lesions were measured at baseline and on follow-up scans after DEB-TACE according to various common radiologic response criteria, including guidelines of the World Health Organization (WHO), Response Evaluation Criteria in Solid Tumors (RECIST), the European Association for the Study of the Liver (EASL), and modified RECIST (mRECIST). Statistical analysis was performed to see which, if any, of the response criteria could be used as a predictor of overall survival (OS) or time-to-progression (TTP). 75 patients met inclusion criteria. Median OS and TTP were 22.6 months (95 % CI 11.6-24.8) and 9.8 months (95 % CI 7.1-21.6), respectively. Univariate and multivariate Cox analyses revealed that none of the evaluated criteria had the ability to be used as a predictor for OS or TTP. Analysis of the C index in both univariate and multivariate models showed that the evaluated criteria were not accurate predictors of either OS (C-statistic range: 0.51-0.58 in the univariate model; range: 0.54-0.58 in the multivariate model) or TTP (C-statistic range: 0.55-0.59 in the univariate model; range: 0.57-0.61 in the multivariate model). Current response criteria are not accurate predictors of OS or TTP in patients with advanced-stage HCC after DEB-TACE.
NASA Astrophysics Data System (ADS)
Guillen, George; Rainey, Gail; Morin, Michelle
2004-04-01
Currently, the Minerals Management Service uses the Oil Spill Risk Analysis model (OSRAM) to predict the movement of potential oil spills greater than 1000 bbl originating from offshore oil and gas facilities. OSRAM generates oil spill trajectories using meteorological and hydrological data input from either actual physical measurements or estimates generated from other hydrological models. OSRAM and many other models produce output matrices of average, maximum and minimum contact probabilities to specific landfall or target segments (columns) from oil spills at specific points (rows). Analysts and managers are often interested in identifying geographic areas or groups of facilities that pose similar risks to specific targets or groups of targets if a spill occurred. Unfortunately, due to the potentially large matrix generated by many spill models, this question is difficult to answer without the use of data reduction and visualization methods. In our study we utilized a multivariate statistical method called cluster analysis to group areas of similar risk based on potential distribution of landfall target trajectory probabilities. We also utilized ArcView™ GIS to display spill launch point groupings. The combination of GIS and multivariate statistical techniques in the post-processing of trajectory model output is a powerful tool for identifying and delineating areas of similar risk from multiple spill sources. We strongly encourage modelers, statistical and GIS software programmers to closely collaborate to produce a more seamless integration of these technologies and approaches to analyzing data. They are complimentary methods that strengthen the overall assessment of spill risks.
Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang
2010-07-01
We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root- n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided.
Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang
2013-01-01
We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root-n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided. PMID:24790286
. Another project used multivariate statistics to develop a novel device to non-invasively measure hydrogen Cellulosic Ethanol Production due to Experimental Measurement Uncertainty," Biotechnology for Biofuels
Nursing Homes Appeals of Deficiency Citations: The Informal Dispute Resolution Process
Mukamel, Dana B.; Weimer, David L.; Li, Yue; Bailey, Lauren; Spector, William D.; Harrington, Charlene
2012-01-01
Objective Nursing homes found to be not meeting quality standards are cited for deficiencies. Before 1995, their only recourse was a formal appeal process, which is lengthy and costly. In 1995, the Centers for Medicare & Medicaid Services (CMS) instituted the Informal Dispute Resolution (IDR) process. This study presents for the first time national statistics about the IDR process and an analysis of the factors that influence nursing homes’ decisions to request an IDR. Design Retrospective study including descriptive statistics and multivariate logistic hierarchical models. Setting U.S. nursing homes in 2005 to 2008. Participant 15,916 Medicaid and Medicare certified nursing homes nationally, with 94,188 surveys and 9,388 IDRs. Measures The unit of observation was an annual survey or a complaint survey that generated at least one deficiency. The dependent variable was dichotomous and indicated whether the annual or a complaint survey triggered an IDR request. Independent variables included characteristics of the nursing home, the deficiency, the market, and the state regulatory environment. Results Ten percent of all annual surveys and complaint surveys resulted in IDRs. There was substantial variation across states, which persisted over time. Multivariate results suggest that nursing homes’ decisions to request an IDR depend on their assessment of the probability of success and assessment of the benefits of the submission. Conclusions Nursing homes avail themselves of the IDR process. Their propensity to do so depends on a number of factors, including the state regulatory system and the market environment in which they operate. PMID:22402171
2014-09-01
approaches. Ecological Modelling Volume 200, Issues 1–2, 10, pp 1–19. Buhlmann, Kurt A ., Thomas S.B. Akre , John B. Iverson, Deno Karapatakis, Russell A ...statistical multivariate analysis to define the current and projected future range probability for species of interest to Army land managers. A software...15 Figure 4. RCW omission rate and predicted area as a function of the cumulative threshold
Deterministic annealing for density estimation by multivariate normal mixtures
NASA Astrophysics Data System (ADS)
Kloppenburg, Martin; Tavan, Paul
1997-03-01
An approach to maximum-likelihood density estimation by mixtures of multivariate normal distributions for large high-dimensional data sets is presented. Conventionally that problem is tackled by notoriously unstable expectation-maximization (EM) algorithms. We remove these instabilities by the introduction of soft constraints, enabling deterministic annealing. Our developments are motivated by the proof that algorithmically stable fuzzy clustering methods that are derived from statistical physics analogs are special cases of EM procedures.
A Note on Asymptotic Joint Distribution of the Eigenvalues of a Noncentral Multivariate F Matrix.
1984-11-01
Krishnaiah (1982). Now, let us consider the samples drawn from the k multivariate normal popuiejons. Let (Xlt....Xpt) denote the mean vector of the t...to maltivariate problems. Sankh-ya, 4, 381-39(s. (71 KRISHNAIAH , P. R. (1982). Selection of variables in discrimlnant analysis. In Handbook of...Statistics, Volume 2 (P. R. Krishnaiah , editor), 805-820. North-Holland Publishing Company. 6. Unclassifie INSTRUCTIONS REPORT DOCUMENTATION PAGE
ERIC Educational Resources Information Center
SAW, J.G.
THIS PAPER DEALS WITH SOME TESTS OF HYPOTHESIS FREQUENTLY ENCOUNTERED IN THE ANALYSIS OF MULTIVARIATE DATA. THE TYPE OF HYPOTHESIS CONSIDERED IS THAT WHICH THE STATISTICIAN CAN ANSWER IN THE NEGATIVE OR AFFIRMATIVE. THE DOOLITTLE METHOD MAKES IT POSSIBLE TO EVALUATE THE DETERMINANT OF A MATRIX OF HIGH ORDER, TO SOLVE A MATRIX EQUATION, OR TO…
1983-06-16
has been advocated by Gnanadesikan and ilk (1969), and others in the literature. This suggests that, if we use the formal signficance test type...American Statistical Asso., 62, 1159-1178. Gnanadesikan , R., and Wilk, M..B. (1969). Data Analytic Methods in Multi- variate Statistical Analysis. In
USDA-ARS?s Scientific Manuscript database
Conventional multivariate statistical methods have been used for decades to calculate environmental indicators. These methods generally work fine if they are used in a situation where the method can be tailored to the data. But there is some skepticism that the methods might fail in the context of s...
A Review of Calibration Transfer Practices and Instrument Differences in Spectroscopy.
Workman, Jerome J
2018-03-01
Calibration transfer for use with spectroscopic instruments, particularly for near-infrared, infrared, and Raman analysis, has been the subject of multiple articles, research papers, book chapters, and technical reviews. There has been a myriad of approaches published and claims made for resolving the problems associated with transferring calibrations; however, the capability of attaining identical results over time from two or more instruments using an identical calibration still eludes technologists. Calibration transfer, in a precise definition, refers to a series of analytical approaches or chemometric techniques used to attempt to apply a single spectral database, and the calibration model developed using that database, for two or more instruments, with statistically retained accuracy and precision. Ideally, one would develop a single calibration for any particular application, and move it indiscriminately across instruments and achieve identical analysis or prediction results. There are many technical aspects involved in such precision calibration transfer, related to the measuring instrument reproducibility and repeatability, the reference chemical values used for the calibration, the multivariate mathematics used for calibration, and sample presentation repeatability and reproducibility. Ideally, a multivariate model developed on a single instrument would provide a statistically identical analysis when used on other instruments following transfer. This paper reviews common calibration transfer techniques, mostly related to instrument differences, and the mathematics of the uncertainty between instruments when making spectroscopic measurements of identical samples. It does not specifically address calibration maintenance or reference laboratory differences.
Zhu, Yu-Min; Zhang, Hua; Fan, Shi-Suo; Wang, Si-Jia; Xia, Yi; Shao, Li-Ming; He, Pin-Jing
2014-07-15
Due to the heterogeneity of metal distribution, it is challenging to identify the speciation, source and fate of metals in solid samples at micro scales. To overcome these challenges single particles of air pollution control residues were detected in situ by synchrotron microprobe after each step of chemical extraction and analyzed by multivariate statistical analysis. Results showed that Pb, Cu and Zn co-existed as acid soluble fractions during chemical extraction, regardless of their individual distribution as chlorides or oxides in the raw particles. Besides the forms of Fe2O3, MnO2 and FeCr2O4, Fe, Mn, Cr and Ni were closely associated with each other, mainly as reducible fractions. In addition, the two groups of metals had interrelations with the Si-containing insoluble matrix. The binding could not be directly detected by micro-X-ray diffraction (μ-XRD) and XRD, suggesting their partial existence as amorphous forms or in the solid solution. The combined method on single particles can effectively determine metallic multi-associations and various extraction behaviors that could not be identified by XRD, μ-XRD or X-ray absorption spectroscopy. The results are useful for further source identification and migration tracing of heavy metals. Copyright © 2014 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Zhou, Chao; Yin, Kunlong; Cao, Ying; Ahmed, Bayes; Li, Yuanyao; Catani, Filippo; Pourghasemi, Hamid Reza
2018-03-01
Landslide is a common natural hazard and responsible for extensive damage and losses in mountainous areas. In this study, Longju in the Three Gorges Reservoir area in China was taken as a case study for landslide susceptibility assessment in order to develop effective risk prevention and mitigation strategies. To begin, 202 landslides were identified, including 95 colluvial landslides and 107 rockfalls. Twelve landslide causal factor maps were prepared initially, and the relationship between these factors and each landslide type was analyzed using the information value model. Later, the unimportant factors were selected and eliminated using the information gain ratio technique. The landslide locations were randomly divided into two groups: 70% for training and 30% for verifying. Two machine learning models: the support vector machine (SVM) and artificial neural network (ANN), and a multivariate statistical model: the logistic regression (LR), were applied for landslide susceptibility modeling (LSM) for each type. The LSM index maps, obtained from combining the assessment results of the two landslide types, were classified into five levels. The performance of the LSMs was evaluated using the receiver operating characteristics curve and Friedman test. Results show that the elimination of noise-generating factors and the separated modeling of each landslide type have significantly increased the prediction accuracy. The machine learning models outperformed the multivariate statistical model and SVM model was found ideal for the case study area.
Marković, Snežana; Kerč, Janez; Horvat, Matej
2017-03-01
We are presenting a new approach of identifying sources of variability within a manufacturing process by NIR measurements of samples of intermediate material after each consecutive unit operation (interprocess NIR sampling technique). In addition, we summarize the development of a multivariate statistical process control (MSPC) model for the production of enteric-coated pellet product of the proton-pump inhibitor class. By developing provisional NIR calibration models, the identification of critical process points yields comparable results to the established MSPC modeling procedure. Both approaches are shown to lead to the same conclusion, identifying parameters of extrusion/spheronization and characteristics of lactose that have the greatest influence on the end-product's enteric coating performance. The proposed approach enables quicker and easier identification of variability sources during manufacturing process, especially in cases when historical process data is not straightforwardly available. In the presented case the changes of lactose characteristics are influencing the performance of the extrusion/spheronization process step. The pellet cores produced by using one (considered as less suitable) lactose source were on average larger and more fragile, leading to consequent breakage of the cores during subsequent fluid bed operations. These results were confirmed by additional experimental analyses illuminating the underlying mechanism of fracture of oblong pellets during the pellet coating process leading to compromised film coating.
Clinical validation of robot simulation of toothbrushing - comparative plaque removal efficacy
2014-01-01
Background Clinical validation of laboratory toothbrushing tests has important advantages. It was, therefore, the aim to demonstrate correlation of tooth cleaning efficiency of a new robot brushing simulation technique with clinical plaque removal. Methods Clinical programme: 27 subjects received dental cleaning prior to 3-day-plaque-regrowth-interval. Plaque was stained, photographically documented and scored using planimetrical index. Subjects brushed teeth 33–47 with three techniques (horizontal, rotating, vertical), each for 20s buccally and for 20s orally in 3 consecutive intervals. The force was calibrated, the brushing technique was video supported. Two different brushes were randomly assigned to the subject. Robot programme: Clinical brushing programmes were transfered to a 6-axis-robot. Artificial teeth 33–47 were covered with plaque-simulating substrate. All brushing techniques were repeated 7 times, results were scored according to clinical planimetry. All data underwent statistical analysis by t-test, U-test and multivariate analysis. Results The individual clinical cleaning patterns are well reproduced by the robot programmes. Differences in plaque removal are statistically significant for the two brushes, reproduced in clinical and robot data. Multivariate analysis confirms the higher cleaning efficiency for anterior teeth and for the buccal sites. Conclusions The robot tooth brushing simulation programme showed good correlation with clinically standardized tooth brushing. This new robot brushing simulation programme can be used for rapid, reproducible laboratory testing of tooth cleaning. PMID:24996973
Botsford, Benjamin; Vedana, Gustavo; Cope, Leslie; Yiu, Samuel C; Jun, Albert S
2016-01-01
To compare the effect of 20% sulfur hexafluoride (SF6) with that of air on graft detachment rates for intraocular tamponade in Descemet membrane endothelial keratoplasty (DMEK). Forty-two eyes of patients who underwent DMEK by a single surgeon (A.S.J.) at Wilmer Eye Institute between January 2012 and 2014 were identified; 21 received air for intraocular tamponade and the next consecutive 21 received SF6. The main outcome measure was the graft detachment rate; univariate and multivariate analyses were performed. The graft detachment rate was 67% in the air group and 19% in the SF6 group (p<0.05). No complete graft detachments occurred, and all partial detachments underwent intervention with injection of intraocular air. The percentages of eyes with 20/25 or better vision were not different between the groups (67% vs. 71%). Univariate analysis showed significantly higher detachment rates with air tamponade (OR, 8.50; p<0.005) and larger donor graft size (OR, 14.96; p<0.05). Multivariate analysis with gas but not graft size included showed that gas was an independent statistically significant predictor of outcome (OR, 6.65; p<0.05). When graft size was included as a covariate, gas was no longer a statistically significant predictor of detachment but maintained OR of 7.81 (p=0.063) similar to the results of univariate and multivariate analyses without graft size. In comparison with air, graft detachment rates for intraocular tamponade in DMEK were significantly reduced by 20% SF6.
Fu, Zhibiao; Baker, Daniel; Cheng, Aili; Leighton, Julie; Appelbaum, Edward; Aon, Juan
2016-05-01
The principle of quality by design (QbD) has been widely applied to biopharmaceutical manufacturing processes. Process characterization is an essential step to implement the QbD concept to establish the design space and to define the proven acceptable ranges (PAR) for critical process parameters (CPPs). In this study, we present characterization of a Saccharomyces cerevisiae fermentation process using risk assessment analysis, statistical design of experiments (DoE), and the multivariate Bayesian predictive approach. The critical quality attributes (CQAs) and CPPs were identified with a risk assessment. The statistical model for each attribute was established using the results from the DoE study with consideration given to interactions between CPPs. Both the conventional overlapping contour plot and the multivariate Bayesian predictive approaches were used to establish the region of process operating conditions where all attributes met their specifications simultaneously. The quantitative Bayesian predictive approach was chosen to define the PARs for the CPPs, which apply to the manufacturing control strategy. Experience from the 10,000 L manufacturing scale process validation, including 64 continued process verification batches, indicates that the CPPs remain under a state of control and within the established PARs. The end product quality attributes were within their drug substance specifications. The probability generated with the Bayesian approach was also used as a tool to assess CPP deviations. This approach can be extended to develop other production process characterization and quantify a reliable operating region. © 2016 American Institute of Chemical Engineers Biotechnol. Prog., 32:799-812, 2016. © 2016 American Institute of Chemical Engineers.
[Academic performance in first year medical students: an explanatory multivariate model].
Urrutia Aguilar, María Esther; Ortiz León, Silvia; Fouilloux Morales, Claudia; Ponce Rosas, Efrén Raúl; Guevara Guzmán, Rosalinda
2014-12-01
Current education is focused in intellectual, affective, and ethical aspects, thus acknowledging their significance in students´ metacognition. Nowadays, it is known that an adequate and motivating environment together with a positive attitude towards studies is fundamental to induce learning. Medical students are under multiple stressful, academic, personal, and vocational situations. To identify psychosocial, vocational, and academic variables of 2010-2011 first year medical students at UNAM that may help predict their academic performance. Academic surveys of psychological and vocational factors were applied; an academic follow-up was carried out to obtain a multivariate model. The data were analyzed considering descriptive, comparative, correlative, and predictive statistics. The main variables that affect students´ academic performance are related to previous knowledge and to psychological variables. The results show the significance of implementing institutional programs to support students throughout their college adaptation.
Pingault, Jean Baptiste; Côté, Sylvana M.; Petitclerc, Amélie; Vitaro, Frank; Tremblay, Richard E.
2015-01-01
Background Parental educational expectations have been associated with children’s educational attainment in a number of long-term longitudinal studies, but whether this relationship is causal has long been debated. The aims of this prospective study were twofold: 1) test whether low maternal educational expectations contributed to failure to graduate from high school; and 2) compare the results obtained using different strategies for accounting for confounding variables (i.e. multivariate regression and propensity score matching). Methodology/Principal Findings The study sample included 1,279 participants from the Quebec Longitudinal Study of Kindergarten Children. Maternal educational expectations were assessed when the participants were aged 12 years. High school graduation – measuring educational attainment – was determined through the Quebec Ministry of Education when the participants were aged 22–23 years. Findings show that when using the most common statistical approach (i.e. multivariate regressions to adjust for a restricted set of potential confounders) the contribution of low maternal educational expectations to failure to graduate from high school was statistically significant. However, when using propensity score matching, the contribution of maternal expectations was reduced and remained statistically significant only for males. Conclusions/Significance The results of this study are consistent with the possibility that the contribution of parental expectations to educational attainment is overestimated in the available literature. This may be explained by the use of a restricted range of potential confounding variables as well as the dearth of studies using appropriate statistical techniques and study designs in order to minimize confounding. Each of these techniques and designs, including propensity score matching, has its strengths and limitations: A more comprehensive understanding of the causal role of parental expectations will stem from a convergence of findings from studies using different techniques and designs. PMID:25803867
Konukoglu, Ender; Coutu, Jean-Philippe; Salat, David H; Fischl, Bruce
2016-07-01
Diffusion magnetic resonance imaging (dMRI) is a unique technology that allows the noninvasive quantification of microstructural tissue properties of the human brain in healthy subjects as well as the probing of disease-induced variations. Population studies of dMRI data have been essential in identifying pathological structural changes in various conditions, such as Alzheimer's and Huntington's diseases (Salat et al., 2010; Rosas et al., 2006). The most common form of dMRI involves fitting a tensor to the underlying imaging data (known as diffusion tensor imaging, or DTI), then deriving parametric maps, each quantifying a different aspect of the underlying microstructure, e.g. fractional anisotropy and mean diffusivity. To date, the statistical methods utilized in most DTI population studies either analyzed only one such map or analyzed several of them, each in isolation. However, it is most likely that variations in the microstructure due to pathology or normal variability would affect several parameters simultaneously, with differing variations modulating the various parameters to differing degrees. Therefore, joint analysis of the available diffusion maps can be more powerful in characterizing histopathology and distinguishing between conditions than the widely used univariate analysis. In this article, we propose a multivariate approach for statistical analysis of diffusion parameters that uses partial least squares correlation (PLSC) analysis and permutation testing as building blocks in a voxel-wise fashion. Stemming from the common formulation, we present three different multivariate procedures for group analysis, regressing-out nuisance parameters and comparing effects of different conditions. We used the proposed procedures to study the effects of non-demented aging, Alzheimer's disease and mild cognitive impairment on the white matter. Here, we present results demonstrating that the proposed PLSC-based approach can differentiate between effects of different conditions in the same region as well as uncover spatial variations of effects across the white matter. The proposed procedures were able to answer questions on structural variations such as: "are there regions in the white matter where Alzheimer's disease has a different effect than aging or similar effect as aging?" and "are there regions in the white matter that are affected by both mild cognitive impairment and Alzheimer's disease but with differing multivariate effects?" Copyright © 2016 Elsevier Inc. All rights reserved.
Konukoglu, Ender; Coutu, Jean-Philippe; Salat, David H.; Fischl, Bruce
2016-01-01
Diffusion magnetic resonance imaging (dMRI) is a unique technology that allows the noninvasive quantification of microstructural tissue properties of the human brain in healthy subjects as well as the probing of disease-induced variations. Population studies of dMRI data have been essential in identifying pathological structural changes in various conditions, such as Alzheimer’s and Huntington’s diseases1,2. The most common form of dMRI involves fitting a tensor to the underlying imaging data (known as Diffusion Tensor Imaging, or DTI), then deriving parametric maps, each quantifying a different aspect of the underlying microstructure, e.g. fractional anisotropy and mean diffusivity. To date, the statistical methods utilized in most DTI population studies either analyzed only one such map or analyzed several of them, each in isolation. However, it is most likely that variations in the microstructure due to pathology or normal variability would affect several parameters simultaneously, with differing variations modulating the various parameters to differing degrees. Therefore, joint analysis of the available diffusion maps can be more powerful in characterizing histopathology and distinguishing between conditions than the widely used univariate analysis. In this article, we propose a multivariate approach for statistical analysis of diffusion parameters that uses partial least squares correlation (PLSC) analysis and permutation testing as building blocks in a voxel-wise fashion. Stemming from the common formulation, we present three different multivariate procedures for group analysis, regressing-out nuisance parameters and comparing effects of different conditions. We used the proposed procedures to study the effects of non-demented aging, Alzheimer’s disease and mild cognitive impairment on the white matter. Here, we present results demonstrating that the proposed PLSC-based approach can differentiate between effects of different conditions in the same region as well as uncover spatial variations of effects across the white matter. The proposed procedures were able to answer questions on structural variations such as: “are there regions in the white matter where Alzheimer’s disease has a different effect than aging or similar effect as aging?” and “are there regions in the white matter that are affected by both mild cognitive impairment and Alzheimer’s disease but with differing multivariate effects?” PMID:27103138
Effect of sexual steroids on boar kinematic sperm subpopulations.
Ayala, E M E; Aragón, M A
2017-11-01
Here, we show the effects of sexual steroids, progesterone, testosterone, or estradiol on motility parameters of boar sperm. Sixteen commercial seminal doses, four each of four adult boars, were analyzed using computer assisted sperm analysis (CASA). Mean values of motility parameters were analyzed by bivariate and multivariate statistics. Principal component analysis (PCA), followed by hierarchical clustering, was applied on data of motility parameters, provided automatically as intervals by the CASA system. Effects of sexual steroids were described in the kinematic subpopulations identified from multivariate statistics. Mean values of motility parameters were not significantly changed after addition of sexual steroids. Multivariate graphics showed that sperm subpopulations were not sensitive to the addition of either testosterone or estradiol, but sperm subpopulations responsive to progesterone were found. Distribution of motility parameters were wide in controls but sharpened at distinct concentrations of progesterone. We conclude that kinematic sperm subpopulations responsive to progesterone are present in boar semen, and these subpopulations are masked in evaluations of mean values of motility parameters. © 2017 International Society for Advancement of Cytometry. © 2017 International Society for Advancement of Cytometry.
Heidema, A Geert; Thissen, Uwe; Boer, Jolanda M A; Bouwman, Freek G; Feskens, Edith J M; Mariman, Edwin C M
2009-06-01
In this study, we applied the multivariate statistical tool Partial Least Squares (PLS) to analyze the relative importance of 83 plasma proteins in relation to coronary heart disease (CHD) mortality and the intermediate end points body mass index, HDL-cholesterol and total cholesterol. From a Dutch monitoring project for cardiovascular disease risk factors, men who died of CHD between initial participation (1987-1991) and end of follow-up (January 1, 2000) (N = 44) and matched controls (N = 44) were selected. Baseline plasma concentrations of proteins were measured by a multiplex immunoassay. With the use of PLS, we identified 15 proteins with prognostic value for CHD mortality and sets of proteins associated with the intermediate end points. Subsequently, sets of proteins and intermediate end points were analyzed together by Principal Components Analysis, indicating that proteins involved in inflammation explained most of the variance, followed by proteins involved in metabolism and proteins associated with total-C. This study is one of the first in which the association of a large number of plasma proteins with CHD mortality and intermediate end points is investigated by applying multivariate statistics, providing insight in the relationships among proteins, intermediate end points and CHD mortality, and a set of proteins with prognostic value.
Belianinov, Alex; Panchapakesan, G.; Lin, Wenzhi; ...
2014-12-02
Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1 x Sex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signaturemore » and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Belianinov, Alex, E-mail: belianinova@ornl.gov; Ganesh, Panchapakesan; Lin, Wenzhi
2014-12-01
Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe{sub 0.55}Se{sub 0.45} (T{sub c} = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe{sub 1−x}Se{sub x} structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified bymore » their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.« less
Statistical Analyses of Raw Material Data for MTM45-1/CF7442A-36% RW: CMH Cure Cycle
NASA Technical Reports Server (NTRS)
Coroneos, Rula; Pai, Shantaram, S.; Murthy, Pappu
2013-01-01
This report describes statistical characterization of physical properties of the composite material system MTM45-1/CF7442A, which has been tested and is currently being considered for use on spacecraft structures. This composite system is made of 6K plain weave graphite fibers in a highly toughened resin system. This report summarizes the distribution types and statistical details of the tests and the conditions for the experimental data generated. These distributions will be used in multivariate regression analyses to help determine material and design allowables for similar material systems and to establish a procedure for other material systems. Additionally, these distributions will be used in future probabilistic analyses of spacecraft structures. The specific properties that are characterized are the ultimate strength, modulus, and Poisson??s ratio by using a commercially available statistical package. Results are displayed using graphical and semigraphical methods and are included in the accompanying appendixes.
Zamani, Abbas Ali; Yaftian, Mohammad Reza; Parizanganeh, Abdolhossein
2012-12-17
The contamination of groundwater by heavy metal ions around a lead and zinc plant has been studied. As a case study groundwater contamination in Bonab Industrial Estate (Zanjan-Iran) for iron, cobalt, nickel, copper, zinc, cadmium and lead content was investigated using differential pulse polarography (DPP). Although, cobalt, copper and zinc were found correspondingly in 47.8%, 100.0%, and 100.0% of the samples, they did not contain these metals above their maximum contaminant levels (MCLs). Cadmium was detected in 65.2% of the samples and 17.4% of them were polluted by this metal. All samples contained detectable levels of lead and iron with 8.7% and 13.0% of the samples higher than their MCLs. Nickel was also found in 78.3% of the samples, out of which 8.7% were polluted. In general, the results revealed the contamination of groundwater sources in the studied zone. The higher health risks are related to lead, nickel, and cadmium ions. Multivariate statistical techniques were applied for interpreting the experimental data and giving a description for the sources. The data analysis showed correlations and similarities between investigated heavy metals and helps to classify these ion groups. Cluster analysis identified five clusters among the studied heavy metals. Cluster 1 consisted of Pb, Cu, and cluster 3 included Cd, Fe; also each of the elements Zn, Co and Ni was located in groups with single member. The same results were obtained by factor analysis. Statistical investigations revealed that anthropogenic factors and notably lead and zinc plant and pedo-geochemical pollution sources are influencing water quality in the studied area.
2012-01-01
The contamination of groundwater by heavy metal ions around a lead and zinc plant has been studied. As a case study groundwater contamination in Bonab Industrial Estate (Zanjan-Iran) for iron, cobalt, nickel, copper, zinc, cadmium and lead content was investigated using differential pulse polarography (DPP). Although, cobalt, copper and zinc were found correspondingly in 47.8%, 100.0%, and 100.0% of the samples, they did not contain these metals above their maximum contaminant levels (MCLs). Cadmium was detected in 65.2% of the samples and 17.4% of them were polluted by this metal. All samples contained detectable levels of lead and iron with 8.7% and 13.0% of the samples higher than their MCLs. Nickel was also found in 78.3% of the samples, out of which 8.7% were polluted. In general, the results revealed the contamination of groundwater sources in the studied zone. The higher health risks are related to lead, nickel, and cadmium ions. Multivariate statistical techniques were applied for interpreting the experimental data and giving a description for the sources. The data analysis showed correlations and similarities between investigated heavy metals and helps to classify these ion groups. Cluster analysis identified five clusters among the studied heavy metals. Cluster 1 consisted of Pb, Cu, and cluster 3 included Cd, Fe; also each of the elements Zn, Co and Ni was located in groups with single member. The same results were obtained by factor analysis. Statistical investigations revealed that anthropogenic factors and notably lead and zinc plant and pedo-geochemical pollution sources are influencing water quality in the studied area. PMID:23369182
Orthotopic bladder substitution in men revisited: identification of continence predictors.
Koraitim, M M; Atta, M A; Foda, M K
2006-11-01
We determined the impact of the functional characteristics of the neobladder and urethral sphincter on continence results, and determined the most significant predictors of continence. A total of 88 male patients 29 to 70 years old underwent orthotopic bladder substitution with tubularized ileocecal segment (40) and detubularized sigmoid (25) or ileum (23). Uroflowmetry, cystometry and urethral pressure profilometry were performed at 13 to 36 months (mean 19) postoperatively. The correlation between urinary continence and 28 urodynamic variables was assessed. Parameters that correlated significantly with continence were entered into a multivariate analysis using a logistic regression model to determine the most significant predictors of continence. Maximum urethral closure pressure was the only parameter that showed a statistically significant correlation with diurnal continence. Nocturnal continence had not only a statistically significant positive correlation with maximum urethral closure pressure, but also statistically significant negative correlations with maximum contraction amplitude, and baseline pressure at mid and maximum capacity. Three of these 4 parameters, including maximum urethral closure pressure, maximum contraction amplitude and baseline pressure at mid capacity, proved to be significant predictors of continence on multivariate analysis. While daytime continence is determined by maximum urethral closure pressure, during the night it is the net result of 2 forces that have about equal influence but in opposite directions, that is maximum urethral closure pressure vs maximum contraction amplitude plus baseline pressure at mid capacity. Two equations were derived from the logistic regression model to predict the probability of continence after orthotopic bladder substitution, including Z1 (diurnal) = 0.605 + 0.0085 maximum urethral closure pressure and Z2 (nocturnal) = 0.841 + 0.01 [maximum urethral closure pressure - (maximum contraction amplitude + baseline pressure at mid capacity)].
Péron, Julien; Pond, Gregory R; Gan, Hui K; Chen, Eric X; Almufti, Roula; Maillet, Denis; You, Benoit
2012-07-03
The Consolidated Standards of Reporting Trials (CONSORT) guidelines were developed in the mid-1990s for the explicit purpose of improving clinical trial reporting. However, there is little information regarding the adherence to CONSORT guidelines of recent publications of randomized controlled trials (RCTs) in oncology. All phase III RCTs published between 2005 and 2009 were reviewed using an 18-point overall quality score for reporting based on the 2001 CONSORT statement. Multivariable linear regression was used to identify features associated with improved reporting quality. To provide baseline data for future evaluations of reporting quality, RCTs were also assessed according to the 2010 revised CONSORT statement. All statistical tests were two-sided. A total of 357 RCTs were reviewed. The mean 2001 overall quality score was 13.4 on a scale of 0-18, whereas the mean 2010 overall quality score was 19.3 on a scale of 0-27. The overall RCT reporting quality score improved by 0.21 points per year from 2005 to 2009. Poorly reported items included method used to generate the random allocation (adequately reported in 29% of trials), whether and how blinding was applied (41%), method of allocation concealment (51%), and participant flow (59%). High impact factor (IF, P = .003), recent publication date (P = .008), and geographic origin of RCTs (P = .003) were independent factors statistically significantly associated with higher reporting quality in a multivariable regression model. Sample size, tumor type, and positivity of trial results were not associated with higher reporting quality, whereas funding source and treatment type had a borderline statistically significant impact. The results show that numerous items remained unreported for many trials. Thus, given the potential impact of poorly reported trials, oncology journals should require even stricter adherence to the CONSORT guidelines.
Lepore, Natasha; Brun, Caroline A; Chiang, Ming-Chang; Chou, Yi-Yu; Dutton, Rebecca A; Hayashi, Kiralee M; Lopez, Oscar L; Aizenstein, Howard J; Toga, Arthur W; Becker, James T; Thompson, Paul M
2006-01-01
Tensor-based morphometry (TBM) is widely used in computational anatomy as a means to understand shape variation between structural brain images. A 3D nonlinear registration technique is typically used to align all brain images to a common neuroanatomical template, and the deformation fields are analyzed statistically to identify group differences in anatomy. However, the differences are usually computed solely from the determinants of the Jacobian matrices that are associated with the deformation fields computed by the registration procedure. Thus, much of the information contained within those matrices gets thrown out in the process. Only the magnitude of the expansions or contractions is examined, while the anisotropy and directional components of the changes are ignored. Here we remedy this problem by computing multivariate shape change statistics using the strain matrices. As the latter do not form a vector space, means and covariances are computed on the manifold of positive-definite matrices to which they belong. We study the brain morphology of 26 HIV/AIDS patients and 14 matched healthy control subjects using our method. The images are registered using a high-dimensional 3D fluid registration algorithm, which optimizes the Jensen-Rényi divergence, an information-theoretic measure of image correspondence. The anisotropy of the deformation is then computed. We apply a manifold version of Hotelling's T2 test to the strain matrices. Our results complement those found from the determinants of the Jacobians alone and provide greater power in detecting group differences in brain structure.
Canine diabetes mellitus risk factors: A matched case-control study.
Pöppl, Alan Gomes; de Carvalho, Guilherme Luiz Carvalho; Vivian, Itatiele Farias; Corbellini, Luis Gustavo; González, Félix Hilário Díaz
2017-10-01
Different subtypes of canine diabetes mellitus (CDM) have been described based on their aetiopathogenesis. Therefore, manifold risk factors may be involved in CDM development. This study aims to investigate canine diabetes mellitus risk factors. Owners of 110 diabetic dogs and 136 healthy controls matched by breed, sex, and age were interviewed concerning aspects related to diet, weight, physical activity, oral health, reproductive history, pancreatitis, and exposure to exogenous glucocorticoids. Two multivariate multivariable statistical models were created: The UMod included males and females without variables related to oestrous cycle, while the FMod included only females with all analysed variables. In the UMod, "Not exclusively commercial diet" (OR 4.86, 95%CI 2.2-10.7, P<0.001) and "Overweight" (OR 3.51, 95%CI 1.6-7.5, P=0.001) were statistically significant, while in the FMod, "Not exclusively commercial diet" (OR 4.14, 95%CI 1.3-12.7, P=0.01), "Table scraps abuse" (OR 3.62, 95%CI 1.1-12.2, P=0.03), "Overweight" (OR 3.91, 95%CI 1.2-12.6, P=0.02), and "Dioestrus" (OR 5.53, 95%CI 1.9-16.3, P=0.002) were statistically significant. The findings in this study support feeding not exclusively balanced commercial dog food, overweight, treats abuse, and diestrus, as main CDM risk factors. Moreover, those results give subside for preventive care studies against CDM development. Copyright © 2017 Elsevier Ltd. All rights reserved.
Maric, Mark; Harvey, Lauren; Tomcsak, Maren; Solano, Angelique; Bridge, Candice
2017-06-30
In comparison to other violent crimes, sexual assaults suffer from very low prosecution and conviction rates especially in the absence of DNA evidence. As a result, the forensic community needs to utilize other forms of trace contact evidence, like lubricant evidence, in order to provide a link between the victim and the assailant. In this study, 90 personal bottled and condom lubricants from the three main marketing types, silicone-based, water-based and condoms, were characterized by direct analysis in real time time of flight mass spectrometry (DART-TOFMS). The instrumental data was analyzed by multivariate statistics including hierarchal cluster analysis, principal component analysis, and linear discriminant analysis. By interpreting the mass spectral data with multivariate statistics, 12 discrete groupings were identified, indicating inherent chemical diversity not only between but within the three main marketing groups. A number of unique chemical markers, both major and minor, were identified, other than the three main chemical components (i.e. PEG, PDMS and nonoxynol-9) currently used for lubricant classification. The data was validated by a stratified 20% withheld cross-validation which demonstrated that there was minimal overlap between the groupings. Based on the groupings identified and unique features of each group, a highly discriminating statistical model was then developed that aims to provide the foundation for the development of a forensic lubricant database that may eventually be applied to casework. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
Merello, Paloma; García-Diego, Fernando-Juan; Zarzo, Manuel
2014-08-01
Chemometrics has been applied successfully since the 1990s for the multivariate statistical control of industrial processes. A new area of interest for these tools is the microclimatic monitoring of cultural heritage. Sensors record climatic parameters over time and statistical data analysis is performed to obtain valuable information for preventive conservation. A case study of an open-air archaeological site is presented here. A set of 26 temperature and relative humidity data-loggers was installed in four rooms of Ariadne's house (Pompeii). If climatic values are recorded versus time at different positions, the resulting data structure is equivalent to records of physical parameters registered at several points of a continuous chemical process. However, there is an important difference in this case: continuous processes are controlled to reach a steady state, whilst open-air sites undergo tremendous fluctuations. Although data from continuous processes are usually column-centred prior to applying principal components analysis, it turned out that another pre-treatment (row-centred data) was more convenient for the interpretation of components and to identify abnormal patterns. The detection of typical trajectories was more straightforward by dividing the whole monitored period into several sub-periods, because the marked climatic fluctuations throughout the year affect the correlation structures. The proposed statistical methodology is of interest for the microclimatic monitoring of cultural heritage, particularly in the case of open-air or semi-confined archaeological sites. Copyright © 2014 Elsevier B.V. All rights reserved.
Haller, Sven; Lovblad, Karl-Olof; Giannakopoulos, Panteleimon; Van De Ville, Dimitri
2014-05-01
Many diseases are associated with systematic modifications in brain morphometry and function. These alterations may be subtle, in particular at early stages of the disease progress, and thus not evident by visual inspection alone. Group-level statistical comparisons have dominated neuroimaging studies for many years, proving fascinating insight into brain regions involved in various diseases. However, such group-level results do not warrant diagnostic value for individual patients. Recently, pattern recognition approaches have led to a fundamental shift in paradigm, bringing multivariate analysis and predictive results, notably for the early diagnosis of individual patients. We review the state-of-the-art fundamentals of pattern recognition including feature selection, cross-validation and classification techniques, as well as limitations including inter-individual variation in normal brain anatomy and neurocognitive reserve. We conclude with the discussion of future trends including multi-modal pattern recognition, multi-center approaches with data-sharing and cloud-computing.
Multi-frequency complex network from time series for uncovering oil-water flow structure.
Gao, Zhong-Ke; Yang, Yu-Xuan; Fang, Peng-Cheng; Jin, Ning-De; Xia, Cheng-Yi; Hu, Li-Dan
2015-02-04
Uncovering complex oil-water flow structure represents a challenge in diverse scientific disciplines. This challenge stimulates us to develop a new distributed conductance sensor for measuring local flow signals at different positions and then propose a novel approach based on multi-frequency complex network to uncover the flow structures from experimental multivariate measurements. In particular, based on the Fast Fourier transform, we demonstrate how to derive multi-frequency complex network from multivariate time series. We construct complex networks at different frequencies and then detect community structures. Our results indicate that the community structures faithfully represent the structural features of oil-water flow patterns. Furthermore, we investigate the network statistic at different frequencies for each derived network and find that the frequency clustering coefficient enables to uncover the evolution of flow patterns and yield deep insights into the formation of flow structures. Current results present a first step towards a network visualization of complex flow patterns from a community structure perspective.
Testing for significance of phase synchronisation dynamics in the EEG.
Daly, Ian; Sweeney-Reed, Catherine M; Nasuto, Slawomir J
2013-06-01
A number of tests exist to check for statistical significance of phase synchronisation within the Electroencephalogram (EEG); however, the majority suffer from a lack of generality and applicability. They may also fail to account for temporal dynamics in the phase synchronisation, regarding synchronisation as a constant state instead of a dynamical process. Therefore, a novel test is developed for identifying the statistical significance of phase synchronisation based upon a combination of work characterising temporal dynamics of multivariate time-series and Markov modelling. We show how this method is better able to assess the significance of phase synchronisation than a range of commonly used significance tests. We also show how the method may be applied to identify and classify significantly different phase synchronisation dynamics in both univariate and multivariate datasets.
Yang, James J; Li, Jia; Williams, L Keoki; Buu, Anne
2016-01-05
In genome-wide association studies (GWAS) for complex diseases, the association between a SNP and each phenotype is usually weak. Combining multiple related phenotypic traits can increase the power of gene search and thus is a practically important area that requires methodology work. This study provides a comprehensive review of existing methods for conducting GWAS on complex diseases with multiple phenotypes including the multivariate analysis of variance (MANOVA), the principal component analysis (PCA), the generalizing estimating equations (GEE), the trait-based association test involving the extended Simes procedure (TATES), and the classical Fisher combination test. We propose a new method that relaxes the unrealistic independence assumption of the classical Fisher combination test and is computationally efficient. To demonstrate applications of the proposed method, we also present the results of statistical analysis on the Study of Addiction: Genetics and Environment (SAGE) data. Our simulation study shows that the proposed method has higher power than existing methods while controlling for the type I error rate. The GEE and the classical Fisher combination test, on the other hand, do not control the type I error rate and thus are not recommended. In general, the power of the competing methods decreases as the correlation between phenotypes increases. All the methods tend to have lower power when the multivariate phenotypes come from long tailed distributions. The real data analysis also demonstrates that the proposed method allows us to compare the marginal results with the multivariate results and specify which SNPs are specific to a particular phenotype or contribute to the common construct. The proposed method outperforms existing methods in most settings and also has great applications in GWAS on complex diseases with multiple phenotypes such as the substance abuse disorders.
Multivariable Parametric Cost Model for Ground Optical Telescope Assembly
NASA Technical Reports Server (NTRS)
Stahl, H. Philip; Rowell, Ginger Holmes; Reese, Gayle; Byberg, Alicia
2005-01-01
A parametric cost model for ground-based telescopes is developed using multivariable statistical analysis of both engineering and performance parameters. While diameter continues to be the dominant cost driver, diffraction-limited wavelength is found to be a secondary driver. Other parameters such as radius of curvature are examined. The model includes an explicit factor for primary mirror segmentation and/or duplication (i.e., multi-telescope phased-array systems). Additionally, single variable models Based on aperture diameter are derived.
On measures of association among genetic variables
Gianola, Daniel; Manfredi, Eduardo; Simianer, Henner
2012-01-01
Summary Systems involving many variables are important in population and quantitative genetics, for example, in multi-trait prediction of breeding values and in exploration of multi-locus associations. We studied departures of the joint distribution of sets of genetic variables from independence. New measures of association based on notions of statistical distance between distributions are presented. These are more general than correlations, which are pairwise measures, and lack a clear interpretation beyond the bivariate normal distribution. Our measures are based on logarithmic (Kullback-Leibler) and on relative ‘distances’ between distributions. Indexes of association are developed and illustrated for quantitative genetics settings in which the joint distribution of the variables is either multivariate normal or multivariate-t, and we show how the indexes can be used to study linkage disequilibrium in a two-locus system with multiple alleles and present applications to systems of correlated beta distributions. Two multivariate beta and multivariate beta-binomial processes are examined, and new distributions are introduced: the GMS-Sarmanov multivariate beta and its beta-binomial counterpart. PMID:22742500
Implementation of the Iterative Proportion Fitting Algorithm for Geostatistical Facies Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li Yupeng, E-mail: yupeng@ualberta.ca; Deutsch, Clayton V.
2012-06-15
In geostatistics, most stochastic algorithm for simulation of categorical variables such as facies or rock types require a conditional probability distribution. The multivariate probability distribution of all the grouped locations including the unsampled location permits calculation of the conditional probability directly based on its definition. In this article, the iterative proportion fitting (IPF) algorithm is implemented to infer this multivariate probability. Using the IPF algorithm, the multivariate probability is obtained by iterative modification to an initial estimated multivariate probability using lower order bivariate probabilities as constraints. The imposed bivariate marginal probabilities are inferred from profiles along drill holes or wells.more » In the IPF process, a sparse matrix is used to calculate the marginal probabilities from the multivariate probability, which makes the iterative fitting more tractable and practical. This algorithm can be extended to higher order marginal probability constraints as used in multiple point statistics. The theoretical framework is developed and illustrated with estimation and simulation example.« less
Kyle J. Haynes; Andrew M. Liebhold; Ottar N. Bjørnstad; Andrew J. Allstadt; Randall S. Morin
2018-01-01
Evaluating the causes of spatial synchrony in population dynamics in nature is notoriously difficult due to a lack of data and appropriate statistical methods. Here, we use a recently developed method, a multivariate extension of the local indicators of spatial autocorrelation statistic, to map geographic variation in the synchrony of gypsy moth outbreaks. Regression...
Maserejian, Nancy N.; Trachtenberg, Felicia L.; Hauser, Russ; McKinlay, Sonja; Shrader, Peter; Bellinger, David C.
2012-01-01
Background Resin-based dental restorations may intra-orally release their components and bisphenol A. Gestational bisphenol A exposure has been associated with poorer executive functioning in children. Objectives To examine whether exposure to resin-based composite restorations is associated with neuropsychological development in children. Methods Secondary analysis of treatment level data from the New England Children’s Amalgam Trial, a 2-group randomized safety trial conducted from 1997–2006. Children (N=534) aged 6–10 y with >2 posterior tooth caries were randomized to treatment with amalgam or resin-based composites (bisphenol-A-diglycidyl-dimethacrylate-composite for permanent teeth; urethane dimethacrylate-based polyacid-modified compomer for primary teeth). Neuropsychological function at 4- and 5-year follow-up (N=444) was measured by a battery of tests of executive function, intelligence, memory, visual-spatial skills, verbal fluency, and problem-solving. Multivariable generalized linear regression models were used to examine the association between composite exposure levels and changes in neuropsychological test scores from baseline to follow-up. For comparison, data on children randomized to amalgam treatment were similarly analyzed. Results With greater exposure to either dental composite material, results were generally consistent in the direction of slightly poorer changes in tests of intelligence, achievement or memory, but there were no statistically significant associations. For the four primary measures of executive function, scores were slightly worse with greater total composite exposure, but statistically significant only for the test of Letter Fluency (10-surface-years β= −0.8, SE=0.4, P=0.035), and the subtest of color naming (β= −1.5, SE=0.5, P=0.004) in the Stroop Color-Word Interference Test. Multivariate analysis of variance confirmed that the negative associations between composite level and executive function were not statistically significant (MANOVA P=0.18). Results for greater amalgam exposure were mostly nonsignificant in the opposite direction of slightly improved scores over follow-up. Conclusions Dental composite restorations had statistically insignificant associations of small magnitude with impairments in neuropsychological test change scores over 4- or 5-years of follow-up in this trial. PMID:22906860
Infrared micro-spectroscopic studies of epithelial cells
Romeo, Melissa; Mohlenhoff, Brian; Jennings, Michael; Diem, Max
2009-01-01
We report results from a study of human and canine mucosal cells, investigated by infrared micro-spectroscopy, and analyzed by methods of multivariate statistics. We demonstrate that the infrared spectra of individual cells are sensitive to the stage of maturation, and that a distinction between healthy and diseased cells will be possible. Since this report is written for an audience not familiar with infrared micro-spectroscopy, a short introduction into this field is presented along with a summary of principal component analysis. PMID:16797481
Fallah, Aria; Weil, Alexander G; Juraschka, Kyle; Ibrahim, George M; Wang, Anthony C; Crevier, Louis; Tseng, Chi-Hong; Kulkarni, Abhaya V; Ragheb, John; Bhatia, Sanjiv
2017-12-01
OBJECTIVE Combined endoscopic third ventriculostomy (ETC) and choroid plexus cauterization (CPC)-ETV/CPC- is being investigated to increase the rate of shunt independence in infants with hydrocephalus. The degree of CPC necessary to achieve improved rates of shunt independence is currently unknown. METHODS Using data from a single-center, retrospective, observational cohort study involving patients who underwent ETV/CPC for treatment of infantile hydrocephalus, comparative statistical analyses were performed to detect a difference in need for subsequent CSF diversion procedure in patients undergoing partial CPC (describes unilateral CPC or bilateral CPC that only extended from the foramen of Monro [FM] to the atrium on one side) or subtotal CPC (describes CPC extending from the FM to the posterior temporal horn bilaterally) using a rigid neuroendoscope. Propensity scores for extent of CPC were calculated using age and etiology. Propensity scores were used to perform 1) case-matching comparisons and 2) Cox multivariable regression, adjusting for propensity score in the unmatched cohort. Cox multivariable regression adjusting for age and etiology, but not propensity score was also performed as a third statistical technique. RESULTS Eighty-four patients who underwent ETV/CPC had sufficient data to be included in the analysis. Subtotal CPC was performed in 58 patients (69%) and partial CPC in 26 (31%). The ETV/CPC success rates at 6 and 12 months, respectively, were 49% and 41% for patients undergoing subtotal CPC and 35% and 31% for those undergoing partial CPC. Cox multivariate regression in a 48-patient cohort case-matched by propensity score demonstrated no added effect of increased extent of CPC on ETV/CPC survival (HR 0.868, 95% CI 0.422-1.789, p = 0.702). Cox multivariate regression including all patients, with adjustment for propensity score, demonstrated no effect of extent of CPC on ETV/CPC survival (HR 0.845, 95% CI 0.462-1.548, p = 0.586). Cox multivariate regression including all patients, with adjustment for age and etiology, but not propensity score, demonstrated no effect of extent of CPC on ETV/CPC survival (HR 0.908, 95% CI 0.495-1.664, p = 0.755). CONCLUSIONS Using multiple comparative statistical analyses, no difference in need for subsequent CSF diversion procedure was detected between patients in this cohort who underwent partial versus subtotal CPC. Further investigation regarding whether there is truly no difference between partial versus subtotal extent of CPC in larger patient populations and whether further gain in CPC success can be achieved with complete CPC is warranted.
[Statistical prediction methods in violence risk assessment and its application].
Liu, Yuan-Yuan; Hu, Jun-Mei; Yang, Min; Li, Xiao-Song
2013-06-01
It is an urgent global problem how to improve the violence risk assessment. As a necessary part of risk assessment, statistical methods have remarkable impacts and effects. In this study, the predicted methods in violence risk assessment from the point of statistics are reviewed. The application of Logistic regression as the sample of multivariate statistical model, decision tree model as the sample of data mining technique, and neural networks model as the sample of artificial intelligence technology are all reviewed. This study provides data in order to contribute the further research of violence risk assessment.
TU-FG-201-05: Varian MPC as a Statistical Process Control Tool
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carver, A; Rowbottom, C
Purpose: Quality assurance in radiotherapy requires the measurement of various machine parameters to ensure they remain within permitted values over time. In Truebeam release 2.0 the Machine Performance Check (MPC) was released allowing beam output and machine axis movements to be assessed in a single test. We aim to evaluate the Varian Machine Performance Check (MPC) as a tool for Statistical Process Control (SPC). Methods: Varian’s MPC tool was used on three Truebeam and one EDGE linac for a period of approximately one year. MPC was commissioned against independent systems. After this period the data were reviewed to determine whethermore » or not the MPC was useful as a process control tool. Analyses on individual tests were analysed using Shewhart control plots, using Matlab for analysis. Principal component analysis was used to determine if a multivariate model was of any benefit in analysing the data. Results: Control charts were found to be useful to detect beam output changes, worn T-nuts and jaw calibration issues. Upper and lower control limits were defined at the 95% level. Multivariate SPC was performed using Principal Component Analysis. We found little evidence of clustering beyond that which might be naively expected such as beam uniformity and beam output. Whilst this makes multivariate analysis of little use it suggests that each test is giving independent information. Conclusion: The variety of independent parameters tested in MPC makes it a sensitive tool for routine machine QA. We have determined that using control charts in our QA programme would rapidly detect changes in machine performance. The use of control charts allows large quantities of tests to be performed on all linacs without visual inspection of all results. The use of control limits alerts users when data are inconsistent with previous measurements before they become out of specification. A. Carver has received a speaker’s honorarium from Varian.« less
Fast classification of hazelnut cultivars through portable infrared spectroscopy and chemometrics
NASA Astrophysics Data System (ADS)
Manfredi, Marcello; Robotti, Elisa; Quasso, Fabio; Mazzucco, Eleonora; Calabrese, Giorgio; Marengo, Emilio
2018-01-01
The authentication and traceability of hazelnuts is very important for both the consumer and the food industry, to safeguard the protected varieties and the food quality. This study investigates the use of a portable FTIR spectrometer coupled to multivariate statistical analysis for the classification of raw hazelnuts. The method discriminates hazelnuts from different origins/cultivars based on differences of the signal intensities of their IR spectra. The multivariate classification methods, namely principal component analysis (PCA) followed by linear discriminant analysis (LDA) and partial least square discriminant analysis (PLS-DA), with or without variable selection, allowed a very good discrimination among the groups, with PLS-DA coupled to variable selection providing the best results. Due to the fast analysis, high sensitivity, simplicity and no sample preparation, the proposed analytical methodology could be successfully used to verify the cultivar of hazelnuts, and the analysis can be performed quickly and directly on site.
Optimal False Discovery Rate Control for Dependent Data
Xie, Jichun; Cai, T. Tony; Maris, John; Li, Hongzhe
2013-01-01
This paper considers the problem of optimal false discovery rate control when the test statistics are dependent. An optimal joint oracle procedure, which minimizes the false non-discovery rate subject to a constraint on the false discovery rate is developed. A data-driven marginal plug-in procedure is then proposed to approximate the optimal joint procedure for multivariate normal data. It is shown that the marginal procedure is asymptotically optimal for multivariate normal data with a short-range dependent covariance structure. Numerical results show that the marginal procedure controls false discovery rate and leads to a smaller false non-discovery rate than several commonly used p-value based false discovery rate controlling methods. The procedure is illustrated by an application to a genome-wide association study of neuroblastoma and it identifies a few more genetic variants that are potentially associated with neuroblastoma than several p-value-based false discovery rate controlling procedures. PMID:23378870
NASA Astrophysics Data System (ADS)
Xu, Qinzeng; Gao, Fei; Xu, Qiang; Yang, Hongsheng
2014-11-01
Fatty acids (FAs) provide energy and also can be used to trace trophic relationships among organisms. Sea cucumber Apostichopus japonicus goes into a state of aestivation during warm summer months. We examined fatty acid profiles in aestivated and non-aestivated A. japonicus using multivariate analyses (PERMANOVA, MDS, ANOSIM, and SIMPER). The results indicate that the fatty acid profiles of aestivated and non-aestivated sea cucumbers differed significantly. The FAs that were produced by bacteria and brown kelp contributed the most to the differences in the fatty acid composition of aestivated and nonaestivated sea cucumbers. Aestivated sea cucumbers may synthesize FAs from heterotrophic bacteria during early aestivation, and long chain FAs such as eicosapentaenoic (EPA) and docosahexaenoic acid (DHA) that produced from intestinal degradation, are digested during deep aestivation. Specific changes in the fatty acid composition of A. japonicus during aestivation needs more detailed study in the future.
NASA Astrophysics Data System (ADS)
Singh, Veena D.; Daharwal, Sanjay J.
2017-01-01
Three multivariate calibration spectrophotometric methods were developed for simultaneous estimation of Paracetamol (PARA), Enalapril maleate (ENM) and Hydrochlorothiazide (HCTZ) in tablet dosage form; namely multi-linear regression calibration (MLRC), trilinear regression calibration method (TLRC) and classical least square (CLS) method. The selectivity of the proposed methods were studied by analyzing the laboratory prepared ternary mixture and successfully applied in their combined dosage form. The proposed methods were validated as per ICH guidelines and good accuracy; precision and specificity were confirmed within the concentration range of 5-35 μg mL- 1, 5-40 μg mL- 1 and 5-40 μg mL- 1of PARA, HCTZ and ENM, respectively. The results were statistically compared with reported HPLC method. Thus, the proposed methods can be effectively useful for the routine quality control analysis of these drugs in commercial tablet dosage form.
Multivariable Parametric Cost Model for Ground Optical: Telescope Assembly
NASA Technical Reports Server (NTRS)
Stahl, H. Philip; Rowell, Ginger Holmes; Reese, Gayle; Byberg, Alicia
2004-01-01
A parametric cost model for ground-based telescopes is developed using multi-variable statistical analysis of both engineering and performance parameters. While diameter continues to be the dominant cost driver, diffraction limited wavelength is found to be a secondary driver. Other parameters such as radius of curvature were examined. The model includes an explicit factor for primary mirror segmentation and/or duplication (i.e. multi-telescope phased-array systems). Additionally, single variable models based on aperture diameter were derived.
Layer-by-Layer Polyelectrolyte Encapsulation of Mycoplasma pneumoniae for Enhanced Raman Detection
Rivera-Betancourt, Omar E.; Sheppard, Edward S.; Krause, Duncan C.; Dluhy, Richard A.
2014-01-01
Mycoplasma pneumoniae is a major cause of respiratory disease in humans and accounts for as much as 20% of all community-acquired pneumonia. Existing mycoplasma diagnosis is primarily limited by the poor success rate at culturing the bacteria from clinical samples. There is a critical need to develop a new platform for mycoplasma detection that has high sensitivity, specificity, and expediency. Here we report the layer-by-layer (LBL) encapsulation of M. pneumoniae cells with Ag nanoparticles in a matrix of the polyelectrolytes poly(allylamine hydrochloride) (PAH) and poly(styrene sulfonate) (PSS). We evaluated nanoparticle encapsulated mycoplasma cells as a platform for the differentiation of M. pneumoniae strains using surface enhanced Raman scattering (SERS) combined with multivariate statistical analysis. Three separate M. pneumoniae strains (M129, FH and II-3) were studied. Scanning electron microscopy and fluorescence imaging showed that the Ag nanoparticles were incorporated between the oppositely charged polyelectrolyte layers. SERS spectra showed that LBL encapsulation provides excellent spectral reproducibility. Multivariate statistical analysis of the Raman spectra differentiated the three M. pneumoniae strains with 97 – 100% specificity and sensitivity, and low (0.1 – 0.4) root mean square error. These results indicated that nanoparticle and polyelectrolyte encapsulation of M. pneumoniae is a potentially powerful platform for rapid and sensitive SERS-based bacterial identification. PMID:25017005
Vigli, Georgia; Philippidis, Angelos; Spyros, Apostolos; Dais, Photis
2003-09-10
A combination of (1)H NMR and (31)P NMR spectroscopy and multivariate statistical analysis was used to classify 192 samples from 13 types of vegetable oils, namely, hazelnut, sunflower, corn, soybean, sesame, walnut, rapeseed, almond, palm, groundnut, safflower, coconut, and virgin olive oils from various regions of Greece. 1,2-Diglycerides, 1,3-diglycerides, the ratio of 1,2-diglycerides to total diglycerides, acidity, iodine value, and fatty acid composition determined upon analysis of the respective (1)H NMR and (31)P NMR spectra were selected as variables to establish a classification/prediction model by employing discriminant analysis. This model, obtained from the training set of 128 samples, resulted in a significant discrimination among the different classes of oils, whereas 100% of correct validated assignments for 64 samples were obtained. Different artificial mixtures of olive-hazelnut, olive-corn, olive-sunflower, and olive-soybean oils were prepared and analyzed by (1)H NMR and (31)P NMR spectroscopy. Subsequent discriminant analysis of the data allowed detection of adulteration as low as 5% w/w, provided that fresh virgin olive oil samples were used, as reflected by their high 1,2-diglycerides to total diglycerides ratio (D > or = 0.90).
Liu, Zechang; Wang, Liping; Liu, Yumei
2018-01-18
Hops impart flavor to beer, with the volatile components characterizing the various hop varieties and qualities. Fingerprinting, especially flavor fingerprinting, is often used to identify 'flavor products' because inconsistencies in the description of flavor may lead to an incorrect definition of beer quality. Compared to flavor fingerprinting, volatile fingerprinting is simpler and easier. We performed volatile fingerprinting using head space-solid phase micro-extraction gas chromatography-mass spectrometry combined with similarity analysis and principal component analysis (PCA) for evaluating and distinguishing between three major Chinese hops. Eighty-four volatiles were identified, which were classified into seven categories. Volatile fingerprinting based on similarity analysis did not yield any obvious result. By contrast, hop varieties and qualities were identified using volatile fingerprinting based on PCA. The potential variables explained the variance in the three hop varieties. In addition, the dendrogram and principal component score plot described the differences and classifications of hops. Volatile fingerprinting plus multivariate statistical analysis can rapidly differentiate between the different varieties and qualities of the three major Chinese hops. Furthermore, this method can be used as a reference in other fields. © 2018 Society of Chemical Industry. © 2018 Society of Chemical Industry.
Huang, Guowen; Lee, Duncan; Scott, E Marian
2018-03-30
The long-term health effects of air pollution are often estimated using a spatio-temporal ecological areal unit study, but this design leads to the following statistical challenges: (1) how to estimate spatially representative pollution concentrations for each areal unit; (2) how to allow for the uncertainty in these estimated concentrations when estimating their health effects; and (3) how to simultaneously estimate the joint effects of multiple correlated pollutants. This article proposes a novel 2-stage Bayesian hierarchical model for addressing these 3 challenges, with inference based on Markov chain Monte Carlo simulation. The first stage is a multivariate spatio-temporal fusion model for predicting areal level average concentrations of multiple pollutants from both monitored and modelled pollution data. The second stage is a spatio-temporal model for estimating the health impact of multiple correlated pollutants simultaneously, which accounts for the uncertainty in the estimated pollution concentrations. The novel methodology is motivated by a new study of the impact of both particulate matter and nitrogen dioxide concentrations on respiratory hospital admissions in Scotland between 2007 and 2011, and the results suggest that both pollutants exhibit substantial and independent health effects. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Yao, Yuchen; Bao, Jie; Skyllas-Kazacos, Maria; Welch, Barry J.; Akhmetov, Sergey
2018-04-01
Individual anode current signals in aluminum reduction cells provide localized cell conditions in the vicinity of each anode, which contain more information than the conventionally measured cell voltage and line current. One common use of this measurement is to identify process faults that can cause significant changes in the anode current signals. While this method is simple and direct, it ignores the interactions between anode currents and other important process variables. This paper presents an approach that applies multivariate statistical analysis techniques to individual anode currents and other process operating data, for the detection and diagnosis of local process abnormalities in aluminum reduction cells. Specifically, since the Hall-Héroult process is time-varying with its process variables dynamically and nonlinearly correlated, dynamic kernel principal component analysis with moving windows is used. The cell is discretized into a number of subsystems, with each subsystem representing one anode and cell conditions in its vicinity. The fault associated with each subsystem is identified based on multivariate statistical control charts. The results show that the proposed approach is able to not only effectively pinpoint the problematic areas in the cell, but also assess the effect of the fault on different parts of the cell.
Gogna, Navdeep; Hamid, Neda; Dorai, Kavita
2015-11-10
Extracts from the Carica papaya L. plant are widely reported to contain metabolites with antibacterial, antioxidant and anticancer activity. This study aims to analyze the metabolic profiles of papaya leaves and seeds in order to gain insights into their phytomedicinal constituents. We performed metabolite fingerprinting using 1D and 2D 1H NMR experiments and used multivariate statistical analysis to identify those plant parts that contain the most concentrations of metabolites of phytomedicinal value. Secondary metabolites such as phenyl propanoids, including flavonoids, were found in greater concentrations in the leaves as compared to the seeds. UPLC-ESI-MS verified the presence of significant metabolites in the papaya extracts suggested by the NMR analysis. Interestingly, the concentration of eleven secondary metabolites namely caffeic, cinnamic, chlorogenic, quinic, coumaric, vanillic, and protocatechuic acids, naringenin, hesperidin, rutin, and kaempferol, were higher in young as compared to old papaya leaves. The results of the NMR analysis were corroborated by estimating the total phenolic and flavonoid content of the extracts. Estimation of antioxidant activity in leaves and seed extracts by DPPH and ABTS in-vitro assays and antioxidant capacity in C2C12 cell line also showed that papaya extracts exhibit high antioxidant activity. Copyright © 2015 Elsevier B.V. All rights reserved.
Additives and salts for dye-sensitized solar cells electrolytes: what is the best choice?
NASA Astrophysics Data System (ADS)
Bella, Federico; Sacco, Adriano; Pugliese, Diego; Laurenti, Marco; Bianco, Stefano
2014-10-01
A multivariate chemometric approach is proposed for the first time for performance optimization of I-/I3- liquid electrolytes for dye-sensitized solar cells (DSSCs). Over the years the system composed by iodide/triiodide redox shuttle dissolved in organic solvent has been enriched with the addition of different specific cations and chemical compounds to improve the photoelectrochemical behavior of the cell. However, usually such additives act favorably with respect to some of the cell parameters and negatively to others. Moreover, the combined action of different compounds often yields contradictory results, and from the literature it is not possible to identify an optimal recipe. We report here a systematic work, based on a multivariate experimental design, to statistically and quantitatively evaluate the effect of different additives on the photovoltaic performances of the device. The effect of cation size in iodine salts, the iodine/iodide ratio in the electrolyte and the effect of type and concentration of additives are mutually evaluated by means of a Design of Experiment (DoE) approach. Through this statistical method, the optimization of the overall parameters is demonstrated with a limited number of experimental trials. A 25% improvement on the photovoltaic conversion efficiency compared with that obtained with a commercial electrolyte is demonstrated.
Lei, Tianli; Chen, Shifeng; Wang, Kai; Zhang, Dandan; Dong, Lin; Lv, Chongning; Wang, Jing; Lu, Jincai
2018-02-01
Bupleuri Radix is a commonly used herb in clinic, and raw and vinegar-baked Bupleuri Radix are both documented in the Pharmacopoeia of People's Republic of China. According to the theories of traditional Chinese medicine, Bupleuri Radix possesses different therapeutic effects before and after processing. However, the chemical mechanism of this processing is still unknown. In this study, ultra-high-performance liquid chromatography with quadruple time-of-flight mass spectrometry coupled with multivariate statistical analysis including principal component analysis and orthogonal partial least square-discriminant analysis was developed to holistically compare the difference between raw and vinegar-baked Bupleuri Radix for the first time. As a result, 50 peaks in raw and processed Bupleuri Radix were detected, respectively, and a total of 49 peak chemical compounds were identified. Saikosaponin a, saikosaponin d, saikosaponin b 3 , saikosaponin e, saikosaponin c, saikosaponin b 2 , saikosaponin b 1 , 4''-O-acetyl-saikosaponin d, hyperoside and 3',4'-dimethoxy quercetin were explored as potential markers of raw and vinegar-baked Bupleuri Radix. This study has been successfully applied for global analysis of raw and vinegar-processed samples. Furthermore, the underlying hepatoprotective mechanism of Bupleuri Radix was predicted, which was related to the changes of chemical profiling. Copyright © 2017 John Wiley & Sons, Ltd.
How to compare cross-lagged associations in a multilevel autoregressive model.
Schuurman, Noémi K; Ferrer, Emilio; de Boer-Sonnenschein, Mieke; Hamaker, Ellen L
2016-06-01
By modeling variables over time it is possible to investigate the Granger-causal cross-lagged associations between variables. By comparing the standardized cross-lagged coefficients, the relative strength of these associations can be evaluated in order to determine important driving forces in the dynamic system. The aim of this study was twofold: first, to illustrate the added value of a multilevel multivariate autoregressive modeling approach for investigating these associations over more traditional techniques; and second, to discuss how the coefficients of the multilevel autoregressive model should be standardized for comparing the strength of the cross-lagged associations. The hierarchical structure of multilevel multivariate autoregressive models complicates standardization, because subject-based statistics or group-based statistics can be used to standardize the coefficients, and each method may result in different conclusions. We argue that in order to make a meaningful comparison of the strength of the cross-lagged associations, the coefficients should be standardized within persons. We further illustrate the bivariate multilevel autoregressive model and the standardization of the coefficients, and we show that disregarding individual differences in dynamics can prove misleading, by means of an empirical example on experienced competence and exhaustion in persons diagnosed with burnout. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Biostatistics Series Module 10: Brief Overview of Multivariate Methods.
Hazra, Avijit; Gogtay, Nithya
2017-01-01
Multivariate analysis refers to statistical techniques that simultaneously look at three or more variables in relation to the subjects under investigation with the aim of identifying or clarifying the relationships between them. These techniques have been broadly classified as dependence techniques, which explore the relationship between one or more dependent variables and their independent predictors, and interdependence techniques, that make no such distinction but treat all variables equally in a search for underlying relationships. Multiple linear regression models a situation where a single numerical dependent variable is to be predicted from multiple numerical independent variables. Logistic regression is used when the outcome variable is dichotomous in nature. The log-linear technique models count type of data and can be used to analyze cross-tabulations where more than two variables are included. Analysis of covariance is an extension of analysis of variance (ANOVA), in which an additional independent variable of interest, the covariate, is brought into the analysis. It tries to examine whether a difference persists after "controlling" for the effect of the covariate that can impact the numerical dependent variable of interest. Multivariate analysis of variance (MANOVA) is a multivariate extension of ANOVA used when multiple numerical dependent variables have to be incorporated in the analysis. Interdependence techniques are more commonly applied to psychometrics, social sciences and market research. Exploratory factor analysis and principal component analysis are related techniques that seek to extract from a larger number of metric variables, a smaller number of composite factors or components, which are linearly related to the original variables. Cluster analysis aims to identify, in a large number of cases, relatively homogeneous groups called clusters, without prior information about the groups. The calculation intensive nature of multivariate analysis has so far precluded most researchers from using these techniques routinely. The situation is now changing with wider availability, and increasing sophistication of statistical software and researchers should no longer shy away from exploring the applications of multivariate methods to real-life data sets.
Evaluation of Facility Management by Multivariate Statistics - Factor Analysis
NASA Astrophysics Data System (ADS)
Singovszki, Miloš; Vranayová, Zuzana
2013-06-01
Facility management is evolving, there is no exact than other sciences, although its development is fast forward. The knowledge and practical skills in facility management is not replaced, on the contrary, they complement each other. The existing low utilization of science in the field of facility management is mainly caused by the management of support activities are many variables and prevailing immediate reaction to the extraordinary situation arising from motives of those who have substantial experience and years of proven experience. Facility management is looking for a system that uses organized knowledge and will form the basis, which grows from a wide range of disciplines. Significant influence on its formation as a scientific discipline is the "structure, which follows strategy". The paper deals evaluate technology building as part of an facility management by multivariate statistic - factor analysis.
Santos, L N S; Cabral, P D S; Neves, G A R; Alves, F R; Teixeira, M B; Cunha, F N; Silva, N F
2017-03-16
The availability of common bean cultivars tolerant to Meloidogyne javanica is limited in Brazil. Thus, the present study aimed to evaluate the reactions of 33 common bean genotypes (23 landrace, 8 commercial, 1 susceptible standard and 1 resistant standard) to M. javanica, employing multivariate statistics to discriminate the reaction of the genotypes. The experiment was conducted in a greenhouse using a completely randomized design with seven replicates. The seeds were sown in 1-L pots containing autoclaved soil and sand in a 1:1 ratio (v:v). On day 19, after emergence of the seedlings, the plants were treated with inoculum containing 4000 eggs + second-stage juveniles (J2). At 60 days after inoculation, the seedlings were evaluated based on biometric and parasitism-related traits, such as number of galls, final nematode population per root system, reproduction factor, and percent reduction in the reproduction factor of the nematode (%RRF). The data were subjected to analysis of variance using the F-test. The Mahalanobis generalized distance was used to obtain the dissimilarity matrix, and the average linkage between groups was used for clustering. The use of multivariate statistics allowed groups to be separated according to the resistance levels of genotypes, as observed in the %RRF. The landrace genotypes FORT-09, FORT-17, FORT-31, FORT-32, FORT-34 and FORT-36 presented resistance to M. javanica; thus, these genotypes can be considered potential sources of resistance.
Burgos, P I; Vilá, L M; Reveille, J D; Alarcón, G S
2009-12-01
To determine the factors associated with peripheral vascular damage in systemic lupus erythematosus patients and its impact on survival from Lupus in Minorities, Nature versus Nurture, a longitudinal US multi-ethnic cohort. Peripheral vascular damage was defined by the Systemic Lupus International Collaborating Clinics Damage Index (SDI). Factors associated with peripheral vascular damage were examined by univariable and multi-variable logistic regression models and its impact on survival by a Cox multi-variable regression. Thirty-four (5.3%) of 637 patients (90% women, mean [SD] age 36.5 [12.6] [16-87] years) developed peripheral vascular damage. Age and the SDI (without peripheral vascular damage) were statistically significant (odds ratio [OR] = 1.05, 95% confidence interval [CI] 1.01-1.08; P = 0.0107 and OR = 1.30, 95% CI 0.09-1.56; P = 0.0043, respectively) in multi-variable analyses. Azathioprine, warfarin and statins were also statistically significant, and glucocorticoid use was borderline statistically significant (OR = 1.03, 95% CI 0.10-1.06; P = 0.0975). In the survival analysis, peripheral vascular damage was independently associated with a diminished survival (hazard ratio = 2.36; 95% CI 1.07-5.19; P = 0.0334). In short, age was independently associated with peripheral vascular damage, but so was the presence of damage in other organs (ocular, neuropsychiatric, renal, cardiovascular, pulmonary, musculoskeletal and integument) and some medications (probably reflecting more severe disease). Peripheral vascular damage also negatively affected survival.
Prabitha, Vasumathi Gopala; Suchetha, Sambasivan; Jayanthi, Jayaraj Lalitha; Baiju, Kamalasanan Vijayakumary; Rema, Prabhakaran; Anuraj, Koyippurath; Mathews, Anita; Sebastian, Paul; Subhash, Narayanan
2016-01-01
Diffuse reflectance (DR) spectroscopy is a non-invasive, real-time, and cost-effective tool for early detection of malignant changes in squamous epithelial tissues. The present study aims to evaluate the diagnostic power of diffuse reflectance spectroscopy for non-invasive discrimination of cervical lesions in vivo. A clinical trial was carried out on 48 sites in 34 patients by recording DR spectra using a point-monitoring device with white light illumination. The acquired data were analyzed and classified using multivariate statistical analysis based on principal component analysis (PCA) and linear discriminant analysis (LDA). Diagnostic accuracies were validated using random number generators. The receiver operating characteristic (ROC) curves were plotted for evaluating the discriminating power of the proposed statistical technique. An algorithm was developed and used to classify non-diseased (normal) from diseased sites (abnormal) with a sensitivity of 72 % and specificity of 87 %. While low-grade squamous intraepithelial lesion (LSIL) could be discriminated from normal with a sensitivity of 56 % and specificity of 80 %, and high-grade squamous intraepithelial lesion (HSIL) from normal with a sensitivity of 89 % and specificity of 97 %, LSIL could be discriminated from HSIL with 100 % sensitivity and specificity. The areas under the ROC curves were 0.993 (95 % confidence interval (CI) 0.0 to 1) and 1 (95 % CI 1) for the discrimination of HSIL from normal and HSIL from LSIL, respectively. The results of the study show that DR spectroscopy could be used along with multivariate analytical techniques as a non-invasive technique to monitor cervical disease status in real time.
Xu, Yangyang; Cai, Hao; Cao, Gang; Duan, Yu; Pei, Ke; Tu, Sicong; Zhou, Jia; Xie, Li; Sun, Dongdong; Zhao, Jiayu; Liu, Jing; Wang, Xiaoqi; Shen, Lin
2018-04-15
Baizhu Shaoyao San (BSS) is a famous traditional Chinese medicinal formula widely used for the treatment of painful diarrhea, intestinal inflammation, and diarrhea-predominant irritable bowel syndrome. According to clinical medication, three medicinal herbs (Atractylodis Macrocephalae Rhizoma, Paeoniae Radix Alba, and Citri Reticulatae Pericarpium) included in BSS must be processed using some specific methods of stir-frying. On the basis of the classical theories of traditional Chinese medicine, the therapeutic effects of BSS would be significantly enhanced after processing. Generally, the changes of curative effects mainly result from the variations of inside chemical basis caused by the processing procedure. To find out the corresponding changes of chemical compositions in BSS after processing and to elucidate the material basis of the changed curative effects, an optimized ultra-high-performance liquid chromatography-quadrupole/time-of-flight mass spectrometry in positive and negative ion modes coupled with multivariate statistical analyses were developed. As a result, a total of 186 compounds were ultimately identified in crude and processed BSS, in which 62 marker compounds with significant differences between crude and processed BSS were found by principal component analysis and t-test. Compared with crude BSS, the contents of 23 compounds were remarkably decreased and the contents of 39 compounds showed notable increase in processed BSS. The transformation mechanisms of some changed compounds were appropriately inferred from the results. Furthermore, compounds with extremely significant differences might strengthen the effects of the whole herbal formula. Copyright © 2018 Elsevier B.V. All rights reserved.
Assessment of data pre-processing methods for LC-MS/MS-based metabolomics of uterine cervix cancer.
Chen, Yanhua; Xu, Jing; Zhang, Ruiping; Shen, Guoqing; Song, Yongmei; Sun, Jianghao; He, Jiuming; Zhan, Qimin; Abliz, Zeper
2013-05-07
A metabolomics strategy based on rapid resolution liquid chromatography/tandem mass spectrometry (RRLC-MS/MS) and multivariate statistics has been implemented to identify potential biomarkers in uterine cervix cancer. Due to the importance of the data pre-processing method, three popular software packages have been compared. Then they have been used to acquire respective data matrices from the same LC-MS/MS data. Multivariate statistics was subsequently used to identify significantly changed biomarkers for uterine cervix cancer from the resulting data matrices. The reliabilities of the identified discriminated metabolites have been further validated on the basis of manually extracted data and ROC curves. Nine potential biomarkers have been identified as having a close relationship with uterine cervix cancer. Considering these in combination as a biomarker group, the AUC amounted to 0.997, with a sensitivity of 92.9% and a specificity of 95.6%. The prediction accuracy was 96.6%. Among these potential biomarkers, the amounts of four purine derivatives were greatly decreased, which might be related to a P2 receptor that might lead to a decrease in cell number through apoptosis. Moreover, only two of them were identified simultaneously by all of the pre-processing tools. The results have demonstrated that the data pre-processing method could seriously bias the metabolomics results. Therefore, application of two or more data pre-processing methods would reveal a more comprehensive set of potential biomarkers in non-targeted metabolomics, before a further validation with LC-MS/MS based targeted metabolomics in MRM mode could be conducted.
Bayesian multivariate Poisson abundance models for T-cell receptor data.
Greene, Joshua; Birtwistle, Marc R; Ignatowicz, Leszek; Rempala, Grzegorz A
2013-06-07
A major feature of an adaptive immune system is its ability to generate B- and T-cell clones capable of recognizing and neutralizing specific antigens. These clones recognize antigens with the help of the surface molecules, called antigen receptors, acquired individually during the clonal development process. In order to ensure a response to a broad range of antigens, the number of different receptor molecules is extremely large, resulting in a huge clonal diversity of both B- and T-cell receptor populations and making their experimental comparisons statistically challenging. To facilitate such comparisons, we propose a flexible parametric model of multivariate count data and illustrate its use in a simultaneous analysis of multiple antigen receptor populations derived from mammalian T-cells. The model relies on a representation of the observed receptor counts as a multivariate Poisson abundance mixture (m PAM). A Bayesian parameter fitting procedure is proposed, based on the complete posterior likelihood, rather than the conditional one used typically in similar settings. The new procedure is shown to be considerably more efficient than its conditional counterpart (as measured by the Fisher information) in the regions of m PAM parameter space relevant to model T-cell data. Copyright © 2013 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Riad, Safaa M.; Salem, Hesham; Elbalkiny, Heba T.; Khattab, Fatma I.
2015-04-01
Five, accurate, precise, and sensitive univariate and multivariate spectrophotometric methods were developed for the simultaneous determination of a ternary mixture containing Trimethoprim (TMP), Sulphamethoxazole (SMZ) and Oxytetracycline (OTC) in waste water samples collected from different cites either production wastewater or livestock wastewater after their solid phase extraction using OASIS HLB cartridges. In univariate methods OTC was determined at its λmax 355.7 nm (0D), while (TMP) and (SMZ) were determined by three different univariate methods. Method (A) is based on successive spectrophotometric resolution technique (SSRT). The technique starts with the ratio subtraction method followed by ratio difference method for determination of TMP and SMZ. Method (B) is successive derivative ratio technique (SDR). Method (C) is mean centering of the ratio spectra (MCR). The developed multivariate methods are principle component regression (PCR) and partial least squares (PLS). The specificity of the developed methods is investigated by analyzing laboratory prepared mixtures containing different ratios of the three drugs. The obtained results are statistically compared with those obtained by the official methods, showing no significant difference with respect to accuracy and precision at p = 0.05.
Riad, Safaa M; Salem, Hesham; Elbalkiny, Heba T; Khattab, Fatma I
2015-04-05
Five, accurate, precise, and sensitive univariate and multivariate spectrophotometric methods were developed for the simultaneous determination of a ternary mixture containing Trimethoprim (TMP), Sulphamethoxazole (SMZ) and Oxytetracycline (OTC) in waste water samples collected from different cites either production wastewater or livestock wastewater after their solid phase extraction using OASIS HLB cartridges. In univariate methods OTC was determined at its λmax 355.7 nm (0D), while (TMP) and (SMZ) were determined by three different univariate methods. Method (A) is based on successive spectrophotometric resolution technique (SSRT). The technique starts with the ratio subtraction method followed by ratio difference method for determination of TMP and SMZ. Method (B) is successive derivative ratio technique (SDR). Method (C) is mean centering of the ratio spectra (MCR). The developed multivariate methods are principle component regression (PCR) and partial least squares (PLS). The specificity of the developed methods is investigated by analyzing laboratory prepared mixtures containing different ratios of the three drugs. The obtained results are statistically compared with those obtained by the official methods, showing no significant difference with respect to accuracy and precision at p=0.05. Copyright © 2015 Elsevier B.V. All rights reserved.
A Review of Multivariate Distributions for Count Data Derived from the Poisson Distribution.
Inouye, David; Yang, Eunho; Allen, Genevera; Ravikumar, Pradeep
2017-01-01
The Poisson distribution has been widely studied and used for modeling univariate count-valued data. Multivariate generalizations of the Poisson distribution that permit dependencies, however, have been far less popular. Yet, real-world high-dimensional count-valued data found in word counts, genomics, and crime statistics, for example, exhibit rich dependencies, and motivate the need for multivariate distributions that can appropriately model this data. We review multivariate distributions derived from the univariate Poisson, categorizing these models into three main classes: 1) where the marginal distributions are Poisson, 2) where the joint distribution is a mixture of independent multivariate Poisson distributions, and 3) where the node-conditional distributions are derived from the Poisson. We discuss the development of multiple instances of these classes and compare the models in terms of interpretability and theory. Then, we empirically compare multiple models from each class on three real-world datasets that have varying data characteristics from different domains, namely traffic accident data, biological next generation sequencing data, and text data. These empirical experiments develop intuition about the comparative advantages and disadvantages of each class of multivariate distribution that was derived from the Poisson. Finally, we suggest new research directions as explored in the subsequent discussion section.
Quantification of Covariance in Tropical Cyclone Activity across Teleconnected Basins
NASA Astrophysics Data System (ADS)
Tolwinski-Ward, S. E.; Wang, D.
2015-12-01
Rigorous statistical quantification of natural hazard covariance across regions has important implications for risk management, and is also of fundamental scientific interest. We present a multivariate Bayesian Poisson regression model for inferring the covariance in tropical cyclone (TC) counts across multiple ocean basins and across Saffir-Simpson intensity categories. Such covariability results from the influence of large-scale modes of climate variability on local environments that can alternately suppress or enhance TC genesis and intensification, and our model also simultaneously quantifies the covariance of TC counts with various climatic modes in order to deduce the source of inter-basin TC covariability. The model explicitly treats the time-dependent uncertainty in observed maximum sustained wind data, and hence the nominal intensity category of each TC. Differences in annual TC counts as measured by different agencies are also formally addressed. The probabilistic output of the model can be probed for probabilistic answers to such questions as: - Does the relationship between different categories of TCs differ statistically by basin? - Which climatic predictors have significant relationships with TC activity in each basin? - Are the relationships between counts in different basins conditionally independent given the climatic predictors, or are there other factors at play affecting inter-basin covariability? - How can a portfolio of insured property be optimized across space to minimize risk? Although we present results of our model applied to TCs, the framework is generalizable to covariance estimation between multivariate counts of natural hazards across regions and/or across peril types.
Predicting clinical diagnosis in Huntington's disease: An imaging polymarker
Daws, Richard E.; Soreq, Eyal; Johnson, Eileanoir B.; Scahill, Rachael I.; Tabrizi, Sarah J.; Barker, Roger A.; Hampshire, Adam
2018-01-01
Objective Huntington's disease (HD) gene carriers can be identified before clinical diagnosis; however, statistical models for predicting when overt motor symptoms will manifest are too imprecise to be useful at the level of the individual. Perfecting this prediction is integral to the search for disease modifying therapies. This study aimed to identify an imaging marker capable of reliably predicting real‐life clinical diagnosis in HD. Method A multivariate machine learning approach was applied to resting‐state and structural magnetic resonance imaging scans from 19 premanifest HD gene carriers (preHD, 8 of whom developed clinical disease in the 5 years postscanning) and 21 healthy controls. A classification model was developed using cross‐group comparisons between preHD and controls, and within the preHD group in relation to “estimated” and “actual” proximity to disease onset. Imaging measures were modeled individually, and combined, and permutation modeling robustly tested classification accuracy. Results Classification performance for preHDs versus controls was greatest when all measures were combined. The resulting polymarker predicted converters with high accuracy, including those who were not expected to manifest in that time scale based on the currently adopted statistical models. Interpretation We propose that a holistic multivariate machine learning treatment of brain abnormalities in the premanifest phase can be used to accurately identify those patients within 5 years of developing motor features of HD, with implications for prognostication and preclinical trials. Ann Neurol 2018;83:532–543 PMID:29405351
Collins, Simon N; Dyson, Sue J; Murray, Rachel C; Newton, J Richard; Burden, Faith; Trawford, Andrew F
2012-08-01
To establish and validate an objective method of radiographic diagnosis of anatomic changes in laminitic forefeet of donkeys on the basis of data from a comprehensive series of radiographic measurements. 85 donkeys with and 85 without forelimb laminitis for baseline data determination; a cohort of 44 donkeys with and 18 without forelimb laminitis was used for validation analyses. For each donkey, lateromedial radiographic views of 1 weight-bearing forelimb were obtained; images from 11 laminitic and 2 nonlaminitic donkeys were excluded (motion artifact) from baseline data determination. Data from an a priori selection of 19 measurements of anatomic features of laminitic and nonlaminitic donkey feet were analyzed by use of a novel application of multivariate statistical techniques. The resultant diagnostic models were validated in a blinded manner with data from the separate cohort of laminitic and nonlaminitic donkeys. Data were modeled, and robust statistical rules were established for the diagnosis of anatomic changes within laminitic donkey forefeet. Component 1 scores ≤ -3.5 were indicative of extreme anatomic change, and scores from -2.0 to 0.0 denoted modest change. Nonlaminitic donkeys with a score from 0.5 to 1.0 should be considered as at risk for laminitis. Results indicated that the radiographic procedures evaluated can be used for the identification, assessment, and monitoring of anatomic changes associated with laminitis. Screening assessments by use of this method may enable early detection of mild anatomic change and identification of at-risk donkeys.
Cho, Hyun-Deok; Kim, Unyong; Suh, Joon Hyuk; Eom, Han Young; Kim, Junghyun; Lee, Seul Gi; Choi, Yong Seok; Han, Sang Beom
2016-04-01
Analytical methods using high-performance liquid chromatography with diode array and tandem mass spectrometry detection were developed for the discrimination of the rhizomes of four Atractylodes medicinal plants: A. japonica, A. macrocephala, A. chinensis, and A. lancea. A quantitative study was performed, selecting five bioactive components, including atractylenolide I, II, III, eudesma-4(14),7(11)-dien-8-one and atractylodin, on twenty-six Atractylodes samples of various origins. Sample extraction was optimized to sonication with 80% methanol for 40 min at room temperature. High-performance liquid chromatography with diode array detection was established using a C18 column with a water/acetonitrile gradient system at a flow rate of 1.0 mL/min, and the detection wavelength was set at 236 nm. Liquid chromatography with tandem mass spectrometry was applied to certify the reliability of the quantitative results. The developed methods were validated by ensuring specificity, linearity, limit of quantification, accuracy, precision, recovery, robustness, and stability. Results showed that cangzhu contained higher amounts of atractylenolide I and atractylodin than baizhu, and especially atractylodin contents showed the greatest variation between baizhu and cangzhu. Multivariate statistical analysis, such as principal component analysis and hierarchical cluster analysis, were also employed for further classification of the Atractylodes plants. The established method was suitable for quality control of the Atractylodes plants. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Statistics anxiety, state anxiety during an examination, and academic achievement.
Macher, Daniel; Paechter, Manuela; Papousek, Ilona; Ruggeri, Kai; Freudenthaler, H Harald; Arendasy, Martin
2013-12-01
A large proportion of students identify statistics courses as the most anxiety-inducing courses in their curriculum. Many students feel impaired by feelings of state anxiety in the examination and therefore probably show lower achievements. The study investigates how statistics anxiety, attitudes (e.g., interest, mathematical self-concept) and trait anxiety, as a general disposition to anxiety, influence experiences of anxiety as well as achievement in an examination. Participants were 284 undergraduate psychology students, 225 females and 59 males. Two weeks prior to the examination, participants completed a demographic questionnaire and measures of the STARS, the STAI, self-concept in mathematics, and interest in statistics. At the beginning of the statistics examination, students assessed their present state anxiety by the KUSTA scale. After 25 min, all examination participants gave another assessment of their anxiety at that moment. Students' examination scores were recorded. Structural equation modelling techniques were used to test relationships between the variables in a multivariate context. Statistics anxiety was the only variable related to state anxiety in the examination. Via state anxiety experienced before and during the examination, statistics anxiety had a negative influence on achievement. However, statistics anxiety also had a direct positive influence on achievement. This result may be explained by students' motivational goals in the specific educational setting. The results provide insight into the relationship between students' attitudes, dispositions, experiences of anxiety in the examination, and academic achievement, and give recommendations to instructors on how to support students prior to and in the examination. © 2012 The British Psychological Society.
Statistical methods in personality assessment research.
Schinka, J A; LaLone, L; Broeckel, J A
1997-06-01
Emerging models of personality structure and advances in the measurement of personality and psychopathology suggest that research in personality and personality assessment has entered a stage of advanced development, in this article we examine whether researchers in these areas have taken advantage of new and evolving statistical procedures. We conducted a review of articles published in the Journal of Personality, Assessment during the past 5 years. Of the 449 articles that included some form of data analysis, 12.7% used only descriptive statistics, most employed only univariate statistics, and fewer than 10% used multivariate methods of data analysis. We discuss the cost of using limited statistical methods, the possible reasons for the apparent reluctance to employ advanced statistical procedures, and potential solutions to this technical shortcoming.
Kukreti, B M; Pandey, Pradeep; Singh, R V
2012-08-01
Non-coring based exploratory drilling was under taken in the sedimentary environment of Rangsohkham block, East Khasi Hills district to examine the eastern extension of existing uranium resources located at Domiasiat and Wakhyn in the Mahadek basin of Meghalaya (India). Although radiometric survey and radiometric analysis of surface grab/channel samples in the block indicate high uranium content but the gamma ray logging results of exploratory boreholes in the block, did not obtain the expected results. To understand this abrupt discontinuity between the two sets of data (surface and subsurface) multivariate statistical analysis of primordial radioactive elements (K(40), U(238) and Th(232)) was performed using the concept of representative subsurface samples, drawn from the randomly selected 11 boreholes of this block. The study was performed to a high confidence level (99%), and results are discussed for assessing the U and Th behavior in the block. Results not only confirm the continuation of three distinct geological formations in the area but also the uranium bearing potential in the Mahadek sandstone of the eastern part of Mahadek Basin. Copyright © 2012 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
McKinley, C. C.; Scudder, R.; Thomas, D. J.
2016-12-01
The Neodymium Isotopic composition (Nd IC) of oxide coatings has been applied as a tracer of water mass composition and used to address fundamental questions about past ocean conditions. The leached authigenic oxide coating from marine sediment is widely assumed to reflect the dissolved trace metal composition of the bottom water interacting with sediment at the seafloor. However, recent studies have shown that readily reducible sediment components, in addition to trace metal fluxes from the pore water, are incorporated into the bottom water, influencing the trace metal composition of leached oxide coatings. This challenges the prevailing application of the authigenic oxide Nd IC as a proxy of seawater composition. Therefore, it is important to identify the component end-members that create sediments of different lithology and determine if, or how they might contribute to the Nd IC of oxide coatings. To investigate lithologic influence on the results of sequential leaching, we selected two sites with complete bulk sediment statistical characterization. Site U1370 in the South Pacific Gyre, is predominantly composed of Rhyolite ( 60%) and has a distinguishable ( 10%) Fe-Mn Oxyhydroxide component (Dunlea et al., 2015). Site 1149 near the Izu-Bonin-Arc is predominantly composed of dispersed ash ( 20-50%) and eolian dust from Asia ( 50-80%) (Scudder et al., 2014). We perform a two-step leaching procedure: a 14 mL of 0.02 M hydroxylamine hydrochloride (HH) in 20% acetic acid buffered to a pH 4 for one hour, targeting metals bound to Fe- and Mn- oxides fractions, and a second HH leach for 12 hours, designed to remove any remaining oxides from the residual component. We analyze all three resulting fractions for a large suite of major, trace and rare earth elements, a sub-set of the samples are also analyzed for Nd IC. We use multivariate statistical analyses of the resulting geochemical data to identify how each component of the sediment partitions across the sequential extractions. Here we present results comparing the two sites, and examine how the composition of the sediment impacts the resulting Nd IC.
Health and human rights: a statistical measurement framework using household survey data in Uganda.
Wesonga, Ronald; Owino, Abraham; Ssekiboobo, Agnes; Atuhaire, Leonard; Jehopio, Peter
2015-05-03
Health is intertwined with human rights as is clearly reflected in the right to life. Promotion of health practices in the context of human rights can be accomplished if there is a better understanding of the level of human rights observance. In this paper, we evaluate and present an appraisal for a possibility of applying household survey to study the determinants of health and human rights and also derive the probability that human rights are observed; an important ingredient into the national planning framework. Data from the Uganda National Governance Baseline Survey were used. A conceptual framework for predictors of a hybrid dependent variable was developed and both bivariate and multivariate statistical techniques employed. Multivariate post estimation computations were derived after evaluations of the significance of coefficients of health and human rights predictors. Findings, show that household characteristics of respondents considered in this study were statistically significant (p < 0.05) to provide a reliable assessment of human rights observance. For example, a unit increase of respondents' schooling levels results in an increase of about 34% level of positively assessing human rights observance. Additionally, the study establishes, through the three models presented, that household assessment of health and human rights observance was 20% which also represents how much of the entire continuum of human rights is demanded. Findings propose important evidence for monitoring and evaluation of health in the context human rights using household survey data. They provide a benchmark for health and human rights assessments with a focus on international and national development plans to achieve socio-economic transformation and health in society.
Jupiter, Daniel C
2012-01-01
In this first of a series of statistical methodology commentaries for the clinician, we discuss the use of multivariate linear regression. Copyright © 2012 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
Intharathirat, Rotchana; Abdul Salam, P; Kumar, S; Untong, Akarapong
2015-05-01
In order to plan, manage and use municipal solid waste (MSW) in a sustainable way, accurate forecasting of MSW generation and composition plays a key role. It is difficult to carry out the reliable estimates using the existing models due to the limited data available in the developing countries. This study aims to forecast MSW collected in Thailand with prediction interval in long term period by using the optimized multivariate grey model which is the mathematical approach. For multivariate models, the representative factors of residential and commercial sectors affecting waste collected are identified, classified and quantified based on statistics and mathematics of grey system theory. Results show that GMC (1, 5), the grey model with convolution integral, is the most accurate with the least error of 1.16% MAPE. MSW collected would increase 1.40% per year from 43,435-44,994 tonnes per day in 2013 to 55,177-56,735 tonnes per day in 2030. This model also illustrates that population density is the most important factor affecting MSW collected, followed by urbanization, proportion employment and household size, respectively. These mean that the representative factors of commercial sector may affect more MSW collected than that of residential sector. Results can help decision makers to develop the measures and policies of waste management in long term period. Copyright © 2015 Elsevier Ltd. All rights reserved.
Simultaneous calibration of ensemble river flow predictions over an entire range of lead times
NASA Astrophysics Data System (ADS)
Hemri, S.; Fundel, F.; Zappa, M.
2013-10-01
Probabilistic estimates of future water levels and river discharge are usually simulated with hydrologic models using ensemble weather forecasts as main inputs. As hydrologic models are imperfect and the meteorological ensembles tend to be biased and underdispersed, the ensemble forecasts for river runoff typically are biased and underdispersed, too. Thus, in order to achieve both reliable and sharp predictions statistical postprocessing is required. In this work Bayesian model averaging (BMA) is applied to statistically postprocess ensemble runoff raw forecasts for a catchment in Switzerland, at lead times ranging from 1 to 240 h. The raw forecasts have been obtained using deterministic and ensemble forcing meteorological models with different forecast lead time ranges. First, BMA is applied based on mixtures of univariate normal distributions, subject to the assumption of independence between distinct lead times. Then, the independence assumption is relaxed in order to estimate multivariate runoff forecasts over the entire range of lead times simultaneously, based on a BMA version that uses multivariate normal distributions. Since river runoff is a highly skewed variable, Box-Cox transformations are applied in order to achieve approximate normality. Both univariate and multivariate BMA approaches are able to generate well calibrated probabilistic forecasts that are considerably sharper than climatological forecasts. Additionally, multivariate BMA provides a promising approach for incorporating temporal dependencies into the postprocessed forecasts. Its major advantage against univariate BMA is an increase in reliability when the forecast system is changing due to model availability.
NASA Astrophysics Data System (ADS)
Ghanate, A. D.; Kothiwale, S.; Singh, S. P.; Bertrand, Dominique; Krishna, C. Murali
2011-02-01
Cancer is now recognized as one of the major causes of morbidity and mortality. Histopathological diagnosis, the gold standard, is shown to be subjective, time consuming, prone to interobserver disagreement, and often fails to predict prognosis. Optical spectroscopic methods are being contemplated as adjuncts or alternatives to conventional cancer diagnostics. The most important aspect of these approaches is their objectivity, and multivariate statistical tools play a major role in realizing it. However, rigorous evaluation of the robustness of spectral models is a prerequisite. The utility of Raman spectroscopy in the diagnosis of cancers has been well established. Until now, the specificity and applicability of spectral models have been evaluated for specific cancer types. In this study, we have evaluated the utility of spectroscopic models representing normal and malignant tissues of the breast, cervix, colon, larynx, and oral cavity in a broader perspective, using different multivariate tests. The limit test, which was used in our earlier study, gave high sensitivity but suffered from poor specificity. The performance of other methods such as factorial discriminant analysis and partial least square discriminant analysis are at par with more complex nonlinear methods such as decision trees, but they provide very little information about the classification model. This comparative study thus demonstrates not just the efficacy of Raman spectroscopic models but also the applicability and limitations of different multivariate tools for discrimination under complex conditions such as the multicancer scenario.
Landslide susceptibility map: from research to application
NASA Astrophysics Data System (ADS)
Fiorucci, Federica; Reichenbach, Paola; Ardizzone, Francesca; Rossi, Mauro; Felicioni, Giulia; Antonini, Guendalina
2014-05-01
Susceptibility map is an important and essential tool in environmental planning, to evaluate landslide hazard and risk and for a correct and responsible management of the territory. Landslide susceptibility is the likelihood of a landslide occurring in an area on the basis of local terrain conditions. Can be expressed as the probability that any given region will be affected by landslides, i.e. an estimate of "where" landslides are likely to occur. In this work we present two examples of landslide susceptibility map prepared for the Umbria Region and for the Perugia Municipality. These two maps were realized following official request from the Regional and Municipal government to the Research Institute for the Hydrogeological Protection (CNR-IRPI). The susceptibility map prepared for the Umbria Region represents the development of previous agreements focused to prepare: i) a landslide inventory map that was included in the Urban Territorial Planning (PUT) and ii) a series of maps for the Regional Plan for Multi-risk Prevention. The activities carried out for the Umbria Region were focused to define and apply methods and techniques for landslide susceptibility zonation. Susceptibility maps were prepared exploiting a multivariate statistical model (linear discriminant analysis) for the five Civil Protection Alert Zones defined in the regional territory. The five resulting maps were tested and validated using the spatial distribution of recent landslide events that occurred in the region. The susceptibility map for the Perugia Municipality was prepared to be integrated as one of the cartographic product in the Municipal development plan (PRG - Piano Regolatore Generale) as required by the existing legislation. At strategic level, one of the main objectives of the PRG, is to establish a framework of knowledge and legal aspects for the management of geo-hydrological risk. At national level most of the susceptibility maps prepared for the PRG, were and still are obtained qualitatively classifying the territory according to slope classes. For the Perugia Municipality the susceptibility map was obtained combining results of statistical multivariate models and landslide density map. In particular, in the first phase a susceptibility zonation was prepared using different single and combined probability statistical multivariate techniques. The zonation was then combined and compared with the landslide density map in order to reclassify the false negative (portion of the territory classified by the model as stable affected by slope failures). The semi-quantitative resulting map was classified in five susceptibility classes. For each class a set of technical regulation was established to manage the territory.
A climate-based multivariate extreme emulator of met-ocean-hydrological events for coastal flooding
NASA Astrophysics Data System (ADS)
Camus, Paula; Rueda, Ana; Mendez, Fernando J.; Tomas, Antonio; Del Jesus, Manuel; Losada, Iñigo J.
2015-04-01
Atmosphere-ocean general circulation models (AOGCMs) are useful to analyze large-scale climate variability (long-term historical periods, future climate projections). However, applications such as coastal flood modeling require climate information at finer scale. Besides, flooding events depend on multiple climate conditions: waves, surge levels from the open-ocean and river discharge caused by precipitation. Therefore, a multivariate statistical downscaling approach is adopted to reproduce relationships between variables and due to its low computational cost. The proposed method can be considered as a hybrid approach which combines a probabilistic weather type downscaling model with a stochastic weather generator component. Predictand distributions are reproduced modeling the relationship with AOGCM predictors based on a physical division in weather types (Camus et al., 2012). The multivariate dependence structure of the predictand (extreme events) is introduced linking the independent marginal distributions of the variables by a probabilistic copula regression (Ben Ayala et al., 2014). This hybrid approach is applied for the downscaling of AOGCM data to daily precipitation and maximum significant wave height and storm-surge in different locations along the Spanish coast. Reanalysis data is used to assess the proposed method. A commonly predictor for the three variables involved is classified using a regression-guided clustering algorithm. The most appropriate statistical model (general extreme value distribution, pareto distribution) for daily conditions is fitted. Stochastic simulation of the present climate is performed obtaining the set of hydraulic boundary conditions needed for high resolution coastal flood modeling. References: Camus, P., Menéndez, M., Méndez, F.J., Izaguirre, C., Espejo, A., Cánovas, V., Pérez, J., Rueda, A., Losada, I.J., Medina, R. (2014b). A weather-type statistical downscaling framework for ocean wave climate. Journal of Geophysical Research, doi: 10.1002/2014JC010141. Ben Ayala, M.A., Chebana, F., Ouarda, T.B.M.J. (2014). Probabilistic Gaussian Copula Regression Model for Multisite and Multivariable Downscaling, Journal of Climate, 27, 3331-3347.
Introduction to multivariate discrimination
NASA Astrophysics Data System (ADS)
Kégl, Balázs
2013-07-01
Multivariate discrimination or classification is one of the best-studied problem in machine learning, with a plethora of well-tested and well-performing algorithms. There are also several good general textbooks [1-9] on the subject written to an average engineering, computer science, or statistics graduate student; most of them are also accessible for an average physics student with some background on computer science and statistics. Hence, instead of writing a generic introduction, we concentrate here on relating the subject to a practitioner experimental physicist. After a short introduction on the basic setup (Section 1) we delve into the practical issues of complexity regularization, model selection, and hyperparameter optimization (Section 2), since it is this step that makes high-complexity non-parametric fitting so different from low-dimensional parametric fitting. To emphasize that this issue is not restricted to classification, we illustrate the concept on a low-dimensional but non-parametric regression example (Section 2.1). Section 3 describes the common algorithmic-statistical formal framework that unifies the main families of multivariate classification algorithms. We explain here the large-margin principle that partly explains why these algorithms work. Section 4 is devoted to the description of the three main (families of) classification algorithms, neural networks, the support vector machine, and AdaBoost. We do not go into the algorithmic details; the goal is to give an overview on the form of the functions these methods learn and on the objective functions they optimize. Besides their technical description, we also make an attempt to put these algorithm into a socio-historical context. We then briefly describe some rather heterogeneous applications to illustrate the pattern recognition pipeline and to show how widespread the use of these methods is (Section 5). We conclude the chapter with three essentially open research problems that are either relevant to or even motivated by certain unorthodox applications of multivariate discrimination in experimental physics.
Proton radius from electron scattering data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Higinbotham, Douglas W.; Kabir, Al Amin; Lin, Vincent
Background: The proton charge radius extracted from recent muonic hydrogen Lamb shift measurements is significantly smaller than that extracted from atomic hydrogen and electron scattering measurements. The discrepancy has become known as the proton radius puzzle. Purpose: In an attempt to understand the discrepancy, we review high-precision electron scattering results from Mainz, Jefferson Lab, Saskatoon and Stanford. Methods: We make use of stepwise regression techniques using the F-test as well as the Akaike information criterion to systematically determine the predictive variables to use for a given set and range of electron scattering data as well as to provide multivariate errormore » estimates. Results: Starting with the precision, low four-momentum transfer (Q 2) data from Mainz (1980) and Saskatoon (1974), we find that a stepwise regression of the Maclaurin series using the F-test as well as the Akaike information criterion justify using a linear extrapolation which yields a value for the proton radius that is consistent with the result obtained from muonic hydrogen measurements. Applying the same Maclaurin series and statistical criteria to the 2014 Rosenbluth results on GE from Mainz, we again find that the stepwise regression tends to favor a radius consistent with the muonic hydrogen radius but produces results that are extremely sensitive to the range of data included in the fit. Making use of the high-Q 2 data on G E to select functions which extrapolate to high Q 2, we find that a Pad´e (N = M = 1) statistical model works remarkably well, as does a dipole function with a 0.84 fm radius, G E(Q 2) = (1 + Q 2/0.66 GeV 2) -2. Conclusions: Rigorous applications of stepwise regression techniques and multivariate error estimates result in the extraction of a proton charge radius that is consistent with the muonic hydrogen result of 0.84 fm; either from linear extrapolation of the extreme low-Q 2 data or by use of the Pad´e approximant for extrapolation using a larger range of data. Thus, based on a purely statistical analysis of electron scattering data, we conclude that the electron scattering result and the muonic hydrogen result are consistent. Lastly, it is the atomic hydrogen results that are the outliers.« less
Proton radius from electron scattering data
Higinbotham, Douglas W.; Kabir, Al Amin; Lin, Vincent; ...
2016-05-31
Background: The proton charge radius extracted from recent muonic hydrogen Lamb shift measurements is significantly smaller than that extracted from atomic hydrogen and electron scattering measurements. The discrepancy has become known as the proton radius puzzle. Purpose: In an attempt to understand the discrepancy, we review high-precision electron scattering results from Mainz, Jefferson Lab, Saskatoon and Stanford. Methods: We make use of stepwise regression techniques using the F-test as well as the Akaike information criterion to systematically determine the predictive variables to use for a given set and range of electron scattering data as well as to provide multivariate errormore » estimates. Results: Starting with the precision, low four-momentum transfer (Q 2) data from Mainz (1980) and Saskatoon (1974), we find that a stepwise regression of the Maclaurin series using the F-test as well as the Akaike information criterion justify using a linear extrapolation which yields a value for the proton radius that is consistent with the result obtained from muonic hydrogen measurements. Applying the same Maclaurin series and statistical criteria to the 2014 Rosenbluth results on GE from Mainz, we again find that the stepwise regression tends to favor a radius consistent with the muonic hydrogen radius but produces results that are extremely sensitive to the range of data included in the fit. Making use of the high-Q 2 data on G E to select functions which extrapolate to high Q 2, we find that a Pad´e (N = M = 1) statistical model works remarkably well, as does a dipole function with a 0.84 fm radius, G E(Q 2) = (1 + Q 2/0.66 GeV 2) -2. Conclusions: Rigorous applications of stepwise regression techniques and multivariate error estimates result in the extraction of a proton charge radius that is consistent with the muonic hydrogen result of 0.84 fm; either from linear extrapolation of the extreme low-Q 2 data or by use of the Pad´e approximant for extrapolation using a larger range of data. Thus, based on a purely statistical analysis of electron scattering data, we conclude that the electron scattering result and the muonic hydrogen result are consistent. Lastly, it is the atomic hydrogen results that are the outliers.« less
NASA Technical Reports Server (NTRS)
Tripp, John S.; Tcheng, Ping
1999-01-01
Statistical tools, previously developed for nonlinear least-squares estimation of multivariate sensor calibration parameters and the associated calibration uncertainty analysis, have been applied to single- and multiple-axis inertial model attitude sensors used in wind tunnel testing to measure angle of attack and roll angle. The analysis provides confidence and prediction intervals of calibrated sensor measurement uncertainty as functions of applied input pitch and roll angles. A comparative performance study of various experimental designs for inertial sensor calibration is presented along with corroborating experimental data. The importance of replicated calibrations over extended time periods has been emphasized; replication provides independent estimates of calibration precision and bias uncertainties, statistical tests for calibration or modeling bias uncertainty, and statistical tests for sensor parameter drift over time. A set of recommendations for a new standardized model attitude sensor calibration method and usage procedures is included. The statistical information provided by these procedures is necessary for the uncertainty analysis of aerospace test results now required by users of industrial wind tunnel test facilities.
Multivariate evoked response detection based on the spectral F-test.
Rocha, Paulo Fábio F; Felix, Leonardo B; Miranda de Sá, Antonio Mauricio F L; Mendes, Eduardo M A M
2016-05-01
Objective response detection techniques, such as magnitude square coherence, component synchrony measure, and the spectral F-test, have been used to automate the detection of evoked responses. The performance of these detectors depends on both the signal-to-noise ratio (SNR) and the length of the electroencephalogram (EEG) signal. Recently, multivariate detectors were developed to increase the detection rate even in the case of a low signal-to-noise ratio or of short data records originated from EEG signals. In this context, an extension to the multivariate case of the spectral F-test detector is proposed. The performance of this technique is assessed using Monte Carlo. As an example, EEG data from 12 subjects during photic stimulation is used to demonstrate the usefulness of the proposed detector. The multivariate method showed detection rates consistently higher than those ones when only one signal was used. It is shown that the response detection in EEG signals with the multivariate technique was statistically significant if two or more EEG derivations were used. Copyright © 2016 Elsevier B.V. All rights reserved.
Cavalcante, Y L; Hauser-Davis, R A; Saraiva, A C F; Brandão, I L S; Oliveira, T F; Silveira, A M
2013-01-01
This paper compared and evaluated seasonal variations in physico-chemical parameters and metals at a hydroelectric power station reservoir by applying Multivariate Analyses and Artificial Neural Networks (ANN) statistical techniques. A Factor Analysis was used to reduce the number of variables: the first factor was composed of elements Ca, K, Mg and Na, and the second by Chemical Oxygen Demand. The ANN showed 100% correct classifications in training and validation samples. Physico-chemical analyses showed that water pH values were not statistically different between the dry and rainy seasons, while temperature, conductivity, alkalinity, ammonia and DO were higher in the dry period. TSS, hardness and COD, on the other hand, were higher during the rainy season. The statistical analyses showed that Ca, K, Mg and Na are directly connected to the Chemical Oxygen Demand, which indicates a possibility of their input into the reservoir system by domestic sewage and agricultural run-offs. These statistical applications, thus, are also relevant in cases of environmental management and policy decision-making processes, to identify which factors should be further studied and/or modified to recover degraded or contaminated water bodies. Copyright © 2012 Elsevier B.V. All rights reserved.
Matsumoto, Takao; Ishikawa, Ryo; Tohei, Tetsuya; Kimura, Hideo; Yao, Qiwen; Zhao, Hongyang; Wang, Xiaolin; Chen, Dapeng; Cheng, Zhenxiang; Shibata, Naoya; Ikuhara, Yuichi
2013-10-09
A state-of-the-art spherical aberration-corrected STEM was fully utilized to directly visualize the multiferroic domain structure in a hexagonal YMnO3 single crystal at atomic scale. With the aid of multivariate statistical analysis (MSA), we obtained unbiased and quantitative maps of ferroelectric domain structures with atomic resolution. Such a statistical image analysis of the transition region between opposite polarizations has confirmed atomically sharp transitions of ferroelectric polarization both in antiparallel (uncharged) and tail-to-tail 180° (charged) domain boundaries. Through the analysis, a correlated subatomic image shift of Mn-O layers with that of Y layers, exhibiting a double-arc shape of reversed curvatures, have been elucidated. The amount of image shift in Mn-O layers along the c-axis is statistically significant as small as 0.016 nm, roughly one-third of the evident image shift of 0.048 nm in Y layers. Interestingly, a careful analysis has shown that such a subatomic image shift in Mn-O layers vanishes at the tail-to-tail 180° domain boundaries. Furthermore, taking advantage of the annular bright field (ABF) imaging technique combined with MSA, the tilting of MnO5 bipyramids, the very core mechanism of multiferroicity of the material, is evaluated.
Velasco-Tapia, Fernando
2014-01-01
Magmatic processes have usually been identified and evaluated using qualitative or semiquantitative geochemical or isotopic tools based on a restricted number of variables. However, a more complete and quantitative view could be reached applying multivariate analysis, mass balance techniques, and statistical tests. As an example, in this work a statistical and quantitative scheme is applied to analyze the geochemical features for the Sierra de las Cruces (SC) volcanic range (Mexican Volcanic Belt). In this locality, the volcanic activity (3.7 to 0.5 Ma) was dominantly dacitic, but the presence of spheroidal andesitic enclaves and/or diverse disequilibrium features in majority of lavas confirms the operation of magma mixing/mingling. New discriminant-function-based multidimensional diagrams were used to discriminate tectonic setting. Statistical tests of discordancy and significance were applied to evaluate the influence of the subducting Cocos plate, which seems to be rather negligible for the SC magmas in relation to several major and trace elements. A cluster analysis following Ward's linkage rule was carried out to classify the SC volcanic rocks geochemical groups. Finally, two mass-balance schemes were applied for the quantitative evaluation of the proportion of the end-member components (dacitic and andesitic magmas) in the comingled lavas (binary mixtures).
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.
2003-01-01
Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity in each basin, particle size sorting, average storm intensity (millimeters per hour), soil organic matter content, soil permeability, and soil drainage. The results of this study demonstrate that logistic regression is a valuable tool for predicting the probability of debris flows occurring in recently-burned landscapes.
Risk factors for hydrocephalus and neurological deficit in children born with an encephalocele.
Da Silva, Stephanie L; Jeelani, Yasser; Dang, Ha; Krieger, Mark D; McComb, J Gordon
2015-04-01
There is a known association of hydrocephalus with encephaloceles. Risk factors for hydrocephalus and neurological deficit were ascertained in a series of patients born with an encephalocele. A retrospective analysis was undertaken of patients treated for encephaloceles at Children's Hospital Los Angeles between 1994 and 2012. The following factors were evaluated for their prognostic value: age at presentation, sex, location of encephalocele, size, contents, microcephaly, presence of hydrocephalus, CSF leak, associated cranial anomalies, and neurological outcome. Seventy children were identified, including 38 girls and 32 boys. The median age at presentation was 2 months. The mean follow-up duration was 3.7 years. Encephalocele location was classified as anterior (n = 14) or posterior (n = 56) to the coronal suture. The average maximum encephalocele diameter was 4 cm (range 0.5-23 cm). Forty-seven encephaloceles contained neural tissue. Eight infants presented at birth with CSF leaking from the encephalocele, with 1 being infected. Six patients presented with hydrocephalus, while 11 developed progressive hydrocephalus postoperatively. On univariate analysis, the presence of neural tissue, cranial anomalies, encephalocele size of at least 2 cm, seizure disorder, and microcephaly were each positively associated with hydrocephalus. On multivariate logistic regression modeling, the single prognostic factor for hydrocephalus of borderline statistical significance was the presence of neural tissue (odds ratio [OR] = 5.8, 95% confidence interval [CI] = 0.8-74.0). Fourteen patients had severe developmental delay, 28 had mild/moderate delay, and 28 were neurologically normal. On univariate analysis, the presence of cranial anomalies, larger size of encephalocele, hydrocephalus, and microcephaly were positively associated with neurological deficit. In the multivariable model, the only statistically significant prognostic factor for neurological deficit was presence of hydrocephalus (OR 17.2, 95% CI 1.7-infinity). In multivariate models, the presence of neural tissue was borderline significantly associated with hydrocephalus and the presence of hydrocephalus was significantly associated with neurological deficit. The location of the encephalocele did not have a statistically significant association with incidence of hydrocephalus or neurological deficit. In contrast to modestly good/fair neurological outcome in children with an encephalocele without hydrocephalus, the presence of hydrocephalus resulted in a far worse neurological outcome.
Zhu, Guangxu; Guo, Qingjun; Xiao, Huayun; Chen, Tongbin; Yang, Jun
2017-06-01
Heavy metals are considered toxic to humans and ecosystems. In the present study, heavy metal concentration in soil was investigated using the single pollution index (PIi), the integrated Nemerow pollution index (PIN), and the geoaccumulation index (Igeo) to determine metal accumulation and its pollution status at the abandoned site of the Capital Iron and Steel Factory in Beijing and its surrounding area. Multivariate statistical (principal component analysis and correlation analysis), geostatistical analysis (ArcGIS tool), combined with stable Pb isotopic ratios, were applied to explore the characteristics of heavy metal pollution and the possible sources of pollutants. The results indicated that heavy metal elements show different degrees of accumulation in the study area, the observed trend of the enrichment factors, and the geoaccumulation index was Hg > Cd > Zn > Cr > Pb > Cu ≈ As > Ni. Hg, Cd, Zn, and Cr were the dominant elements that influenced soil quality in the study area. The Nemerow index method indicated that all of the heavy metals caused serious pollution except Ni. Multivariate statistical analysis indicated that Cd, Zn, Cu, and Pb show obvious correlation and have higher loads on the same principal component, suggesting that they had the same sources, which are related to industrial activities and vehicle emissions. The spatial distribution maps based on ordinary kriging showed that high concentrations of heavy metals were located in the local factory area and in the southeast-northwest part of the study region, corresponding with the predominant wind directions. Analyses of lead isotopes confirmed that Pb in the study soils is predominantly derived from three Pb sources: dust generated during steel production, coal combustion, and the natural background. Moreover, the ternary mixture model based on lead isotope analysis indicates that lead in the study soils originates mainly from anthropogenic sources, which contribute much more than the natural sources. Our study could not only reveal the overall situation of heavy metal contamination, but also identify the specific pollution sources.
Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics
Chen, Wenan; Larrabee, Beth R.; Ovsyannikova, Inna G.; Kennedy, Richard B.; Haralambieva, Iana H.; Poland, Gregory A.; Schaid, Daniel J.
2015-01-01
Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf. PMID:25948564
Forecasting of municipal solid waste quantity in a developing country using multivariate grey models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Intharathirat, Rotchana, E-mail: rotchana.in@gmail.com; Abdul Salam, P., E-mail: salam@ait.ac.th; Kumar, S., E-mail: kumar@ait.ac.th
Highlights: • Grey model can be used to forecast MSW quantity accurately with the limited data. • Prediction interval overcomes the uncertainty of MSW forecast effectively. • A multivariate model gives accuracy associated with factors affecting MSW quantity. • Population, urbanization, employment and household size play role for MSW quantity. - Abstract: In order to plan, manage and use municipal solid waste (MSW) in a sustainable way, accurate forecasting of MSW generation and composition plays a key role. It is difficult to carry out the reliable estimates using the existing models due to the limited data available in the developingmore » countries. This study aims to forecast MSW collected in Thailand with prediction interval in long term period by using the optimized multivariate grey model which is the mathematical approach. For multivariate models, the representative factors of residential and commercial sectors affecting waste collected are identified, classified and quantified based on statistics and mathematics of grey system theory. Results show that GMC (1, 5), the grey model with convolution integral, is the most accurate with the least error of 1.16% MAPE. MSW collected would increase 1.40% per year from 43,435–44,994 tonnes per day in 2013 to 55,177–56,735 tonnes per day in 2030. This model also illustrates that population density is the most important factor affecting MSW collected, followed by urbanization, proportion employment and household size, respectively. These mean that the representative factors of commercial sector may affect more MSW collected than that of residential sector. Results can help decision makers to develop the measures and policies of waste management in long term period.« less
Riley, Richard D; Elia, Eleni G; Malin, Gemma; Hemming, Karla; Price, Malcolm P
2015-07-30
A prognostic factor is any measure that is associated with the risk of future health outcomes in those with existing disease. Often, the prognostic ability of a factor is evaluated in multiple studies. However, meta-analysis is difficult because primary studies often use different methods of measurement and/or different cut-points to dichotomise continuous factors into 'high' and 'low' groups; selective reporting is also common. We illustrate how multivariate random effects meta-analysis models can accommodate multiple prognostic effect estimates from the same study, relating to multiple cut-points and/or methods of measurement. The models account for within-study and between-study correlations, which utilises more information and reduces the impact of unreported cut-points and/or measurement methods in some studies. The applicability of the approach is improved with individual participant data and by assuming a functional relationship between prognostic effect and cut-point to reduce the number of unknown parameters. The models provide important inferential results for each cut-point and method of measurement, including the summary prognostic effect, the between-study variance and a 95% prediction interval for the prognostic effect in new populations. Two applications are presented. The first reveals that, in a multivariate meta-analysis using published results, the Apgar score is prognostic of neonatal mortality but effect sizes are smaller at most cut-points than previously thought. In the second, a multivariate meta-analysis of two methods of measurement provides weak evidence that microvessel density is prognostic of mortality in lung cancer, even when individual participant data are available so that a continuous prognostic trend is examined (rather than cut-points). © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Benson, Nsikak U.; Asuquo, Francis E.; Williams, Akan B.; Essien, Joseph P.; Ekong, Cyril I.; Akpabio, Otobong; Olajire, Abaas A.
2016-01-01
Trace metals (Cd, Cr, Cu, Ni and Pb) concentrations in benthic sediments were analyzed through multi-step fractionation scheme to assess the levels and sources of contamination in estuarine, riverine and freshwater ecosystems in Niger Delta (Nigeria). The degree of contamination was assessed using the individual contamination factors (ICF) and global contamination factor (GCF). Multivariate statistical approaches including principal component analysis (PCA), cluster analysis and correlation test were employed to evaluate the interrelationships and associated sources of contamination. The spatial distribution of metal concentrations followed the pattern Pb>Cu>Cr>Cd>Ni. Ecological risk index by ICF showed significant potential mobility and bioavailability for Cu, Cu and Ni. The ICF contamination trend in the benthic sediments at all studied sites was Cu>Cr>Ni>Cd>Pb. The principal component and agglomerative clustering analyses indicate that trace metals contamination in the ecosystems was influenced by multiple pollution sources. PMID:27257934
Prolonged instability prior to a regime shift
Spanbauer, Trisha; Allen, Craig R.; Angeler, David G.; Eason, Tarsha; Fritz, Sherilyn C.; Garmestani, Ahjond S.; Nash, Kirsty L.; Stone, Jeffery R.
2014-01-01
Regime shifts are generally defined as the point of ‘abrupt’ change in the state of a system. However, a seemingly abrupt transition can be the product of a system reorganization that has been ongoing much longer than is evident in statistical analysis of a single component of the system. Using both univariate and multivariate statistical methods, we tested a long-term high-resolution paleoecological dataset with a known change in species assemblage for a regime shift. Analysis of this dataset with Fisher Information and multivariate time series modeling showed that there was a∼2000 year period of instability prior to the regime shift. This period of instability and the subsequent regime shift coincide with regional climate change, indicating that the system is undergoing extrinsic forcing. Paleoecological records offer a unique opportunity to test tools for the detection of thresholds and stable-states, and thus to examine the long-term stability of ecosystems over periods of multiple millennia.
Cardot, J-M; Roudier, B; Schütz, H
2017-07-01
The f 2 test is generally used for comparing dissolution profiles. In cases of high variability, the f 2 test is not applicable, and the Multivariate Statistical Distance (MSD) test is frequently proposed as an alternative by the FDA and EMA. The guidelines provide only general recommendations. MSD tests can be performed either on raw data with or without time as a variable or on parameters of models. In addition, data can be limited-as in the case of the f 2 test-to dissolutions of up to 85% or to all available data. In the context of the present paper, the recommended calculation included all raw dissolution data up to the first point greater than 85% as a variable-without the various times as parameters. The proposed MSD overcomes several drawbacks found in other methods.
Mallette, Jennifer R; Casale, John F; Jordan, James; Morello, David R; Beyer, Paul M
2016-03-23
Previously, geo-sourcing to five major coca growing regions within South America was accomplished. However, the expansion of coca cultivation throughout South America made sub-regional origin determinations increasingly difficult. The former methodology was recently enhanced with additional stable isotope analyses ((2)H and (18)O) to fully characterize cocaine due to the varying environmental conditions in which the coca was grown. An improved data analysis method was implemented with the combination of machine learning and multivariate statistical analysis methods to provide further partitioning between growing regions. Here, we show how the combination of trace cocaine alkaloids, stable isotopes, and multivariate statistical analyses can be used to classify illicit cocaine as originating from one of 19 growing regions within South America. The data obtained through this approach can be used to describe current coca cultivation and production trends, highlight trafficking routes, as well as identify new coca growing regions.
NASA Astrophysics Data System (ADS)
Mallette, Jennifer R.; Casale, John F.; Jordan, James; Morello, David R.; Beyer, Paul M.
2016-03-01
Previously, geo-sourcing to five major coca growing regions within South America was accomplished. However, the expansion of coca cultivation throughout South America made sub-regional origin determinations increasingly difficult. The former methodology was recently enhanced with additional stable isotope analyses (2H and 18O) to fully characterize cocaine due to the varying environmental conditions in which the coca was grown. An improved data analysis method was implemented with the combination of machine learning and multivariate statistical analysis methods to provide further partitioning between growing regions. Here, we show how the combination of trace cocaine alkaloids, stable isotopes, and multivariate statistical analyses can be used to classify illicit cocaine as originating from one of 19 growing regions within South America. The data obtained through this approach can be used to describe current coca cultivation and production trends, highlight trafficking routes, as well as identify new coca growing regions.
NASA Astrophysics Data System (ADS)
Fayez, Yasmin Mohammed; Tawakkol, Shereen Mostafa; Fahmy, Nesma Mahmoud; Lotfy, Hayam Mahmoud; Shehata, Mostafa Abdel-Aty
2018-04-01
Three methods of analysis are conducted that need computational procedures by the Matlab® software. The first is the univariate mean centering method which eliminates the interfering signal of the one component at a selected wave length leaving the amplitude measured to represent the component of interest only. The other two multivariate methods named PLS and PCR depend on a large number of variables that lead to extraction of the maximum amount of information required to determine the component of interest in the presence of the other. Good accurate and precise results are obtained from the three methods for determining clotrimazole in the linearity range 1-12 μg/mL and 75-550 μg/mL with dexamethasone acetate 2-20 μg/mL in synthetic mixtures and pharmaceutical formulation using two different spectral regions 205-240 nm and 233-278 nm. The results obtained are compared statistically to each other and to the official methods.
Arm structure in normal spiral galaxies, 1: Multivariate data for 492 galaxies
NASA Technical Reports Server (NTRS)
Magri, Christopher
1994-01-01
Multivariate data have been collected as part of an effort to develop a new classification system for spiral galaxies, one which is not necessarily based on subjective morphological properties. A sample of 492 moderately bright northern Sa and Sc spirals was chosen for future statistical analysis. New observations were made at 20 and 21 cm; the latter data are described in detail here. Infrared Astronomy Satellite (IRAS) fluxes were obtained from archival data. Finally, new estimates of arm pattern radomness and of local environmental harshness were compiled for most sample objects.
Statistical analysis and interpolation of compositional data in materials science.
Pesenson, Misha Z; Suram, Santosh K; Gregoire, John M
2015-02-09
Compositional data are ubiquitous in chemistry and materials science: analysis of elements in multicomponent systems, combinatorial problems, etc., lead to data that are non-negative and sum to a constant (for example, atomic concentrations). The constant sum constraint restricts the sampling space to a simplex instead of the usual Euclidean space. Since statistical measures such as mean and standard deviation are defined for the Euclidean space, traditional correlation studies, multivariate analysis, and hypothesis testing may lead to erroneous dependencies and incorrect inferences when applied to compositional data. Furthermore, composition measurements that are used for data analytics may not include all of the elements contained in the material; that is, the measurements may be subcompositions of a higher-dimensional parent composition. Physically meaningful statistical analysis must yield results that are invariant under the number of composition elements, requiring the application of specialized statistical tools. We present specifics and subtleties of compositional data processing through discussion of illustrative examples. We introduce basic concepts, terminology, and methods required for the analysis of compositional data and utilize them for the spatial interpolation of composition in a sputtered thin film. The results demonstrate the importance of this mathematical framework for compositional data analysis (CDA) in the fields of materials science and chemistry.
Moorman, J. Randall; Delos, John B.; Flower, Abigail A.; Cao, Hanqing; Kovatchev, Boris P.; Richman, Joshua S.; Lake, Douglas E.
2014-01-01
We have applied principles of statistical signal processing and non-linear dynamics to analyze heart rate time series from premature newborn infants in order to assist in the early diagnosis of sepsis, a common and potentially deadly bacterial infection of the bloodstream. We began with the observation of reduced variability and transient decelerations in heart rate interval time series for hours up to days prior to clinical signs of illness. We find that measurements of standard deviation, sample asymmetry and sample entropy are highly related to imminent clinical illness. We developed multivariable statistical predictive models, and an interface to display the real-time results to clinicians. Using this approach, we have observed numerous cases in which incipient neonatal sepsis was diagnosed and treated without any clinical illness at all. This review focuses on the mathematical and statistical time series approaches used to detect these abnormal heart rate characteristics and present predictive monitoring information to the clinician. PMID:22026974
NASA Astrophysics Data System (ADS)
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-04-01
North-East Sicily is strongly exposed to shallow landslide events. On October, 1st 2009 a severe rainstorm (225.5 mm of cumulative rainfall in 9 hours) caused flash floods and more than 1000 landslides, which struck several small villages as Giampilieri, Altolia, Molino, Pezzolo, Scaletta Zanclea, Itala, with 31 fatalities, 6 missing persons and damage to buildings and transportation infrastructures. Landslides, mainly consisting in earth and debris translational slides evolving into debris flows, triggered on steep slopes involving colluvium and regolith materials which cover the underlying metamorphic bedrock of Peloritani Mountains. In this area catchments are small (about 10 square kilometres), elongated, with steep slopes, low order streams, short time of concentration, and discharge directly into the sea. In the past, landslides occurred at Altolia in 1613 and 2000, at Molino in 1750, 1805 and 2000, at Giampilieri in 1791, 1918, 1929, 1932, 2000 and on October 25, 2007. The aim of this work is to define susceptibility models for shallow landslides using multivariate statistical analyses in the Giampilieri area (25 square kilometres). A detailed landslide inventory map has been produced, as the first step, through field surveys coupled with the observation of high resolution aerial colour orthophoto taken immediately after the event. 1,490 initiation zones have been identified; most of them have planimetric dimensions ranging between tens to few hundreds of square metres. The spatial hazard assessment has been focused on the detachment areas. Susceptibility models, performed in a GIS environment, took into account several parameters. The morphometric and hydrologic parameters has been derived from a detailed LiDAR 1×1 m. Square grid cells of 4×4 m were adopted as mapping units, on the basis of the area-frequency distribution of the detachment zones, and the optimal representation of the local morphometric conditions (e.g. slope angle, plan curvature). A first phase of the work addressed to identify the spatial relationships between the landslides location and the 13 related factors by using the Frequency Ratio bivariate statistical method. The analysis was then carried out by adopting a multivariate statistical approach, according to the Logistic Regression technique and Random Forests technique that gave best results in terms of AUC. The models were performed and evaluated with different sample sizes and also taking into account the temporal variation of input variables such as burned areas by wildfire. The most significant outcome of this work are: the relevant influence of the sample size on the model results and the strong importance of some environmental factors (e.g. land use and wildfires) for the identification of the depletion zones of extremely rapid shallow landslides.
Martin, Lisa; Watanabe, Sharon; Fainsinger, Robin; Lau, Francis; Ghosh, Sunita; Quan, Hue; Atkins, Marlis; Fassbender, Konrad; Downing, G Michael; Baracos, Vickie
2010-10-01
To determine whether elements of a standard nutritional screening assessment are independently prognostic of survival in patients with advanced cancer. A prospective nested cohort of patients with metastatic cancer were accrued from different units of a Regional Palliative Care Program. Patients completed a nutritional screen on admission. Data included age, sex, cancer site, height, weight history, dietary intake, 13 nutrition impact symptoms, and patient- and physician-reported performance status (PS). Univariate and multivariate survival analyses were conducted. Concordance statistics (c-statistics) were used to test the predictive accuracy of models based on training and validation sets; a c-statistic of 0.5 indicates the model predicts the outcome as well as chance; perfect prediction has a c-statistic of 1.0. A training set of patients in palliative home care (n = 1,164) was used to identify prognostic variables. Primary disease site, PS, short-term weight change (either gain or loss), dietary intake, and dysphagia predicted survival in multivariate analysis (P < .05). A model including only patients separated by disease site and PS with high c-statistics between predicted and observed responses for survival in the training set (0.90) and validation set (0.88; n = 603). The addition of weight change, dietary intake, and dysphagia did not further improve the c-statistic of the model. The c-statistic was also not altered by substituting physician-rated palliative PS for patient-reported PS. We demonstrate a high probability of concordance between predicted and observed survival for patients in distinct palliative care settings (home care, tertiary inpatient, ambulatory outpatient) based on patient-reported information.
Matiatos, Ioannis
2016-01-15
Nitrate (NO3) is one of the most common contaminants in aquatic environments and groundwater. Nitrate concentrations and environmental isotope data (δ(15)N-NO3 and δ(18)O-NO3) from groundwater of Asopos basin, which has different land-use types, i.e., a large number of industries (e.g., textile, metal processing, food, fertilizers, paint), urban and agricultural areas and livestock breeding facilities, were analyzed to identify the nitrate sources of water contamination and N-biogeochemical transformations. A Bayesian isotope mixing model (SIAR) and multivariate statistical analysis of hydrochemical data were used to estimate the proportional contribution of different NO3 sources and to identify the dominant factors controlling the nitrate content of the groundwater in the region. The comparison of SIAR and Principal Component Analysis showed that wastes originating from urban and industrial zones of the basin are mainly responsible for nitrate contamination of groundwater in these areas. Agricultural fertilizers and manure likely contribute to groundwater contamination away from urban fabric and industrial land-use areas. Soil contribution to nitrate contamination due to organic matter is higher in the south-western part of the area far from the industries and the urban settlements. The present study aims to highlight the use of environmental isotopes combined with multivariate statistical analysis in locating sources of nitrate contamination in groundwater leading to a more effective planning of environmental measures and remediation strategies in river basins and water bodies as defined by the European Water Frame Directive (Directive 2000/60/EC).
Statistical Evaluation of Time Series Analysis Techniques
NASA Technical Reports Server (NTRS)
Benignus, V. A.
1973-01-01
The performance of a modified version of NASA's multivariate spectrum analysis program is discussed. A multiple regression model was used to make the revisions. Performance improvements were documented and compared to the standard fast Fourier transform by Monte Carlo techniques.
Robustness of Multiple Objective Decision Analysis Preference Functions
2002-06-01
p p′ : The probability of some event. ,i ip q : The probability of event . i Π : An aggregation of proportional data used in calculating a test ...statistical tests of the significance of the term and also is conducted in a multivariate framework rather than the ROSA univariate approach. A...residual error is ˆ−e = y y (45) The coefficient provides a ready indicator of the contribution for the associated variable and statistical tests
Multivariate flood risk assessment: reinsurance perspective
NASA Astrophysics Data System (ADS)
Ghizzoni, Tatiana; Ellenrieder, Tobias
2013-04-01
For insurance and re-insurance purposes the knowledge of the spatial characteristics of fluvial flooding is fundamental. The probability of simultaneous flooding at different locations during one event and the associated severity and losses have to be estimated in order to assess premiums and for accumulation control (Probable Maximum Losses calculation). Therefore, the identification of a statistical model able to describe the multivariate joint distribution of flood events in multiple location is necessary. In this context, copulas can be viewed as alternative tools for dealing with multivariate simulations as they allow to formalize dependence structures of random vectors. An application of copula function for flood scenario generation is presented for Australia (Queensland, New South Wales and Victoria) where 100.000 possible flood scenarios covering approximately 15.000 years were simulated.
Jenson, Susan K.; Trautwein, C.M.
1984-01-01
The application of an unsupervised, spatially dependent clustering technique (AMOEBA) to interpolated raster arrays of stream sediment data has been found to provide useful multivariate geochemical associations for modeling regional polymetallic resource potential. The technique is based on three assumptions regarding the compositional and spatial relationships of stream sediment data and their regional significance. These assumptions are: (1) compositionally separable classes exist and can be statistically distinguished; (2) the classification of multivariate data should minimize the pair probability of misclustering to establish useful compositional associations; and (3) a compositionally defined class represented by three or more contiguous cells within an array is a more important descriptor of a terrane than a class represented by spatial outliers.
Information extraction from multivariate images
NASA Technical Reports Server (NTRS)
Park, S. K.; Kegley, K. A.; Schiess, J. R.
1986-01-01
An overview of several multivariate image processing techniques is presented, with emphasis on techniques based upon the principal component transformation (PCT). Multiimages in various formats have a multivariate pixel value, associated with each pixel location, which has been scaled and quantized into a gray level vector, and the bivariate of the extent to which two images are correlated. The PCT of a multiimage decorrelates the multiimage to reduce its dimensionality and reveal its intercomponent dependencies if some off-diagonal elements are not small, and for the purposes of display the principal component images must be postprocessed into multiimage format. The principal component analysis of a multiimage is a statistical analysis based upon the PCT whose primary application is to determine the intrinsic component dimensionality of the multiimage. Computational considerations are also discussed.
Environmental assessment of Al-Hammar Marsh, Southern Iraq.
Al-Gburi, Hind Fadhil Abdullah; Al-Tawash, Balsam Salim; Al-Lafta, Hadi Salim
2017-02-01
(a) To determine the spatial distributions and levels of major and minor elements, as well as heavy metals, in water, sediment, and biota (plant and fish) in Al-Hammar Marsh, southern Iraq, and ultimately to supply more comprehensive information for policy-makers to manage the contaminants input into the marsh so that their concentrations do not reach toxic levels. (b) to characterize the seasonal changes in the marsh surface water quality. (c) to address the potential environmental risk of these elements by comparison with the historical levels and global quality guidelines (i.e., World Health Organization (WHO) standard limits). (d) to define the sources of these elements (i.e., natural and/or anthropogenic) using combined multivariate statistical techniques such as Principal Component Analysis (PCA) and Agglomerative Hierarchical Cluster Analysis (AHCA) along with pollution analysis (i.e., enrichment factor analysis). Water, sediment, plant, and fish samples were collected from the marsh, and analyzed for major and minor ions, as well as heavy metals, and then compared to historical levels and global quality guidelines (WHO guidelines). Then, multivariate statistical techniques, such as PCA and AHCA, were used to determine the element sourcing. Water analyses revealed unacceptable values for almost all physio-chemical and biological properties, according to WHO standard limits for drinking water. Almost all major ions and heavy metal concentrations in water showed a distinct decreasing trend at the marsh outlet station compared to other stations. In general, major and minor ions, as well as heavy metals exhibit higher concentrations in winter than in summer. Sediment analyses using multivariate statistical techniques revealed that Mg, Fe, S, P, V, Zn, As, Se, Mo, Co, Ni, Cu, Sr, Br, Cd, Ca, N, Mn, Cr, and Pb were derived from anthropogenic sources, while Al, Si, Ti, K, and Zr were primarily derived from natural sources. Enrichment factor analysis gave results compatible with multivariate statistical techniques findings. Analysis of heavy metals in plant samples revealed that there is no pollution in plants in Al-Hammar Marsh. However, the concentrations of heavy metals in fish samples showed that all samples were contaminated by Pb, Mn, and Ni, while some samples were contaminated by Pb, Mn, and Ni. Decreasing of Tigris and Euphrates discharges during the past decades due to drought conditions and upstream damming, as well as the increasing stress of wastewater effluents from anthropogenic activities, led to degradation of the downstream Al-Hammar Marsh water quality in terms of physical, chemical, and biological properties. As such properties were found to consistently exceed the historical and global quality objectives. However, element concentration decreasing trend at the marsh outlet station compared to other stations indicate that the marsh plays an important role as a natural filtration and bioremediation system. Higher element concentrations in winter were due to runoff from the washing of the surrounding Sabkha during flooding by winter rainstorms. Finally, the high concentrations of heavy metals in fish samples can be attributed to bioaccumulation and biomagnification processes.
A Review of Multivariate Distributions for Count Data Derived from the Poisson Distribution
Inouye, David; Yang, Eunho; Allen, Genevera; Ravikumar, Pradeep
2017-01-01
The Poisson distribution has been widely studied and used for modeling univariate count-valued data. Multivariate generalizations of the Poisson distribution that permit dependencies, however, have been far less popular. Yet, real-world high-dimensional count-valued data found in word counts, genomics, and crime statistics, for example, exhibit rich dependencies, and motivate the need for multivariate distributions that can appropriately model this data. We review multivariate distributions derived from the univariate Poisson, categorizing these models into three main classes: 1) where the marginal distributions are Poisson, 2) where the joint distribution is a mixture of independent multivariate Poisson distributions, and 3) where the node-conditional distributions are derived from the Poisson. We discuss the development of multiple instances of these classes and compare the models in terms of interpretability and theory. Then, we empirically compare multiple models from each class on three real-world datasets that have varying data characteristics from different domains, namely traffic accident data, biological next generation sequencing data, and text data. These empirical experiments develop intuition about the comparative advantages and disadvantages of each class of multivariate distribution that was derived from the Poisson. Finally, we suggest new research directions as explored in the subsequent discussion section. PMID:28983398
Multivariate frequency domain analysis of protein dynamics
NASA Astrophysics Data System (ADS)
Matsunaga, Yasuhiro; Fuchigami, Sotaro; Kidera, Akinori
2009-03-01
Multivariate frequency domain analysis (MFDA) is proposed to characterize collective vibrational dynamics of protein obtained by a molecular dynamics (MD) simulation. MFDA performs principal component analysis (PCA) for a bandpass filtered multivariate time series using the multitaper method of spectral estimation. By applying MFDA to MD trajectories of bovine pancreatic trypsin inhibitor, we determined the collective vibrational modes in the frequency domain, which were identified by their vibrational frequencies and eigenvectors. At near zero temperature, the vibrational modes determined by MFDA agreed well with those calculated by normal mode analysis. At 300 K, the vibrational modes exhibited characteristic features that were considerably different from the principal modes of the static distribution given by the standard PCA. The influences of aqueous environments were discussed based on two different sets of vibrational modes, one derived from a MD simulation in water and the other from a simulation in vacuum. Using the varimax rotation, an algorithm of the multivariate statistical analysis, the representative orthogonal set of eigenmodes was determined at each vibrational frequency.
PYCHEM: a multivariate analysis package for python.
Jarvis, Roger M; Broadhurst, David; Johnson, Helen; O'Boyle, Noel M; Goodacre, Royston
2006-10-15
We have implemented a multivariate statistical analysis toolbox, with an optional standalone graphical user interface (GUI), using the Python scripting language. This is a free and open source project that addresses the need for a multivariate analysis toolbox in Python. Although the functionality provided does not cover the full range of multivariate tools that are available, it has a broad complement of methods that are widely used in the biological sciences. In contrast to tools like MATLAB, PyChem 2.0.0 is easily accessible and free, allows for rapid extension using a range of Python modules and is part of the growing amount of complementary and interoperable scientific software in Python based upon SciPy. One of the attractions of PyChem is that it is an open source project and so there is an opportunity, through collaboration, to increase the scope of the software and to continually evolve a user-friendly platform that has applicability across a wide range of analytical and post-genomic disciplines. http://sourceforge.net/projects/pychem
A refined method for multivariate meta-analysis and meta-regression
Jackson, Daniel; Riley, Richard D
2014-01-01
Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects’ standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:23996351
Kamtchueng, Brice T; Fantong, Wilson Y; Wirmvem, Mengnjo J; Tiodjio, Rosine E; Takounjou, Alain F; Ndam Ngoupayou, Jules R; Kusakabe, Minoru; Zhang, Jing; Ohba, Takeshi; Tanyileke, Gregory; Hell, Joseph V; Ueda, Akira
2016-09-01
With the use of conventional hydrogeochemical techniques, multivariate statistical analysis, and stable isotope approaches, this paper investigates for the first time surface water and groundwater from the surrounding areas of Lake Monoun (LM), West Cameroon. The results reveal that waters are generally slightly acidic to neutral. The relative abundance of major dissolved species are Ca(2+) > Mg(2+) > Na(+) > K(+) for cations and HCO3 (-) ≫ NO3 (-) > Cl(-) > SO4 (2-) for anions. The main water type is Ca-Mg-HCO3. Observed salinity is related to water-rock interaction, ion exchange process, and anthropogenic activities. Nitrate and chloride have been identified as the most common pollutants. These pollutants are attributed to the chlorination of wells and leaching from pit latrines and refuse dumps. The stable isotopic compositions in the investigated water sources suggest evidence of evaporation before recharge. Four major groups of waters were identified by salinity and NO3 concentrations using the Q-mode hierarchical cluster analysis (HCA). Consistent with the isotopic results, group 1 represents fresh unpolluted water occurring near the recharge zone in the general flow regime; groups 2 and 3 are mixed water whose composition is controlled by both weathering of rock-forming minerals and anthropogenic activities; group 4 represents water under high vulnerability of anthropogenic pollution. Moreover, the isotopic results and the HCA showed that the CO2-rich bottom water of LM belongs to an isolated hydrological system within the Foumbot plain. Except for some springs, groundwater water in the area is inappropriate for drinking and domestic purposes but good to excellent for irrigation.
NASA Astrophysics Data System (ADS)
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-11-01
The aim of this work is to define reliable susceptibility models for shallow landslides using Logistic Regression and Random Forests multivariate statistical techniques. The study area, located in North-East Sicily, was hit on October 1st 2009 by a severe rainstorm (225 mm of cumulative rainfall in 7 h) which caused flash floods and more than 1000 landslides. Several small villages, such as Giampilieri, were hit with 31 fatalities, 6 missing persons and damage to buildings and transportation infrastructures. Landslides, mainly types such as earth and debris translational slides evolving into debris flows, were triggered on steep slopes and involved colluvium and regolith materials which cover the underlying metamorphic bedrock. The work has been carried out with the following steps: i) realization of a detailed event landslide inventory map through field surveys coupled with observation of high resolution aerial colour orthophoto; ii) identification of landslide source areas; iii) data preparation of landslide controlling factors and descriptive statistics based on a bivariate method (Frequency Ratio) to get an initial overview on existing relationships between causative factors and shallow landslide source areas; iv) choice of criteria for the selection and sizing of the mapping unit; v) implementation of 5 multivariate statistical susceptibility models based on Logistic Regression and Random Forests techniques and focused on landslide source areas; vi) evaluation of the influence of sample size and type of sampling on results and performance of the models; vii) evaluation of the predictive capabilities of the models using ROC curve, AUC and contingency tables; viii) comparison of model results and obtained susceptibility maps; and ix) analysis of temporal variation of landslide susceptibility related to input parameter changes. Models based on Logistic Regression and Random Forests have demonstrated excellent predictive capabilities. Land use and wildfire variables were found to have a strong control on the occurrence of very rapid shallow landslides.
NASA Astrophysics Data System (ADS)
Gregoire, Alexandre David
2011-07-01
The goal of this research was to accurately predict the ultimate compressive load of impact damaged graphite/epoxy coupons using a Kohonen self-organizing map (SOM) neural network and multivariate statistical regression analysis (MSRA). An optimized use of these data treatment tools allowed the generation of a simple, physically understandable equation that predicts the ultimate failure load of an impacted damaged coupon based uniquely on the acoustic emissions it emits at low proof loads. Acoustic emission (AE) data were collected using two 150 kHz resonant transducers which detected and recorded the AE activity given off during compression to failure of thirty-four impacted 24-ply bidirectional woven cloth laminate graphite/epoxy coupons. The AE quantification parameters duration, energy and amplitude for each AE hit were input to the Kohonen self-organizing map (SOM) neural network to accurately classify the material failure mechanisms present in the low proof load data. The number of failure mechanisms from the first 30% of the loading for twenty-four coupons were used to generate a linear prediction equation which yielded a worst case ultimate load prediction error of 16.17%, just outside of the +/-15% B-basis allowables, which was the goal for this research. Particular emphasis was placed upon the noise removal process which was largely responsible for the accuracy of the results.
Post-processing of multi-hydrologic model simulations for improved streamflow projections
NASA Astrophysics Data System (ADS)
khajehei, sepideh; Ahmadalipour, Ali; Moradkhani, Hamid
2016-04-01
Hydrologic model outputs are prone to bias and uncertainty due to knowledge deficiency in model and data. Uncertainty in hydroclimatic projections arises due to uncertainty in hydrologic model as well as the epistemic or aleatory uncertainties in GCM parameterization and development. This study is conducted to: 1) evaluate the recently developed multi-variate post-processing method for historical simulations and 2) assess the effect of post-processing on uncertainty and reliability of future streamflow projections in both high-flow and low-flow conditions. The first objective is performed for historical period of 1970-1999. Future streamflow projections are generated for 10 statistically downscaled GCMs from two widely used downscaling methods: Bias Corrected Statistically Downscaled (BCSD) and Multivariate Adaptive Constructed Analogs (MACA), over the period of 2010-2099 for two representative concentration pathways of RCP4.5 and RCP8.5. Three semi-distributed hydrologic models were employed and calibrated at 1/16 degree latitude-longitude resolution for over 100 points across the Columbia River Basin (CRB) in the pacific northwest USA. Streamflow outputs are post-processed through a Bayesian framework based on copula functions. The post-processing approach is relying on a transfer function developed based on bivariate joint distribution between the observation and simulation in historical period. Results show that application of post-processing technique leads to considerably higher accuracy in historical simulations and also reducing model uncertainty in future streamflow projections.
Neurophysiological correlates of depressive symptoms in young adults: A quantitative EEG study.
Lee, Poh Foong; Kan, Donica Pei Xin; Croarkin, Paul; Phang, Cheng Kar; Doruk, Deniz
2018-01-01
There is an unmet need for practical and reliable biomarkers for mood disorders in young adults. Identifying the brain activity associated with the early signs of depressive disorders could have important diagnostic and therapeutic implications. In this study we sought to investigate the EEG characteristics in young adults with newly identified depressive symptoms. Based on the initial screening, a total of 100 participants (n = 50 euthymic, n = 50 depressive) underwent 32-channel EEG acquisition. Simple logistic regression and C-statistic were used to explore if EEG power could be used to discriminate between the groups. The strongest EEG predictors of mood using multivariate logistic regression models. Simple logistic regression analysis with subsequent C-statistics revealed that only high-alpha and beta power originating from the left central cortex (C3) have a reliable discriminative value (ROC curve >0.7 (70%)) for differentiating the depressive group from the euthymic group. Multivariate regression analysis showed that the single most significant predictor of group (depressive vs. euthymic) is the high-alpha power over C3 (p = 0.03). The present findings suggest that EEG is a useful tool in the identification of neurophysiological correlates of depressive symptoms in young adults with no previous psychiatric history. Our results could guide future studies investigating the early neurophysiological changes and surrogate outcomes in depression. Copyright © 2017 Elsevier Ltd. All rights reserved.
Li, Siyue; Zhang, Quanfa
2010-04-15
A data matrix (4032 observations), obtained during a 2-year monitoring period (2005-2006) from 42 sites in the upper Han River is subjected to various multivariate statistical techniques including cluster analysis, principal component analysis (PCA), factor analysis (FA), correlation analysis and analysis of variance to determine the spatial characterization of dissolved trace elements and heavy metals. Our results indicate that waters in the upper Han River are primarily polluted by Al, As, Cd, Pb, Sb and Se, and the potential pollutants include Ba, Cr, Hg, Mn and Ni. Spatial distribution of trace metals indicates the polluted sections mainly concentrate in the Danjiang, Danjiangkou Reservoir catchment and Hanzhong Plain, and the most contaminated river is in the Hanzhong Plain. Q-model clustering depends on geographical location of sampling sites and groups the 42 sampling sites into four clusters, i.e., Danjiang, Danjiangkou Reservoir region (lower catchment), upper catchment and one river in headwaters pertaining to water quality. The headwaters, Danjiang and lower catchment, and upper catchment correspond to very high polluted, moderate polluted and relatively low polluted regions, respectively. Additionally, PCA/FA and correlation analysis demonstrates that Al, Cd, Mn, Ni, Fe, Si and Sr are controlled by natural sources, whereas the other metals appear to be primarily controlled by anthropogenic origins though geogenic source contributing to them. 2009 Elsevier B.V. All rights reserved.
Bhattacharya, Monisha; Isvaran, Kavita; Balakrishnan, Rohini
2017-04-01
In acoustically communicating animals, reproductive isolation between sympatric species is usually maintained through species-specific calls. This requires that the receiver be tuned to the conspecific signal. Mapping the response space of the receiver onto the signal space of the conspecific investigates this tuning. A combinatorial approach to investigating the response space is more informative as the influence on the receiver of the interactions between the features is also elucidated. However, most studies have examined individual preference functions rather than the multivariate response space. We studied the maintenance of reproductive isolation between two sympatric tree cricket species ( Oecanthus henryi and Oecanthus indicus ) through the temporal features of the calls. Individual response functions were determined experimentally for O. henryi , the results from which were combined in a statistical framework to generate a multivariate quantitative receiver response space. The predicted response was higher for the signals of the conspecific than for signals of the sympatric heterospecific, indicating maintenance of reproductive isolation through songs. The model allows prediction of response to untested combinations of temporal features as well as delineation of the evolutionary constraints on the signal space. The model can also be used to predict the response of O. henryi to other heterospecific signals, making it a useful tool for the study of the evolution and maintenance of reproductive isolation via long-range acoustic signals. © 2017. Published by The Company of Biologists Ltd.
Giuca, Maria Rita; Cappè, Maria; Carli, Elisabetta; Lardani, Lisa
2018-01-01
Aim The purpose of the present study was to evaluate the clinical defects and etiological factors potentially involved in the onset of MIH in a pediatric sample. Methods 120 children, selected from the university dental clinic, were included: 60 children (25 boys and 35 girls; average age: 9.8 ± 1.8 years) with MIH formed the test group and 60 children (27 boys and 33 girls; average age: 10.1 ± 2 years) without MIH constituted the control group. Distribution and severity of MIH defects were evaluated, and a questionnaire was used to investigate the etiological variables; chi-square, univariate, and multivariate statistical tests were performed (significance level set at p < 0.05). Results A total of 186 molars and 98 incisors exhibited MIH defects: 55 molars and 75 incisors showed mild defects, 91 molars and 20 incisors had moderate lesions, and 40 molars and 3 incisors showed severe lesions. Univariate and multivariate statistical analysis showed a significant association (p < 0.05) between MIH and ear, nose, and throat (ENT) disorders and the antibiotics used during pregnancy (0.019). Conclusions Moderate defects were more frequent in the molars, while mild lesions were more frequent in the incisors. Antibiotics used during pregnancy and ENT may be directly involved in the etiology of MIH in children. PMID:29861729
Quality control for quantitative PCR based on amplification compatibility test.
Tichopad, Ales; Bar, Tzachi; Pecen, Ladislav; Kitchen, Robert R; Kubista, Mikael; Pfaffl, Michael W
2010-04-01
Quantitative qPCR is a routinely used method for the accurate quantification of nucleic acids. Yet it may generate erroneous results if the amplification process is obscured by inhibition or generation of aberrant side-products such as primer dimers. Several methods have been established to control for pre-processing performance that rely on the introduction of a co-amplified reference sequence, however there is currently no method to allow for reliable control of the amplification process without directly modifying the sample mix. Herein we present a statistical approach based on multivariate analysis of the amplification response data generated in real-time. The amplification trajectory in its most resolved and dynamic phase is fitted with a suitable model. Two parameters of this model, related to amplification efficiency, are then used for calculation of the Z-score statistics. Each studied sample is compared to a predefined reference set of reactions, typically calibration reactions. A probabilistic decision for each individual Z-score is then used to identify the majority of inhibited reactions in our experiments. We compare this approach to univariate methods using only the sample specific amplification efficiency as reporter of the compatibility. We demonstrate improved identification performance using the multivariate approach compared to the univariate approach. Finally we stress that the performance of the amplification compatibility test as a quality control procedure depends on the quality of the reference set. Copyright 2010 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Waubke, Holger; Kasess, Christian H.
2016-11-01
Devices that emit structure-borne sound are commonly decoupled by elastic components to shield the environment from acoustical noise and vibrations. The elastic elements often have a hysteretic behavior that is typically neglected. In order to take hysteretic behavior into account, Bouc developed a differential equation for such materials, especially joints made of rubber or equipped with dampers. In this work, the Bouc model is solved by means of the Gaussian closure technique based on the Kolmogorov equation. Kolmogorov developed a method to derive probability density functions for arbitrary explicit first-order vector differential equations under white noise excitation using a partial differential equation of a multivariate conditional probability distribution. Up to now no analytical solution of the Kolmogorov equation in conjunction with the Bouc model exists. Therefore a wide range of approximate solutions, especially the statistical linearization, were developed. Using the Gaussian closure technique that is an approximation to the Kolmogorov equation assuming a multivariate Gaussian distribution an analytic solution is derived in this paper for the Bouc model. For the stationary case the two methods yield equivalent results, however, in contrast to statistical linearization the presented solution allows to calculate the transient behavior explicitly. Further, stationary case leads to an implicit set of equations that can be solved iteratively with a small number of iterations and without instabilities for specific parameter sets.
2014-01-01
Background Maternal educational attainment has been associated with birth outcomes among adult mothers. However, limited research explores whether academic performance and educational aspiration influence birth outcomes among adolescent mothers. Methods Data from Waves I and IV of the National Longitudinal Study of Adolescent Health (Add Health) were used. Adolescent girls whose first pregnancy occurred after Wave I, during their adolescence, and ended with a singleton live birth were included. Adolescents’ grade point average (GPA), experience of ever skipping a grade and ever repeating a grade, and their aspiration to attend college were examined as predictors of birth outcomes (birthweight and gestational age; n = 763). Univariate statistics, bivariate analyses and multivariable models were run stratified on race using survey procedures. Results Among Black adolescents, those who ever skipped a grade had higher offspring’s birthweight. Among non-Black adolescents, ever skipping a grade and higher educational aspiration were associated with higher offspring’s birthweight; ever skipping a grade was also associated with higher gestational age. GPA was not statistically significantly associated with either birth outcome. The addition of smoking during pregnancy and prenatal care visit into the multivariable models did not change these associations. Conclusions Some indicators of higher academic performance and aspiration are associated with better birth outcomes among adolescents. Investing in improving educational opportunities may improve birth outcomes among teenage mothers. PMID:24422664
A Baseline for the Multivariate Comparison of Resting-State Networks
Allen, Elena A.; Erhardt, Erik B.; Damaraju, Eswar; Gruner, William; Segall, Judith M.; Silva, Rogers F.; Havlicek, Martin; Rachakonda, Srinivas; Fries, Jill; Kalyanam, Ravi; Michael, Andrew M.; Caprihan, Arvind; Turner, Jessica A.; Eichele, Tom; Adelsheim, Steven; Bryan, Angela D.; Bustillo, Juan; Clark, Vincent P.; Feldstein Ewing, Sarah W.; Filbey, Francesca; Ford, Corey C.; Hutchison, Kent; Jung, Rex E.; Kiehl, Kent A.; Kodituwakku, Piyadasa; Komesu, Yuko M.; Mayer, Andrew R.; Pearlson, Godfrey D.; Phillips, John P.; Sadek, Joseph R.; Stevens, Michael; Teuscher, Ursina; Thoma, Robert J.; Calhoun, Vince D.
2011-01-01
As the size of functional and structural MRI datasets expands, it becomes increasingly important to establish a baseline from which diagnostic relevance may be determined, a processing strategy that efficiently prepares data for analysis, and a statistical approach that identifies important effects in a manner that is both robust and reproducible. In this paper, we introduce a multivariate analytic approach that optimizes sensitivity and reduces unnecessary testing. We demonstrate the utility of this mega-analytic approach by identifying the effects of age and gender on the resting-state networks (RSNs) of 603 healthy adolescents and adults (mean age: 23.4 years, range: 12–71 years). Data were collected on the same scanner, preprocessed using an automated analysis pipeline based in SPM, and studied using group independent component analysis. RSNs were identified and evaluated in terms of three primary outcome measures: time course spectral power, spatial map intensity, and functional network connectivity. Results revealed robust effects of age on all three outcome measures, largely indicating decreases in network coherence and connectivity with increasing age. Gender effects were of smaller magnitude but suggested stronger intra-network connectivity in females and more inter-network connectivity in males, particularly with regard to sensorimotor networks. These findings, along with the analysis approach and statistical framework described here, provide a useful baseline for future investigations of brain networks in health and disease. PMID:21442040
Clinical validation of robot simulation of toothbrushing--comparative plaque removal efficacy.
Lang, Tomas; Staufer, Sebastian; Jennes, Barbara; Gaengler, Peter
2014-07-04
Clinical validation of laboratory toothbrushing tests has important advantages. It was, therefore, the aim to demonstrate correlation of tooth cleaning efficiency of a new robot brushing simulation technique with clinical plaque removal. Clinical programme: 27 subjects received dental cleaning prior to 3-day-plaque-regrowth-interval. Plaque was stained, photographically documented and scored using planimetrical index. Subjects brushed teeth 33-47 with three techniques (horizontal, rotating, vertical), each for 20s buccally and for 20s orally in 3 consecutive intervals. The force was calibrated, the brushing technique was video supported. Two different brushes were randomly assigned to the subject. Robot programme: Clinical brushing programmes were transfered to a 6-axis-robot. Artificial teeth 33-47 were covered with plaque-simulating substrate. All brushing techniques were repeated 7 times, results were scored according to clinical planimetry. All data underwent statistical analysis by t-test, U-test and multivariate analysis. The individual clinical cleaning patterns are well reproduced by the robot programmes. Differences in plaque removal are statistically significant for the two brushes, reproduced in clinical and robot data. Multivariate analysis confirms the higher cleaning efficiency for anterior teeth and for the buccal sites. The robot tooth brushing simulation programme showed good correlation with clinically standardized tooth brushing.This new robot brushing simulation programme can be used for rapid, reproducible laboratory testing of tooth cleaning.
Multivariate analysis of fears in dental phobic patients according to a reduced FSS-II scale.
Hakeberg, M; Gustafsson, J E; Berggren, U; Carlsson, S G
1995-10-01
This study analyzed and assessed dimensions of a questionnaire developed to measure general fears and phobias. A previous factor analysis among 109 dental phobics had revealed a five-factor structure with 22 items and an explained total variance of 54%. The present study analyzed the same material using a multivariate statistical procedure (LISREL) to reveal structural latent variables. The LISREL analysis, based on the correlation matrix, yielded a chi-square of 216.6 with 195 degrees of freedom (P = 0.138) and showed a model with seven latent variables. One was a general fear factor correlated to all 22 items. The other six factors concerned "Illness & Death" (5 items), "Failures & Embarrassment" (5 items), "Social situations" (5 items), "Physical injuries" (4 items), "Animals & Natural phenomena" (4 items). One item (opposite sex) was included in both "Failures & Embarrassment" and "Social situations". The last factor, "Social interaction", combined all the items in "Failures & Embarrassment" and "Social situations" (9 items). In conclusion, this multivariate statistical analysis (LISREL) revealed and confirmed a factor structure similar to our previous study, but added two important dimensions not shown with a traditional factor analysis. This reduced FSS-II version measures general fears and phobias and may be used on a routine clinical basis as well as in dental phobia research.
FGWAS: Functional genome wide association analysis.
Huang, Chao; Thompson, Paul; Wang, Yalin; Yu, Yang; Zhang, Jingwen; Kong, Dehan; Colen, Rivka R; Knickmeyer, Rebecca C; Zhu, Hongtu
2017-10-01
Functional phenotypes (e.g., subcortical surface representation), which commonly arise in imaging genetic studies, have been used to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. However, existing statistical methods largely ignore the functional features (e.g., functional smoothness and correlation). The aim of this paper is to develop a functional genome-wide association analysis (FGWAS) framework to efficiently carry out whole-genome analyses of functional phenotypes. FGWAS consists of three components: a multivariate varying coefficient model, a global sure independence screening procedure, and a test procedure. Compared with the standard multivariate regression model, the multivariate varying coefficient model explicitly models the functional features of functional phenotypes through the integration of smooth coefficient functions and functional principal component analysis. Statistically, compared with existing methods for genome-wide association studies (GWAS), FGWAS can substantially boost the detection power for discovering important genetic variants influencing brain structure and function. Simulation studies show that FGWAS outperforms existing GWAS methods for searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. We have successfully applied FGWAS to large-scale analysis of data from the Alzheimer's Disease Neuroimaging Initiative for 708 subjects, 30,000 vertices on the left and right hippocampal surfaces, and 501,584 SNPs. Copyright © 2017 Elsevier Inc. All rights reserved.
Assessment and statistics of surgically induced astigmatism.
Naeser, Kristian
2008-05-01
The aim of the thesis was to develop methods for assessment of surgically induced astigmatism (SIA) in individual eyes, and in groups of eyes. The thesis is based on 12 peer-reviewed publications, published over a period of 16 years. In these publications older and contemporary literature was reviewed(1). A new method (the polar system) for analysis of SIA was developed. Multivariate statistical analysis of refractive data was described(2-4). Clinical validation studies were performed. The description of a cylinder surface with polar values and differential geometry was compared. The main results were: refractive data in the form of sphere, cylinder and axis may define an individual patient or data set, but are unsuited for mathematical and statistical analyses(1). The polar value system converts net astigmatisms to orthonormal components in dioptric space. A polar value is the difference in meridional power between two orthogonal meridians(5,6). Any pair of polar values, separated by an arch of 45 degrees, characterizes a net astigmatism completely(7). The two polar values represent the net curvital and net torsional power over the chosen meridian(8). The spherical component is described by the spherical equivalent power. Several clinical studies demonstrated the efficiency of multivariate statistical analysis of refractive data(4,9-11). Polar values and formal differential geometry describe astigmatic surfaces with similar concepts and mathematical functions(8). Other contemporary methods, such as Long's power matrix, Holladay's and Alpins' methods, Zernike(12) and Fourier analyses(8), are correlated to the polar value system. In conclusion, analysis of SIA should be performed with polar values or other contemporary component systems. The study was supported by Statens Sundhedsvidenskabeligt Forskningsråd, Cykelhandler P. Th. Rasmussen og Hustrus Mindelegat, Hotelejer Carl Larsen og Hustru Nicoline Larsens Mindelegat, Landsforeningen til Vaern om Synet, Forskningsinitiativet for Arhus Amt, Alcon Denmark, and Desirée and Niels Ydes Fond.
Winzer, Klaus-Jürgen; Buchholz, Anika; Schumacher, Martin; Sauerbrei, Willi
2016-01-01
Background Prognostic factors and prognostic models play a key role in medical research and patient management. The Nottingham Prognostic Index (NPI) is a well-established prognostic classification scheme for patients with breast cancer. In a very simple way, it combines the information from tumor size, lymph node stage and tumor grade. For the resulting index cutpoints are proposed to classify it into three to six groups with different prognosis. As not all prognostic information from the three and other standard factors is used, we will consider improvement of the prognostic ability using suitable analysis approaches. Methods and Findings Reanalyzing overall survival data of 1560 patients from a clinical database by using multivariable fractional polynomials and further modern statistical methods we illustrate suitable multivariable modelling and methods to derive and assess the prognostic ability of an index. Using a REMARK type profile we summarize relevant steps of the analysis. Adding the information from hormonal receptor status and using the full information from the three NPI components, specifically concerning the number of positive lymph nodes, an extended NPI with improved prognostic ability is derived. Conclusions The prognostic ability of even one of the best established prognostic index in medicine can be improved by using suitable statistical methodology to extract the full information from standard clinical data. This extended version of the NPI can serve as a benchmark to assess the added value of new information, ranging from a new single clinical marker to a derived index from omics data. An established benchmark would also help to harmonize the statistical analyses of such studies and protect against the propagation of many false promises concerning the prognostic value of new measurements. Statistical methods used are generally available and can be used for similar analyses in other diseases. PMID:26938061
An effective drift correction for dynamical downscaling of decadal global climate predictions
NASA Astrophysics Data System (ADS)
Paeth, Heiko; Li, Jingmin; Pollinger, Felix; Müller, Wolfgang A.; Pohlmann, Holger; Feldmann, Hendrik; Panitz, Hans-Jürgen
2018-04-01
Initialized decadal climate predictions with coupled climate models are often marked by substantial climate drifts that emanate from a mismatch between the climatology of the coupled model system and the data set used for initialization. While such drifts may be easily removed from the prediction system when analyzing individual variables, a major problem prevails for multivariate issues and, especially, when the output of the global prediction system shall be used for dynamical downscaling. In this study, we present a statistical approach to remove climate drifts in a multivariate context and demonstrate the effect of this drift correction on regional climate model simulations over the Euro-Atlantic sector. The statistical approach is based on an empirical orthogonal function (EOF) analysis adapted to a very large data matrix. The climate drift emerges as a dramatic cooling trend in North Atlantic sea surface temperatures (SSTs) and is captured by the leading EOF of the multivariate output from the global prediction system, accounting for 7.7% of total variability. The SST cooling pattern also imposes drifts in various atmospheric variables and levels. The removal of the first EOF effectuates the drift correction while retaining other components of intra-annual, inter-annual and decadal variability. In the regional climate model, the multivariate drift correction of the input data removes the cooling trends in most western European land regions and systematically reduces the discrepancy between the output of the regional climate model and observational data. In contrast, removing the drift only in the SST field from the global model has hardly any positive effect on the regional climate model.
Soltani, Shahla; Asghari Moghaddam, Asghar; Barzegar, Rahim; Kazemian, Naeimeh; Tziritis, Evangelos
2017-08-18
Kordkandi-Duzduzan plain is one of the fertile plains of East Azarbaijan Province, NW of Iran. Groundwater is an important resource for drinking and agricultural purposes due to the lack of surface water resources in the region. The main objectives of the present study are to identify the hydrogeochemical processes and the potential sources of major, minor, and trace metals and metalloids such as Cr, Mn, Cd, Fe, Al, and As by using joint hydrogeochemical techniques and multivariate statistical analysis and to evaluate groundwater quality deterioration with the use of PoS environmental index. To achieve these objectives, 23 groundwater samples were collected in September 2015. Piper diagram shows that the mixed Ca-Mg-Cl is the dominant groundwater type, and some of the samples have Ca-HCO 3 , Ca-Cl, and Na-Cl types. Multivariate statistical analyses indicate that weathering and dissolution of different rocks and minerals, e.g., silicates, gypsum, and halite, ion exchange, and agricultural activities influence the hydrogeochemistry of the study area. The cluster analysis divides the samples into two distinct clusters which are completely different in EC (and its dependent variables such as Na + , K + , Ca 2+ , Mg 2+ , SO 4 2- , and Cl - ), Cd, and Cr variables according to the ANOVA statistical test. Based on the median values, the concentrations of pH, NO 3 - , SiO 2 , and As in cluster 1 are elevated compared with those of cluster 2, while their maximum values occur in cluster 2. According to the PoS index, the dominant parameter that controls quality deterioration is As, with 60% of contribution. Samples of lowest PoS values are located in the southern and northern parts (recharge area) while samples of the highest values are located in the discharge area and the eastern part.
Multivariate Statistical Analysis of Water Quality data in Indian River Lagoon, Florida
NASA Astrophysics Data System (ADS)
Sayemuzzaman, M.; Ye, M.
2015-12-01
The Indian River Lagoon, is part of the longest barrier island complex in the United States, is a region of particular concern to the environmental scientist because of the rapid rate of human development throughout the region and the geographical position in between the colder temperate zone and warmer sub-tropical zone. Thus, the surface water quality analysis in this region always brings the newer information. In this present study, multivariate statistical procedures were applied to analyze the spatial and temporal water quality in the Indian River Lagoon over the period 1998-2013. Twelve parameters have been analyzed on twelve key water monitoring stations in and beside the lagoon on monthly datasets (total of 27,648 observations). The dataset was treated using cluster analysis (CA), principle component analysis (PCA) and non-parametric trend analysis. The CA was used to cluster twelve monitoring stations into four groups, with stations on the similar surrounding characteristics being in the same group. The PCA was then applied to the similar groups to find the important water quality parameters. The principal components (PCs), PC1 to PC5 was considered based on the explained cumulative variances 75% to 85% in each cluster groups. Nutrient species (phosphorus and nitrogen), salinity, specific conductivity and erosion factors (TSS, Turbidity) were major variables involved in the construction of the PCs. Statistical significant positive or negative trends and the abrupt trend shift were detected applying Mann-Kendall trend test and Sequential Mann-Kendall (SQMK), for each individual stations for the important water quality parameters. Land use land cover change pattern, local anthropogenic activities and extreme climate such as drought might be associated with these trends. This study presents the multivariate statistical assessment in order to get better information about the quality of surface water. Thus, effective pollution control/management of the surface waters can be undertaken.
Mathematics and statistics research department. Progress report, period ending June 30, 1981
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lever, W.E.; Kane, V.E.; Scott, D.S.
1981-09-01
This report is the twenty-fourth in the series of progress reports of the Mathematics and Statistics Research Department of the Computer Sciences Division, Union Carbide Corporation - Nuclear Division (UCC-ND). Part A records research progress in biometrics research, materials science applications, model evaluation, moving boundary problems, multivariate analysis, numerical linear algebra, risk analysis, and complementary areas. Collaboration and consulting with others throughout the UCC-ND complex are recorded in Part B. Included are sections on biology and health sciences, chemistry, energy, engineering, environmental sciences, health and safety research, materials sciences, safeguards, surveys, and uranium resource evaluation. Part C summarizes the variousmore » educational activities in which the staff was engaged. Part D lists the presentations of research results, and Part E records the staff's other professional activities during the report period.« less
NASA Astrophysics Data System (ADS)
Kerr, Laura T.; Adams, Aine; O'Dea, Shirley; Domijan, Katarina; Cullen, Ivor; Hennelly, Bryan M.
2014-05-01
Raman microspectroscopy can be applied to the urinary bladder for highly accurate classification and diagnosis of bladder cancer. This technique can be applied in vitro to bladder epithelial cells obtained from urine cytology or in vivo as an optical biopsy" to provide results in real-time with higher sensitivity and specificity than current clinical methods. However, there exists a high degree of variability across experimental parameters which need to be standardised before this technique can be utilized in an everyday clinical environment. In this study, we investigate different laser wavelengths (473 nm and 532 nm), sample substrates (glass, fused silica and calcium fluoride) and multivariate statistical methods in order to gain insight into how these various experimental parameters impact on the sensitivity and specificity of Raman cytology.
Multilingualism and fMRI: Longitudinal Study of Second Language Acquisition
Andrews, Edna; Frigau, Luca; Voyvodic-Casabo, Clara; Voyvodic, James; Wright, John
2013-01-01
BOLD fMRI is often used for the study of human language. However, there are still very few attempts to conduct longitudinal fMRI studies in the study of language acquisition by measuring auditory comprehension and reading. The following paper is the first in a series concerning a unique longitudinal study devoted to the analysis of bi- and multilingual subjects who are: (1) already proficient in at least two languages; or (2) are acquiring Russian as a second/third language. The focus of the current analysis is to present data from the auditory sections of a set of three scans acquired from April, 2011 through April, 2012 on a five-person subject pool who are learning Russian during the study. All subjects were scanned using the same protocol for auditory comprehension on the same General Electric LX 3T Signa scanner in Duke University Hospital. Using a multivariate analysis of covariance (MANCOVA) for statistical analysis, proficiency measurements are shown to correlate significantly with scan results in the Russian conditions over time. The importance of both the left and right hemispheres in language processing is discussed. Special attention is devoted to the importance of contextualizing imaging data with corresponding behavioral and empirical testing data using a multivariate analysis of variance. This is the only study to date that includes: (1) longitudinal fMRI data with subject-based proficiency and behavioral data acquired in the same time frame; and (2) statistical modeling that demonstrates the importance of covariate language proficiency data for understanding imaging results of language acquisition. PMID:24961428
Multilingualism and fMRI: Longitudinal Study of Second Language Acquisition.
Andrews, Edna; Frigau, Luca; Voyvodic-Casabo, Clara; Voyvodic, James; Wright, John
2013-05-28
BOLD fMRI is often used for the study of human language. However, there are still very few attempts to conduct longitudinal fMRI studies in the study of language acquisition by measuring auditory comprehension and reading. The following paper is the first in a series concerning a unique longitudinal study devoted to the analysis of bi- and multilingual subjects who are: (1) already proficient in at least two languages; or (2) are acquiring Russian as a second/third language. The focus of the current analysis is to present data from the auditory sections of a set of three scans acquired from April, 2011 through April, 2012 on a five-person subject pool who are learning Russian during the study. All subjects were scanned using the same protocol for auditory comprehension on the same General Electric LX 3T Signa scanner in Duke University Hospital. Using a multivariate analysis of covariance (MANCOVA) for statistical analysis, proficiency measurements are shown to correlate significantly with scan results in the Russian conditions over time. The importance of both the left and right hemispheres in language processing is discussed. Special attention is devoted to the importance of contextualizing imaging data with corresponding behavioral and empirical testing data using a multivariate analysis of variance. This is the only study to date that includes: (1) longitudinal fMRI data with subject-based proficiency and behavioral data acquired in the same time frame; and (2) statistical modeling that demonstrates the importance of covariate language proficiency data for understanding imaging results of language acquisition.
Guan, Qingyu; Wang, Feifei; Xu, Chuanqi; Pan, Ninghui; Lin, Jinkuo; Zhao, Rui; Yang, Yanyan; Luo, Haiping
2018-02-01
Hexi Corridor is the most important base of commodity grain and producing area for cash crops. However, the rapid development of agriculture and industry has inevitably led to heavy metal contamination in the soils. Multivariate statistical analysis, GIS-based geostatistical methods and Positive Matrix Factorization (PMF) receptor modeling techniques were used to understand the levels of heavy metals and their source apportionment for agricultural soil in Hexi Corridor. The results showed that the average concentrations of Cr, Cu, Ni, Pb and Zn were lower than the secondary standard of soil environmental quality; however, the concentrations of eight metals (Cr, Cu, Mn, Ni, Pb, Ti, V and Zn) were higher than background values, and their corresponding enrichment factor values were significantly greater than 1. Different degrees of heavy metal pollution occurred in the agricultural soils; specifically, Ni had the most potential for impacting human health. The results from the multivariate statistical analysis and GIS-based geostatistical methods indicated both natural sources (Co and W) and anthropogenic sources (Cr, Cu, Mn, Ni, Pb, Ti, V and Zn). To better identify pollution sources of heavy metals in the agricultural soils, the PMF model was applied. Further source apportionment revealed that enrichments of Pb and Zn were attributed to traffic sources; Cr and Ni were closely related to industrial activities, including mining, smelting, coal combustion, iron and steel production and metal processing; Zn and Cu originated from agricultural activities; and V, Ti and Mn were derived from oil- and coal-related activities. Copyright © 2017 Elsevier Ltd. All rights reserved.
Chan, Kar-Man; Yue, Grace Gar-Lee; Li, Ping; Wong, Eric Chun-Wai; Lee, Julia Kin-Ming; Kennelly, Edward J; Lau, Clara Bik-San
2017-03-03
According to Chinese Pharmacopoeia 2015 edition, Ganoderma (Lingzhi) is a species complex that comprise of Ganoderma lucidum and Ganoderma sinense. The bioactivity and chemical composition of G. lucidium had been studied extensively, and it was shown to possess antitumor activities in pharmacological studies. In contrast, G. sinense has not been studied in great detail. Our previous studies found that the stipe of G. sinense exhibited more potent antitumor activity than the pileus. To identify the antitumor compounds in the stipe of G. sinense, we studied its chemical components by merging the bioactivity results with liquid chromatography-mass spectrometry-based chemometrics. The stipe of G. sinense was extracted with water, followed by ethanol precipitation and liquid-liquid partition. The resulting residue was fractionated using column chromatography. The antitumor activity of these fractions were analysed using MTT assay in murine breast tumor 4T1 cells, and their chemical components were studied using the LC-QTOF-MS with multivariate statistical tools. The chemometric and MS/MS analysis correlated bioactivity with five known cytotoxic compounds, 4-hyroxyphenylacetate, 9-oxo-(10E,12E)-octadecadienoic acid, 3-phenyl-2-propenoic acid, 13-oxo-(9E,11E)-octadecadienoic acid and lingzhine C, from the stipe of G. sinense. To the best of our knowledge, 4-hyroxyphenylacetate, 3-phenyl-2-propenoic acid and lingzhine C are firstly reported to be found in G. sinense. These five compounds will be investigated for their antitumor activities in the future. Copyright © 2017 Elsevier B.V. All rights reserved.
Using Fisher information to track stability in multivariate systems
With the current proliferation of data, the proficient use of statistical and mining techniques offer substantial benefits to capture useful information from any dataset. As numerous approaches make use of information theory concepts, here, we discuss how Fisher information (FI...
Truu, Jaak; Heinaru, Eeva; Talpsep, Ene; Heinaru, Ain
2002-01-01
The oil-shale industry has created serious pollution problems in northeastern Estonia. Untreated, phenol-rich leachate from semi-coke mounds formed as a by-product of oil-shale processing is discharged into the Baltic Sea via channels and rivers. An exploratory analysis of water chemical and microbiological data sets from the low-flow period was carried out using different multivariate analysis techniques. Principal component analysis allowed us to distinguish different locations in the river system. The riverine microbial community response to water chemical parameters was assessed by co-inertia analysis. Water pH, COD and total nitrogen were negatively related to the number of biodegradative bacteria, while oxygen concentration promoted the abundance of these bacteria. The results demonstrate the utility of multivariate statistical techniques as tools for estimating the magnitude and extent of pollution based on river water chemical and microbiological parameters. An evaluation of river chemical and microbiological data suggests that the ambient natural attenuation mechanisms only partly eliminate pollutants from river water, and that a sufficient reduction of more recalcitrant compounds could be achieved through the reduction of wastewater discharge from the oil-shale chemical industry into the rivers.
Redo surgery risk in patients with cardiac prosthetic valve dysfunction
Maciejewski, Marek; Piestrzeniewicz, Katarzyna; Bielecka-Dąbrowa, Agata; Piechowiak, Monika; Jaszewski, Ryszard
2011-01-01
Introduction The aim of the study was to analyse the risk factors of early and late mortality in patients undergoing the first reoperation for prosthetic valve dysfunction. Material and methods A retrospective observational study was performed in 194 consecutive patients (M = 75, F = 119; mean age 53.2 ±11 years) with a mechanical prosthetic valve (n = 103 cases; 53%) or bioprosthesis (91; 47%). Univariate and multivariate Cox statistical analysis was performed to determine risk factors of early and late mortality. Results The overall early mortality was 18.6%: 31.4% in patients with symptoms of NYHA functional class III-IV and 3.4% in pts in NYHA class I-II. Multivariate analysis identified symptoms of NYHA class III-IV and endocarditis as independent predictors of early mortality. The overall late mortality (> 30 days) was 8.2% (0.62% year/patient). Multivariate analysis identified age at the time of reoperation as a strong independent predictor of late mortality. Conclusions Reoperation in patients with prosthetic valves, performed urgently, especially in patients with symptoms of NYHA class III-IV or in the case of endocarditis, bears a high mortality rate. Risk of planned reoperation, mostly in patients with symptoms of NYHA class I-II, does not differ from the risk of the first operation. PMID:22291767
Gauging Skills of Hospital Security Personnel: a Statistically-driven, Questionnaire-based Approach.
Rinkoo, Arvind Vashishta; Mishra, Shubhra; Rahesuddin; Nabi, Tauqeer; Chandra, Vidha; Chandra, Hem
2013-01-01
This study aims to gauge the technical and soft skills of the hospital security personnel so as to enable prioritization of their training needs. A cross sectional questionnaire based study was conducted in December 2011. Two separate predesigned and pretested questionnaires were used for gauging soft skills and technical skills of the security personnel. Extensive statistical analysis, including Multivariate Analysis (Pillai-Bartlett trace along with Multi-factorial ANOVA) and Post-hoc Tests (Bonferroni Test) was applied. The 143 participants performed better on the soft skills front with an average score of 6.43 and standard deviation of 1.40. The average technical skills score was 5.09 with a standard deviation of 1.44. The study avowed a need for formal hands on training with greater emphasis on technical skills. Multivariate analysis of the available data further helped in identifying 20 security personnel who should be prioritized for soft skills training and a group of 36 security personnel who should receive maximum attention during technical skills training. This statistically driven approach can be used as a prototype by healthcare delivery institutions worldwide, after situation specific customizations, to identify the training needs of any category of healthcare staff.
NASA Technical Reports Server (NTRS)
Crutcher, H. L.; Falls, L. W.
1976-01-01
Sets of experimentally determined or routinely observed data provide information about the past, present and, hopefully, future sets of similarly produced data. An infinite set of statistical models exists which may be used to describe the data sets. The normal distribution is one model. If it serves at all, it serves well. If a data set, or a transformation of the set, representative of a larger population can be described by the normal distribution, then valid statistical inferences can be drawn. There are several tests which may be applied to a data set to determine whether the univariate normal model adequately describes the set. The chi-square test based on Pearson's work in the late nineteenth and early twentieth centuries is often used. Like all tests, it has some weaknesses which are discussed in elementary texts. Extension of the chi-square test to the multivariate normal model is provided. Tables and graphs permit easier application of the test in the higher dimensions. Several examples, using recorded data, illustrate the procedures. Tests of maximum absolute differences, mean sum of squares of residuals, runs and changes of sign are included in these tests. Dimensions one through five with selected sample sizes 11 to 101 are used to illustrate the statistical tests developed.
A Statistical Approach for Testing Cross-Phenotype Effects of Rare Variants
Broadaway, K. Alaine; Cutler, David J.; Duncan, Richard; Moore, Jacob L.; Ware, Erin B.; Jhun, Min A.; Bielak, Lawrence F.; Zhao, Wei; Smith, Jennifer A.; Peyser, Patricia A.; Kardia, Sharon L.R.; Ghosh, Debashis; Epstein, Michael P.
2016-01-01
Increasing empirical evidence suggests that many genetic variants influence multiple distinct phenotypes. When cross-phenotype effects exist, multivariate association methods that consider pleiotropy are often more powerful than univariate methods that model each phenotype separately. Although several statistical approaches exist for testing cross-phenotype effects for common variants, there is a lack of similar tests for gene-based analysis of rare variants. In order to fill this important gap, we introduce a statistical method for cross-phenotype analysis of rare variants using a nonparametric distance-covariance approach that compares similarity in multivariate phenotypes to similarity in rare-variant genotypes across a gene. The approach can accommodate both binary and continuous phenotypes and further can adjust for covariates. Our approach yields a closed-form test whose significance can be evaluated analytically, thereby improving computational efficiency and permitting application on a genome-wide scale. We use simulated data to demonstrate that our method, which we refer to as the Gene Association with Multiple Traits (GAMuT) test, provides increased power over competing approaches. We also illustrate our approach using exome-chip data from the Genetic Epidemiology Network of Arteriopathy. PMID:26942286
Investigation of advanced UQ for CRUD prediction with VIPRE.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eldred, Michael Scott
2011-09-01
This document summarizes the results from a level 3 milestone study within the CASL VUQ effort. It demonstrates the application of 'advanced UQ,' in particular dimension-adaptive p-refinement for polynomial chaos and stochastic collocation. The study calculates statistics for several quantities of interest that are indicators for the formation of CRUD (Chalk River unidentified deposit), which can lead to CIPS (CRUD induced power shift). Stochastic expansion methods are attractive methods for uncertainty quantification due to their fast convergence properties. For smooth functions (i.e., analytic, infinitely-differentiable) in L{sup 2} (i.e., possessing finite variance), exponential convergence rates can be obtained under order refinementmore » for integrated statistical quantities of interest such as mean, variance, and probability. Two stochastic expansion methods are of interest: nonintrusive polynomial chaos expansion (PCE), which computes coefficients for a known basis of multivariate orthogonal polynomials, and stochastic collocation (SC), which forms multivariate interpolation polynomials for known coefficients. Within the DAKOTA project, recent research in stochastic expansion methods has focused on automated polynomial order refinement ('p-refinement') of expansions to support scalability to higher dimensional random input spaces [4, 3]. By preferentially refining only in the most important dimensions of the input space, the applicability of these methods can be extended from O(10{sup 0})-O(10{sup 1}) random variables to O(10{sup 2}) and beyond, depending on the degree of anisotropy (i.e., the extent to which randominput variables have differing degrees of influence on the statistical quantities of interest (QOIs)). Thus, the purpose of this study is to investigate the application of these adaptive stochastic expansion methods to the analysis of CRUD using the VIPRE simulation tools for two different plant models of differing random dimension, anisotropy, and smoothness.« less
Goldman, S A
1996-10-01
Neurotoxicity in relation to concomitant administration of lithium and neuroleptic drugs, particularly haloperidol, has been an ongoing issue. This study examined whether use of lithium with neuroleptic drugs enhances neurotoxicity leading to permanent sequelae. The Spontaneous Reporting System database of the United States Food and Drug Administration and extant literature were reviewed for spectrum cases of lithium/neuroleptic neurotoxicity. Groups taking lithium alone (Li), lithium/haloperidol (LiHal) and lithium/ nonhaloperidol neuroleptics (LiNeuro), each paired for recovery and sequelae, were established for 237 cases. Statistical analyses included pairwise comparisons of lithium levels using the Wilcoxon Rank Sum procedure and logistic regression to analyze the relationship between independent variables and development of sequelae. The Li and Li-Neuro groups showed significant statistical differences in median lithium levels between recovery and sequelae pairs, whereas the LiHal pair did not differ significantly. Lithium level was associated with sequelae development overall and within the Li and LiNeuro groups; no such association was evident in the LiHal group. On multivariable logistic regression analysis, lithium level and taking lithium/haloperidol were significant factors in the development of sequelae, with multiple possibly confounding factors (e.g., age, sex) not statistically significant. Multivariable logistic regression analyses with neuroleptic dose as five discrete dose ranges or actual dose did not show an association between development of sequelae and dose. Database limitations notwithstanding, the lack of apparent impact of serum lithium level on the development of sequelae in patients treated with haloperidol contrasts notably with results in the Li and LiNeuro groups. These findings may suggest a possible effect of pharmacodynamic factors in lithium/neuroleptic combination therapy.
Yeo, Heather; Niland, Joyce; Milne, Dana; ter Veer, Anna; Bekaii-Saab, Tanios; Farma, Jeffrey M.; Lai, Lily; Skibber, John M.; Small, William; Wilkinson, Neal; Schrag, Deborah
2015-01-01
Background: Laparoscopic colectomy has been shown to have equivalent oncologic outcomes to open colectomy for the management of colon cancer, but its adoption nationally has been slow. This study investigates the prevalence and factors associated with laparoscopic colorectal resection at National Comprehensive Cancer Network (NCCN) centers. Methods: Data on patients undergoing surgery for colon and rectal cancer at NCCN centers from 2005 to 2010 were obtained from chart review of medical records for the NCCN Outcomes Project and included information on socioeconomic status, insurance coverage, comorbidity, and physician-reported Eastern Cooperative Oncology Group (ECOG) performance status. Associations between receipt of minimally invasive surgery and patient and clinical variables were analyzed with univariate and multivariable logistic regression. All statistical tests were two-sided. Results: A total of 4032 patients, diagnosed between September 2005 and December 2010, underwent elective colon or rectal resection for cancer at NCCN centers. Median age of colon cancer patients was 62.6 years, and 49% were men. The percent of colon cancer patients treated with minimally invasive surgery (MIS) increased from 35% in 2006 to 51% in 2010 across all centers but varied statistically significantly between centers. On multivariable analysis, factors associated with minimally invasive surgery for colon cancer patients who had surgery at an NCCN institution were older age (P = .02), male sex (P = .006), fewer comorbidities (P ≤ .001), lower final T-stage (P < .001), median household income greater than or equal to $80000 (P < .001), ECOG performance status = 0 (P = .02), and NCCN institution (P ≤ .001). Conclusions: The use of MIS increased at NCCN centers. However, there was statistically significant variation in adoption of MIS technique among centers. PMID:25527640
2014-01-01
Objectives The purpose of the present study was to identify the association between presenteeism and long working hours, shiftwork, and occupational stress using representative national survey data on Korean workers. Methods We analyzed data from the second Korean Working Conditions Survey (KWCS), which was conducted in 2010, in which a total of 6,220 wage workers were analyzed. The study population included the economically active population aged above 15 years, and living in the Republic of Korea. We used the chi-squared test and multivariate logistic regression to test the statistical association between presenteeism and working hours, shiftwork, and occupational stress. Results Approximately 19% of the workers experienced presenteeism during the previous 12 months. Women had higher rates of presenteeism than men. We found a statistically significant dose–response relationship between working hours and presenteeism. Shift workers had a slightly higher rate of presenteeism than non-shift workers, but the difference was not statistically significant. Occupational stress, such as high job demand, lack of rewards, and inadequate social support, had a significant association with presenteeism. Conclusions The present study suggests that long working hours and occupational stress are significantly related to presenteeism. PMID:24661575