NASA Astrophysics Data System (ADS)
Safi, A.; Campanella, B.; Grifoni, E.; Legnaioli, S.; Lorenzetti, G.; Pagnotta, S.; Poggialini, F.; Ripoll-Seguer, L.; Hidalgo, M.; Palleschi, V.
2018-06-01
The introduction of the multivariate calibration curve approach in Laser-Induced Breakdown Spectroscopy (LIBS) quantitative analysis has led to a general improvement of LIBS analytical performance, since a multivariate approach makes it possible to exploit the redundancy of elemental information that is typically present in a LIBS spectrum. Software packages implementing multivariate methods are available in the most widely used commercial and open source analytical programs; in most cases, the multivariate algorithms are robust against noise and operate in unsupervised mode. The flip side of the availability and ease of use of such packages is the (perceived) difficulty in assessing the reliability of the results obtained, which often leads to the multivariate algorithms being treated as 'black boxes' whose inner mechanism is supposed to remain hidden from the user. In this paper, we discuss the dangers of a 'black box' approach in LIBS multivariate analysis and how to overcome them using the chemical-physical knowledge that is at the base of any LIBS quantitative analysis.
A non-iterative extension of the multivariate random effects meta-analysis.
Makambi, Kepher H; Seung, Hyunuk
2015-01-01
Multivariate methods in meta-analysis are becoming popular and more accepted in biomedical research despite computational issues in some of the techniques. A number of approaches, both iterative and non-iterative, have been proposed, including the multivariate DerSimonian and Laird method by Jackson et al. (2010), which is non-iterative. In this study, we propose an extension of the method by Hartung and Makambi (2002) and Makambi (2001) to multivariate situations. A comparison of the bias and mean square error from a simulation study indicates that, in some circumstances, the proposed approach performs better than the multivariate DerSimonian-Laird approach. An example is presented to demonstrate the application of the proposed approach.
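For orientation, the non-iterative character of the DerSimonian and Laird method mentioned in the abstract above can be illustrated with its classical univariate form. The sketch below is not the multivariate method of Jackson et al. nor the extension proposed by the authors; the function name and argument names are hypothetical, and the inputs are assumed to be per-study effect estimates with their within-study variances.

```python
def dersimonian_laird(effects, variances):
    """Univariate DerSimonian-Laird random-effects pooling (illustrative sketch).

    effects   -- per-study effect estimates (assumed input)
    variances -- per-study within-study variances (assumed input)
    Returns (pooled effect, its standard error, between-study variance tau^2).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw

    # Cochran's Q and the method-of-moments estimate of tau^2,
    # truncated at zero; no iteration is required.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights include the between-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return pooled, se, tau2
```

The estimate of the between-study variance comes from a single closed-form moment equation, which is what makes the method non-iterative; multivariate versions replace the scalar quantities with matrices.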
Sun, Hui; Wang, Huiyu; Zhang, Aihua; Yan, Guangli; Han, Ying; Li, Yuan; Wu, Xiuhong; Meng, Xiangcai; Wang, Xijun
2016-01-01
As herbal medicines have an important position in health care systems worldwide, their current assessment and quality control are a major bottleneck. Cortex Phellodendri chinensis (CPC) and Cortex Phellodendri amurensis (CPA) are widely used in China; however, distinguishing between the two species has become an urgent problem. In this study, a multivariate analysis approach was applied to the chemical discrimination of CPA and CPC. Principal component analysis showed that the two herbs could be separated clearly. Chemical markers such as berberine, palmatine, phellodendrine, magnoflorine, obacunone, and obaculactone were found through orthogonal partial least squares discriminant analysis and were identified tentatively by the accurate mass from quadrupole time-of-flight mass spectrometry. A total of 29 components can be used as chemical markers for discrimination of CPA and CPC. Of them, phellodendrine is significantly higher in CPC than in CPA, whereas obacunone and obaculactone are significantly higher in CPA than in CPC. The present study shows that chemical analysis based on a multivariate analysis approach greatly contributes to the investigation of CPA and CPC, that the identified chemical markers as a whole should be used to discriminate the two herbal medicines, and that the results also provide chemical information for their quality assessment. A multivariate analysis approach was used to investigate the herbal medicines. The chemical markers were identified through the multivariate analysis approach. A total of 29 components can be used as chemical markers. A UPLC-Q/TOF-MS-based multivariate analysis method was used for the herbal medicine samples. Abbreviations used: CPC: Cortex Phellodendri chinensis, CPA: Cortex Phellodendri amurensis, PCA: Principal component analysis, OPLS-DA: Orthogonal partial least squares discriminant analysis, BPI: Base peak ion intensity.
Multivariate analysis: A statistical approach for computations
NASA Astrophysics Data System (ADS)
Michu, Sachin; Kaushik, Vandana
2014-10-01
Multivariate analysis is a type of multivariate statistical approach commonly used in automotive diagnosis, education, evaluating clusters in finance, etc., and more recently in the health-related professions. The objective of the paper is to provide a detailed exploratory discussion of factor analysis (FA) in image retrieval methods and correlation analysis (CA) of network traffic. Image retrieval methods aim to retrieve relevant images from a collected database, based on their content. The problem is made more difficult by the high dimension of the variable space in which the images are represented. Multivariate correlation analysis provides an anomaly detection and analysis method based on the correlation coefficient matrix. Anomalous behaviors in the network include various attacks on the network, such as DDoS attacks and network scanning.
Multivariate geometry as an approach to algal community analysis
Allen, T.F.H.; Skagen, S.
1973-01-01
Multivariate analyses are put in the context of more usual approaches to phycological investigations. The intuitive common sense involved in methods of ordination, classification and discrimination is emphasised by simple geometric accounts which avoid jargon and matrix algebra. Warnings are given that artifacts result from abuses of technique by the naive or over-enthusiastic. An analysis of a simple periphyton data set is presented as an example of the approach. Suggestions are made as to situations in phycological investigations where the techniques could be appropriate. The discipline is reprimanded for its neglect of the multivariate approach.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Loveday, D.L.; Craggs, C.
Box-Jenkins-based multivariate stochastic modeling is carried out using data recorded from a domestic heating system. The system comprises an air-source heat pump sited in the roof space of a house, solar assistance being provided by the conventional tile roof acting as a radiation absorber. Multivariate models are presented which illustrate the time-dependent relationships between three air temperatures - at external ambient, at entry to, and at exit from, the heat pump evaporator. Using a deterministic modeling approach, physical interpretations are placed on the results of the multivariate technique. It is concluded that the multivariate Box-Jenkins approach is a suitable technique for building thermal analysis. Application to multivariate model-based control is discussed, with particular reference to building energy management systems. It is further concluded that stochastic modeling of data drawn from a short monitoring period offers a means of retrofitting an advanced model-based control system in existing buildings, which could be used to optimize energy savings. An approach to system simulation is suggested.
Multivariate Analysis and Machine Learning in Cerebral Palsy Research
Zhang, Jing
2017-01-01
Cerebral palsy (CP), a common pediatric movement disorder, causes the most severe physical disability in children. Early diagnosis in high-risk infants is critical for early intervention and possible early recovery. In recent years, multivariate analytic and machine learning (ML) approaches have been increasingly used in CP research. This paper aims to identify such multivariate studies and provide an overview of this relatively young field. Studies reviewed in this paper have demonstrated that multivariate analytic methods are useful in identification of risk factors, detection of CP, movement assessment for CP prediction, and outcome assessment, and ML approaches have made it possible to automatically identify movement impairments in high-risk infants. In addition, outcome predictors for surgical treatments have been identified by multivariate outcome studies. To make the multivariate and ML approaches useful in clinical settings, further research with large samples is needed to verify and improve these multivariate methods in risk factor identification, CP detection, movement assessment, and outcome evaluation or prediction. As multivariate analysis, ML and data processing technologies advance in the era of Big Data of this century, it is expected that multivariate analysis and ML will play a bigger role in improving the diagnosis and treatment of CP to reduce mortality and morbidity rates, and enhance patient care for children with CP. PMID:29312134
Comparison of connectivity analyses for resting state EEG data
NASA Astrophysics Data System (ADS)
Olejarczyk, Elzbieta; Marzetti, Laura; Pizzella, Vittorio; Zappasodi, Filippo
2017-06-01
Objective. In the present work, a nonlinear measure (transfer entropy, TE) was used in a multivariate approach for the analysis of effective connectivity in high density resting state EEG data in eyes open and eyes closed. Advantages of the multivariate approach in comparison to the bivariate one were tested. Moreover, the multivariate TE was compared to an effective linear measure, i.e. directed transfer function (DTF). Finally, the existence of a relationship between the information transfer and the level of brain synchronization as measured by phase synchronization value (PLV) was investigated. Approach. The comparison between the connectivity measures, i.e. bivariate versus multivariate TE, TE versus DTF, TE versus PLV, was performed by means of statistical analysis of indexes based on graph theory. Main results. The multivariate approach is less sensitive to false indirect connections with respect to the bivariate estimates. The multivariate TE differentiated better between eyes closed and eyes open conditions compared to DTF. Moreover, the multivariate TE evidenced non-linear phenomena in information transfer, which are not evidenced by the use of DTF. We also showed that the target of information flow, in particular the frontal region, is an area of greater brain synchronization. Significance. Comparison of different connectivity analysis methods pointed to the advantages of nonlinear methods, and indicated a relationship existing between the flow of information and the level of synchronization of the brain.
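The synchronization measure referred to above as PLV is commonly known as the phase locking value. The abstract does not give its formula; the sketch below uses the standard definition, the magnitude of the mean unit phasor of the phase difference between two signals. It assumes the instantaneous phases have already been extracted (for example with a Hilbert transform, which is not shown), and the function name is hypothetical.

```python
import cmath


def phase_locking_value(phases_a, phases_b):
    """Phase locking value between two equally long phase series (radians).

    Returns a number in [0, 1]: 1 means a constant phase difference,
    values near 0 mean the phase difference is spread around the circle.
    """
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase series must be non-empty and of equal length")
    # Average the unit phasors of the phase differences, then take the modulus.
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)
```

Because PLV is symmetric in its two arguments, it measures synchronization but carries no direction, which is why the study pairs it with directed measures such as TE and DTF.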
Dehesh, Tania; Zare, Najaf; Ayatollahi, Seyyed Mohammad Taghi
2015-01-01
Univariate meta-analysis (UM) procedure, as a technique that provides a single overall result, has become increasingly popular. Neglecting the existence of other concomitant covariates in the models leads to loss of treatment efficiency. Our aim was to propose four new approximation approaches for the covariance matrix of the coefficients, which is not readily available for the multivariate generalized least square (MGLS) method as a multivariate meta-analysis approach. We evaluated the efficiency of the four new approaches, namely zero correlation (ZC), common correlation (CC), estimated correlation (EC), and multivariate multilevel correlation (MMC), on the estimation bias, mean square error (MSE), and 95% probability coverage of the confidence interval (CI) in the synthesis of Cox proportional hazard model coefficients in a simulation study. Comparing the results of the simulation study on the MSE, bias, and CI of the estimated coefficients indicated that the MMC approach was the most accurate procedure compared to the EC, CC, and ZC procedures. The precision ranking of the four approaches according to all of the above settings was MMC ≥ EC ≥ CC ≥ ZC. This study highlights the advantages of MGLS meta-analysis over the UM approach. The results suggest the use of the MMC procedure to overcome the lack of information needed for a complete covariance matrix of the coefficients.
Multivariate Meta-Analysis of Genetic Association Studies: A Simulation Study
Neupane, Binod; Beyene, Joseph
2015-01-01
In a meta-analysis with multiple end points of interests that are correlated between or within studies, multivariate approach to meta-analysis has a potential to produce more precise estimates of effects by exploiting the correlation structure between end points. However, under random-effects assumption the multivariate estimation is more complex (as it involves estimation of more parameters simultaneously) than univariate estimation, and sometimes can produce unrealistic parameter estimates. Usefulness of multivariate approach to meta-analysis of the effects of a genetic variant on two or more correlated traits is not well understood in the area of genetic association studies. In such studies, genetic variants are expected to roughly maintain Hardy-Weinberg equilibrium within studies, and also their effects on complex traits are generally very small to modest and could be heterogeneous across studies for genuine reasons. We carried out extensive simulation to explore the comparative performance of multivariate approach with most commonly used univariate inverse-variance weighted approach under random-effects assumption in various realistic meta-analytic scenarios of genetic association studies of correlated end points. We evaluated the performance with respect to relative mean bias percentage, and root mean square error (RMSE) of the estimate and coverage probability of corresponding 95% confidence interval of the effect for each end point. Our simulation results suggest that multivariate approach performs similarly or better than univariate method when correlations between end points within or between studies are at least moderate and between-study variation is similar or larger than average within-study variation for meta-analyses of 10 or more genetic studies. 
Multivariate approach produces estimates with smaller bias and RMSE especially for the end point that has randomly or informatively missing summary data in some individual studies, when the missing data in the endpoint are imputed with null effects and quite large variance. PMID:26196398
A refined method for multivariate meta-analysis and meta-regression.
Jackson, Daniel; Riley, Richard D
2014-02-20
Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects' standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. Copyright © 2013 John Wiley & Sons, Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beppler, Christina L
2015-12-01
A new approach was created for studying energetic material degradation. This approach involved using liquid chromatography-mass spectrometry (LC-MS) with multivariate statistical data analysis to detect and tentatively identify non-volatile chemical species that form as the CL-20 energetic material thermally degrades. Multivariate data analysis showed clear separation and clustering of samples based on sample group: either pristine or aged material. Further analysis showed counter-clockwise trends in the scores plots from principal components analysis (PCA), a type of multivariate data analysis. These trends may indicate that there was a discrete shift in the chemical markers as the material went from pristine to aged, and then again when the aged CL-20 mixed with a potentially incompatible material was thermally aged for 4, 6, or 9 months. This new approach to studying energetic material degradation should provide greater knowledge of potential degradation markers in these materials.
Huang, Jun; Kaul, Goldi; Cai, Chunsheng; Chatlapalli, Ramarao; Hernandez-Abad, Pedro; Ghosh, Krishnendu; Nagi, Arwinder
2009-12-01
To facilitate an in-depth process understanding, and offer opportunities for developing control strategies to ensure product quality, a combination of experimental design, optimization and multivariate techniques was integrated into the process development of a drug product. A process DOE was used to evaluate effects of the design factors on manufacturability and final product CQAs, and to establish a design space to ensure desired CQAs. Two types of analyses were performed to extract maximal information: DOE effect and response surface analysis, and multivariate analysis (PCA and PLS). The DOE effect analysis was used to evaluate the interactions and effects of three design factors (water amount, wet massing time and lubrication time) on response variables (blend flow, compressibility and tablet dissolution). The design space was established by the combined use of DOE, optimization and multivariate analysis to ensure desired CQAs. Multivariate analysis of all variables from the DOE batches was conducted to study relationships between the variables and to evaluate the impact of material attributes/process parameters on manufacturability and final product CQAs. The integrated multivariate approach exemplifies application of QbD principles and tools to drug product and process development.
The Potential of Multivariate Analysis in Assessing Students' Attitude to Curriculum Subjects
ERIC Educational Resources Information Center
Gaotlhobogwe, Michael; Laugharne, Janet; Durance, Isabelle
2011-01-01
Background: Understanding student attitudes to curriculum subjects is central to providing evidence-based options to policy makers in education. Purpose: We illustrate how quantitative approaches used in the social sciences and based on multivariate analysis (categorical Principal Components Analysis, Clustering Analysis and General Linear…
NASA Astrophysics Data System (ADS)
Vittal, H.; Singh, Jitendra; Kumar, Pankaj; Karmakar, Subhankar
2015-06-01
In watershed management, flood frequency analysis (FFA) is performed to quantify the risk of flooding at different spatial locations and also to provide guidelines for determining the design periods of flood control structures. Traditional FFA was extensively performed by considering the univariate scenario for both at-site and regional estimation of return periods. However, due to the inherent mutual dependence of the flood variables or characteristics [i.e., peak flow (P), flood volume (V) and flood duration (D), which are random in nature], the analysis has been further extended to the multivariate scenario, with some restrictive assumptions. To overcome the assumption of the same family of marginal density functions for all flood variables, the concept of the copula has been introduced. Although the advancement from univariate to multivariate analyses drew formidable attention from the FFA research community, the basic limitation was that the analyses were performed using only parametric families of distributions. The aim of the current study is to emphasize the importance of nonparametric approaches in the field of multivariate FFA; however, a nonparametric distribution may not always be a good fit and capable of replacing well-implemented multivariate parametric and multivariate copula-based applications. Nevertheless, the potential of obtaining the best fit using nonparametric distributions might be improved because such distributions reproduce the sample's characteristics, resulting in more accurate estimations of the multivariate return period. Hence, the current study shows the importance of combining the multivariate nonparametric approach with multivariate parametric and copula-based approaches, thereby resulting in a comprehensive framework for complete at-site FFA. Although the proposed framework is designed for at-site FFA, this approach can also be applied to regional FFA because regional estimations ideally include at-site estimations. 
The framework is based on the following steps: (i) comprehensive trend analysis to assess nonstationarity in the observed data; (ii) selection of the best-fit univariate marginal distribution from a comprehensive set of parametric and nonparametric distributions for the flood variables; (iii) multivariate frequency analyses with parametric, copula-based and nonparametric approaches; and (iv) estimation of joint and various conditional return periods. The proposed framework for frequency analysis is demonstrated using 110 years of observed data from the Allegheny River at Salamanca, New York, USA. The results show that for both univariate and multivariate cases, the nonparametric Gaussian kernel provides the best estimate. Further, we perform FFA for twenty major rivers over the continental USA, which shows that for seven rivers all the flood variables followed the nonparametric Gaussian kernel, whereas for the other rivers parametric distributions provide the best fit for either one or two flood variables. Thus the summary of results shows that the nonparametric method cannot substitute for the parametric and copula-based approaches, but should be considered during any at-site FFA to provide the broadest choice for the best estimation of flood return periods.
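As a rough illustration of how a nonparametric Gaussian kernel can feed a return-period estimate, the sketch below covers only the univariate case. It is not the authors' implementation: the function names are hypothetical, the bandwidth default is the normal-reference rule of thumb (an assumption, since the abstract does not state a bandwidth choice), and the input is assumed to be a series of annual maxima so that the return period in years is 1 / (1 - F(x)).

```python
import math
import statistics


def gaussian_kernel_cdf(x, sample, bandwidth=None):
    """Gaussian-kernel estimate of the CDF at x from a sample."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    if bandwidth is None:
        # Normal-reference rule of thumb (assumed default, not from the source).
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-0.2)
    # Average of normal CDFs centred on each observation.
    scale = bandwidth * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - xi) / scale)) for xi in sample) / n


def return_period(x, annual_maxima, bandwidth=None):
    """Return period (years) of exceeding x, assuming one value per year."""
    exceedance = 1.0 - gaussian_kernel_cdf(x, annual_maxima, bandwidth)
    if exceedance <= 0.0:
        return math.inf
    return 1.0 / exceedance
```

Joint and conditional return periods for (P, V, D) would require a multivariate kernel or a copula linking the marginals, which this univariate sketch does not attempt.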
Multivariate Analysis of Schools and Educational Policy.
ERIC Educational Resources Information Center
Kiesling, Herbert J.
This report describes a multivariate analysis technique that approaches the problems of educational production function analysis by (1) using comparable measures of output across large experiments, (2) accounting systematically for differences in socioeconomic background, and (3) treating the school as a complete system in which different…
A refined method for multivariate meta-analysis and meta-regression
Jackson, Daniel; Riley, Richard D
2014-01-01
Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects’ standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:23996351
Multivariate Methods for Meta-Analysis of Genetic Association Studies.
Dimou, Niki L; Pantavou, Katerina G; Braliou, Georgia G; Bagos, Pantelis G
2018-01-01
Multivariate meta-analysis of genetic association studies and genome-wide association studies has received remarkable attention as it improves the precision of the analysis. Here, we review, summarize and present in a unified framework methods for multivariate meta-analysis of genetic association studies and genome-wide association studies. Starting with the statistical methods used for robust analysis and genetic model selection, we present in brief univariate methods for meta-analysis and we then scrutinize multivariate methodologies. Multivariate models of meta-analysis for single gene-disease association studies, including models for haplotype association studies, multiple linked polymorphisms and multiple outcomes, are discussed. The popular Mendelian randomization approach and special cases of meta-analysis addressing issues such as the assumption of the mode of inheritance, deviation from Hardy-Weinberg Equilibrium and gene-environment interactions are also presented. All available methods are enriched with practical applications, and methodologies that could be developed in the future are discussed. Links for all available software implementing multivariate meta-analysis methods are also provided.
Voxelwise multivariate analysis of multimodality magnetic resonance imaging
Naylor, Melissa G.; Cardenas, Valerie A.; Tosun, Duygu; Schuff, Norbert; Weiner, Michael; Schwartzman, Armin
2015-01-01
Most brain magnetic resonance imaging (MRI) studies concentrate on a single MRI contrast or modality, frequently structural MRI. By performing an integrated analysis of several modalities, such as structural, perfusion-weighted, and diffusion-weighted MRI, new insights may be attained to better understand the underlying processes of brain diseases. We compare two voxelwise approaches: (1) fitting multiple univariate models, one for each outcome and then adjusting for multiple comparisons among the outcomes and (2) fitting a multivariate model. In both cases, adjustment for multiple comparisons is performed over all voxels jointly to account for the search over the brain. The multivariate model is able to account for the multiple comparisons over outcomes without assuming independence because the covariance structure between modalities is estimated. Simulations show that the multivariate approach is more powerful when the outcomes are correlated and, even when the outcomes are independent, the multivariate approach is just as powerful or more powerful when at least two outcomes are dependent on predictors in the model. However, multiple univariate regressions with Bonferroni correction remains a desirable alternative in some circumstances. To illustrate the power of each approach, we analyze a case control study of Alzheimer's disease, in which data from three MRI modalities are available. PMID:23408378
Drunk driving detection based on classification of multivariate time series.
Li, Zhenlong; Jin, Xue; Zhao, Xiaohua
2015-09-01
This paper addresses the problem of detecting drunk driving based on classification of multivariate time series. First, driving performance measures were collected from a test in a driving simulator located in the Traffic Research Center, Beijing University of Technology. Lateral position and steering angle were used to detect drunk driving. Second, multivariate time series analysis was performed to extract the features. A piecewise linear representation was used to represent multivariate time series. A bottom-up algorithm was then employed to separate multivariate time series. The slope and time interval of each segment were extracted as the features for classification. Third, a support vector machine classifier was used to classify driver's state into two classes (normal or drunk) according to the extracted features. The proposed approach achieved an accuracy of 80.0%. Drunk driving detection based on the analysis of multivariate time series is feasible and effective. The approach has implications for drunk driving detection. Copyright © 2015 Elsevier Ltd and National Safety Council. All rights reserved.
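The feature-extraction step described above (piecewise linear representation, bottom-up segmentation, then slope and time interval per segment) can be sketched for a single series as follows. This is a simplified illustration rather than the authors' code: it handles one variable instead of a multivariate series, uses the squared error of the line joining segment endpoints as the merge cost, and the function names and the `max_error` threshold are hypothetical. Time stamps are assumed to be strictly increasing.

```python
def merge_cost(t, y, i, j):
    """Squared error of approximating points i..j by the line joining the endpoints."""
    slope = (y[j] - y[i]) / (t[j] - t[i])
    return sum((y[k] - (y[i] + slope * (t[k] - t[i]))) ** 2 for k in range(i, j + 1))


def bottom_up_segments(t, y, max_error):
    """Return breakpoint indices of a bottom-up piecewise linear representation."""
    if len(t) != len(y) or len(t) < 2:
        raise ValueError("need two equally long series with at least two points")
    # Start from the finest segmentation: every sample is a breakpoint.
    bounds = list(range(len(t)))
    while len(bounds) > 2:
        # Cost of removing each interior breakpoint, i.e. merging its two segments.
        costs = [merge_cost(t, y, bounds[b - 1], bounds[b + 1])
                 for b in range(1, len(bounds) - 1)]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_error:
            break  # the cheapest merge is already too lossy
        del bounds[best + 1]
    return bounds


def segment_features(t, y, bounds):
    """(slope, time interval) of each segment, the features used for classification."""
    return [((y[j] - y[i]) / (t[j] - t[i]), t[j] - t[i])
            for i, j in zip(bounds, bounds[1:])]
```

The resulting list of (slope, interval) pairs would then be turned into a fixed-length feature vector before being passed to a classifier such as a support vector machine; that step is omitted here.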
Structural analysis and design of multivariable control systems: An algebraic approach
NASA Technical Reports Server (NTRS)
Tsay, Yih Tsong; Shieh, Leang-San; Barnett, Stephen
1988-01-01
The application of algebraic system theory to the design of controllers for multivariable (MV) systems is explored analytically using an approach based on state-space representations and matrix-fraction descriptions. Chapters are devoted to characteristic lambda matrices and canonical descriptions of MIMO systems; spectral analysis, divisors, and spectral factors of nonsingular lambda matrices; feedback control of MV systems; and structural decomposition theories and their application to MV control systems.
Multivariate missing data in hydrology - Review and applications
NASA Astrophysics Data System (ADS)
Ben Aissia, Mohamed-Aymen; Chebana, Fateh; Ouarda, Taha B. M. J.
2017-12-01
Water resources planning and management require complete data sets of a number of hydrological variables, such as flood peaks and volumes. However, hydrologists are often faced with the problem of missing data (MD) in hydrological databases. Several methods are used to deal with the imputation of MD. During the last decade, multivariate approaches have gained popularity in the field of hydrology, especially in hydrological frequency analysis (HFA). However, the treatment of MD remains neglected in the multivariate HFA literature, where the focus has been mainly on the modeling component. For a complete analysis and in order to optimize the use of data, MD should also be treated in the multivariate setting prior to modeling and inference. Imputation of MD in the multivariate hydrological framework can have direct implications on the quality of the estimation. Indeed, the dependence between the series represents important additional information that can be included in the imputation process. The objective of the present paper is to highlight the importance of treating MD in multivariate hydrological frequency analysis by reviewing and applying multivariate imputation methods and by comparing univariate and multivariate imputation methods. An application is carried out for multiple flood attributes on three sites in order to evaluate the performance of the different methods based on the leave-one-out procedure. The results indicate that the performance of imputation methods can be improved by adopting the multivariate setting, compared to mean substitution and interpolation methods, especially when using the copula-based approach.
ERIC Educational Resources Information Center
Bejar, Isaac I.
1981-01-01
The effects of nutritional supplementation on the physical development of malnourished children were analyzed by univariate and multivariate methods for the analysis of repeated measures. Results showed that the nutritional treatment was successful, but it was necessary to resort to the multivariate approach. (Author/GK)
Applied Statistics: From Bivariate through Multivariate Techniques [with CD-ROM]
ERIC Educational Resources Information Center
Warner, Rebecca M.
2007-01-01
This book provides a clear introduction to widely used topics in bivariate and multivariate statistics, including multiple regression, discriminant analysis, MANOVA, factor analysis, and binary logistic regression. The approach is applied and does not require formal mathematics; equations are accompanied by verbal explanations. Students are asked…
Voxelwise multivariate analysis of multimodality magnetic resonance imaging.
Naylor, Melissa G; Cardenas, Valerie A; Tosun, Duygu; Schuff, Norbert; Weiner, Michael; Schwartzman, Armin
2014-03-01
Most brain magnetic resonance imaging (MRI) studies concentrate on a single MRI contrast or modality, frequently structural MRI. By performing an integrated analysis of several modalities, such as structural, perfusion-weighted, and diffusion-weighted MRI, new insights may be attained to better understand the underlying processes of brain diseases. We compare two voxelwise approaches: (1) fitting multiple univariate models, one for each outcome and then adjusting for multiple comparisons among the outcomes and (2) fitting a multivariate model. In both cases, adjustment for multiple comparisons is performed over all voxels jointly to account for the search over the brain. The multivariate model is able to account for the multiple comparisons over outcomes without assuming independence because the covariance structure between modalities is estimated. Simulations show that the multivariate approach is more powerful when the outcomes are correlated and, even when the outcomes are independent, the multivariate approach is just as powerful or more powerful when at least two outcomes are dependent on predictors in the model. However, multiple univariate regressions with Bonferroni correction remain a desirable alternative in some circumstances. To illustrate the power of each approach, we analyze a case control study of Alzheimer's disease, in which data from three MRI modalities are available. Copyright © 2013 Wiley Periodicals, Inc.
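Approach (1) can be illustrated for a single voxel with a minimal sketch, assuming the per-outcome p-values have already been obtained from the univariate models. The function name `bonferroni_adjust`, the example p-values, and the 0.05 threshold are hypothetical, and the joint correction over all voxels described in the abstract is not shown.

```python
from typing import List


def bonferroni_adjust(p_values: List[float]) -> List[float]:
    """Bonferroni adjustment for m tests: multiply each p-value by m, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical p-values at one voxel, one per MRI modality (outcome)
p_by_modality = [0.01, 0.04, 0.50]
adjusted = bonferroni_adjust(p_by_modality)

# An outcome is declared significant only if its adjusted p-value passes
significant = [p <= 0.05 for p in adjusted]
```

Because the adjustment ignores the correlation between modalities, it tends to be conservative when outcomes are correlated, which is consistent with the abstract's finding that the multivariate model is more powerful in that case.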
Mapping Informative Clusters in a Hierarchial Framework of fMRI Multivariate Analysis
Xu, Rui; Zhen, Zonglei; Liu, Jia
2010-01-01
Pattern recognition methods, which are powerful in discriminating between multi-voxel patterns of brain activity associated with different mental states, have become increasingly popular in fMRI data analysis. However, when they are used in functional brain mapping, the location of discriminative voxels varies significantly, raising difficulties in interpreting the locus of the effect. Here we proposed a hierarchical framework of multivariate approach that maps informative clusters rather than voxels to achieve reliable functional brain mapping without compromising the discriminative power. In particular, we first searched for local homogeneous clusters that consisted of voxels with similar response profiles. Then, a multi-voxel classifier was built for each cluster to extract discriminative information from the multi-voxel patterns. Finally, through multivariate ranking, outputs from the classifiers served as a multi-cluster pattern to identify informative clusters by examining interactions among clusters. Results from both simulated and real fMRI data demonstrated that this hierarchical approach showed better performance in the robustness of functional brain mapping than traditional voxel-based multivariate methods. In addition, the mapped clusters overlapped highly for two perceptually equivalent object categories, further confirming the validity of our approach. In short, the hierarchical framework of multivariate approach is suitable for both pattern classification and brain mapping in fMRI studies. PMID:21152081
Analyzing Multiple Outcomes in Clinical Research Using Multivariate Multilevel Models
Baldwin, Scott A.; Imel, Zac E.; Braithwaite, Scott R.; Atkins, David C.
2014-01-01
Objective Multilevel models have become a standard data analysis approach in intervention research. Although the vast majority of intervention studies involve multiple outcome measures, few studies use multivariate analysis methods. The authors discuss multivariate extensions to the multilevel model that can be used by psychotherapy researchers. Method and Results Using simulated longitudinal treatment data, the authors show how multivariate models extend common univariate growth models and how the multivariate model can be used to examine multivariate hypotheses involving fixed effects (e.g., does the size of the treatment effect differ across outcomes?) and random effects (e.g., is change in one outcome related to change in the other?). An online supplemental appendix provides annotated computer code and simulated example data for implementing a multivariate model. Conclusions Multivariate multilevel models are flexible, powerful models that can enhance clinical research. PMID:24491071
Multivariate meta-analysis: a robust approach based on the theory of U-statistic.
Ma, Yan; Mazumdar, Madhu
2011-10-30
Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously, taking into account the correlation between the outcomes. Likelihood-based approaches, in particular the restricted maximum likelihood (REML) method, are commonly utilized in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for meta-analysis with a small number of component studies. The use of REML also requires iterative estimation between parameters, needing moderately high computation time, especially when the dimension of outcomes is large. A multivariate method of moments (MMM) is available and is shown to perform as well as REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normality. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis on the basis of the theory of U-statistic and compare the properties of these three procedures under both normal and skewed data through simulation studies. It is shown that the effect on estimates from REML because of non-normal data distribution is marginal and that the estimates from MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. Easy implementation of all three methods is illustrated by their application to data from two published meta-analyses from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on U-statistic for testing significance of between-study heterogeneity and for extending the work to meta-regression setting. Copyright © 2011 John Wiley & Sons, Ltd.
Multivariate longitudinal data analysis with censored and intermittent missing responses.
Lin, Tsung-I; Lachos, Victor H; Wang, Wan-Lun
2018-05-08
The multivariate linear mixed model (MLMM) has emerged as an important analytical tool for longitudinal data with multiple outcomes. However, the analysis of multivariate longitudinal data could be complicated by the presence of censored measurements because of a detection limit of the assay in combination with unavoidable missing values arising when subjects miss some of their scheduled visits intermittently. This paper presents a generalization of the MLMM approach, called the MLMM-CM, for a joint analysis of the multivariate longitudinal data with censored and intermittent missing responses. A computationally feasible expectation maximization-based procedure is developed to carry out maximum likelihood estimation within the MLMM-CM framework. Moreover, the asymptotic standard errors of fixed effects are explicitly obtained via the information-based method. We illustrate our methodology by using simulated data and a case study from an AIDS clinical trial. Experimental results reveal that the proposed method is able to provide more satisfactory performance as compared with the traditional MLMM approach. Copyright © 2018 John Wiley & Sons, Ltd.
ERIC Educational Resources Information Center
Tchumtchoua, Sylvie; Dey, Dipak K.
2012-01-01
This paper proposes a semiparametric Bayesian framework for the analysis of associations among multivariate longitudinal categorical variables in high-dimensional data settings. This type of data is frequent, especially in the social and behavioral sciences. A semiparametric hierarchical factor analysis model is developed in which the…
Sciutto, Giorgia; Oliveri, Paolo; Catelli, Emilio; Bonacini, Irene
2017-01-01
In the field of applied research in heritage science, the use of multivariate approaches is still quite limited, and the chemometric results obtained are often underinterpreted. Within this scenario, the present paper is aimed at disseminating the use of suitable multivariate methodologies and proposes a procedural workflow applied on a representative group of case studies, of considerable importance for conservation purposes, as a sort of guideline on the processing and interpretation of FTIR data. Initially, principal component analysis (PCA) is performed and the score values are converted into chemical maps. Subsequently, the brushing approach is applied, demonstrating its usefulness for a deep understanding of the relationships between the multivariate map and PC score space, as well as for the identification of the spectral bands mainly involved in the definition of each area localised within the score maps. PMID:29333162
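The step of converting PCA scores into a map can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' workflow: the function name, the row-major pixel ordering, and the use of power iteration to find only the first principal component are choices made for the example, and the toy "spectra" have just two bands.

```python
from math import sqrt
from typing import List


def first_pc_score_map(spectra: List[List[float]], height: int, width: int,
                       iters: int = 100) -> List[List[float]]:
    """Return the PC1 score of every pixel arranged as a height x width map.

    spectra holds one spectrum per pixel, in row-major pixel order (assumed).
    """
    n, p = len(spectra), len(spectra[0])
    if n != height * width:
        raise ValueError("number of spectra must equal height * width")
    # Mean-centre each spectral band
    means = [sum(row[j] for row in spectra) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in spectra]
    # Sample covariance matrix between bands
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration for the leading eigenvector (the PC1 loadings).
    # Note: it fails if the start vector is orthogonal to that eigenvector.
    v = [1.0 / sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Scores are projections of the centred spectra onto the loadings
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    # Fold the flat score vector back into image geometry
    return [scores[i * width:(i + 1) * width] for i in range(height)]


# Hypothetical 2 x 2 image with two-band spectra lying along one direction
toy_spectra = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
score_map = first_pc_score_map(toy_spectra, height=2, width=2)
```

In practice a linear-algebra library would compute all components at once; the point here is only that each pixel's score is placed back at that pixel's position, which is what makes the score map readable as an image.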
Motegi, Hiromi; Tsuboi, Yuuri; Saga, Ayako; Kagami, Tomoko; Inoue, Maki; Toki, Hideaki; Minowa, Osamu; Noda, Tetsuo; Kikuchi, Jun
2015-11-04
There is an increasing need to use multivariate statistical methods for understanding biological functions, identifying the mechanisms of diseases, and exploring biomarkers. In addition to classical analyses such as hierarchical cluster analysis, principal component analysis, and partial least squares discriminant analysis, various multivariate strategies, including independent component analysis, non-negative matrix factorization, and multivariate curve resolution, have recently been proposed. However, determining the number of components is problematic. Despite the proposal of several different methods, no satisfactory approach has yet been reported. To resolve this problem, we implemented a new idea: classifying a component as "reliable" or "unreliable" based on the reproducibility of its appearance, regardless of the number of components in the calculation. Using the clustering method for classification, we applied this idea to multivariate curve resolution-alternating least squares (MCR-ALS). Comparisons between conventional and modified methods applied to proton nuclear magnetic resonance ((1)H-NMR) spectral datasets derived from known standard mixtures and biological mixtures (urine and feces of mice) revealed that more plausible results are obtained by the modified method. In particular, clusters containing little information were detected with reliability. This strategy, named "cluster-aided MCR-ALS," will facilitate the attainment of more reliable results in the metabolomics datasets.
Dinç, Erdal; Ozdemir, Abdil
2005-01-01
Multivariate chromatographic calibration technique was developed for the quantitative analysis of binary mixtures enalapril maleate (EA) and hydrochlorothiazide (HCT) in tablets in the presence of losartan potassium (LST). The mathematical algorithm of multivariate chromatographic calibration technique is based on the use of the linear regression equations constructed using relationship between concentration and peak area at the five-wavelength set. The algorithm of this mathematical calibration model having a simple mathematical content was briefly described. This approach is a powerful mathematical tool for an optimum chromatographic multivariate calibration and elimination of fluctuations coming from instrumental and experimental conditions. This multivariate chromatographic calibration contains reduction of multivariate linear regression functions to univariate data set. The validation of model was carried out by analyzing various synthetic binary mixtures and using the standard addition technique. Developed calibration technique was applied to the analysis of the real pharmaceutical tablets containing EA and HCT. The obtained results were compared with those obtained by classical HPLC method. It was observed that the proposed multivariate chromatographic calibration gives better results than classical HPLC.
Multivariate Density Estimation and Remote Sensing
NASA Technical Reports Server (NTRS)
Scott, D. W.
1983-01-01
Current efforts to develop methods and computer algorithms to effectively represent multivariate data commonly encountered in remote sensing applications are described. While this may involve scatter diagrams, multivariate representations of nonparametric probability density estimates are emphasized. The density function provides a useful graphical tool for looking at data and a useful theoretical tool for classification. This approach is called a thunderstorm data analysis.
Alegre-Cortés, J; Soto-Sánchez, C; Pizá, Á G; Albarracín, A L; Farfán, F D; Felice, C J; Fernández, E
2016-07-15
Linear analysis has classically provided powerful tools for understanding the behavior of neural populations, but the neuron responses to real-world stimulation are nonlinear under some conditions, and many neuronal components demonstrate strong nonlinear behavior. In spite of this, temporal and frequency dynamics of neural populations to sensory stimulation have been usually analyzed with linear approaches. In this paper, we propose the use of Noise-Assisted Multivariate Empirical Mode Decomposition (NA-MEMD), a data-driven template-free algorithm, plus the Hilbert transform as a suitable tool for analyzing population oscillatory dynamics in a multi-dimensional space with instantaneous frequency (IF) resolution. The proposed approach was able to extract oscillatory information of neurophysiological data of deep vibrissal nerve and visual cortex multiunit recordings that were not evidenced using linear approaches with fixed bases such as the Fourier analysis. Texture discrimination analysis performance was increased when Noise-Assisted Multivariate Empirical Mode plus Hilbert transform was implemented, compared to linear techniques. Cortical oscillatory population activity was analyzed with precise time-frequency resolution. Similarly, NA-MEMD provided increased time-frequency resolution of cortical oscillatory population activity. Noise-Assisted Multivariate Empirical Mode Decomposition plus Hilbert transform is an improved method to analyze neuronal population oscillatory dynamics overcoming linear and stationary assumptions of classical methods. Copyright © 2016 Elsevier B.V. All rights reserved.
The use of multivariate statistics in studies of wildlife habitat
David E. Capen
1981-01-01
This report contains edited and reviewed versions of papers presented at a workshop held at the University of Vermont in April 1980. Topics include sampling avian habitats, multivariate methods, applications, examples, and new approaches to analysis and interpretation.
Risk Factors for Central Serous Chorioretinopathy: Multivariate Approach in a Case-Control Study.
Chatziralli, Irini; Kabanarou, Stamatina A; Parikakis, Efstratios; Chatzirallis, Alexandros; Xirou, Tina; Mitropoulos, Panagiotis
2017-07-01
The purpose of this prospective study was to investigate the potential risk factors associated independently with central serous retinopathy (CSR) in a Greek population, using a multivariate approach. Participants in the study were 183 consecutive patients diagnosed with CSR and 183 controls, matched for age. All participants underwent a complete ophthalmological examination, and information regarding their sociodemographic, clinical, medical and ophthalmological history was recorded, so as to assess potential risk factors for CSR. Univariate and multivariate analyses were performed. Univariate analysis showed that male sex, high educational status, high income, alcohol consumption, smoking, hypertension, coronary heart disease, obstructive sleep apnea, autoimmune disorders, H. pylori infection, type A personality and stress, steroid use, pregnancy and hyperopia were associated with CSR, while myopia was found to protect from CSR. In multivariate analysis, alcohol consumption, hypertension, coronary heart disease and autoimmune disorders lost their significance, while the remaining factors were all independently associated with CSR. It is important to take into account the various risk factors for CSR, so as to define vulnerable groups and to shed light on the pathogenesis of the disease.
Bohn, Justin; Eddings, Wesley; Schneeweiss, Sebastian
2017-03-15
Distributed networks of health-care data sources are increasingly being utilized to conduct pharmacoepidemiologic database studies. Such networks may contain data that are not physically pooled but instead are distributed horizontally (separate patients within each data source) or vertically (separate measures within each data source) in order to preserve patient privacy. While multivariable methods for the analysis of horizontally distributed data are frequently employed, few practical approaches have been put forth to deal with vertically distributed health-care databases. In this paper, we propose 2 propensity score-based approaches to vertically distributed data analysis and test their performance using 5 example studies. We found that these approaches produced point estimates close to what could be achieved without partitioning. We further found a performance benefit (i.e., lower mean squared error) for sequentially passing a propensity score through each data domain (called the "sequential approach") as compared with fitting separate domain-specific propensity scores (called the "parallel approach"). These results were validated in a small simulation study. This proof-of-concept study suggests a new multivariable analysis approach to vertically distributed health-care databases that is practical, preserves patient privacy, and warrants further investigation for use in clinical research applications that rely on health-care databases. © The Author 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Clark, Neil R.; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D.; Jones, Matthew R.; Ma’ayan, Avi
2016-01-01
Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community. PMID:26848405
Clark, Neil R; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D; Jones, Matthew R; Ma'ayan, Avi
2015-11-01
Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community.
D'Amico, E J; Neilands, T B; Zambarano, R
2001-11-01
Although power analysis is an important component in the planning and implementation of research designs, it is often ignored. Computer programs for performing power analysis are available, but most have limitations, particularly for complex multivariate designs. An SPSS procedure is presented that can be used for calculating power for univariate, multivariate, and repeated measures models with and without time-varying and time-constant covariates. Three examples provide a framework for calculating power via this method: an ANCOVA, a MANOVA, and a repeated measures ANOVA with two or more groups. The benefits and limitations of this procedure are discussed.
Variable Importance in Multivariate Group Comparisons.
ERIC Educational Resources Information Center
Huberty, Carl J.; Wisenbaker, Joseph M.
1992-01-01
Interpretations of relative variable importance in multivariate analysis of variance are discussed, with attention to (1) latent construct definition; (2) linear discriminant function scores; and (3) grouping variable effects. Two numerical ranking methods are proposed and compared by the bootstrap approach using two real data sets. (SLD)
ERIC Educational Resources Information Center
Braverman, Marc T.
2016-01-01
Extension program evaluations often present opportunities to analyze data in multiple ways. This article suggests that program evaluations can involve more sophisticated data analysis approaches than are often used. On the basis of a hypothetical program scenario and corresponding data set, two approaches to testing for evidence of program impact…
Multivariate meta-analysis for non-linear and other multi-parameter associations
Gasparrini, A; Armstrong, B; Kenward, M G
2012-01-01
In this paper, we formalize the application of multivariate meta-analysis and meta-regression to synthesize estimates of multi-parameter associations obtained from different studies. This modelling approach extends the standard two-stage analysis used to combine results across different sub-groups or populations. The most straightforward application is for the meta-analysis of non-linear relationships, described for example by regression coefficients of splines or other functions, but the methodology easily generalizes to any setting where complex associations are described by multiple correlated parameters. The modelling framework of multivariate meta-analysis is implemented in the package mvmeta within the statistical environment R. As an illustrative example, we propose a two-stage analysis for investigating the non-linear exposure–response relationship between temperature and non-accidental mortality using time-series data from multiple cities. Multivariate meta-analysis represents a useful analytical tool for studying complex associations through a two-stage procedure. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22807043
MacNab, Ying C
2016-08-01
This paper is concerned with multivariate conditional autoregressive models defined by linear combinations of independent or correlated underlying spatial processes. Known as linear models of coregionalization, the method offers a systematic and unified approach for formulating multivariate extensions to a broad range of univariate conditional autoregressive models. The resulting multivariate spatial models represent classes of coregionalized multivariate conditional autoregressive models that enable flexible modelling of multivariate spatial interactions, yielding coregionalization models with symmetric or asymmetric cross-covariances of different spatial variation and smoothness. In the context of multivariate disease mapping, for example, they facilitate borrowing strength both over space and across variables, allowing for more flexible multivariate spatial smoothing. Specifically, we present a broadened coregionalization framework to include order-dependent, order-free, and order-robust multivariate models; a new class of order-free coregionalized multivariate conditional autoregressives is introduced. We tackle computational challenges and present solutions that are integral for Bayesian analysis of these models. We also discuss two ways of computing the deviance information criterion for comparison among competing hierarchical models with or without unidentifiable prior parameters. The models and related methodology are developed in the broad context of modelling multivariate data on spatial lattice and illustrated in the context of multivariate disease mapping. The coregionalization framework and related methods also present a general approach for building spatially structured cross-covariance functions for multivariate geostatistics. © The Author(s) 2016.
NASA Astrophysics Data System (ADS)
Hsiao, Y. R.; Tsai, C.
2017-12-01
As the WHO Air Quality Guideline indicates, ambient air pollution puts world populations under threat of fatal diseases (e.g. heart disease, lung cancer, asthma), raising concerns about air pollution sources and related factors. This study presents a novel approach to investigating the multiscale variations of PM2.5 in southern Taiwan over the past decade, with four meteorological influencing factors (temperature, relative humidity, precipitation and wind speed), based on the Noise-assisted Multivariate Empirical Mode Decomposition (NAMEMD) algorithm, Hilbert Spectral Analysis (HSA) and the Time-dependent Intrinsic Correlation (TDIC) method. The NAMEMD algorithm is a fully data-driven approach designed for nonlinear and nonstationary multivariate signals, and is performed to decompose multivariate signals into a collection of channels of Intrinsic Mode Functions (IMFs). The TDIC method is an EMD-based method using a set of sliding window sizes to quantify localized correlation coefficients for multiscale signals. With the alignment property and quasi-dyadic filter bank of the NAMEMD algorithm, one is able to produce the same number of IMFs for all variables and estimate the cross correlation more accurately. The performance of the spectral representation of the NAMEMD-HSA method is compared with Complementary Empirical Mode Decomposition/Hilbert Spectral Analysis (CEEMD-HSA) and Wavelet Analysis. The nature of NAMEMD-based TDIC analysis is then compared with CEEMD-based TDIC analysis and the traditional correlation analysis.
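The idea of a localized correlation coefficient computed over sliding windows can be sketched as follows. This is a deliberate simplification, not the TDIC method itself: it uses one fixed window size on raw series, whereas the abstract describes a set of window sizes applied to decomposed signals. The function names and the example series are hypothetical.

```python
from math import nan, sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns NaN when either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return nan
    return sxy / sqrt(sxx * syy)


def sliding_correlation(x: List[float], y: List[float],
                        window: int) -> List[float]:
    """Correlation of x and y inside each window of the given length."""
    if len(x) != len(y) or window < 2 or window > len(x):
        raise ValueError("series must have equal length and 2 <= window <= len")
    return [pearson(x[i:i + window], y[i:i + window])
            for i in range(len(x) - window + 1)]


# Hypothetical series: the relationship flips sign halfway through
pollutant = [1.0, 2.0, 3.0, 4.0, 5.0]
humidity = [2.0, 4.0, 6.0, 4.0, 2.0]
local_r = sliding_correlation(pollutant, humidity, window=3)
```

A single global correlation of these two series would be zero and would hide the sign change that the windowed values reveal, which is the motivation for a time-dependent measure.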
A Study of Effects of MultiCollinearity in the Multivariable Analysis
Yoo, Wonsuk; Mayberry, Robert; Bae, Sejong; Singh, Karan; (Peter) He, Qinghua; Lillard, James W.
2015-01-01
A multivariable analysis is the most popular approach when investigating associations between risk factors and disease. However, the efficiency of multivariable analysis depends heavily on the correlation structure among predictive variables. When the covariates in the model are not independent of one another, collinearity/multicollinearity problems arise in the analysis, which leads to biased estimation. This work aims to perform a simulation study with various scenarios of different collinearity structures to investigate the effects of collinearity under various correlation structures amongst predictive and explanatory variables and to compare these results with existing guidelines for deciding when collinearity is harmful. Three correlation scenarios among predictor variables are considered: (1) bivariate collinear structure as the simplest collinearity case, (2) multivariate collinear structure where an explanatory variable is correlated with two other covariates, (3) a more realistic scenario when an independent variable can be expressed by various functions including the other variables. PMID:25664257
A Study of Effects of MultiCollinearity in the Multivariable Analysis.
Yoo, Wonsuk; Mayberry, Robert; Bae, Sejong; Singh, Karan; Peter He, Qinghua; Lillard, James W
2014-10-01
A multivariable analysis is the most popular approach when investigating associations between risk factors and disease. However, the efficiency of multivariable analysis depends heavily on the correlation structure among predictive variables. When the covariates in the model are not independent of one another, collinearity/multicollinearity problems arise in the analysis, which leads to biased estimation. This work aims to perform a simulation study with various scenarios of different collinearity structures to investigate the effects of collinearity under various correlation structures amongst predictive and explanatory variables and to compare these results with existing guidelines for deciding when collinearity is harmful. Three correlation scenarios among predictor variables are considered: (1) bivariate collinear structure as the simplest collinearity case, (2) multivariate collinear structure where an explanatory variable is correlated with two other covariates, (3) a more realistic scenario when an independent variable can be expressed by various functions including the other variables.
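Scenario (1), the bivariate case, can be made concrete with a small sketch. The abstract does not name the guideline it compares against; the variance inflation factor (VIF) used here is an assumption, chosen because it is a widely used collinearity diagnostic. With exactly two predictors, VIF reduces to 1 / (1 - r^2), where r is their Pearson correlation. The function names and data are hypothetical.

```python
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1: List[float], x2: List[float]) -> float:
    """VIF when the model has exactly two predictors: 1 / (1 - r^2).

    Assumes the predictors are not perfectly correlated (|r| < 1).
    """
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictors: uncorrelated pair versus correlated pair (r = 0.6)
age = [1.0, 2.0, 3.0, 4.0]
unrelated = [1.0, -1.0, -1.0, 1.0]
related = [2.0, 1.0, 4.0, 3.0]

vif_independent = vif_two_predictors(age, unrelated)
vif_collinear = vif_two_predictors(age, related)
```

A VIF of 1 means the coefficient variance is not inflated at all; larger values mean the variance of the estimated coefficient grows by that factor relative to the uncorrelated case. Scenarios (2) and (3) need a regression of each predictor on all the others and are not covered by this two-predictor shortcut.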
Multi-country health surveys: are the analyses misleading?
Masood, Mohd; Reidpath, Daniel D
2014-05-01
The aim of this paper was to review the types of approaches currently utilized in the analysis of multi-country survey data, specifically focusing on design and modeling issues with a focus on analyses of significant multi-country surveys published in 2010. A systematic search strategy was used to identify the 10 multi-country surveys and the articles published from them in 2010. The surveys were selected to reflect diverse topics and foci; and provide an insight into analytic approaches across research themes. The search identified 159 articles appropriate for full text review and data extraction. The analyses adopted in the multi-country surveys can be broadly classified as: univariate/bivariate analyses, and multivariate/multivariable analyses. Multivariate/multivariable analyses may be further divided into design- and model-based analyses. Of the 159 articles reviewed, 129 articles used model-based analysis, 30 articles used design-based analyses. Similar patterns could be seen in all the individual surveys. While there is general agreement among survey statisticians that complex surveys are most appropriately analyzed using design-based analyses, most researchers continued to use the more common model-based approaches. Recent developments in design-based multi-level analysis may be one approach to include all the survey design characteristics. This is a relatively new area, however, and there remains statistical, as well as applied analytic research required. An important limitation of this study relates to the selection of the surveys used and the choice of year for the analysis, i.e., year 2010 only. There is, however, no strong reason to believe that analytic strategies have changed radically in the past few years, and 2010 provides a credible snapshot of current practice.
Exploratory Multivariate Analysis. A Graphical Approach.
1981-01-01
Gnanadesikan, 1977) but we feel that these should be used with great caution unless one really has good reason to believe that the data came from such a...are referred to Gnanadesikan (1977). The present author hopes that the convenience of a single summary or significance level will not deter his readers...fit of a harmonic model to meteorological data. (In preparation). Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate
Multivariate time series clustering on geophysical data recorded at Mt. Etna from 1996 to 2003
NASA Astrophysics Data System (ADS)
Di Salvo, Roberto; Montalto, Placido; Nunnari, Giuseppe; Neri, Marco; Puglisi, Giuseppe
2013-02-01
Time series clustering is an important task in data analysis issues in order to extract implicit, previously unknown, and potentially useful information from a large collection of data. Finding useful similar trends in multivariate time series represents a challenge in several areas including geophysics environment research. While traditional time series analysis methods deal only with univariate time series, multivariate time series analysis is a more suitable approach in the field of research where different kinds of data are available. Moreover, the conventional time series clustering techniques do not provide desired results for geophysical datasets due to the huge amount of data whose sampling rate is different according to the nature of signal. In this paper, a novel approach concerning geophysical multivariate time series clustering is proposed using dynamic time series segmentation and Self Organizing Maps techniques. This method allows finding coupling among trends of different geophysical data recorded from monitoring networks at Mt. Etna spanning from 1996 to 2003, when the transition from summit eruptions to flank eruptions occurred. This information can be used to carry out a more careful evaluation of the state of volcano and to define potential hazard assessment at Mt. Etna.
Defining critical habitats of threatened and endemic reef fishes with a multivariate approach.
Purcell, Steven W; Clarke, K Robert; Rushworth, Kelvin; Dalton, Steven J
2014-12-01
Understanding critical habitats of threatened and endemic animals is essential for mitigating extinction risks, developing recovery plans, and siting reserves, but assessment methods are generally lacking. We evaluated critical habitats of 8 threatened or endemic fish species on coral and rocky reefs of subtropical eastern Australia, by measuring physical and substratum-type variables of habitats at fish sightings. We used nonmetric and metric multidimensional scaling (nMDS, mMDS), Analysis of similarities (ANOSIM), similarity percentages analysis (SIMPER), permutational analysis of multivariate dispersions (PERMDISP), and other multivariate tools to distinguish critical habitats. Niche breadth was widest for 2 endemic wrasses, and reef inclination was important for several species, often found in relatively deep microhabitats. Critical habitats of mainland reef species included small caves or habitat-forming hosts such as gorgonian corals and black coral trees. Hard corals appeared important for reef fishes at Lord Howe Island, and red algae for mainland reef fishes. A wide range of habitat variables are required to assess critical habitats owing to varied affinities of species to different habitat features. We advocate assessments of critical habitats matched to the spatial scale used by the animals and a combination of multivariate methods. Our multivariate approach furnishes a general template for assessing the critical habitats of species, understanding how these vary among species, and determining differences in the degree of habitat specificity. © 2014 Society for Conservation Biology.
2014-09-01
approaches. Ecological Modelling Volume 200, Issues 1–2, 10, pp 1–19. Buhlmann, Kurt A., Thomas S.B. Akre, John B. Iverson, Deno Karapatakis, Russell A ...statistical multivariate analysis to define the current and projected future range probability for species of interest to Army land managers. A software...15 Figure 4. RCW omission rate and predicted area as a function of the cumulative threshold
Docking and multivariate methods to explore HIV-1 drug-resistance: a comparative analysis
NASA Astrophysics Data System (ADS)
Almerico, Anna Maria; Tutone, Marco; Lauria, Antonino
2008-05-01
In this paper we describe a comparative analysis between multivariate and docking methods in the study of the drug resistance to the reverse transcriptase and the protease inhibitors. In our early papers we developed a simple but efficient method to evaluate the features of compounds that are less likely to trigger resistance or are effective against mutant HIV strains, using the multivariate statistical procedures PCA and DA. In the attempt to create a more solid background for the prediction of susceptibility or resistance, we carried out a comparative analysis between our previous multivariate approach and molecular docking study. The intent of this paper is not only to find further support to the results obtained by the combined use of PCA and DA, but also to evidence the structural features, in terms of molecular descriptors, similarity, and energetic contributions, derived from docking, which can account for the arising of drug-resistance against mutant strains.
Multivariate generalized multifactor dimensionality reduction to detect gene-gene interactions
2013-01-01
Background: Recently, one of the greatest challenges in genome-wide association studies is to detect gene-gene and/or gene-environment interactions for common complex human diseases. Ritchie et al. (2001) proposed multifactor dimensionality reduction (MDR) method for interaction analysis. MDR is a combinatorial approach to reduce multi-locus genotypes into high-risk and low-risk groups. Although MDR has been widely used for case-control studies with binary phenotypes, several extensions have been proposed. One of these methods, a generalized MDR (GMDR) proposed by Lou et al. (2007), allows adjusting for covariates and applying to both dichotomous and continuous phenotypes. GMDR uses the residual score of a generalized linear model of phenotypes to assign either high-risk or low-risk group, while MDR uses the ratio of cases to controls. Methods: In this study, we propose multivariate GMDR, an extension of GMDR for multivariate phenotypes. Jointly analysing correlated multivariate phenotypes may have more power to detect susceptible genes and gene-gene interactions. We construct generalized estimating equations (GEE) with multivariate phenotypes to extend generalized linear models. Using the score vectors from GEE we discriminate high-risk from low-risk groups. We applied the multivariate GMDR method to the blood pressure data of the 7,546 subjects from the Korean Association Resource study: systolic blood pressure (SBP) and diastolic blood pressure (DBP). We compare the results of multivariate GMDR for SBP and DBP to the results from separate univariate GMDR for SBP and DBP, respectively. We also applied the multivariate GMDR method to the repeatedly measured hypertension status from 5,466 subjects and compared its result with those of univariate GMDR at each time point.
Results: Results from the univariate GMDR and multivariate GMDR in two-locus model with both blood pressures and hypertension phenotypes indicate best combinations of SNPs whose interaction has significant association with risk for high blood pressures or hypertension. Although the test balanced accuracy (BA) of multivariate analysis was not always greater than that of univariate analysis, the multivariate BAs were more stable with smaller standard deviations. Conclusions: In this study, we have developed multivariate GMDR method using GEE approach. It is useful to use multivariate GMDR with correlated multiple phenotypes of interests. PMID:24565370
Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti
2016-07-01
A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco (contact: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi). Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J.; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T.; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti
2016-01-01
Motivation: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. Results: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco Contacts: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153689
Multivariate meta-analysis using individual participant data
Riley, R. D.; Price, M. J.; Jackson, D.; Wardle, M.; Gueyffier, F.; Wang, J.; Staessen, J. A.; White, I. R.
2016-01-01
When combining results across related studies, a multivariate meta-analysis allows the joint synthesis of correlated effect estimates from multiple outcomes. Joint synthesis can improve efficiency over separate univariate syntheses, may reduce selective outcome reporting biases, and enables joint inferences across the outcomes. A common issue is that within-study correlations needed to fit the multivariate model are unknown from published reports. However, provision of individual participant data (IPD) allows them to be calculated directly. Here, we illustrate how to use IPD to estimate within-study correlations, using a joint linear regression for multiple continuous outcomes and bootstrapping methods for binary, survival and mixed outcomes. In a meta-analysis of 10 hypertension trials, we then show how these methods enable multivariate meta-analysis to address novel clinical questions about continuous, survival and binary outcomes; treatment–covariate interactions; adjusted risk/prognostic factor effects; longitudinal data; prognostic and multiparameter models; and multiple treatment comparisons. Both frequentist and Bayesian approaches are applied, with example software code provided to derive within-study correlations and to fit the models. PMID:26099484
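The idea of deriving a within-study correlation from individual participant data can be sketched with a small bootstrap. The example below is a simplified assumption-laden illustration, not the authors' code: it uses two continuous outcomes and a difference-in-means treatment effect (the abstract reserves bootstrapping for binary, survival and mixed outcomes), resamples participants within each arm, and correlates the two effect estimates across bootstrap replicates. The simulated trial, the variable names `sbp` and `dbp`, and all numeric settings are hypothetical.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def treatment_effects(rows):
    """rows: (treated, y1, y2) tuples. Returns the difference in means for y1 and y2."""
    t = [r for r in rows if r[0] == 1]
    c = [r for r in rows if r[0] == 0]
    return (mean([r[1] for r in t]) - mean([r[1] for r in c]),
            mean([r[2] for r in t]) - mean([r[2] for r in c]))


def bootstrap_within_study_corr(rows, n_boot, rng):
    """Correlation between the two effect estimates across bootstrap resamples."""
    treated = [r for r in rows if r[0] == 1]
    control = [r for r in rows if r[0] == 0]
    estimates = []
    for _ in range(n_boot):
        # Resample participants within each arm so that both arms stay non-empty.
        sample = ([rng.choice(treated) for _ in treated]
                  + [rng.choice(control) for _ in control])
        estimates.append(treatment_effects(sample))
    return pearson([e[0] for e in estimates], [e[1] for e in estimates])


# Hypothetical single trial: two correlated outcomes sharing a participant-level component.
rng = random.Random(42)
rows = []
for i in range(400):
    treated = i % 2
    shared = rng.gauss(0, 1)
    sbp = -10.0 * treated + 15.0 * shared + rng.gauss(0, 5)
    dbp = -5.0 * treated + 8.0 * shared + rng.gauss(0, 4)
    rows.append((treated, sbp, dbp))

corr = bootstrap_within_study_corr(rows, n_boot=500, rng=rng)
print(round(corr, 2))
```

Because both effect estimates come from the same participants, the bootstrap correlation tracks the correlation between the outcomes themselves; that value is what a multivariate meta-analysis model would need for each study.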
Brito Lopes, Fernando; da Silva, Marcelo Corrêa; Magnabosco, Cláudio Ulhôa; Goncalves Narciso, Marcelo; Sainz, Roberto Daniel
2016-01-01
This research evaluated a multivariate approach as an alternative tool for the purpose of selection regarding expected progeny differences (EPDs). Data were fitted using a multi-trait model and consisted of growth traits (birth weight and weights at 120, 210, 365 and 450 days of age) and carcass traits (longissimus muscle area (LMA), back-fat thickness (BF), and rump fat thickness (RF)), registered over 21 years in extensive breeding systems of Polled Nellore cattle in Brazil. Multivariate analyses were performed using standardized (zero mean and unit variance) EPDs. The k mean method revealed that the best fit of data occurred using three clusters (k = 3) (P < 0.001). Estimates of genetic correlation among growth and carcass traits and the estimates of heritability were moderate to high, suggesting that a correlated response approach is suitable for practical decision making. Estimates of correlation between selection indices and the multivariate index (LD1) were moderate to high, ranging from 0.48 to 0.97. This reveals that both types of indices give similar results and that the multivariate approach is reliable for the purpose of selection. The alternative tool seems very handy when economic weights are not available or in cases where more rapid identification of the best animals is desired. Interestingly, multivariate analysis allowed forecasting information based on the relationships among breeding values (EPDs). Also, it enabled fine discrimination, rapid data summarization after genetic evaluation, and permitted accounting for maternal ability and the genetic direct potential of the animals. In addition, we recommend the use of longissimus muscle area and subcutaneous fat thickness as selection criteria, to allow estimation of breeding values before the first mating season in order to accelerate the response to individual selection. PMID:26789008
Westman, Eric; Aguilar, Carlos; Muehlboeck, J-Sebastian; Simmons, Andrew
2013-01-01
Automated structural magnetic resonance imaging (MRI) processing pipelines are gaining popularity for Alzheimer's disease (AD) research. They generate regional volumes, cortical thickness measures and other measures, which can be used as input for multivariate analysis. It is not clear which combination of measures and normalization approach are most useful for AD classification and to predict mild cognitive impairment (MCI) conversion. The current study includes MRI scans from 699 subjects [AD, MCI and controls (CTL)] from the Alzheimer's disease Neuroimaging Initiative (ADNI). The Freesurfer pipeline was used to generate regional volume, cortical thickness, gray matter volume, surface area, mean curvature, gaussian curvature, folding index and curvature index measures. 259 variables were used for orthogonal partial least square to latent structures (OPLS) multivariate analysis. Normalisation approaches were explored and the optimal combination of measures determined. Results indicate that cortical thickness measures should not be normalized, while volumes should probably be normalized by intracranial volume (ICV). Combining regional cortical thickness measures (not normalized) with cortical and subcortical volumes (normalized with ICV) using OPLS gave a prediction accuracy of 91.5 % when distinguishing AD versus CTL. This model prospectively predicted future decline from MCI to AD with 75.9 % of converters correctly classified. Normalization strategy did not have a significant effect on the accuracies of multivariate models containing multiple MRI measures for this large dataset. The appropriate choice of input for multivariate analysis in AD and MCI is of great importance. The results support the use of un-normalised cortical thickness measures and volumes normalised by ICV.
Goh, Choon Fu; Craig, Duncan Q M; Hadgraft, Jonathan; Lane, Majella E
2017-02-01
Drug permeation through the intercellular lipids, which pack around and between corneocytes, may be enhanced by increasing the thermodynamic activity of the active in a formulation. However, this may also result in unwanted drug crystallisation on and in the skin. In this work, we explore the combination of ATR-FTIR spectroscopy and multivariate data analysis to study drug crystallisation in the skin. Ex vivo permeation studies of saturated solutions of diclofenac sodium (DF Na) in two vehicles, propylene glycol (PG) and dimethyl sulphoxide (DMSO), were carried out in porcine ear skin. Tape stripping and ATR-FTIR spectroscopy were conducted simultaneously to collect spectral data as a function of skin depth. Multivariate data analysis was applied to visualise and categorise the spectral data in the region of interest (1700-1500 cm⁻¹) containing the carboxylate (COO⁻) asymmetric stretching vibrations of DF Na. Spectral data showed the redshifts of the COO⁻ asymmetric stretching vibrations for DF Na in the solution compared with solid drug. Similar shifts were evident following application of saturated solutions of DF Na to porcine skin samples. Multivariate data analysis categorised the spectral data based on the spectral differences and drug crystallisation was found to be confined to the upper layers of the skin. This proof-of-concept study highlights the utility of ATR-FTIR spectroscopy in combination with multivariate data analysis as a simple and rapid approach in the investigation of drug deposition in the skin. The approach described here will be extended to the study of other actives for topical application to the skin. Copyright © 2016 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Hasyim, M.; Prastyo, D. D.
2018-03-01
Survival analysis models the relationship between independent variables and survival time as the dependent variable. In practice, not all survival data can be recorded completely, for various reasons; such data are called censored data. Moreover, several models for survival analysis require assumptions. One approach in survival analysis is the nonparametric one, which relaxes these assumptions. In this research, the nonparametric approach employed is Multivariate Adaptive Regression Splines (MARS). This study aims to measure the performance of private university lecturers. The survival time in this study is the time a lecturer needs to obtain their professional certificate. The results show that research activity is a significant factor, along with developing course materials, good publication in international or national journals, and activity in research collaboration.
Galas, David J; Sakhanenko, Nikita A; Skupin, Alexander; Ignac, Tomasz
2014-02-01
Context dependence is central to the description of complexity. Keying on the pairwise definition of "set complexity," we use an information theory approach to formulate general measures of systems complexity. We examine the properties of multivariable dependency starting with the concept of interaction information. We then present a new measure for unbiased detection of multivariable dependency, "differential interaction information." This quantity for two variables reduces to the pairwise "set complexity" previously proposed as a context-dependent measure of information in biological systems. We generalize it here to an arbitrary number of variables. Critical limiting properties of the "differential interaction information" are key to the generalization. This measure extends previous ideas about biological information and provides a more sophisticated basis for the study of complexity. The properties of "differential interaction information" also suggest new approaches to data analysis. Given a data set of system measurements, differential interaction information can provide a measure of collective dependence, which can be represented in hypergraphs describing complex system interaction patterns. We investigate this kind of analysis using simulated data sets. The conjoining of a generalized set complexity measure, multivariable dependency analysis, and hypergraphs is our central result. While our focus is on complex biological systems, our results are applicable to any complex system.
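The starting concept named in this abstract, interaction information, can be computed directly from joint entropies for three discrete variables. The sketch below is illustrative only: it does not implement the authors' "differential interaction information", and it assumes the sign convention I(X;Y;Z) = I(X;Y|Z) - I(X;Y), under which synergy is positive and redundancy is negative (the opposite convention also appears in the literature). The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(samples):
    """Shannon entropy in bits of the empirical distribution of hashable samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def joint_entropy(*columns):
    """Entropy of the joint empirical distribution of one or more columns."""
    return entropy(list(zip(*columns)))


def interaction_information(xs, ys, zs):
    """I(X;Y|Z) - I(X;Y), written as an alternating sum of joint entropies."""
    singles = joint_entropy(xs) + joint_entropy(ys) + joint_entropy(zs)
    pairs = joint_entropy(xs, ys) + joint_entropy(xs, zs) + joint_entropy(ys, zs)
    triple = joint_entropy(xs, ys, zs)
    return -singles + pairs - triple


# Synergy: Z is the XOR of two independent fair bits (no pair is informative alone).
xs = [0, 0, 1, 1]
ys = [0, 1, 0, 1]
zs = [x ^ y for x, y in zip(xs, ys)]
ii_xor = interaction_information(xs, ys, zs)

# Redundancy: all three variables are copies of the same fair bit.
copies = [0, 1]
ii_copy = interaction_information(copies, copies, copies)

print(ii_xor, ii_copy)
```

The XOR case is the standard example of a dependency that is invisible to any pairwise measure, which is the kind of multivariable dependence the abstract is concerned with.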
Multivariate normative comparisons using an aggregated database
Murre, Jaap M. J.; Huizenga, Hilde M.
2017-01-01
In multivariate normative comparisons, a patient’s profile of test scores is compared to those in a normative sample. Recently, it has been shown that these multivariate normative comparisons enhance the sensitivity of neuropsychological assessment. However, multivariate normative comparisons require multivariate normative data, which are often unavailable. In this paper, we show how a multivariate normative database can be constructed by combining healthy control group data from published neuropsychological studies. We show that three issues should be addressed to construct a multivariate normative database. First, the database may have a multilevel structure, with participants nested within studies. Second, not all tests are administered in every study, so many data may be missing. Third, a patient should be compared to controls of similar age, gender and educational background rather than to the entire normative sample. To address these issues, we propose a multilevel approach for multivariate normative comparisons that accounts for missing data and includes covariates for age, gender and educational background. Simulations show that this approach controls the number of false positives and has high sensitivity to detect genuine deviations from the norm. An empirical example is provided. Implications for other domains than neuropsychology are also discussed. To facilitate broader adoption of these methods, we provide code implementing the entire analysis in the open source software package R. PMID:28267796
Kia, Seyed Mostafa; Vega Pons, Sandro; Weisz, Nathan; Passerini, Andrea
2016-01-01
Brain decoding is a popular multivariate approach for hypothesis testing in neuroimaging. Linear classifiers are widely employed in the brain decoding paradigm to discriminate among experimental conditions. Then, the derived linear weights are visualized in the form of multivariate brain maps to further study spatio-temporal patterns of underlying neural activities. It is well known that the brain maps derived from weights of linear classifiers are hard to interpret because of high correlations between predictors, low signal to noise ratios, and the high dimensionality of neuroimaging data. Therefore, improving the interpretability of brain decoding approaches is of primary interest in many neuroimaging studies. Despite extensive studies of this type, at present, there is no formal definition for interpretability of multivariate brain maps. As a consequence, there is no quantitative measure for evaluating the interpretability of different brain decoding methods. In this paper, first, we present a theoretical definition of interpretability in brain decoding; we show that the interpretability of multivariate brain maps can be decomposed into their reproducibility and representativeness. Second, as an application of the proposed definition, we exemplify a heuristic for approximating the interpretability in multivariate analysis of evoked magnetoencephalography (MEG) responses. Third, we propose to combine the approximated interpretability and the generalization performance of the brain decoding into a new multi-objective criterion for model selection. Our results, for the simulated and real MEG data, show that optimizing the hyper-parameters of the regularized linear classifier based on the proposed criterion results in more informative multivariate brain maps. 
More importantly, the presented definition provides the theoretical background for quantitative evaluation of interpretability, and hence, facilitates the development of more effective brain decoding algorithms in the future.
Kia, Seyed Mostafa; Vega Pons, Sandro; Weisz, Nathan; Passerini, Andrea
2017-01-01
Brain decoding is a popular multivariate approach for hypothesis testing in neuroimaging. Linear classifiers are widely employed in the brain decoding paradigm to discriminate among experimental conditions. Then, the derived linear weights are visualized in the form of multivariate brain maps to further study spatio-temporal patterns of underlying neural activities. It is well known that the brain maps derived from weights of linear classifiers are hard to interpret because of high correlations between predictors, low signal to noise ratios, and the high dimensionality of neuroimaging data. Therefore, improving the interpretability of brain decoding approaches is of primary interest in many neuroimaging studies. Despite extensive studies of this type, at present, there is no formal definition for interpretability of multivariate brain maps. As a consequence, there is no quantitative measure for evaluating the interpretability of different brain decoding methods. In this paper, first, we present a theoretical definition of interpretability in brain decoding; we show that the interpretability of multivariate brain maps can be decomposed into their reproducibility and representativeness. Second, as an application of the proposed definition, we exemplify a heuristic for approximating the interpretability in multivariate analysis of evoked magnetoencephalography (MEG) responses. Third, we propose to combine the approximated interpretability and the generalization performance of the brain decoding into a new multi-objective criterion for model selection. Our results, for the simulated and real MEG data, show that optimizing the hyper-parameters of the regularized linear classifier based on the proposed criterion results in more informative multivariate brain maps. 
More importantly, the presented definition provides the theoretical background for quantitative evaluation of interpretability, and hence, facilitates the development of more effective brain decoding algorithms in the future. PMID:28167896
Casarrubea, M; Magnusson, M S; Roy, V; Arabo, A; Sorbera, F; Santangelo, A; Faulisi, F; Crescimanno, G
2014-08-30
The aim of this article is to illustrate the application of a multivariate approach known as t-pattern analysis in the study of rat behavior in the elevated plus maze. By means of this multivariate approach, significant relationships among behavioral events in the course of time can be described. Both quantitative and t-pattern analyses were utilized to analyze data obtained from fifteen male Wistar rats following a trial 1-trial 2 protocol. In trial 2, in comparison with the initial exposure, mean occurrences of behavioral elements performed in protected zones of the maze showed a significant increase counterbalanced by a significant decrease of mean occurrences of behavioral elements in unprotected zones. Multivariate t-pattern analysis, in trial 1, revealed the presence of 134 t-patterns of different composition. In trial 2, the temporal structure of behavior became simpler, with only 32 different t-patterns present. Behavioral strings and stripes (i.e. graphical representation of each t-pattern onset) of all t-patterns were presented both for trial 1 and trial 2 as well. Finally, percent distributions in the three zones of the maze show a clear-cut increase of t-patterns in the closed arm and a significant reduction in the remaining zones. Results show that previous experience deeply modifies the temporal structure of rat behavior in the elevated plus maze. In addition, this article, by highlighting several conceptual, methodological and illustrative aspects of the utilization of t-pattern analysis, could represent a useful background to employ such a refined approach in the study of rat behavior in the elevated plus maze. Copyright © 2014 Elsevier B.V. All rights reserved.
Fontes, Cristiano Hora; Budman, Hector
2017-11-01
A clustering problem involving multivariate time series (MTS) requires the selection of similarity metrics. This paper shows the limitations of the PCA similarity factor (SPCA) as a single metric in nonlinear problems where there are differences in magnitude of the same process variables due to expected changes in operating conditions. A novel method for clustering MTS based on a combination of SPCA and the average-based Euclidean distance (AED) within a fuzzy clustering approach is proposed. Case studies involving either simulated or real industrial data collected from a large-scale gas turbine are used to illustrate that the hybrid approach enhances the ability to recognize normal and fault operating patterns. This paper also proposes an oversampling procedure to create synthetic multivariate time series that can be useful in commonly occurring situations involving unbalanced data sets. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
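The limitation described in this abstract can be illustrated with a minimal sketch. It assumes the usual Krzanowski form of the PCA similarity factor, trace(LᵀMMᵀL)/k, and reads "average-based Euclidean distance" as the distance between per-variable time averages; that reading, the function names and the synthetic data are assumptions, not the authors' implementation.

```python
import numpy as np

def pca_loadings(X, k):
    """Top-k principal directions (as columns) of a (time x variables) array."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def spca(X1, X2, k):
    """PCA similarity factor: 1 for identical subspaces, 0 for orthogonal ones."""
    L, M = pca_loadings(X1, k), pca_loadings(X2, k)
    return np.trace(L.T @ M @ M.T @ L) / k

def aed(X1, X2):
    """Assumed AED: Euclidean distance between per-variable time averages."""
    return np.linalg.norm(X1.mean(axis=0) - X2.mean(axis=0))

# Synthetic example: the second series is the first shifted in magnitude.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(200, 4))
X2 = X1 + np.array([5.0, 0.0, 0.0, 0.0])
similarity = spca(X1, X2, k=2)   # close to 1: SPCA does not see the shift
distance = aed(X1, X2)           # close to 5: the mean-based distance does
```

Because the loadings are computed on mean-centred data, a pure shift in operating level leaves SPCA unchanged, which is why a second, magnitude-aware metric is useful alongside it.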
Predictive and mechanistic multivariate linear regression models for reaction development
Santiago, Celine B.; Guo, Jing-Yao
2018-01-01
Multivariate Linear Regression (MLR) models utilizing computationally-derived and empirically-derived physical organic molecular descriptors are described in this review. Several reports demonstrating the effectiveness of this methodological approach towards reaction optimization and mechanistic interrogation are discussed. A detailed protocol to access quantitative and predictive MLR models is provided as a guide for model development and parameter analysis. PMID:29719711
Avalappampatty Sivasamy, Aneetha; Sundan, Bose
2015-01-01
The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T² method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T² statistical model, and the necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using the KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better. PMID:26357668
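The T-square distance used in this abstract can be sketched as follows. This is a minimal illustration on synthetic data: the empirical 99% quantile used as a threshold is an assumed stand-in (the abstract derives its threshold range from the central limit theorem), and the function names are hypothetical.

```python
import numpy as np

def fit_profile(X_normal):
    """Mean vector and inverse covariance of the 'normal traffic' profile."""
    mean = X_normal.mean(axis=0)
    cov = np.cov(X_normal, rowvar=False)
    return mean, np.linalg.pinv(cov)

def t2_distance(X, mean, cov_inv):
    """Hotelling-type T-square distance (x - mean)^T S^-1 (x - mean) per row."""
    d = X - mean
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

# Synthetic stand-in for preprocessed traffic features.
rng = np.random.default_rng(0)
normal = rng.normal(size=(500, 3))
mean, cov_inv = fit_profile(normal)

# Assumption: threshold taken as the empirical 99% quantile of training distances.
threshold = np.quantile(t2_distance(normal, mean, cov_inv), 0.99)

probe = np.array([[0.0, 0.0, 0.0], [8.0, 8.0, 8.0]])
flags = t2_distance(probe, mean, cov_inv) > threshold   # True marks a suspected attack
```

Observations far from the normal profile in the covariance-scaled sense receive a large distance and are flagged.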
Sivasamy, Aneetha Avalappampatty; Sundan, Bose
2015-01-01
The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T² method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T² statistical model, and the necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using the KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better.
Valverde-Som, Lucia; Ruiz-Samblás, Cristina; Rodríguez-García, Francisco P; Cuadros-Rodríguez, Luis
2018-02-09
The organoleptic quality of virgin olive oil depends on positive and negative sensory attributes. These attributes are related to volatile organic compounds and phenolic compounds that represent the aroma and taste (flavour) of the virgin olive oil. The flavour is the characteristic that can be measured by a taster panel. However, as for any analytical measuring device, the tasters, individually, and the panel, as a whole, should be harmonized and validated, and proper olive oil standards are needed. In the present study, multivariate approaches are put into practice, in addition to the rules for building a multivariate control chart from chromatographic volatile fingerprinting and chemometrics. Fingerprinting techniques provide analytical information without identifying and quantifying the analytes. This methodology is used to monitor the stability of sensory reference materials. Similarity indices have been calculated to build multivariate control charts for two olive oil certified reference materials, which have been used as examples to monitor their stability. This methodology with chromatographic data could be applied in parallel with the 'panel test' sensory method to reduce the work of sensory analysis. © 2018 Society of Chemical Industry.
NONPARAMETRIC MANOVA APPROACHES FOR NON-NORMAL MULTIVARIATE OUTCOMES WITH MISSING VALUES
He, Fanyin; Mazumdar, Sati; Tang, Gong; Bhatia, Triptish; Anderson, Stewart J.; Dew, Mary Amanda; Krafty, Robert; Nimgaonkar, Vishwajit; Deshpande, Smita; Hall, Martica; Reynolds, Charles F.
2017-01-01
Between-group comparisons often entail many correlated response variables. The multivariate linear model, with its assumption of multivariate normality, is the accepted standard tool for these tests. When this assumption is violated, the nonparametric multivariate Kruskal-Wallis (MKW) test is frequently used. However, this test requires complete cases with no missing values in response variables. Deletion of cases with missing values likely leads to inefficient statistical inference. Here we extend the MKW test to retain information from partially-observed cases. Results of simulated studies and analysis of real data show that the proposed method provides adequate coverage and superior power to complete-case analyses. PMID:29416225
Ji, Hong; Petro, Nathan M; Chen, Badong; Yuan, Zejian; Wang, Jianji; Zheng, Nanning; Keil, Andreas
2018-02-06
Over the past decade, the simultaneous recording of electroencephalogram (EEG) and functional magnetic resonance imaging (fMRI) data has garnered growing interest because it may provide an avenue towards combining the strengths of both imaging modalities. Given their pronounced differences in temporal and spatial statistics, the combination of EEG and fMRI data is, however, methodologically challenging. Here, we propose a novel screening approach that relies on a Cross Multivariate Correlation Coefficient (xMCC) framework. This approach accomplishes three tasks: (1) it provides a measure for testing multivariate correlation and multivariate uncorrelation of the two modalities; (2) it provides a criterion for the selection of EEG features; (3) it performs a screening of relevant EEG information by grouping the EEG channels into clusters to improve efficiency and to reduce computational load when searching for the best predictors of the BOLD signal. The present report applies this approach to a data set with concurrent recordings of steady-state visual evoked potentials (ssVEPs) and fMRI, recorded while observers viewed phase-reversing Gabor patches. We test the hypothesis that fluctuations in visuo-cortical mass potentials systematically covary with BOLD fluctuations not only in visual cortical, but also in anterior temporal and prefrontal areas. Results supported the hypothesis and showed that the xMCC-based analysis provides straightforward identification of neurophysiologically plausible brain regions with EEG-fMRI covariance. Furthermore, xMCC converged with other extant methods for EEG-fMRI analysis. © 2018 The Authors Journal of Neuroscience Research Published by Wiley Periodicals, Inc.
Kumar, Keshav
2017-11-01
Multivariate curve resolution alternating least squares (MCR-ALS) analysis is the most commonly used curve resolution technique. The MCR-ALS model is fitted using the alternating least squares (ALS) algorithm, which needs initialisation of either the contribution profiles or the spectral profiles of each factor. The contribution profiles can be initialised using evolving factor analysis; however, in principle, this approach requires that the data belong to a sequential process. The initialisation of the spectral profiles is usually carried out using a pure variable approach such as the SIMPLISMA algorithm; this approach demands that each factor have pure variables in the data sets. Despite these limitations, the existing approaches have been quite successful for initiating the MCR-ALS analysis. However, the present work proposes an alternate approach for the initialisation of the spectral variables by generating random variables within the limits spanned by the maxima and minima of each spectral variable of the data set. The proposed approach does not require that there be pure variables for each component of the multicomponent system or that the concentration direction follow a sequential process. The proposed approach is successfully validated using excitation-emission matrix fluorescence data sets acquired for certain fluorophores with significant spectral overlap. The calculated contribution and spectral profiles of these fluorophores are found to correlate well with the experimental results. In summary, the present work proposes an alternate way to initiate the MCR-ALS analysis.
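The random initialisation described in this abstract can be sketched as below. This is a minimal illustration, not the author's code: the non-negativity clipping, the fixed iteration count, the function names and the synthetic two-component data are all assumptions.

```python
import numpy as np

def random_spectral_init(D, n_components, rng):
    """Draw each spectral variable uniformly between its min and max in D."""
    lo, hi = D.min(axis=0), D.max(axis=0)
    return rng.uniform(lo, hi, size=(n_components, D.shape[1]))

def mcr_als(D, n_components, n_iter=200, seed=0):
    """Alternate least-squares updates of C and S for the model D ~ C @ S."""
    rng = np.random.default_rng(seed)
    S = random_spectral_init(D, n_components, rng)
    for _ in range(n_iter):
        # Solve for contributions given spectra, then spectra given contributions.
        # Assumption: non-negativity is imposed by simple clipping.
        C = np.clip(np.linalg.lstsq(S.T, D.T, rcond=None)[0].T, 0, None)
        S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0], 0, None)
    return C, S

# Illustrative synthetic data: two overlapping Gaussian "spectra".
x = np.linspace(0, 1, 50)
S_true = np.vstack([np.exp(-((x - 0.3) / 0.1) ** 2),
                    np.exp(-((x - 0.6) / 0.1) ** 2)])
C_true = np.random.default_rng(1).uniform(0, 1, size=(20, 2))
D = C_true @ S_true
C_est, S_est = mcr_als(D, n_components=2)
```

The initial guess needs neither pure variables nor a sequential process, only the per-variable range of the data, which is the point the abstract makes.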
A time-series approach to dynamical systems from classical and quantum worlds
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fossion, Ruben
2014-01-08
This contribution discusses some recent applications of time-series analysis in Random Matrix Theory (RMT), and applications of RMT in the statistical analysis of eigenspectra of correlation matrices of multivariate time series.
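As a minimal sketch of what analysing the eigenspectrum of a correlation matrix can look like, the block below compares the eigenvalues for uncorrelated synthetic series with the Marchenko-Pastur edges (1 ± sqrt(N/T))², the standard asymptotic RMT reference for pure noise. The choice of this reference, the function names and the data are assumptions for illustration, not taken from the contribution itself.

```python
import numpy as np

def correlation_eigenvalues(X):
    """Eigenvalues of the correlation matrix of a (T x N) multivariate series."""
    return np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))

def marchenko_pastur_bounds(T, N):
    """Asymptotic eigenvalue edges for N uncorrelated series of length T (T > N)."""
    q = N / T
    return (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2

rng = np.random.default_rng(0)
T, N = 1000, 50
eig = correlation_eigenvalues(rng.normal(size=(T, N)))
lo, hi = marchenko_pastur_bounds(T, N)
# Eigenvalues well outside [lo, hi] in real data would hint at genuine correlations.
```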
An Extension of Dominance Analysis to Canonical Correlation Analysis
ERIC Educational Resources Information Center
Huo, Yan; Budescu, David V.
2009-01-01
Dominance analysis (Budescu, 1993) offers a general framework for determination of relative importance of predictors in univariate and multivariate multiple regression models. This approach relies on pairwise comparisons of the contribution of predictors in all relevant subset models. In this article we extend dominance analysis to canonical…
Yang, Jun-Ho; Yoh, Jack J
2018-01-01
A novel technique is reported for separating overlapping latent fingerprints using chemometric approaches that combine laser-induced breakdown spectroscopy (LIBS) and multivariate analysis. The LIBS technique provides the capability of real-time analysis and high-frequency scanning as well as data regarding the chemical composition of overlapping latent fingerprints. These spectra offer valuable information for the classification and reconstruction of overlapping latent fingerprints by implementing appropriate statistical multivariate analysis. The current study employs principal component analysis and partial least squares methods for the classification of latent fingerprints from the LIBS spectra. This technique was successfully demonstrated through a classification study of four distinct latent fingerprints using classification methods such as soft independent modeling of class analogy (SIMCA) and partial least squares discriminant analysis (PLS-DA). The novel method yielded an accuracy of more than 85% and was proven to be sufficiently robust. Furthermore, through laser scanning analysis at a spatial interval of 125 µm, the overlapping fingerprints were reconstructed as separate two-dimensional forms.
NASA Astrophysics Data System (ADS)
Jia, Xiaoliang; An, Haizhong; Sun, Xiaoqi; Huang, Xuan; Gao, Xiangyun
2016-04-01
The globalization and regionalization of crude oil trade inevitably give rise to differences in crude oil prices. Understanding the pattern of the mutual propagation of crude oil prices is essential for analyzing the development of global oil trade. Previous research has focused mainly on the fuzzy long- or short-term one-to-one propagation of bivariate oil prices, generally ignoring various patterns of periodical multivariate propagation. This study presents a wavelet-based network approach to help uncover the multipath propagation of multivariable crude oil prices in a joint time-frequency period. The weekly oil spot prices of the OPEC member states from June 1999 to March 2011 are adopted as the sample data. First, we used wavelet analysis to find different subseries based on an optimal decomposing scale to describe the periodical feature of the original oil price time series. Second, a complex network model was constructed based on an optimal threshold selection to describe the structural feature of multivariable oil prices. Third, Bayesian network analysis (BNA) was conducted to find the probability causal relationship based on periodical structural features to describe the various patterns of periodical multivariable propagation. Finally, the significance of the leading and intermediary oil prices is discussed. These findings are beneficial for the implementation of periodical target-oriented pricing policies and investment strategies.
Multivariate pattern analysis of fMRI: the early beginnings.
Haxby, James V
2012-08-15
In 2001, we published a paper on the representation of faces and objects in ventral temporal cortex that introduced a new method for fMRI analysis, which subsequently came to be called multivariate pattern analysis (MVPA). MVPA now refers to a diverse set of methods that analyze neural responses as patterns of activity that reflect the varying brain states that a cortical field or system can produce. This paper recounts the circumstances and events that led to the original study and later developments and innovations that have greatly expanded this approach to fMRI data analysis, leading to its widespread application. Copyright © 2012 Elsevier Inc. All rights reserved.
FREQ: A computational package for multivariable system loop-shaping procedures
NASA Technical Reports Server (NTRS)
Giesy, Daniel P.; Armstrong, Ernest S.
1989-01-01
Many approaches in the field of linear, multivariable time-invariant systems analysis and controller synthesis employ loop-shaping procedures wherein design parameters are chosen to shape frequency-response singular value plots of selected transfer matrices. A software package, FREQ, is documented for computing within one unified framework many of the most used multivariable transfer matrices for both continuous and discrete systems. The matrices are evaluated at user-selected frequency-response values, and singular values are plotted against frequency. Example computations are presented to demonstrate the use of the FREQ code.
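The quantity being shaped can be sketched in a few lines. The block below is not the FREQ code: it is an assumed illustration that evaluates a continuous-time state-space transfer matrix G(jω) = C(jωI − A)⁻¹B + D at chosen frequencies and returns its singular values; the function name and the first-order test system are hypothetical.

```python
import numpy as np

def sigma_plot_data(A, B, C, D, omegas):
    """Singular values of G(jw) = C (jwI - A)^-1 B + D, one row per frequency."""
    n = A.shape[0]
    rows = []
    for w in omegas:
        # Solve (jwI - A) X = B instead of forming the inverse explicitly.
        G = C @ np.linalg.solve(1j * w * np.eye(n) - A, B) + D
        rows.append(np.linalg.svd(G, compute_uv=False))
    return np.array(rows)

# First-order lag G(s) = 1 / (s + 1), used as a check on the computation.
A = np.array([[-1.0]]); B = np.array([[1.0]])
C = np.array([[1.0]]);  D = np.array([[0.0]])
sv = sigma_plot_data(A, B, C, D, omegas=[0.0, 1.0])
# sv[0, 0] is 1 at w = 0 and sv[1, 0] is 1/sqrt(2) at w = 1 rad/s.
```

For a discrete-time system the same idea would apply with jω replaced by a point on the unit circle.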
Linn, Kristin A; Gaonkar, Bilwaj; Satterthwaite, Theodore D; Doshi, Jimit; Davatzikos, Christos; Shinohara, Russell T
2016-05-15
Normalization of feature vector values is a common practice in machine learning. Generally, each feature value is standardized to the unit hypercube or by normalizing to zero mean and unit variance. Classification decisions based on support vector machines (SVMs) or by other methods are sensitive to the specific normalization used on the features. In the context of multivariate pattern analysis using neuroimaging data, standardization effectively up- and down-weights features based on their individual variability. Since the standard approach uses the entire data set to guide the normalization, it utilizes the total variability of these features. This total variation is inevitably dependent on the amount of marginal separation between groups. Thus, such a normalization may attenuate the separability of the data in high dimensional space. In this work we propose an alternate approach that uses an estimate of the control-group standard deviation to normalize features before training. We study our proposed approach in the context of group classification using structural MRI data. We show that control-based normalization leads to better reproducibility of estimated multivariate disease patterns and improves the classifier performance in many cases. Copyright © 2016 Elsevier Inc. All rights reserved.
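The control-based normalization proposed in this abstract can be sketched as follows. The abstract only states that the control-group standard deviation is used for scaling; centring on the control-group mean, the `eps` guard and the function name are assumptions made for this illustration.

```python
import numpy as np

def control_normalize(X, is_control, eps=1e-12):
    """Scale every feature by the standard deviation of the control group only."""
    ctrl = X[is_control]
    mu = ctrl.mean(axis=0)            # assumption: centre on the control mean
    sd = ctrl.std(axis=0, ddof=1)     # control-group standard deviation
    return (X - mu) / np.maximum(sd, eps)

# Synthetic example: patients differ from controls in the first feature only.
rng = np.random.default_rng(0)
controls = rng.normal(size=(40, 3))
patients = rng.normal(size=(40, 3)) + np.array([3.0, 0.0, 0.0])
X = np.vstack([controls, patients])
is_control = np.arange(80) < 40
Z = control_normalize(X, is_control)
```

Standardizing with the pooled data would inflate the first feature's variance by the group separation and so down-weight it; scaling by the control group alone avoids this.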
Use of Multivariate Linkage Analysis for Dissection of a Complex Cognitive Trait
Marlow, Angela J.; Fisher, Simon E.; Francks, Clyde; MacPhie, I. Laurence; Cherny, Stacey S.; Richardson, Alex J.; Talcott, Joel B.; Stein, John F.; Monaco, Anthony P.; Cardon, Lon R.
2003-01-01
Replication of linkage results for complex traits has been exceedingly difficult, owing in part to the inability to measure the precise underlying phenotype, small sample sizes, genetic heterogeneity, and statistical methods employed in analysis. Often, in any particular study, multiple correlated traits have been collected, yet these have been analyzed independently or, at most, in bivariate analyses. Theoretical arguments suggest that full multivariate analysis of all available traits should offer more power to detect linkage; however, this has not yet been evaluated on a genomewide scale. Here, we conduct multivariate genomewide analyses of quantitative-trait loci that influence reading- and language-related measures in families affected with developmental dyslexia. The results of these analyses are substantially clearer than those of previous univariate analyses of the same data set, helping to resolve a number of key issues. These outcomes highlight the relevance of multivariate analysis for complex disorders for dissection of linkage results in correlated traits. The approach employed here may aid positional cloning of susceptibility genes in a wide spectrum of complex traits. PMID:12587094
Multivariate meta-analysis using individual participant data.
Riley, R D; Price, M J; Jackson, D; Wardle, M; Gueyffier, F; Wang, J; Staessen, J A; White, I R
2015-06-01
When combining results across related studies, a multivariate meta-analysis allows the joint synthesis of correlated effect estimates from multiple outcomes. Joint synthesis can improve efficiency over separate univariate syntheses, may reduce selective outcome reporting biases, and enables joint inferences across the outcomes. A common issue is that within-study correlations needed to fit the multivariate model are unknown from published reports. However, provision of individual participant data (IPD) allows them to be calculated directly. Here, we illustrate how to use IPD to estimate within-study correlations, using a joint linear regression for multiple continuous outcomes and bootstrapping methods for binary, survival and mixed outcomes. In a meta-analysis of 10 hypertension trials, we then show how these methods enable multivariate meta-analysis to address novel clinical questions about continuous, survival and binary outcomes; treatment-covariate interactions; adjusted risk/prognostic factor effects; longitudinal data; prognostic and multiparameter models; and multiple treatment comparisons. Both frequentist and Bayesian approaches are applied, with example software code provided to derive within-study correlations and to fit the models. © 2014 The Authors. Research Synthesis Methods published by John Wiley & Sons, Ltd.
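The bootstrap route to a within-study correlation mentioned in this abstract can be sketched as below. It is a minimal illustration, not the authors' provided software: the difference in arm means is used as the effect estimate for both outcomes, and all names and data are assumed.

```python
import numpy as np

def effect(y, treat):
    """Treatment effect as the difference in arm means (assumed effect measure)."""
    return y[treat == 1].mean() - y[treat == 0].mean()

def bootstrap_within_study_corr(y1, y2, treat, n_boot=1000, seed=0):
    """Correlation between two effect estimates across participant resamples."""
    rng = np.random.default_rng(seed)
    n = len(treat)
    est = np.empty((n_boot, 2))
    for b in range(n_boot):
        # Resample whole participants so both outcomes stay paired.
        idx = rng.integers(0, n, size=n)
        est[b] = effect(y1[idx], treat[idx]), effect(y2[idx], treat[idx])
    return np.corrcoef(est, rowvar=False)[0, 1]

# Synthetic IPD for one trial: two positively related continuous outcomes.
rng = np.random.default_rng(1)
treat = np.tile([0, 1], 100)
y1 = rng.normal(size=200) + 0.5 * treat
y2 = 0.8 * y1 + rng.normal(scale=0.6, size=200)
rho = bootstrap_within_study_corr(y1, y2, treat)
```

The sketch assumes every resample contains participants from both arms, which holds with overwhelming probability for balanced trials of this size.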
Large-scale Granger causality analysis on resting-state functional MRI
NASA Astrophysics Data System (ADS)
D'Souza, Adora M.; Abidin, Anas Zainul; Leistritz, Lutz; Wismüller, Axel
2016-03-01
We demonstrate an approach to measure the information flow between each pair of time series in resting-state functional MRI (fMRI) data of the human brain and subsequently recover its underlying network structure. By integrating dimensionality reduction into predictive time series modeling, large-scale Granger Causality (lsGC) analysis method can reveal directed information flow suggestive of causal influence at an individual voxel level, unlike other multivariate approaches. This method quantifies the influence each voxel time series has on every other voxel time series in a multivariate sense and hence contains information about the underlying dynamics of the whole system, which can be used to reveal functionally connected networks within the brain. To identify such networks, we perform non-metric network clustering, such as accomplished by the Louvain method. We demonstrate the effectiveness of our approach to recover the motor and visual cortex from resting state human brain fMRI data and compare it with the network recovered from a visuomotor stimulation experiment, where the similarity is measured by the Dice Coefficient (DC). The best DC obtained was 0.59 implying a strong agreement between the two networks. In addition, we thoroughly study the effect of dimensionality reduction in lsGC analysis on network recovery. We conclude that our approach is capable of detecting causal influence between time series in a multivariate sense, which can be used to segment functionally connected networks in the resting-state fMRI.
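The Dice Coefficient used above to compare two recovered networks is twice the overlap divided by the total size of both sets. A minimal sketch on binary voxel masks follows; the function name and the convention for two empty masks are assumptions.

```python
import numpy as np

def dice(a, b):
    """Dice coefficient 2|A and B| / (|A| + |B|) for two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    # Assumed convention: two empty masks count as identical.
    return 2.0 * np.logical_and(a, b).sum() / total if total else 1.0

score = dice([1, 1, 0, 0], [1, 0, 1, 0])   # one shared voxel out of 2 + 2 -> 0.5
```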
Multiple imputation for handling missing outcome data when estimating the relative risk.
Sullivan, Thomas R; Lee, Katherine J; Ryan, Philip; Salter, Amy B
2017-09-06
Multiple imputation is a popular approach to handling missing data in medical research, yet little is known about its applicability for estimating the relative risk. Standard methods for imputing incomplete binary outcomes involve logistic regression or an assumption of multivariate normality, whereas relative risks are typically estimated using log binomial models. It is unclear whether misspecification of the imputation model in this setting could lead to biased parameter estimates. Using simulated data, we evaluated the performance of multiple imputation for handling missing data prior to estimating adjusted relative risks from a correctly specified multivariable log binomial model. We considered an arbitrary pattern of missing data in both outcome and exposure variables, with missing data induced under missing at random mechanisms. Focusing on standard model-based methods of multiple imputation, missing data were imputed using multivariate normal imputation or fully conditional specification with a logistic imputation model for the outcome. Multivariate normal imputation performed poorly in the simulation study, consistently producing estimates of the relative risk that were biased towards the null. Despite outperforming multivariate normal imputation, fully conditional specification also produced somewhat biased estimates, with greater bias observed for higher outcome prevalences and larger relative risks. Deleting imputed outcomes from analysis datasets did not improve the performance of fully conditional specification. Both multivariate normal imputation and fully conditional specification produced biased estimates of the relative risk, presumably since both use a misspecified imputation model. Based on simulation results, we recommend researchers use fully conditional specification rather than multivariate normal imputation and retain imputed outcomes in the analysis when estimating relative risks. 
However, fully conditional specification is not without its shortcomings, and so further research is needed to identify optimal approaches for relative risk estimation within the multiple imputation framework.
Simoneau, Gabrielle; Levis, Brooke; Cuijpers, Pim; Ioannidis, John P A; Patten, Scott B; Shrier, Ian; Bombardier, Charles H; de Lima Osório, Flavia; Fann, Jesse R; Gjerdingen, Dwenda; Lamers, Femke; Lotrakul, Manote; Löwe, Bernd; Shaaban, Juwita; Stafford, Lesley; van Weert, Henk C P M; Whooley, Mary A; Wittkampf, Karin A; Yeung, Albert S; Thombs, Brett D; Benedetti, Andrea
2017-11-01
Individual patient data (IPD) meta-analyses are increasingly common in the literature. In the context of estimating the diagnostic accuracy of ordinal or semi-continuous scale tests, sensitivity and specificity are often reported for a given threshold or a small set of thresholds, and a meta-analysis is conducted via a bivariate approach to account for their correlation. When IPD are available, sensitivity and specificity can be pooled for every possible threshold. Our objective was to compare the bivariate approach, which can be applied separately at every threshold, to two multivariate methods: the ordinal multivariate random-effects model and the Poisson correlated gamma-frailty model. Our comparison was empirical, using IPD from 13 studies that evaluated the diagnostic accuracy of the 9-item Patient Health Questionnaire depression screening tool, and included simulations. The empirical comparison showed that the implementation of the two multivariate methods is more laborious in terms of computational time and sensitivity to user-supplied values compared to the bivariate approach. Simulations showed that ignoring the within-study correlation of sensitivity and specificity across thresholds did not worsen inferences with the bivariate approach compared to the Poisson model. The ordinal approach was not suitable for simulations because the model was highly sensitive to user-supplied starting values. We tentatively recommend the bivariate approach rather than more complex multivariate methods for IPD diagnostic accuracy meta-analyses of ordinal scale tests, although the limited type of diagnostic data considered in the simulation study restricts the generalization of our findings. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
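With IPD in hand, sensitivity and specificity at every threshold can be computed directly, which is the input both the bivariate and the multivariate models need. The sketch below assumes a test is positive when the score is at or above the threshold; the function name and the four-participant data are illustrative only.

```python
import numpy as np

def sens_spec_by_threshold(scores, diseased, thresholds):
    """Return (threshold, sensitivity, specificity) for each cut-off."""
    scores = np.asarray(scores)
    diseased = np.asarray(diseased, dtype=bool)
    rows = []
    for t in thresholds:
        positive = scores >= t                      # assumed positivity rule
        sens = (positive & diseased).sum() / diseased.sum()
        spec = (~positive & ~diseased).sum() / (~diseased).sum()
        rows.append((t, float(sens), float(spec)))
    return rows

# Tiny illustrative study: questionnaire scores and reference-standard status.
table = sens_spec_by_threshold(scores=[2, 12, 15, 5],
                               diseased=[0, 1, 1, 0],
                               thresholds=[10, 13])
```

Here the cut-off of 10 separates the groups perfectly, while raising it to 13 misses one of the two cases and halves the sensitivity.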
Calypso: a user-friendly web-server for mining and visualizing microbiome-environment interactions.
Zakrzewski, Martha; Proietti, Carla; Ellis, Jonathan J; Hasan, Shihab; Brion, Marie-Jo; Berger, Bernard; Krause, Lutz
2017-03-01
Calypso is an easy-to-use online software suite that allows non-expert users to mine, interpret and compare taxonomic information from metagenomic or 16S rDNA datasets. Calypso has a focus on multivariate statistical approaches that can identify complex environment-microbiome associations. The software enables quantitative visualizations, statistical testing, multivariate analysis, supervised learning, factor analysis, multivariable regression, network analysis and diversity estimates. Comprehensive help pages, tutorials and videos are provided via a wiki page. The web-interface is accessible via http://cgenome.net/calypso/. The software is programmed in Java, Perl and R and the source code is available from Zenodo (https://zenodo.org/record/50931). The software is freely available for non-commercial users. Contact: l.krause@uq.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Hierarchical multivariate covariance analysis of metabolic connectivity.
Carbonell, Felix; Charil, Arnaud; Zijdenbos, Alex P; Evans, Alan C; Bedell, Barry J
2014-12-01
Conventional brain connectivity analysis is typically based on the assessment of interregional correlations. Given that correlation coefficients are derived from both covariance and variance, group differences in covariance may be obscured by differences in the variance terms. To facilitate a comprehensive assessment of connectivity, we propose a unified statistical framework that interrogates the individual terms of the correlation coefficient. We have evaluated the utility of this method for metabolic connectivity analysis using [18F]2-fluoro-2-deoxyglucose (FDG) positron emission tomography (PET) data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. As an illustrative example of the utility of this approach, we examined metabolic connectivity in angular gyrus and precuneus seed regions of mild cognitive impairment (MCI) subjects with low and high β-amyloid burdens. This new multivariate method allowed us to identify alterations in the metabolic connectome, which would not have been detected using classic seed-based correlation analysis. Ultimately, this novel approach should be extensible to brain network analysis and broadly applicable to other imaging modalities, such as functional magnetic resonance imaging (MRI).
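The point that a correlation mixes covariance and variance terms follows from r = cov(x, y) / sqrt(var(x) var(y)). The minimal sketch below separates the terms for a pair of regional signals; it only illustrates the decomposition and is not the authors' hierarchical framework, and the names and data are assumed.

```python
import numpy as np

def correlation_terms(x, y):
    """Covariance, both variances, and the correlation they combine into."""
    c = np.cov(x, y)                       # 2 x 2 sample covariance matrix
    cov_xy, var_x, var_y = c[0, 1], c[0, 0], c[1, 1]
    return cov_xy, var_x, var_y, cov_xy / np.sqrt(var_x * var_y)

rng = np.random.default_rng(0)
seed_region = rng.normal(size=100)
target = 0.5 * seed_region + rng.normal(size=100)
cov_xy, var_x, var_y, r = correlation_terms(seed_region, target)

# Adding independent noise to the target raises var_y and lowers r even though
# the underlying coupling (the 0.5 slope) is unchanged.
noisy_target = target + rng.normal(scale=2.0, size=100)
_, _, var_y_noisy, r_noisy = correlation_terms(seed_region, noisy_target)
```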
Application of multivariate statistical techniques in microbial ecology
Paliy, O.; Shankar, V.
2016-01-01
Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological datasets. An especially noticeable effect has been seen in the field of microbial ecology, where new experimental approaches have provided in-depth assessments of the composition, functions, and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces large amounts of data, powerful statistical techniques of multivariate analysis are well suited to analyze and interpret these datasets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular dataset. In this review we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive, and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and dataset structure. PMID:26786791
NASA Astrophysics Data System (ADS)
Valder, J.; Kenner, S.; Long, A.
2008-12-01
Portions of the Cheyenne River are characterized as impaired by the U.S. Environmental Protection Agency because of water-quality exceedances. The Cheyenne River watershed includes the Black Hills National Forest and part of the Badlands National Park. Preliminary analysis indicates that the Badlands National Park is a major contributor to the exceedances of the water-quality constituents for total dissolved solids and total suspended solids. Water-quality data have been collected continuously since 2007, and in the second year of collection (2008), monthly grab and passive sediment samplers are being used to collect total suspended sediment and total dissolved solids in both base-flow and runoff-event conditions. In addition, sediment samples from the river channel, including bed, bank, and floodplain, have been collected. These samples are being analyzed at the South Dakota School of Mines and Technology's X-Ray Diffraction Lab to quantify the mineralogy of the sediments. A multivariate statistical approach (including principal components, least squares, and maximum likelihood techniques) is applied to the mineral percentages that were characterized for each site to identify the contributing source areas that are causing exceedances of sediment transport in the Cheyenne River watershed. Results of the multivariate analysis demonstrate the likely sources of solids found in the Cheyenne River samples. A further refinement of the methods is in progress that utilizes a conceptual model which, when applied with the multivariate statistical approach, provides a better estimate for sediment sources.
Multivariate Phylogenetic Comparative Methods: Evaluations, Comparisons, and Recommendations.
Adams, Dean C; Collyer, Michael L
2018-01-01
Recent years have seen increased interest in phylogenetic comparative analyses of multivariate data sets, but to date the varied proposed approaches have not been extensively examined. Here we review the mathematical properties required of any multivariate method, and specifically evaluate existing multivariate phylogenetic comparative methods in this context. Phylogenetic comparative methods based on the full multivariate likelihood are robust to levels of covariation among trait dimensions and are insensitive to the orientation of the data set, but display increasing model misspecification as the number of trait dimensions increases. This is because the expected evolutionary covariance matrix (V) used in the likelihood calculations becomes more ill-conditioned as trait dimensionality increases, and as evolutionary models become more complex. Thus, these approaches are only appropriate for data sets with few traits and many species. Methods that summarize patterns across trait dimensions treated separately (e.g., SURFACE) incorrectly assume independence among trait dimensions, resulting in nearly a 100% model misspecification rate. Methods using pairwise composite likelihood are highly sensitive to levels of trait covariation, the orientation of the data set, and the number of trait dimensions. The consequences of these debilitating deficiencies are that a user can arrive at differing statistical conclusions, and therefore biological inferences, simply from a dataspace rotation, like principal component analysis. By contrast, algebraic generalizations of the standard phylogenetic comparative toolkit that use the trace of covariance matrices are insensitive to levels of trait covariation, the number of trait dimensions, and the orientation of the data set. Further, when appropriate permutation tests are used, these approaches display acceptable Type I error and statistical power. 
We conclude that methods summarizing information across trait dimensions, as well as pairwise composite likelihood methods should be avoided, whereas algebraic generalizations of the phylogenetic comparative toolkit provide a useful means of assessing macroevolutionary patterns in multivariate data. Finally, we discuss areas in which multivariate phylogenetic comparative methods are still in need of future development; namely highly multivariate Ornstein-Uhlenbeck models and approaches for multivariate evolutionary model comparisons. © The Author(s) 2017. Published by Oxford University Press on behalf of the Systematic Biology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Simultaneous Two-Way Clustering of Multiple Correspondence Analysis
ERIC Educational Resources Information Center
Hwang, Heungsun; Dillon, William R.
2010-01-01
A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which "k"-means is…
ERIC Educational Resources Information Center
Montiel, Mariana; Wilhelmi, Miguel R.; Vidakovic, Draga; Elstak, Iwan
2012-01-01
In a previous study, the onto-semiotic approach was employed to analyse the mathematical notion of different coordinate systems, as well as some situations and university students' actions related to these coordinate systems in the context of multivariate calculus. This study approaches different coordinate systems through the process of change of…
Multivariate stochastic analysis for Monthly hydrological time series at Cuyahoga River Basin
NASA Astrophysics Data System (ADS)
zhang, L.
2011-12-01
Copulas have become a powerful statistical and stochastic methodology for multivariate analysis in environmental and water resources engineering. In recent years, the popular one-parameter Archimedean copulas (e.g., the Gumbel-Hougaard, Cook-Johnson, and Frank copulas) and the meta-elliptical copulas (e.g., the Gaussian and Student-t copulas) have been applied in multivariate hydrological analyses, e.g. multivariate rainfall (rainfall intensity, duration and depth), flood (peak discharge, duration and volume), and drought analyses (drought length, mean and minimum SPI values, and drought mean areal extent). Copulas have also been applied in flood frequency analysis at the confluences of river systems by taking into account the dependence among upstream gauge stations rather than by using the hydrological routing technique. In most of the studies above, the annual time series have been treated as stationary signals and assumed to be independent identically distributed (i.i.d.) random variables. In reality, hydrological time series, especially daily and monthly series, cannot be considered i.i.d. random variables because of the periodicity in the data structure. The stationarity assumption is also in question because of climate change and Land Use and Land Cover (LULC) change in the past years. It is therefore necessary to re-evaluate the classic approach to the study of hydrological time series by relaxing the stationarity assumption through a nonstationary approach. For the study of the dependence structure of hydrological time series, the assumption of the same type of univariate distribution also needs to be relaxed by adopting copula theory. In this paper, the univariate monthly hydrological time series will be studied through the nonstationary time series analysis approach.
The dependence structure of the multivariate monthly hydrological time series will be studied through the copula theory. As to the parameter estimation, the maximum likelihood estimation (MLE) will be applied. To illustrate the method, the univariate time series model and the dependence structure will be determined and tested using the monthly discharge time series of Cuyahoga River Basin.
Malaquias, José B; Ramalho, Francisco S; Dos S Dias, Carlos T; Brugger, Bruno P; S Lira, Aline Cristina; Wilcken, Carlos F; Pachú, Jéssica K S; Zanuncio, José C
2017-02-09
The relationship between pests and natural enemies on cotton at different spacings has not yet been documented using multivariate analysis. Using multivariate approaches, it is possible to optimize strategies to control Aphis gossypii at different crop spacings because of the possibility of better use of aphid sampling strategies as well as the conservation and release of its natural enemies. The aims of the study were (i) to characterize the temporal abundance data of aphids and their natural enemies using principal components, (ii) to analyze the degree of correlation between the insects and between groups of variables (pests and natural enemies), (iii) to identify the main natural enemies responsible for regulating A. gossypii populations, and (iv) to investigate the similarities in arthropod occurrence patterns at different spacings of cotton crops over two seasons. High correlations in the occurrence of Scymnus rubicundus with aphids are shown through principal component analysis and through the important role the species plays in canonical correlation analysis. Clustering the presence of apterous aphids matches the pattern verified for Chrysoperla externa at the three different spacings between rows. Our results indicate that S. rubicundus is the main candidate to regulate the aphid populations in all spacings studied.
Malaquias, José B.; Ramalho, Francisco S.; dos S. Dias, Carlos T.; Brugger, Bruno P.; S. Lira, Aline Cristina; Wilcken, Carlos F.; Pachú, Jéssica K. S.; Zanuncio, José C.
2017-01-01
The relationship between pests and natural enemies on cotton at different spacings has not yet been documented using multivariate analysis. Using multivariate approaches, it is possible to optimize strategies to control Aphis gossypii at different crop spacings because of the possibility of better use of aphid sampling strategies as well as the conservation and release of its natural enemies. The aims of the study were (i) to characterize the temporal abundance data of aphids and their natural enemies using principal components, (ii) to analyze the degree of correlation between the insects and between groups of variables (pests and natural enemies), (iii) to identify the main natural enemies responsible for regulating A. gossypii populations, and (iv) to investigate the similarities in arthropod occurrence patterns at different spacings of cotton crops over two seasons. High correlations in the occurrence of Scymnus rubicundus with aphids are shown through principal component analysis and through the important role the species plays in canonical correlation analysis. Clustering the presence of apterous aphids matches the pattern verified for Chrysoperla externa at the three different spacings between rows. Our results indicate that S. rubicundus is the main candidate to regulate the aphid populations in all spacings studied. PMID:28181503
Fazeli, Bahare; Ravari, Hassan; Assadi, Reza
2012-08-01
The aim of this study was first to describe the natural history of Buerger's disease (BD) and then to discuss a clinical approach to this disease based on multivariate analysis. One hundred eight patients who corresponded with Shionoya's criteria were selected from 2000 to 2007 for this study. Major amputation was considered the ultimate adverse event. Survival analyses were performed by Kaplan-Meier curves. Independent variables including gender, duration of smoking, number of cigarettes smoked per day, minor amputation events and type of treatments, were determined by multivariate Cox regression analysis. The recorded data demonstrated that BD may present in four forms, including relapsing-remitting (75%), secondary progressive (4.6%), primary progressive (14.2%) and benign BD (6.2%). Most of the amputations occurred due to relapses within the six years after diagnosis of BD. In multivariate analysis, duration of smoking of more than 20 years had a significant relationship with further major amputation among patients with BD. Smoking cessation programs with experienced psychotherapists are strongly recommended for those areas in which Buerger's disease is common. Patients who have smoked for more than 20 years should be encouraged to quit smoking, but should also be recommended for more advanced treatment for limb salvage.
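The Kaplan-Meier survival curves mentioned in the abstract above can be illustrated with a minimal sketch. This is a generic textbook product-limit estimator, not the authors' analysis; the function name `kaplan_meier` and the follow-up data are hypothetical and chosen only for illustration.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of survival at each distinct event time.

    durations: follow-up time for each patient
    events:    1 if the adverse event (e.g. major amputation) was observed,
               0 if the patient was censored at that time
    Returns a list of (time, survival_probability) pairs.
    """
    # Distinct times at which at least one event was observed
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t
        at_risk = sum(1 for d in durations if d >= t)
        # Events occurring exactly at time t
        n_events = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1.0 - n_events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times (years) and event indicators, not study data
durations = [1, 2, 2, 3, 4, 5, 6, 6]
events = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(durations, events)
for time, prob in curve:
    print(f"t={time}: S(t)={prob:.3f}")
```

Censored patients leave the risk set without lowering the curve, which is why the estimator is preferred over a raw proportion when follow-up lengths differ.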
Probabilistic, meso-scale flood loss modelling
NASA Astrophysics Data System (ADS)
Kreibich, Heidi; Botto, Anna; Schröter, Kai; Merz, Bruno
2016-04-01
Flood risk analyses are an important basis for decisions on flood risk management and adaptation. However, such analyses are associated with significant uncertainty, even more so if changes in risk due to global change are expected. Although uncertainty analysis and probabilistic approaches have received increased attention in recent years, they are still not standard practice for flood risk assessments and even less so for flood loss modelling. State of the art in flood loss modelling is still the use of simple, deterministic approaches like stage-damage functions. Novel probabilistic, multi-variate flood loss models have been developed and validated on the micro-scale using a data-mining approach, namely bagging decision trees (Merz et al. 2013). In this presentation we demonstrate and evaluate the upscaling of the approach to the meso-scale, namely on the basis of land-use units. The model is applied in 19 municipalities which were affected during the 2002 flood by the River Mulde in Saxony, Germany (Botto et al. submitted). The application of bagging decision tree based loss models provides a probability distribution of estimated loss per municipality. Validation is undertaken on the one hand via a comparison with eight deterministic loss models including stage-damage functions as well as multi-variate models. On the other hand the results are compared with official loss data provided by the Saxon Relief Bank (SAB). The results show that uncertainties of loss estimation remain high. Thus, the significant advantage of this probabilistic flood loss estimation approach is that it inherently provides quantitative information about the uncertainty of the prediction. References: Merz, B.; Kreibich, H.; Lall, U. (2013): Multi-variate flood damage assessment: a tree-based data-mining approach. NHESS, 13(1), 53-64. Botto A, Kreibich H, Merz B, Schröter K (submitted) Probabilistic, multi-variable flood loss modelling on the meso-scale with BT-FLEMO. Risk Analysis.
A Statistical Approach for Testing Cross-Phenotype Effects of Rare Variants
Broadaway, K. Alaine; Cutler, David J.; Duncan, Richard; Moore, Jacob L.; Ware, Erin B.; Jhun, Min A.; Bielak, Lawrence F.; Zhao, Wei; Smith, Jennifer A.; Peyser, Patricia A.; Kardia, Sharon L.R.; Ghosh, Debashis; Epstein, Michael P.
2016-01-01
Increasing empirical evidence suggests that many genetic variants influence multiple distinct phenotypes. When cross-phenotype effects exist, multivariate association methods that consider pleiotropy are often more powerful than univariate methods that model each phenotype separately. Although several statistical approaches exist for testing cross-phenotype effects for common variants, there is a lack of similar tests for gene-based analysis of rare variants. In order to fill this important gap, we introduce a statistical method for cross-phenotype analysis of rare variants using a nonparametric distance-covariance approach that compares similarity in multivariate phenotypes to similarity in rare-variant genotypes across a gene. The approach can accommodate both binary and continuous phenotypes and further can adjust for covariates. Our approach yields a closed-form test whose significance can be evaluated analytically, thereby improving computational efficiency and permitting application on a genome-wide scale. We use simulated data to demonstrate that our method, which we refer to as the Gene Association with Multiple Traits (GAMuT) test, provides increased power over competing approaches. We also illustrate our approach using exome-chip data from the Genetic Epidemiology Network of Arteriopathy. PMID:26942286
Combining Correlation Matrices: Simulation Analysis of Improved Fixed-Effects Methods
ERIC Educational Resources Information Center
Hafdahl, Adam R.
2007-01-01
The originally proposed multivariate meta-analysis approach for correlation matrices--analyze Pearson correlations, with each study's observed correlations replacing their population counterparts in its conditional-covariance matrix--performs poorly. Two refinements are considered: Analyze Fisher Z-transformed correlations, and substitute better…
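The Fisher Z refinement mentioned above can be sketched for the simplest case, a single correlation pooled across studies under a fixed-effects model. This is a textbook illustration, not the multivariate estimator evaluated in the paper; the function name `pool_correlations_fisher_z` and the input values are hypothetical.

```python
import math


def pool_correlations_fisher_z(correlations, sample_sizes):
    """Fixed-effects pooling of one correlation via Fisher's z-transform.

    Each r is mapped to z = atanh(r), whose sampling variance is
    approximately 1 / (n - 3); studies are weighted by n - 3.
    Returns (pooled_r, standard_error_of_pooled_z).
    """
    zs = [math.atanh(r) for r in correlations]
    weights = [n - 3 for n in sample_sizes]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    se_z = 1.0 / math.sqrt(sum(weights))
    # Back-transform the pooled z to the correlation scale
    return math.tanh(z_bar), se_z


# Hypothetical study-level correlations and sample sizes
pooled_r, se_z = pool_correlations_fisher_z([0.30, 0.45, 0.38], [40, 120, 75])
print(f"pooled r = {pooled_r:.3f}, SE(z) = {se_z:.3f}")
```

Working on the z scale makes the variance nearly independent of the unknown population correlation, which is one motivation for preferring it to raw Pearson correlations.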
Williams, L. Keoki; Buu, Anne
2017-01-01
We propose a multivariate genome-wide association test for mixed continuous, binary, and ordinal phenotypes. A latent response model is used to estimate the correlation between phenotypes with different measurement scales so that the empirical distribution of the Fisher’s combination statistic under the null hypothesis is estimated efficiently. The simulation study shows that our proposed correlation estimation methods have high levels of accuracy. More importantly, our approach conservatively estimates the variance of the test statistic so that the type I error rate is controlled. The simulation also shows that the proposed test maintains the power at the level very close to that of the ideal analysis based on known latent phenotypes while controlling the type I error. In contrast, conventional approaches–dichotomizing all observed phenotypes or treating them as continuous variables–could either reduce the power or employ a linear regression model unfit for the data. Furthermore, the statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that conducting a multivariate test on multiple phenotypes can increase the power of identifying markers that may not be, otherwise, chosen using marginal tests. The proposed method also offers a new approach to analyzing the Fagerström Test for Nicotine Dependence as multivariate phenotypes in genome-wide association studies. PMID:28081206
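For orientation, Fisher's combination statistic referred to above is sketched here in its classical form, which assumes independent p-values. The abstract's method instead estimates the null distribution while accounting for correlation between phenotypes, so this sketch covers only the independent textbook case; the function name `fisher_combination` is hypothetical.

```python
import math


def fisher_combination(p_values):
    """Classical Fisher combination of k independent p-values.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    Returns (statistic, combined_p_value).
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    term = 1.0   # (X/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical marginal p-values for one marker across three phenotypes
stat, combined_p = fisher_combination([0.04, 0.20, 0.03])
print(f"X = {stat:.3f}, combined p = {combined_p:.4f}")
```

When phenotypes are correlated, the chi-square reference distribution is no longer valid, which is the gap the latent response model in the abstract is designed to address.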
Kim, Sungduk; Chen, Ming-Hui; Ibrahim, Joseph G.; Shah, Arvind K.; Lin, Jianxin
2013-01-01
In this paper, we propose a class of Box-Cox transformation regression models with multidimensional random effects for analyzing multivariate responses for individual patient data (IPD) in meta-analysis. Our modeling formulation uses a multivariate normal response meta-analysis model with multivariate random effects, in which each response is allowed to have its own Box-Cox transformation. Prior distributions are specified for the Box-Cox transformation parameters as well as the regression coefficients in this complex model, and the Deviance Information Criterion (DIC) is used to select the best transformation model. Since the model is quite complex, a novel Monte Carlo Markov chain (MCMC) sampling scheme is developed to sample from the joint posterior of the parameters. This model is motivated by a very rich dataset comprising 26 clinical trials involving cholesterol lowering drugs where the goal is to jointly model the three dimensional response consisting of Low Density Lipoprotein Cholesterol (LDL-C), High Density Lipoprotein Cholesterol (HDL-C), and Triglycerides (TG) (LDL-C, HDL-C, TG). Since the joint distribution of (LDL-C, HDL-C, TG) is not multivariate normal and in fact quite skewed, a Box-Cox transformation is needed to achieve normality. In the clinical literature, these three variables are usually analyzed univariately: however, a multivariate approach would be more appropriate since these variables are correlated with each other. A detailed analysis of these data is carried out using the proposed methodology. PMID:23580436
Kim, Sungduk; Chen, Ming-Hui; Ibrahim, Joseph G; Shah, Arvind K; Lin, Jianxin
2013-10-15
In this paper, we propose a class of Box-Cox transformation regression models with multidimensional random effects for analyzing multivariate responses for individual patient data in meta-analysis. Our modeling formulation uses a multivariate normal response meta-analysis model with multivariate random effects, in which each response is allowed to have its own Box-Cox transformation. Prior distributions are specified for the Box-Cox transformation parameters as well as the regression coefficients in this complex model, and the deviance information criterion is used to select the best transformation model. Because the model is quite complex, we develop a novel Monte Carlo Markov chain sampling scheme to sample from the joint posterior of the parameters. This model is motivated by a very rich dataset comprising 26 clinical trials involving cholesterol-lowering drugs where the goal is to jointly model the three-dimensional response consisting of low density lipoprotein cholesterol (LDL-C), high density lipoprotein cholesterol (HDL-C), and triglycerides (TG) (LDL-C, HDL-C, TG). Because the joint distribution of (LDL-C, HDL-C, TG) is not multivariate normal and in fact quite skewed, a Box-Cox transformation is needed to achieve normality. In the clinical literature, these three variables are usually analyzed univariately; however, a multivariate approach would be more appropriate because these variables are correlated with each other. We carry out a detailed analysis of these data by using the proposed methodology. Copyright © 2013 John Wiley & Sons, Ltd.
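The Box-Cox transformation at the core of the model above can be sketched as follows, with a separate transformation parameter per response as the abstract describes. The lambda values and lipid measurements are hypothetical placeholders; in the paper the parameters are given priors and estimated, which this sketch does not attempt.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


# Hypothetical per-response parameters: each response gets its own lambda
lambdas = {"LDL-C": 0.5, "HDL-C": 0.0, "TG": -0.25}
# Hypothetical measurements (mg/dL) for one patient
patient = {"LDL-C": 130.0, "HDL-C": 45.0, "TG": 180.0}

transformed = {name: box_cox(value, lambdas[name]) for name, value in patient.items()}
recovered = {name: inverse_box_cox(z, lambdas[name]) for name, z in transformed.items()}
print(transformed)
print(recovered)
```

The lambda = 0 case is the log transform, the limit of the general formula as lambda approaches zero, so skewed responses such as triglycerides can be pulled toward symmetry without a separate code path in the model.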
Zhu, Hongxiao; Morris, Jeffrey S; Wei, Fengrong; Cox, Dennis D
2017-07-01
Many scientific studies measure different types of high-dimensional signals or images from the same subject, producing multivariate functional data. These functional measurements carry different types of information about the scientific process, and a joint analysis that integrates information across them may provide new insights into the underlying mechanism for the phenomenon under study. Motivated by fluorescence spectroscopy data in a cervical pre-cancer study, a multivariate functional response regression model is proposed, which treats multivariate functional observations as responses and a common set of covariates as predictors. This novel modeling framework simultaneously accounts for correlations between functional variables and potential multi-level structures in data that are induced by experimental design. The model is fitted by performing a two-stage linear transformation: a basis expansion to each functional variable followed by principal component analysis for the concatenated basis coefficients. This transformation effectively reduces the intra- and inter-function correlations and facilitates fast and convenient calculation. A fully Bayesian approach is adopted to sample the model parameters in the transformed space, and posterior inference is performed after inverse-transforming the regression coefficients back to the original data domain. The proposed approach produces functional tests that flag local regions on the functional effects, while controlling the overall experiment-wise error rate or false discovery rate. It also enables functional discriminant analysis through posterior predictive calculation. Analysis of the fluorescence spectroscopy data reveals local regions with differential expressions across the pre-cancer and normal samples. These regions may serve as biomarkers for prognosis and disease assessment.
An integrated phenomic approach to multivariate allelic association
Medland, Sarah Elizabeth; Neale, Michael Churton
2010-01-01
The increased feasibility of genome-wide association has resulted in association becoming the primary method used to localize genetic variants that cause phenotypic variation. Much attention has been focused on the vast multiple testing problems arising from analyzing large numbers of single nucleotide polymorphisms. However, the inflation of experiment-wise type I error rates through testing numerous phenotypes has received less attention. Multivariate analyses can be used to detect both pleiotropic effects that influence a latent common factor, and monotropic effects that operate at a variable-specific level, whilst controlling for non-independence between phenotypes. In this study, we present a maximum likelihood approach, which combines both latent and variable-specific tests and which may be used with either individual or family data. Simulation results indicate that in the presence of factor-level association, the combined multivariate (CMV) analysis approach performs well with a minimal loss of power as compared with a univariate analysis of a factor or sum score (SS). As the deviation between the pattern of allelic effects and the factor loadings increases, the power of univariate analyses of both factor and SSs decreases dramatically, whereas the power of the CMV approach is maintained. We show the utility of the approach by examining the association between dopamine receptor D2 TaqIA and the initiation of marijuana, tranquilizers and stimulants in data from the Add Health Study. Perl scripts that take ped and dat files as input and produce Mx scripts and data for running the CMV approach can be downloaded from www.vipbg.vcu.edu/~sarahme/WriteMx. PMID:19707246
Lie, Octavian V; van Mierlo, Pieter
2017-01-01
The visual interpretation of intracranial EEG (iEEG) is the standard method used in complex epilepsy surgery cases to map the regions of seizure onset targeted for resection. Still, visual iEEG analysis is labor-intensive and biased due to interpreter dependency. Multivariate parametric functional connectivity measures using adaptive autoregressive (AR) modeling of the iEEG signals based on the Kalman filter algorithm have been used successfully to localize the electrographic seizure onsets. Due to their high computational cost, these methods have been applied to a limited number of iEEG time-series (<60). The aim of this study was to test two Kalman filter implementations, a well-known multivariate adaptive AR model (Arnold et al. 1998) and a simplified, computationally efficient derivation of it, for their potential application to connectivity analysis of high-dimensional (up to 192 channels) iEEG data. When used on simulated seizures together with a multivariate connectivity estimator, the partial directed coherence, the two AR models were compared for their ability to reconstitute the designed seizure signal connections from noisy data. Next, focal seizures from iEEG recordings (73-113 channels) in three patients rendered seizure-free after surgery were mapped with the outdegree, a graph-theory index of outward directed connectivity. Simulation results indicated high levels of mapping accuracy for the two models in the presence of low-to-moderate noise cross-correlation. Accordingly, both AR models correctly mapped the real seizure onset to the resection volume. This study supports the possibility of conducting fully data-driven multivariate connectivity estimations on high-dimensional iEEG datasets using the Kalman filter approach.
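The outdegree index used above for seizure-onset mapping can be sketched from a directed connectivity matrix. The matrix convention (entry `[i][j]` holds the influence of channel `j` on channel `i`), the threshold, and the values are assumptions made for illustration; the study's actual pipeline derives the matrix from partial directed coherence on Kalman-filtered AR models, which is not reproduced here.

```python
def outdegree(connectivity, threshold):
    """Count outgoing connections per channel in a directed graph.

    connectivity[i][j] is assumed to hold the strength of the directed
    influence from channel j to channel i; self-connections are ignored.
    A connection counts when its strength exceeds the threshold.
    """
    n = len(connectivity)
    return [
        sum(1 for i in range(n) if i != j and connectivity[i][j] > threshold)
        for j in range(n)
    ]


# Hypothetical 3-channel connectivity estimates, not patient data
connectivity = [
    [0.00, 0.10, 0.05],
    [0.60, 0.00, 0.10],
    [0.70, 0.20, 0.00],
]
degrees = outdegree(connectivity, threshold=0.3)
print(degrees)  # channel 0 drives both other channels
```

A channel with a high outdegree sends information to many others, which is the rationale for treating it as a candidate seizure-onset site.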
Bathke, Arne C.; Friedrich, Sarah; Pauly, Markus; Konietschke, Frank; Staffen, Wolfgang; Strobl, Nicolas; Höller, Yvonne
2018-01-01
To date, there is a lack of satisfactory inferential techniques for the analysis of multivariate data in factorial designs, when only minimal assumptions on the data can be made. Presently available methods are limited to very particular study designs or assume either multivariate normality or equal covariance matrices across groups, or they do not allow for an assessment of the interaction effects across within-subjects and between-subjects variables. We propose and methodologically validate a parametric bootstrap approach that does not suffer from any of the above limitations, and thus provides a rather general and comprehensive methodological route to inference for multivariate and repeated measures data. As an example application, we consider data from two different Alzheimer’s disease (AD) examination modalities that may be used for precise and early diagnosis, namely, single-photon emission computed tomography (SPECT) and electroencephalogram (EEG). These data violate the assumptions of classical multivariate methods, and indeed classical methods would not have yielded the same conclusions with regard to some of the factors involved. PMID:29565679
Keenan, Michael R; Smentkowski, Vincent S; Ulfig, Robert M; Oltman, Edward; Larson, David J; Kelly, Thomas F
2011-06-01
We demonstrate for the first time that multivariate statistical analysis techniques can be applied to atom probe tomography data to estimate the chemical composition of a sample at the full spatial resolution of the atom probe in three dimensions. Whereas the raw atom probe data provide the specific identity of an atom at a precise location, the multivariate results can be interpreted in terms of the probabilities that an atom representing a particular chemical phase is situated there. When aggregated to the size scale of a single atom (∼0.2 nm), atom probe spectral-image datasets are huge and extremely sparse. In fact, the average spectrum will have somewhat less than one total count per spectrum due to imperfect detection efficiency. These conditions, under which the variance in the data is completely dominated by counting noise, test the limits of multivariate analysis, and an extensive discussion of how to extract the chemical information is presented. Efficient numerical approaches to performing principal component analysis (PCA) on these datasets, which may number hundreds of millions of individual spectra, are put forward, and it is shown that PCA can be computed in a few seconds on a typical laptop computer.
Chen, Zhixiang; Shao, Peng; Sun, Qizhao; Zhao, Dong
2015-03-01
The purpose of the present study was to use prospectively collected data to evaluate the rate of incidental durotomy (ID) during lumbar surgery and determine the associated risk factors by using univariate and multivariate analysis. We retrospectively reviewed 2184 patients who underwent lumbar surgery from January 1, 2009 to December 31, 2011 at a single hospital. Patients with ID (n=97) were compared with the patients without ID (n=2019). The influences of several potential risk factors that might affect the occurrence of ID were assessed using univariate and multivariate analyses. The overall incidence of ID was 4.62%. Univariate analysis demonstrated that older age, diabetes, lumbar central stenosis, posterior approach, revision surgery, prior lumbar surgery and minimally invasive surgery are risk factors for ID during lumbar surgery. However, multivariate analysis identified older age, prior lumbar surgery, revision surgery, and minimally invasive surgery as independent risk factors. Older age, prior lumbar surgery, revision surgery, and minimally invasive surgery were independent risk factors for ID during lumbar surgery. These findings may guide clinicians making future surgical decisions regarding ID and aid in the patient counseling process to alleviate risks and complications. Copyright © 2015 Elsevier B.V. All rights reserved.
Wang, Fang-Xu; Yuan, Jian-Chao; Kang, Li-Ping; Pang, Xu; Yan, Ren-Yi; Zhao, Yang; Zhang, Jie; Sun, Xin-Guang; Ma, Bai-Ping
2016-09-10
An ultra high-performance liquid chromatography quadrupole time-of-flight tandem mass spectrometry approach coupled with multivariate statistical analysis was established and applied to rapidly distinguish the chemical differences between fibrous root and rhizome of Anemarrhena asphodeloides. The datasets of tR-m/z pairs, ion intensity and sample code were processed by principal component analysis and orthogonal partial least squares discriminant analysis. Chemical markers could be identified based on their exact mass data, fragmentation characteristics, and retention times. The new compounds among the chemical markers could be isolated rapidly under the guidance of ultra high-performance liquid chromatography quadrupole time-of-flight tandem mass spectrometry, and their definitive structures further elucidated by NMR spectra. Using this approach, twenty-four markers were identified online, including nine new saponins, of which five new steroidal saponins were obtained in pure form. The study validated this proposed approach as a suitable method for identification of the chemical differences between various medicinal parts in order to expand medicinal parts and increase the utilization rate of resources. Copyright © 2016 Elsevier B.V. All rights reserved.
Marini, Federico; de Beer, Dalene; Walters, Nico A; de Villiers, André; Joubert, Elizabeth; Walczak, Beata
2017-03-17
An ultimate goal of investigations of rooibos plant material subjected to different stages of fermentation is to identify the chemical changes taking place in the phenolic composition, using an untargeted approach and chromatographic fingerprints. Realization of this goal requires, among other things, identification of the main components of the plant material involved in chemical reactions during the fermentation process. Quantitative chromatographic data for the compounds in extracts of green, semi-fermented and fermented rooibos form the basis of a preliminary study following a targeted approach. The aim is to estimate whether treatment has a significant effect based on all quantified compounds and to identify the compounds that contribute significantly to it. Analysis of variance is performed using modern multivariate methods such as ANOVA-Simultaneous Component Analysis, ANOVA - Target Projection and regularized MANOVA. This study is the first one in which all three approaches are compared and evaluated. For the data studied, all three methods reveal the same significance of the fermentation effect on the extract compositions, but they lead to different interpretations of it. Copyright © 2017 Elsevier B.V. All rights reserved.
Race and Older Mothers’ Differentiation: A Sequential Quantitative and Qualitative Analysis
Sechrist, Jori; Suitor, J. Jill; Riffin, Catherine; Taylor-Watson, Kadari; Pillemer, Karl
2011-01-01
The goal of this paper is to demonstrate a process by which qualitative and quantitative approaches are combined to reveal patterns in the data that are unlikely to be detected and confirmed by either method alone. Specifically, we take a sequential approach to combining qualitative and quantitative data to explore race differences in how mothers differentiate among their adult children. We began with a standard multivariate analysis examining race differences in mothers’ differentiation among their adult children regarding emotional closeness and confiding. Finding no race differences in this analysis, we conducted an in-depth comparison of the Black and White mothers’ narratives to determine whether there were underlying patterns that we had been unable to detect in our first analysis. Using this method, we found that Black mothers were substantially more likely than White mothers to emphasize interpersonal relationships within the family when describing differences among their children. In our final step, we developed a measure of familism based on the qualitative data and conducted a multivariate analysis to confirm the patterns revealed by the in-depth comparison of the mother’s narratives. We conclude that using such a sequential mixed methods approach to data analysis has the potential to shed new light on complex family relations. PMID:21967639
NASA Astrophysics Data System (ADS)
DSouza, Adora M.; Abidin, Anas Z.; Leistritz, Lutz; Wismüller, Axel
2017-02-01
We investigate the applicability of large-scale Granger Causality (lsGC) for extracting a measure of multivariate information flow between pairs of regional brain activities from resting-state functional MRI (fMRI) and test the effectiveness of these measures for predicting a disease state. Such pairwise multivariate measures of interaction provide high-dimensional representations of connectivity profiles for each subject and are used in a machine learning task to distinguish between healthy controls and individuals presenting with symptoms of HIV Associated Neurocognitive Disorder (HAND). Cognitive impairment in several domains can occur as a result of HIV infection of the central nervous system. The current paradigm for assessing such impairment is through neuropsychological testing. With fMRI data analysis, we aim at non-invasively capturing differences in brain connectivity patterns between healthy subjects and subjects presenting with symptoms of HAND. To classify the extracted interaction patterns among brain regions, we use a prototype-based learning algorithm called Generalized Matrix Learning Vector Quantization (GMLVQ). Our approach of characterizing connectivity using lsGC, followed by GMLVQ for subsequent classification, yields good prediction results with an accuracy of 87% and an area under the ROC curve (AUC) of up to 0.90. We obtain a statistically significant improvement (p<0.01) over a conventional Granger causality approach (accuracy = 0.76, AUC = 0.74). The high accuracy and AUC values obtained with our multivariate approach to connectivity analysis suggest that it is better able to capture changes in interaction patterns between different brain regions than the conventional Granger causality analysis known from the literature.
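For orientation, the conventional Granger causality baseline mentioned in this abstract can be sketched as follows. This is a minimal bivariate, lag-1 illustration and not the authors' lsGC method: it assumes zero-mean series, fits autoregressive models without an intercept, and uses synthetic data. All function names and coefficients are hypothetical. The index is the log ratio of the residual variance without the candidate source to the residual variance with it.

```python
import math
import random


def residual_variance_restricted(target):
    """Fit target[t] = a * target[t-1] by least squares; return residual variance."""
    past, now = target[:-1], target[1:]
    a = sum(p * n for p, n in zip(past, now)) / sum(p * p for p in past)
    residuals = [n - a * p for p, n in zip(past, now)]
    return sum(r * r for r in residuals) / len(residuals)


def residual_variance_full(target, source):
    """Fit target[t] = a * target[t-1] + b * source[t-1]; return residual variance."""
    x_past, y_past, now = target[:-1], source[:-1], target[1:]
    sxx = sum(x * x for x in x_past)
    syy = sum(y * y for y in y_past)
    sxy = sum(x * y for x, y in zip(x_past, y_past))
    sxn = sum(x * n for x, n in zip(x_past, now))
    syn = sum(y * n for y, n in zip(y_past, now))
    # Solve the 2x2 normal equations in closed form.
    det = sxx * syy - sxy * sxy
    a = (sxn * syy - syn * sxy) / det
    b = (syn * sxx - sxn * sxy) / det
    residuals = [n - a * x - b * y for x, y, n in zip(x_past, y_past, now)]
    return sum(r * r for r in residuals) / len(residuals)


def granger_index(target, source):
    """Log ratio of restricted to full residual variance (larger = more influence)."""
    restricted = residual_variance_restricted(target)
    full = residual_variance_full(target, source)
    return math.log(restricted / full)


# Synthetic example (assumption): y is white noise and drives x with a one-step delay.
random.seed(0)
n = 500
y = [random.gauss(0, 1) for _ in range(n)]
x = [0.0] + [0.8 * y[t - 1] + 0.1 * random.gauss(0, 1) for t in range(1, n)]

gc_y_to_x = granger_index(x, y)  # influence of y's past on x
gc_x_to_y = granger_index(y, x)  # influence of x's past on y
```

In this toy setting the index from y to x should be clearly larger than the index from x to y, because only y carries predictive information about the other series.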
Tailored multivariate analysis for modulated enhanced diffraction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Caliandro, Rocco; Guccione, Pietro; Nico, Giovanni
2015-10-21
Modulated enhanced diffraction (MED) is a technique allowing the dynamic structural characterization of crystalline materials subjected to an external stimulus, which is particularly suited for in situ and operando structural investigations at synchrotron sources. Contributions from the (active) part of the crystal system that varies synchronously with the stimulus can be extracted by an offline analysis, which can only be applied in the case of periodic stimuli and linear system responses. In this paper a new decomposition approach based on multivariate analysis is proposed. The standard principal component analysis (PCA) is adapted to treat MED data: specific figures of merit based on their scores and loadings are found, and the directions of the principal components obtained by PCA are modified to maximize such figures of merit. As a result, a general method to decompose MED data, called optimum constrained components rotation (OCCR), is developed, which produces very precise results on simulated data, even in the case of nonperiodic stimuli and/or nonlinear responses. The multivariate analysis approach is able to supply in one shot both the diffraction pattern related to the active atoms (through the OCCR loadings) and the time dependence of the system response (through the OCCR scores). When applied to real data, OCCR was able to supply only the latter information, as the former was hindered by changes in abundances of different crystal phases, which occurred besides structural variations in the specific case considered. To develop a decomposition procedure able to cope with this combined effect represents the next challenge in MED analysis.
Motivational Profiles of Adult Learners
ERIC Educational Resources Information Center
Rothes, Ana; Lemos, Marina S.; Gonçalves, Teresa
2017-01-01
This study investigated profiles of autonomous and controlled motivation and their effects in a sample of 188 adult learners from two Portuguese urban areas. Using a person-centered approach, results of cluster analysis and multivariate analysis of covariance revealed four motivational groups with different effects in self-efficacy, engagement,…
A Review of Multivariate Methods for Multimodal Fusion of Brain Imaging Data
Adali, Tülay; Yu, Qingbao; Calhoun, Vince D.
2011-01-01
The development of various neuroimaging techniques is rapidly improving the measurements of brain function/structure. However, despite improvements in individual modalities, it is becoming increasingly clear that the most effective research approaches will utilize multi-modal fusion, which takes advantage of the fact that each modality provides a limited view of the brain. The goal of multimodal fusion is to capitalize on the strength of each modality in a joint analysis, rather than a separate analysis of each. This is a more complicated endeavor that must be approached more carefully and efficient methods should be developed to draw generalized and valid conclusions from high dimensional data with a limited number of subjects. Numerous research efforts have been reported in the field based on various statistical approaches, e.g. independent component analysis (ICA), canonical correlation analysis (CCA) and partial least squares (PLS). In this review paper, we survey a number of multivariate methods appearing in previous reports, which are performed with or without prior information and may have utility for identifying potential brain illness biomarkers. We also discuss the possible strengths and limitations of each method, and review their applications to brain imaging data. PMID:22108139
Valverde-Som, Lucia; Ruiz-Samblás, Cristina; Rodríguez-García, Francisco P; Cuadros-Rodríguez, Luis
2018-02-09
Virgin olive oil is the only food product for which sensory analysis is regulated to classify it in different quality categories. To harmonize the results of the sensorial method, the use of standards or reference materials is crucial. The stability of sensory reference materials is required to enable their suitable control, aiming to confirm that their specific target values are maintained on an ongoing basis. Currently, such stability is monitored by means of sensory analysis and the sensory panels are in the paradoxical situation of controlling the standards that are devoted to controlling the panels. In the present study, several approaches based on similarity analysis are exploited. For each approach, the specific methodology to build a proper multivariate control chart to monitor the stability of the sensory properties is explained and discussed. The normalized Euclidean and Mahalanobis distances, the so-called nearness and hardiness indices respectively, have been defined as new similarity indices to range the values from 0 to 1. Also, the squared mean from Hotelling's T²-statistic and Q²-statistic has been proposed as another similarity index. © 2018 Society of Chemical Industry.
Multivariate analysis of cytokine profiles in pregnancy complications.
Azizieh, Fawaz; Dingle, Kamaludin; Raghupathy, Raj; Johnson, Kjell; VanderPlas, Jacob; Ansari, Ali
2018-03-01
The immunoregulation to tolerate the semiallogeneic fetus during pregnancy includes a harmonious dynamic balance between anti- and pro-inflammatory cytokines. Several earlier studies reported significantly different levels and/or ratios of several cytokines in complicated pregnancy as compared to normal pregnancy. However, as cytokines operate in networks with potentially complex interactions, it is also interesting to compare groups with multi-cytokine data sets, with multivariate analysis. Such analysis will further examine how great the differences are, and which cytokines are more different than others. Various multivariate statistical tools, such as the Cramer test, classification and regression trees, partial least squares regression figures, the 2-dimensional Kolmogorov-Smirnov test, principal component analysis and the gap statistic, were used to compare cytokine data of normal vs anomalous groups of different pregnancy complications. Multivariate analysis assisted in examining if the groups were different, how strongly they differed, and in what ways they differed, and further reported evidence for subgroups in 1 group (pregnancy-induced hypertension), possibly indicating multiple causes for the complication. This work contributes to a better understanding of cytokine interactions and may have important implications for targeting cytokine balance modulation or for the design of future medications or interventions that best direct management or prevention from an immunological approach. © 2018 The Authors. American Journal of Reproductive Immunology Published by John Wiley & Sons Ltd.
Wang, Yong; Yao, Xiaomei; Parthasarathy, Ranganathan
2008-01-01
Fourier transform infrared (FTIR) chemical imaging can be used to investigate molecular chemical features of the adhesive/dentin interfaces. However, the information is not straightforward, and is not easily extracted. The objective of this study was to use multivariate analysis methods, principal component analysis and fuzzy c-means clustering, to analyze spectral data in comparison with univariate analysis. The spectral imaging data collected from both the adhesive/healthy dentin and adhesive/caries-affected dentin specimens were used and compared. The univariate statistical methods such as mapping of intensities of specific functional group do not always accurately identify functional group locations and concentrations due to more or less band overlapping in adhesive and dentin. Apart from the ease with which information can be extracted, multivariate methods highlight subtle and often important changes in the spectra that are difficult to observe using univariate methods. The results showed that the multivariate methods gave more satisfactory, interpretable results than univariate methods and were conclusive in showing that they can discriminate and classify differences between healthy dentin and caries-affected dentin within the interfacial regions. It is demonstrated that the multivariate FTIR imaging approaches can be used in the rapid characterization of heterogeneous, complex structure. PMID:18980198
Enhancing e-waste estimates: Improving data quality by multivariate Input–Output Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Feng, E-mail: fwang@unu.edu; Design for Sustainability Lab, Faculty of Industrial Design Engineering, Delft University of Technology, Landbergstraat 15, 2628CE Delft; Huisman, Jaco
2013-11-15
Highlights: • A multivariate Input–Output Analysis method for e-waste estimates is proposed. • Applying multivariate analysis to consolidate data can enhance e-waste estimates. • We examine the influence of model selection and data quality on e-waste estimates. • Datasets of all e-waste related variables in a Dutch case study have been provided. • Accurate modeling of time-variant lifespan distributions is critical for estimate. - Abstract: Waste electrical and electronic equipment (or e-waste) is one of the fastest growing waste streams, which encompasses a wide and increasing spectrum of products. Accurate estimation of e-waste generation is difficult, mainly due to lack of high quality data referred to market and socio-economic dynamics. This paper addresses how to enhance e-waste estimates by providing techniques to increase data quality. An advanced, flexible and multivariate Input–Output Analysis (IOA) method is proposed. It links all three pillars in IOA (product sales, stock and lifespan profiles) to construct mathematical relationships between various data points. By applying this method, the data consolidation steps can generate more accurate time-series datasets from the available data pool. This can consequently increase the reliability of e-waste estimates compared to the approach without data processing. A case study in the Netherlands is used to apply the advanced IOA model. As a result, for the first time ever, complete datasets of all three variables for estimating all types of e-waste have been obtained. The result of this study also demonstrates significant disparity between various estimation models, arising from the use of data under different conditions. It shows the importance of applying multivariate approach and multiple sources to improve data quality for modelling, specifically using appropriate time-varying lifespan parameters.
Following the case study, a roadmap with a procedural guideline is provided to enhance e-waste estimation studies.
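The link between the three IOA pillars named in this abstract (sales, stock and lifespan) can be illustrated with a generic sales-lifespan sketch. This is an assumption-laden toy and not the authors' model: the lifespan profile is a fixed discrete distribution rather than a time-varying one, the sales figures are invented, and all names are hypothetical.

```python
def waste_from_sales(sales, lifespan_pmf):
    """Waste generated per year: units from earlier sales cohorts reaching end of life.

    sales        -- units sold in year 0, 1, 2, ...
    lifespan_pmf -- {age_in_years: probability of being discarded at that age}
    """
    years = len(sales)
    waste = [0.0] * years
    for sold_year, units in enumerate(sales):
        for age, probability in lifespan_pmf.items():
            discard_year = sold_year + age
            if discard_year < years:  # ignore discards beyond the modelled horizon
                waste[discard_year] += units * probability
    return waste


def stock_from_flows(sales, waste, initial_stock=0.0):
    """Stock in use at the end of each year: previous stock + sales - waste."""
    stock = []
    current = initial_stock
    for sold, discarded in zip(sales, waste):
        current += sold - discarded
        stock.append(current)
    return stock


# Hypothetical data: two sales years, products discarded after 1 or 2 years.
sales = [100.0, 100.0, 0.0, 0.0]
lifespan_pmf = {1: 0.5, 2: 0.5}

waste = waste_from_sales(sales, lifespan_pmf)
stock = stock_from_flows(sales, waste)
```

Because the three quantities are tied together by this mass balance, any two of them constrain the third, which is the kind of relationship that can be used to cross-check and consolidate incomplete datasets.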
Chapat, Ludivine; Hilaire, Florence; Bouvet, Jérome; Pialot, Daniel; Philippe-Reversat, Corinne; Guiot, Anne-Laure; Remolue, Lydie; Lechenet, Jacques; Andreoni, Christine; Poulet, Hervé; Day, Michael J; De Luca, Karelle; Cariou, Carine; Cupillard, Lionel
2017-07-01
The assessment of vaccine combinations, or the evaluation of the impact of minor modifications of one component in well-established vaccines, requires animal challenges in the absence of previously validated correlates of protection. As an alternative, we propose conducting a multivariate analysis of the specific immune response to the vaccine. This approach is consistent with the principles of the 3Rs (Refinement, Reduction and Replacement) and avoids repeating efficacy studies based on infectious challenges in vivo. To validate this approach, a set of nine immunological parameters was selected in order to characterize B and T lymphocyte responses against canine rabies virus and to evaluate the compatibility between two canine vaccines, an inactivated rabies vaccine (RABISIN ® ) and a combined vaccine (EURICAN ® DAPPi-Lmulti) injected at two different sites in the same animals. The analysis was focused on the magnitude and quality of the immune response. The multi-dimensional picture given by this 'immune fingerprint' was used to assess the impact of the concomitant injection of the combined vaccine on the immunogenicity of the rabies vaccine. A principal component analysis fully discriminated the control group from the groups vaccinated with RABISIN ® alone or RABISIN ® +EURICAN ® DAPPi-Lmulti and confirmed the compatibility between the rabies vaccines. This study suggests that determining the immune fingerprint, combined with a multivariate statistical analysis, is a promising approach to characterizing the immunogenicity of a vaccine with an established record of efficacy. It may also avoid the need to repeat efficacy studies involving challenge infection in case of minor modifications of the vaccine or for compatibility studies. Copyright © 2017 Elsevier B.V. All rights reserved.
Martyna, Agnieszka; Zadora, Grzegorz; Neocleous, Tereza; Michalska, Aleksandra; Dean, Nema
2016-08-10
Many chemometric tools are invaluable and have proven effective in data mining and substantial dimensionality reduction of highly multivariate data. This becomes vital for interpreting various physicochemical data due to rapid development of advanced analytical techniques, delivering much information in a single measurement run. This concerns especially spectra, which are frequently used as the subject of comparative analysis in e.g. forensic sciences. In the presented study the microtraces collected from the scenarios of hit-and-run accidents were analysed. Plastic containers and automotive plastics (e.g. bumpers, headlamp lenses) were subjected to Fourier transform infrared spectrometry and car paints were analysed using Raman spectroscopy. In the forensic context analytical results must be interpreted and reported according to the standards of the interpretation schemes acknowledged in forensic sciences using the likelihood ratio approach. However, for proper construction of LR models for highly multivariate data, such as spectra, chemometric tools must be employed for substantial data compression. Conversion from classical feature representation to distance representation was proposed for revealing hidden data peculiarities and linear discriminant analysis was further applied for minimising the within-sample variability while maximising the between-sample variability. Both techniques enabled substantial reduction of data dimensionality. Univariate and multivariate likelihood ratio models were proposed for such data. It was shown that the combination of chemometric tools and the likelihood ratio approach is capable of solving the comparison problem of highly multivariate and correlated data after proper extraction of the most relevant features and variance information hidden in the data structure. Copyright © 2016 Elsevier B.V. All rights reserved.
Detecting synchronization clusters in multivariate time series via coarse-graining of Markov chains.
Allefeld, Carsten; Bialonski, Stephan
2007-12-01
Synchronization cluster analysis is an approach to the detection of underlying structures in data sets of multivariate time series, starting from a matrix R of bivariate synchronization indices. A previous method utilized the eigenvectors of R for cluster identification, analogous to several recent attempts at group identification using eigenvectors of the correlation matrix. All of these approaches assumed a one-to-one correspondence of dominant eigenvectors and clusters, which has however been shown to be wrong in important cases. We clarify the usefulness of eigenvalue decomposition for synchronization cluster analysis by translating the problem into the language of stochastic processes, and derive an enhanced clustering method harnessing recent insights from the coarse-graining of finite-state Markov processes. We illustrate the operation of our method using a simulated system of coupled Lorenz oscillators, and we demonstrate its superior performance over the previous approach. Finally we investigate the question of robustness of the algorithm against small sample size, which is important with regard to field applications.
NASA Technical Reports Server (NTRS)
Aires, Filipe; Rossow, William B.; Hansen, James E. (Technical Monitor)
2001-01-01
A new approach is presented for the analysis of feedback processes in a nonlinear dynamical system by observing its variations. The new methodology consists of statistical estimates of the sensitivities between all pairs of variables in the system based on a neural network modeling of the dynamical system. The model can then be used to estimate the instantaneous, multivariate and nonlinear sensitivities, which are shown to be essential for the analysis of the feedbacks processes involved in the dynamical system. The method is described and tested on synthetic data from the low-order Lorenz circulation model where the correct sensitivities can be evaluated analytically.
Hierl, L.A.; Loftin, C.S.; Longcore, J.R.; McAuley, D.G.; Urban, D.L.
2007-01-01
We assessed changes in vegetative structure of 49 impoundments at Moosehorn National Wildlife Refuge (MNWR), Maine, USA, between 1984-1985 and 2002 with a multivariate, adaptive approach that may be useful in a variety of wetland and other habitat management situations. We used Mahalanobis Distance (MD) analysis to classify the refuge's wetlands as poor or good waterbird habitat based on five variables: percent emergent vegetation, percent shrub, percent open water, relative richness of vegetative types, and an interspersion juxtaposition index that measures adjacency of vegetation patches. Mahalanobis Distance is a multivariate statistic that examines whether a particular data point is an outlier or a member of a data cluster while accounting for correlations among inputs. For each wetland, we used MD analysis to quantify a distance from a reference condition defined a priori by habitat conditions measured in MNWR wetlands used by waterbirds. Twenty-five wetlands declined in quality between the two periods, whereas 23 wetlands improved. We identified specific wetland characteristics that may be modified to improve habitat conditions for waterbirds. The MD analysis seems ideal for instituting an adaptive wetland management approach because metrics can be easily added or removed, ranges of target habitat conditions can be defined by field-collected data, and the analysis can identify priorities for single or multiple management objectives.
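The Mahalanobis Distance described in this abstract can be sketched in a few lines. The example below is illustrative only: it uses two made-up variables instead of the study's five habitat metrics, and the helper names are hypothetical. The distance of a point from a reference set is sqrt(d^T S^-1 d), where d is the deviation from the reference mean and S is the reference covariance matrix.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equally long tuples."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1)."""
    n = len(rows)
    mu = mean_vector(rows)
    dim = len(mu)
    cov = [[0.0] * dim for _ in range(dim)]
    for row in rows:
        dev = [value - m for value, m in zip(row, mu)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return cov


def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def mahalanobis(point, reference_rows):
    """Distance of `point` from the reference set, accounting for correlations."""
    mu = mean_vector(reference_rows)
    cov = covariance_matrix(reference_rows)
    dev = [p - m for p, m in zip(point, mu)]
    z = solve(cov, dev)  # z = S^-1 d without forming the inverse explicitly
    return math.sqrt(sum(d * zi for d, zi in zip(dev, z)))


# Hypothetical reference condition: two positively correlated habitat metrics.
reference = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]

along = mahalanobis((4.0, 4.0), reference)    # deviates along the correlation
against = mahalanobis((4.0, 1.0), reference)  # deviates against the correlation
```

Both test points lie at the same Euclidean distance from the reference mean, but the second is farther in Mahalanobis terms because it departs from the correlation pattern of the reference data.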
The Multi-Isotope Process (MIP) Monitor Project: FY13 Final Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meier, David E.; Coble, Jamie B.; Jordan, David V.
The Multi-Isotope Process (MIP) Monitor provides an efficient approach to monitoring the process conditions in reprocessing facilities in support of the goal of “… (minimization of) the risks of nuclear proliferation and terrorism.” The MIP Monitor measures the distribution of the radioactive isotopes in product and waste streams of a nuclear reprocessing facility. These isotopes are monitored online by gamma spectrometry and compared, in near-real-time, to spectral patterns representing “normal” process conditions using multivariate analysis and pattern recognition algorithms. The combination of multivariate analysis and gamma spectroscopy allows us to detect small changes in the gamma spectrum, which may indicate changes in process conditions. By targeting multiple gamma-emitting indicator isotopes, the MIP Monitor approach is compatible with the use of small, portable, relatively high-resolution gamma detectors that may be easily deployed throughout an existing facility. The automated multivariate analysis can provide a level of data obscurity, giving a built-in information barrier to protect sensitive or proprietary operational data. Proof-of-concept simulations and experiments have been performed in previous years to demonstrate the validity of this tool in a laboratory setting for systems representing aqueous reprocessing facilities. However, pyroprocessing is emerging as an alternative to aqueous reprocessing techniques.
Application of multivariate statistical techniques in microbial ecology.
Paliy, O; Shankar, V
2016-03-01
Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. In particular, a noticeable effect has been attained in the field of microbial ecology, where new experimental approaches have provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces a large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure. © 2016 John Wiley & Sons Ltd.
The intervals method: a new approach to analyse finite element outputs using multivariate statistics
De Esteban-Trivigno, Soledad; Püschel, Thomas A.; Fortuny, Josep
2017-01-01
Background In this paper, we propose a new method, named the intervals’ method, to analyse data from finite element models in a comparative multivariate framework. As a case study, several armadillo mandibles are analysed, showing that the proposed method is useful to distinguish and characterise biomechanical differences related to diet/ecomorphology. Methods The intervals’ method consists of generating a set of variables, each one defined by an interval of stress values. Each variable is expressed as a percentage of the area of the mandible occupied by those stress values. Afterwards these newly generated variables can be analysed using multivariate methods. Results Applying this novel method to the biological case study of whether armadillo mandibles differ according to dietary groups, we show that the intervals’ method is a powerful tool to characterize biomechanical performance and how this relates to different diets. This allows us to positively discriminate between specialist and generalist species. Discussion We show that the proposed approach is a useful methodology not affected by the characteristics of the finite element mesh. Additionally, the positive discriminating results obtained when analysing a difficult case study suggest that the proposed method could be a very useful tool for comparative studies in finite element analysis using multivariate statistical approaches. PMID:29043107
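The variable-generation step described under Methods can be sketched as below. This is a rough illustration under stated assumptions, not the authors' implementation: intervals are given by their lower bounds, the last interval is treated as open-ended, every stress value is assumed to be at least the first lower bound, and the stress and area values are invented.

```python
def interval_variables(stresses, areas, lower_bounds):
    """Percentage of total area whose stress falls in each interval.

    stresses     -- one stress value per finite element
    areas        -- area of each element, in the same order
    lower_bounds -- ascending lower bounds; interval i covers
                    [lower_bounds[i], lower_bounds[i + 1]), the last is open-ended
    """
    total_area = sum(areas)
    area_per_interval = [0.0] * len(lower_bounds)
    for stress, area in zip(stresses, areas):
        # Index of the last lower bound that does not exceed this stress value.
        index = max(i for i, bound in enumerate(lower_bounds) if stress >= bound)
        area_per_interval[index] += area
    return [100.0 * a / total_area for a in area_per_interval]


# Hypothetical model with five elements of unequal size.
stresses = [0.5, 1.2, 2.7, 3.1, 0.9]
areas = [2.0, 1.0, 1.0, 4.0, 2.0]
lower_bounds = [0.0, 1.0, 2.0, 3.0]

variables = interval_variables(stresses, areas, lower_bounds)
```

Each model is thereby reduced to one row of percentages, which can be passed to a standard multivariate method such as principal component analysis. Weighting by area rather than counting elements is one plausible reason the result would not depend on element size.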
Hierarchical multivariate covariance analysis of metabolic connectivity
Carbonell, Felix; Charil, Arnaud; Zijdenbos, Alex P; Evans, Alan C; Bedell, Barry J
2014-01-01
Conventional brain connectivity analysis is typically based on the assessment of interregional correlations. Given that correlation coefficients are derived from both covariance and variance, group differences in covariance may be obscured by differences in the variance terms. To facilitate a comprehensive assessment of connectivity, we propose a unified statistical framework that interrogates the individual terms of the correlation coefficient. We have evaluated the utility of this method for metabolic connectivity analysis using [18F]2-fluoro-2-deoxyglucose (FDG) positron emission tomography (PET) data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. As an illustrative example of the utility of this approach, we examined metabolic connectivity in angular gyrus and precuneus seed regions of mild cognitive impairment (MCI) subjects with low and high β-amyloid burdens. This new multivariate method allowed us to identify alterations in the metabolic connectome, which would not have been detected using classic seed-based correlation analysis. Ultimately, this novel approach should be extensible to brain network analysis and broadly applicable to other imaging modalities, such as functional magnetic resonance imaging (MRI). PMID:25294129
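The point that a correlation coefficient mixes covariance and variance terms can be made concrete with a small numerical sketch. The data below are invented and the function names are hypothetical. The two "groups" share exactly the same covariance between x and y, yet their correlations differ because the variance of y differs.

```python
import math


def variance(values):
    """Sample variance (divides by n - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))


# Hypothetical seed-region signal shared by both groups.
x = [-3.0, -1.0, 1.0, 3.0]
# Target-region signals: same covariance with x, different variance.
y_group_a = [-2.0, -2.0, 0.0, 4.0]
y_group_b = [0.0, -4.0, -2.0, 6.0]

cov_a = covariance(x, y_group_a)
cov_b = covariance(x, y_group_b)
corr_a = correlation(x, y_group_a)
corr_b = correlation(x, y_group_b)
```

Here a correlation-only comparison would report a group difference that is driven entirely by the variance term, which is the motivation for examining the individual terms separately.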
Kurosawa, R N F; do Amaral Junior, A T; Silva, F H L; Dos Santos, A; Vivas, M; Kamphorst, S H; Pena, G F
2017-02-08
The multivariate analyses are useful tools to estimate the genetic variability between accessions. In the breeding programs, the Ward-Modified Location Model (MLM) multivariate method has been a powerful strategy to quantify variability using quantitative and qualitative variables simultaneously. The present study was proposed in view of the dearth of information about popcorn breeding programs under a multivariate approach using the Ward-MLM methodology. The objective of this study was thus to estimate the genetic diversity among 37 genotypes of popcorn aiming to identify divergent groups associated with morpho-agronomic traits and traits related to resistance to Fusarium spp. To this end, 7 qualitative and 17 quantitative variables were analyzed. The experiment was conducted in 2014, at Universidade Estadual do Norte Fluminense, located in Campos dos Goytacazes, RJ, Brazil. The Ward-MLM strategy allowed the identification of four groups as follows: Group I with 10 genotypes, Group II with 11 genotypes, Group III with 9 genotypes, and Group IV with 7 genotypes. Group IV was distant in relation to the other groups, while groups I, II, and III were near. The crosses between genotypes from the other groups with those of group IV allow an exploitation of heterosis. The Ward-MLM strategy provided an appropriate grouping of genotypes; ear weight, ear diameter, and grain yield were the traits that most contributed to the analysis of genetic diversity.
Multivariate spatial models of excess crash frequency at area level: case of Costa Rica.
Aguero-Valverde, Jonathan
2013-10-01
Recently, areal models of crash frequency have been used in the analysis of various area-wide factors affecting road crashes. On the other hand, disease mapping methods are commonly used in epidemiology to assess the relative risk of the population at different spatial units. A natural next step is to combine these two approaches to estimate the excess crash frequency at area level as a measure of absolute crash risk. Furthermore, multivariate spatial models of crash severity are explored in order to account for both frequency and severity of crashes and control for the spatial correlation frequently found in crash data. This paper aims to extend the concept of safety performance functions to be used in areal models of crash frequency. A multivariate spatial model is used for that purpose and compared to its univariate counterpart. A full Bayes hierarchical approach is used to estimate the models of crash frequency at canton level for Costa Rica. An intrinsic multivariate conditional autoregressive model is used for modeling spatial random effects. The results show that the multivariate spatial model performs better than its univariate counterpart in terms of the penalized goodness-of-fit measure Deviance Information Criterion. Additionally, the effects of the spatial smoothing due to the multivariate spatial random effects are evident in the estimation of excess equivalent property damage only crashes. Copyright © 2013 Elsevier Ltd. All rights reserved.
A multivariate time series approach to modeling and forecasting demand in the emergency department.
Jones, Spencer S; Evans, R Scott; Allen, Todd L; Thomas, Alun; Haug, Peter J; Welch, Shari J; Snow, Gregory L
2009-02-01
The goals of this investigation were to study the temporal relationships between the demands for key resources in the emergency department (ED) and the inpatient hospital, and to develop multivariate forecasting models. Hourly data were collected from three diverse hospitals for the year 2006. Descriptive analysis and model fitting were carried out using graphical and multivariate time series methods. Multivariate models were compared to a univariate benchmark model in terms of their ability to provide out-of-sample forecasts of ED census and the demands for diagnostic resources. Descriptive analyses revealed little temporal interaction between the demand for inpatient resources and the demand for ED resources at the facilities considered. Multivariate models provided more accurate forecasts of ED census and of the demands for diagnostic resources. Our results suggest that multivariate time series models can be used to reliably forecast ED patient census; however, forecasts of the demands for diagnostic resources were not sufficiently reliable to be useful in the clinical setting.
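To make the idea of a multivariate time series forecast concrete, here is a minimal sketch of a two-variable first-order vector autoregression, y_t = A y_{t-1}, fitted by least squares and used for a one-step out-of-sample forecast. This is an assumption-laden illustration: the abstract does not state the model order or estimation method, the helper names `fit_var1` and `forecast` are hypothetical, and the data are synthetic and noise-free.

```python
def fit_var1(series):
    """Least-squares fit of y_t = A y_{t-1} for a 2-variable series (no intercept)."""
    # Accumulate X'X and X'Y, where rows of X are y_{t-1} and rows of Y are y_t
    sxx = [[0.0, 0.0], [0.0, 0.0]]
    sxy = [[0.0, 0.0], [0.0, 0.0]]
    for prev, curr in zip(series[:-1], series[1:]):
        for i in range(2):
            for j in range(2):
                sxx[i][j] += prev[i] * prev[j]
                sxy[i][j] += prev[i] * curr[j]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    # B = (X'X)^-1 X'Y; the coefficient matrix A is the transpose of B
    b = [[sum(inv[i][k] * sxy[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    return [[b[j][i] for j in range(2)] for i in range(2)]

def forecast(a, last, steps=1):
    """Iterate the fitted recursion forward from the last observation."""
    y = list(last)
    for _ in range(steps):
        y = [a[0][0] * y[0] + a[0][1] * y[1],
             a[1][0] * y[0] + a[1][1] * y[1]]
    return y

# Synthetic two-variable series generated from a known coefficient matrix
true_a = [[0.5, 0.2], [0.1, 0.3]]
series = [[1.0, 2.0]]
for _ in range(12):
    series.append(forecast(true_a, series[-1]))

a_hat = fit_var1(series[:-1])         # hold out the final observation
pred = forecast(a_hat, series[-2])    # one-step out-of-sample forecast
```

The off-diagonal entries of the fitted matrix are what a univariate benchmark cannot capture: they let the past of one series inform the forecast of the other.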
NASA Astrophysics Data System (ADS)
Gourdol, L.; Hissler, C.; Pfister, L.
2012-04-01
The Luxembourg sandstone aquifer is of major relevance for the national supply of drinking water in Luxembourg. The city of Luxembourg (20% of the country's population) gets almost 2/3 of its drinking water from this aquifer. As a consequence, the study of the groundwater hydrochemistry and of its spatial and temporal variations is considered of highest priority. Since 2005, a monitoring network has been implemented by the Water Department of Luxembourg City, with a view to a more sustainable management of this strategic water resource. The data collected to date form a large and complex dataset, describing spatial and temporal variations of many hydrochemical parameters. The data treatment issue is tightly connected to this kind of water monitoring program and complex database. Standard multivariate statistical techniques, such as principal components analysis and hierarchical cluster analysis, have been widely used as unbiased methods for extracting meaningful information from groundwater quality data and are now classically used in many hydrogeological studies, in particular to characterize temporal or spatial hydrochemical variations induced by natural and anthropogenic factors. But these classical multivariate methods deal with two-way matrices, usually parameters/sites or parameters/time, while often the dataset resulting from qualitative water monitoring programs should be seen as a datacube parameters/sites/time. Three-way matrices, such as the one we propose here, are difficult to handle and to analyse by classical multivariate statistical tools and thus should be treated with approaches dealing with three-way data structures. One possible analysis approach consists in the use of partial triadic analysis (PTA). The PTA was previously used with success in many ecological studies but never to date in the domain of hydrogeology.
Applied to the dataset of the Luxembourg Sandstone aquifer, the PTA appears as a new promising statistical instrument for hydrogeologists, in particular to characterize temporal and spatial hydrochemical variations induced by natural and anthropogenic factors. This new approach for groundwater management offers potential for 1) identifying a common multivariate spatial structure, 2) untapping the different hydrochemical patterns and explaining their controlling factors and 3) analysing the temporal variability of this structure and grasping hydrochemical changes.
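The first two steps of a partial triadic analysis, as commonly described, can be sketched as follows: compute a matrix of correlations between the tables of the datacube (the interstructure), then build a weighted average table (the compromise) using the leading eigenvector of that matrix. The sketch below is a simplified illustration under stated assumptions: uniform row and column weights, tables that are positively correlated with one another, weights rescaled to sum to one for readability, and toy 2x2 tables. All function names and data are hypothetical and unrelated to the Luxembourg dataset.

```python
from math import sqrt

def inner(x, y):
    """Frobenius inner product of two equally sized tables (lists of rows)."""
    return sum(a * b for rx, ry in zip(x, y) for a, b in zip(rx, ry))

def rv_matrix(tables):
    """Correlations between tables (interstructure), assuming uniform weights."""
    norms = [sqrt(inner(t, t)) for t in tables]
    k = len(tables)
    return [[inner(tables[i], tables[j]) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def leading_eigenvector(m, iterations=200):
    """Power iteration; returns the unit-norm dominant eigenvector."""
    v = [1.0] * len(m)
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(len(m))) for i in range(len(m))]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def compromise(tables):
    """Weighted average table, weights taken from the interstructure."""
    weights = leading_eigenvector(rv_matrix(tables))
    total = sum(weights)
    weights = [w / total for w in weights]  # rescaled to sum to one (a choice)
    rows, cols = len(tables[0]), len(tables[0][0])
    table = [[sum(w * t[r][c] for w, t in zip(weights, tables))
              for c in range(cols)] for r in range(rows)]
    return weights, table

# Toy datacube: three sampling dates, each a sites x parameters table
date_1 = [[1.0, 2.0], [3.0, 4.0]]
date_2 = [[1.0, 2.0], [3.0, 4.0]]
date_3 = [[2.0, 1.0], [4.0, 3.0]]
weights, table = compromise([date_1, date_2, date_3])
```

Tables that agree with the common structure receive larger weights, so the atypical third date contributes slightly less to the compromise. A full analysis would go on to run a PCA on the compromise and project each date onto it, which is not shown here.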
Gao, Wen; Yang, Hua; Qi, Lian-Wen; Liu, E-Hu; Ren, Mei-Ting; Yan, Yu-Ting; Chen, Jun; Li, Ping
2012-07-06
Plant-based medicines are becoming increasingly popular around the world. Authentication of herbal raw materials is important to ensure their safety and efficacy. Some herbs belonging to closely related species but differing in medicinal properties are difficult to identify because of similar morphological and microscopic characteristics. Chromatographic fingerprinting is an alternative method to distinguish them. Existing approaches do not allow a comprehensive analysis for herbal authentication. We have now developed a strategy consisting of (1) full metabolic profiling of herbal medicines by rapid resolution liquid chromatography (RRLC) combined with quadrupole time-of-flight mass spectrometry (QTOF MS), (2) global analysis of non-targeted compounds by molecular feature extraction algorithm, (3) multivariate statistical analysis for classification and prediction, and (4) marker compounds characterization. This approach has provided a fast and unbiased comparative multivariate analysis of the metabolite composition of 33-batch samples covering seven Lonicera species. Individual metabolic profiles are performed at the level of molecular fragments without prior structural assignment. In the entire set, the obtained classifier for seven Lonicera species flower buds showed good prediction performance and a total of 82 statistically different components were rapidly obtained by the strategy. The elemental compositions of discriminative metabolites were characterized by the accurate mass measurement of the pseudomolecular ions and their chemical types were assigned by the MS/MS spectra. The high-resolution, comprehensive and unbiased strategy for metabolite data analysis presented here is powerful and opens a new direction of authentication in herbal analysis. Copyright © 2012 Elsevier B.V. All rights reserved.
Domingo-Almenara, Xavier; Perera, Alexandre; Brezmes, Jesus
2016-11-25
Gas chromatography-mass spectrometry (GC-MS) produces large and complex datasets characterized by co-eluted compounds, often at trace levels, and by a distinct compound ion-redundancy resulting from the high fragmentation caused by electron impact ionization. Compounds in GC-MS can be resolved by taking advantage of the multivariate nature of GC-MS data by applying multivariate resolution methods. However, multivariate methods have to be applied in small regions of the chromatogram, and therefore chromatograms are segmented prior to the application of the algorithms. The automation of this segmentation process is a challenging task as it implies separating informative data from noise in the chromatogram. This study demonstrates the capabilities of independent component analysis-orthogonal signal deconvolution (ICA-OSD) and multivariate curve resolution-alternating least squares (MCR-ALS) with an overlapping moving window implementation to avoid the typical hard chromatographic segmentation. Also, after being resolved, compounds are aligned across samples by an automated alignment algorithm. We evaluated the proposed methods through a quantitative analysis of GC-qTOF MS data from 25 serum samples. The quantitative performance of both moving window ICA-OSD and MCR-ALS-based implementations was compared with the quantification of 33 compounds by the XCMS package. Results showed that most of the R² coefficients of determination exhibited a high correlation (R² > 0.90) in both ICA-OSD and MCR-ALS moving window-based approaches. Copyright © 2016 Elsevier B.V. All rights reserved.
STAMATE, MIRELA CRISTINA; TODOR, NICOLAE; COSGAREA, MARCEL
2015-01-01
Background and aim: The clinical utility of otoacoustic emissions as a noninvasive objective test of cochlear function has been long studied. Both transient otoacoustic emissions and distortion products can be used to identify hearing loss, but to what extent they can be used as predictors for hearing loss is still debated. Most studies agree that multivariate analyses have better test performances than univariate analyses. The aim of the study was to determine transient otoacoustic emissions and distortion products performance in identifying normal and impaired hearing, using the pure tone audiogram as a gold standard procedure and different multivariate statistical approaches. Methods: The study included 105 adult subjects with normal hearing and hearing loss who underwent the same test battery: pure-tone audiometry, tympanometry, otoacoustic emission tests. We chose to use the logistic regression as a multivariate statistical technique. Three logistic regression models were developed to characterize the relations between different risk factors (age, sex, tinnitus, demographic features, cochlear status defined by otoacoustic emissions) and hearing status defined by pure-tone audiometry. The multivariate analyses allow the calculation of the logistic score, which is a combination of the inputs, weighted by coefficients, calculated within the analyses. The accuracy of each model was assessed using receiver operating characteristic (ROC) curve analysis. We used the logistic score to generate ROC curves and to estimate the areas under the curves in order to compare different multivariate analyses. Results: We compared the performance of each otoacoustic emission (transient, distortion product) using three different multivariate analyses for each ear, when multi-frequency gold standards were used. We demonstrated that all multivariate analyses provided high values of the area under the curve proving the performance of the otoacoustic emissions.
Each otoacoustic emission test presented high values of area under the curve, suggesting that implementing a multivariate approach to evaluate the performances of each otoacoustic emission test would serve to increase the accuracy in identifying the normal and impaired ears. We encountered the highest area under the curve value for the combined multivariate analysis suggesting that both otoacoustic emission tests should be used in assessing hearing status. Our multivariate analyses revealed that age is a constant predictor factor of the auditory status for both ears, but the presence of tinnitus was the most important predictor for the hearing level, only for the left ear. Age presented similar coefficients, but tinnitus coefficients, by their high value, produced the highest variations of the logistic scores, only for the left ear group, thus increasing the risk of hearing loss. We did not find gender differences between ears for any otoacoustic emission tests, but studies still debate this question as the results are contradictory. Neither gender, nor environment origin had any predictive value for the hearing status, according to the results of our study. Conclusion: Like any other audiological test, using otoacoustic emissions to identify hearing loss is not without error. Even when applying multivariate analysis, perfect test performance is never achieved. Although most studies demonstrated the benefit of using the multivariate analysis, it has not been incorporated into clinical decisions, perhaps because of the idiosyncratic nature of multivariate solutions or because of the lack of validation studies. PMID:26733749
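The logistic score and the area under the ROC curve described in the Methods can be sketched in a few lines. The score is the logistic transform of a weighted sum of the inputs, and the area under the curve equals the fraction of (impaired, normal) pairs in which the impaired ear receives the higher score, with ties counted as one half. The coefficients, intercept, predictors, and subjects below are invented for illustration and are not the fitted values from the study.

```python
from math import exp

def logistic_score(features, coefficients, intercept):
    """Logistic transform of a weighted combination of the inputs."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + exp(-z))

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predictors: (age in decades, tinnitus present 0/1)
subjects = [(3.0, 0), (4.5, 0), (6.0, 1), (5.0, 0), (7.0, 1), (6.5, 0), (5.5, 1)]
hearing_loss = [0, 0, 1, 0, 1, 1, 0]   # 1 = impaired by the gold standard

coefficients, intercept = [0.8, 1.5], -5.0   # illustrative values only
scores = [logistic_score(s, coefficients, intercept) for s in subjects]
area = auc(scores, hearing_loss)
```

Because the logistic transform is monotonic, the area depends only on how the weighted sum ranks the subjects, which is why models with different coefficients can be compared directly through their areas under the curve.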
Augustin, Regina; Lichtenthaler, Stefan F.; Greeff, Michael; Hansen, Jens; Wurst, Wolfgang; Trümbach, Dietrich
2011-01-01
The molecular mechanisms and genetic risk factors underlying Alzheimer's disease (AD) pathogenesis are only partly understood. To identify new factors, which may contribute to AD, different approaches are taken including proteomics, genetics, and functional genomics. Here, we used a bioinformatics approach and found that distinct AD-related genes share modules of transcription factor binding sites, suggesting a transcriptional coregulation. To detect additional coregulated genes, which may potentially contribute to AD, we established a new bioinformatics workflow with known multivariate methods like support vector machines, biclustering, and predicted transcription factor binding site modules by using in silico analysis and over 400 expression arrays from human and mouse. Two significant modules are composed of three transcription factor families: CTCF, SP1F, and EGRF/ZBPF, which are conserved between human and mouse APP promoter sequences. The specific combination of in silico promoter and multivariate analysis can identify regulation mechanisms of genes involved in multifactorial diseases. PMID:21559189
Demanuele, Charmaine; Bähner, Florian; Plichta, Michael M; Kirsch, Peter; Tost, Heike; Meyer-Lindenberg, Andreas; Durstewitz, Daniel
2015-01-01
Multivariate pattern analysis can reveal new information from neuroimaging data to illuminate human cognition and its disturbances. Here, we develop a methodological approach, based on multivariate statistical/machine learning and time series analysis, to discern cognitive processing stages from functional magnetic resonance imaging (fMRI) blood oxygenation level dependent (BOLD) time series. We apply this method to data recorded from a group of healthy adults whilst performing a virtual reality version of the delayed win-shift radial arm maze (RAM) task. This task has been frequently used to study working memory and decision making in rodents. Using linear classifiers and multivariate test statistics in conjunction with time series bootstraps, we show that different cognitive stages of the task, as defined by the experimenter, namely, the encoding/retrieval, choice, reward and delay stages, can be statistically discriminated from the BOLD time series in brain areas relevant for decision making and working memory. Discrimination of these task stages was significantly reduced during poor behavioral performance in dorsolateral prefrontal cortex (DLPFC), but not in the primary visual cortex (V1). Experimenter-defined dissection of time series into class labels based on task structure was confirmed by an unsupervised, bottom-up approach based on Hidden Markov Models. Furthermore, we show that different groupings of recorded time points into cognitive event classes can be used to test hypotheses about the specific cognitive role of a given brain region during task execution. We found that whilst the DLPFC strongly differentiated between task stages associated with different memory loads, but not between different visual-spatial aspects, the reverse was true for V1. 
Our methodology illustrates how different aspects of cognitive information processing during one and the same task can be separated and attributed to specific brain regions based on information contained in multivariate patterns of voxel activity.
A Unified Framework for Association Analysis with Multiple Related Phenotypes
Stephens, Matthew
2013-01-01
We consider the problem of assessing associations between multiple related outcome variables, and a single explanatory variable of interest. This problem arises in many settings, including genetic association studies, where the explanatory variable is genotype at a genetic variant. We outline a framework for conducting this type of analysis, based on Bayesian model comparison and model averaging for multivariate regressions. This framework unifies several common approaches to this problem, and includes both standard univariate and standard multivariate association tests as special cases. The framework also unifies the problems of testing for associations and explaining associations – that is, identifying which outcome variables are associated with genotype. This provides an alternative to the usual, but conceptually unsatisfying, approach of resorting to univariate tests when explaining and interpreting significant multivariate findings. The method is computationally tractable genome-wide for modest numbers of phenotypes (e.g. 5–10), and can be applied to summary data, without access to raw genotype and phenotype data. We illustrate the methods on both simulated examples and a genome-wide association study of blood lipid traits, where we identify 18 potential novel genetic associations that were not identified by univariate analyses of the same data. PMID:23861737
A Baseline for the Multivariate Comparison of Resting-State Networks
Allen, Elena A.; Erhardt, Erik B.; Damaraju, Eswar; Gruner, William; Segall, Judith M.; Silva, Rogers F.; Havlicek, Martin; Rachakonda, Srinivas; Fries, Jill; Kalyanam, Ravi; Michael, Andrew M.; Caprihan, Arvind; Turner, Jessica A.; Eichele, Tom; Adelsheim, Steven; Bryan, Angela D.; Bustillo, Juan; Clark, Vincent P.; Feldstein Ewing, Sarah W.; Filbey, Francesca; Ford, Corey C.; Hutchison, Kent; Jung, Rex E.; Kiehl, Kent A.; Kodituwakku, Piyadasa; Komesu, Yuko M.; Mayer, Andrew R.; Pearlson, Godfrey D.; Phillips, John P.; Sadek, Joseph R.; Stevens, Michael; Teuscher, Ursina; Thoma, Robert J.; Calhoun, Vince D.
2011-01-01
As the size of functional and structural MRI datasets expands, it becomes increasingly important to establish a baseline from which diagnostic relevance may be determined, a processing strategy that efficiently prepares data for analysis, and a statistical approach that identifies important effects in a manner that is both robust and reproducible. In this paper, we introduce a multivariate analytic approach that optimizes sensitivity and reduces unnecessary testing. We demonstrate the utility of this mega-analytic approach by identifying the effects of age and gender on the resting-state networks (RSNs) of 603 healthy adolescents and adults (mean age: 23.4 years, range: 12–71 years). Data were collected on the same scanner, preprocessed using an automated analysis pipeline based in SPM, and studied using group independent component analysis. RSNs were identified and evaluated in terms of three primary outcome measures: time course spectral power, spatial map intensity, and functional network connectivity. Results revealed robust effects of age on all three outcome measures, largely indicating decreases in network coherence and connectivity with increasing age. Gender effects were of smaller magnitude but suggested stronger intra-network connectivity in females and more inter-network connectivity in males, particularly with regard to sensorimotor networks. These findings, along with the analysis approach and statistical framework described here, provide a useful baseline for future investigations of brain networks in health and disease. PMID:21442040
Jamshidi-Zanjani, Ahmad; Saeedi, Mohsen
2017-07-01
Vertical distribution of metals (Cu, Zn, Cr, Fe, Mn, Pb, Ni, Cd, and Li) in four sediment core samples (C1, C2, C3, and C4) from Anzali international wetland located southwest of the Caspian Sea was examined. Background concentration of each metal was calculated according to different statistical approaches. The results of multivariate statistical analysis showed that Fe and Mn might have significant role in the fate of Ni and Zn in sediment core samples. Different sediment quality indexes were utilized to assess metal pollution in sediment cores. Moreover, a new sediment quality index named aggregative toxicity index (ATI) based on sediment quality guidelines (SQGs) was developed to assess the degree of metal toxicity in an aggregative manner. The increasing pattern of metal pollution and their toxicity degree in upper layers of core samples indicated increasing effects of anthropogenic sources in the study area.
NASA Astrophysics Data System (ADS)
Gu, Huaying; Liu, Zhixue; Weng, Yingliang
2017-04-01
The present study applies the multivariate generalized autoregressive conditional heteroscedasticity (MGARCH) with spatial effects approach for the analysis of the time-varying conditional correlations and contagion effects among global real estate markets. A distinguishing feature of the proposed model is that it can simultaneously capture the spatial interactions and the dynamic conditional correlations compared with the traditional MGARCH models. Results reveal that the estimated dynamic conditional correlations have exhibited significant increases during the global financial crisis from 2007 to 2009, thereby suggesting contagion effects among global real estate markets. The analysis further indicates that the returns of the regional real estate markets that are in close geographic and economic proximities exhibit strong co-movement. In addition, evidence of significantly positive leverage effects in global real estate markets is also determined. The findings have significant implications on global portfolio diversification opportunities and risk management practices.
NASA Astrophysics Data System (ADS)
Sadegh, Mojtaba; Ragno, Elisa; AghaKouchak, Amir
2017-06-01
We present a newly developed Multivariate Copula Analysis Toolbox (MvCAT) which includes a wide range of copula families with different levels of complexity. MvCAT employs a Bayesian framework with a residual-based Gaussian likelihood function for inferring copula parameters and estimating the underlying uncertainties. The contribution of this paper is threefold: (a) providing a Bayesian framework to approximate the predictive uncertainties of fitted copulas, (b) introducing a hybrid-evolution Markov Chain Monte Carlo (MCMC) approach designed for numerical estimation of the posterior distribution of copula parameters, and (c) enabling the community to explore a wide range of copulas and evaluate them relative to the fitting uncertainties. We show that the commonly used local optimization methods for copula parameter estimation often get trapped in local minima. The proposed method, however, addresses this limitation and improves describing the dependence structure. MvCAT also enables evaluation of uncertainties relative to the length of record, which is fundamental to a wide range of applications such as multivariate frequency analysis.
Tailored multivariate analysis for modulated enhanced diffraction
Caliandro, Rocco; Guccione, Pietro; Nico, Giovanni; ...
2015-10-21
Modulated enhanced diffraction (MED) is a technique allowing the dynamic structural characterization of crystalline materials subjected to an external stimulus, which is particularly suited for in situ and operando structural investigations at synchrotron sources. Contributions from the (active) part of the crystal system that varies synchronously with the stimulus can be extracted by an offline analysis, which can only be applied in the case of periodic stimuli and linear system responses. In this paper a new decomposition approach based on multivariate analysis is proposed. The standard principal component analysis (PCA) is adapted to treat MED data: specific figures of merit based on their scores and loadings are found, and the directions of the principal components obtained by PCA are modified to maximize such figures of merit. As a result, a general method to decompose MED data, called optimum constrained components rotation (OCCR), is developed, which produces very precise results on simulated data, even in the case of nonperiodic stimuli and/or nonlinear responses. Furthermore, the multivariate analysis approach is able to supply in one shot both the diffraction pattern related to the active atoms (through the OCCR loadings) and the time dependence of the system response (through the OCCR scores). When applied to real data, however, OCCR was able to supply only the latter information, as the former was hindered by changes in abundances of different crystal phases, which occurred besides structural variations in the specific case considered. Developing a decomposition procedure able to cope with this combined effect represents the next challenge in MED analysis.
Varekar, Vikas; Karmakar, Subhankar; Jha, Ramakar
2016-02-01
The design of surface water quality sampling locations is a crucial decision-making process for rationalization of a monitoring network. The quantity, quality, and types of available dataset (watershed characteristics and water quality data) may affect the selection of appropriate design methodology. The modified Sanders approach and multivariate statistical techniques [particularly factor analysis (FA)/principal component analysis (PCA)] are well-accepted and widely used techniques for design of sampling locations. However, their performance may vary significantly with quantity, quality, and types of available dataset. In this paper, an attempt has been made to evaluate performance of these techniques by accounting for the effect of seasonal variation, under a situation of limited water quality data but extensive watershed characteristics information, as continuous and consistent river water quality data is usually difficult to obtain, whereas watershed information may be made available through application of geospatial techniques. A case study of Kali River, Western Uttar Pradesh, India, is selected for the analysis. The monitoring was carried out at 16 sampling locations. The discrete and diffuse pollution loads at different sampling sites were estimated and accounted for using the modified Sanders approach, whereas the monitored physical and chemical water quality parameters were utilized as inputs for FA/PCA. The designed optimum number of sampling locations for monsoon and non-monsoon seasons by the modified Sanders approach are eight and seven while that for FA/PCA are eleven and nine, respectively. Less variation in the number and locations of designed sampling sites was obtained by both techniques, which shows stability of results. A geospatial analysis has also been carried out to check the significance of designed sampling location with respect to river basin characteristics and land use of the study area.
Both methods are equally efficient; however, modified Sanders approach outperforms FA/PCA when limited water quality and extensive watershed information is available. The available water quality dataset is limited and FA/PCA-based approach fails to identify monitoring locations with higher variation, as these multivariate statistical approaches are data-driven. The priority/hierarchy and number of sampling sites designed by modified Sanders approach are well justified by the land use practices and observed river basin characteristics of the study area.
A General Approach for Estimating Scale Score Reliability for Panel Survey Data
ERIC Educational Resources Information Center
Biemer, Paul P.; Christ, Sharon L.; Wiesen, Christopher A.
2009-01-01
Scale score measures are ubiquitous in the psychological literature and can be used as both dependent and independent variables in data analysis. Poor reliability of scale score measures leads to inflated standard errors and/or biased estimates, particularly in multivariate analysis. Reliability estimation is usually an integral step to assess…
Tuan Pham; Julia Jones; Ronald Metoyer; Frederick Colwell
2014-01-01
The study of the diversity of multivariate objects shares common characteristics and goals across disciplines, including ecology and organizational management. Nevertheless, subject-matter experts have adopted somewhat separate diversity concepts and analysis techniques, limiting the potential for sharing and comparing across disciplines. Moreover, while large and...
Indic, Premananda; Bloch-Salisbury, Elisabeth; Bednarek, Frank; Brown, Emery N; Paydarfar, David; Barbieri, Riccardo
2011-07-01
Cardio-respiratory interactions are weak at the earliest stages of human development, suggesting that assessment of their presence and integrity may be an important indicator of development in infants. Despite the valuable research devoted to infant development, there is still a need for specifically targeted standards and methods to assess cardiopulmonary functions in the early stages of life. We present a new methodological framework for the analysis of cardiovascular variables in preterm infants. Our approach is based on a set of mathematical tools that have been successful in quantifying important cardiovascular control mechanisms in adult humans, here specifically adapted to reflect the physiology of the developing cardiovascular system. We applied our methodology in a study of cardio-respiratory responses for 11 preterm infants. We quantified cardio-respiratory interactions using specifically tailored multivariate autoregressive analysis and calculated the coherence as well as gain using causal approaches. The significance of the interactions in each subject was determined by surrogate data analysis. The method was tested in control conditions as well as in two different experimental conditions; with and without use of mild mechanosensory intervention. Our multivariate analysis revealed a significantly higher coherence, as confirmed by surrogate data analysis, in the frequency range associated with eupneic breathing compared to the other ranges. Our analysis validates the models behind our new approaches, and our results confirm the presence of cardio-respiratory coupling in early stages of development, particularly during periods of mild mechanosensory intervention, thus encouraging further application of our approach. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Snell, Kym I E; Hua, Harry; Debray, Thomas P A; Ensor, Joie; Look, Maxime P; Moons, Karel G M; Riley, Richard D
2016-01-01
Our aim was to improve meta-analysis methods for summarizing a prediction model's performance when individual participant data are available from multiple studies for external validation. We suggest multivariate meta-analysis for jointly synthesizing calibration and discrimination performance, while accounting for their correlation. The approach estimates a prediction model's average performance, the heterogeneity in performance across populations, and the probability of "good" performance in new populations. This allows different implementation strategies (e.g., recalibration) to be compared. Application is made to a diagnostic model for deep vein thrombosis (DVT) and a prognostic model for breast cancer mortality. In both examples, multivariate meta-analysis reveals that calibration performance is excellent on average but highly heterogeneous across populations unless the model's intercept (baseline hazard) is recalibrated. For the cancer model, the probability of "good" performance (defined by C statistic ≥0.7 and calibration slope between 0.9 and 1.1) in a new population was 0.67 with recalibration but 0.22 without recalibration. For the DVT model, even with recalibration, there was only a 0.03 probability of "good" performance. Multivariate meta-analysis can be used to externally validate a prediction model's calibration and discrimination performance across multiple populations and to evaluate different implementation strategies. Crown Copyright © 2016. Published by Elsevier Inc. All rights reserved.
Ramdani, Sofiane; Bonnet, Vincent; Tallon, Guillaume; Lagarde, Julien; Bernard, Pierre Louis; Blain, Hubert
2016-08-01
Entropy measures are often used to quantify the regularity of postural sway time series. Recent methodological developments provided both multivariate and multiscale approaches allowing the extraction of complexity features from physiological signals; see "Dynamical complexity of human responses: A multivariate data-adaptive framework," in Bulletin of Polish Academy of Science and Technology, vol. 60, p. 433, 2012. The resulting entropy measures are good candidates for the analysis of bivariate postural sway signals exhibiting nonstationarity and multiscale properties. These methods are dependent on several input parameters such as embedding parameters. Using two data sets collected from institutionalized frail older adults, we numerically investigate the behavior of a recent multivariate and multiscale entropy estimator; see "Multivariate multiscale entropy: A tool for complexity analysis of multichannel data," Physical Review E, vol. 84, p. 061918, 2011. We propose criteria for the selection of the input parameters. Using these optimal parameters, we statistically compare the multivariate and multiscale entropy values of postural sway data of non-faller subjects to those of fallers. These two groups are discriminated by the resulting measures over multiple time scales. We also demonstrate that the typical parameter settings proposed in the literature lead to entropy measures that do not distinguish the two groups. This last result confirms the importance of the selection of appropriate input parameters.
A Machine Learning Approach to Automated Gait Analysis for the Noldus Catwalk System.
Frohlich, Holger; Claes, Kasper; De Wolf, Catherine; Van Damme, Xavier; Michel, Anne
2018-05-01
Gait analysis of animal disease models can provide valuable insights into in vivo compound effects and thus help in preclinical drug development. The purpose of this paper is to establish a computational gait analysis approach for the Noldus Catwalk system, in which footprints are automatically captured and stored. We present what is, to our knowledge, the first machine learning based approach for the Catwalk system, which comprises a step decomposition, definition and extraction of meaningful features, multivariate step sequence alignment, feature selection, and training of different classifiers (gradient boosting machine, random forest, and elastic net). Using animal-wise leave-one-out cross validation we demonstrate that with our method we can reliably separate movement patterns of a putative Parkinson's disease animal model and several control groups. Furthermore, we show that we can predict the time point after and the type of different brain lesions and can even forecast the brain region where the intervention was applied. We provide an in-depth analysis of the features involved in our classifiers via statistical techniques for model interpretation. A machine learning method for automated analysis of data from the Noldus Catwalk system was established. Our work shows the ability of machine learning to discriminate pharmacologically relevant animal groups based on their walking behavior in a multivariate manner. Further interesting aspects of the approach include the ability to learn from past experiments, improve with more data arriving and to make predictions for single animals in future studies.
Mathew, Boby; Holand, Anna Marie; Koistinen, Petri; Léon, Jens; Sillanpää, Mikko J
2016-02-01
A novel reparametrization-based INLA approach as a fast alternative to MCMC for the Bayesian estimation of genetic parameters in a multivariate animal model is presented. Multi-trait genetic parameter estimation is a relevant topic in animal and plant breeding programs because multi-trait analysis can take into account the genetic correlation between different traits, which significantly improves the accuracy of the genetic parameter estimates. Generally, multi-trait analysis is computationally demanding and requires initial estimates of genetic and residual correlations among the traits, which are difficult to obtain. In this study, we illustrate how to reparametrize covariance matrices of multivariate animal models using modified Cholesky decompositions. This reparametrization-based approach is used in the Integrated Nested Laplace Approximation (INLA) methodology to estimate genetic parameters of a multivariate animal model. Immediate benefits are: (1) it avoids the difficulty of finding good starting values for the analysis, which can be a problem, for example, in Restricted Maximum Likelihood (REML); (2) Bayesian estimation of (co)variance components using INLA is faster to execute than using Markov Chain Monte Carlo (MCMC), especially when realized relationship matrices are dense. The slight drawback is that priors for covariance matrices are assigned to elements of the Cholesky factor rather than directly to the covariance matrix elements as in MCMC. Additionally, we illustrate the concordance of the INLA results with traditional methods such as the MCMC and REML approaches. We also present results obtained from simulated data sets with replicates and field data in rice.
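The reparametrization idea in the abstract above can be illustrated with a minimal sketch. It assumes the common form of the modified Cholesky decomposition, Sigma = L D L^T with a unit lower-triangular L and a positive diagonal D; the function name and the layout of the parameter vector are hypothetical and are not taken from the paper or from any INLA software.

```python
import numpy as np

def covariance_from_unconstrained(theta, dim):
    """Map an unconstrained vector to a positive definite covariance matrix.

    Assumed layout (illustrative only): the first `dim` entries are
    log-variances of the diagonal factor D; the remaining dim*(dim-1)/2
    entries fill the strictly lower triangle of a unit lower-triangular
    factor L, so that Sigma = L @ D @ L.T.
    """
    theta = np.asarray(theta, dtype=float)
    n_off = dim * (dim - 1) // 2
    if theta.size != dim + n_off:
        raise ValueError("theta has the wrong length for this dimension")
    D = np.diag(np.exp(theta[:dim]))             # exp keeps the diagonal positive
    L = np.eye(dim)
    L[np.tril_indices(dim, k=-1)] = theta[dim:]  # free off-diagonal entries
    return L @ D @ L.T

# Any real-valued vector of the right length yields a valid covariance matrix.
sigma = covariance_from_unconstrained([0.0, -1.0, 0.5, 0.3, -2.0, 1.5], dim=3)
```

Because every real vector maps to a valid covariance matrix, no starting values satisfying positive-definiteness constraints are needed. The trade-off noted in the abstract also shows up here: a prior would be placed on the entries of `theta`, not on the elements of `sigma`.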
Barimani, Shirin; Kleinebudde, Peter
2017-10-01
A multivariate analysis method, Science-Based Calibration (SBC), was used for the first time for endpoint determination of a tablet coating process using Raman data. Two types of tablet cores, placebo and caffeine cores, received a coating suspension comprising a polyvinyl alcohol-polyethylene glycol graft-copolymer and titanium dioxide to a maximum coating thickness of 80 µm. Raman spectroscopy was used as in-line PAT tool. The spectra were acquired every minute and correlated to the amount of applied aqueous coating suspension. SBC was compared to another well-known multivariate analysis method, Partial Least Squares-regression (PLS) and a simpler approach, Univariate Data Analysis (UVDA). All developed calibration models had coefficient of determination values (R²) higher than 0.99. The coating endpoints could be predicted with root mean square errors (RMSEP) less than 3.1% of the applied coating suspensions. Compared to PLS and UVDA, SBC proved to be an alternative multivariate calibration method with high predictive power. Copyright © 2017 Elsevier B.V. All rights reserved.
Targeted metabolomic profiling in rat tissues reveals sex differences.
Ruoppolo, Margherita; Caterino, Marianna; Albano, Lucia; Pecce, Rita; Di Girolamo, Maria Grazia; Crisci, Daniela; Costanzo, Michele; Milella, Luigi; Franconi, Flavia; Campesi, Ilaria
2018-03-16
Sex differences affect several diseases and are organ- and parameter-specific. In humans and animals, sex differences also influence the metabolism and homeostasis of amino acids and fatty acids, which are linked to the onset of diseases. Thus, the use of targeted metabolite profiles in tissues represents a powerful approach to examine the intermediary metabolism and evidence for any sex differences. To clarify the sex-specific activities of liver, heart and kidney tissues, we used targeted metabolomics, linear discriminant analysis (LDA), principal component analysis (PCA), cluster analysis and linear correlation models to evaluate sex and organ-specific differences in amino acids, free carnitine and acylcarnitine levels in male and female Sprague-Dawley rats. Several intra-sex differences affect tissues, indicating that metabolite profiles in rat hearts, livers and kidneys are organ-dependent. Amino acids and carnitine levels in rat hearts, livers and kidneys are affected by sex: male and female hearts show the greatest sexual dimorphism, both qualitatively and quantitatively. Finally, multivariate analysis confirmed the influence of sex on the metabolomics profiling. Our data demonstrate that the metabolomics approach together with a multivariate approach can capture the dynamics of physiological and pathological states, which are essential for explaining the basis of the sex differences observed in physiological and pathological conditions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ladd-Lively, Jennifer L
2014-01-01
The objective of this work was to determine the feasibility of using on-line multivariate statistical process control (MSPC) for safeguards applications in natural uranium conversion plants. Multivariate statistical process control is commonly used throughout industry for the detection of faults. For safeguards applications in uranium conversion plants, faults could include the diversion of intermediate products such as uranium dioxide, uranium tetrafluoride, and uranium hexafluoride. This study was limited to a 100 metric ton of uranium (MTU) per year natural uranium conversion plant (NUCP) using the wet solvent extraction method for the purification of uranium ore concentrate. A key component in the multivariate statistical methodology is the Principal Component Analysis (PCA) approach for the analysis of data, development of the base case model, and evaluation of future operations. The PCA approach was implemented through the use of singular value decomposition of the data matrix where the data matrix represents normal operation of the plant. Component mole balances were used to model each of the process units in the NUCP. However, this approach could be applied to any data set. The monitoring framework developed in this research could be used to determine whether or not a diversion of material has occurred at an NUCP as part of an International Atomic Energy Agency (IAEA) safeguards system. This approach can be used to identify the key monitoring locations, as well as locations where monitoring is unimportant. Detection limits at the key monitoring locations can also be established using this technique. Several faulty scenarios were developed to test the monitoring framework after the base case or normal operating conditions of the PCA model were established. In all of the scenarios, the monitoring framework was able to detect the fault. Overall this study was successful at meeting the stated objective.
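A rough sketch of the kind of PCA-based monitoring described in the abstract above is given below. It fits a PCA model by singular value decomposition of standardized normal-operation data and flags new samples whose squared prediction error (SPE) exceeds an empirical limit. The simulated data, the choice of the SPE statistic, the 99% empirical limit, and all function names are assumptions made for illustration; the report's actual plant model and detection statistics are not reproduced here.

```python
import numpy as np

def fit_pca_monitor(normal_data, n_components, quantile=0.99):
    """Fit a PCA base-case model from normal-operation data via SVD."""
    mean = normal_data.mean(axis=0)
    scale = normal_data.std(axis=0, ddof=1)
    Z = (normal_data - mean) / scale                  # standardize each variable
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                           # retained loadings
    residual = Z - Z @ P @ P.T
    spe = np.sum(residual**2, axis=1)
    limit = np.quantile(spe, quantile)                # empirical control limit
    return {"mean": mean, "scale": scale, "loadings": P, "limit": limit}

def squared_prediction_error(model, samples):
    """SPE of new samples: squared distance from the PCA subspace."""
    Z = (np.atleast_2d(samples) - model["mean"]) / model["scale"]
    P = model["loadings"]
    residual = Z - Z @ P @ P.T
    return np.sum(residual**2, axis=1)

# Simulated "normal operation": five measured variables driven by two factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
normal_data = latent @ mixing + 0.05 * rng.normal(size=(500, 5))
model = fit_pca_monitor(normal_data, n_components=2)

# A hypothetical fault: a bias on the third measurement breaks the correlation
# structure learned from normal operation.
faulty_sample = normal_data[0] + np.array([0.0, 0.0, 5.0, 0.0, 0.0])
fault_detected = squared_prediction_error(model, faulty_sample)[0] > model["limit"]
```

The design choice here is that a fault is detected not because any single variable is out of range, but because the sample no longer lies close to the low-dimensional subspace spanned by the retained loadings.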
ERIC Educational Resources Information Center
Chang, Chi-Cheng; Chou, Pao-Nan; Liang, Chaoyan
2018-01-01
The purpose of the present study was to examine the effects of the ePortfolio-based learning approach (ePBLA) on knowledge sharing and creation with 92 college students majoring in electrical engineering as the participants. Multivariate analysis of covariance (MANCOVA) with a covariance of pretest on knowledge sharing and creation was conducted…
Beiras, Ricardo; Durán, Iria
2014-12-01
Some relevant shortcomings have been identified in the current approach for the classification of ecological status in marine water bodies, leading to delays in the fulfillment of the Water Framework Directive objectives. Natural variability makes it difficult to set fixed reference values and boundary values for the Ecological Quality Ratios (EQR) for the biological quality elements. Biological responses to environmental degradation are frequently of nonmonotonic nature, hampering the EQR approach. Community structure traits respond only once ecological damage has already been done and do not provide early warning signals. An alternative methodology for the classification of ecological status integrating chemical measurements, ecotoxicological bioassays and community structure traits (species richness and diversity), and using multivariate analyses (multidimensional scaling and cluster analysis), is proposed. This approach does not depend on the arbitrary definition of fixed reference values and EQR boundary values, and it is suitable to integrate nonlinear, sensitive signals of ecological degradation. As a disadvantage, this approach demands the inclusion of sampling sites representing the full range of ecological status in each monitoring campaign. National or international agencies in charge of coastal pollution monitoring have comprehensive data sets available to overcome this limitation.
Yang, James J; Williams, L Keoki; Buu, Anne
2017-08-24
A multivariate genome-wide association test is proposed for analyzing data on multivariate quantitative phenotypes collected from related subjects. The proposed method is a two-step approach. The first step models the association between the genotype and marginal phenotype using a linear mixed model. The second step uses the correlation between residuals of the linear mixed model to estimate the null distribution of the Fisher combination test statistic. The simulation results show that the proposed method controls the type I error rate and is more powerful than the marginal tests across different population structures (admixed or non-admixed) and relatedness (related or independent). The statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that applying the multivariate association test may facilitate identification of the pleiotropic genes contributing to the risk for alcohol dependence commonly expressed by four correlated phenotypes. This study proposes a multivariate method for identifying pleiotropic genes while adjusting for cryptic relatedness and population structure between subjects. The two-step approach is not only powerful but also computationally efficient even when the number of subjects and the number of phenotypes are both very large.
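The second step of the method summarized above combines marginal p-values with Fisher's statistic, T = -2 Σ log p_i, whose null distribution is no longer chi-square when the phenotypes are correlated. The sketch below approximates that null by Monte Carlo simulation of correlated z-scores. This simulation scheme, the two-sided p-value convention, and every function name are assumptions for illustration; the abstract does not state how the authors estimate the null distribution from the residual correlation.

```python
import math
import numpy as np

def fisher_statistic(p_values):
    """Fisher combination statistic, summed over the last axis."""
    p = np.asarray(p_values, dtype=float)
    return -2.0 * np.sum(np.log(p), axis=-1)

def two_sided_p(z):
    """Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return np.vectorize(math.erfc)(np.abs(z) / math.sqrt(2.0))

def fisher_p_value_correlated(p_values, residual_corr, n_sim=20000, seed=0):
    """Monte Carlo p-value of Fisher's statistic under correlated tests.

    Assumption: under the null, the marginal test z-scores are multivariate
    normal with the correlation matrix of the mixed-model residuals.
    """
    rng = np.random.default_rng(seed)
    k = len(p_values)
    z_null = rng.multivariate_normal(np.zeros(k), residual_corr, size=n_sim)
    t_null = fisher_statistic(two_sided_p(z_null))
    t_obs = fisher_statistic(p_values)
    return (1 + np.sum(t_null >= t_obs)) / (n_sim + 1)

# Hypothetical marginal p-values for four phenotypes with pairwise correlation 0.5.
corr = np.full((4, 4), 0.5) + 0.5 * np.eye(4)
combined_p = fisher_p_value_correlated([0.001, 0.002, 0.004, 0.003], corr)
```

Using the chi-square reference distribution with 2k degrees of freedom would be valid only for independent tests; with positively correlated phenotypes it would tend to understate the p-value, which is why a correlation-aware null is needed.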
2011-01-01
Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook’s distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards. PMID:21966586
Keithley, Richard B; Wightman, R Mark
2011-06-07
Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook's distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards.
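For readers unfamiliar with principal component regression, a minimal NumPy sketch follows. The synthetic "voltammograms", the number of retained components, and the function names are hypothetical and stand in for real training sets; the authors' own software and their Cook's distance diagnostics are not reproduced here.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Principal component regression: regress y on the leading PC scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                      # retained principal directions
    scores = Xc @ V
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = V @ gamma                             # regression vector in signal space
    return x_mean, y_mean, beta

def predict_pcr(model, X):
    x_mean, y_mean, beta = model
    return (X - x_mean) @ beta + y_mean

# Synthetic training set: each row mixes two template signals (two analytes)
# in proportion to their concentrations, plus a little noise.
rng = np.random.default_rng(1)
templates = rng.normal(size=(2, 50))
concentrations = rng.uniform(0.0, 1.0, size=(40, 2))
X = concentrations @ templates + 0.01 * rng.normal(size=(40, 50))
y = concentrations[:, 0]                         # analyte of interest

model = fit_pcr(X, y, n_components=2)
y_hat = predict_pcr(model, X)
```

The vector `beta` has one entry per point of the measured signal, so it can be plotted in the same representation as the raw data. That is the sense in which a regression vector can be inspected to judge whether a calibration model is chemically plausible.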
Ma, Emily; Vetter, Joel; Bliss, Laura; Lai, H. Henry; Mysorekar, Indira U.
2016-01-01
Overactive bladder (OAB) is a common debilitating bladder condition with unknown etiology and limited diagnostic modalities. Here, we explored a novel high-throughput and unbiased multiplex approach with cellular and molecular components in a well-characterized patient cohort to identify biomarkers that could be reliably used to distinguish OAB from controls or provide insights into underlying etiology. As a secondary analysis, we determined whether this method could discriminate between OAB and other chronic bladder conditions. We analyzed plasma samples from healthy volunteers (n = 19) and patients diagnosed with OAB, interstitial cystitis/bladder pain syndrome (IC/BPS), or urinary tract infections (UTI; n = 51) for proinflammatory, chemokine, cytokine, angiogenesis, and vascular injury factors using Meso Scale Discovery (MSD) analysis and urinary cytological analysis. Wilcoxon rank-sum tests were used to perform univariate and multivariate comparisons between patient groups (controls, OAB, IC/BPS, and UTI). Multivariate logistic regression models were fit for each MSD analyte on 1) OAB patients and controls, 2) OAB and IC/BPS patients, and 3) OAB and UTI patients. Age, race, and sex were included as independent variables in all multivariate analyses. Receiver operating characteristic (ROC) curves were generated to determine the diagnostic potential of a given analyte. Our findings demonstrate that five analytes, i.e., interleukin 4, TNF-α, macrophage inflammatory protein-1β, serum amyloid A, and Tie2 can reliably differentiate OAB relative to controls and can be used to distinguish OAB from the other conditions. Together, our pilot study suggests a molecular imbalance in inflammatory proteins may contribute to OAB pathogenesis. PMID:27029431
Advanced multivariate analysis to assess remediation of hydrocarbons in soils.
Lin, Deborah S; Taylor, Peter; Tibbett, Mark
2014-10-01
Accurate monitoring of degradation levels in soils is essential in order to understand and achieve complete degradation of petroleum hydrocarbons in contaminated soils. We aimed to develop the use of multivariate methods for the monitoring of biodegradation of diesel in soils and to determine if diesel contaminated soils could be remediated to a chemical composition similar to that of an uncontaminated soil. An incubation experiment was set up with three contrasting soil types. Each soil was exposed to diesel at varying stages of degradation and then analysed for key hydrocarbons throughout 161 days of incubation. Hydrocarbon distributions were analysed by Principal Coordinate Analysis and similar samples grouped by cluster analysis. Variation and differences between samples were determined using permutational multivariate analysis of variance. It was found that all soils followed trajectories approaching the chemical composition of the unpolluted soil. Some contaminated soils were no longer significantly different to that of uncontaminated soil after 161 days of incubation. The use of cluster analysis allows the assignment of a percentage chemical similarity of a diesel contaminated soil to an uncontaminated soil sample. This will aid in the monitoring of hydrocarbon contaminated sites and the establishment of potential endpoints for successful remediation.
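Principal Coordinate Analysis, mentioned in the abstract above, can be sketched as classical scaling of a distance matrix. The example assumes Euclidean distances between made-up hydrocarbon profiles; the distance measure actually used in the study is not stated in the abstract, and the function name is hypothetical.

```python
import numpy as np

def principal_coordinates(distance, n_axes=2):
    """Classical PCoA: eigendecomposition of the double-centered
    squared-distance matrix."""
    D2 = np.asarray(distance, dtype=float) ** 2
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ D2 @ J                        # Gower-centered matrix
    eigval, eigvec = np.linalg.eigh(B)
    order = np.argsort(eigval)[::-1][:n_axes]    # largest eigenvalues first
    eigval, eigvec = eigval[order], eigvec[:, order]
    coords = eigvec * np.sqrt(np.clip(eigval, 0.0, None))
    return coords, eigval

def pairwise_euclidean(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff**2, axis=-1))

# Made-up profiles: 4 soil samples x 5 hydrocarbon concentrations.
rng = np.random.default_rng(2)
profiles = rng.uniform(0.0, 10.0, size=(4, 5))
distances = pairwise_euclidean(profiles)
coords, eigenvalues = principal_coordinates(distances, n_axes=3)
```

With Euclidean input distances and enough axes, the coordinates reproduce the original distances exactly, so the trajectory of a contaminated sample towards an uncontaminated one can be read directly from the ordination.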
Li, Min; Zhang, Lu; Yao, Xiaolong; Jiang, Xingyu
2017-01-01
The emerging membrane introduction mass spectrometry technique has been successfully used to detect benzene, toluene, ethyl benzene and xylene (BTEX), while overlapped spectra have unfortunately hindered its further application to the analysis of mixtures. Multivariate calibration, an efficient method to analyze mixtures, has been widely applied. In this paper, we compared univariate and multivariate analyses for quantification of the individual components of mixture samples. The results showed that the univariate analysis creates poor models with regression coefficients of 0.912, 0.867, 0.440 and 0.351 for BTEX, respectively. For multivariate analysis, a comparison to the partial-least squares (PLS) model shows that the orthogonal partial-least squares (OPLS) regression exhibits an optimal performance with regression coefficients of 0.995, 0.999, 0.980 and 0.976, favorable calibration parameters (RMSEC and RMSECV) and a favorable validation parameter (RMSEP). Furthermore, the OPLS exhibits a good recovery of 73.86 - 122.20% and relative standard deviation (RSD) of the repeatability of 1.14 - 4.87%. Thus, MIMS coupled with the OPLS regression provides an optimal approach for a quantitative BTEX mixture analysis in monitoring and predicting water pollution.
Network structure of multivariate time series.
Lacasa, Lucas; Nicosia, Vincenzo; Latora, Vito
2015-10-21
Our understanding of a variety of phenomena in physics, biology and economics crucially depends on the analysis of multivariate time series. While a wide range of tools and techniques for time series analysis already exist, the increasing availability of massive data structures calls for new approaches for multidimensional signal processing. We present here a non-parametric method to analyse multivariate time series, based on the mapping of a multidimensional time series into a multilayer network, which allows information on a high dimensional dynamical system to be extracted through the analysis of the structure of the associated multiplex network. The method is simple to implement, general, scalable, does not require ad hoc phase space partitioning, and is thus suitable for the analysis of large, heterogeneous and non-stationary time series. We show that simple structural descriptors of the associated multiplex networks allow one to extract and quantify nontrivial properties of coupled chaotic maps, including the transition between different dynamical phases and the onset of various types of synchronization. As a concrete example we then study financial time series, showing that a multiplex network analysis can efficiently discriminate crises from periods of financial stability, where standard methods based on time-series symbolization often fail.
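One way to picture the mapping from a multivariate series to a multiplex network is sketched below. It assumes that each component of the series is turned into a horizontal visibility graph and that each graph becomes one layer; the abstract does not name the mapping, so this choice, the overlap descriptor, and the function names should be read as illustrative assumptions.

```python
def horizontal_visibility_edges(series):
    """Edges (i, j) such that every value strictly between positions i and j
    is lower than min(series[i], series[j])."""
    edges = set()
    n = len(series)
    for i in range(n - 1):
        highest_between = float("-inf")
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.add((i, j))
            highest_between = max(highest_between, series[j])
            if highest_between >= series[i]:
                break  # nothing beyond j can be seen from i
    return edges

def average_edge_overlap(layers):
    """Mean fraction of layers in which an edge appears, taken over all
    edges present in at least one layer."""
    all_edges = set().union(*layers)
    total = sum(sum(edge in layer for layer in layers) for edge in all_edges)
    return total / (len(layers) * len(all_edges))

# A toy two-component series: one layer per component, with shared node labels
# (the time indices).
multivariate_series = [[3, 1, 2], [1, 3, 2]]
multiplex = [horizontal_visibility_edges(component) for component in multivariate_series]
overlap = average_edge_overlap(multiplex)
```

Because all layers share the same node set, descriptors such as the edge overlap compare the structure of different components without any partitioning of phase space, which matches the non-parametric character claimed in the abstract.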
Samuel A. Cushman; Kevin McGarigal
2007-01-01
Integrating temporal variability into spatial analyses is one of the abiding challenges in landscape ecology. In this chapter we use landscape trajectory analysis to assess changes in landscape patterns over time. Landscape trajectory analysis is an approach to quantify changes in landscape structure over time. There are three key concepts which underlie the...
Popp, Oliver; Müller, Dirk; Didzus, Katharina; Paul, Wolfgang; Lipsmeier, Florian; Kirchner, Florian; Niklas, Jens; Mauch, Klaus; Beaucamp, Nicola
2016-09-01
In-depth characterization of high-producer cell lines and bioprocesses is vital to ensure robust and consistent production of recombinant therapeutic proteins in high quantity and quality for clinical applications. This requires applying appropriate methods during bioprocess development to enable meaningful characterization of CHO clones and processes. Here, we present a novel hybrid approach for supporting comprehensive characterization of metabolic clone performance. The approach combines metabolite profiling with multivariate data analysis and fluxomics to enable a data-driven mechanistic analysis of key metabolic traits associated with desired cell phenotypes. We applied the methodology to quantify and compare metabolic performance in a set of 10 recombinant CHO-K1 producer clones and a host cell line. The comprehensive characterization enabled us to derive an extended set of clone performance criteria that not only captured growth and product formation, but also incorporated information on intracellular clone physiology and on metabolic changes during the process. These criteria served to establish a quantitative clone ranking and allowed us to identify metabolic differences between high-producing CHO-K1 clones yielding comparably high product titers. Through multivariate data analysis of the combined metabolite and flux data we uncovered common metabolic traits characteristic of high-producer clones in the screening setup. This included high intracellular rates of glutamine synthesis, low cysteine uptake, reduced excretion of aspartate and glutamate, and low intracellular degradation rates of branched-chain amino acids and of histidine. Finally, the above approach was integrated into a workflow that enables standardized high-content selection of CHO producer clones in a high-throughput fashion. 
In conclusion, the combination of quantitative metabolite profiling, multivariate data analysis, and mechanistic network model simulations can identify metabolic traits characteristic of high-performance clones and enables informed decisions on which clones provide a good match for a particular process platform. The proposed approach also provides a mechanistic link between observed clone phenotype, process setup, and feeding regimes, and thereby offers concrete starting points for subsequent process optimization. Biotechnol. Bioeng. 2016;113: 2005-2019. © 2016 Wiley Periodicals, Inc.
Predicting trauma patient mortality: ICD [or ICD-10-AM] versus AIS based approaches.
Willis, Cameron D; Gabbe, Belinda J; Jolley, Damien; Harrison, James E; Cameron, Peter A
2010-11-01
The International Classification of Diseases Injury Severity Score (ICISS) has been proposed as an International Classification of Diseases (ICD)-10-based alternative to mortality prediction tools that use Abbreviated Injury Scale (AIS) data, including the Trauma and Injury Severity Score (TRISS). To date, studies have not examined the performance of ICISS using Australian trauma registry data. This study aimed to compare the performance of ICISS with other mortality prediction tools in an Australian trauma registry. This was a retrospective review of prospectively collected data from the Victorian State Trauma Registry. A training dataset was created for model development and a validation dataset for evaluation. The multiplicative ICISS model was compared with a worst injury ICISS approach, Victorian TRISS (V-TRISS, using local coefficients), maximum AIS severity and a multivariable model including ICD-10-AM codes as predictors. Models were investigated for discrimination (C-statistic) and calibration (Hosmer-Lemeshow statistic). The multivariable approach had the highest level of discrimination (C-statistic 0.90) and calibration (H-L 7.65, P = 0.468). Worst injury ICISS, V-TRISS and maximum AIS had similar performance. The multiplicative ICISS produced the lowest level of discrimination (C-statistic 0.80) and poorest calibration (H-L 50.23, P < 0.001). The performance of ICISS may be affected by the data used to develop estimates, the ICD version employed, the methods for deriving estimates and the inclusion of covariates. In this analysis, a multivariable approach using ICD-10-AM codes was the best-performing method. A multivariable ICISS approach may therefore be a useful alternative to AIS-based methods and may have comparable predictive performance to locally derived TRISS models. © 2010 The Authors. ANZ Journal of Surgery © 2010 Royal Australasian College of Surgeons.
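The difference between the multiplicative and worst-injury ICISS variants compared above can be shown in a few lines. The sketch uses the usual definition of a survival risk ratio (SRR) as the proportion of training patients with a given code who survived; the placeholder codes and counts are invented and do not come from the Victorian State Trauma Registry.

```python
from math import prod

def survival_risk_ratios(training_cases):
    """SRR per injury code: survivors with the code / all patients with the code.

    `training_cases` is an iterable of (codes, survived) pairs.
    """
    seen, survivors = {}, {}
    for codes, survived in training_cases:
        for code in set(codes):
            seen[code] = seen.get(code, 0) + 1
            survivors[code] = survivors.get(code, 0) + int(survived)
    return {code: survivors[code] / seen[code] for code in seen}

def multiplicative_iciss(codes, srr):
    """Product of the SRRs of all of a patient's injury codes."""
    return prod(srr[code] for code in codes)

def worst_injury_iciss(codes, srr):
    """SRR of the single most lethal injury (the lowest SRR)."""
    return min(srr[code] for code in codes)

# Invented training data with placeholder codes CODE_A and CODE_B.
training = (
    [(["CODE_A"], True)] * 9 + [(["CODE_A"], False)]
    + [(["CODE_B"], True)] * 4 + [(["CODE_B"], False)]
)
srr = survival_risk_ratios(training)
patient = ["CODE_A", "CODE_B"]
score_product = multiplicative_iciss(patient, srr)   # 0.9 * 0.8
score_worst = worst_injury_iciss(patient, srr)       # min(0.9, 0.8)
```

The multiplicative score treats injuries as independent contributions to the risk of death, so it can only fall as more codes are recorded, whereas the worst-injury score ignores all but the most severe injury.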
Relevant Feature Set Estimation with a Knock-out Strategy and Random Forests
Ganz, Melanie; Greve, Douglas N.; Fischl, Bruce; Konukoglu, Ender
2015-01-01
Group analysis of neuroimaging data is a vital tool for identifying anatomical and functional variations related to diseases as well as normal biological processes. The analyses are often performed on a large number of highly correlated measurements using a relatively smaller number of samples. Despite the correlation structure, the most widely used approach is to analyze the data using univariate methods followed by post-hoc corrections that try to account for the data’s multivariate nature. Although widely used, this approach may fail to recover from the adverse effects of the initial analysis when local effects are not strong. Multivariate pattern analysis (MVPA) is a powerful alternative to the univariate approach for identifying relevant variations. Jointly analyzing all the measures, MVPA techniques can detect global effects even when individual local effects are too weak to detect with univariate analysis. Current approaches are successful in identifying variations that yield highly predictive and compact models. However, they suffer from lessened sensitivity and instabilities in identification of relevant variations. Furthermore, current methods’ user-defined parameters are often unintuitive and difficult to determine. In this article, we propose a novel MVPA method for group analysis of high-dimensional data that overcomes the drawbacks of the current techniques. Our approach explicitly aims to identify all relevant variations using a “knock-out” strategy and the Random Forest algorithm. In evaluations with synthetic datasets the proposed method achieved substantially higher sensitivity and accuracy than the state-of-the-art MVPA methods, and outperformed the univariate approach when the effect size is low. In experiments with real datasets the proposed method identified regions beyond the univariate approach, while other MVPA methods failed to replicate the univariate results. 
More importantly, in a reproducibility study with the well-known ADNI dataset the proposed method yielded higher stability and power than the univariate approach. PMID:26272728
Jha, Dilip Kumar; Vinithkumar, Nambali Valsalan; Sahu, Biraja Kumar; Dheenan, Palaiya Sukumaran; Das, Apurba Kumar; Begum, Mehmuna; Devi, Marimuthu Prashanthi; Kirubagaran, Ramalingam
2015-07-15
Chidiyatappu Bay is one of the least disturbed marine environments of the Andaman & Nicobar Islands, a union territory of India. Oceanic flushing from the southeast and northwest directions is prevalent in this bay. Further, anthropogenic activity is minimal in the adjoining environment. Considering the pristine nature of this bay, seawater samples collected from 12 sampling stations covering three seasons were analyzed. Principal Component Analysis (PCA) explained 69.9% of the total variance and exhibited strong factor loadings for nitrite, chlorophyll a and phaeophytin. In addition, analysis of variance (ANOVA-one way), regression analysis, box-whisker plots and Geographical Information System based hot spot analysis further simplified and supported multivariate results. The results obtained are important to establish reference conditions for comparative study with other similar ecosystems in the region. Copyright © 2015 Elsevier Ltd. All rights reserved.
Estimation of failure criteria in multivariate sensory shelf life testing using survival analysis.
Giménez, Ana; Gagliardi, Andrés; Ares, Gastón
2017-09-01
For most food products, shelf life is determined by changes in their sensory characteristics. A predetermined increase or decrease in the intensity of a sensory characteristic has frequently been used to signal that a product has reached the end of its shelf life. Considering that all attributes change simultaneously, the concept of multivariate shelf life allows a single measurement of deterioration that takes into account all these sensory changes at a certain storage time. The aim of the present work was to apply survival analysis to estimate failure criteria in multivariate sensory shelf life testing using two case studies, hamburger buns and orange juice, by modelling the relationship between consumers' rejection of the product and the deterioration index estimated using PCA. In both studies, a panel of 13 trained assessors evaluated the samples using descriptive analysis whereas a panel of 100 consumers answered a "yes" or "no" question regarding intention to buy or consume the product. PC1 explained the great majority of the variance, indicating that all sensory characteristics evolved similarly with storage time. Thus, PC1 could be regarded as an index of sensory deterioration and a single failure criterion could be estimated through survival analysis for 25% and 50% consumer rejection. The proposed approach based on multivariate shelf life testing may increase the accuracy of shelf life estimations. Copyright © 2017 Elsevier Ltd. All rights reserved.
A Bayesian approach to multivariate measurement system assessment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hamada, Michael Scott
This article considers system assessment for multivariate measurements and presents a Bayesian approach to analyzing gauge R&R study data. The evaluation of variances for univariate measurement becomes the evaluation of covariance matrices for multivariate measurements. The Bayesian approach ensures positive definite estimates of the covariance matrices and easily provides their uncertainty. Furthermore, various measurement system assessment criteria are easily evaluated. The approach is illustrated with data from a real gauge R&R study as well as simulated data.
A Bayesian approach to multivariate measurement system assessment
Hamada, Michael Scott
2016-07-01
This article considers system assessment for multivariate measurements and presents a Bayesian approach to analyzing gauge R&R study data. The evaluation of variances for univariate measurement becomes the evaluation of covariance matrices for multivariate measurements. The Bayesian approach ensures positive definite estimates of the covariance matrices and easily provides their uncertainty. Furthermore, various measurement system assessment criteria are easily evaluated. The approach is illustrated with data from a real gauge R&R study as well as simulated data.
Exploring High-D Spaces with Multiform Matrices and Small Multiples
MacEachren, Alan; Dai, Xiping; Hardisty, Frank; Guo, Diansheng; Lengerich, Gene
2011-01-01
We introduce an approach to visual analysis of multivariate data that integrates several methods from information visualization, exploratory data analysis (EDA), and geovisualization. The approach leverages the component-based architecture implemented in GeoVISTA Studio to construct a flexible, multiview, tightly (but generically) coordinated, EDA toolkit. This toolkit builds upon traditional ideas behind both small multiples and scatterplot matrices in three fundamental ways. First, we develop a general, MultiForm, Bivariate Matrix and a complementary MultiForm, Bivariate Small Multiple plot in which different bivariate representation forms can be used in combination. We demonstrate the flexibility of this approach with matrices and small multiples that depict multivariate data through combinations of: scatterplots, bivariate maps, and space-filling displays. Second, we apply a measure of conditional entropy to (a) identify variables from a high-dimensional data set that are likely to display interesting relationships and (b) generate a default order of these variables in the matrix or small multiple display. Third, we add conditioning, a kind of dynamic query/filtering in which supplementary (undisplayed) variables are used to constrain the view onto variables that are displayed. Conditioning allows the effects of one or more well understood variables to be removed from the analysis, making relationships among remaining variables easier to explore. We illustrate the individual and combined functionality enabled by this approach through application to analysis of cancer diagnosis and mortality data and their associated covariates and risk factors. PMID:21947129
Pleiotropy Analysis of Quantitative Traits at Gene Level by Multivariate Functional Linear Models
Wang, Yifan; Liu, Aiyi; Mills, James L.; Boehnke, Michael; Wilson, Alexander F.; Bailey-Wilson, Joan E.; Xiong, Momiao; Wu, Colin O.; Fan, Ruzong
2015-01-01
In genetics, pleiotropy describes the genetic effect of a single gene on multiple phenotypic traits. A common approach is to analyze the phenotypic traits separately using univariate analyses and combine the test results through multiple comparisons. This approach may lead to low power. Multivariate functional linear models are developed to connect genetic variant data to multiple quantitative traits adjusting for covariates for a unified analysis. Three types of approximate F-distribution tests based on Pillai–Bartlett trace, Hotelling–Lawley trace, and Wilks’s Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and optimal sequence kernel association test (SKAT-O). Extensive simulations were performed to evaluate the false positive rates and power performance of the proposed models and tests. We show that the approximate F-distribution tests control the type I error rates very well. Overall, simultaneous analysis of multiple traits can increase power performance compared to an individual test of each trait. The proposed methods were applied to analyze (1) four lipid traits in eight European cohorts, and (2) three biochemical traits in the Trinity Students Study. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and SKAT-O for the three biochemical traits. The approximate F-distribution tests of the proposed functional linear models are more sensitive than those of the traditional multivariate linear models that in turn are more sensitive than SKAT-O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detects more association than SKAT-O in the univariate case. PMID:25809955
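The three test statistics named in this abstract are standard functions of the eigenvalues of E⁻¹H, where H and E are the hypothesis and error sums-of-squares-and-cross-products matrices. The following is a minimal Python sketch of those definitions only; it assumes the eigenvalues are already available, omits the conversion to approximate F statistics, and uses a hypothetical function name and made-up eigenvalues rather than anything from the paper.

```python
from math import prod


def manova_statistics(eigenvalues):
    """Classical MANOVA statistics from the eigenvalues of E^-1 H."""
    # Pillai-Bartlett trace: sum of lambda / (1 + lambda)
    pillai = sum(lam / (1.0 + lam) for lam in eigenvalues)
    # Hotelling-Lawley trace: sum of the eigenvalues
    hotelling_lawley = sum(eigenvalues)
    # Wilks's Lambda: product of 1 / (1 + lambda)
    wilks = prod(1.0 / (1.0 + lam) for lam in eigenvalues)
    return {
        "pillai": pillai,
        "hotelling_lawley": hotelling_lawley,
        "wilks": wilks,
    }


# Hypothetical eigenvalues, for illustration only
stats = manova_statistics([1.0, 0.25])
print(stats)
```

Larger Pillai-Bartlett and Hotelling-Lawley values, and smaller values of Wilks's Lambda, indicate stronger evidence of association.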
Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models.
Wang, Yifan; Liu, Aiyi; Mills, James L; Boehnke, Michael; Wilson, Alexander F; Bailey-Wilson, Joan E; Xiong, Momiao; Wu, Colin O; Fan, Ruzong
2015-05-01
In genetics, pleiotropy describes the genetic effect of a single gene on multiple phenotypic traits. A common approach is to analyze the phenotypic traits separately using univariate analyses and combine the test results through multiple comparisons. This approach may lead to low power. Multivariate functional linear models are developed to connect genetic variant data to multiple quantitative traits adjusting for covariates for a unified analysis. Three types of approximate F-distribution tests based on Pillai-Bartlett trace, Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and optimal sequence kernel association test (SKAT-O). Extensive simulations were performed to evaluate the false positive rates and power performance of the proposed models and tests. We show that the approximate F-distribution tests control the type I error rates very well. Overall, simultaneous analysis of multiple traits can increase power performance compared to an individual test of each trait. The proposed methods were applied to analyze (1) four lipid traits in eight European cohorts, and (2) three biochemical traits in the Trinity Students Study. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and SKAT-O for the three biochemical traits. The approximate F-distribution tests of the proposed functional linear models are more sensitive than those of the traditional multivariate linear models that in turn are more sensitive than SKAT-O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detects more association than SKAT-O in the univariate case. © 2015 WILEY PERIODICALS, INC.
Multivariate longitudinal data analysis with mixed effects hidden Markov models.
Raffa, Jesse D; Dubin, Joel A
2015-09-01
Multiple longitudinal responses are often collected as a means to capture relevant features of the true outcome of interest, which is often hidden and not directly measurable. We outline an approach which models these multivariate longitudinal responses as generated from a hidden disease process. We propose a class of models which uses a hidden Markov model with separate but correlated random effects between multiple longitudinal responses. This approach was motivated by a smoking cessation clinical trial, where a bivariate longitudinal response involving both a continuous and a binomial response was collected for each participant to monitor smoking behavior. A Bayesian method using Markov chain Monte Carlo is used. Comparison of separate univariate response models to the bivariate response models was undertaken. Our methods are demonstrated on the smoking cessation clinical trial dataset, and properties of our approach are examined through extensive simulation studies. © 2015, The International Biometric Society.
Chen, Gang; Adleman, Nancy E.; Saad, Ziad S.; Leibenluft, Ellen; Cox, Robert W.
2014-01-01
All neuroimaging packages can handle group analysis with t-tests or general linear modeling (GLM). However, they are quite hamstrung when there are multiple within-subject factors or when quantitative covariates are involved in the presence of a within-subject factor. In addition, sphericity is typically assumed for the variance–covariance structure when there are more than two levels in a within-subject factor. To overcome such limitations in the traditional AN(C)OVA and GLM, we adopt a multivariate modeling (MVM) approach to analyzing neuroimaging data at the group level with the following advantages: a) there is no limit on the number of factors as long as sample sizes are deemed appropriate; b) quantitative covariates can be analyzed together with within-subject factors; c) when a within-subject factor is involved, three testing methodologies are provided: traditional univariate testing (UVT) with sphericity assumption (UVT-UC) and with correction when the assumption is violated (UVT-SC), and within-subject multivariate testing (MVT-WS); d) to correct for sphericity violation at the voxel level, we propose a hybrid testing (HT) approach that achieves equal or higher power via combining traditional sphericity correction methods (Greenhouse–Geisser and Huynh–Feldt) with MVT-WS. PMID:24954281
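As background to the sphericity corrections mentioned above, the Greenhouse-Geisser epsilon can be computed from the covariance matrix of the repeated measures. The sketch below uses the textbook formula on a double-centred covariance matrix; it is not the authors' hybrid test, and the function name and example matrices are illustrative assumptions.

```python
def greenhouse_geisser_epsilon(cov):
    """Greenhouse-Geisser epsilon for a k x k covariance matrix (list of lists)."""
    k = len(cov)
    grand = sum(sum(row) for row in cov) / (k * k)
    row_means = [sum(row) / k for row in cov]
    col_means = [sum(cov[i][j] for i in range(k)) / k for j in range(k)]
    # Double-centre: subtract row and column means, add back the grand mean.
    centred = [
        [cov[i][j] - row_means[i] - col_means[j] + grand for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centred[i][i] for i in range(k))
    sum_sq = sum(v * v for row in centred for v in row)
    # epsilon = tr(S)^2 / ((k - 1) * sum of squared entries of S)
    return trace ** 2 / ((k - 1) * sum_sq)


# Compound symmetry satisfies sphericity, so epsilon is 1 (up to rounding).
spherical = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
# Unequal variances violate sphericity, so epsilon drops below 1.
non_spherical = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 10.0]]
print(greenhouse_geisser_epsilon(spherical))
print(greenhouse_geisser_epsilon(non_spherical))
```

Epsilon lies between 1/(k-1) and 1; in the usual correction, the degrees of freedom of the univariate F-test are multiplied by it.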
Graphite Web: web tool for gene set analysis exploiting pathway topology
Sales, Gabriele; Calura, Enrica; Martini, Paolo; Romualdi, Chiara
2013-01-01
Graphite web is a novel web tool for pathway analyses and network visualization for gene expression data of both microarray and RNA-seq experiments. Several pathway analyses have been proposed either in the univariate or in the global and multivariate context to tackle the complexity and the interpretation of expression results. These methods can be further divided into ‘topological’ and ‘non-topological’ methods according to their ability to gain power from pathway topology. Biological pathways are, in fact, not only gene lists but can be represented through a network where genes and connections are, respectively, nodes and edges. To this day, the most used approaches are non-topological and univariate although they miss the relationship among genes. On the contrary, topological and multivariate approaches are more powerful, but difficult to be used by researchers without bioinformatic skills. Here we present Graphite web, the first public web server for pathway analysis on gene expression data that combines topological and multivariate pathway analyses with an efficient system of interactive network visualizations for easy results interpretation. Specifically, Graphite web implements five different gene set analyses on three model organisms and two pathway databases. Graphite Web is freely available at http://graphiteweb.bio.unipd.it/. PMID:23666626
Multivariate proteomic profiling identifies novel accessory proteins of coated vesicles
Antrobus, Robin; Hirst, Jennifer; Bhumbra, Gary S.; Kozik, Patrycja; Jackson, Lauren P.; Sahlender, Daniela A.
2012-01-01
Despite recent advances in mass spectrometry, proteomic characterization of transport vesicles remains challenging. Here, we describe a multivariate proteomics approach to analyzing clathrin-coated vesicles (CCVs) from HeLa cells. siRNA knockdown of coat components and different fractionation protocols were used to obtain modified coated vesicle-enriched fractions, which were compared by stable isotope labeling of amino acids in cell culture (SILAC)-based quantitative mass spectrometry. 10 datasets were combined through principal component analysis into a “profiling” cluster analysis. Overall, 136 CCV-associated proteins were predicted, including 36 new proteins. The method identified >93% of established CCV coat proteins and assigned >91% correctly to intracellular or endocytic CCVs. Furthermore, the profiling analysis extends to less well characterized types of coated vesicles, and we identify and characterize the first AP-4 accessory protein, which we have named tepsin. Finally, our data explain how sequestration of TACC3 in cytosolic clathrin cages causes the severe mitotic defects observed in auxilin-depleted cells. The profiling approach can be adapted to address related cell and systems biological questions. PMID:22472443
NASA Technical Reports Server (NTRS)
Schierman, John D.; Lovell, T. A.; Schmidt, David K.
1993-01-01
Three multivariable robustness analysis methods are compared and contrasted. The focus of the analysis is on system stability and performance robustness to uncertainty in the coupling dynamics between two interacting subsystems. Of particular interest is interacting airframe and engine subsystems, and an example airframe/engine vehicle configuration is utilized in the demonstration of these approaches. The singular value (SV) and structured singular value (SSV) analysis methods are compared to a method especially well suited for analysis of robustness to uncertainties in subsystem interactions. This approach is referred to here as the interacting subsystem (IS) analysis method. This method has been used previously to analyze airframe/engine systems, emphasizing the study of stability robustness. However, performance robustness is also investigated here, and a new measure of allowable uncertainty for acceptable performance robustness is introduced. The IS methodology does not require plant uncertainty models to measure the robustness of the system, and is shown to yield valuable information regarding the effects of subsystem interactions. In contrast, the SV and SSV methods allow for the evaluation of the robustness of the system to particular models of uncertainty, and do not directly indicate how the airframe (engine) subsystem interacts with the engine (airframe) subsystem.
A Robust Bayesian Approach for Structural Equation Models with Missing Data
ERIC Educational Resources Information Center
Lee, Sik-Yum; Xia, Ye-Mao
2008-01-01
In this paper, normal/independent distributions, including but not limited to the multivariate t distribution, the multivariate contaminated distribution, and the multivariate slash distribution, are used to develop a robust Bayesian approach for analyzing structural equation models with complete or missing data. In the context of a nonlinear…
NASA Astrophysics Data System (ADS)
Candefjord, Stefan; Nyberg, Morgan; Jalkanen, Ville; Ramser, Kerstin; Lindahl, Olof A.
2010-12-01
Tissue characterization is fundamental for identification of pathological conditions. Raman spectroscopy (RS) and tactile resonance measurement (TRM) are two promising techniques that measure biochemical content and stiffness, respectively. They have the potential to complement the gold standard, histological analysis. By combining RS and TRM, complementary information about tissue content can be obtained and specific drawbacks can be avoided. The aim of this study was to develop a multivariate approach to compare RS and TRM information. The approach was evaluated on measurements at the same points on porcine abdominal tissue. The measurement points were divided into five groups by multivariate analysis of the RS data. A regression analysis was performed and receiver operating characteristic (ROC) curves were used to compare the RS and TRM data. TRM identified one group efficiently (area under ROC curve 0.99). The RS data showed that the proportion of saturated fat was high in this group. The regression analysis showed that stiffness was mainly determined by the amount of fat and its composition. We concluded that RS provided additional, important information for tissue identification that was not provided by TRM alone. The results are promising for development of a method combining RS and TRM for intraoperative tissue characterization.
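The area under the ROC curve reported above can be read as the probability that a randomly chosen measurement from the target group scores higher than a randomly chosen measurement from the other groups. A minimal Python sketch of that pairwise definition follows; the function name and scores are hypothetical and unrelated to the study's data.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve by pairwise comparison (ties count one half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical stiffness-like scores: target group vs. all other groups
auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation.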
Goode, C; LeRoy, J; Allen, D G
2007-01-01
This study reports on a multivariate analysis of the moving bed biofilm reactor (MBBR) wastewater treatment system at a Canadian pulp mill. The modelling approach involved a data overview by principal component analysis (PCA) followed by partial least squares (PLS) modelling with the objective of explaining and predicting changes in the BOD output of the reactor. Over two years of data with 87 process measurements were used to build the models. Variables were collected from the MBBR control scheme as well as upstream in the bleach plant and in digestion. To account for process dynamics, a variable lagging approach was used for variables with significant temporal correlations. It was found that wood type pulped at the mill was a significant variable governing reactor performance. Other important variables included flow parameters, faults in the temperature or pH control of the reactor, and some potential indirect indicators of biomass activity (residual nitrogen and pH out). The most predictive model was found to have an RMSEP value of 606 kgBOD/d, representing a 14.5% average error. This was a good fit, given the measurement error of the BOD test. Overall, the statistical approach was effective in describing and predicting MBBR treatment performance.
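For reference, the root mean square error of prediction (RMSEP) quoted above is the square root of the mean squared difference between observed and predicted values on data not used to fit the model. The sketch below also expresses it relative to the mean observed value; whether the abstract's "average error" was defined this way is an assumption, and the numbers are invented for illustration.

```python
from math import sqrt


def rmsep(observed, predicted):
    """Root mean square error of prediction over paired observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


def relative_rmsep(observed, predicted):
    """RMSEP as a fraction of the mean observed value (assumed definition)."""
    return rmsep(observed, predicted) / (sum(observed) / len(observed))


# Hypothetical BOD loads in kgBOD/d
observed = [4000.0, 4200.0, 4400.0]
predicted = [4100.0, 4200.0, 4300.0]
print(rmsep(observed, predicted), relative_rmsep(observed, predicted))
```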
Goudey, Benjamin; Abedini, Mani; Hopper, John L; Inouye, Michael; Makalic, Enes; Schmidt, Daniel F; Wagner, John; Zhou, Zeyu; Zobel, Justin; Reumann, Matthias
2015-01-01
Genome-wide association studies (GWAS) are a common approach for systematic discovery of single nucleotide polymorphisms (SNPs) which are associated with a given disease. Univariate analysis approaches commonly employed may miss important SNP associations that only appear through multivariate analysis in complex diseases. However, multivariate SNP analysis is currently limited by its inherent computational complexity. In this work, we present a computational framework that harnesses supercomputers. Based on our results, we estimate that a three-way interaction analysis on 1.1 million SNP GWAS data would require over 5.8 years on the full "Avoca" IBM Blue Gene/Q installation at the Victorian Life Sciences Computation Initiative. This is hundreds of times faster than estimates for other CPU based methods and four times faster than runtimes estimated for GPU methods, indicating how the improvement in the level of hardware applied to interaction analysis may alter the types of analysis that can be performed. Furthermore, the same analysis would take under 3 months on the currently largest IBM Blue Gene/Q supercomputer "Sequoia" at the Lawrence Livermore National Laboratory, assuming linear scaling is maintained as our results suggest. Given that the implementation used in this study can be further optimised, this runtime means it is becoming feasible to carry out exhaustive analysis of higher order interaction studies on large modern GWAS.
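The scale of the problem described above follows from simple counting: an exhaustive three-way analysis tests every unordered triple of SNPs. The Python sketch below counts those triples and applies the linear-scaling assumption mentioned in the abstract. The core counts (65,536 for Avoca and 1,572,864 for Sequoia) are commonly cited figures that do not appear in the abstract and are treated here as assumptions.

```python
from math import comb

n_snps = 1_100_000

# Number of unordered SNP pairs and triples in an exhaustive scan
n_pairs = comb(n_snps, 2)
n_triples = comb(n_snps, 3)
print(f"pairs:   {n_pairs:.3e}")
print(f"triples: {n_triples:.3e}")


def scaled_runtime(runtime, cores_from, cores_to):
    """Runtime on a larger machine, assuming perfectly linear scaling."""
    return runtime * cores_from / cores_to


# Assumed core counts (not stated in the abstract)
AVOCA_CORES = 65_536
SEQUOIA_CORES = 1_572_864

months_on_avoca = 5.8 * 12  # the abstract's estimate of 5.8 years
months_on_sequoia = scaled_runtime(months_on_avoca, AVOCA_CORES, SEQUOIA_CORES)
print(f"estimated months on Sequoia: {months_on_sequoia:.1f}")
```

Under these assumed core counts the scaled estimate comes out just below three months, in line with the abstract's figure.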
The Multivariate Structure of Communication Avoidance.
ERIC Educational Resources Information Center
Bell, Robert A.
1986-01-01
Clarifies the nature of communication avoidance through a structural analysis grounded in facet theory. Presents evidence for a duplex model of avoidance in which theoretical distinctions among modalities of approach-avoidance and context proved empirically relevant. Discusses implications of these findings for the explication, treatment, and…
Falahati, Farshad; Westman, Eric; Simmons, Andrew
2014-01-01
Machine learning algorithms and multivariate data analysis methods have been widely utilized in the field of Alzheimer's disease (AD) research in recent years. Advances in medical imaging and medical image analysis have provided a means to generate and extract valuable neuroimaging information. Automatic classification techniques provide tools to analyze this information and observe inherent disease-related patterns in the data. In particular, these classifiers have been used to discriminate AD patients from healthy control subjects and to predict conversion from mild cognitive impairment to AD. In this paper, recent studies are reviewed that have used machine learning and multivariate analysis in the field of AD research. The main focus is on studies that used structural magnetic resonance imaging (MRI), but studies that included positron emission tomography and cerebrospinal fluid biomarkers in addition to MRI are also considered. A wide variety of materials and methods has been employed in different studies, resulting in a range of different outcomes. Influential factors such as classifiers, feature extraction algorithms, feature selection methods, validation approaches, and cohort properties are reviewed, as well as key MRI-based and multi-modal based studies. Current and future trends are discussed.
Dankers, Frank; Wijsman, Robin; Troost, Esther G C; Monshouwer, René; Bussink, Johan; Hoffmann, Aswin L
2017-05-07
In our previous work, a multivariable normal-tissue complication probability (NTCP) model for acute esophageal toxicity (AET) Grade ⩾2 after highly conformal (chemo-)radiotherapy for non-small cell lung cancer (NSCLC) was developed using multivariable logistic regression analysis incorporating clinical parameters and mean esophageal dose (MED). Since the esophagus is a tubular organ, spatial information of the esophageal wall dose distribution may be important in predicting AET. We investigated whether the incorporation of esophageal wall dose-surface data with spatial information improves the predictive power of our established NTCP model. For 149 NSCLC patients treated with highly conformal radiation therapy esophageal wall dose-surface histograms (DSHs) and polar dose-surface maps (DSMs) were generated. DSMs were used to generate new DSHs and dose-length-histograms that incorporate spatial information of the dose-surface distribution. From these histograms dose parameters were derived and univariate logistic regression analysis showed that they correlated significantly with AET. Following our previous work, new multivariable NTCP models were developed using the most significant dose histogram parameters based on univariate analysis (19 in total). However, the 19 new models incorporating esophageal wall dose-surface data with spatial information did not show improved predictive performance (area under the curve, AUC range 0.79-0.84) over the established multivariable NTCP model based on conventional dose-volume data (AUC = 0.84). For prediction of AET, based on the proposed multivariable statistical approach, spatial information of the esophageal wall dose distribution is of no added value and it is sufficient to only consider MED as a predictive dosimetric parameter.
NASA Astrophysics Data System (ADS)
Dankers, Frank; Wijsman, Robin; Troost, Esther G. C.; Monshouwer, René; Bussink, Johan; Hoffmann, Aswin L.
2017-05-01
In our previous work, a multivariable normal-tissue complication probability (NTCP) model for acute esophageal toxicity (AET) Grade ⩾2 after highly conformal (chemo-)radiotherapy for non-small cell lung cancer (NSCLC) was developed using multivariable logistic regression analysis incorporating clinical parameters and mean esophageal dose (MED). Since the esophagus is a tubular organ, spatial information of the esophageal wall dose distribution may be important in predicting AET. We investigated whether the incorporation of esophageal wall dose-surface data with spatial information improves the predictive power of our established NTCP model. For 149 NSCLC patients treated with highly conformal radiation therapy esophageal wall dose-surface histograms (DSHs) and polar dose-surface maps (DSMs) were generated. DSMs were used to generate new DSHs and dose-length-histograms that incorporate spatial information of the dose-surface distribution. From these histograms dose parameters were derived and univariate logistic regression analysis showed that they correlated significantly with AET. Following our previous work, new multivariable NTCP models were developed using the most significant dose histogram parameters based on univariate analysis (19 in total). However, the 19 new models incorporating esophageal wall dose-surface data with spatial information did not show improved predictive performance (area under the curve, AUC range 0.79-0.84) over the established multivariable NTCP model based on conventional dose-volume data (AUC = 0.84). For prediction of AET, based on the proposed multivariable statistical approach, spatial information of the esophageal wall dose distribution is of no added value and it is sufficient to only consider MED as a predictive dosimetric parameter.
de Falco, Bruna; Incerti, Guido; Pepe, Rosa; Amato, Mariana; Lanzotti, Virginia
2016-09-01
Globe artichoke (Cynara cardunculus L. var. scolymus L. Fiori) and cardoon (Cynara cardunculus L. var. altilis DC) are sources of nutraceuticals and bioactive compounds. The objective was to apply an NMR metabolomic fingerprinting approach to Cynara cardunculus heads to obtain simultaneous identification and quantitation of the major classes of organic compounds. The edible parts of 14 Globe artichoke populations, belonging to the Romaneschi varietal group, were extracted to obtain apolar and polar organic extracts. The analysis was also extended to one species of cultivated cardoon for comparison. The ¹H-NMR spectra of the extracts allowed simultaneous identification of the bioactive metabolites, whose quantitation was obtained by spectral integration followed by principal component analysis (PCA). Apolar organic extracts were mainly based on highly unsaturated long chain lipids. Polar organic extracts contained organic acids, amino acids, sugars (mainly inulin), caffeoyl derivatives (mainly cynarin), flavonoids, and terpenes. The level of nutraceuticals was found to be highest in the Italian landraces Bianco di Pertosa zia E and Natalina, while cardoon showed the lowest content of all metabolites, thus confirming the genetic distance between artichokes and cardoon. A metabolomic approach coupling NMR spectroscopy with multivariate data analysis allowed a detailed metabolite profile of artichoke and cardoon varieties to be obtained. Relevant differences in the relative content of the metabolites were observed for the species analysed. This work is the first application of ¹H-NMR with multivariate statistics to provide a metabolomic fingerprinting of Cynara scolymus. Copyright © 2016 John Wiley & Sons, Ltd.
Sepehrband, Farshid; Lynch, Kirsten M; Cabeen, Ryan P; Gonzalez-Zacarias, Clio; Zhao, Lu; D'Arcy, Mike; Kesselman, Carl; Herting, Megan M; Dinov, Ivo D; Toga, Arthur W; Clark, Kristi A
2018-05-15
Exploring neuroanatomical sex differences using a multivariate statistical learning approach can yield insights that cannot be derived with univariate analysis. While gross differences in total brain volume are well-established, uncovering the more subtle, regional sex-related differences in neuroanatomy requires a multivariate approach that can accurately model spatial complexity as well as the interactions between neuroanatomical features. Here, we developed a multivariate statistical learning model using a support vector machine (SVM) classifier to predict sex from MRI-derived regional neuroanatomical features from a single-site study of 967 healthy youth from the Philadelphia Neurodevelopmental Cohort (PNC). Then, we validated the multivariate model on an independent dataset of 682 healthy youth from the multi-site Pediatric Imaging, Neurocognition and Genetics (PING) cohort study. The trained model exhibited an 83% cross-validated prediction accuracy, and correctly predicted the sex of 77% of the subjects from the independent multi-site dataset. Results showed that cortical thickness of the middle occipital lobes and the angular gyri are major predictors of sex. Results also demonstrated the inferential benefits of going beyond classical regression approaches to capture the interactions among brain features in order to better characterize sex differences in male and female youths. We also identified specific cortical morphological measures and parcellation techniques, such as cortical thickness as derived from the Destrieux atlas, that are better able to discriminate between males and females in comparison to other brain atlases (Desikan-Killiany, Brodmann and subcortical atlases). Copyright © 2018 Elsevier Inc. All rights reserved.
Geurts, Brigitte P; Neerincx, Anne H; Bertrand, Samuel; Leemans, Manja A A P; Postma, Geert J; Wolfender, Jean-Luc; Cristescu, Simona M; Buydens, Lutgarde M C; Jansen, Jeroen J
2017-04-22
Revealing the biochemistry associated with micro-organismal interspecies interactions is highly relevant for many purposes. Each pathogen has a characteristic metabolic fingerprint that allows identification based on its unique multivariate biochemistry. When pathogen species come into mutual contact, their co-culture will display a chemistry that may be attributed both to mixing of the characteristic chemistries of the mono-cultures and to competition between the pathogens. Therefore, investigating pathogen development in a polymicrobial environment requires dedicated chemometric methods to untangle and focus upon these sources of variation. The multivariate data analysis method Projected Orthogonalised Chemical Encounter Monitoring (POCHEMON) is designed to highlight metabolites characteristic of the interaction of two micro-organisms in co-culture. However, this approach is currently limited to a single time-point, while development of polymicrobial interactions may be highly dynamic. A well-known multivariate implementation of Analysis of Variance (ANOVA) uses Principal Component Analysis (ANOVA-PCA). This allows the overall dynamics to be separated from the pathogen-specific chemistry to analyse the contributions of both aspects separately. For this reason, we propose to integrate ANOVA-PCA with the POCHEMON approach to disentangle the pathogen dynamics and the specific biochemistry in interspecies interactions. Two complementary case studies show great potential for both liquid and gas chromatography - mass spectrometry to reveal novel information on chemistry specific to interspecies interaction during pathogen development. Copyright © 2017 The Author(s). Published by Elsevier B.V. All rights reserved.
Feinauer, Christoph; Procaccini, Andrea; Zecchina, Riccardo; Weigt, Martin; Pagnani, Andrea
2014-01-01
In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variable (amino-acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to the one achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partners in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code. PMID:24663061
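A toy sketch, not the authors' implementation, of why a Gaussian model can separate direct from indirect couplings: for Gaussian variables the direct couplings are read off the inverse of the covariance matrix (the precision matrix). The three-variable chain and the helper name `invert` are hypothetical choices made for this illustration. Variables x1 and x3 are correlated only through x2, so their covariance is nonzero while their precision-matrix entry is exactly zero.

```python
from fractions import Fraction

def invert(matrix):
    """Invert a small square matrix exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment with the identity matrix, using exact rational arithmetic.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical chain x1 -> x2 -> x3 with unit-variance noise at each step:
# x2 = x1 + e2 and x3 = x2 + e3, so x1 and x3 interact only through x2.
covariance = [[1, 1, 1],
              [1, 2, 2],
              [1, 2, 3]]

precision = invert(covariance)

print("cov(x1, x3)       =", covariance[0][2])   # nonzero: indirect correlation
print("precision(x1, x3) =", precision[0][2])    # zero: no direct coupling
```

In a contact-prediction setting the same reading would apply to pairs of sequence positions, with large off-diagonal precision entries taken as candidate direct couplings; how the paper encodes amino acids and regularizes the covariance is not reproduced here.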
NASA Astrophysics Data System (ADS)
Belianinov, Alex; Ganesh, Panchapakesan; Lin, Wenzhi; Sales, Brian C.; Sefat, Athena S.; Jesse, Stephen; Pan, Minghu; Kalinin, Sergei V.
2014-12-01
Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1-xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.
Piecewise multivariate modelling of sequential metabolic profiling data.
Rantalainen, Mattias; Cloarec, Olivier; Ebbels, Timothy M D; Lundstedt, Torbjörn; Nicholson, Jeremy K; Holmes, Elaine; Trygg, Johan
2008-02-19
Modelling the time-related behaviour of biological systems is essential for understanding their dynamic responses to perturbations. In metabolic profiling studies, the sampling rate and number of sampling points are often restricted due to experimental and biological constraints. A supervised multivariate modelling approach with the objective to model the time-related variation in the data for short and sparsely sampled time-series is described. A set of piecewise Orthogonal Projections to Latent Structures (OPLS) models are estimated, describing changes between successive time points. The individual OPLS models are linear, but the piecewise combination of several models accommodates modelling and prediction of changes which are non-linear with respect to the time course. We demonstrate the method on both simulated and metabolic profiling data, illustrating how time related changes are successfully modelled and predicted. The proposed method is effective for modelling and prediction of short and multivariate time series data. A key advantage of the method is model transparency, allowing easy interpretation of time-related variation in the data. The method provides a competitive complement to commonly applied multivariate methods such as OPLS and Principal Component Analysis (PCA) for modelling and analysis of short time-series data.
Goodwin, Cody R; Sherrod, Stacy D; Marasco, Christina C; Bachmann, Brian O; Schramm-Sapyta, Nicole; Wikswo, John P; McLean, John A
2014-07-01
A metabolic system is composed of inherently interconnected metabolic precursors, intermediates, and products. The analysis of untargeted metabolomics data has conventionally been performed through the use of comparative statistics or multivariate statistical analysis-based approaches; however, each falls short in representing the related nature of metabolic perturbations. Herein, we describe a complementary method for the analysis of large metabolite inventories using a data-driven approach based upon a self-organizing map algorithm. This workflow allows for the unsupervised clustering, and subsequent prioritization of, correlated features through Gestalt comparisons of metabolic heat maps. We describe this methodology in detail, including a comparison to conventional metabolomics approaches, and demonstrate the application of this method to the analysis of the metabolic repercussions of prolonged cocaine exposure in rat sera profiles.
Big-Data RHEED analysis for understanding epitaxial film growth processes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vasudevan, Rama K; Tselev, Alexander; Baddorf, Arthur P
Reflection high energy electron diffraction (RHEED) has by now become a standard tool for in-situ monitoring of film growth by pulsed laser deposition and molecular beam epitaxy. Yet despite the widespread adoption and wealth of information in RHEED images, most applications are limited to observing intensity oscillations of the specular spot, and much additional information on growth is discarded. With ease of data acquisition and increased computation speeds, statistical methods to rapidly mine the dataset are now feasible. Here, we develop such an approach to the analysis of the fundamental growth processes through multivariate statistical analysis of RHEED image sequences. This approach is illustrated for growth of LaxCa1-xMnO3 films grown on etched (001) SrTiO3 substrates, but is universal. The multivariate methods including principal component analysis and k-means clustering provide insight into the relevant behaviors, the timing and nature of a disordered to ordered growth change, and highlight statistically significant patterns. Fourier analysis yields the harmonic components of the signal and allows separation of the relevant components and baselines, isolating the asymmetric nature of the step density function and the transmission spots from the imperfect layer-by-layer (LBL) growth. These studies show the promise of big data approaches to obtaining more insight into film properties during and after epitaxial film growth. Furthermore, these studies open the pathway to use forward prediction methods to potentially allow significantly more control over growth process and hence final film quality.
Mathematical models for exploring different aspects of genotoxicity and carcinogenicity databases.
Benigni, R; Giuliani, A
1991-12-01
One great obstacle to understanding and using the information contained in the genotoxicity and carcinogenicity databases is the very size of such databases. Their vastness makes them difficult to read; this leads to inadequate exploitation of the information, which becomes costly in terms of time, labor, and money. In its search for adequate approaches to the problem, the scientific community has, curiously, almost entirely neglected an existent series of very powerful methods of data analysis: the multivariate data analysis techniques. These methods were specifically designed for exploring large data sets. This paper presents the multivariate techniques and reports a number of applications to genotoxicity problems. These studies show how biology and mathematical modeling can be combined and how successful this combination is.
Walling, Craig A; Morrissey, Michael B; Foerster, Katharina; Clutton-Brock, Tim H; Pemberton, Josephine M; Kruuk, Loeske E B
2014-12-01
Evolutionary theory predicts that genetic constraints should be widespread, but empirical support for their existence is surprisingly rare. Commonly applied univariate and bivariate approaches to detecting genetic constraints can underestimate their prevalence, with important aspects potentially tractable only within a multivariate framework. However, multivariate genetic analyses of data from natural populations are challenging because of modest sample sizes, incomplete pedigrees, and missing data. Here we present results from a study of a comprehensive set of life history traits (juvenile survival, age at first breeding, annual fecundity, and longevity) for both males and females in a wild, pedigreed, population of red deer (Cervus elaphus). We use factor analytic modeling of the genetic variance-covariance matrix (G) to reduce the dimensionality of the problem and take a multivariate approach to estimating genetic constraints. We consider a range of metrics designed to assess the effect of G on the deflection of a predicted response to selection away from the direction of fastest adaptation and on the evolvability of the traits. We found limited support for genetic constraint through genetic covariances between traits, both within sex and between sexes. We discuss these results with respect to other recent findings and to the problems of estimating these parameters for natural populations. Copyright © 2014 Walling et al.
Walling, Craig A.; Morrissey, Michael B.; Foerster, Katharina; Clutton-Brock, Tim H.; Pemberton, Josephine M.; Kruuk, Loeske E. B.
2014-01-01
Evolutionary theory predicts that genetic constraints should be widespread, but empirical support for their existence is surprisingly rare. Commonly applied univariate and bivariate approaches to detecting genetic constraints can underestimate their prevalence, with important aspects potentially tractable only within a multivariate framework. However, multivariate genetic analyses of data from natural populations are challenging because of modest sample sizes, incomplete pedigrees, and missing data. Here we present results from a study of a comprehensive set of life history traits (juvenile survival, age at first breeding, annual fecundity, and longevity) for both males and females in a wild, pedigreed, population of red deer (Cervus elaphus). We use factor analytic modeling of the genetic variance–covariance matrix (G) to reduce the dimensionality of the problem and take a multivariate approach to estimating genetic constraints. We consider a range of metrics designed to assess the effect of G on the deflection of a predicted response to selection away from the direction of fastest adaptation and on the evolvability of the traits. We found limited support for genetic constraint through genetic covariances between traits, both within sex and between sexes. We discuss these results with respect to other recent findings and to the problems of estimating these parameters for natural populations. PMID:25278555
Multivariate moment closure techniques for stochastic kinetic models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lakatos, Eszter, E-mail: e.lakatos13@imperial.ac.uk; Ale, Angelique; Kirk, Paul D. W.
2015-09-07
Stochastic effects dominate many chemical and biochemical processes. Their analysis, however, can be prohibitively expensive computationally, and a range of approximation schemes have been proposed to lighten the computational burden. These, notably the increasingly popular linear noise approximation and the more general moment expansion methods, perform well for many dynamical regimes, especially linear systems. At higher levels of nonlinearity, there is an interplay between the nonlinearities and the stochastic dynamics, which is much harder to capture correctly by such approximations to the true stochastic processes. Moment-closure approaches promise to address this problem by capturing higher-order terms of the temporally evolving probability distribution. Here, we develop a set of multivariate moment-closures that allows us to describe the stochastic dynamics of nonlinear systems. Multivariate closure captures the way that correlations between different molecular species, induced by the reaction dynamics, interact with stochastic effects. We use multivariate Gaussian, gamma, and lognormal closure and illustrate their use in the context of two models that have proved challenging to the previous attempts at approximating stochastic dynamics: oscillations in p53 and Hes1. In addition, we consider a larger system, Erk-mediated mitogen-activated protein kinases signalling, where conventional stochastic simulation approaches incur unacceptably high computational costs.
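To make the closure idea concrete, here is a standard identity rather than an equation taken from the paper: a Gaussian closure sets all third-order cumulants to zero, so third-order moments are expressed through means and second moments. The symbols $x_i$ (copy number of species $i$) and $\langle\cdot\rangle$ (expectation) are notation assumed for this sketch.

```latex
% Single species: vanishing third cumulant
\kappa_3 = \langle x^3\rangle - 3\langle x\rangle\langle x^2\rangle + 2\langle x\rangle^3 = 0
\quad\Longrightarrow\quad
\langle x^3\rangle \approx 3\langle x\rangle\langle x^2\rangle - 2\langle x\rangle^3

% Multivariate case: vanishing third joint cumulant
\langle x_i x_j x_k\rangle \approx
\langle x_i x_j\rangle\langle x_k\rangle
+ \langle x_i x_k\rangle\langle x_j\rangle
+ \langle x_j x_k\rangle\langle x_i\rangle
- 2\langle x_i\rangle\langle x_j\rangle\langle x_k\rangle
```

The cross terms $\langle x_i x_j\rangle$ in the multivariate form are where correlations between species enter the closed moment equations. The gamma and lognormal closures named in the abstract use different relations and are not shown here.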
Linking multimetric and multivariate approaches to assess the ecological condition of streams.
Collier, Kevin J
2009-10-01
Few attempts have been made to combine multimetric and multivariate analyses for bioassessment despite recognition that an integrated method could yield powerful tools for bioassessment. An approach is described that integrates eight macroinvertebrate community metrics into a Principal Components Analysis to develop a Multivariate Condition Score (MCS) from a calibration dataset of 511 samples. The MCS is compared to an Index of Biotic Integrity (IBI) derived using the same metrics based on the ratio to the reference site mean. Both approaches were highly correlated although the MCS appeared to offer greater potential for discriminating a wider range of impaired conditions. Both the MCS and IBI displayed low temporal variability within reference sites, and were able to distinguish between reference conditions and low levels of catchment modification and local habitat degradation, although neither discriminated among three levels of low impact. Pseudosamples developed to test the response of the metric aggregation approaches to organic enrichment, urban, mining, pastoral and logging stressor scenarios ranked pressures in the same order, but the MCS provided a lower score for the urban scenario and a higher score for the pastoral scenario. The MCS was calculated for an independent test dataset of urban and reference sites, and yielded similar results to the IBI. Although both methods performed comparably, the MCS approach may have some advantages because it removes the subjectivity of assigning thresholds for scoring biological condition, and it appears to discriminate a wider range of degraded conditions.
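The ratio-to-reference scoring mentioned for the IBI can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `ratio_index`, the metric names, the plain averaging of ratios and all numbers are hypothetical, and the abstract does not say how the authors aggregated or bounded the individual ratios.

```python
from statistics import mean

def ratio_index(site_metrics, reference_sites):
    """Average of each metric's ratio to its reference-site mean.

    Assumes every metric decreases with degradation, so values below 1
    indicate a departure from reference condition.
    """
    ratios = []
    for name, value in site_metrics.items():
        reference_mean = mean(site[name] for site in reference_sites)
        ratios.append(value / reference_mean)
    return mean(ratios)

# Hypothetical reference sites and one hypothetical test site.
reference_sites = [
    {"taxa_richness": 20, "ept_taxa": 10},
    {"taxa_richness": 30, "ept_taxa": 14},
]
test_site = {"taxa_richness": 15, "ept_taxa": 6}

score = ratio_index(test_site, reference_sites)
print(round(score, 3))
```

The multivariate alternative described in the abstract would instead feed the same metrics into a principal components analysis and score sites along the leading axis, which avoids choosing scoring thresholds by hand.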
Halliday, David M; Senik, Mohd Harizal; Stevenson, Carl W; Mason, Rob
2016-08-01
The ability to infer network structure from multivariate neuronal signals is central to computational neuroscience. Directed network analyses typically use parametric approaches based on auto-regressive (AR) models, where networks are constructed from estimates of AR model parameters. However, the validity of using low order AR models for neurophysiological signals has been questioned. A recent article introduced a non-parametric approach to estimate directionality in bivariate data; non-parametric approaches are free from concerns over model validity. We extend the non-parametric framework to include measures of directed conditional independence, using scalar measures that decompose the overall partial correlation coefficient summatively by direction, and a set of functions that decompose the partial coherence summatively by direction. A time domain partial correlation function allows both time and frequency views of the data to be constructed. The conditional independence estimates are conditioned on a single predictor. The framework is applied to simulated cortical neuron networks and mixtures of Gaussian time series data with known interactions. It is applied to experimental data consisting of local field potential recordings from bilateral hippocampus in anaesthetised rats. The framework offers a non-parametric approach to estimation of directed interactions in multivariate neuronal recordings, and increased flexibility in dealing with both spike train and time series data. The framework offers a novel alternative non-parametric approach to estimate directed interactions in multivariate neuronal recordings, and is applicable to spike train and time series data. Copyright © 2016 Elsevier B.V. All rights reserved.
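The quantity that the framework decomposes by direction, a partial correlation conditioned on a single predictor, can be illustrated with the textbook first-order formula. This sketch shows only the undirected coefficient, not the paper's directional decomposition. The function names and the toy signals are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical signals: z drives both x and y; the added components are
# orthogonal to z and to each other, so x and y share nothing except z.
z = [1, 1, -1, -1]
x = [3, 1, -1, -3]   # 2*z + [1, -1, 1, -1]
y = [3, 1, -3, -1]   # 2*z + [1, -1, -1, 1]

print("plain correlation  :", round(pearson(x, y), 3))
print("partial correlation:", round(partial_correlation(x, y, z), 3))
```

Here the plain correlation between x and y is high even though they are linked only through z, and conditioning on z removes it. This is the sense in which conditional measures help separate direct from common-input interactions.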
Carlesi, Serena; Ricci, Marilena; Cucci, Costanza; La Nasa, Jacopo; Lofrumento, Cristiana; Picollo, Marcello; Becucci, Maurizio
2015-07-01
This work explores the application of chemometric techniques to the analysis of lipidic paint binders (i.e., drying oils) by means of Raman and near-infrared spectroscopy. These binders have been widely used by artists throughout history, both individually and in mixtures. We prepared various model samples of the pure binders (linseed, poppy seed, and walnut oils) obtained from different manufacturers. These model samples were left to dry and then characterized by Raman and reflectance near-infrared spectroscopy. Multivariate analysis was performed by applying principal component analysis (PCA) on the first derivative of the corresponding Raman spectra (1800-750 cm⁻¹), near-infrared spectra (6000-3900 cm⁻¹), and their combination to test whether spectral differences could enable samples to be distinguished on the basis of their composition. The vibrational bands we found most useful to discriminate between the different products we studied are the fundamental ν(C=C) stretching and methylenic stretching and bending combination bands. The results of the multivariate analysis demonstrated the potential of chemometric approaches for characterizing and identifying drying oils, and also for gaining a deeper insight into the aging process. Comparison with high-performance liquid chromatography data was conducted to check the PCA results.
Moya, Claudio E; Raiber, Matthias; Taulis, Mauricio; Cox, Malcolm E
2015-03-01
The Galilee and Eromanga basins are sub-basins of the Great Artesian Basin (GAB). In this study, a multivariate statistical approach (hierarchical cluster analysis, principal component analysis and factor analysis) is carried out to identify hydrochemical patterns and assess the processes that control hydrochemical evolution within key aquifers of the GAB in these basins. The results of the hydrochemical assessment are integrated into a 3D geological model (previously developed) to support the analysis of spatial patterns of hydrochemistry, and to identify the hydrochemical and hydrological processes that control hydrochemical variability. In this area of the GAB, the hydrochemical evolution of groundwater is dominated by evapotranspiration near the recharge area resulting in a dominance of the Na-Cl water types. This is shown conceptually using two selected cross-sections which represent discrete groundwater flow paths from the recharge areas to the deeper parts of the basins. With increasing distance from the recharge area, a shift towards a dominance of carbonate (e.g. Na-HCO3 water type) has been observed. The assessment of hydrochemical changes along groundwater flow paths highlights how aquifers are separated in some areas, and how mixing between groundwater from different aquifers occurs elsewhere controlled by geological structures, including between GAB aquifers and coal bearing strata of the Galilee Basin. The results of this study suggest that distinct hydrochemical differences can be observed within the previously defined Early Cretaceous-Jurassic aquifer sequence of the GAB. A revision of the two previously recognised hydrochemical sequences is being proposed, resulting in three hydrochemical sequences based on systematic differences in hydrochemistry, salinity and dominant hydrochemical processes. 
The integrated approach presented in this study which combines different complementary multivariate statistical techniques with a detailed assessment of the geological framework of these sedimentary basins, can be adopted in other complex multi-aquifer systems to assess hydrochemical evolution and its geological controls. Copyright © 2014 Elsevier B.V. All rights reserved.
Multivariate Classification of Original and Fake Perfumes by Ion Analysis and Ethanol Content.
Gomes, Clêrton L; de Lima, Ari Clecius A; Loiola, Adonay R; da Silva, Abel B R; Cândido, Manuela C L; Nascimento, Ronaldo F
2016-07-01
The increased marketing of fake perfumes has encouraged us to investigate how to identify such products by their chemical characteristics and multivariate analysis. The aim of this study was to present an alternative approach to distinguish original from fake perfumes by means of the investigation of sodium, potassium, chloride ions, and ethanol contents by chemometric tools. For this, 50 perfumes were used (25 original and 25 counterfeit) for the analysis of ions (ion chromatography) and ethanol (gas chromatography). The results demonstrated that the fake perfume had low levels of ethanol and high levels of chloride compared to the original product. The data were treated by chemometric tools such as principal component analysis and linear discriminant analysis. This study proved that the analysis of ethanol is an effective method of distinguishing original from the fake products, and it may potentially be used to assist legal authorities in such cases. © 2016 American Academy of Forensic Sciences.
Fu, Zhibiao; Baker, Daniel; Cheng, Aili; Leighton, Julie; Appelbaum, Edward; Aon, Juan
2016-05-01
The principle of quality by design (QbD) has been widely applied to biopharmaceutical manufacturing processes. Process characterization is an essential step to implement the QbD concept to establish the design space and to define the proven acceptable ranges (PAR) for critical process parameters (CPPs). In this study, we present characterization of a Saccharomyces cerevisiae fermentation process using risk assessment analysis, statistical design of experiments (DoE), and the multivariate Bayesian predictive approach. The critical quality attributes (CQAs) and CPPs were identified with a risk assessment. The statistical model for each attribute was established using the results from the DoE study with consideration given to interactions between CPPs. Both the conventional overlapping contour plot and the multivariate Bayesian predictive approaches were used to establish the region of process operating conditions where all attributes met their specifications simultaneously. The quantitative Bayesian predictive approach was chosen to define the PARs for the CPPs, which apply to the manufacturing control strategy. Experience from the 10,000 L manufacturing scale process validation, including 64 continued process verification batches, indicates that the CPPs remain under a state of control and within the established PARs. The end product quality attributes were within their drug substance specifications. The probability generated with the Bayesian approach was also used as a tool to assess CPP deviations. This approach can be extended to develop other production process characterization and quantify a reliable operating region. © 2016 American Institute of Chemical Engineers Biotechnol. Prog., 32:799-812, 2016.
Huang, Jun; Goolcharran, Chimanlall; Ghosh, Krishnendu
2011-05-01
This paper presents the use of experimental design, optimization and multivariate techniques to investigate the root cause of tablet dissolution shift (slow-down) upon stability and to develop control strategies for a drug product during formulation and process development. The effectiveness and usefulness of these methodologies were demonstrated through two application examples. In both applications, dissolution slow-down was observed during a 4-week accelerated stability test under 51°C/75%RH storage condition. In Application I, an experimental design was carried out to evaluate the interactions and effects of the design factors on the critical quality attribute (CQA) of dissolution upon stability. The design space was studied by design of experiment (DOE) and multivariate analysis to ensure desired dissolution profile and minimal dissolution shift upon stability. Multivariate techniques, such as multi-way principal component analysis (MPCA) of the entire dissolution profiles upon stability, were performed to reveal batch relationships and to evaluate the impact of design factors on dissolution. In Application II, an experiment was conducted to study the impact of varying tablet breaking force on dissolution upon stability utilizing MPCA. It was demonstrated that the use of multivariate methods, defined as Quality by Design (QbD) principles and tools in ICH-Q8 guidance, provides an effective means to achieve a greater understanding of tablet dissolution upon stability. Copyright © 2010 Elsevier B.V. All rights reserved.
Casarrubea, M; Jonsson, G K; Faulisi, F; Sorbera, F; Di Giovanni, G; Benigno, A; Crescimanno, G; Magnusson, M S
2015-01-15
A basic tenet in the realm of modern behavioral sciences is that behavior consists of patterns in time. For this reason, investigations of behavior deal with sequences that are not easily perceivable by the unaided observer. This problem calls for improved means of detection, data handling and analysis. This review focuses on the analysis of the temporal structure of behavior carried out by means of a multivariate approach known as T-pattern analysis. Using this technique, recurring sequences of behavioral events, usually hard to detect, can be unveiled and carefully described. T-pattern analysis has been successfully applied in the study of various aspects of human or animal behavior such as behavioral modifications in neuro-psychiatric diseases, route-tracing stereotypy in mice, interaction between human subjects and animal or artificial agents, hormonal-behavioral interactions, patterns of behavior associated with emesis and, in our laboratories, exploration and anxiety-related behaviors in rodents. After describing the theory and concepts of T-pattern analysis, this review will focus on the application of the analysis to the study of the temporal characteristics of behavior in different species from rodents to human beings. This work could represent a useful background for researchers who intend to employ such a refined multivariate approach to the study of behavior. Copyright © 2014 Elsevier B.V. All rights reserved.
The bio-optical properties of CDOM as descriptor of lake stratification.
Bracchini, Luca; Dattilo, Arduino Massimo; Hull, Vincent; Loiselle, Steven Arthur; Martini, Silvia; Rossi, Claudio; Santinelli, Chiara; Seritti, Alfredo
2006-11-01
Multivariate statistical techniques are used to demonstrate the fundamental role of CDOM optical properties in the description of water masses during the summer stratification of a deep lake. PC1 was linked with dissolved species and PC2 with suspended particles. Within the first principal component, CDOM bio-optical properties give a better description of the stratification of Salto Lake than temperature does. The proposed multivariate approach can be used for the analysis of different stratified aquatic ecosystems in relation to the interaction between bio-optical properties and stratification of the water body.
Estimating residential price elasticity of demand for water: A contingent valuation approach
NASA Astrophysics Data System (ADS)
Thomas, John F.; Syme, Geoffrey J.
1988-11-01
Residential households in Perth, Western Australia have access to privately extracted groundwater as well as a public mains water supply, which has been charged through a two-part block tariff. A contingent valuation approach is developed to estimate price elasticity of demand for public supply. Results are compared with those of a multivariate time series analysis. Validation tests for the contingent approach are proposed, based on a comparison of predicted behaviors following hypothesised price changes with relevant independent data. Properly conducted, the contingent approach appears to be reliable, applicable where the available data do not favor regression analysis, and a fruitful source of information about social, technical, and behavioral responses to change in the price of water.
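As a worked illustration of the quantity being estimated, the sketch below computes a price elasticity of demand with the midpoint (arc) formula. This is one common definition and the abstract does not state which form the authors used. The function name and all prices and volumes are hypothetical, not figures from the study.

```python
def arc_elasticity(quantity_before, quantity_after, price_before, price_after):
    """Midpoint (arc) price elasticity of demand.

    Percentage changes are taken relative to the average of the two
    values, so the result does not depend on the direction of the change.
    """
    pct_quantity = (quantity_after - quantity_before) / (
        (quantity_before + quantity_after) / 2)
    pct_price = (price_after - price_before) / (
        (price_before + price_after) / 2)
    return pct_quantity / pct_price

# Hypothetical survey response: a household states it would cut annual
# mains use from 330 kL to 270 kL if the price rose from 40 to 60 cents/kL.
elasticity = arc_elasticity(330, 270, 40, 60)
print(round(elasticity, 2))
```

A value between 0 and -1, as in this invented example, would indicate inelastic demand: consumption falls proportionally less than the price rises.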
McFarquhar, Martyn; McKie, Shane; Emsley, Richard; Suckling, John; Elliott, Rebecca; Williams, Stephen
2016-01-01
Repeated measurements and multimodal data are common in neuroimaging research. Despite this, conventional approaches to group level analysis ignore these repeated measurements in favour of multiple between-subject models using contrasts of interest. This approach has a number of drawbacks as certain designs and comparisons of interest are either not possible or complex to implement. Unfortunately, even when attempting to analyse group level data within a repeated-measures framework, the methods implemented in popular software packages make potentially unrealistic assumptions about the covariance structure across the brain. In this paper, we describe how this issue can be addressed in a simple and efficient manner using the multivariate form of the familiar general linear model (GLM), as implemented in a new MATLAB toolbox. This multivariate framework is discussed, paying particular attention to methods of inference by permutation. Comparisons with existing approaches and software packages for dependent group-level neuroimaging data are made. We also demonstrate how this method is easily adapted for dependency at the group level when multiple modalities of imaging are collected from the same individuals. Follow-up of these multimodal models using linear discriminant functions (LDA) is also discussed, with applications to future studies wishing to integrate multiple scanning techniques into investigating populations of interest. PMID:26921716
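Inference by permutation, which the abstract highlights, can be sketched in its simplest form: an exact two-group permutation test on a difference in means. This is a univariate toy under stated assumptions, not the multivariate GLM procedure or the MATLAB toolbox described above. The function name and the data are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    # Enumerate every way of relabelling len(group_a) observations as group A.
    for indices in combinations(range(len(pooled)), len(group_a)):
        chosen = set(indices)
        relabelled_a = [pooled[i] for i in chosen]
        relabelled_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        statistic = abs(mean(relabelled_a) - mean(relabelled_b))
        # Small tolerance guards against floating-point ties.
        if statistic >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical measurements for two small groups.
p_value = permutation_p_value([1, 2, 3], [7, 8, 9])
print(p_value)
```

With realistic sample sizes the full enumeration becomes infeasible and a random subset of relabellings is typically drawn instead. In a multivariate setting the test statistic would be replaced by a multivariate one while the relabelling logic stays the same.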
Factors Associated with Sexual Behavior among Adolescents: A Multivariate Analysis.
ERIC Educational Resources Information Center
Harvey, S. Marie; Spigner, Clarence
1995-01-01
A self-administered survey examining multiple factors associated with engaging in sexual intercourse was completed by 1,026 high school students in a classroom setting. Findings suggest that effective interventions to address teenage pregnancy need to utilize a multifaceted approach to the prevention of high-risk behaviors. (JPS)
A Call for Conducting Multivariate Mixed Analyses
ERIC Educational Resources Information Center
Onwuegbuzie, Anthony J.
2016-01-01
Several authors have written methodological works that provide an introductory- and/or intermediate-level guide to conducting mixed analyses. Although these works have been useful for beginning and emergent mixed researchers, with very few exceptions, works are lacking that describe and illustrate advanced-level mixed analysis approaches. Thus,…
Girgis, Mark D; Zenati, Mazen S; Steve, Jennifer; Bartlett, David L; Zureikat, Amer; Zeh, Herbert J; Hogg, Melissa E
2017-02-01
The aim was to evaluate the impact of obesity on perioperative outcomes in patients undergoing robotic pancreaticoduodenectomy (RPD) compared to open pancreaticoduodenectomy (OPD). A retrospective review of all pancreaticoduodenectomies from 9/2011 to 4/2015 was performed. Obesity was defined as body mass index (BMI) > 30 kg/m². Of 474 pancreaticoduodenectomies performed: RPD = 213 (45%) and OPD = 261 (55%). A total of 145 (31%) patients were obese (70 RPD, 75 OPD). Obese patients had increased EBL (p = 0.03), pancreatic fistula (B&C; p = 0.077), and wound infection (p = 0.068) compared to the non-obese. For obese patients, RPD had decreased OR time (p = 0.0003), EBL (p < 0.001), and wound infection (p = 0.001) with no difference in Clavien ≥3 complications, margins, LOS or 30-day mortality compared with OPD. In multivariate analysis, obesity was the strongest predictor of Clavien ≥3 (OR 1.6; p = 0.041) and wound infection if BMI > 35 (OR 2.6; p = 0.03). The robotic approach was protective of Clavien ≥3 (OR 0.6; p = 0.03) on univariate analysis and wound infection (OR 0.3; p < 0.001) and grade B/C pancreatic fistula (OR 0.34; p < 0.001) on multivariate analysis. Obese patients are at risk for increased postoperative complications regardless of approach. However, the robotic approach mitigates some of the increased complication rate, while preserving other perioperative outcomes. Published by Elsevier Ltd.
Dimou, Niki L; Pantavou, Katerina G; Bagos, Pantelis G
2017-09-01
Apolipoprotein E (ApoE) is potentially a genetic risk factor for the development of left ventricular failure (LVF), the main cause of death in beta-thalassemia homozygotes. In the present study, we synthesize the results of independent studies examining the effect of ApoE on LVF development in thalassemic patients through a meta-analytic approach. However, all studies report more than one outcome, as patients are classified into three groups according to the severity of the symptoms and the genetic polymorphism. Thus, a multivariate meta-analytic method that addresses simultaneously multiple exposures and multiple comparison groups was developed. Four individual studies were included in the meta-analysis involving 613 beta-thalassemic patients and 664 controls. The proposed method that takes into account the correlation of log odds ratios (log(ORs)), revealed a statistically significant overall association (P-value = 0.009), mainly attributed to the contrast of E4 versus E3 allele for patients with evidence (OR: 2.32, 95% CI: 1.19, 4.53) or patients with clinical and echocardiographic findings (OR: 3.34, 95% CI: 1.78, 6.26) of LVF. This study suggests that E4 is a genetic risk factor for LVF in beta-thalassemia major. The presented multivariate approach can be applied in several fields of research. © 2017 John Wiley & Sons Ltd/University College London.
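A meta-analysis of odds ratios usually works on the log scale, so a common first step is to recover log(OR) and its standard error from a reported confidence interval. The sketch below does this for the E4 versus E3 contrast quoted above (OR 2.32, 95% CI 1.19 to 4.53). It assumes the interval is a symmetric Wald interval on the log scale; the function name `log_or_and_se` is hypothetical, and this is not the authors' multivariate method, only the univariate building block.

```python
import math

def log_or_and_se(odds_ratio, ci_low, ci_high, z=1.96):
    """Return (log OR, standard error) recovered from a reported 95% CI.

    Assumes a Wald interval, i.e. log(OR) +/- z * SE on the log scale.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return log_or, se

# Values quoted in the abstract for E4 versus E3 (patients with evidence of LVF).
log_or, se = log_or_and_se(2.32, 1.19, 4.53)
print(round(log_or, 3), round(se, 3))
```

The midpoint of the log-scale interval should sit close to log(OR); a large gap would suggest the reported interval is not a Wald interval and the recovered standard error should not be trusted.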
Faes, Luca; Nollo, Giandomenico; Porta, Alberto
2012-03-01
The complexity of the short-term cardiovascular control prompts for the introduction of multivariate (MV) nonlinear time series analysis methods to assess directional interactions reflecting the underlying regulatory mechanisms. This study introduces a new approach for the detection of nonlinear Granger causality in MV time series, based on embedding the series by a sequential, non-uniform procedure, and on estimating the information flow from one series to another by means of the corrected conditional entropy. The approach is validated on short realizations of linear stochastic and nonlinear deterministic processes, and then evaluated on heart period, systolic arterial pressure and respiration variability series measured from healthy humans in the resting supine position and in the upright position after head-up tilt. Copyright © 2011 Elsevier Ltd. All rights reserved.
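The idea of measuring information flow with conditional entropy can be illustrated with a deliberately simplified sketch: compare how uncertain the present of series y is given its own past, with and without the past of series x. Everything below is an assumption made for illustration: the series are already symbolic, the embedding is a single uniform lag, and the estimator is the plain plug-in one. It does not implement the sequential non-uniform embedding or the corrected conditional entropy described in the abstract.

```python
import math
from collections import Counter

def conditional_entropy(targets, contexts):
    """Plug-in estimate of H(target | context) in bits from paired samples."""
    joint = Counter(zip(contexts, targets))
    marginal = Counter(contexts)
    n = len(targets)
    h = 0.0
    for (ctx, _), count in joint.items():
        h -= (count / n) * math.log2(count / marginal[ctx])
    return h

def information_gain(x, y):
    """H(y_t | y_{t-1}) - H(y_t | y_{t-1}, x_{t-1}) for two symbol sequences."""
    y_now = y[1:]
    own_past = [(a,) for a in y[:-1]]
    joint_past = list(zip(y[:-1], x[:-1]))
    return conditional_entropy(y_now, own_past) - conditional_entropy(y_now, joint_past)

# Hypothetical binary series in which y simply copies x with a one-step delay.
x = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
y = [0] + x[:-1]

gain_x_to_y = information_gain(x, y)
gain_y_to_x = information_gain(y, x)
print(gain_x_to_y, gain_y_to_x)
```

On short series the plug-in estimate in the reverse direction is generally not zero even when no coupling exists, because conditioning on more symbols always lowers the empirical entropy. That bias is the kind of problem a corrected estimator is meant to address.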
Bayesian Factor Analysis as a Variable Selection Problem: Alternative Priors and Consequences
Lu, Zhao-Hua; Chow, Sy-Miin; Loken, Eric
2016-01-01
Factor analysis is a popular statistical technique for multivariate data analysis. Developments in the structural equation modeling framework have enabled the use of hybrid confirmatory/exploratory approaches in which factor loading structures can be explored relatively flexibly within a confirmatory factor analysis (CFA) framework. Recently, a Bayesian structural equation modeling (BSEM) approach (Muthén & Asparouhov, 2012) has been proposed as a way to explore the presence of cross-loadings in CFA models. We show that the issue of determining factor loading patterns may be formulated as a Bayesian variable selection problem in which Muthén and Asparouhov’s approach can be regarded as a BSEM approach with ridge regression prior (BSEM-RP). We propose another Bayesian approach, denoted herein as the Bayesian structural equation modeling with spike and slab prior (BSEM-SSP), which serves as a one-stage alternative to the BSEM-RP. We review the theoretical advantages and disadvantages of both approaches and compare their empirical performance relative to two modification indices-based approaches and exploratory factor analysis with target rotation. A teacher stress scale data set (Byrne, 2012; Pettegrew & Wolf, 1982) is used to demonstrate our approach. PMID:27314566
Gauging Skills of Hospital Security Personnel: a Statistically-driven, Questionnaire-based Approach
Rinkoo, Arvind Vashishta; Mishra, Shubhra; Rahesuddin; Nabi, Tauqeer; Chandra, Vidha; Chandra, Hem
2013-01-01
Objectives: This study aims to gauge the technical and soft skills of the hospital security personnel so as to enable prioritization of their training needs. Methodology: A cross sectional questionnaire based study was conducted in December 2011. Two separate predesigned and pretested questionnaires were used for gauging soft skills and technical skills of the security personnel. Extensive statistical analysis, including Multivariate Analysis (Pillai-Bartlett trace along with Multi-factorial ANOVA) and Post-hoc Tests (Bonferroni Test) was applied. Results: The 143 participants performed better on the soft skills front with an average score of 6.43 and standard deviation of 1.40. The average technical skills score was 5.09 with a standard deviation of 1.44. The study avowed a need for formal hands on training with greater emphasis on technical skills. Multivariate analysis of the available data further helped in identifying 20 security personnel who should be prioritized for soft skills training and a group of 36 security personnel who should receive maximum attention during technical skills training. Conclusion: This statistically driven approach can be used as a prototype by healthcare delivery institutions worldwide, after situation specific customizations, to identify the training needs of any category of healthcare staff. PMID:23559904
Kahramangil, Bora; Berber, Eren
2018-04-01
Although numerous studies have been published on robotic adrenalectomy (RA) in the literature, none has compared the posterior retroperitoneal (PR) and transabdominal lateral (TL) approaches. The aim of this study was to compare the outcomes of robotic PR and TL adrenalectomy. This is a retrospective analysis of a prospectively maintained database. Between September 2008 and January 2017, perioperative outcomes of patients undergoing RA through PR and TL approaches were recorded into an IRB-approved database. Clinical and perioperative parameters were compared using Student's t test, Wilcoxon rank-sum test, and χ² test. Multivariate regression analysis was performed to determine factors associated with total operative time. 188 patients underwent 200 RAs. 110 patients were operated through TL and 78 patients through PR approach. Overall, conversion rate to open was 2.5% and 90-day morbidity 4.8%. The perioperative outcomes of TL and PR approaches were similar regarding estimated blood loss, rate of conversion to open, length of hospital stay, and 90-day morbidity. PR approach resulted in a shorter mean ± SD total operative time (136.3 ± 38.7 vs. 154.6 ± 48.4 min; p = 0.005) and lower visual analog scale pain score on postoperative day #1 (4.3 ± 2.5 vs. 5.4 ± 2.4; p = 0.001). After excluding tumors larger than 6 cm operated through TL approach, the difference in operative times persisted (136.3 ± 38.7 vs. 153.7 ± 45.7 min; p = 0.009). On multivariate regression analysis, increasing BMI and the TL approach were associated with longer total operative time. This study shows that robotic PR and TL approaches are equally safe and efficacious. With experience, shorter operative time and less postoperative pain can be achieved with PR technique. This supports the preferential utilization of PR approach in high-volume centers with enough experience.
Iafrati, Jillian; Malvache, Arnaud; Gonzalez Campo, Cecilia; Orejarena, M. Juliana; Lassalle, Olivier; Bouamrane, Lamine; Chavis, Pascale
2016-01-01
The postnatal maturation of the prefrontal cortex (PFC) represents a period of increased vulnerability to risk factors and emergence of neuropsychiatric disorders. To disambiguate the pathophysiological mechanisms contributing to these disorders, we revisited the endophenotype approach from a developmental viewpoint. The extracellular matrix protein reelin which contributes to cellular and network plasticity, is a risk factor for several psychiatric diseases. We mapped the aggregate effect of the RELN risk allele on postnatal development of PFC functions by cross-sectional synaptic and behavioral analysis of reelin-haploinsufficient mice. Multivariate analysis of bootstrapped datasets revealed subgroups of phenotypic traits specific to each maturational epoch. The preeminence of synaptic AMPA/NMDA receptor content to pre-weaning and juvenile endophenotypes shifts to long-term potentiation and memory renewal during adolescence followed by NMDA-GluN2B synaptic content in adulthood. Strikingly, multivariate analysis shows that pharmacological rehabilitation of reelin haploinsufficient dysfunctions is mediated through induction of new endophenotypes rather than reversion to wild-type traits. By delineating previously unknown developmental endophenotypic sequences, we conceived a promising general strategy to disambiguate the molecular underpinnings of complex psychiatric disorders and for the rational design of pharmacotherapies in these disorders. PMID:27765946
He, M; Wang, H L; Yan, J Y; Xu, S W; Chen, W; Wang, J
2018-05-01
Objective: To compare the efficiency of the transhepatic hilar approach and the conventional approach for the surgical treatment of Bismuth type Ⅲ and Ⅳ hilar cholangiocarcinoma. Methods: There were 42 consecutive patients with hilar cholangiocarcinoma of Bismuth type Ⅲ and Ⅳ who underwent surgical treatment at the Department of Biliary-Pancreatic Surgery, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University from January 2008 to December 2013. The transhepatic hilar approach was used in 19 patients and the conventional approach in 23 patients. There were no differences in clinical parameters between the two groups (all P > 0.05). The t-test was used to analyze the measurement data, and the χ² test was used to analyze the count data. Kaplan-Meier analysis was used to analyze the survival period. Multivariate Cox regression analysis was used to analyze the prognostic factors. Results: Among the 19 patients who underwent the transhepatic hilar approach, 3 patients had their operative plan changed after re-evaluation on exposure of the hepatic hilus. The intraoperative blood loss was 300 (250-400) ml in the transhepatic hilar approach group, significantly less than in the conventional approach group, 800 (450-1300) ml (t = 4.276, P = 0.001). The R0 resection rate was significantly higher in the transhepatic hilar approach group than in the conventional approach group (89.4% vs. 52.2%; χ² = 6.773, P = 0.009), and the 3-year and 5-year cumulative survival rates were better in the transhepatic hilar approach group than in the conventional approach group (63.2% vs. 47.8%, 26.3% vs. 0; χ² = 66.363, 127.185, P = 0.000). On univariate analysis, transhepatic hilar approach, intraoperative blood loss, intraoperative blood transfusion, R0 resection and lymph node metastasis were significant risk factors for patient survival (all P < 0.05). On multivariate analysis, use of the transhepatic hilar approach, intraoperative blood loss, R0 resection and lymph node metastasis were significant independent risk factors for patient survival (all P < 0.05). Conclusion: The transhepatic hilar approach is the preferred technique for surgical treatment of hilar cholangiocarcinoma because it can improve accuracy of surgical planning, safety of operation, R0 resection rate and survival rate compared with the conventional approach.
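The χ² statistic reported for the R0 resection rate can be checked by hand. The sketch below assumes the underlying counts were 17 of 19 (transhepatic hilar approach) and 12 of 23 (conventional approach), which are the only whole-number counts consistent with the reported 89.4 and 52.2 percent. The counts are inferred, not stated in the abstract, and the function name `chi_square_2x2` is hypothetical. It uses the Pearson χ² without continuity correction.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and p-value for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Inferred counts: rows are approaches, columns are R0 / non-R0 resections.
chi2, p = chi_square_2x2(17, 2, 12, 11)
print(round(chi2, 3), round(p, 3))
```

Under these assumed counts the statistic and p-value agree with the reported χ² = 6.773 and P = 0.009 to three decimals, which suggests the authors did not apply a continuity correction.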
The Effect of Visual Information on the Manual Approach and Landing
NASA Technical Reports Server (NTRS)
Wewerinke, P. H.
1982-01-01
The effect of visual information in combination with basic display information on approach performance was investigated. A pre-experimental model analysis was performed in terms of the optimal control model. The resulting aircraft approach performance predictions were compared with the results of a moving base simulator program. The results illustrate that the model provides a meaningful description of the visual (scene) perception process involved in the complex (multi-variable, time varying) manual approach task with a useful predictive capability. The theoretical framework was shown to allow a straightforward investigation of the complex interaction of a variety of task variables.
TENSOR DECOMPOSITIONS AND SPARSE LOG-LINEAR MODELS
Johndrow, James E.; Bhattacharya, Anirban; Dunson, David B.
2017-01-01
Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. We derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions. PMID:29332971
De Luca, Michele; Ragno, Gaetano; Ioele, Giuseppina; Tauler, Romà
2014-07-21
An advanced and powerful chemometric approach is proposed for the analysis of incomplete multiset data obtained by fusion of hyphenated liquid chromatographic DAD/MS data with UV spectrophotometric data from acid-base titration and kinetic degradation experiments. Column- and row-wise augmented data blocks were combined and simultaneously processed by means of a new version of the multivariate curve resolution-alternating least squares (MCR-ALS) technique, including the simultaneous analysis of incomplete multiset data from different instrumental techniques. The proposed procedure was applied to the detailed study of the kinetic photodegradation process of the amiloride (AML) drug. All chemical species involved in the degradation and equilibrium reactions were resolved and the pH dependent kinetic pathway described. Copyright © 2014 Elsevier B.V. All rights reserved.
The natural mathematics of behavior analysis.
Li, Don; Hautus, Michael J; Elliffe, Douglas
2018-04-19
Models that generate event records have very general scope regarding the dimensions of the target behavior that we measure. From a set of predicted event records, we can generate predictions for any dependent variable that we could compute from the event records of our subjects. In this sense, models that generate event records permit us a freely multivariate analysis. To explore this proposition, we conducted a multivariate examination of Catania's Operant Reserve on single VI schedules in transition using a Markov Chain Monte Carlo scheme for Approximate Bayesian Computation. Although we found systematic deviations between our implementation of Catania's Operant Reserve and our observed data (e.g., mismatches in the shape of the interresponse time distributions), the general approach that we have demonstrated represents an avenue for modelling behavior that transcends the typical constraints of algebraic models. © 2018 Society for the Experimental Analysis of Behavior.
Fighting for Intelligence: A Brief Overview of the Academic Work of John L. Horn
McArdle, John J.; Hofer, Scott M.
2015-01-01
John L. Horn (1928–2006) was a pioneer in multivariate thinking and the application of multivariate methods to research on intelligence and personality. His key works on individual differences in the methodological areas of factor analysis and the substantive areas of cognition are reviewed here. John was also our mentor, teacher, colleague, and friend. We overview John Horn’s main contributions to the field of intelligence by highlighting 3 issues about his methods of factor analysis and 3 of his substantive debates about intelligence. We first focus on Horn’s methodological demonstrations describing (a) the many uses of simulated random variables in exploratory factor analysis; (b) the exploratory uses of confirmatory factor analysis; and (c) the key differences between states, traits, and trait-changes. On a substantive basis, John believed that there were important individual differences among people in terms of cognition and personality. These sentiments led to his intellectual battles about (d) Spearman’s g theory of a unitary intelligence, (e) Guilford’s multifaceted model of intelligence, and (f) the Schaie and Baltes approach to defining the lack of decline of intelligence earlier in the life span. We conclude with a summary of John Horn’s unique approaches to dealing with common issues. PMID:26246642
Novikova, Anna; Carstensen, Jens M; Rades, Thomas; Leopold, Prof Dr Claudia S
2016-12-30
In the present study the applicability of multispectral UV imaging in combination with multivariate image analysis for surface evaluation of MUPS tablets was investigated with respect to the differentiation of the API pellets from the excipients matrix, estimation of the drug content as well as pellet distribution, and influence of the coating material and tablet thickness on the predictive model. Different formulations consisting of coated drug pellets with two coating polymers (Aquacoat® ECD and Eudragit® NE 30 D) at three coating levels each were compressed to MUPS tablets with various amounts of coated pellets and different tablet thicknesses. The coated drug pellets were clearly distinguishable from the excipients matrix using a partial least squares approach regardless of the coating layer thickness and coating material used. Furthermore, the number of the detected drug pellets on the tablet surface allowed an estimation of the true drug content in the respective MUPS tablet. In addition, the pellet distribution in the MUPS formulations could be estimated by UV image analysis of the tablet surface. In conclusion, this study revealed that UV imaging in combination with multivariate image analysis is a promising approach for the automatic quality control of MUPS tablets during the manufacturing process. Copyright © 2016 Elsevier B.V. All rights reserved.
Vasudevan, Rama K; Tselev, Alexander; Baddorf, Arthur P; Kalinin, Sergei V
2014-10-28
Reflection high energy electron diffraction (RHEED) has by now become a standard tool for in situ monitoring of film growth by pulsed laser deposition and molecular beam epitaxy. Yet despite the widespread adoption and wealth of information in RHEED images, most applications are limited to observing intensity oscillations of the specular spot, and much additional information on growth is discarded. With ease of data acquisition and increased computation speeds, statistical methods to rapidly mine the data set are now feasible. Here, we develop such an approach to the analysis of the fundamental growth processes through multivariate statistical analysis of a RHEED image sequence. This approach is illustrated for growth of La(x)Ca(1-x)MnO(3) films grown on etched (001) SrTiO(3) substrates, but is universal. The multivariate methods including principal component analysis and k-means clustering provide insight into the relevant behaviors, the timing and nature of a disordered to ordered growth change, and highlight statistically significant patterns. Fourier analysis yields the harmonic components of the signal and allows separation of the relevant components and baselines, isolating the asymmetric nature of the step density function and the transmission spots from the imperfect layer-by-layer (LBL) growth. These studies show the promise of big data approaches to obtaining more insight into film properties during and after epitaxial film growth. Furthermore, these studies open the pathway to use forward prediction methods to potentially allow significantly more control over growth process and hence final film quality.
Ponsoda, Vicente; Martínez, Kenia; Pineda-Pardo, José A; Abad, Francisco J; Olea, Julio; Román, Francisco J; Barbey, Aron K; Colom, Roberto
2017-02-01
Neuroimaging research involves analyses of huge amounts of biological data that might or might not be related to cognition. This relationship is usually approached using univariate methods, and, therefore, correction methods are mandatory for reducing false positives. Nevertheless, the probability of false negatives is also increased. Multivariate frameworks have been proposed for helping to alleviate this balance. Here we apply multivariate distance matrix regression for the simultaneous analysis of biological and cognitive data, namely, structural connections among 82 brain regions and several latent factors estimating cognitive performance. We tested whether cognitive differences predict distances among individuals regarding their connectivity pattern. Beginning with 3,321 connections among regions, the 36 edges better predicted by the individuals' cognitive scores were selected. Cognitive scores were related to connectivity distances in both the full (3,321) and reduced (36) connectivity patterns. The selected edges connect regions distributed across the entire brain and the network defined by these edges supports high-order cognitive processes such as (a) (fluid) executive control, (b) (crystallized) recognition, learning, and language processing, and (c) visuospatial processing. This multivariate study suggests that a widespread but limited number of regions in the human brain supports high-level cognitive ability differences. Hum Brain Mapp 38:803-816, 2017. © 2016 Wiley Periodicals, Inc.
Hot spots of multivariate extreme anomalies in Earth observations
NASA Astrophysics Data System (ADS)
Flach, M.; Sippel, S.; Bodesheim, P.; Brenning, A.; Denzler, J.; Gans, F.; Guanche, Y.; Reichstein, M.; Rodner, E.; Mahecha, M. D.
2016-12-01
Anomalies in Earth observations might indicate data quality issues, extremes or the change of underlying processes within a highly multivariate system. Thus, considering the multivariate constellation of variables for extreme detection yields crucial additional information over conventional univariate approaches. We highlight areas in which multivariate extreme anomalies are more likely to occur, i.e. hot spots of extremes in global atmospheric Earth observations that impact the Biosphere. In addition, we present the year of the most unusual multivariate extreme between 2001 and 2013 and show that these coincide with well known high impact extremes. Technically speaking, we account for multivariate extremes by using three sophisticated algorithms adapted from computer science applications, namely an ensemble of the k-nearest neighbours mean distance, a kernel density estimation and an approach based on recurrences. However, the impact of atmosphere extremes on the Biosphere might largely depend on what is considered to be normal, i.e. the shape of the mean seasonal cycle and its inter-annual variability. We identify regions with similar mean seasonality by means of dimensionality reduction in order to estimate in each region both the `normal' variance and robust thresholds for detecting the extremes. In addition, we account for challenges like heteroscedasticity in Northern latitudes. Apart from hot spot areas, anomalies in the atmosphere time series that can only be detected by a multivariate approach, and not by a simple univariate approach, are of particular interest. Such an anomalous constellation of atmosphere variables is of interest if it impacts the Biosphere. The multivariate constellation of such an anomalous part of a time series is shown in one case study indicating that multivariate anomaly detection can provide novel insights into Earth observations.
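The k-nearest neighbours mean distance named above can be sketched in a few lines: each observation is scored by its average distance to its k closest neighbours in the joint variable space. The data and the function name `knn_mean_distance` below are hypothetical, and the sketch covers only this one score, not the ensemble, the kernel density estimation, or the recurrence-based approach. The last observation is chosen so that each of its coordinates lies inside the range of the others, meaning a per-variable threshold would not flag it.

```python
import math

def knn_mean_distance(points, k):
    """Score each point by its mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical observations of two strongly coupled variables, plus one
# point (the last) that breaks the joint pattern without leaving either range.
obs = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (1.5, 1.5),
       (2.0, 2.0), (2.5, 2.5), (3.0, 3.0), (0.5, 2.5)]

scores = knn_mean_distance(obs, k=2)
most_anomalous = max(range(len(obs)), key=scores.__getitem__)
print(most_anomalous, round(scores[most_anomalous], 3))
```

Points at the ends of the pattern also receive somewhat higher scores than interior points, so in practice a threshold has to be chosen with such edge effects in mind.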
Lepre, Jorge; Rice, J Jeremy; Tu, Yuhai; Stolovitzky, Gustavo
2004-05-01
Despite the growing literature devoted to finding differentially expressed genes in assays probing different tissue types, little attention has been paid to the combinatorial nature of feature selection inherent to large, high-dimensional gene expression datasets. New flexible data analysis approaches capable of searching relevant subgroups of genes and experiments are needed to understand multivariate associations of gene expression patterns with observed phenotypes. We present in detail a deterministic algorithm to discover patterns of multivariate gene associations in gene expression data. The patterns discovered are differential with respect to a control dataset. The algorithm is exhaustive and efficient, reporting all existing patterns that fit a given input parameter set while avoiding enumeration of the entire pattern space. The value of the pattern discovery approach is demonstrated by finding a set of genes that differentiate between two types of lymphoma. Moreover, these genes are found to behave consistently in an independent dataset produced in a different laboratory using different arrays, thus validating the genes selected using our algorithm. We show that the genes deemed significant in terms of their multivariate statistics will be missed using other methods. Our set of pattern discovery algorithms including a user interface is distributed as a package called Genes@Work. This package is freely available to non-commercial users and can be downloaded from our website (http://www.research.ibm.com/FunGen).
NASA Astrophysics Data System (ADS)
Badrzadeh, Honey; Sarukkalige, Ranjan; Jayawardena, A. W.
2013-12-01
Discrete wavelet transform was applied to decompose ANN and ANFIS inputs. Novel approach of WNF with subtractive clustering applied for flow forecasting. Forecasting was performed in 1-5 step ahead, using multi-variate inputs. Forecasting accuracy of peak values and longer lead-time significantly improved.
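Decomposing a model input with a discrete wavelet transform splits the series into a smooth approximation and a set of detail coefficients, which can then be passed to the forecasting model as separate inputs. The sketch below shows a single decomposition level with the Haar wavelet. Both choices, along with the function names and the flow values, are assumptions for illustration; the highlights above do not state which mother wavelet or how many levels were used.

```python
import math

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform (even-length input)."""
    s = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / s for a, b in pairs]   # low-pass: local averages
    detail = [(a - b) / s for a, b in pairs]   # high-pass: local differences
    return approx, detail

def haar_idwt(approx, detail):
    """Invert haar_dwt, returning the original samples."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

# Hypothetical daily flow values.
flow = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 3.0]
approx, detail = haar_dwt(flow)
rebuilt = haar_idwt(approx, detail)
print(approx, detail)
```

Because the transform is orthonormal, the inverse recovers the input exactly up to floating-point rounding, so no information is lost by feeding the sub-series to a model instead of the raw series.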
Evolutionary Losses? The Growth of Graduate Programs at Undergraduate Colleges.
ERIC Educational Resources Information Center
McCormick, Alexander C.; Staklis, Sandra
This study examined the addition and expansion of graduate programs at primarily undergraduate colleges. The primary approach of the study was quantitative, consisting of descriptive and multivariate analysis of master's degree programs at colleges that were classified in 1994 as Baccalaureate Colleges. Data came from the 1994 and 2000 Carnegie…
2003-07-01
4, Gnanadesikan, 1977). An entity whose measured features fall into one of the regions is classified accordingly. For the approaches we discuss here... Gnanadesikan, R. 1977. Methods for Statistical Data Analysis of Multivariate Observations. John Wiley & Sons, New York. Hassig, N. L., O’Brien, R. F
Multivariate assessment of event-related potentials with the t-CWT method.
Bostanov, Vladimir
2015-11-05
Event-related brain potentials (ERPs) are usually assessed with univariate statistical tests although they are essentially multivariate objects. Brain-computer interface applications are a notable exception to this practice, because they are based on multivariate classification of single-trial ERPs. Multivariate ERP assessment can be facilitated by feature extraction methods. One such method is t-CWT, a mathematical-statistical algorithm based on the continuous wavelet transform (CWT) and Student's t-test. This article begins with a geometric primer on some basic concepts of multivariate statistics as applied to ERP assessment in general and to the t-CWT method in particular. Further, it presents for the first time a detailed, step-by-step, formal mathematical description of the t-CWT algorithm. A new multivariate outlier rejection procedure based on principal component analysis in the frequency domain is presented as an important pre-processing step. The MATLAB and GNU Octave implementation of t-CWT is also made publicly available for the first time as free and open source code. The method is demonstrated on some example ERP data obtained in a passive oddball paradigm. Finally, some conceptually novel applications of the multivariate approach in general and of the t-CWT method in particular are suggested and discussed. Hopefully, the publication of both the t-CWT source code and its underlying mathematical algorithm along with a didactic geometric introduction to some basic concepts of multivariate statistics would make t-CWT more accessible to both users and developers in the field of neuroscience research.
Chen, Yong; Luo, Sheng; Chu, Haitao; Wei, Peng
2013-05-01
Multivariate meta-analysis is useful in combining evidence from independent studies which involve several comparisons among groups based on a single outcome. For binary outcomes, the commonly used statistical models for multivariate meta-analysis are multivariate generalized linear mixed effects models which assume risks, after some transformation, follow a multivariate normal distribution with possible correlations. In this article, we consider an alternative model for multivariate meta-analysis where the risks are modeled by the multivariate beta distribution proposed by Sarmanov (1966). This model has several attractive features compared to the conventional multivariate generalized linear mixed effects models, including simplicity of the likelihood function, no need to specify a link function, and a closed-form expression of distribution functions for study-specific risk differences. We investigate the finite sample performance of this model by simulation studies and illustrate its use with an application to multivariate meta-analysis of adverse events of tricyclic antidepressants treatment in clinical trials.
Konukoglu, Ender; Coutu, Jean-Philippe; Salat, David H; Fischl, Bruce
2016-07-01
Diffusion magnetic resonance imaging (dMRI) is a unique technology that allows the noninvasive quantification of microstructural tissue properties of the human brain in healthy subjects as well as the probing of disease-induced variations. Population studies of dMRI data have been essential in identifying pathological structural changes in various conditions, such as Alzheimer's and Huntington's diseases (Salat et al., 2010; Rosas et al., 2006). The most common form of dMRI involves fitting a tensor to the underlying imaging data (known as diffusion tensor imaging, or DTI), then deriving parametric maps, each quantifying a different aspect of the underlying microstructure, e.g. fractional anisotropy and mean diffusivity. To date, the statistical methods utilized in most DTI population studies either analyzed only one such map or analyzed several of them, each in isolation. However, it is most likely that variations in the microstructure due to pathology or normal variability would affect several parameters simultaneously, with differing variations modulating the various parameters to differing degrees. Therefore, joint analysis of the available diffusion maps can be more powerful in characterizing histopathology and distinguishing between conditions than the widely used univariate analysis. In this article, we propose a multivariate approach for statistical analysis of diffusion parameters that uses partial least squares correlation (PLSC) analysis and permutation testing as building blocks in a voxel-wise fashion. Stemming from the common formulation, we present three different multivariate procedures for group analysis, regressing-out nuisance parameters and comparing effects of different conditions. We used the proposed procedures to study the effects of non-demented aging, Alzheimer's disease and mild cognitive impairment on the white matter. 
Here, we present results demonstrating that the proposed PLSC-based approach can differentiate between effects of different conditions in the same region as well as uncover spatial variations of effects across the white matter. The proposed procedures were able to answer questions on structural variations such as: "are there regions in the white matter where Alzheimer's disease has a different effect than aging or similar effect as aging?" and "are there regions in the white matter that are affected by both mild cognitive impairment and Alzheimer's disease but with differing multivariate effects?" Copyright © 2016 Elsevier Inc. All rights reserved.
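The building blocks named in this abstract, PLSC and permutation testing, can be illustrated at a single voxel with a minimal sketch. It is not the authors' implementation: the function names, the toy two-group design and the choice of the first singular value as test statistic are assumptions made for illustration.

```python
import numpy as np

def plsc_first_singular_value(x, y):
    """Largest singular value of the cross-covariance between x and y."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    cross_cov = xc.T @ yc / (x.shape[0] - 1)
    return np.linalg.svd(cross_cov, compute_uv=False)[0]

def plsc_permutation_pvalue(x, y, n_perm=500, seed=0):
    """Permutation p-value for the first singular value (rows of x are shuffled)."""
    rng = np.random.default_rng(seed)
    observed = plsc_first_singular_value(x, y)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(x.shape[0])
        if plsc_first_singular_value(x[perm], y) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Hypothetical toy data: 2 groups of 20 subjects, 3 diffusion parameters at one voxel.
rng = np.random.default_rng(1)
group = np.repeat([0.0, 1.0], 20).reshape(-1, 1)
maps = rng.normal(size=(40, 3))
maps[:, 0] += 2.0 * group[:, 0]  # only the first parameter differs between groups
sv, p = plsc_permutation_pvalue(group, maps)
```

In a voxel-wise analysis this test would be repeated at every voxel, with a correction for multiple comparisons that the sketch leaves out.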
Konukoglu, Ender; Coutu, Jean-Philippe; Salat, David H.; Fischl, Bruce
2016-01-01
Diffusion magnetic resonance imaging (dMRI) is a unique technology that allows the noninvasive quantification of microstructural tissue properties of the human brain in healthy subjects as well as the probing of disease-induced variations. Population studies of dMRI data have been essential in identifying pathological structural changes in various conditions, such as Alzheimer’s and Huntington’s diseases [1,2]. The most common form of dMRI involves fitting a tensor to the underlying imaging data (known as Diffusion Tensor Imaging, or DTI), then deriving parametric maps, each quantifying a different aspect of the underlying microstructure, e.g. fractional anisotropy and mean diffusivity. To date, the statistical methods utilized in most DTI population studies either analyzed only one such map or analyzed several of them, each in isolation. However, it is most likely that variations in the microstructure due to pathology or normal variability would affect several parameters simultaneously, with differing variations modulating the various parameters to differing degrees. Therefore, joint analysis of the available diffusion maps can be more powerful in characterizing histopathology and distinguishing between conditions than the widely used univariate analysis. In this article, we propose a multivariate approach for statistical analysis of diffusion parameters that uses partial least squares correlation (PLSC) analysis and permutation testing as building blocks in a voxel-wise fashion. Stemming from the common formulation, we present three different multivariate procedures for group analysis, regressing-out nuisance parameters and comparing effects of different conditions. We used the proposed procedures to study the effects of non-demented aging, Alzheimer’s disease and mild cognitive impairment on the white matter. 
Here, we present results demonstrating that the proposed PLSC-based approach can differentiate between effects of different conditions in the same region as well as uncover spatial variations of effects across the white matter. The proposed procedures were able to answer questions on structural variations such as: “are there regions in the white matter where Alzheimer’s disease has a different effect than aging or similar effect as aging?” and “are there regions in the white matter that are affected by both mild cognitive impairment and Alzheimer’s disease but with differing multivariate effects?” PMID:27103138
SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression
Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.
2014-01-01
The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised to specifically address the low statistical power issue of many standard statistical approaches, such as the Hotelling’s T² test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPReM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis have shown that SPReM outperforms other state-of-the-art methods. PMID:26527844
Structural changes in cross-border liabilities: A multidimensional approach
NASA Astrophysics Data System (ADS)
Araújo, Tanya; Spelta, Alessandro
2014-01-01
We study the international interbank market through a geometric analysis of empirical data. The geometric analysis of the time series of cross-country liabilities shows that the systematic information of the interbank international market is contained in a space of small dimension. Geometric spaces of financial relations across countries are developed, for which the space volume, multivariate skewness and multivariate kurtosis are computed. The behavior of these coefficients reveals an important modification acting in the financial linkages since 1997 and allows us to relate the shape of the geometric space that emerges in recent years to the globally turbulent period that has characterized financial systems since the late 1990s. Here we show that, besides a persistent decrease in the volume of the geometric space since 1997, the observation of a generalized increase in the values of the multivariate skewness and kurtosis sheds some light on the behavior of cross-border interdependencies during periods of financial crises. This was found to occur in such a systematic fashion, that these coefficients may be used as a proxy for systemic risk.
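The abstract does not say which definitions of multivariate skewness and kurtosis were used. A common choice is Mardia's coefficients, and the sketch below assumes that choice; the function name and the four-point example are illustrative only.

```python
import numpy as np

def mardia_coefficients(x):
    """Mardia's multivariate skewness and kurtosis for an (n, p) data matrix."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / n  # maximum-likelihood covariance estimate
    # d[i, j] is the Mahalanobis cross-product between observations i and j
    d = centered @ np.linalg.inv(cov) @ centered.T
    skewness = np.sum(d ** 3) / n ** 2
    kurtosis = np.sum(np.diag(d) ** 2) / n
    return skewness, kurtosis

# Four points placed symmetrically around the origin: skewness 0, kurtosis 4.
points = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
skew, kurt = mardia_coefficients(points)
```

For multivariate normal data in p dimensions the expected kurtosis is p(p + 2), which gives a reference level against which increases can be judged.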
Multiscale analysis of information dynamics for linear multivariate processes.
Faes, Luca; Montalto, Alessandro; Stramaglia, Sebastiano; Nollo, Giandomenico; Marinazzo, Daniele
2016-08-01
In the study of complex physical and physiological systems represented by multivariate time series, an issue of great interest is the description of the system dynamics over a range of different temporal scales. While information-theoretic approaches to the multiscale analysis of complex dynamics are being increasingly used, the theoretical properties of the applied measures are poorly understood. This study introduces for the first time a framework for the analytical computation of information dynamics for linear multivariate stochastic processes explored at different time scales. After showing that the multiscale processing of a vector autoregressive (VAR) process introduces a moving average (MA) component, we describe how to represent the resulting VARMA process using state-space (SS) models and how to exploit the SS model parameters to compute analytical measures of information storage and information transfer for the original and rescaled processes. The framework is then used to quantify multiscale information dynamics for simulated unidirectionally and bidirectionally coupled VAR processes, showing that rescaling may lead to insightful patterns of information storage and transfer but also to potentially misleading behaviors.
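The claim that rescaling a VAR process introduces an MA component can be checked with a short derivation. It assumes the rescaling starts with a linear averaging filter with scalar coefficients b_l of order q; the symbols A_k, U_n and b_l are notation chosen here, not taken from the abstract.

```latex
% VAR(p) process and its filtered version
X_n = \sum_{k=1}^{p} A_k X_{n-k} + U_n ,
\qquad
\bar{X}_n = \sum_{l=0}^{q} b_l X_{n-l}

% Substituting the VAR equation into the filter and swapping the sums:
\bar{X}_n
  = \sum_{l=0}^{q} b_l \Bigl( \sum_{k=1}^{p} A_k X_{n-l-k} + U_{n-l} \Bigr)
  = \sum_{k=1}^{p} A_k \bar{X}_{n-k} + \sum_{l=0}^{q} b_l U_{n-l}
```

The filtered process keeps the autoregressive coefficients A_k but is driven by a moving average of the innovations, so it is a VARMA(p, q) process. Any subsequent downsampling step is not covered by this sketch.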
Riley, Richard D; Elia, Eleni G; Malin, Gemma; Hemming, Karla; Price, Malcolm P
2015-07-30
A prognostic factor is any measure that is associated with the risk of future health outcomes in those with existing disease. Often, the prognostic ability of a factor is evaluated in multiple studies. However, meta-analysis is difficult because primary studies often use different methods of measurement and/or different cut-points to dichotomise continuous factors into 'high' and 'low' groups; selective reporting is also common. We illustrate how multivariate random effects meta-analysis models can accommodate multiple prognostic effect estimates from the same study, relating to multiple cut-points and/or methods of measurement. The models account for within-study and between-study correlations, which utilises more information and reduces the impact of unreported cut-points and/or measurement methods in some studies. The applicability of the approach is improved with individual participant data and by assuming a functional relationship between prognostic effect and cut-point to reduce the number of unknown parameters. The models provide important inferential results for each cut-point and method of measurement, including the summary prognostic effect, the between-study variance and a 95% prediction interval for the prognostic effect in new populations. Two applications are presented. The first reveals that, in a multivariate meta-analysis using published results, the Apgar score is prognostic of neonatal mortality but effect sizes are smaller at most cut-points than previously thought. In the second, a multivariate meta-analysis of two methods of measurement provides weak evidence that microvessel density is prognostic of mortality in lung cancer, even when individual participant data are available so that a continuous prognostic trend is examined (rather than cut-points). © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
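For reference, the 95% prediction interval mentioned above is usually computed, in the univariate random-effects setting, with the approximate formula below. Whether the paper uses exactly this form for each cut-point is an assumption; here k is the number of studies, \hat{\mu} the summary effect and \hat{\tau}^2 the between-study variance.

```latex
\hat{\mu} \;\pm\; t_{k-2,\,0.975}\,
\sqrt{\hat{\tau}^{2} + \widehat{\mathrm{Var}}(\hat{\mu})}
```

The interval is wider than the confidence interval for \hat{\mu} because it adds the between-study variance to the uncertainty of the summary estimate.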
Detecting spatio-temporal modes in multivariate data by entropy field decomposition
NASA Astrophysics Data System (ADS)
Frank, Lawrence R.; Galinsky, Vitaly L.
2016-09-01
A new data analysis method that addresses a general problem of detecting spatio-temporal variations in multivariate data is presented. The method utilizes two recent and complementary general approaches to data analysis, information field theory (IFT) and entropy spectrum pathways (ESPs). Both methods reformulate and incorporate Bayesian theory, and thus use prior information to uncover underlying structure of the unknown signal. Unification of ESP and IFT creates an approach that is non-Gaussian and nonlinear by construction and is found to produce unique spatio-temporal modes of signal behavior that can be ranked according to their significance, from which space-time trajectories of parameter variations can be constructed and quantified. Two brief examples of real world applications of the theory to the analysis of data of completely different and unrelated nature, lacking any underlying similarity, are also presented. The first example provides an analysis of resting state functional magnetic resonance imaging data that allowed us to create an efficient and accurate computational method for assessing and categorizing brain activity. The second example demonstrates the potential of the method in the application to the analysis of a strong atmospheric storm circulation system during the complicated stage of tornado development and formation using data recorded by a mobile Doppler radar. Reference implementation of the method will be made available as a part of the QUEST toolkit that is currently under development at the Center for Scientific Computation in Imaging.
Belianinov, Alex; Panchapakesan, G.; Lin, Wenzhi; ...
2014-12-02
Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1−xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Belianinov, Alex, E-mail: belianinova@ornl.gov; Ganesh, Panchapakesan; Lin, Wenzhi
2014-12-01
Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1−xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.
Multivariate Bias Correction Procedures for Improving Water Quality Predictions from the SWAT Model
NASA Astrophysics Data System (ADS)
Arumugam, S.; Libera, D.
2017-12-01
Water quality observations are usually not available on a continuous basis for longer than 1-2 years at a time over a decadal period given the labor requirements, which makes calibrating and validating mechanistic models difficult. Further, any physical model predictions inherently have bias (i.e., under/over estimation) and require post-simulation techniques to preserve the long-term mean monthly attributes. This study suggests a multivariate bias-correction technique and compares it with a common technique in improving the performance of the SWAT model in predicting daily streamflow and TN loads across the southeast based on split-sample validation. The approach is a dimension reduction technique, canonical correlation analysis (CCA), that regresses the observed multivariate attributes with the SWAT model simulated values. The common approach is a regression based technique that uses an ordinary least squares regression to adjust model values. The observed cross-correlation between loadings and streamflow is better preserved when using canonical correlation while simultaneously reducing individual biases. Additionally, canonical correlation analysis does a better job in preserving the observed joint likelihood of observed streamflow and loadings. These procedures were applied to 3 watersheds chosen from the Water Quality Network in the Southeast Region; specifically, watersheds with sufficiently large drainage areas and number of observed data points. The performance of these two approaches is compared for the observed period and over a multi-decadal period using loading estimates from the USGS LOADEST model. Lastly, the CCA technique is applied in a forecasting sense by using 1-month ahead forecasts of P & T from ECHAM4.5 as forcings in the SWAT model. Skill in using the SWAT model for forecasting loadings and streamflow at the monthly and seasonal timescale is also discussed.
Exploring connectivity with large-scale Granger causality on resting-state functional MRI.
DSouza, Adora M; Abidin, Anas Z; Leistritz, Lutz; Wismüller, Axel
2017-08-01
Large-scale Granger causality (lsGC) is a recently developed, resting-state functional MRI (fMRI) connectivity analysis approach that estimates multivariate voxel-resolution connectivity. Unlike most commonly used multivariate approaches, which establish coarse-resolution connectivity by aggregating voxel time-series to avoid an underdetermined problem, lsGC estimates voxel-resolution, fine-grained connectivity by incorporating an embedded dimension reduction. We investigate application of lsGC on realistic fMRI simulations, modeling smoothing of neuronal activity by the hemodynamic response function and repetition time (TR), and empirical resting-state fMRI data. Subsequently, functional subnetworks are extracted from lsGC connectivity measures for both datasets and validated quantitatively. We also provide guidelines to select lsGC free parameters. Results indicate that lsGC reliably recovers underlying network structure with area under the receiver operating characteristic curve (AUC) of 0.93 at TR=1.5s for a 10-min session of fMRI simulations. Furthermore, subnetworks of closely interacting modules are recovered from the aforementioned lsGC networks. Results on empirical resting-state fMRI data demonstrate recovery of visual and motor cortex in close agreement with spatial maps obtained from (i) visuo-motor fMRI stimulation task-sequence (Accuracy=0.76) and (ii) independent component analysis (ICA) of resting-state fMRI (Accuracy=0.86). Compared with the conventional Granger causality approach (AUC=0.75), lsGC produces better network recovery on fMRI simulations. Furthermore, conventional Granger causality cannot recover functional subnetworks from empirical fMRI data, since quantifying voxel-resolution connectivity is not possible as a consequence of encountering an underdetermined problem. Functional network recovery from fMRI data suggests that lsGC gives useful insight into connectivity patterns from resting-state fMRI at a multivariate voxel-resolution. Copyright © 2017 Elsevier B.V. All rights reserved.
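For orientation, the conventional (bivariate) Granger causality index that lsGC is compared against can be sketched as the log ratio of residual variances from a restricted and a full autoregressive model. This illustrates the textbook definition only, not lsGC with its embedded dimension reduction; the function name, model order and simulated series are assumptions.

```python
import numpy as np

def granger_index(source, target, order=2):
    """ln(restricted / full residual variance) for the direction source -> target."""
    n = len(target)
    rows = range(order, n)
    own = np.array([[target[t - k] for k in range(1, order + 1)] for t in rows])
    other = np.array([[source[t - k] for k in range(1, order + 1)] for t in rows])
    y = np.asarray(target[order:], dtype=float)
    ones = np.ones((len(y), 1))
    restricted = np.hstack([ones, own])    # target's own past only
    full = np.hstack([ones, own, other])   # own past plus the source's past

    def residual_variance(design):
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        return np.var(y - design @ coef)

    return np.log(residual_variance(restricted) / residual_variance(full))

# Simulated pair in which x drives y with a one-sample delay.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()
gc_xy = granger_index(x, y)  # large: the past of x helps predict y
gc_yx = granger_index(y, x)  # near zero: the past of y does not help predict x
```

With more time series than time points, the full regression has more unknowns than equations, which is the underdetermined problem the abstract refers to.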
Detecting a currency’s dominance using multivariate time series analysis
NASA Astrophysics Data System (ADS)
Syahidah Yusoff, Nur; Sharif, Shamshuritawati
2017-09-01
A currency exchange rate is the price of one country’s currency in terms of another country’s currency. Four different prices (opening, closing, highest, and lowest) can be obtained from daily trading activities. In the past, many studies have been carried out using the closing price only. However, the four prices are interrelated. Thus, a multivariate time series can provide more information than a univariate time series. Therefore, the aim of this paper is to compare the results of two different approaches, the mean vector and Escoufier’s RV coefficient, in constructing similarity matrices of 20 world currencies. Both matrices are then used in place of the correlation matrix required by the network topology. With the help of the degree centrality measure, we can detect the currency’s dominance in both networks. The pros and cons of both approaches are presented at the end of this paper.
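Escoufier's RV coefficient measures the similarity between two multivariate data sets and can be sketched as below. The data layout (rows are trading days, columns are the four prices) and the random example series are assumptions made for illustration, not data from the paper.

```python
import numpy as np

def rv_coefficient(x, y):
    """Escoufier's RV coefficient between two (n, p) and (n, q) data matrices."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    sxy = xc.T @ yc  # cross-product matrices; the 1/n factors cancel in the ratio
    sxx = xc.T @ xc
    syy = yc.T @ yc
    return np.sum(sxy ** 2) / np.sqrt(np.sum(sxx ** 2) * np.sum(syy ** 2))

# Hypothetical series: 250 trading days x 4 prices (open, close, high, low).
rng = np.random.default_rng(0)
currency_a = rng.normal(size=(250, 4))
currency_b = rng.normal(size=(250, 4))
rv_same = rv_coefficient(currency_a, 2.0 * currency_a + 1.0)  # rescaled copy
rv_diff = rv_coefficient(currency_a, currency_b)              # unrelated series
```

The coefficient lies between 0 and 1 and is unchanged by shifting or uniformly rescaling either data set, which is why it can stand in for a correlation when each currency is described by several prices.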
Materials Approach to Dissecting Surface Responses in the Attachment Stages of Biofouling Organisms
2016-04-25
their settlement behavior in regards to the coating surfaces. 5) Multivariate statistical analysis was used to examine the effect (if any) of the...applied to glass rods and were deployed in the field to evaluate settlement preferences. Canonical Analysis of Principal Coordinates were applied to...the influence of coating surface properties on the patterns in settlement observed in the field in the extension of this work over the coming year
Pariser, Joseph J; Pearce, Shane M; Patel, Sanjay G; Bales, Gregory T
2015-10-01
To examine the national trends of simple prostatectomy (SP) for benign prostatic hyperplasia (BPH) focusing on perioperative outcomes and risk factors for complications. The National Inpatient Sample (2002-2012) was utilized to identify patients with BPH undergoing SP. Analysis included demographics, hospital details, associated procedures, and operative approach (open, robotic, or laparoscopic). Outcomes included complications, length of stay, charges, and mortality. Multivariate logistic regression was used to determine the risk factors for perioperative complications. Linear regression was used to assess the trends in the national annual utilization of SP. The study population included 35,171 patients. Median length of stay was 4 days (interquartile range 3-6). Cystolithotomy was performed concurrently in 6041 patients (17%). The overall complication rate was 28%, with bleeding occurring most commonly. In total, 148 (0.4%) patients experienced in-hospital mortality. On multivariate analysis, older age, black race, and overall comorbidity were associated with greater risk of complications while the use of a minimally invasive approach and concurrent cystolithotomy had a decreased risk. Over the study period, the national use of simple prostatectomy decreased, on average, by 145 cases per year (P = .002). By 2012, 135/2580 procedures (5%) were performed using a minimally invasive approach. The nationwide utilization of SP for BPH has decreased. Bleeding complications are common, but perioperative mortality is low. Patients who are older, black race, or have multiple comorbidities are at higher risk of complications. Minimally invasive approaches, which are becoming increasingly utilized, may reduce perioperative morbidity. Copyright © 2015 Elsevier Inc. All rights reserved.
McFarquhar, Martyn; McKie, Shane; Emsley, Richard; Suckling, John; Elliott, Rebecca; Williams, Stephen
2016-05-15
Repeated measurements and multimodal data are common in neuroimaging research. Despite this, conventional approaches to group level analysis ignore these repeated measurements in favour of multiple between-subject models using contrasts of interest. This approach has a number of drawbacks as certain designs and comparisons of interest are either not possible or complex to implement. Unfortunately, even when attempting to analyse group level data within a repeated-measures framework, the methods implemented in popular software packages make potentially unrealistic assumptions about the covariance structure across the brain. In this paper, we describe how this issue can be addressed in a simple and efficient manner using the multivariate form of the familiar general linear model (GLM), as implemented in a new MATLAB toolbox. This multivariate framework is discussed, paying particular attention to methods of inference by permutation. Comparisons with existing approaches and software packages for dependent group-level neuroimaging data are made. We also demonstrate how this method is easily adapted for dependency at the group level when multiple modalities of imaging are collected from the same individuals. Follow-up of these multimodal models using linear discriminant functions (LDA) is also discussed, with applications to future studies wishing to integrate multiple scanning techniques into investigating populations of interest. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Liu, Chia-Chuan; Shih, Chih-Shiun; Pennarun, Nicolas; Cheng, Chih-Tao
2016-01-01
The feasibility and radicalism of lymph node dissection for lung cancer surgery by a single-port technique has frequently been challenged. We performed a retrospective cohort study to investigate this issue. Two chest surgeons initiated multiple-port thoracoscopic surgery in a 180-bed cancer centre in 2005 and shifted to a single-port technique gradually after 2010. Data, including demographic and clinical information, from 389 patients receiving multiport thoracoscopic lobectomy or segmentectomy and 149 consecutive patients undergoing either single-port lobectomy or segmentectomy for primary non-small-cell lung cancer were retrieved and entered for statistical analysis by multivariable linear regression models and Box-Cox transformed multivariable analysis. The mean number of total dissected lymph nodes in the lobectomy group was 28.5 ± 11.7 for the single-port group versus 25.2 ± 11.3 for the multiport group; the mean number of total dissected lymph nodes in the segmentectomy group was 19.5 ± 10.8 for the single-port group versus 17.9 ± 10.3 for the multiport group. In linear multivariable and after Box-Cox transformed multivariable analyses, the single-port approach was still associated with a higher total number of dissected lymph nodes. The total number of dissected lymph nodes for primary lung cancer surgery by single-port video-assisted thoracoscopic surgery (VATS) was higher than by multiport VATS in univariable, multivariable linear regression and Box-Cox transformed multivariable analyses. This study confirmed that highly effective lymph node dissection could be achieved through single-port VATS in our setting. © The Author 2015. Published by Oxford University Press on behalf of the European Association for Cardio-Thoracic Surgery. All rights reserved.
Harrison, Jay M; Howard, Delia; Malven, Marianne; Halls, Steven C; Culler, Angela H; Harrigan, George G; Wolfinger, Russell D
2013-07-03
Compositional studies on genetically modified (GM) and non-GM crops have consistently demonstrated that their respective levels of key nutrients and antinutrients are remarkably similar and that other factors such as germplasm and environment contribute more to compositional variability than transgenic breeding. We propose that graphical and statistical approaches that can provide meaningful evaluations of the relative impact of different factors to compositional variability may offer advantages over traditional frequentist testing. A case study on the novel application of principal variance component analysis (PVCA) in a compositional assessment of herbicide-tolerant GM cotton is presented. Results of the traditional analysis of variance approach confirmed the compositional equivalence of the GM and non-GM cotton. The multivariate approach of PVCA provided further information on the impact of location and germplasm on compositional variability relative to GM.
MDAS: an integrated system for metabonomic data analysis.
Liu, Juan; Li, Bo; Xiong, Jiang-Hui
2009-03-01
Metabonomics, the latest 'omics' research field, shows great promise as a tool in biomarker discovery, drug efficacy and toxicity analysis, disease diagnosis and prognosis. One of the major challenges now facing researchers is how to process this data to yield useful information about a biological system, e.g., the mechanism of diseases. Traditional methods employed in metabonomic data analysis use multivariate analysis methods developed independently in chemometrics research. Additionally, with the development of machine learning approaches, some methods such as SVMs also show promise for use in metabonomic data analysis. Aside from the application of general multivariate analysis and machine learning methods to this problem, there is also a need for an integrated tool customized for metabonomic data analysis which can be easily used by biologists to reveal interesting patterns in metabonomic data. In this paper, we present a novel software tool MDAS (Metabonomic Data Analysis System) for metabonomic data analysis which integrates traditional chemometrics methods and newly introduced machine learning approaches. MDAS contains a suite of functional models for metabonomic data analysis and optimizes the flow of data analysis. Several file formats can be accepted as input. The input data can be optionally preprocessed and can then be processed with operations such as feature analysis and dimensionality reduction. The data with reduced dimensionalities can be used for training or testing through machine learning models. The system supplies proper visualization for data preprocessing, feature analysis, and classification which can be a powerful function for users to extract knowledge from the data. MDAS is an integrated platform for metabonomic data analysis, which transforms a complex analysis procedure into a more formalized and simplified one. The software package can be obtained from the authors.
Multivariate survivorship analysis using two cross-sectional samples.
Hill, M E
1999-11-01
As an alternative to survival analysis with longitudinal data, I introduce a method that can be applied when one observes the same cohort in two cross-sectional samples collected at different points in time. The method allows for the estimation of log-probability survivorship models that estimate the influence of multiple time-invariant factors on survival over a time interval separating two samples. This approach can be used whenever the survival process can be adequately conceptualized as an irreversible single-decrement process (e.g., mortality, the transition to first marriage among a cohort of never-married individuals). Using data from the Integrated Public Use Microdata Series (Ruggles and Sobek 1997), I illustrate the multivariate method through an investigation of the effects of race, parity, and educational attainment on the survival of older women in the United States.
EEMD-based multiscale ICA method for slewing bearing fault detection and diagnosis
NASA Astrophysics Data System (ADS)
Žvokelj, Matej; Zupan, Samo; Prebil, Ivan
2016-05-01
A novel multivariate and multiscale statistical process monitoring method is proposed with the aim of detecting incipient failures in large slewing bearings, where subjective influence plays a minor role. The proposed method integrates the strengths of the Independent Component Analysis (ICA) multivariate monitoring approach with the benefits of Ensemble Empirical Mode Decomposition (EEMD), which adaptively decomposes signals into different time scales and can thus cope with multiscale system dynamics. The method, which was named EEMD-based multiscale ICA (EEMD-MSICA), not only enables bearing fault detection but also offers a mechanism of multivariate signal denoising and, in combination with the Envelope Analysis (EA), a diagnostic tool. The multiscale nature of the proposed approach makes the method convenient to cope with data which emanate from bearings in complex real-world rotating machinery and frequently represent the cumulative effect of many underlying phenomena occupying different regions in the time-frequency plane. The efficiency of the proposed method was tested on simulated as well as real vibration and Acoustic Emission (AE) signals obtained through conducting an accelerated run-to-failure lifetime experiment on a purpose-built laboratory slewing bearing test stand. The ability to detect and locate the early-stage rolling-sliding contact fatigue failure of the bearing indicates that AE and vibration signals carry sufficient information on the bearing condition and that the developed EEMD-MSICA method is able to effectively extract it, thereby representing a reliable bearing fault detection and diagnosis strategy.
Enhancing e-waste estimates: improving data quality by multivariate Input-Output Analysis.
Wang, Feng; Huisman, Jaco; Stevels, Ab; Baldé, Cornelis Peter
2013-11-01
Waste electrical and electronic equipment (or e-waste) is one of the fastest growing waste streams, which encompasses a wide and increasing spectrum of products. Accurate estimation of e-waste generation is difficult, mainly due to lack of high quality data referred to market and socio-economic dynamics. This paper addresses how to enhance e-waste estimates by providing techniques to increase data quality. An advanced, flexible and multivariate Input-Output Analysis (IOA) method is proposed. It links all three pillars in IOA (product sales, stock and lifespan profiles) to construct mathematical relationships between various data points. By applying this method, the data consolidation steps can generate more accurate time-series datasets from available data pool. This can consequently increase the reliability of e-waste estimates compared to the approach without data processing. A case study in the Netherlands is used to apply the advanced IOA model. As a result, for the first time ever, complete datasets of all three variables for estimating all types of e-waste have been obtained. The result of this study also demonstrates significant disparity between various estimation models, arising from the use of data under different conditions. It shows the importance of applying multivariate approach and multiple sources to improve data quality for modelling, specifically using appropriate time-varying lifespan parameters. Following the case study, a roadmap with a procedural guideline is provided to enhance e-waste estimation studies. Copyright © 2013 Elsevier Ltd. All rights reserved.
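The link between the three pillars (sales, stock and lifespan) can be sketched with the usual sales-lifespan convolution, under the assumption that this is the kind of relationship the method builds on. The function names and the discrete lifespan profile are hypothetical, and a fixed profile is used even though the abstract stresses time-varying lifespan parameters.

```python
def waste_generated(sales, discard_prob):
    """Waste per year: units sold in year s are discarded at age a with share discard_prob[a]."""
    years = len(sales)
    waste = [0.0] * years
    for s, sold in enumerate(sales):
        for age, share in enumerate(discard_prob):
            if s + age < years:
                waste[s + age] += sold * share
    return waste

def stock_in_use(sales, waste):
    """Stock balance: each year the stock grows by sales and shrinks by waste."""
    stock, level = [], 0.0
    for sold, discarded in zip(sales, waste):
        level += sold - discarded
        stock.append(level)
    return stock

sales = [100.0, 100.0, 100.0, 100.0]  # units put on the market per year
discard_prob = [0.0, 0.5, 0.5]        # hypothetical: half discarded at age 1, half at age 2
waste = waste_generated(sales, discard_prob)
stock = stock_in_use(sales, waste)
```

Because the three series are tied together by these two relations, a gap or inconsistency in one of them can in principle be cross-checked against the other two.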
Peikert, Tobias; Duan, Fenghai; Rajagopalan, Srinivasan; Karwoski, Ronald A; Clay, Ryan; Robb, Richard A; Qin, Ziling; Sicks, JoRean; Bartholmai, Brian J; Maldonado, Fabien
2018-01-01
Optimization of the clinical management of screen-detected lung nodules is needed to avoid unnecessary diagnostic interventions. Herein we demonstrate the potential value of a novel radiomics-based approach for the classification of screen-detected indeterminate nodules. Independent quantitative variables assessing various radiologic nodule features such as sphericity, flatness, elongation, spiculation, lobulation and curvature were developed from the NLST dataset using 726 indeterminate nodules (all ≥ 7 mm, benign, n = 318 and malignant, n = 408). Multivariate analysis was performed using the least absolute shrinkage and selection operator (LASSO) method for variable selection and regularization in order to enhance the prediction accuracy and interpretability of the multivariate model. The bootstrapping method was then applied for the internal validation and the optimism-corrected AUC was reported for the final model. Eight of the originally considered 57 quantitative radiologic features were selected by LASSO multivariate modeling. These 8 features include variables capturing Location: vertical location (Offset carina centroid z), Size: volume estimate (Minimum enclosing brick), Shape: flatness, Density: texture analysis (Score Indicative of Lesion/Lung Aggression/Abnormality (SILA) texture), and surface characteristics: surface complexity (Maximum shape index and Average shape index), and estimates of surface curvature (Average positive mean curvature and Minimum mean curvature), all with P<0.01. The optimism-corrected AUC for these 8 features is 0.939. Our novel radiomic LDCT-based approach for indeterminate screen-detected nodule characterization appears extremely promising; however, independent external validation is needed.
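As a rough illustration of how LASSO performs variable selection, the sketch below fits a linear LASSO by cyclic coordinate descent on a tiny synthetic design. It is a simplification under stated assumptions: the study's outcome is binary (benign vs. malignant), so its model would presumably be a penalized classifier rather than this linear one, and the data, the penalty `lam`, and all function names are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=50):
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # feature j against the residual that leaves feature j out
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            b[j] = soft_threshold(rho / n, lam) / (z / n)
    return b


# Hypothetical data: three mutually orthogonal features, of which only the
# first has a strong effect on the response.
x0 = [1, -1, 1, -1, 1, -1, 1, -1]
x1 = [1, 1, -1, -1, 1, 1, -1, -1]
x2 = [1, 1, 1, 1, -1, -1, -1, -1]
X = [list(row) for row in zip(x0, x1, x2)]
y = [2.0 * a + 0.1 * c for a, c in zip(x0, x1)]

coef = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, c in enumerate(coef) if c != 0.0]
print("coefficients:", coef)
print("selected feature indices:", selected)
```

The L1 penalty sets the weak and the irrelevant coefficients to exactly zero and shrinks the strong one, which is the mechanism by which a large candidate set can be reduced to a short list of retained features.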
Multivariate η-μ fading distribution with arbitrary correlation model
NASA Astrophysics Data System (ADS)
Ghareeb, Ibrahim; Atiani, Amani
2018-03-01
An extensive analysis for the multivariate η-μ distribution with arbitrary correlation is presented, where novel analytical expressions for the multivariate probability density function, cumulative distribution function and moment generating function (MGF) of arbitrarily correlated and not necessarily identically distributed η-μ power random variables are derived. Also, this paper provides exact-form expression for the MGF of the instantaneous signal-to-noise ratio at the combiner output in a diversity reception system with maximal-ratio combining and post-detection equal-gain combining operating in slow frequency nonselective arbitrarily correlated not necessarily identically distributed η-μ fading channels. The average bit error probability of differentially detected quadrature phase shift keying signals with post-detection diversity reception system over arbitrarily correlated and not necessarily identical fading parameters η-μ fading channels is determined by using the MGF-based approach. The effect of fading correlation between diversity branches, fading severity parameters and diversity level is studied.
ERIC Educational Resources Information Center
Park, Hyeran; Nielsen, Wendy; Woodruff, Earl
2014-01-01
This study examined and compared students' understanding of nature of science (NOS) with 521 Grade 8 Canadian and Korean students using a mixed methods approach. The concepts of NOS were measured using a survey that had both quantitative and qualitative elements. Descriptive statistics and one-way multivariate analysis of variances examined the…
The Assessment of Neurological Systems with Functional Imaging
ERIC Educational Resources Information Center
Eidelberg, David
2007-01-01
In recent years a number of multivariate approaches have been introduced to map neural systems in health and disease. In this review, we focus on spatial covariance methods applied to functional imaging data to identify patterns of regional activity associated with behavior. In the rest state, this form of network analysis can be used to detect…
Zhu, Hongbin; Wang, Chunyan; Qi, Yao; Song, Fengrui; Liu, Zhiqiang; Liu, Shuying
2012-11-08
This study presents a novel and rapid method to identify chemical markers for the quality control of Radix Aconiti Preparata, a traditional herbal medicine widely used around the world. In the method, the samples with a fast extraction procedure were analyzed using direct analysis in real time mass spectrometry (DART MS) combined with multivariate data analysis. At present, the quality assessment approach of Radix Aconiti Preparata is based on the two processing methods recorded in the Chinese Pharmacopoeia for the purpose of reducing the toxicity of Radix Aconiti and ensuring its clinical therapeutic efficacy. In order to ensure safety and efficacy in clinical use, the processing degree of Radix Aconiti should be well controlled and assessed. In the paper, hierarchical cluster analysis and principal component analysis were performed to evaluate the DART MS data of Radix Aconiti Preparata samples at different processing times. The results showed that the well processed Radix Aconiti Preparata, the unqualified processed samples and the raw Radix Aconiti could be clustered reasonably corresponding to their constituents. The loading plot shows that the chemical markers having the most influence on the discrimination between the qualified and unqualified samples were mainly monoester diterpenoid aconitines and diester diterpenoid aconitines, i.e. benzoylmesaconine, hypaconitine, mesaconitine, neoline, benzoylhypaconine, benzoylaconine, fuziline, aconitine and 10-OH-mesaconitine. The established DART MS approach in combination with multivariate data analysis provides a very flexible and reliable method for quality assessment of toxic herbal medicine. Copyright © 2012 Elsevier B.V. All rights reserved.
Multivariate analysis in thoracic research.
Mengual-Macenlle, Noemí; Marcos, Pedro J; Golpe, Rafael; González-Rivas, Diego
2015-03-01
Multivariate analysis is based on the observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest. Multivariate methods were developed to analyze large databases and increasingly complex data. Since modeling is the best way to represent knowledge of reality, multivariate statistical methods should be used. Multivariate methods are designed to analyze data sets simultaneously, i.e., to analyze different variables for each person or object studied. Keep in mind at all times that all variables must be treated in a way that accurately reflects the reality of the problem addressed. There are different types of multivariate analysis and each one should be employed according to the type of variables to analyze: dependence, interdependence and structural methods. In conclusion, multivariate methods are ideal for the analysis of large data sets and for finding cause and effect relationships between variables; there is a wide range of analysis types that we can use.
Adaptive windowing and windowless approaches to estimate dynamic functional brain connectivity
NASA Astrophysics Data System (ADS)
Yaesoubi, Maziar; Calhoun, Vince D.
2017-08-01
In this work, we discuss estimation of dynamic dependence of a multi-variate signal. Commonly used approaches are often based on a locality assumption (e.g. sliding-window) which can miss spontaneous changes due to blurring with local but unrelated changes. We discuss recent approaches to overcome this limitation including 1) a wavelet-space approach, essentially adapting the window to the underlying frequency content and 2) a sparse signal-representation which removes any locality assumption. The latter is especially useful when there is no prior knowledge of the validity of such assumption as in brain-analysis. Results on several large resting-fMRI data sets highlight the potential of these approaches.
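The locality assumption mentioned in the abstract can be made concrete with a minimal sliding-window correlation sketch. This shows only the commonly used baseline, not the wavelet-space or sparse-representation approaches the authors discuss; the synthetic signals, the window length, and the function names are all assumptions made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def sliding_window_correlation(x, y, window):
    """Correlation of x and y inside each window of consecutive samples."""
    return [pearson(x[s:s + window], y[s:s + window])
            for s in range(len(x) - window + 1)]


# Hypothetical signals: y follows x exactly, then abruptly flips sign.
n_samples, change_point, window = 40, 20, 8
x = [math.sin(0.7 * t) for t in range(n_samples)]
y = [x[t] if t < change_point else -x[t] for t in range(n_samples)]

dyn_corr = sliding_window_correlation(x, y, window)
for start, c in enumerate(dyn_corr):
    print(f"window starting at {start:2d}: r = {c:+.2f}")
```

Windows that lie entirely before or after the change report a correlation near +1 or -1, while windows straddling the change mix both regimes and report intermediate values, which is the blurring effect a fixed window introduces.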
Gene set analysis using variance component tests.
Huang, Yen-Tsung; Lin, Xihong
2013-06-28
Gene set analyses have become increasingly important in genomic research, as many complex diseases are contributed jointly by alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analyses method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and global test in both simulation and a diabetes microarray data.
NASA Astrophysics Data System (ADS)
Guimarães Nobre, Gabriela; Arnbjerg-Nielsen, Karsten; Rosbjerg, Dan; Madsen, Henrik
2016-04-01
Traditionally, flood risk assessment studies have been carried out from a univariate frequency analysis perspective. However, statistical dependence between hydrological variables, such as extreme rainfall and extreme sea surge, plausibly exists, since both variables to some extent are driven by common meteorological conditions. Aiming to overcome this limitation, multivariate statistical techniques have the potential to combine different sources of flooding in the investigation. The aim of this study was to apply a range of statistical methodologies for analyzing combined extreme hydrological variables that can lead to coastal and urban flooding. The study area is the Elwood Catchment, which is a highly urbanized catchment located in the city of Port Phillip, Melbourne, Australia. The first part of the investigation dealt with the marginal extreme value distributions. Two approaches to extract extreme value series were applied (Annual Maximum and Partial Duration Series), and different probability distribution functions were fit to the observed sample. Results obtained by using the Generalized Pareto distribution demonstrate the ability of the Pareto family to model the extreme events. Advancing into multivariate extreme value analysis, first an investigation regarding the asymptotic properties of extremal dependence was carried out. As a weak positive asymptotic dependence between the bivariate extreme pairs was found, the Conditional method proposed by Heffernan and Tawn (2004) was chosen. This approach is suitable to model bivariate extreme values, which are relatively unlikely to occur together. The results show that the probability of an extreme sea surge occurring during a one-hour intensity extreme precipitation event (or vice versa) can be twice as great as what would occur when assuming independent events. Therefore, presuming independence between these two variables would result in severe underestimation of the flooding risk in the study area.
Vasconcelos, A G; Almeida, R M; Nobre, F F
2001-08-01
This paper introduces an approach that includes non-quantitative factors for the selection and assessment of multivariate complex models in health. A goodness-of-fit based methodology combined with fuzzy multi-criteria decision-making approach is proposed for model selection. Models were obtained using the Path Analysis (PA) methodology in order to explain the interrelationship between health determinants and the post-neonatal component of infant mortality in 59 municipalities of Brazil in the year 1991. Socioeconomic and demographic factors were used as exogenous variables, and environmental, health service and agglomeration as endogenous variables. Five PA models were developed and accepted by statistical criteria of goodness-of fit. These models were then submitted to a group of experts, seeking to characterize their preferences, according to predefined criteria that tried to evaluate model relevance and plausibility. Fuzzy set techniques were used to rank the alternative models according to the number of times a model was superior to ("dominated") the others. The best-ranked model explained above 90% of the endogenous variables variation, and showed the favorable influences of income and education levels on post-neonatal mortality. It also showed the unfavorable effect on mortality of fast population growth, through precarious dwelling conditions and decreased access to sanitation. It was possible to aggregate expert opinions in model evaluation. The proposed procedure for model selection allowed the inclusion of subjective information in a clear and systematic manner.
Correlative and multivariate analysis of increased radon concentration in underground laboratory.
Maletić, Dimitrije M; Udovičić, Vladimir I; Banjanac, Radomir M; Joković, Dejan R; Dragić, Aleksandar L; Veselinović, Nikola B; Filipović, Jelena
2014-11-01
The results of an analysis using correlative and multivariate methods, as developed for data analysis in high-energy physics and implemented in the Toolkit for Multivariate Analysis software package, of the relations between the variation of increased radon concentration and climate variables in a shallow underground laboratory are presented. Multivariate regression analysis identified a number of multivariate methods which can give a good evaluation of increased radon concentrations based on climate variables. The use of the multivariate regression methods will enable the investigation of the relations of a specific climate variable with increased radon concentrations by analysis of regression methods resulting in 'mapped' underlying functional behaviour of radon concentrations depending on a wide spectrum of climate variables. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Lapolla, Annunziata; Ragazzi, Eugenio; Andretta, Barbara; Fedele, Domenico; Tubaro, Michela; Seraglia, Roberta; Molin, Laura; Traldi, Pietro
2007-06-01
To clarify the possible pathogenetic role of oxidation products originated from the glycation of proteins, human globins from nephropathic patients have been studied by matrix-assisted laser desorption/ionization mass spectrometry (MALDI), revealing not only unglycated and monoglycated globins, but also a series of different species. For the last ones, structural assignments were tentatively done on the basis of observed masses and expectations for the Maillard reaction pattern. Consequently, they must be considered only propositive, and the discussion which will follow must be considered in this view. In our opinion this approach does not seem to compromise the intended diagnostic use of the data because distinctions are valid even if the assignments are uncertain. We studied nine healthy subjects and 19 nephropathic patients and processed the data obtained from the MALDI spectra using a multivariate analysis. Our results showed that multivariate analytical techniques enable differential aspects of the profile of molecular species to be identified in the blood of end stage nephropathic patients. A correct grouping can be achieved by principal component analysis (PCA) and the results suggest that several products involved in carbonyl stress exist in nephropathic patients. These compounds may have a relevant role as specific markers of the pathological state.
Vongsvivut, Jitraporn; Heraud, Philip; Gupta, Adarsha; Puri, Munish; McNaughton, Don; Barrow, Colin J
2013-10-21
The increase in polyunsaturated fatty acid (PUFA) consumption has prompted research into alternative resources other than fish oil. In this study, a new approach based on focal-plane-array Fourier transform infrared (FPA-FTIR) microspectroscopy and multivariate data analysis was developed for the characterisation of some marine microorganisms. Cell and lipid compositions in lipid-rich marine yeasts collected from the Australian coast were characterised in comparison to a commercially available PUFA-producing marine fungoid protist, thraustochytrid. Multivariate classification methods provided good discriminative accuracy evidenced from (i) separation of the yeasts from thraustochytrids and distinct spectral clusters among the yeasts that conformed well to their biological identities, and (ii) correct classification of yeasts from a totally independent set using cross-validation testing. The findings further indicated additional capability of the developed FPA-FTIR methodology, when combined with partial least squares regression (PLSR) analysis, for rapid monitoring of lipid production in one of the yeasts during the growth period, which was achieved at a high accuracy compared to the results obtained from the traditional lipid analysis based on gas chromatography. The developed FTIR-based approach when coupled to programmable withdrawal devices and a cytocentrifugation module would have strong potential as a novel online monitoring technology suited for bioprocessing applications and large-scale production.
Ali, Niloufer S; Ali, Farzana N; Khuwaja, Ali K; Nanji, Kashmira
2014-08-01
OBJECTIVES. To assess the proportion of women subjected to intimate partner violence and the associated factors, and to identify the attitudes of women towards the use of violence by their husbands. DESIGN. Cross-sectional study. SETTING. Family practice clinics at a teaching hospital in Karachi, Pakistan. PARTICIPANTS. A total of 520 women aged between 16 and 60 years were consecutively approached to participate in the study and interviewed by trained data collectors. Overall, 401 completed questionnaires were available for analysis. Multivariate logistic regression analysis was used to identify the association of various factors of interest. RESULTS. In all, 35% of the women reported being physically abused by their husbands in the last 12 months. Multivariate analysis showed that experiences of violence were independently associated with women's illiteracy (adjusted odds ratio=5.9; 95% confidence interval, 1.8-19.6), husband's illiteracy (3.9; 1.4-10.7), smoking habit of husbands (3.3; 1.9-5.8), and substance use (3.1; 1.7-5.7). CONCLUSION. It is imperative that intimate partner violence be considered a major public health concern. It can be prevented through comprehensive, multifaceted, and integrated approaches. The role of education is greatly emphasised in changing the perspectives of individuals and societies against intimate partner violence.
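For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained, the sketch below computes an unadjusted odds ratio with a Wald interval from a single 2x2 table. This is deliberately simpler than the study's analysis: the reported values are adjusted odds ratios from multivariate logistic regression, and the counts used here are hypothetical, not taken from the study.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
odds_ratio, ci_low, ci_high = odds_ratio_with_ci(a=20, b=10, c=30, d=40)
print(f"OR = {odds_ratio:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An interval that excludes 1, as in the associations reported above, indicates an association that is statistically significant at the 5% level; the adjusted estimates in the abstract additionally control for the other factors in the model.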
Wojcik, Pawel Jerzy; Pereira, Luís; Martins, Rodrigo; Fortunato, Elvira
2014-01-13
An efficient mathematical strategy in the field of solution processed electrochromic (EC) films is outlined as a combination of an experimental work, modeling, and information extraction from massive computational data via statistical software. Design of Experiment (DOE) was used for statistical multivariate analysis and prediction of mixtures through a multiple regression model, as well as the optimization of a five-component sol-gel precursor subjected to complex constraints. This approach significantly reduces the number of experiments to be realized, from 162 in the full factorial (L=3) and 72 in the extreme vertices (D=2) approach down to only 30 runs, while still maintaining a high accuracy of the analysis. By carrying out a finite number of experiments, the empirical modeling in this study shows reasonably good prediction ability in terms of the overall EC performance. An optimized ink formulation was employed in a prototype of a passive EC matrix fabricated in order to test and trial this optically active material system together with a solid-state electrolyte for the prospective application in EC displays. Coupling of DOE with chromogenic material formulation shows the potential to maximize the capabilities of these systems and ensures increased productivity in many potential solution-processed electrochemical applications.
Ide, Kazuki; Kawasaki, Yohei; Akutagawa, Maiko; Yamada, Hiroshi
2017-02-01
The aim of this study is to analyze the data obtained from a randomized trial on the prevention of influenza by gargling with green tea, which gave nonsignificant results based on frequentist approaches, by using Bayesian approaches. The posterior proportion, with 95% credible interval (CrI), of influenza in each group was calculated. The Bayesian index θ is the probability that a hypothesis is true. In this case, θ is the probability that the hypothesis that green tea gargling reduced influenza compared with water gargling is true. Univariate and multivariate logistic regression analyses were also performed by using the Markov chain Monte Carlo method. The full analysis set included 747 participants. During the study period, influenza occurred in 44 participants (5.9%). The difference between the two independent binomial proportions was -0.019 (95% CrI, -0.054 to 0.015; θ = 0.87). The partial regression coefficients in the univariate analysis were -0.35 (95% CrI, -1.00 to 0.24) with use of a uniform prior and -0.34 (95% CrI, -0.96 to 0.27) with use of a Jeffreys prior. In the multivariate analysis, the values were -0.37 (95% CrI, -0.96 to 0.30) and -0.36 (95% CrI, -1.03 to 0.21), respectively. The difference between the two independent binomial proportions was less than 0, and θ was greater than 0.85. Therefore, green tea gargling may slightly reduce influenza compared with water gargling. This analysis suggests that green tea gargling can be an additional preventive measure for use with other pharmaceutical and nonpharmaceutical measures and indicates the need for additional studies to confirm the effect of green tea gargling.
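The Bayesian index θ for two independent binomial proportions can be approximated by sampling from the two Beta posteriors, as in the minimal sketch below. The per-group counts are an assumption: the abstract gives only the totals (747 participants, 44 cases), so the split used here is illustrative and merely chosen to be consistent with those totals. The function name and the number of draws are likewise hypothetical, and this is not the authors' code.

```python
import random


def posterior_draws(events, n, rng, prior_a=1.0, prior_b=1.0, n_draws=20000):
    """Draw from the Beta posterior of a binomial proportion.

    prior_a = prior_b = 1.0 is a uniform prior; 0.5 and 0.5 would be a Jeffreys prior.
    """
    return [rng.betavariate(prior_a + events, prior_b + n - events)
            for _ in range(n_draws)]


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# ASSUMED per-group counts (illustrative split of 747 participants, 44 cases).
tea = posterior_draws(events=19, n=384, rng=rng)
water = posterior_draws(events=25, n=363, rng=rng)

diffs = sorted(t - w for t, w in zip(tea, water))
mean_diff = sum(diffs) / len(diffs)
cri_low = diffs[int(0.025 * len(diffs))]
cri_high = diffs[int(0.975 * len(diffs))]
# theta: posterior probability that the tea-group proportion is the lower one.
theta = sum(d < 0 for d in diffs) / len(diffs)

print(f"difference = {mean_diff:.3f} (95% CrI {cri_low:.3f} to {cri_high:.3f})")
print(f"theta = {theta:.2f}")
```

A credible interval that spans zero alongside a θ well above 0.5 is the pattern the abstract describes: the data favor a reduction without establishing it conclusively.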
ERIC Educational Resources Information Center
Grochowalski, Joseph H.
2015-01-01
Component Universe Score Profile analysis (CUSP) is introduced in this paper as a psychometric alternative to multivariate profile analysis. The theoretical foundations of CUSP analysis are reviewed, which include multivariate generalizability theory and constrained principal components analysis. Because CUSP is a combination of generalizability…
Duarte, João V; Ribeiro, Maria J; Violante, Inês R; Cunha, Gil; Silva, Eduardo; Castelo-Branco, Miguel
2014-01-01
Neurofibromatosis Type 1 (NF1) is a common genetic condition associated with cognitive dysfunction. However, the pathophysiology of the NF1 cognitive deficits is not well understood. Abnormal brain structure, including increased total brain volume, white matter (WM) and grey matter (GM) abnormalities have been reported in the NF1 brain. These previous studies employed univariate model-driven methods preventing detection of subtle and spatially distributed differences in brain anatomy. Multivariate pattern analysis allows the combination of information from multiple spatial locations yielding a discriminative power beyond that of single voxels. Here we investigated for the first time subtle anomalies in the NF1 brain, using a multivariate data-driven classification approach. We used support vector machines (SVM) to classify whole-brain GM and WM segments of structural T1-weighted MRI scans from 39 participants with NF1 and 60 non-affected individuals, divided into child/adolescent and adult groups. We also employed voxel-based morphometry (VBM) as a univariate gold standard to study brain structural differences. SVM classifiers correctly classified 94% of cases (sensitivity 92%; specificity 96%) revealing the existence of brain structural anomalies that discriminate NF1 individuals from controls. Accordingly, VBM analysis revealed structural differences in agreement with the SVM weight maps representing the most relevant brain regions for group discrimination. These included the hippocampus, basal ganglia, thalamus, and visual cortex. This multivariate data-driven analysis thus identified subtle anomalies in brain structure in the absence of visible pathology. Our results provide further insight into the neuroanatomical correlates of known features of the cognitive phenotype of NF1. Copyright © 2012 Wiley Periodicals, Inc.
ERIC Educational Resources Information Center
Seco, Guillermo Vallejo; Izquierdo, Marcelino Cuesta; Garcia, M. Paula Fernandez; Diez, F. Javier Herrero
2006-01-01
The authors compare the operating characteristics of the bootstrap-F approach, a direct extension of the work of Berkovits, Hancock, and Nevitt, with Huynh's improved general approximation (IGA) and the Brown-Forsythe (BF) multivariate approach in a mixed repeated measures design when normality and multisample sphericity assumptions do not hold.…
Wang, Yalin; Zhang, Jie; Gutman, Boris; Chan, Tony F.; Becker, James T.; Aizenstein, Howard J.; Lopez, Oscar L.; Tamburo, Robert J.; Toga, Arthur W.; Thompson, Paul M.
2010-01-01
Here we developed a new method, called multivariate tensor-based surface morphometry (TBM), and applied it to study lateral ventricular surface differences associated with HIV/AIDS. Using concepts from differential geometry and the theory of differential forms, we created mathematical structures known as holomorphic one-forms, to obtain an efficient and accurate conformal parameterization of the lateral ventricular surfaces in the brain. The new meshing approach also provides a natural way to register anatomical surfaces across subjects, and improves on prior methods as it handles surfaces that branch and join at complex 3D junctions. To analyze anatomical differences, we computed new statistics from the Riemannian surface metrics - these retain multivariate information on local surface geometry. We applied this framework to analyze lateral ventricular surface morphometry in 3D MRI data from 11 subjects with HIV/AIDS and 8 healthy controls. Our method detected a 3D profile of surface abnormalities even in this small sample. Multivariate statistics on the local tensors gave better effect sizes for detecting group differences, relative to other TBM-based methods including analysis of the Jacobian determinant, the largest and smallest eigenvalues of the surface metric, and the pair of eigenvalues of the Jacobian matrix. The resulting analysis pipeline may improve the power of surface-based morphometry studies of the brain. PMID:19900560
Application of multivariable search techniques to structural design optimization
NASA Technical Reports Server (NTRS)
Jones, R. T.; Hague, D. S.
1972-01-01
Multivariable optimization techniques are applied to a particular class of minimum weight structural design problems: the design of an axially loaded, pressurized, stiffened cylinder. Minimum weight designs are obtained by a variety of search algorithms: first- and second-order, elemental perturbation, and randomized techniques. An exterior penalty function approach to constrained minimization is employed. Some comparisons are made with solutions obtained by an interior penalty function procedure. In general, it would appear that an interior penalty function approach may not be as well suited to the class of design problems considered as the exterior penalty function approach. It is also shown that a combination of search algorithms will tend to arrive at an extremal design in a more reliable manner than a single algorithm. The effect of incorporating realistic geometrical constraints on stiffener cross-sections is investigated. A limited comparison is made between minimum weight cylinders designed on the basis of a linear stability analysis and cylinders designed on the basis of empirical buckling data. Finally, a technique for locating more than one extremal is demonstrated.
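The exterior penalty idea can be sketched on a one-variable toy problem: minimise (x - 2)^2 subject to x <= 1. The problem, the penalty schedule, and the function names are assumptions chosen for illustration and have nothing to do with the stiffened-cylinder design problem itself; plain gradient descent stands in for the report's search algorithms.

```python
def penalised_gradient(x, r):
    """Gradient of (x - 2)^2 + r * max(0, x - 1)^2 with respect to x."""
    violation = max(0.0, x - 1.0)  # positive only outside the feasible region
    return 2.0 * (x - 2.0) + 2.0 * r * violation


def exterior_penalty_minimise(x0=0.0, penalties=(1.0, 10.0, 100.0, 1000.0),
                              n_steps=200):
    """Solve a sequence of unconstrained problems with a growing penalty weight."""
    x = x0
    history = []
    for r in penalties:
        # Step size 1 / (largest curvature of the penalised objective).
        step = 1.0 / (2.0 + 2.0 * r)
        for _ in range(n_steps):
            x -= step * penalised_gradient(x, r)
        history.append((r, x))  # warm-start the next, stiffer problem from here
    return x, history


x_opt, history = exterior_penalty_minimise()
for r, x in history:
    print(f"penalty weight {r:7.1f}: minimiser at x = {x:.6f}")
```

Each iterate violates the constraint slightly and approaches the constrained optimum x = 1 from the infeasible side as the penalty weight grows, which is what distinguishes an exterior penalty from an interior (barrier) one, whose iterates stay feasible.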
Zhou, Fei; Zhao, Yajing; Peng, Jiyu; Jiang, Yirong; Li, Maiquan; Jiang, Yuan; Lu, Baiyi
2017-07-01
Osmanthus fragrans flowers are used as folk medicine and additives for teas, beverages and foods. The metabolites of O. fragrans flowers from different geographical origins are inconsistent to some extent. Chromatography and mass spectrometry combined with multivariable analysis methods provide an approach for discriminating the origin of O. fragrans flowers. The objective was to discriminate Osmanthus fragrans var. thunbergii flowers from different origins using the identified metabolites. GC-MS and UPLC-PDA were conducted to analyse the metabolites in O. fragrans var. thunbergii flowers (in total 150 samples). Principal component analysis (PCA), soft independent modelling of class analogy analysis (SIMCA) and random forest (RF) analysis were applied to group the GC-MS and UPLC-PDA data. GC-MS identified 32 compounds common to all samples while UPLC-PDA/QTOF-MS identified 16 common compounds. PCA of the UPLC-PDA data generated a better clustering than PCA of the GC-MS data. Ten metabolites (six from GC-MS and four from UPLC-PDA) were selected as effective compounds for discrimination by PCA loadings. SIMCA and RF analysis were used to build classification models, and the RF model, based on the four effective compounds (caffeic acid derivative, acteoside, ligustroside and compound 15), yielded better results with a classification rate of 100% in the calibration set and 97.8% in the prediction set. GC-MS and UPLC-PDA combined with multivariable analysis methods can discriminate the origin of Osmanthus fragrans var. thunbergii flowers. Copyright © 2017 John Wiley & Sons, Ltd.
Integrative Exploratory Analysis of Two or More Genomic Datasets.
Meng, Chen; Culhane, Aedin
2016-01-01
Exploratory analysis is an essential step in the analysis of high throughput data. Multivariate approaches such as correspondence analysis (CA), principal component analysis, and multidimensional scaling are widely used in the exploratory analysis of a single dataset. Modern biological studies often assay multiple types of biological molecules (e.g., mRNA, protein, phosphoproteins) on the same set of biological samples, thereby creating multiple different types of omics data or multiassay data. Integrative exploratory analysis of these multiple omics data is required to leverage the potential of multiple omics studies. In this chapter, we describe the application of co-inertia analysis (CIA; for analyzing two datasets) and multiple co-inertia analysis (MCIA; for three or more datasets) to address this problem. These methods are powerful yet simple multivariate approaches that represent samples using a lower number of variables, allowing easier identification of the correlated structure in and between multiple high dimensional datasets. Graphical representations can be employed for this purpose. In addition, the methods simultaneously project samples and variables (genes, proteins) onto the same lower dimensional space, so the most variant variables from each dataset can be selected and associated with samples, which can be further used to facilitate biological interpretation and pathway analysis. We applied CIA to explore the concordance between mRNA and protein expression in a panel of 60 tumor cell lines from the National Cancer Institute. In the same 60 cell lines, we used MCIA to perform a cross-platform comparison of mRNA gene expression profiles obtained on four different microarray platforms. Last, as an example of integrative analysis of multiassay or multi-omics data we analyzed transcriptomic, proteomic, and phosphoproteomic data from pluripotent (iPS) and embryonic stem (ES) cell lines.
Fiore, Marco; Rimareix, Françoise; Mariani, Luigi; Domont, Julien; Collini, Paola; Le Péchoux, Cecile; Casali, Paolo G; Le Cesne, Axel; Gronchi, Alessandro; Bonvalot, Sylvie
2009-09-01
Surgery is still the standard treatment for desmoid-type fibromatosis (DF). Recently, the Institut Gustave Roussy (IGR), Villejuif, France, reported a series of patients treated with a front-line conservative approach (no surgery and no radiotherapy). The disease remained stable in more than half of patients. This study was designed to evaluate this approach on the natural history of the disease in a larger series of patients. A total of 142 patients presenting to the IGR or Istituto Nazionale Tumori (INT), Milan, Italy, were initially treated using a front-line deliberately conservative policy. Their progression-free survival (PFS) was observed and a multivariate analysis was performed for major clinical variables. Seventy-four patients presented with primary tumor, 68 with recurrence. Eighty-three patients received a "wait & see" policy (W&S), whereas 59 were initially offered medical therapy (MT), mainly hormonal therapy and chemotherapy. A family history of sporadic colorectal cancer was present in 8% of patients. The 5-year PFS was 49.9% for the W&S group and 58.6% for the medically treated patients (P = 0.3196). Similar results emerged for primary and recurrent DF. Multivariate analysis identified no clinical variables as independent predictors of PFS. In the event of progression, all patients were subsequently managed safely. A conservative policy could be a safe approach to primary and recurrent DF, which could avoid unnecessary morbidity from surgery and/or radiation therapy. Half of patients had medium-term stable disease after W&S or MT. A multidisciplinary, stepwise approach should be prospectively tested in DF.
NASA Astrophysics Data System (ADS)
Dikty, Sebastian; von Savigny, Christian; Sinnhuber, Bjoern-Martin; Rozanov, Alexej; Weber, Mark; Burrows, John P.
We use SCIAMACHY (SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY) ozone, nitrogen dioxide and bromine oxide profiles (20-50 km altitude, 2003-2008) to quantify the amplitudes of QBO, AO, and SAO signals with the help of a simple multivariate regression model. The analysis is being carried out with SCIAMACHY data covering all latitudes with the exception of polar nights, when measurements are not available. The overall global yield is approximately 10,000 profiles per month, which are binned into 10° steps with one zonal mean profile being calculated per day and per latitude bin.
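A regression of this kind can be sketched as an ordinary least-squares fit. The abstract does not give the model terms, so the form below is an assumption: a constant, 12-month (AO) and 6-month (SAO) harmonics, and a single externally supplied QBO proxy series. The names `fit_oscillations` and `qbo_proxy` are hypothetical.

```python
import numpy as np

def fit_oscillations(t_months, y, qbo_proxy):
    """Least-squares fit of mean + annual + semi-annual harmonics + QBO proxy.

    t_months: time in months; y: the time series at one altitude/latitude bin;
    qbo_proxy: an assumed index series sampled at the same times.
    """
    t = np.asarray(t_months, dtype=float)
    y = np.asarray(y, dtype=float)
    q = np.asarray(qbo_proxy, dtype=float)
    w_ao = 2.0 * np.pi / 12.0    # annual oscillation
    w_sao = 2.0 * np.pi / 6.0    # semi-annual oscillation

    # Design matrix: one column per regressor.
    A = np.column_stack([
        np.ones_like(t),
        np.cos(w_ao * t), np.sin(w_ao * t),
        np.cos(w_sao * t), np.sin(w_sao * t),
        q,
    ])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    return {
        "mean": float(coef[0]),
        # Amplitude of a harmonic is the norm of its cos/sin coefficient pair.
        "AO": float(np.hypot(coef[1], coef[2])),
        "SAO": float(np.hypot(coef[3], coef[4])),
        # Regression coefficient of the proxy; scale by the proxy's own
        # amplitude to express it in the units of y.
        "QBO_coef": float(coef[5]),
    }
```

Splitting each harmonic into cosine and sine columns keeps the problem linear in the unknowns, so the phase does not need to be fitted separately.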
Skibiński, Robert; Komsta, Łukasz
2012-01-01
The photodegradation of moclobemide was studied in methanolic media. Ultra-HPLC (UHPLC)/MS/MS analysis proved decomposition to 4-chlorobenzamide as a major degradation product and small amounts of Ro 16-3177 (4-chloro-N-[2-[(2-hydroxyethyl)amino] ethyl]benzamide) and 2-[(4-chlorobenzylidene)amino]-N-[2-ethoxyethenyl]ethenamine. The methanolic solution was investigated spectrophotometrically in the UV region, registering the spectra during 30 min of degradation. Using reference spectra and a multivariate chemometric method (multivariate curve resolution-alternating least squares), the spectra were resolved and concentration profiles were obtained. The obtained results were in good agreement with a quantitative approach, with UHPLC-diode array detection as the reference method.
A Multivariate Model for the Study of Parental Acceptance-Rejection and Child Abuse.
ERIC Educational Resources Information Center
Rohner, Ronald P.; Rohner, Evelyn C.
This paper proposes a multivariate strategy for the study of parental acceptance-rejection and child abuse and describes a research study on parental rejection and child abuse which illustrates the advantages of using a multivariate (rather than a simple-model) approach. The multivariate model is a combination of three simple models used to study…
Zhang, Xiuxiu; Li, Yubo; Zhou, Huifang; Fan, Simiao; Zhang, Zhenzhu; Wang, Lei; Zhang, Yanjun
2014-08-01
Acyclovir (ACV) is an antiviral agent. However, its use is limited by adverse side effects, particularly nephrotoxicity. Metabonomics technology can provide essential information on the metabolic profiles of biofluids and organs upon drug administration. Therefore, in this study, mass spectrometry-based metabonomics coupled with multivariate data analysis was used to identify the plasma metabolites and metabolic pathways related to nephrotoxicity caused by intraperitoneal injection of low (50 mg/kg) and high (100 mg/kg) doses of acyclovir. Sixteen biomarkers were identified by metabonomics and nephrotoxicity results revealed the dose-dependent effect of acyclovir on kidney tissues. The present study showed that the top four metabolic pathways interrupted by acyclovir included the metabolisms of arachidonic acid, tryptophan, arginine and proline, and glycerophospholipid. This research proves the established metabonomic approach can provide information on changes in metabolites and metabolic pathways, which can be applied to in-depth research on the mechanism of acyclovir-induced kidney injury. Copyright © 2014 Elsevier B.V. All rights reserved.
Bioprospecting Chemical Diversity and Bioactivity in a Marine Derived Aspergillus terreus.
Adpressa, Donovon A; Loesgen, Sandra
2016-02-01
A comparative metabolomic study of a marine derived fungus (Aspergillus terreus) grown under various culture conditions is presented. The fungus was grown in eleven different culture conditions using solid agar, broth cultures, or grain based media (OSMAC). Multivariate analysis of LC/MS data from the organic extracts revealed drastic differences in the metabolic profiles and guided our subsequent isolation efforts. The compound 7-desmethylcitreoviridin was isolated and identified, and is fully described for the first time. In addition, 16 known fungal metabolites were also isolated and identified. All compounds were elucidated by detailed spectroscopic analysis and tested for antibacterial activities against five human pathogens and tested for cytotoxicity. This study demonstrates that LC/MS based multivariate analysis provides a simple yet powerful tool to analyze the metabolome of a single fungal strain grown under various conditions. This approach allows environmentally-induced changes in metabolite expression to be rapidly visualized, and uses these differences to guide the discovery of new bioactive molecules. Copyright © 2016 Verlag Helvetica Chimica Acta AG, Zürich.
A cross-species socio-emotional behaviour development revealed by a multivariate analysis.
Koshiba, Mamiko; Senoo, Aya; Mimura, Koki; Shirakawa, Yuka; Karino, Genta; Obara, Saya; Ozawa, Shinpei; Sekihara, Hitomi; Fukushima, Yuta; Ueda, Toyotoshi; Kishino, Hirohisa; Tanaka, Toshihisa; Ishibashi, Hidetoshi; Yamanouchi, Hideo; Yui, Kunio; Nakamura, Shun
2013-01-01
Recent progress in affective neuroscience and social neurobiology has been propelled by neuro-imaging technology and epigenetic approaches in the neurobiology of animal behaviour. However, quantitative measurements of socio-emotional development remain lacking, though sensory-motor development has been extensively studied in terms of digitised imaging analysis. Here, we developed a method for socio-emotional behaviour measurement that is based on video recordings under well-defined social contexts, using animal models with various social sensory interactions during development. The behaviour features digitised from the video recordings were visualised in a multivariate statistical space using principal component analysis. The clustering of the behaviour parameters suggested the existence of species- and stage-specific as well as cross-species behaviour modules. These modules were used to characterise the behaviour of children with or without autism spectrum disorders (ASDs). We found that socio-emotional behaviour is highly dependent on social context and that the cross-species behaviour modules may predict the neurobiological basis of ASDs.
Handwriting Examination: Moving from Art to Science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jarman, K.H.; Hanlen, R.C.; Manzolillo, P.A.
In this document, we present a method for validating the premises and methodology of forensic handwriting examination. This method is intuitively appealing because it relies on quantitative measurements currently used qualitatively by FDE's in making comparisons, and it is scientifically rigorous because it exploits the power of multivariate statistical analysis. This approach uses measures of both central tendency and variation to construct a profile for a given individual. (Central tendency and variation are important for characterizing an individual's writing and both are currently used by FDE's in comparative analyses). Once constructed, different profiles are then compared for individuality using cluster analysis; they are grouped so that profiles within a group cannot be differentiated from one another based on the measured characteristics, whereas profiles between groups can. The cluster analysis procedure used here exploits the power of multivariate hypothesis testing. The result is not only a profile grouping but also an indication of statistical significance of the groups generated.
Busico, Gianluigi; Cuoco, Emilio; Kazakis, Nerantzis; Colombani, Nicolò; Mastrocicco, Micòl; Tedesco, Dario; Voudouris, Konstantinos
2018-03-01
Shallow aquifers are the most accessible reservoirs of potable groundwater; nevertheless, they are also prone to various sources of pollution and it is usually difficult to distinguish between human and natural sources at the watershed scale. The area chosen for this study (the Campania Plain) is characterized by high spatial heterogeneities both in geochemical features and in hydraulic properties. Groundwater mineralization is driven by many processes such as geothermal activity, weathering of volcanic products and intense human activities. In such a landscape, multivariate statistical analysis has been used to differentiate among the main hydrochemical processes occurring in the area, using three different approaches of factor analysis: (i) major elements, (ii) trace elements, (iii) both major and trace elements. The elaboration of the factor analysis approaches has revealed seven distinct hydrogeochemical processes: (i) Salinization (Cl⁻, Na⁺); (ii) Carbonate rocks dissolution; (iii) Anthropogenic inputs (NO₃⁻, SO₄²⁻, U, V); (iv) Reducing conditions (Fe²⁺, Mn²⁺); (v) Heavy metals contamination (Cr and Ni); (vi) Geothermal fluids influence (Li⁺); and (vii) Volcanic products contribution (As, Rb). Results from this study highlight the need to separately apply factor analysis when a large data set of trace elements is available. In fact, the impact of geothermal fluids in the shallow aquifer was identified from the application of the factor analysis using only trace elements. This study also reveals that the factor analysis of major and trace elements can differentiate between anthropogenic and geogenic sources of pollution in intensively exploited aquifers. Copyright © 2017 Elsevier Ltd. All rights reserved.
Detecting Spatio-Temporal Modes in Multivariate Data by Entropy Field Decomposition
Frank, Lawrence R.; Galinsky, Vitaly L.
2016-01-01
A new data analysis method that addresses a general problem of detecting spatio-temporal variations in multivariate data is presented. The method utilizes two recent and complementary general approaches to data analysis, information field theory (IFT) and entropy spectrum pathways (ESP). Both methods reformulate and incorporate Bayesian theory, and thus use prior information to uncover underlying structure of the unknown signal. Unification of ESP and IFT creates an approach that is non-Gaussian and non-linear by construction and is found to produce unique spatio-temporal modes of signal behavior that can be ranked according to their significance, from which space-time trajectories of parameter variations can be constructed and quantified. Two brief examples of real world applications of the theory to the analysis of data of completely different, unrelated nature, lacking any underlying similarity, are also presented. The first example provides an analysis of resting state functional magnetic resonance imaging (rsFMRI) data that allowed us to create an efficient and accurate computational method for assessing and categorizing brain activity. The second example demonstrates the potential of the method in the application to the analysis of a strong atmospheric storm circulation system during the complicated stage of tornado development and formation using data recorded by a mobile Doppler radar. Reference implementation of the method will be made available as a part of the QUEST toolkit that is currently under development at the Center for Scientific Computation in Imaging. PMID:27695512
Riedl, Janet; Esslinger, Susanne; Fauhl-Hassek, Carsten
2015-07-23
Food fingerprinting approaches are expected to become a very potent tool in authentication processes aiming at a comprehensive characterization of complex food matrices. By non-targeted spectrometric or spectroscopic chemical analysis with a subsequent (multivariate) statistical evaluation of acquired data, food matrices can be investigated in terms of their geographical origin, species variety or possible adulterations. Although many successful research projects have already demonstrated the feasibility of non-targeted fingerprinting approaches, their uptake and implementation into routine analysis and food surveillance is still limited. In many proof-of-principle studies, the prediction ability of only one data set was explored, measured within a limited period of time using one instrument within one laboratory. Thorough validation strategies that guarantee reliability of the respective data basis and that allow conclusion on the applicability of the respective approaches for its fit-for-purpose have not yet been proposed. Within this review, critical steps of the fingerprinting workflow were explored to develop a generic scheme for multivariate model validation. As a result, a proposed scheme for "good practice" shall guide users through validation and reporting of non-targeted fingerprinting results. Furthermore, food fingerprinting studies were selected by a systematic search approach and reviewed with regard to (a) transparency of data processing and (b) validity of study results. Subsequently, the studies were inspected for measures of statistical model validation, analytical method validation and quality assurance measures. In this context, issues and recommendations were found that might be considered as an actual starting point for developing validation standards of non-targeted metabolomics approaches for food authentication in the future. 
Hence, this review intends to contribute to the harmonization and standardization of food fingerprinting, both required as a prior condition for the authentication of food in routine analysis and official control. Copyright © 2015 Elsevier B.V. All rights reserved.
Multivariate meta-analysis: potential and promise.
Jackson, Dan; Riley, Richard; White, Ian R
2011-09-10
The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day 'Multivariate meta-analysis' event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various viewpoints and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd.
Benson, Nsikak U.; Asuquo, Francis E.; Williams, Akan B.; Essien, Joseph P.; Ekong, Cyril I.; Akpabio, Otobong; Olajire, Abaas A.
2016-01-01
Trace metals (Cd, Cr, Cu, Ni and Pb) concentrations in benthic sediments were analyzed through multi-step fractionation scheme to assess the levels and sources of contamination in estuarine, riverine and freshwater ecosystems in Niger Delta (Nigeria). The degree of contamination was assessed using the individual contamination factors (ICF) and global contamination factor (GCF). Multivariate statistical approaches including principal component analysis (PCA), cluster analysis and correlation test were employed to evaluate the interrelationships and associated sources of contamination. The spatial distribution of metal concentrations followed the pattern Pb>Cu>Cr>Cd>Ni. Ecological risk index by ICF showed significant potential mobility and bioavailability for Cu, Cu and Ni. The ICF contamination trend in the benthic sediments at all studied sites was Cu>Cr>Ni>Cd>Pb. The principal component and agglomerative clustering analyses indicate that trace metals contamination in the ecosystems was influenced by multiple pollution sources. PMID:27257934
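The abstract names the individual and global contamination factors without giving formulas. The sketch below assumes the convention commonly used with sequential-extraction data: ICF is the sum of a metal's non-residual fractions divided by its residual fraction, and GCF is the sum of the ICFs of all metals at a site. The function names, the `"residual"` key, and the numbers in the usage note are hypothetical and are not taken from the study.

```python
def individual_contamination_factor(fractions, residual_key="residual"):
    """ICF for one metal: sum of non-residual fractions / residual fraction.

    fractions: dict mapping fraction name -> concentration (same units).
    This definition is an assumed convention, not stated in the abstract.
    """
    residual = fractions[residual_key]
    if residual <= 0:
        raise ValueError("residual fraction must be positive")
    non_residual = sum(v for k, v in fractions.items() if k != residual_key)
    return non_residual / residual


def global_contamination_factor(site):
    """GCF for one site: sum of the ICFs of every metal measured there.

    site: dict mapping metal symbol -> fractions dict as described above.
    """
    return sum(individual_contamination_factor(f) for f in site.values())
```

For illustration only, a site described as `{"Cu": {"exchangeable": 1.0, "carbonate": 2.0, "residual": 3.0}, "Pb": {"exchangeable": 0.5, "residual": 2.0}}` would be passed directly to `global_contamination_factor`; a larger non-residual share yields a larger ICF, which is read as higher potential mobility.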
NASA Astrophysics Data System (ADS)
Sicard, Emeline; Sabatier, Robert; Niel, Hélène; Cadier, Eric
2002-12-01
The objective of this paper is to implement an original method for spatial and multivariate data, combining a method of three-way array analysis (STATIS) with geostatistical tools. The variables of interest are the monthly amounts of rainfall in the Nordeste region of Brazil, recorded from 1937 to 1975. The principle of the technique is the calculation of a linear combination of the initial variables, containing a large part of the initial variability and taking into account the spatial dependencies. It is a promising method that is able to analyze triple variability: spatial, seasonal, and interannual. In our case, the first component obtained discriminates a group of rain gauges, corresponding approximately to the Agreste, from all the others. The monthly variables of July and August strongly influence this separation. Furthermore, an annual study brings out the stability of the spatial structure of components calculated for each year.
Hemakom, Apit; Goverdovsky, Valentin; Looney, David; Mandic, Danilo P
2016-04-13
An extension to multivariate empirical mode decomposition (MEMD), termed adaptive-projection intrinsically transformed MEMD (APIT-MEMD), is proposed to cater for power imbalances and inter-channel correlations in real-world multichannel data. It is shown that the APIT-MEMD exhibits similar or better performance than MEMD for a large number of projection vectors, whereas it outperforms MEMD for the critical case of a small number of projection vectors within the sifting algorithm. We also employ the noise-assisted APIT-MEMD within our proposed intrinsic multiscale analysis framework and illustrate the advantages of such an approach in notoriously noise-dominated cooperative brain-computer interface (BCI) based on the steady-state visual evoked potentials and the P300 responses. Finally, we show that for a joint cognitive BCI task, the proposed intrinsic multiscale analysis framework improves system performance in terms of the information transfer rate. © 2016 The Author(s).
Mallette, Jennifer R; Casale, John F; Jordan, James; Morello, David R; Beyer, Paul M
2016-03-23
Previously, geo-sourcing to five major coca growing regions within South America was accomplished. However, the expansion of coca cultivation throughout South America made sub-regional origin determinations increasingly difficult. The former methodology was recently enhanced with additional stable isotope analyses (²H and ¹⁸O) to fully characterize cocaine due to the varying environmental conditions in which the coca was grown. An improved data analysis method was implemented with the combination of machine learning and multivariate statistical analysis methods to provide further partitioning between growing regions. Here, we show how the combination of trace cocaine alkaloids, stable isotopes, and multivariate statistical analyses can be used to classify illicit cocaine as originating from one of 19 growing regions within South America. The data obtained through this approach can be used to describe current coca cultivation and production trends, highlight trafficking routes, as well as identify new coca growing regions.
NASA Astrophysics Data System (ADS)
Mallette, Jennifer R.; Casale, John F.; Jordan, James; Morello, David R.; Beyer, Paul M.
2016-03-01
Previously, geo-sourcing to five major coca growing regions within South America was accomplished. However, the expansion of coca cultivation throughout South America made sub-regional origin determinations increasingly difficult. The former methodology was recently enhanced with additional stable isotope analyses (²H and ¹⁸O) to fully characterize cocaine due to the varying environmental conditions in which the coca was grown. An improved data analysis method was implemented with the combination of machine learning and multivariate statistical analysis methods to provide further partitioning between growing regions. Here, we show how the combination of trace cocaine alkaloids, stable isotopes, and multivariate statistical analyses can be used to classify illicit cocaine as originating from one of 19 growing regions within South America. The data obtained through this approach can be used to describe current coca cultivation and production trends, highlight trafficking routes, as well as identify new coca growing regions.
ERIC Educational Resources Information Center
Mittal, Surabhi; Mehar, Mamta
2016-01-01
Purpose: The paper analyzes factors that affect the likelihood of adoption of different agriculture-related information sources by farmers. Design/Methodology/Approach: The paper links the theoretical understanding of the existing multiple sources of information that farmers use, with the empirical model to analyze the factors that affect the…
Computational Approaches to Image Understanding.
1981-10-01
representing points, edges, surfaces, and volumes to facilitate display. The geometry of perspective and parallel (or orthographic) projection has...of making the image forming process explicit. This in turn leads to a concern with geometry, such as the properties of the gradient, stereographic, and...dual spaces. Combining geometry and smoothness leads naturally to multi-variate vector analysis, and to differential geometry. For the most part, a
NASA Astrophysics Data System (ADS)
Sleighter, Rachel L.; Cory, Rose M.; Kaplan, Louis A.; Abdulla, Hussain A. N.; Hatcher, Patrick G.
2014-08-01
The bioreactivity or susceptibility of dissolved organic matter (DOM) to microbial degradation in streams and rivers is of critical importance to global change studies, but a comprehensive understanding of DOM bioreactivity has been elusive due, in part, to the stunningly diverse assemblages of organic molecules within DOM. We approach this problem by employing a range of techniques to characterize DOM as it flows through biofilm reactors: dissolved organic carbon (DOC) concentrations, excitation emission matrix spectroscopy (EEMs), and ultrahigh resolution mass spectrometry. The EEMs and mass spectral data were analyzed using a combination of multivariate statistical approaches. We found that 45% of stream water DOC was biodegraded by microorganisms, including 31-45% of the humic DOC. This bioreactive DOM separated into two different groups: (1) H/C centered at 1.5 with O/C 0.1-0.5 or (2) low H/C of 0.5-1.0 spanning O/C 0.2-0.7 that were positively correlated (Spearman ranking) with chromophoric and fluorescent DOM (CDOM and FDOM, respectively). DOM that was more recalcitrant and resistant to microbial degradation aligned tightly in the center of the van Krevelen space (H/C 1.0-1.5, O/C 0.25-0.6) and negatively correlated (Spearman ranking) with CDOM and FDOM. These findings were supported further by principal component analysis and 2-D correlation analysis of the relative magnitudes of the mass spectral peaks assigned to molecular formulas. This study demonstrates that our approach of processing stream water through bioreactors followed by EEMs and FTICR-MS analyses, in combination with multivariate statistical analysis, allows for precise, robust characterization of compound bioreactivity and associated molecular level composition.
Cardiovascular reactivity patterns and pathways to hypertension: a multivariate cluster analysis.
Brindle, R C; Ginty, A T; Jones, A; Phillips, A C; Roseboom, T J; Carroll, D; Painter, R C; de Rooij, S R
2016-12-01
Substantial evidence links exaggerated mental stress induced blood pressure reactivity to future hypertension, but the results for heart rate reactivity are less clear. For this reason multivariate cluster analysis was carried out to examine the relationship between heart rate and blood pressure reactivity patterns and hypertension in a large prospective cohort (age range 55-60 years). Four clusters emerged with statistically different systolic and diastolic blood pressure and heart rate reactivity patterns. Cluster 1 was characterised by a relatively exaggerated blood pressure and heart rate response while the blood pressure and heart rate responses of cluster 2 were relatively modest and in line with the sample mean. Cluster 3 was characterised by blunted cardiovascular stress reactivity across all variables and cluster 4, by an exaggerated blood pressure response and modest heart rate response. Membership to cluster 4 conferred an increased risk of hypertension at 5-year follow-up (hazard ratio=2.98 (95% CI: 1.50-5.90), P<0.01) that survived adjustment for a host of potential confounding variables. These results suggest that the cardiac reactivity plays a potentially important role in the link between blood pressure reactivity and hypertension and support the use of multivariate approaches to stress psychophysiology.
Laurens, L M L; Wolfrum, E J
2013-12-18
One of the challenges associated with microalgal biomass characterization and the comparison of microalgal strains and conversion processes is the rapid determination of the composition of algae. We have developed and applied a high-throughput screening technology based on near-infrared (NIR) spectroscopy for the rapid and accurate determination of algal biomass composition. We show that NIR spectroscopy can accurately predict the full composition using multivariate linear regression analysis of varying lipid, protein, and carbohydrate content of algal biomass samples from three strains. We also demonstrate a high quality of predictions of an independent validation set. A high-throughput 96-well configuration for spectroscopy gives equally good prediction relative to a ring-cup configuration, and thus, spectra can be obtained from as little as 10-20 mg of material. We found that lipids exhibit a dominant, distinct, and unique fingerprint in the NIR spectrum that allows for the use of single and multiple linear regression of respective wavelengths for the prediction of the biomass lipid content. This is not the case for carbohydrate and protein content, and thus, the use of multivariate statistical modeling approaches remains necessary.
Cluster-based exposure variation analysis
2013-01-01
Background Static posture, repetitive movements and lack of physical variation are known risk factors for work-related musculoskeletal disorders, and thus need to be properly assessed in occupational studies. The aims of this study were (i) to investigate the effectiveness of a conventional exposure variation analysis (EVA) in discriminating exposure time lines and (ii) to compare it with a new cluster-based method for analysis of exposure variation. Methods For this purpose, we simulated a repeated cyclic exposure varying within each cycle between “low” and “high” exposure levels in a “near” or “far” range, and with “low” or “high” velocities (exposure change rates). The duration of each cycle was also manipulated by selecting a “small” or “large” standard deviation of the cycle time. These parameters reflected three dimensions of exposure variation, i.e. range, frequency and temporal similarity. Each simulation trace included two realizations of 100 concatenated cycles with either low (ρ = 0.1), medium (ρ = 0.5) or high (ρ = 0.9) correlation between the realizations. These traces were analyzed by conventional EVA, and a novel cluster-based EVA (C-EVA). Principal component analysis (PCA) was applied on the marginal distributions of 1) the EVA of each of the realizations (univariate approach), 2) a combination of the EVA of both realizations (multivariate approach) and 3) C-EVA. The least number of principal components describing more than 90% of variability in each case was selected and the projection of marginal distributions along the selected principal component was calculated. A linear classifier was then applied to these projections to discriminate between the simulated exposure patterns, and the accuracy of classified realizations was determined.
Results C-EVA classified exposures more correctly than univariate and multivariate EVA approaches; classification accuracy was 49%, 47% and 52% for EVA (univariate and multivariate), and C-EVA, respectively (p < 0.001). All three methods performed poorly in discriminating exposure patterns differing with respect to the variability in cycle time duration. Conclusion While C-EVA had a higher accuracy than conventional EVA, both failed to detect differences in temporal similarity. The data-driven optimality of data reduction and the capability of handling multiple exposure time lines in a single analysis are the advantages of the C-EVA. PMID:23557439
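The component-selection step described in the Methods (the least number of principal components describing more than 90% of variability, followed by projection) can be sketched as follows. This is a generic illustration via SVD, assuming observations in rows; the function name `pca_min_components` is hypothetical and the EVA and C-EVA feature extraction itself is not reproduced.

```python
import numpy as np

def pca_min_components(X, threshold=0.90):
    """Return the smallest number of PCs whose cumulative explained
    variance strictly exceeds `threshold`, plus the projected scores.

    X: (n_observations, n_features) array.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s ** 2
    total = variances.sum()
    if total == 0:
        raise ValueError("data have no variance")
    ratio = variances / total                    # explained-variance ratio
    cumulative = np.cumsum(ratio)

    # First index whose cumulative share is strictly above the threshold.
    k = int(np.searchsorted(cumulative, threshold, side="right")) + 1
    k = min(k, s.size)

    scores = Xc @ Vt[:k].T                       # projections on the kept PCs
    return k, scores, ratio
```

The returned `scores` are what a downstream linear classifier would be trained on in a workflow like the one described above.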
Yue, Yong; Osipov, Arsen; Fraass, Benedick; Sandler, Howard; Zhang, Xiao; Nissen, Nicholas; Hendifar, Andrew; Tuli, Richard
2017-02-01
To stratify risks of pancreatic adenocarcinoma (PA) patients using pre- and post-radiotherapy (RT) PET/CT images, and to assess the prognostic value of texture variations in predicting therapy response of patients. Twenty-six PA patients treated with RT from 2011-2013 with pre- and post-treatment 18F-FDG-PET/CT scans were identified. Tumor locoregional texture was calculated using 3D kernel-based approach, and texture variations were identified by fitting discrepancies of texture maps of pre- and post-treatment images. A total of 48 texture and clinical variables were identified and evaluated for association with overall survival (OS). The prognostic heterogeneity features were selected using lasso/elastic net regression, and further were evaluated by multivariate Cox analysis. Median age was 69 y (range, 46-86 y). The texture map and temporal variations between pre- and post-treatment were well characterized by histograms and statistical fitting. The lasso analysis identified seven predictors (age, node stage, post-RT SUVmax, variations of homogeneity, variance, sum mean, and cluster tendency). The multivariate Cox analysis identified five significant variables: age, node stage, variations of homogeneity, variance, and cluster tendency (with P=0.020, 0.040, 0.065, 0.078, and 0.081, respectively). The patients were stratified into two groups based on the risk score of multivariate analysis with log-rank P=0.001: a low risk group (n=11) with a longer mean OS (29.3 months) and higher texture variation (>30%), and a high risk group (n=15) with a shorter mean OS (17.7 months) and lower texture variation (<15%). Locoregional metabolic texture response provides a feasible approach for evaluating and predicting clinical outcomes following treatment of PA with RT. The proposed method can be used to stratify patient risk and help select appropriate treatment strategies for individual patients toward implementing response-driven adaptive RT.
Optical assay for biotechnology and clinical diagnosis.
Moczko, Ewa; Cauchi, Michael; Turner, Claire; Meglinski, Igor; Piletsky, Sergey
2011-08-01
In this paper, we present an optical diagnostic assay consisting of a mixture of environment-sensitive fluorescent dyes combined with multivariate data analysis for quantitative and qualitative examination of biological and clinical samples. The performance of the assay is based on analysis of the spectra of the selected fluorescent dyes, with an operational principle similar to that of electronic nose and electronic tongue systems. This approach has been successfully applied to monitoring growing cell cultures and identifying gastrointestinal diseases in humans.
Laser-Induced Breakdown Spectroscopy (LIBS) Measurement of Uranium in Molten Salt.
Williams, Ammon; Phongikaroon, Supathorn
2018-01-01
In this current study, the molten salt aerosol-laser-induced breakdown spectroscopy (LIBS) system was used to measure the uranium (U) content in a ternary UCl3-LiCl-KCl salt to investigate and assess a near real-time analytical approach for material safeguards and accountability. Experiments were conducted using five different U concentrations to determine the analytical figures of merit for the system with respect to U. In the analysis, three U lines were used to develop univariate calibration curves at the 367.01 nm, 385.96 nm, and 387.10 nm lines. The 367.01 nm line had the lowest limit of detection (LOD) of 0.065 wt% U. The 385.96 nm line had the best root mean square error of cross-validation (RMSECV) of 0.20 wt% U. In addition to the univariate calibration approach, a multivariate partial least squares (PLS) model was developed to further analyze the data. Using partial least squares (PLS) modeling, an RMSECV of 0.085 wt% U was determined. The RMSECV from the multivariate approach was significantly better than the univariate case and the PLS model is recommended for future LIBS analysis. Overall, the aerosol-LIBS system performed well in monitoring the U concentration and it is expected that the system could be used to quantitatively determine the U compositions within the normal operational concentrations of U in pyroprocessing molten salts.
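To make the RMSECV figure of merit quoted in the abstract above concrete, the following is a minimal sketch of a leave-one-out RMSECV for a univariate calibration curve. The concentrations and intensities are hypothetical, the function names `fit_line` and `rmsecv_loo` are illustrative, and leave-one-out is an assumed cross-validation scheme; the abstract does not state which scheme the authors used.

```python
from math import sqrt

def fit_line(x, y):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def rmsecv_loo(conc, signal):
    """Leave-one-out RMSECV of a univariate calibration curve.

    The curve signal = a + b*conc is fitted without sample i, then
    inverted to predict the concentration of sample i from its signal.
    """
    sq_err = []
    for i in range(len(conc)):
        c_train = conc[:i] + conc[i + 1:]
        s_train = signal[:i] + signal[i + 1:]
        a, b = fit_line(c_train, s_train)
        predicted = (signal[i] - a) / b
        sq_err.append((predicted - conc[i]) ** 2)
    return sqrt(sum(sq_err) / len(sq_err))

# Hypothetical data: five U concentrations (wt%) and line intensities (a.u.)
conc = [0.5, 1.0, 2.0, 3.0, 5.0]
signal = [12.1, 21.8, 42.5, 61.0, 102.3]
print(round(rmsecv_loo(conc, signal), 3))
```

A multivariate PLS model would replace `fit_line` with a regression on many spectral channels at once, but the cross-validation loop would keep the same shape.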
Multivariate Models for Normal and Binary Responses in Intervention Studies
ERIC Educational Resources Information Center
Pituch, Keenan A.; Whittaker, Tiffany A.; Chang, Wanchen
2016-01-01
Use of multivariate analysis (e.g., multivariate analysis of variance) is common when normally distributed outcomes are collected in intervention research. However, when mixed responses--a set of normal and binary outcomes--are collected, standard multivariate analyses are no longer suitable. While mixed responses are often obtained in…
NASA Astrophysics Data System (ADS)
Hanrieder, Jörg; Ewing, Andrew G.
2014-06-01
Amyotrophic lateral sclerosis (ALS) is a devastating, rapidly progressing disease of the central nervous system that is characterized by motor neuron degeneration in the brain stem and the spinal cord. We employed time of flight secondary ion mass spectrometry (ToF-SIMS) to profile spatial lipid- and metabolite- regulations in post mortem human spinal cord tissue from ALS patients to investigate chemical markers of ALS pathogenesis. ToF-SIMS scans and multivariate analysis of image and spectral data were performed on thoracic human spinal cord sections. Multivariate statistics of the image data allowed delineation of anatomical regions of interest based on their chemical identity. Spectral data extracted from these regions were compared using two different approaches for multivariate statistics, for investigating ALS related lipid and metabolite changes. The results show a significant decrease for cholesterol, triglycerides, and vitamin E in the ventral horn of ALS samples, which is presumably a consequence of motor neuron degeneration. Conversely, the biogenic mediator lipid lysophosphatidylcholine and its fragments were increased in ALS ventral spinal cord, pointing towards neuroinflammatory mechanisms associated with neuronal cell death. ToF-SIMS imaging is a promising approach for chemical histology and pathology for investigating the subcellular mechanisms underlying motor neuron degeneration in amyotrophic lateral sclerosis.
Buttini, Francesca; Pasquali, Irene; Brambilla, Gaetano; Copelli, Diego; Alberi, Massimiliano Dagli; Balducci, Anna Giulia; Bettini, Ruggero; Sisti, Viviana
2016-03-01
The aim of this work was to evaluate the effect of two different dry powder inhalers, of the NGI induction port and Alberta throat, and of the actual inspiratory profiles of asthmatic patients on in vitro drug inhalation performance. The two devices considered were a reservoir multidose and a capsule-based inhaler. The formulation used to test the inhalers was a combination of formoterol fumarate and beclomethasone dipropionate. A breath simulator was used to mimic inhalatory patterns previously determined in vivo. A multivariate approach was adopted to estimate the significance of the effect of the investigated variables in the explored domain. The breath simulator was a useful tool to mimic in vitro the in vivo inspiratory profiles of asthmatic patients. The type of throat coupled with the impactor did not affect the aerodynamic distribution of the investigated formulation. However, the type of inhaler and the inspiratory profiles affected the respirable dose of drugs. The multivariate statistical approach demonstrated that the multidose inhaler efficiently released a high fine particle mass independently of the inspiratory profiles adopted. In contrast, the single-dose capsule inhaler showed a significant decrease in the fine particle mass of both drugs when the device was activated using the minimum inspiratory volume (592 mL).
Ramseyer, Fabian; Kupper, Zeno; Caspar, Franz; Znoj, Hansjörg; Tschacher, Wolfgang
2014-10-01
Processes occurring in the course of psychotherapy are characterized by the simple fact that they unfold in time and that the multiple factors engaged in change processes vary highly between individuals (idiographic phenomena). Previous research, however, has neglected the temporal perspective by its traditional focus on static phenomena, which were mainly assessed at the group level (nomothetic phenomena). To support a temporal approach, the authors introduce time-series panel analysis (TSPA), a statistical methodology explicitly focusing on the quantification of temporal, session-to-session aspects of change in psychotherapy. TSPA-models are initially built at the level of individuals and are subsequently aggregated at the group level, thus allowing the exploration of prototypical models. TSPA is based on vector auto-regression (VAR), an extension of univariate auto-regression models to multivariate time-series data. The application of TSPA is demonstrated in a sample of 87 outpatient psychotherapy patients who were monitored by postsession questionnaires. Prototypical mechanisms of change were derived from the aggregation of individual multivariate models of psychotherapy process. In a 2nd step, the associations between mechanisms of change (TSPA) and pre- to postsymptom change were explored. TSPA allowed a prototypical process pattern to be identified, where patient's alliance and self-efficacy were linked by a temporal feedback-loop. Furthermore, therapist's stability over time in both mastery and clarification interventions was positively associated with better outcomes. TSPA is a statistical tool that sheds new light on temporal mechanisms of change. Through this approach, clinicians may gain insight into prototypical patterns of change in psychotherapy. PsycINFO Database Record (c) 2014 APA, all rights reserved.
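As a rough illustration of the vector auto-regression that the abstract above names as the basis of TSPA, here is a minimal sketch of a least-squares fit of a first-order, two-variable VAR model. The function name `fit_var1`, the lag order of one, the omission of an intercept, and the simulated series are all assumptions made for illustration; the abstract does not specify the model order or the estimation procedure.

```python
def fit_var1(series):
    """Least-squares estimate of A in x_t = A x_{t-1} + e_t for a 2-D series.

    series is a list of (x1, x2) pairs, assumed mean-centred (no intercept).
    """
    # Accumulate S10 = sum x_t x_{t-1}^T and S00 = sum x_{t-1} x_{t-1}^T
    s10 = [[0.0, 0.0], [0.0, 0.0]]
    s00 = [[0.0, 0.0], [0.0, 0.0]]
    for prev, cur in zip(series, series[1:]):
        for i in range(2):
            for j in range(2):
                s10[i][j] += cur[i] * prev[j]
                s00[i][j] += prev[i] * prev[j]
    # Invert the 2x2 matrix S00 explicitly
    det = s00[0][0] * s00[1][1] - s00[0][1] * s00[1][0]
    inv = [[s00[1][1] / det, -s00[0][1] / det],
           [-s00[1][0] / det, s00[0][0] / det]]
    # A = S10 @ inv(S00)
    return [[sum(s10[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Hypothetical noiseless process with two session-level variables
a_true = [[0.5, 0.2], [-0.3, 0.4]]
x = [(1.0, 2.0)]
for _ in range(10):
    p = x[-1]
    x.append((a_true[0][0] * p[0] + a_true[0][1] * p[1],
              a_true[1][0] * p[0] + a_true[1][1] * p[1]))

a_hat = fit_var1(x)
print(a_hat)
```

The off-diagonal entries of the estimated matrix are the cross-lagged effects: a non-zero entry in both off-diagonal positions is what a temporal feedback loop between two variables would look like in such a model.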
NASA Astrophysics Data System (ADS)
Guillen, George; Rainey, Gail; Morin, Michelle
2004-04-01
Currently, the Minerals Management Service uses the Oil Spill Risk Analysis model (OSRAM) to predict the movement of potential oil spills greater than 1000 bbl originating from offshore oil and gas facilities. OSRAM generates oil spill trajectories using meteorological and hydrological data input from either actual physical measurements or estimates generated from other hydrological models. OSRAM and many other models produce output matrices of average, maximum and minimum contact probabilities to specific landfall or target segments (columns) from oil spills at specific points (rows). Analysts and managers are often interested in identifying geographic areas or groups of facilities that pose similar risks to specific targets or groups of targets if a spill occurred. Unfortunately, due to the potentially large matrix generated by many spill models, this question is difficult to answer without the use of data reduction and visualization methods. In our study we utilized a multivariate statistical method called cluster analysis to group areas of similar risk based on potential distribution of landfall target trajectory probabilities. We also utilized ArcView™ GIS to display spill launch point groupings. The combination of GIS and multivariate statistical techniques in the post-processing of trajectory model output is a powerful tool for identifying and delineating areas of similar risk from multiple spill sources. We strongly encourage modelers, statistical and GIS software programmers to closely collaborate to produce a more seamless integration of these technologies and approaches to analyzing data. They are complementary methods that strengthen the overall assessment of spill risks.
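The grouping step described in the abstract above can be sketched as follows: each row of a contact-probability matrix is one spill launch point, each column one target segment, and rows with similar probability profiles are merged into the same cluster. The matrix values are hypothetical, and average-linkage agglomerative clustering with Euclidean distance is an assumed choice; the abstract says only that cluster analysis was used.

```python
from math import dist  # available from Python 3.8

def agglomerative(rows, k):
    """Group row vectors into k clusters by average-linkage merging."""
    clusters = [[i] for i in range(len(rows))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between two clusters
        return sum(dist(rows[i], rows[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]

# Hypothetical matrix: 5 launch points (rows) x 3 shoreline segments (columns)
probs = [
    [0.80, 0.10, 0.05],
    [0.75, 0.15, 0.05],
    [0.10, 0.70, 0.10],
    [0.05, 0.65, 0.20],
    [0.78, 0.12, 0.06],
]
print(agglomerative(probs, 2))  # → [[0, 1, 4], [2, 3]]
```

The resulting cluster labels are what would then be joined to launch-point coordinates for display in a GIS.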
Havlicek, Martin; Jan, Jiri; Brazdil, Milan; Calhoun, Vince D.
2015-01-01
Increasing interest in understanding dynamic interactions of brain neural networks leads to formulation of sophisticated connectivity analysis methods. Recent studies have applied Granger causality based on standard multivariate autoregressive (MAR) modeling to assess the brain connectivity. Nevertheless, one important flaw of this commonly proposed method is that it requires the analyzed time series to be stationary, whereas such assumption is mostly violated due to the weakly nonstationary nature of functional magnetic resonance imaging (fMRI) time series. Therefore, we propose an approach to dynamic Granger causality in the frequency domain for evaluating functional network connectivity in fMRI data. The effectiveness and robustness of the dynamic approach was significantly improved by combining a forward and backward Kalman filter that improved estimates compared to the standard time-invariant MAR modeling. In our method, the functional networks were first detected by independent component analysis (ICA), a computational method for separating a multivariate signal into maximally independent components. Then the measure of Granger causality was evaluated using generalized partial directed coherence that is suitable for bivariate as well as multivariate data. Moreover, this metric provides identification of causal relation in frequency domain, which allows one to distinguish the frequency components related to the experimental paradigm. The procedure of evaluating Granger causality via dynamic MAR was demonstrated on simulated time series as well as on two sets of group fMRI data collected during an auditory sensorimotor (SM) or auditory oddball discrimination (AOD) tasks. Finally, a comparison with the results obtained from a standard time-invariant MAR model was provided. PMID:20561919
Yang, Heejung; Lee, Dong Young; Kang, Kyo Bin; Kim, Jeom Yong; Kim, Sun Ok; Yoo, Young Hyo; Sung, Sang Hyun
2015-05-10
A dry purified extract of Panax ginseng (PEG) was prepared using a manufacturing process that includes column chromatography, acid hydrolysis, and an enzyme reaction. During the manufacturing process, the more polar ginsenosides were altered into less polar forms via cleavage of their sugar chains and structural modifications of the aglycones, such as hydroxylation and dehydroxylation. The structural changes of ginsenosides during the intermediate steps from dried ginseng extract (DGE) to PEG were monitored by ultra-performance liquid chromatography coupled with quadrupole time-of-flight mass spectrometry (UPLC-QTOF/MS). Twenty-two ginsenosides isolated from PEG were used as reference standards for identifying unknown ginsenosides and for suggesting metabolic markers. The elution order of the 22 ginsenosides, based on the type of aglycone and the location and number of sugar chains, can be used for the structural elucidation of unknown ginsenosides. This information could be used in a dereplication process for quick and efficient identification of ginsenoside derivatives in ginseng preparations. A dereplication approach helped the identification of the metabolic markers in the UPLC-QTOF/MS chromatograms during the conversion process with multivariate analyses, including principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA) plots. These metabolic markers were identified by comparing with the dereplication information of the reference standards of 22 ginsenosides, or they were assigned using the pattern of the MS/MS fragmented ions. Consequently, the developed metabolic profiling approach using UPLC-QTOF/MS and multivariate analysis represents a new method for providing quality control as well as useful criteria for a similarity evaluation of the manufacturing process of ginseng preparations. Copyright © 2015 Elsevier B.V. All rights reserved.
Lu, Tsui-Shan; Longnecker, Matthew P.; Zhou, Haibo
2016-01-01
An outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme in which one observes the exposure with a probability that depends on the outcome. Well-known designs of this kind are the case-control design for a binary response, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been done for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome dependent sampling (Multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the Multivariate-ODS design is semiparametric where all the underlying distributions of covariates are modeled nonparametrically using the empirical likelihood methods. We show that the proposed estimator is consistent and establish its asymptotic normality. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the Multivariate-ODS or the estimator from a simple random sample with the same sample size. The Multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association between PCB exposure and hearing loss in children born to the Collaborative Perinatal Study. PMID:27966260
Deconstructing multivariate decoding for the study of brain function.
Hebart, Martin N; Baker, Chris I
2017-08-04
Multivariate decoding methods were developed originally as tools to enable accurate predictions in real-world applications. The realization that these methods can also be employed to study brain function has led to their widespread adoption in the neurosciences. However, prior to the rise of multivariate decoding, the study of brain function was firmly embedded in a statistical philosophy grounded on univariate methods of data analysis. In this way, multivariate decoding for brain interpretation grew out of two established frameworks: multivariate decoding for predictions in real-world applications, and classical univariate analysis based on the study and interpretation of brain activation. We argue that this led to two confusions, one reflecting a mixture of multivariate decoding for prediction or interpretation, and the other a mixture of the conceptual and statistical philosophies underlying multivariate decoding and classical univariate analysis. Here we attempt to systematically disambiguate multivariate decoding for the study of brain function from the frameworks it grew out of. After elaborating these confusions and their consequences, we describe six, often unappreciated, differences between classical univariate analysis and multivariate decoding. We then focus on how the common interpretation of what is signal and noise changes in multivariate decoding. Finally, we use four examples to illustrate where these confusions may impact the interpretation of neuroimaging data. We conclude with a discussion of potential strategies to help resolve these confusions in interpreting multivariate decoding results, including the potential departure from multivariate decoding methods for the study of brain function. Copyright © 2017. Published by Elsevier Inc.
A Simplified, General Approach to Simulating from Multivariate Copula Functions
Barry Goodwin
2012-01-01
Copulas have become an important analytic tool for characterizing multivariate distributions and dependence. One is often interested in simulating data from copula estimates. The process can be analytically and computationally complex and usually involves steps that are unique to a given parametric copula. We describe an alternative approach that uses \\probability{...
Haller, Sven; Lovblad, Karl-Olof; Giannakopoulos, Panteleimon; Van De Ville, Dimitri
2014-05-01
Many diseases are associated with systematic modifications in brain morphometry and function. These alterations may be subtle, in particular at early stages of the disease progress, and thus not evident by visual inspection alone. Group-level statistical comparisons have dominated neuroimaging studies for many years, providing fascinating insight into brain regions involved in various diseases. However, such group-level results do not warrant diagnostic value for individual patients. Recently, pattern recognition approaches have led to a fundamental shift in paradigm, bringing multivariate analysis and predictive results, notably for the early diagnosis of individual patients. We review the state-of-the-art fundamentals of pattern recognition including feature selection, cross-validation and classification techniques, as well as limitations including inter-individual variation in normal brain anatomy and neurocognitive reserve. We conclude with the discussion of future trends including multi-modal pattern recognition, multi-center approaches with data-sharing and cloud-computing.
Predicting major element mineral/melt equilibria - A statistical approach
NASA Technical Reports Server (NTRS)
Hostetler, C. J.; Drake, M. J.
1980-01-01
Empirical equations have been developed for calculating the mole fractions of NaO0.5, MgO, AlO1.5, SiO2, KO0.5, CaO, TiO2, and FeO in a solid phase of initially unknown identity given only the composition of the coexisting silicate melt. The approach involves a linear multivariate regression analysis in which solid composition is expressed as a Taylor series expansion of the liquid compositions. An internally consistent precision of approximately 0.94 is obtained, that is, the nature of the liquidus phase in the input data set can be correctly predicted for approximately 94% of the entries. The composition of the liquidus phase may be calculated to better than 5 mol % absolute. An important feature of this 'generalized solid' model is its reversibility; that is, the dependent and independent variables in the linear multivariate regression may be inverted to permit prediction of the composition of a silicate liquid produced by equilibrium partial melting of a polymineralic source assemblage.
Exploring image data assimilation in the prospect of high-resolution satellite oceanic observations
NASA Astrophysics Data System (ADS)
Durán Moro, Marina; Brankart, Jean-Michel; Brasseur, Pierre; Verron, Jacques
2017-07-01
Satellite sensors increasingly provide high-resolution (HR) observations of the ocean. They supply observations of sea surface height (SSH) and of tracers of the dynamics such as sea surface salinity (SSS) and sea surface temperature (SST). In particular, the Surface Water Ocean Topography (SWOT) mission will provide measurements of the surface ocean topography at very high-resolution (HR) delivering unprecedented information on the meso-scale and submeso-scale dynamics. This study investigates the feasibility to use these measurements to reconstruct meso-scale features simulated by numerical models, in particular on the vertical dimension. A methodology to reconstruct three-dimensional (3D) multivariate meso-scale scenes is developed by using a HR numerical model of the Solomon Sea region. An inverse problem is defined in the framework of a twin experiment where synthetic observations are used. A true state is chosen among the 3D multivariate states which is considered as a reference state. In order to correct a first guess of this true state, a two-step analysis is carried out. A probability distribution of the first guess is defined and updated at each step of the analysis: (i) the first step applies the analysis scheme of a reduced-order Kalman filter to update the first guess probability distribution using SSH observation; (ii) the second step minimizes a cost function using observations of HR image structure and a new probability distribution is estimated. The analysis is extended to the vertical dimension using 3D multivariate empirical orthogonal functions (EOFs) and the probabilistic approach allows the update of the probability distribution through the two-step analysis. Experiments show that the proposed technique succeeds in correcting a multivariate state using meso-scale and submeso-scale information contained in HR SSH and image structure observations. 
It also demonstrates how the surface information can be used to reconstruct the ocean state below the surface.
An improved method for bivariate meta-analysis when within-study correlations are unknown.
Hong, Chuan; D Riley, Richard; Chen, Yong
2018-03-01
Multivariate meta-analysis, which jointly analyzes multiple and possibly correlated outcomes in a single analysis, is becoming increasingly popular in recent years. An attractive feature of the multivariate meta-analysis is its ability to account for the dependence between multiple estimates from the same study. However, standard inference procedures for multivariate meta-analysis require the knowledge of within-study correlations, which are usually unavailable. This limits standard inference approaches in practice. Riley et al proposed a working model and an overall synthesis correlation parameter to account for the marginal correlation between outcomes, where the only data needed are those required for a separate univariate random-effects meta-analysis. As within-study correlations are not required, the Riley method is applicable to a wide variety of evidence synthesis situations. However, the standard variance estimator of the Riley method is not entirely correct under many important settings. As a consequence, the coverage of a function of pooled estimates may not reach the nominal level even when the number of studies in the multivariate meta-analysis is large. In this paper, we improve the Riley method by proposing a robust variance estimator, which is asymptotically correct even when the model is misspecified (ie, when the likelihood function is incorrect). Simulation studies of a bivariate meta-analysis, in a variety of settings, show a function of pooled estimates has improved performance when using the proposed robust variance estimator. In terms of individual pooled estimates themselves, the standard variance estimator and robust variance estimator give similar results to the original method, with appropriate coverage. The proposed robust variance estimator performs well when the number of studies is relatively large. Therefore, we recommend the use of the robust method for meta-analyses with a relatively large number of studies (eg, m≥50). 
When the sample size is relatively small, we recommend the use of the robust method under the working independence assumption. We illustrate the proposed method through 2 meta-analyses. Copyright © 2017 John Wiley & Sons, Ltd.
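The abstract above notes that the Riley method needs only the data required for a separate univariate random-effects meta-analysis of each outcome. As a minimal sketch of that univariate building block, the following pools study estimates with the DerSimonian-Laird moment estimator of the between-study variance. The choice of DerSimonian-Laird and the function name `dersimonian_laird` are assumptions for illustration; this is not the Riley working model and not the robust variance estimator proposed in the paper.

```python
def dersimonian_laird(effects, variances):
    """Univariate random-effects pooling (DerSimonian-Laird moment estimator).

    effects   : per-study effect estimates
    variances : per-study within-study variances
    Returns (pooled_effect, tau2, variance_of_pooled_effect).
    """
    w = [1.0 / v for v in variances]          # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q heterogeneity statistic
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2, 1.0 / sum(w_star)

# Hypothetical example: two equally precise studies that disagree
pooled, tau2, var_pooled = dersimonian_laird([0.0, 1.0], [0.01, 0.01])
print(pooled, tau2, var_pooled)
```

In a bivariate analysis this step would be run once per outcome; the multivariate methods discussed in the abstract differ in how they then model the correlation between the two sets of pooled estimates.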
Iorgulescu, E; Voicu, V A; Sârbu, C; Tache, F; Albu, F; Medvedovici, A
2016-08-01
The influence of the experimental variability (instrumental repeatability, instrumental intermediate precision and sample preparation variability) and data pre-processing (normalization, peak alignment, background subtraction) on the discrimination power of multivariate data analysis methods (Principal Component Analysis -PCA- and Cluster Analysis -CA-) as well as a new algorithm based on linear regression was studied. Data used in the study were obtained through positive or negative ion monitoring electrospray mass spectrometry (+/-ESI/MS) and reversed phase liquid chromatography/UV spectrometric detection (RPLC/UV) applied to green tea extracts. Extractions in ethanol and heated water infusion were used as sample preparation procedures. The multivariate methods were directly applied to mass spectra and chromatograms, involving strictly a holistic comparison of shapes, without assignment of any structural identity to compounds. An alternative data interpretation based on linear regression analysis mutually applied to data series is also discussed. Slopes, intercepts and correlation coefficients produced by the linear regression analysis applied on pairs of very large experimental data series successfully retain information resulting from high frequency instrumental acquisition rates, obviously better defining the profiles being compared. Consequently, each type of sample or comparison between samples produces in the Cartesian space an ellipsoidal volume defined by the normal variation intervals of the slope, intercept and correlation coefficient. Distances between volumes graphically illustrate (dis)similarities between compared data. The instrumental intermediate precision had the major effect on the discrimination power of the multivariate data analysis methods. 
Mass spectra produced through ionization from liquid state in atmospheric pressure conditions of bulk complex mixtures resulting from extracted materials of natural origins provided an excellent data basis for multivariate analysis methods, equivalent to data resulting from chromatographic separations. The alternative evaluation of very large data series based on linear regression analysis produced information equivalent to results obtained through application of PCA and CA. Copyright © 2016 Elsevier B.V. All rights reserved.
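The regression-based comparison described in the abstract above reduces each pair of data series to three numbers: slope, intercept and correlation coefficient. A minimal sketch of that reduction follows; the function name `compare_profiles` and the two short series are hypothetical, and real spectra or chromatograms would contain many thousands of points.

```python
from math import sqrt

def compare_profiles(a, b):
    """Regress series b on series a point by point.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    Two identical profiles give (1.0, 0.0, 1.0).
    """
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    slope = sab / saa
    return slope, mb - slope * ma, sab / sqrt(saa * sbb)

# Hypothetical intensity profiles sampled at the same acquisition points
reference = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0]
replicate = [0.1, 1.2, 4.1, 8.7, 4.2, 0.9, 0.1]
print(compare_profiles(reference, replicate))
```

Repeating the comparison over replicate measurements yields a cloud of (slope, intercept, r) points per sample type, which is the ellipsoidal volume the abstract refers to.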
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sreepathi, Sarat; Kumar, Jitendra; Mills, Richard T.
A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne etc.), observational facilities (meteorological, eddy covariance etc.), state-of-the-art sensors, and simulation models offers unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements have led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and specifically, large scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.
Multivariate meta-analysis: Potential and promise
Jackson, Dan; Riley, Richard; White, Ian R
2011-01-01
The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day ‘Multivariate meta-analysis’ event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various view points and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd. PMID:21268052
Bludau, Sebastian; Bzdok, Danilo; Gruber, Oliver; Kohn, Nils; Riedl, Valentin; Sorg, Christian; Palomero-Gallagher, Nicola; Müller, Veronika I.; Hoffstaedter, Felix; Amunts, Katrin; Eickhoff, Simon B.
2017-01-01
Objective The heterogeneous human frontal pole has been identified as a node in the dysfunctional network of major depressive disorder. The contribution of the medial (socio-affective) versus lateral (cognitive) frontal pole to major depression pathogenesis is currently unclear. The present study performs morphometric comparison of the microstructurally informed subdivisions of human frontal pole between depressed patients and controls using both uni- and multivariate statistics. Methods Multi-site voxel- and region-based morphometric MRI analysis of 73 depressed patients and 73 matched controls without psychiatric history. Frontal pole volume was first compared between depressed patients and controls by subdivision-wise classical morphometric analysis. In a second approach, frontal pole volume was compared by subdivision-naive multivariate searchlight analysis based on support vector machines. Results Subdivision-wise morphometric analysis found a significantly smaller medial frontal pole in depressed patients with a negative correlation of disease severity and duration. Histologically uninformed multivariate voxel-wise statistics provided converging evidence for structural aberrations specific to the microstructurally defined medial area of the frontal pole in depressed patients. Conclusions Across disparate methods, we demonstrated subregion specificity in the left medial frontal pole volume in depressed patients. Indeed, the frontal pole was shown to structurally and functionally connect to other key regions in major depression pathology like the anterior cingulate cortex and the amygdala via the uncinate fasciculus. Present and previous findings consolidate the left medial portion of the frontal pole as particularly altered in major depression. PMID:26621569
Yan, Yan; Zhang, Qianqian; Feng, Fang
2016-07-01
Sulfur fumigation has recently been used during the postharvest handling of rhubarb to reduce the drying duration and control pests. However, a few reports question the effect of sulfur fumigation on the bioactive components of rhubarb, which is crucial for the quality evaluation of the herbal medicine. The bottleneck limiting the study comes from the complex compounds that exist in herb samples with diverse structural features, a wide concentration range, and the difficulty of obtaining all the reference standards. In this study, an integrated strategy based on the highly effective separation and analysis by liquid chromatography coupled with diode-array detection and time-of-flight/triple-quadrupole tandem mass spectrometry combined with multivariate analysis was established. 68 phenolic compounds that exist in nonfumigated and sulfur-fumigated herb samples of rhubarb were tentatively assigned based on their retention behavior, UV spectra, accurate molecular weight, and mass spectral fragments. Qualitative and semiquantitative comparison revealed a serious reduction of the majority of phenolic compounds in sulfur-fumigated rhubarb. Furthermore, multivariate analysis was applied to holistically discriminate nonfumigated from sulfur-fumigated rhubarb and explore the characteristic chemical markers. The established approach was specific and rapid for characterizing and screening sulfur-fumigated rhubarb among commercial samples and could be applied for the quality assessment of other sulfur-fumigated herbs. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Probabilistic flood damage modelling at the meso-scale
NASA Astrophysics Data System (ADS)
Kreibich, Heidi; Botto, Anna; Schröter, Kai; Merz, Bruno
2014-05-01
Decisions on flood risk management and adaptation are usually based on risk analyses. Such analyses are associated with significant uncertainty, even more so if changes in risk due to global change are expected. Although uncertainty analysis and probabilistic approaches have received increased attention in recent years, they are still not standard practice for flood risk assessments. Most damage models have in common that complex damaging processes are described by simple, deterministic approaches like stage-damage functions. Novel probabilistic, multi-variate flood damage models have been developed and validated on the micro-scale using a data-mining approach, namely bagging decision trees (Merz et al. 2013). In this presentation we show how the model BT-FLEMO (Bagging decision Tree based Flood Loss Estimation MOdel) can be applied on the meso-scale, namely on the basis of ATKIS land-use units. The model is applied in 19 municipalities which were affected during the 2002 flood by the River Mulde in Saxony, Germany. The application of BT-FLEMO provides a probability distribution of estimated damage to residential buildings per municipality. Validation is undertaken on the one hand via a comparison with eight other damage models including stage-damage functions as well as multi-variate models. On the other hand the results are compared with official damage data provided by the Saxon Relief Bank (SAB). The results show that uncertainties of damage estimation remain high. Thus, the significant advantage of this probabilistic flood loss estimation model BT-FLEMO is that it inherently provides quantitative information about the uncertainty of the prediction. Reference: Merz, B.; Kreibich, H.; Lall, U. (2013): Multi-variate flood damage assessment: a tree-based data-mining approach. NHESS, 13(1), 53-64.
Elevated body mass index and risk of postoperative CSF leak following transsphenoidal surgery
Dlouhy, Brian J.; Madhavan, Karthik; Clinger, John D.; Reddy, Ambur; Dawson, Jeffrey D.; O’Brien, Erin K.; Chang, Eugene; Graham, Scott M.; Greenlee, Jeremy D. W.
2012-01-01
Object Postoperative CSF leakage can be a serious complication after a transsphenoidal surgical approach. An elevated body mass index (BMI) is a significant risk factor for spontaneous CSF leaks. However, there is no evidence correlating BMI with postoperative CSF leak after transsphenoidal surgery. The authors hypothesized that patients with elevated BMI would have a higher incidence of CSF leakage complications following transsphenoidal surgery. Methods The authors conducted a retrospective review of 121 patients who, between August 2005 and March 2010, underwent endoscopic endonasal transsphenoidal surgeries for resection of primarily sellar masses. Patients requiring extended transsphenoidal approaches were excluded. A multivariate statistical analysis was performed to investigate the association of BMI and other risk factors with postoperative CSF leakage. Results In 92 patients, 96 endonasal endoscopic transsphenoidal surgeries were performed that met inclusion criteria. Thirteen postoperative leaks occurred and required subsequent treatment, including lumbar drainage and/or reoperation. The average BMI of patients with a postoperative CSF leak was significantly greater than that in patients with no postoperative CSF leak (39.2 vs 32.9 kg/m2, p = 0.006). Multivariate analyses indicate that for every 5-kg/m2 increase in BMI, patients undergoing a transsphenoidal approach for a primarily sellar mass have 1.61 times the odds (95% CI 1.10–2.29, p = 0.016, by multivariate logistic regression) of having a postoperative CSF leak. Conclusions Elevated BMI is an independent predictor of postoperative CSF leak after an endonasal endoscopic transsphenoidal approach. The authors recommend that patients with BMI greater than 30 kg/m2 have meticulous sellar reconstruction at surgery and close monitoring postoperatively. PMID:22443502
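The reported odds ratio is stated per 5-kg/m2 increase in BMI. As a minimal sketch, assuming the log-odds are linear in BMI (the usual logistic regression assumption; the abstract does not give the fitted model), the same estimate can be re-expressed for other step sizes. The function name `rescale_odds_ratio` is hypothetical and introduced only for illustration.

```python
import math


def rescale_odds_ratio(or_value, from_step, to_step):
    """Rescale an odds ratio reported per `from_step` units to `to_step` units.

    Assumes the log-odds are linear in the predictor, as in ordinary
    logistic regression; this is an illustrative assumption, not a detail
    taken from the study.
    """
    beta_per_unit = math.log(or_value) / from_step  # log-odds change per unit
    return math.exp(beta_per_unit * to_step)


or_per_5 = 1.61  # odds ratio per 5 kg/m2 of BMI, as reported in the abstract
or_per_1 = rescale_odds_ratio(or_per_5, 5, 1)    # per 1 kg/m2
or_per_10 = rescale_odds_ratio(or_per_5, 5, 10)  # per 10 kg/m2
print(round(or_per_1, 3), round(or_per_10, 3))
```

Under that assumption the per-unit odds ratio is roughly 1.10 and the per-10-unit odds ratio is 1.61 squared. The confidence interval bounds can be rescaled the same way.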
An EEMD-ICA Approach to Enhancing Artifact Rejection for Noisy Multivariate Neural Data.
Zeng, Ke; Chen, Dan; Ouyang, Gaoxiang; Wang, Lizhe; Liu, Xianzeng; Li, Xiaoli
2016-06-01
As neural data are generally noisy, artifact rejection is crucial for data preprocessing. It has long been a grand research challenge to devise an approach that is able 1) to remove the artifacts and 2) to avoid loss or disruption of the structural information at the same time, so that the risk of introducing bias to data interpretation may be minimized. In this study, an approach (namely EEMD-ICA) was proposed to first decompose multivariate neural data that are possibly noisy into intrinsic mode functions (IMFs) using ensemble empirical mode decomposition (EEMD). Independent component analysis (ICA) was then applied to the IMFs to separate the artifactual components. The approach was tested against the classical ICA and the automatic wavelet ICA (AWICA) methods, which were dominant methods for artifact rejection. In order to evaluate the effectiveness of the proposed approach in handling neural data possibly with intense noise, experiments on artifact removal were performed using semi-simulated data mixed with a variety of noises. Experimental results indicate that the proposed approach consistently outperforms the counterparts in terms of both normalized mean square error (NMSE) and Structure SIMilarity (SSIM). The superiority becomes even greater with the decrease of SNR in all cases, e.g., SSIM of the EEMD-ICA can almost double that of AWICA and triple that of ICA. To further examine the potential of the approach in sophisticated applications, the approach together with the counterparts was used to preprocess a real-life epileptic EEG with absence seizure. Experiments were carried out with the focus on characterizing the dynamics of the data after artifact rejection, i.e., distinguishing seizure-free, pre-seizure and seizure states. Using multi-scale permutation entropy to extract features and linear discriminant analysis for classification, the EEMD-ICA performed the best for classifying the states (87.4%, about 4.1% and 8.7% higher than that of AWICA and ICA respectively), which was closest to the results of the manually selected dataset (89.7%).
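For readers unfamiliar with the feature named above, the following is a minimal sketch of multi-scale permutation entropy in its common textbook form: coarse-grain the signal by non-overlapping averaging, then compute the normalized Shannon entropy of ordinal patterns. The abstract does not state the embedding order, delay, or scales used, so the defaults and function names here are assumptions for illustration only.

```python
import math

import numpy as np


def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy (0 = fully regular, 1 = maximally irregular)."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay  # number of embedding windows
    if n <= 0:
        raise ValueError("series too short for the chosen order and delay")
    counts = {}
    for i in range(n):
        window = x[i : i + order * delay : delay]
        pattern = tuple(np.argsort(window, kind="stable"))  # ordinal pattern
        counts[pattern] = counts.get(pattern, 0) + 1
    probs = np.array(list(counts.values()), dtype=float) / n
    entropy = -np.sum(probs * np.log(probs))
    return float(entropy / math.log(math.factorial(order)))


def coarse_grain(x, scale):
    """Average consecutive, non-overlapping blocks of length `scale`."""
    x = np.asarray(x, dtype=float)
    n = len(x) // scale
    return x[: n * scale].reshape(n, scale).mean(axis=1)


def multiscale_permutation_entropy(x, scales=(1, 2, 3), order=3, delay=1):
    """Permutation entropy of the coarse-grained series at each scale."""
    return [permutation_entropy(coarse_grain(x, s), order, delay) for s in scales]


rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
noise = rng.normal(size=2000)    # synthetic white noise, not EEG data
ramp = np.arange(50)             # strictly increasing series
mpe_noise = multiscale_permutation_entropy(noise)
pe_ramp = permutation_entropy(ramp)
```

A strictly increasing series yields a single ordinal pattern and therefore zero entropy, while white noise yields values close to 1. The resulting per-scale values would then serve as the feature vector passed to a classifier.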
Nayar, Gautam; Wang, Timothy; Sankey, Eric W; Berry-Candelario, John; Elsamadicy, Aladine A; Back, Adam; Karikari, Isaac; Isaacs, Robert
2018-05-19
Risk factors for surgical revision remain important because of additional readmission, anesthesia, and morbidity for the patient and significant cost for health care systems. Although the rate of reoperation (RRO) is well described for traditional open posterior (OP) approaches, the RRO in minimally invasive lateral (MIL) surgery remains poorly characterized. This study compares the RRO in patients undergoing decompressive lumbar spine surgery via MIL versus OP approaches. Patient demographics and comorbidities were retrospectively collected for 2060 patients undergoing single-stage elective lumbar spinal surgery at multiple institutions. A subset of 1484 patients had long-term data (long-term cohort [LT cohort]). The RRO was compared between approaches through univariate and multivariate analysis. There were 1292 patients (62.7%) who underwent lateral access surgery, whereas 768 patients (37.3%) underwent OP surgery. The MIL cohort was significantly older, had a higher proportion of men, and had more comorbidities than the OP cohort. In the LT cohort, lateral patients were significantly older and had more comorbidities, with a lower body mass index and a lower proportion of men and smokers. Surgical complications between the groups trended to be similar. The MIL cohort had a significantly lower RRO at both 30 days (approximately 57% lower, MIL cohort: 1.01% vs. OP cohort: 2.36%, P = 0.02) and 2 years (approximately 61% lower, MIL cohort: 2.09% vs. OP cohort: 5.37%, P < 0.01) after surgery. On multivariate analysis, surgical approach was the only significant predictor for the RRO at both 30 days (open posterior approach odds ratio [OR], 4.47; 95% confidence interval [CI], 1.33-15.09; P = 0.02) and 2 years (open posterior approach OR, 3.26; 95% CI, 1.26-8.42; P = 0.01). This study shows that MIL surgical approaches, compared with OP approaches, have a significantly lower RRO after lumbar spine surgery. Copyright © 2018 Elsevier Inc. All rights reserved.
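The approximate relative reductions and cohort shares quoted above follow from simple arithmetic on the reported figures. The sketch below reproduces them; the helper name `relative_reduction` is hypothetical. It covers only the unadjusted rates and does not reproduce the adjusted odds ratios from the multivariate analysis, which depend on covariates not given in the abstract.

```python
def relative_reduction(rate_new, rate_ref):
    """Fraction by which `rate_new` is lower than the reference rate."""
    return (rate_ref - rate_new) / rate_ref


# Reoperation rates in percent, as reported (MIL cohort vs. OP cohort).
rr_30d = relative_reduction(1.01, 2.36)  # 30 days after surgery
rr_2y = relative_reduction(2.09, 5.37)   # 2 years after surgery

# Cohort shares out of the 2060 patients.
share_mil = 1292 / 2060
share_op = 768 / 2060

print(f"30-day reduction: {rr_30d:.1%}, 2-year reduction: {rr_2y:.1%}")
print(f"MIL share: {share_mil:.1%}, OP share: {share_op:.1%}")
```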
Forcino, Frank L; Leighton, Lindsey R; Twerdy, Pamela; Cahill, James F
2015-01-01
Community ecologists commonly perform multivariate techniques (e.g., ordination, cluster analysis) to assess patterns and gradients of taxonomic variation. A critical requirement for a meaningful statistical analysis is accurate information on the taxa found within an ecological sample. However, oversampling (too many individuals counted per sample) also comes at a cost, particularly for ecological systems in which identification and quantification is substantially more resource consuming than the field expedition itself. In such systems, an increasingly larger sample size will eventually result in diminishing returns in improving any pattern or gradient revealed by the data, but will also lead to continually increasing costs. Here, we examine 396 datasets: 44 previously published and 352 created datasets. Using meta-analytic and simulation-based approaches, the research within the present paper seeks (1) to determine minimal sample sizes required to produce robust multivariate statistical results when conducting abundance-based, community ecology research. Furthermore, we seek (2) to determine the dataset parameters (i.e., evenness, number of taxa, number of samples) that require larger sample sizes, regardless of resource availability. We found that in the 44 previously published and the 220 created datasets with randomly chosen abundances, a conservative estimate of a sample size of 58 produced the same multivariate results as all larger sample sizes. However, this minimal number varies as a function of evenness, where increased evenness resulted in increased minimal sample sizes. Sample sizes as small as 58 individuals are sufficient for a broad range of multivariate abundance-based research. In cases when resource availability is the limiting factor for conducting a project (e.g., small university, time to conduct the research project), statistically viable results can still be obtained with less of an investment.
Quantitative image processing in fluid mechanics
NASA Technical Reports Server (NTRS)
Hesselink, Lambertus; Helman, James; Ning, Paul
1992-01-01
The current status of digital image processing in fluid flow research is reviewed. In particular, attention is given to a comprehensive approach to the extraction of quantitative data from multivariate databases and examples of recent developments. The discussion covers numerical simulations and experiments, data processing, generation and dissemination of knowledge, traditional image processing, hybrid processing, fluid flow vector field topology, and isosurface analysis using Marching Cubes.
Absences of Navy Enlisted Personnel: A Search for Gender Differences
1993-03-01
absenteeism behavior has changed over the past decade. Approach Two separate investigations were conducted to (1) perform a comprehensive comparison of women’s... absenteeism: A multivariate analysis with replication. Organizational Behavior and Human Performance, 26, 349-372. Hoiberg, A. (1980). Sex and occupational...Thomas, Marie D. Thomas. Paul Robertson 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING ORGANIZATION Navy Personnel Research and
Multivariate Bioclimatic Ecosystem Change Approaches
2015-02-06
course the sandy soils of the Sandhills will not migrate. This observation suggests that a new nomenclature for ecosystems must be developed if...Coast Sandhills. At that time period, not only will the climate be similar, but the soil character will also be similar. Therefore about the year 2115...Disaggregation of global circulation model outputs decision and policy analysis. Working Paper No. 2. Cali, Colombia: International Centre for Tropical
Mercuri, A; Pagliari, M; Baxevanis, F; Fares, R; Fotaki, N
2017-02-25
In this study the selection of in vivo predictive in vitro dissolution experimental set-ups using a multivariate analysis approach, in line with the Quality by Design (QbD) principles, is explored. The dissolution variables selected using a design of experiments (DoE) were the dissolution apparatus [USP1 apparatus (basket) and USP2 apparatus (paddle)], the rotational speed of the basket/or paddle, the operator conditions (dissolution apparatus brand and operator), the volume, the pH, and the ethanol content of the dissolution medium. The dissolution profiles of two nifedipine capsules (poorly soluble compound), under conditions mimicking the intake of the capsules with i. water, ii. orange juice and iii. an alcoholic drink (orange juice and ethanol) were analysed using multiple linear regression (MLR). Optimised dissolution set-ups, generated based on the mathematical model obtained via MLR, were used to build predicted in vitro-in vivo correlations (IVIVC). IVIVC could be achieved using physiologically relevant in vitro conditions mimicking the intake of the capsules with an alcoholic drink (orange juice and ethanol). The multivariate analysis revealed that the concentration of ethanol used in the in vitro dissolution experiments (47% v/v) can be lowered to less than 20% v/v, reflecting recently found physiological conditions. Copyright © 2016 Elsevier B.V. All rights reserved.
Irano, Natalia; Bignardi, Annaiza Braga; El Faro, Lenira; Santana, Mário Luiz; Cardoso, Vera Lúcia; Albuquerque, Lucia Galvão
2014-03-01
The objective of this study was to estimate genetic parameters for milk yield, stayability, and the occurrence of clinical mastitis in Holstein cows, and to study the genetic relationships among these traits, in order to support their genetic evaluation. Records from 5,090 Holstein cows with calvings from 1991 to 2010 were used in the analysis. Two standard multivariate analyses were carried out: one containing accumulated 305-day milk yield in the first lactation (MY1), stayability (STAY) until the third lactation, and clinical mastitis (CM), and the other containing accumulated 305-day milk yield (Y305), STAY, and CM, including the first three lactations as repeated measures for Y305 and CM. The covariance components were obtained by a Bayesian approach. The heritability estimates obtained by multivariate analysis with MY1 were 0.19, 0.28, and 0.13 for MY1, STAY, and CM, respectively, whereas using the multivariate analysis with the Y305, the estimates were 0.19, 0.31, and 0.14, respectively. The genetic correlations between MY1 and STAY, MY1 and CM, and STAY and CM, respectively, were 0.38, 0.12, and -0.49. The genetic correlations between Y305 and STAY, Y305 and CM, and STAY and CM, respectively, were 0.66, -0.25, and -0.52.
Xie, Weixing; Jin, Daxiang; Ma, Hui; Ding, Jinyong; Xu, Jixi; Zhang, Shuncong; Liang, De
2016-05-01
The risk factors for cement leakage were retrospectively reviewed in 192 patients who underwent percutaneous vertebral augmentation (PVA). The aim was to discuss the factors related to cement leakage in the PVA procedure for the treatment of osteoporotic vertebral compression fractures. PVA is widely applied for the treatment of osteoporotic vertebral fractures. Cement leakage is a major complication of this procedure. The risk factors for cement leakage have been controversial. A retrospective review of 192 patients who underwent PVA was conducted. The following data were recorded: age, sex, bone density, number of fractured vertebrae before surgery, number of treated vertebrae, severity of the treated vertebrae, operative approach, volume of injected bone cement, preoperative vertebral compression ratio, preoperative local kyphosis angle, intraosseous clefts, preoperative vertebral cortical bone defect, and ratio and type of cement leakage. To study the correlation between each factor and cement leakage ratio, bivariate regression analysis was employed to perform univariate analysis, whereas multivariate linear regression analysis was employed to perform multivariate analysis. The study included 192 patients (282 treated vertebrae), and cement leakage occurred in 100 vertebrae (35.46%). The vertebrae with preoperative cortical bone defects generally exhibited a higher cement leakage ratio, and the leakage was typically type C. Vertebrae with intact cortical bones before the procedure tended to experience type S leakage. Univariate analysis showed that patient age, bone density, number of fractured vertebrae before surgery, and vertebral cortical bone were associated with cement leakage ratio (P<0.05). Multivariate analysis showed that the main factors influencing bone cement leakage were bone density and vertebral cortical bone defect, with standardized partial regression coefficients of -0.085 and 0.144, respectively. High bone density and vertebral cortical bone defect are independent risk factors associated with bone cement leakage.
Multivariate Longitudinal Analysis with Bivariate Correlation Test
Adjakossa, Eric Houngla; Sadissou, Ibrahim; Hounkonnou, Mahouton Norbert; Nuel, Gregory
2016-01-01
In the context of multivariate multilevel data analysis, this paper focuses on the multivariate linear mixed-effects model, including all the correlations between the random effects when the dimensional residual terms are assumed uncorrelated. Using the EM algorithm, we suggest more general expressions of the model’s parameters estimators. These estimators can be used in the framework of the multivariate longitudinal data analysis as well as in the more general context of the analysis of multivariate multilevel data. By using a likelihood ratio test, we test the significance of the correlations between the random effects of two dependent variables of the model, in order to investigate whether or not it is useful to model these dependent variables jointly. Simulation studies are done to assess both the parameter recovery performance of the EM estimators and the power of the test. Using two empirical data sets which are of longitudinal multivariate type and multivariate multilevel type, respectively, the usefulness of the test is illustrated. PMID:27537692
Early experiences building a software quality prediction model
NASA Technical Reports Server (NTRS)
Agresti, W. W.; Evanco, W. M.; Smith, M. C.
1990-01-01
Early experiences building a software quality prediction model are discussed. The overall research objective is to establish a capability to project a software system's quality from an analysis of its design. The technical approach is to build multivariate models for estimating reliability and maintainability. Data from 21 Ada subsystems were analyzed to test hypotheses about various design structures leading to failure-prone or unmaintainable systems. Current design variables highlight the interconnectivity and visibility of compilation units. Other model variables provide for the effects of reusability and software changes. Reported results are preliminary because additional project data is being obtained and new hypotheses are being developed and tested. Current multivariate regression models are encouraging, explaining 60 to 80 percent of the variation in error density of the subsystems.
Davatzikos, Christos
2016-10-01
The past 20 years have seen a mushrooming growth of the field of computational neuroanatomy. Much of this work has been enabled by the development and refinement of powerful, high-dimensional image warping methods, which have enabled detailed brain parcellation, voxel-based morphometric analyses, and multivariate pattern analyses using machine learning approaches. The evolution of these 3 types of analyses over the years has overcome many challenges. We present the evolution of our work in these 3 directions, which largely follows the evolution of this field. We discuss the progression from single-atlas, single-registration brain parcellation work to current ensemble-based parcellation; from relatively basic mass-univariate t-tests to optimized regional pattern analyses combining deformations and residuals; and from basic application of support vector machines to generative-discriminative formulations of multivariate pattern analyses, and to methods dealing with heterogeneity of neuroanatomical patterns. We conclude with discussion of some of the future directions and challenges. Copyright © 2016. Published by Elsevier B.V.
Finley, Andrew O.; Banerjee, Sudipto; Cook, Bruce D.; Bradford, John B.
2013-01-01
In this paper we detail a multivariate spatial regression model that couples LiDAR, hyperspectral and forest inventory data to predict forest outcome variables at a high spatial resolution. The proposed model is used to analyze forest inventory data collected on the US Forest Service Penobscot Experimental Forest (PEF), ME, USA. In addition to helping meet the regression model's assumptions, results from the PEF analysis suggest that the addition of multivariate spatial random effects improves model fit and predictive ability, compared with two commonly applied modeling approaches. This improvement results from explicitly modeling the covariation among forest outcome variables and spatial dependence among observations through the random effects. Direct application of such multivariate models to even moderately large datasets is often computationally infeasible because of cubic order matrix algorithms involved in estimation. We apply a spatial dimension reduction technique to help overcome this computational hurdle without sacrificing richness in modeling.
An effective drift correction for dynamical downscaling of decadal global climate predictions
NASA Astrophysics Data System (ADS)
Paeth, Heiko; Li, Jingmin; Pollinger, Felix; Müller, Wolfgang A.; Pohlmann, Holger; Feldmann, Hendrik; Panitz, Hans-Jürgen
2018-04-01
Initialized decadal climate predictions with coupled climate models are often marked by substantial climate drifts that emanate from a mismatch between the climatology of the coupled model system and the data set used for initialization. While such drifts may be easily removed from the prediction system when analyzing individual variables, a major problem prevails for multivariate issues and, especially, when the output of the global prediction system shall be used for dynamical downscaling. In this study, we present a statistical approach to remove climate drifts in a multivariate context and demonstrate the effect of this drift correction on regional climate model simulations over the Euro-Atlantic sector. The statistical approach is based on an empirical orthogonal function (EOF) analysis adapted to a very large data matrix. The climate drift emerges as a dramatic cooling trend in North Atlantic sea surface temperatures (SSTs) and is captured by the leading EOF of the multivariate output from the global prediction system, accounting for 7.7% of total variability. The SST cooling pattern also imposes drifts in various atmospheric variables and levels. The removal of the first EOF effectuates the drift correction while retaining other components of intra-annual, inter-annual and decadal variability. In the regional climate model, the multivariate drift correction of the input data removes the cooling trends in most western European land regions and systematically reduces the discrepancy between the output of the regional climate model and observational data. In contrast, removing the drift only in the SST field from the global model has hardly any positive effect on the regional climate model.
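The core of the drift correction described above is removing the leading EOF from the data matrix. The following is a minimal sketch of that step using a plain singular value decomposition on a small synthetic (time, space) matrix. It is not the authors' implementation: the abstract mentions an EOF analysis adapted to a very large data matrix, and details such as variable standardization and area weighting are not given, so none are applied here. The function name and the synthetic data are assumptions.

```python
import numpy as np


def remove_leading_eof(data):
    """Remove the leading EOF from a (time, space) data matrix.

    Returns the corrected data, the leading spatial pattern, and the
    fraction of total variance that the leading mode explained.
    """
    mean = data.mean(axis=0)
    anomalies = data - mean  # centre each spatial point in time
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    explained = s[0] ** 2 / np.sum(s ** 2)
    leading_mode = s[0] * np.outer(u[:, 0], vt[0])  # rank-1 reconstruction
    corrected = anomalies - leading_mode + mean
    return corrected, vt[0], explained


# Synthetic example: a linear cooling drift on a fixed pattern plus noise.
rng = np.random.default_rng(0)
n_time, n_space = 120, 40
time = np.linspace(0.0, 1.0, n_time)
pattern = rng.normal(size=n_space)
drift = -5.0 * np.outer(time, pattern)
noise = 0.1 * rng.normal(size=(n_time, n_space))
data = drift + noise

corrected, eof1, explained = remove_leading_eof(data)
```

In this toy case the drift dominates the variance by construction, so the first mode captures nearly all of it. In the study the leading EOF accounts for only 7.7% of total variability, which is why the remaining intra-annual, inter-annual and decadal variability is retained after its removal.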
Venetis, Christos A; Kolibianakis, Efstratios M; Bosdou, Julia K; Lainas, George T; Sfontouris, Ioannis A; Tarlatzis, Basil C; Lainas, Tryfon G
2015-03-01
What is the proper way of assessing the effect of progesterone elevation (PE) on the day of hCG on live birth in women undergoing fresh embryo transfer after in vitro fertilization (IVF) using GnRH analogues and gonadotrophins? This study indicates that a multivariable approach, where the effect of the most important confounders is controlled for, can lead to markedly different results regarding the association between PE on the day of hCG and live birth rates after IVF when compared with the bivariate analysis that has been typically used in the relevant literature up to date. PE on the day of hCG is associated with decreased pregnancy rates in fresh IVF cycles. Evidence for this comes from observational studies that mostly failed to control for potential confounders. This is a retrospective analysis of a cohort of fresh IVF/intracytoplasmic sperm injection cycles (n = 3296) performed in a single IVF centre during the period 2001-2013. Patients in whom ovarian stimulation was performed with gonadotrophins and GnRH analogues. Natural cycles and cycles where stimulation involved the administration of clomiphene were excluded. In order to reflect routine clinical practice, no other exclusion criteria were imposed on this dataset. The primary outcome measure for this study was live birth defined as the delivery of a live infant after 24 weeks of gestation. We compared the association between PE on the day of hCG (defined as P > 1.5 ng/ml) and live birth rates calculated by simple bivariate analyses with that derived from multivariable logistic regression. 
The multivariable analysis controlled for female age, number of oocytes retrieved, number of embryos transferred, developmental stage of embryos at transfer (cleavage versus blastocyst), whether at least one good-quality embryo was transferred, the woman's body mass index, the total dose of FSH administered during ovarian stimulation and the type of GnRH analogues used (agonists versus antagonists) during ovarian stimulation. In addition, an interaction analysis was performed in order to assess whether the ovarian response (<6, 6-18, >18 oocytes) has a moderating effect on the association of PE on the day of hCG with live birth rates after IVF. Live birth rates were not significantly different between cycles with and those without PE when a bivariate analysis was performed [odds ratio (OR): 0.78, 95% confidence interval (CI): 0.56-1.09]. However, when a multivariable analysis was performed, controlling for the effect of the aforementioned confounders, live birth rates (OR: 0.68, 95% CI: 0.48-0.97) were significantly decreased in the group with PE on the day of hCG. The number of oocytes retrieved was the most potent confounder, causing a 29.4% reduction in the OR for live birth between the two groups compared. Furthermore, a moderating effect of ovarian response on the association between PE and live birth rates was not supported in the present analysis since no interaction was detected between PE and the type of ovarian response (<6, 6-18, >18 oocytes). This is a retrospective analysis of data collected during a 12-year period, and although the effect of the most important confounders was controlled for in the multivariable analysis, the presence of residual bias cannot be excluded. This analysis highlights the need for a multivariable approach when researchers or clinicians aim to evaluate the impact of PE on pregnancy rates in their own clinical setting. 
Failure to do so might explain why many past studies have failed to identify the detrimental effect of PE in fresh IVF cycles. © The Author 2015. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved.
NASA Astrophysics Data System (ADS)
Ghanate, A. D.; Kothiwale, S.; Singh, S. P.; Bertrand, Dominique; Krishna, C. Murali
2011-02-01
Cancer is now recognized as one of the major causes of morbidity and mortality. Histopathological diagnosis, the gold standard, is shown to be subjective, time consuming, prone to interobserver disagreement, and often fails to predict prognosis. Optical spectroscopic methods are being contemplated as adjuncts or alternatives to conventional cancer diagnostics. The most important aspect of these approaches is their objectivity, and multivariate statistical tools play a major role in realizing it. However, rigorous evaluation of the robustness of spectral models is a prerequisite. The utility of Raman spectroscopy in the diagnosis of cancers has been well established. Until now, the specificity and applicability of spectral models have been evaluated for specific cancer types. In this study, we have evaluated the utility of spectroscopic models representing normal and malignant tissues of the breast, cervix, colon, larynx, and oral cavity in a broader perspective, using different multivariate tests. The limit test, which was used in our earlier study, gave high sensitivity but suffered from poor specificity. The performance of other methods, such as factorial discriminant analysis and partial least squares discriminant analysis, is on par with that of more complex nonlinear methods such as decision trees, but they provide very little information about the classification model. This comparative study thus demonstrates not just the efficacy of Raman spectroscopic models but also the applicability and limitations of different multivariate tools for discrimination under complex conditions such as the multicancer scenario.
A Review of Calibration Transfer Practices and Instrument Differences in Spectroscopy.
Workman, Jerome J
2018-03-01
Calibration transfer for use with spectroscopic instruments, particularly for near-infrared, infrared, and Raman analysis, has been the subject of multiple articles, research papers, book chapters, and technical reviews. There has been a myriad of approaches published and claims made for resolving the problems associated with transferring calibrations; however, the capability of attaining identical results over time from two or more instruments using an identical calibration still eludes technologists. Calibration transfer, in a precise definition, refers to a series of analytical approaches or chemometric techniques used to attempt to apply a single spectral database, and the calibration model developed using that database, for two or more instruments, with statistically retained accuracy and precision. Ideally, one would develop a single calibration for any particular application, and move it indiscriminately across instruments and achieve identical analysis or prediction results. There are many technical aspects involved in such precision calibration transfer, related to the measuring instrument reproducibility and repeatability, the reference chemical values used for the calibration, the multivariate mathematics used for calibration, and sample presentation repeatability and reproducibility. Ideally, a multivariate model developed on a single instrument would provide a statistically identical analysis when used on other instruments following transfer. This paper reviews common calibration transfer techniques, mostly related to instrument differences, and the mathematics of the uncertainty between instruments when making spectroscopic measurements of identical samples. It does not specifically address calibration maintenance or reference laboratory differences.
The impact of multiple endpoint dependency on Q and I² in meta-analysis.
Thompson, Christopher Glen; Becker, Betsy Jane
2014-09-01
A common assumption in meta-analysis is that effect sizes are independent. When correlated effect sizes are analyzed using traditional univariate techniques, this assumption is violated. This research assesses the impact of dependence arising from treatment-control studies with multiple endpoints on homogeneity measures Q and I² in scenarios using the unbiased standardized-mean-difference effect size. Univariate and multivariate meta-analysis methods are examined. Conditions included different overall outcome effects, study sample sizes, numbers of studies, between-outcomes correlations, dependency structures, and ways of computing the correlation. The univariate approach used typical fixed-effects analyses whereas the multivariate approach used generalized least-squares (GLS) estimates of a fixed-effects model, weighted by the inverse variance-covariance matrix. Increased dependence among effect sizes led to increased Type I error rates from univariate models. When effect sizes were strongly dependent, error rates were drastically higher than nominal levels regardless of study sample size and number of studies. In contrast, using GLS estimation to account for multiple-endpoint dependency maintained error rates within nominal levels. Conversely, mean I² values were not greatly affected by increased amounts of dependency. Last, we point out that the between-outcomes correlation should be estimated as a pooled within-groups correlation rather than using a full-sample estimator that does not consider treatment/control group membership. Copyright © 2014 John Wiley & Sons, Ltd.
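For readers unfamiliar with the two homogeneity measures named above, here is a minimal sketch of the textbook univariate fixed-effects definitions: Q is the inverse-variance-weighted sum of squared deviations from the pooled effect, and I-squared is the share of Q in excess of its degrees of freedom. The function name and the toy effect sizes are hypothetical, and the sketch treats the effect sizes as independent, which is exactly the assumption the abstract says is violated with multiple endpoints.

```python
def q_and_i2(effects, variances):
    """Cochran's Q and I-squared (in percent) for independent effect sizes."""
    weights = [1.0 / v for v in variances]           # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared is truncated at zero when Q falls below its degrees of freedom
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Hypothetical toy data: three effect sizes with equal sampling variance
q, i2 = q_and_i2([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print(round(q, 2), round(i2, 1))
```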
A Semi-parametric Multivariate Gap-filling Model for Eddy Covariance Latent Heat Flux
NASA Astrophysics Data System (ADS)
Li, M.; Chen, Y.
2010-12-01
Quantitative descriptions of latent heat fluxes are important to study the water and energy exchanges between terrestrial ecosystems and the atmosphere. The eddy covariance approaches have been recognized as the most reliable technique for measuring surface fluxes over time scales ranging from hours to years. However, unfavorable micrometeorological conditions, instrument failures, and applicable measurement limitations may cause inevitable flux gaps in time series data. Development and application of suitable gap-filling techniques are crucial to estimate long term fluxes. In this study, a semi-parametric multivariate gap-filling model was developed to fill latent heat flux gaps for eddy covariance measurements. Our approach combines the advantages of a multivariate statistical analysis (principal component analysis, PCA) and a nonlinear interpolation technique (K-nearest-neighbors, KNN). The PCA method was first used to resolve the multicollinearity relationships among various hydrometeorological factors, such as radiation, soil moisture deficit, LAI, and wind speed. The KNN method was then applied as a nonlinear interpolation tool to estimate the flux gaps as the weighted sum of the latent heat fluxes at the K nearest distances in the PCs’ domain. Two years, 2008 and 2009, of eddy covariance and hydrometeorological data from a subtropical mixed evergreen forest (the Lien-Hua-Chih Site) were collected to calibrate and validate the proposed approach with artificial gaps after standard QC/QA procedures. The optimal K values and weighting factors were determined by the maximum likelihood test. The gap-filled latent heat fluxes indicate that the developed model successfully preserves energy balances at daily, monthly, and yearly time scales. Annual evapotranspiration from the study forest was 747 mm and 708 mm for 2008 and 2009, respectively. Nocturnal evapotranspiration was estimated with filled gaps, and the results are comparable with those of other studies. Seasonal and daily variability of latent heat fluxes is also discussed.
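The KNN interpolation step described above can be sketched as follows. This covers only the nearest-neighbor part: the principal-component scores are assumed to be precomputed, the inverse-distance weighting is an assumption (the abstract states that the weighting factors were tuned by a maximum likelihood test, without giving their form), and `knn_fill` and the toy numbers are hypothetical.

```python
import math

def knn_fill(query, pc_scores, fluxes, k=3):
    """Estimate a missing flux as an inverse-distance-weighted mean of the
    k nearest observed records in principal-component space."""
    # Pair each observed record's distance to the query with its flux, nearest first
    nearest = sorted((math.dist(query, p), f) for p, f in zip(pc_scores, fluxes))[:k]
    if nearest[0][0] == 0.0:
        return nearest[0][1]          # exact match: reuse the observed flux
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * f for w, (_, f) in zip(weights, nearest)) / sum(weights)

# Hypothetical PC scores of gap-free records and their latent heat fluxes (W/m^2)
pc_scores = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
fluxes = [100.0, 200.0, 300.0, 900.0]
filled = knn_fill((0.5, 0.0), pc_scores, fluxes, k=2)
print(filled)
```

Working in PC space rather than on the raw drivers keeps correlated predictors such as radiation and soil moisture deficit from being double-counted in the distance.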
Jeon, Jihyoun; Hsu, Li; Gorfine, Malka
2012-07-01
Frailty models are useful for measuring unobserved heterogeneity in risk of failures across clusters, providing cluster-specific risk prediction. In a frailty model, the latent frailties shared by members within a cluster are assumed to act multiplicatively on the hazard function. In order to obtain parameter and frailty variate estimates, we consider the hierarchical likelihood (H-likelihood) approach (Ha, Lee and Song, 2001. Hierarchical-likelihood approach for frailty models. Biometrika 88, 233-243) in which the latent frailties are treated as "parameters" and estimated jointly with other parameters of interest. We find that the H-likelihood estimators perform well when the censoring rate is low; however, they are substantially biased when the censoring rate is moderate to high. In this paper, we propose a simple and easy-to-implement bias correction method for the H-likelihood estimators under a shared frailty model. We also extend the method to a multivariate frailty model, which incorporates complex dependence structure within clusters. We conduct an extensive simulation study and show that the proposed approach performs very well for censoring rates as high as 80%. We also illustrate the method with a breast cancer data set. Since the H-likelihood is the same as the penalized likelihood function, the proposed bias correction method is also applicable to the penalized likelihood estimators.
The chronnectome: time-varying connectivity networks as the next frontier in fMRI data discovery.
Calhoun, Vince D; Miller, Robyn; Pearlson, Godfrey; Adalı, Tulay
2014-10-22
Recent years have witnessed a rapid growth of interest in moving functional magnetic resonance imaging (fMRI) beyond simple scan-length averages and into approaches that capture time-varying properties of connectivity. In this Perspective we use the term "chronnectome" to describe metrics that allow a dynamic view of coupling. In the chronnectome, coupling refers to possibly time-varying levels of correlated or mutually informed activity between brain regions whose spatial properties may also be temporally evolving. We primarily focus on multivariate approaches developed in our group and review a number of approaches with an emphasis on matrix decompositions such as principal component analysis and independent component analysis. We also discuss the potential these approaches offer to improve characterization and understanding of brain function. There are a number of methodological directions that need to be developed further, but chronnectome approaches already show great promise for the study of both the healthy and the diseased brain.
Hybrid Arrays for Chemical Sensing
NASA Astrophysics Data System (ADS)
Kramer, Kirsten E.; Rose-Pehrsson, Susan L.; Johnson, Kevin J.; Minor, Christian P.
In recent years, multisensory approaches to environment monitoring for chemical detection as well as other forms of situational awareness have become increasingly popular. A hybrid sensor is a multimodal system that incorporates several sensing elements and thus produces data that are multivariate in nature and may be significantly increased in complexity compared to data provided by single-sensor systems. Though a hybrid sensor is itself an array, hybrid sensors are often organized into more complex sensing systems through an assortment of network topologies. The shift to hybrid sensors is due in part to advancements in sensor technology and in the computational power available for processing larger amounts of data. There is also ample evidence to support the claim that a multivariate analytical approach is generally superior to univariate measurements because it provides additional redundant and complementary information (Hall, D. L.; Llinas, J., Eds., Handbook of Multisensor Data Fusion, CRC, Boca Raton, FL, 2001). However, the benefits of a multisensory approach are not automatically achieved. Interpretation of data from hybrid arrays of sensors requires the analyst to develop an application-specific methodology to optimally fuse the disparate sources of data generated by the hybrid array into useful information characterizing the sample or environment being observed. Consequently, multivariate data analysis techniques such as those employed in the field of chemometrics have become more important in analyzing sensor array data. Depending on the nature of the acquired data, a number of chemometric algorithms may prove useful in the analysis and interpretation of data from hybrid sensor arrays. It is important to note, however, that the challenges posed by the analysis of hybrid sensor array data are not unique to the field of chemical sensing. 
Applications in electrical and process engineering, remote sensing, medicine, and of course, artificial intelligence and robotics, all share the same essential data fusion challenges. The design of a hybrid sensor array should draw on this extended body of knowledge. In this chapter, various techniques for data preprocessing, feature extraction, feature selection, and modeling of sensor data will be introduced and illustrated with data fusion approaches that have been implemented in applications involving data from hybrid arrays. The example systems discussed in this chapter involve the development of prototype sensor networks for damage control event detection aboard US Navy vessels and the development of analysis algorithms to combine multiple sensing techniques for enhanced remote detection of unexploded ordnance (UXO) in both ground surveys and wide area assessments.
ERIC Educational Resources Information Center
Sun, Anji; Valiga, Michael J.
In this study, the reliability of the American College Testing (ACT) Program's "Survey of Academic Advising" (SAA) was examined using both univariate and multivariate generalizability theory approaches. The primary purpose of the study was to compare the results of three generalizability theory models (a random univariate model, a mixed…
Remote Multivariable Control Design Using a Competition Game
ERIC Educational Resources Information Center
Atanasijevic-Kunc, M.; Logar, V.; Karba, R.; Papic, M.; Kos, A.
2011-01-01
In this paper, some approaches to teaching multivariable control design are discussed, with special attention being devoted to a step-by-step transition to e-learning. The approach put into practice and presented here is developed through design projects, from which one is chosen as a competition game and is realized using the E-CHO system,…
Meta-analysis of gene-level associations for rare variants based on single-variant statistics.
Hu, Yi-Juan; Berndt, Sonja I; Gustafsson, Stefan; Ganna, Andrea; Hirschhorn, Joel; North, Kari E; Ingelsson, Erik; Lin, Dan-Yu
2013-08-08
Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
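To make the central insight above concrete, the sketch below builds one gene-level statistic, a weighted burden-type test, from single-variant z-scores and their correlation matrix alone. It assumes the z-scores are jointly normal with correlation matrix R under the null, so that the weighted sum w'Z has variance w'Rw. This is a simplified illustration rather than the authors' software, and the name `burden_z` and the toy inputs are hypothetical.

```python
import math

def burden_z(z_scores, weights, corr):
    """Gene-level burden statistic from single-variant z-scores.

    Under the null, Z ~ N(0, R), so w'Z / sqrt(w'Rw) is standard normal.
    """
    m = len(z_scores)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    variance = sum(weights[i] * corr[i][j] * weights[j]
                   for i in range(m) for j in range(m))
    return numerator / math.sqrt(variance)

# Hypothetical gene with two rare variants whose test statistics correlate at 0.5
z_gene = burden_z([1.0, 2.0], [1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
print(round(z_gene, 3))
```

Ignoring the off-diagonal correlation (treating R as the identity) would give 3/sqrt(2) instead of 3/sqrt(3) here, overstating the evidence when variants are positively correlated.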
Understanding Information Flow Interaction along Separable Causal Paths in Environmental Signals
NASA Astrophysics Data System (ADS)
Jiang, P.; Kumar, P.
2017-12-01
Multivariate environmental signals reflect the outcome of complex inter-dependencies, such as those in ecohydrologic systems. Transfer entropy and information partitioning approaches have been used to characterize such dependencies. However, these approaches capture the net information flow occurring through a multitude of pathways involved in the interaction, and as a result they mask the causal interaction within a subsystem of interest through specific pathways. We build on recent developments of momentary information transfer along causal paths proposed by Runge [2015] to develop a framework for quantifying information decomposition along separable causal paths. Momentary information transfer along causal paths captures the amount of information flow between any two variables lagged at two specific points in time. Our approach expands this concept to characterize the causal interaction in terms of synergistic, unique and redundant information flow through separable causal paths. Multivariate analysis using this novel approach yields a more precise understanding of causality and feedback. We illustrate our approach with synthetic and observed time series data. We believe the proposed framework helps better delineate the internal structure of complex systems in geoscience where huge amounts of observational data exist, and it will also help the modeling community by providing a new way to look at the complexity of real and modeled systems. Runge, Jakob. "Quantifying information transfer and mediation along causal pathways in complex systems." Physical Review E 92.6 (2015): 062829.
Multivariate Regression Analysis and Slaughter Livestock,
AGRICULTURE, *ECONOMICS), (*MEAT, PRODUCTION), MULTIVARIATE ANALYSIS, REGRESSION ANALYSIS , ANIMALS, WEIGHT, COSTS, PREDICTIONS, STABILITY, MATHEMATICAL MODELS, STORAGE, BEEF, PORK, FOOD, STATISTICAL DATA, ACCURACY
Görgen, Kai; Hebart, Martin N; Allefeld, Carsten; Haynes, John-Dylan
2017-12-27
Standard neuroimaging data analysis based on traditional principles of experimental design, modelling, and statistical inference is increasingly complemented by novel analysis methods, driven e.g. by machine learning methods. While these novel approaches provide new insights into neuroimaging data, they often have unexpected properties, generating a growing literature on possible pitfalls. We propose to meet this challenge by adopting a habit of systematic testing of experimental design, analysis procedures, and statistical inference. Specifically, we suggest applying the analysis method used for experimental data also to aspects of the experimental design, simulated confounds, simulated null data, and control data. We stress the importance of keeping the analysis method the same in main and test analyses, because only in this way can possible confounds and unexpected properties be reliably detected and avoided. We describe and discuss this Same Analysis Approach in detail, and demonstrate it in two worked examples using multivariate decoding. With these examples, we reveal two sources of error: a mismatch between counterbalancing (crossover designs) and cross-validation, which leads to systematic below-chance accuracies, and linear decoding of a nonlinear effect, a difference in variance. Copyright © 2017 Elsevier Inc. All rights reserved.
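A toy case shows why running the same cross-validation on data that carry no signal can be informative. The sketch below is not one of the paper's two worked examples and uses a different, simpler mechanism: a majority-vote "classifier" evaluated with leave-one-out cross-validation on perfectly balanced labels. Removing the test item always tips the training majority toward the other class, so accuracy falls to 0% rather than the 50% chance level one might expect. All names here are hypothetical.

```python
def majority_label(labels):
    """Predict the most frequent label in the training set (ties go to 0)."""
    ones = sum(labels)
    return 1 if ones > len(labels) - ones else 0

def loo_accuracy(labels):
    """Leave-one-out accuracy of the majority-vote predictor; features are ignored."""
    correct = 0
    for i, held_out in enumerate(labels):
        train = labels[:i] + labels[i + 1:]
        if majority_label(train) == held_out:
            correct += 1
    return correct / len(labels)

balanced_labels = [0, 1] * 10          # 10 items per class, no signal at all
null_accuracy = loo_accuracy(balanced_labels)
print(null_accuracy)
```

Seeing a systematic below-chance result on null data like this is a cue that the design and the cross-validation scheme interact, which is the kind of check the abstract argues for.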
Global spectral graph wavelet signature for surface analysis of carpal bones
NASA Astrophysics Data System (ADS)
Masoumi, Majid; Rezaei, Mahsa; Ben Hamza, A.
2018-02-01
Quantitative shape comparison is a fundamental problem in computer vision, geometry processing and medical imaging. In this paper, we present a spectral graph wavelet approach for shape analysis of carpal bones of the human wrist. We employ spectral graph wavelets to represent the cortical surface of a carpal bone via the spectral geometric analysis of the Laplace-Beltrami operator in the discrete domain. We propose a global spectral graph wavelet (GSGW) descriptor that is isometry invariant, efficient to compute, and combines the advantages of both low-pass and band-pass filters. We perform experiments on shapes of the carpal bones of ten women and ten men from a publicly available database of wrist bones. Using one-way multivariate analysis of variance (MANOVA) and permutation testing, we show through extensive experiments that the proposed GSGW framework performs much better than the global point signature embedding approach for comparing shapes of the carpal bones across populations.
An issue of literacy on pediatric arterial hypertension
NASA Astrophysics Data System (ADS)
Teodoro, M. Filomena; Romana, Andreia; Simão, Carla
2017-11-01
Arterial hypertension in pediatric age is a public health problem whose prevalence has increased significantly over time. Pediatric arterial hypertension (PAH) is highly prevalent, under-diagnosed in most cases, and often appears without warning, with multiple consequences for the health of children and of the adults they become. Caregivers and close family must know that PAH exists, understand its negative consequences and risk factors, and, finally, practice prevention. A statistical analysis of data from a simpler questionnaire, introduced in [4] as a preliminary study of caregivers' awareness of PAH, can be found in [12, 13]. A continuation of that analysis is detailed in [14]. An extended version of the questionnaire was built, applied to a distinct population, and filled in online. The statistical approach is partially reproduced in the present work. Several statistical models were estimated using different approaches, namely multivariate analysis (factorial analysis) and other methods suited to the kind of data under study.
Lee, Kyu Ha; Tadesse, Mahlet G; Baccarelli, Andrea A; Schwartz, Joel; Coull, Brent A
2017-03-01
The analysis of multiple outcomes is becoming increasingly common in modern biomedical studies. It is well-known that joint statistical models for multiple outcomes are more flexible and more powerful than fitting a separate model for each outcome; they yield more powerful tests of exposure or treatment effects by taking into account the dependence among outcomes and pooling evidence across outcomes. It is, however, unlikely that all outcomes are related to the same subset of covariates. Therefore, there is interest in identifying exposures or treatments associated with particular outcomes, which we term outcome-specific variable selection. In this work, we propose a variable selection approach for multivariate normal responses that incorporates not only information on the mean model, but also information on the variance-covariance structure of the outcomes. The approach effectively leverages evidence from all correlated outcomes to estimate the effect of a particular covariate on a given outcome. To implement this strategy, we develop a Bayesian method that builds a multivariate prior for the variable selection indicators based on the variance-covariance of the outcomes. We show via simulation that the proposed variable selection strategy can boost power to detect subtle effects without increasing the probability of false discoveries. We apply the approach to the Normative Aging Study (NAS) epigenetic data and identify a subset of five genes in the asthma pathway for which gene-specific DNA methylations are associated with exposures to either black carbon, a marker of traffic pollution, or sulfate, a marker of particles generated by power plants. © 2016, The International Biometric Society.
Lu, Tsui-Shan; Longnecker, Matthew P; Zhou, Haibo
2017-03-15
Outcome-dependent sampling (ODS) is a cost-effective sampling scheme in which one observes the exposure with a probability that depends on the outcome. Well-known examples are the case-control design for binary response, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric, where all the underlying distributions of covariates are modeled nonparametrically using the empirical likelihood methods. We show that the proposed estimator is consistent and develop its asymptotic normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample with the same sample size. The multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of association of polychlorinated biphenyl exposure to hearing loss in children born to the Collaborative Perinatal Study. Copyright © 2016 John Wiley & Sons, Ltd.
A Look Inside HIV Resistance through Retroviral Protease Interaction Maps
Kontijevskis, Aleksejs; Prusis, Peteris; Petrovska, Ramona; Yahorava, Sviatlana; Mutulis, Felikss; Mutule, Ilze; Komorowski, Jan; Wikberg, Jarl E. S
2007-01-01
Retroviruses affect a large number of species, from fish and birds to mammals and humans, with global socioeconomic negative impacts. Here the authors report and experimentally validate a novel approach for the analysis of the molecular networks that are involved in the recognition of substrates by retroviral proteases. Using multivariate analysis of the sequence-based physicochemical descriptions of 61 retroviral proteases comprising wild-type proteases, natural mutants, and drug-resistant forms of proteases from nine different viral species in relation to their ability to cleave 299 substrates, the authors mapped the physicochemical properties and cross-dependencies of the amino acids of the proteases and their substrates, which revealed a complex molecular interaction network of substrate recognition and cleavage. The approach allowed a detailed analysis of the molecular–chemical mechanisms involved in substrate cleavage by retroviral proteases. PMID:17352531
NASA Astrophysics Data System (ADS)
Hu, Chongqing; Li, Aihua; Zhao, Xingyang
2011-02-01
This paper proposes a multivariate statistical analysis approach to processing the instantaneous engine speed signal for the purpose of locating multiple misfire events in internal combustion engines. The state of each cylinder is described with a characteristic vector extracted from the instantaneous engine speed signal following a three-step procedure. These characteristic vectors are considered as the values of various procedure parameters of an engine cycle. Therefore, determination of occurrence of misfire events and identification of misfiring cylinders can be accomplished by a principal component analysis (PCA) based pattern recognition methodology. The proposed algorithm can be implemented easily in practice because the threshold can be defined adaptively without the information of operating conditions. Besides, the effect of torsional vibration on the engine speed waveform is interpreted as the presence of super powerful cylinder, which is also isolated by the algorithm. The misfiring cylinder and the super powerful cylinder are often adjacent in the firing sequence, thus missing detections and false alarms can be avoided effectively by checking the relationship between the cylinders.
Maier, C; Dickhaus, H
2010-01-01
This study examines the suitability of recurrence plot analysis for the problem of central sleep apnea (CSA) detection and delineation from ECG-derived respiratory (EDR) signals. A parameter describing the average length of vertical line structures in recurrence plots is calculated at a time resolution of 1 s as 'instantaneous trapping time'. Threshold comparison of this parameter is used to detect ongoing CSA. In data from 26 patients (duration 208 h) we assessed sensitivity for detection of CSA and mixed apnea (MSA) events by comparing the results obtained from 8-channel Holter ECGs to the annotations (860 CSA, 480 MSA) of simultaneously registered polysomnograms. Multivariate combination of the EDR from different ECG leads improved the detection accuracy significantly. When all eight leads were considered, an average instantaneous vertical line length above 5 correctly identified 1126 of the 1340 events (sensitivity 84%) with a total number of 1881 positive detections. We conclude that recurrence plot analysis is a promising tool for detection and delineation of CSA epochs from EDR signals with high time resolution. Moreover, the approach is likewise applicable to directly measured respiratory signals.
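The trapping-time parameter mentioned above can be sketched in a few lines. The code builds a one-dimensional recurrence matrix (two samples recur when they differ by at most `eps`) and averages the lengths of vertical line structures. It is a global, unwindowed version with no embedding and with the main diagonal included, all of which are simplifying assumptions; the study computes an instantaneous value at 1 s resolution, and the function names and toy signal are hypothetical.

```python
def recurrence_matrix(x, eps):
    """R[i][j] = 1 when samples i and j lie within eps of each other."""
    n = len(x)
    return [[1 if abs(x[i] - x[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]

def trapping_time(R, v_min=2):
    """Mean length of vertical runs of ones that are at least v_min long."""
    n = len(R)
    lengths = []
    for j in range(n):                 # scan each column from top to bottom
        run = 0
        for i in range(n):
            if R[i][j]:
                run += 1
            else:
                if run >= v_min:
                    lengths.append(run)
                run = 0
        if run >= v_min:               # run that reaches the last row
            lengths.append(run)
    return sum(lengths) / len(lengths) if lengths else 0.0

# Toy signal: a flat stretch (no respiratory excursion) followed by a level change
signal = [0.0, 0.0, 0.0, 5.0, 5.0]
tt = trapping_time(recurrence_matrix(signal, eps=0.1))
print(tt)
```

A signal that stays near one value produces long vertical lines, which is why a high trapping time can flag epochs in which the respiratory signal stops changing.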
Multivariate Analysis for Quantification of Plutonium(IV) in Nitric Acid Based on Absorption Spectra
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lines, Amanda M.; Adami, Susan R.; Sinkov, Sergey I.
Development of more effective, reliable, and fast methods for monitoring process streams is a growing opportunity for analytical applications. Many fields can benefit from on-line monitoring, including the nuclear fuel cycle where improved methods for monitoring radioactive materials will facilitate maintenance of proper safeguards and ensure safe and efficient processing of materials. On-line process monitoring with a focus on optical spectroscopy can provide a fast, non-destructive method for monitoring chemical species. However, identification and quantification of species can be hindered by the complexity of the solutions if bands overlap or show condition-dependent spectral features. Plutonium (IV) is one example of a species which displays significant spectral variation with changing nitric acid concentration. Single variate analysis (i.e. Beer’s Law) is difficult to apply to the quantification of Pu(IV) unless the nitric acid concentration is known and separate calibration curves have been made for all possible acid strengths. Multivariate, or chemometric, analysis is an approach that allows for the accurate quantification of Pu(IV) without a priori knowledge of nitric acid concentration.
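The contrast between single-wavelength Beer's Law and a multivariate treatment can be illustrated with a deliberately small example covering the band-overlap case only. Two absorbing species contribute at two wavelengths, and solving the 2x2 linear system recovers both concentrations, whereas a single-wavelength calculation that ignores the second species overestimates the first. This is classical least squares on a toy system, not the chemometric model used in the report, and the absorptivities and concentrations are invented for illustration.

```python
def solve_two_species(absorbances, absorptivity):
    """Solve A = E c for two species at two wavelengths (path length 1) by Cramer's rule."""
    (e11, e12), (e21, e22) = absorptivity
    a1, a2 = absorbances
    det = e11 * e22 - e12 * e21
    c1 = (a1 * e22 - e12 * a2) / det
    c2 = (e11 * a2 - e21 * a1) / det
    return c1, c2

# Hypothetical molar absorptivities: rows are wavelengths, columns are species
E = [[2.0, 1.0],
     [1.0, 3.0]]
true_c = (0.1, 0.2)
# Simulated absorbances from Beer's Law with overlapping bands
A = (E[0][0] * true_c[0] + E[0][1] * true_c[1],
     E[1][0] * true_c[0] + E[1][1] * true_c[1])

c1, c2 = solve_two_species(A, E)
naive_c1 = A[0] / E[0][0]   # single-wavelength estimate that ignores species 2
print(round(c1, 3), round(c2, 3), round(naive_c1, 3))
```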
Tan, Guangguo; Lou, Ziyang; Jing, Jing; Li, Wuhong; Zhu, Zhenyu; Zhao, Liang; Zhang, Guoqing; Chai, Yifeng
2011-12-01
Aconite roots are popularly used in herbal medicines in China. Many cases of accidental and intentional intoxication with this plant have been reported; some of these are fatal because the toxicity of aconitum is very high. It is thus important to detect and identify aconitum alkaloids in biofluids. In this work, an improved method employing LC-TOFMS with multivariate data analysis was developed for screening and analysis of major aconitum alkaloids and their metabolites in rat urine following oral administration of aconite roots extract. Thirty-four signals highlighted by multivariate statistical analyses including 24 parent components and 10 metabolites were screened out and further identified by adjustment of the fragmentor voltage to produce structure-relevant fragment ions. It is helpful for studying aconite roots in toxicology, pharmacology and forensic medicine. This work also confirmed that the metabolomic approach provides effective tools for screening multiple absorbed and metabolic components of Chinese herbal medicines in vivo. Copyright © 2011 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Kanzaki, Yasushi
Many kinds of water products are sold commercially with claims of unusual efficacy beyond current scientific knowledge, even today, when advanced scientific and technological research is widely pursued. It seems quite obvious that such efficacy cannot exist; if it did, it should be explicable by a suitable scientific procedure. In this study, the extraction of paeoniflorin from paeoniae radix was examined using different kinds of extracting water. The results were then analyzed by multivariate analysis, on the assumption that the effect on extraction can be ascribed to the ionic species dissolved in each water. The dissolved species were determined by chemical and instrumental analyses. The analysis showed that pH, [Ca2+], and [HCO3 -] were significant parameters, that the combination of Ca2+ and HCO3 - had a negative effect on the extraction of paeoniflorin, and that the amount of extracted paeoniflorin (Y) is described by the following regression equation.
Y=28.11-0.71 pH-0.0034[Ca2+]-0.93[HCO3 -]
where [Ca2+] is the concentration of calcium ion and [HCO3 -] is that of bicarbonate ion.
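As a rough illustration, the regression equation can be evaluated for a given water composition. The function name and the sample inputs below are hypothetical, and the abstract does not state the units of Y or of the concentrations, so the numbers only show how the terms combine.

```python
def predicted_paeoniflorin(ph, ca, hco3):
    """Evaluate Y = 28.11 - 0.71 pH - 0.0034 [Ca2+] - 0.93 [HCO3-].

    Units are not given in the abstract; inputs must use the same
    (unstated) units as the original fit.
    """
    return 28.11 - 0.71 * ph - 0.0034 * ca - 0.93 * hco3


# Hypothetical water with neutral pH and no dissolved Ca2+ or HCO3-.
baseline = predicted_paeoniflorin(ph=7.0, ca=0.0, hco3=0.0)
print(f"Y at pH 7 with no Ca2+ or HCO3- = {baseline:.2f}")
```

All three coefficients are negative, so within the fitted range a higher pH, calcium level, or bicarbonate level lowers the predicted extraction.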
Rapid and Simultaneous Prediction of Eight Diesel Quality Parameters through ATR-FTIR Analysis.
Nespeca, Maurilio Gustavo; Hatanaka, Rafael Rodrigues; Flumignan, Danilo Luiz; de Oliveira, José Eduardo
2018-01-01
Quality assessment of diesel fuel is highly necessary for society, but the costs and time spent are very high while using standard methods. Therefore, this study aimed to develop an analytical method capable of simultaneously determining eight diesel quality parameters (density; flash point; total sulfur content; distillation temperatures at 10% (T10), 50% (T50), and 85% (T85) recovery; cetane index; and biodiesel content) through attenuated total reflection Fourier transform infrared (ATR-FTIR) spectroscopy and the multivariate regression method, partial least square (PLS). For this purpose, the quality parameters of 409 samples were determined using standard methods, and their spectra were acquired in ranges of 4000–650 cm−1. The use of the multivariate filters, generalized least squares weighting (GLSW) and orthogonal signal correction (OSC), was evaluated to improve the signal-to-noise ratio of the models. Likewise, four variable selection approaches were tested: manual exclusion, forward interval PLS (FiPLS), backward interval PLS (BiPLS), and genetic algorithm (GA). The multivariate filters and variables selection algorithms generated more fitted and accurate PLS models. According to the validation, the FTIR/PLS models presented accuracy comparable to the reference methods and, therefore, the proposed method can be applied in the diesel routine monitoring to significantly reduce costs and analysis time.
Rapid and Simultaneous Prediction of Eight Diesel Quality Parameters through ATR-FTIR Analysis
Hatanaka, Rafael Rodrigues; Flumignan, Danilo Luiz; de Oliveira, José Eduardo
2018-01-01
Quality assessment of diesel fuel is highly necessary for society, but the costs and time spent are very high while using standard methods. Therefore, this study aimed to develop an analytical method capable of simultaneously determining eight diesel quality parameters (density; flash point; total sulfur content; distillation temperatures at 10% (T10), 50% (T50), and 85% (T85) recovery; cetane index; and biodiesel content) through attenuated total reflection Fourier transform infrared (ATR-FTIR) spectroscopy and the multivariate regression method, partial least square (PLS). For this purpose, the quality parameters of 409 samples were determined using standard methods, and their spectra were acquired in ranges of 4000–650 cm−1. The use of the multivariate filters, generalized least squares weighting (GLSW) and orthogonal signal correction (OSC), was evaluated to improve the signal-to-noise ratio of the models. Likewise, four variable selection approaches were tested: manual exclusion, forward interval PLS (FiPLS), backward interval PLS (BiPLS), and genetic algorithm (GA). The multivariate filters and variables selection algorithms generated more fitted and accurate PLS models. According to the validation, the FTIR/PLS models presented accuracy comparable to the reference methods and, therefore, the proposed method can be applied in the diesel routine monitoring to significantly reduce costs and analysis time. PMID:29629209
Tang, Yongqiang
2018-04-30
The controlled imputation method refers to a class of pattern mixture models that have been commonly used as sensitivity analyses of longitudinal clinical trials with nonignorable dropout in recent years. These pattern mixture models assume that participants in the experimental arm after dropout have similar response profiles to the control participants or have worse outcomes than otherwise similar participants who remain on the experimental treatment. In spite of its popularity, the controlled imputation has not been formally developed for longitudinal binary and ordinal outcomes partially due to the lack of a natural multivariate distribution for such endpoints. In this paper, we propose 2 approaches for implementing the controlled imputation for binary and ordinal data based respectively on the sequential logistic regression and the multivariate probit model. Efficient Markov chain Monte Carlo algorithms are developed for missing data imputation by using the monotone data augmentation technique for the sequential logistic regression and a parameter-expanded monotone data augmentation scheme for the multivariate probit model. We assess the performance of the proposed procedures by simulation and the analysis of a schizophrenia clinical trial and compare them with the fully conditional specification, last observation carried forward, and baseline observation carried forward imputation methods. Copyright © 2018 John Wiley & Sons, Ltd.
2017-09-01
efficacy of statistical post-processing methods downstream of these dynamical model components with a hierarchical multivariate Bayesian approach to...Bayesian hierarchical modeling, Markov chain Monte Carlo methods, Metropolis algorithm, machine learning, atmospheric prediction...scale processes. However, this dissertation explores the efficacy of statistical post-processing methods downstream of these dynamical model components
ERIC Educational Resources Information Center
Magis, David; De Boeck, Paul
2011-01-01
We focus on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is a quite natural case. A robust statistics technique is…
A note on a simplified and general approach to simulating from multivariate copula functions
Barry K. Goodwin
2013-01-01
Copulas have become an important analytic tool for characterizing multivariate distributions and dependence. One is often interested in simulating data from copula estimates. The process can be analytically and computationally complex and usually involves steps that are unique to a given parametric copula. We describe an alternative approach that uses “Probability-...
Caeiro, Sandra; Goovaerts, Pierre; Painho, Marco; Costa, M Helena
2003-09-15
The Sado Estuary is a coastal zone located in the south of Portugal where conflicts between conservation and development exist because of its location near industrialized urban zones and its designation as a natural reserve. The aim of this paper is to evaluate a set of multivariate geostatistical approaches to delineate spatially contiguous regions of sediment structure for Sado Estuary. These areas will be the supporting infrastructure of an environmental management system for this estuary. The boundaries of each homogeneous area were derived from three sediment characterization attributes through three different approaches: (1) cluster analysis of dissimilarity matrix function of geographical separation followed by indicator kriging of the cluster data, (2) discriminant analysis of kriged values of the three sediment attributes, and (3) a combination of methods 1 and 2. Final maximum likelihood classification was integrated into a geographical information system. All methods generated fairly spatially contiguous management areas that reproduce well the environment of the estuary. Map comparison techniques based on kappa statistics showed that the resultant three maps are similar, supporting the choice of any of the methods as appropriate for management of the Sado Estuary. However, the results of method 1 seem to be in better agreement with estuary behavior, assessment of contamination sources, and previous work conducted at this site.
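A minimal sketch of the kind of kappa statistic used for map comparison is given below. It computes plain Cohen's kappa on two lists of class labels taken cell by cell; the abstract does not say which kappa variant the authors used, and the two toy maps are invented for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of class labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of cells given the same class in both maps.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal class frequencies of each map.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both maps consist of a single identical class
    return (observed - expected) / (1.0 - expected)


# Hypothetical management-area labels for six grid cells in two maps.
map_a = [1, 1, 2, 2, 3, 3]
map_b = [1, 1, 2, 3, 3, 3]
kappa = cohens_kappa(map_a, map_b)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so two maps dominated by one large class do not look artificially similar.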
NASA Astrophysics Data System (ADS)
Yao, Yuchen; Bao, Jie; Skyllas-Kazacos, Maria; Welch, Barry J.; Akhmetov, Sergey
2018-04-01
Individual anode current signals in aluminum reduction cells provide localized cell conditions in the vicinity of each anode, which contain more information than the conventionally measured cell voltage and line current. One common use of this measurement is to identify process faults that can cause significant changes in the anode current signals. While this method is simple and direct, it ignores the interactions between anode currents and other important process variables. This paper presents an approach that applies multivariate statistical analysis techniques to individual anode currents and other process operating data, for the detection and diagnosis of local process abnormalities in aluminum reduction cells. Specifically, since the Hall-Héroult process is time-varying with its process variables dynamically and nonlinearly correlated, dynamic kernel principal component analysis with moving windows is used. The cell is discretized into a number of subsystems, with each subsystem representing one anode and cell conditions in its vicinity. The fault associated with each subsystem is identified based on multivariate statistical control charts. The results show that the proposed approach is able to not only effectively pinpoint the problematic areas in the cell, but also assess the effect of the fault on different parts of the cell.
NASA Astrophysics Data System (ADS)
Åberg Lindell, M.; Andersson, P.; Grape, S.; Håkansson, A.; Thulin, M.
2018-07-01
In addition to verifying operator declared parameters of spent nuclear fuel, the ability to experimentally infer such parameters with a minimum of intrusiveness is of great interest and has been long-sought after in the nuclear safeguards community. It can also be anticipated that such ability would be of interest for quality assurance in e.g. recycling facilities in future Generation IV nuclear fuel cycles. One way to obtain information regarding spent nuclear fuel is to measure various gamma-ray intensities using high-resolution gamma-ray spectroscopy. While intensities from a few isotopes obtained from such measurements have traditionally been used pairwise, the approach in this work is to simultaneously analyze correlations between all available isotopes, using multivariate analysis techniques. Based on this approach, a methodology for inferring burnup, cooling time, and initial fissile content of PWR fuels using passive gamma-ray spectroscopy data has been investigated. PWR nuclear fuels, of UOX and MOX type, and their gamma-ray emissions, were simulated using the Monte Carlo code Serpent. Data comprising relative isotope activities was analyzed with decision trees and support vector machines, for predicting fuel parameters and their associated uncertainties. From this work it may be concluded that up to a cooling time of twenty years, the 95% prediction intervals of burnup, cooling time and initial fissile content could be inferred to within approximately 7 MWd/kgHM, 8 months, and 1.4 percentage points, respectively. An attempt aiming to estimate the plutonium content in spent UOX fuel, using the developed multivariate analysis model, is also presented. The results for Pu mass estimation are promising and call for further studies.
Heunis, Tosca-Marie; Aldrich, Chris; de Vries, Petrus J
2016-08-01
Electroencephalography (EEG) has been used for almost a century to identify seizure-related disorders in humans, typically through expert interpretation of multichannel recordings. Attempts have been made to quantify EEG through frequency analyses and graphic representations. These "traditional" quantitative EEG analysis methods were limited in their ability to analyze complex and multivariate data and have not been generally accepted in clinical settings. There has been growing interest in identification of novel EEG biomarkers to detect early risk of autism spectrum disorder, to identify clinically meaningful subgroups, and to monitor targeted intervention strategies. Most studies to date have, however, used quantitative EEG approaches, and little is known about the emerging multivariate analytical methods or the robustness of candidate biomarkers in the context of the variability of autism spectrum disorder. Here, we present a targeted review of methodological and clinical challenges in the search for novel resting-state EEG biomarkers for autism spectrum disorder. Three primary novel methodologies are discussed: (1) modified multiscale entropy, (2) coherence analysis, and (3) recurrence quantification analysis. Results suggest that these methods may be able to classify resting-state EEG as "autism spectrum disorder" or "typically developing", but many signal processing questions remain unanswered. We suggest that the move to novel EEG analysis methods is akin to the progress in neuroimaging from visual inspection, through region-of-interest analysis, to whole-brain computational analysis. Novel resting-state EEG biomarkers will have to evaluate a range of potential demographic, clinical, and technical confounders including age, gender, intellectual ability, comorbidity, and medication, before these approaches can be translated into the clinical setting. Copyright © 2016 Elsevier Inc. All rights reserved.
Hamchevici, Carmen; Udrea, Ion
2013-11-01
The concept of basin-wide Joint Danube Survey (JDS) was launched by the International Commission for the Protection of the Danube River (ICPDR) as a tool for investigative monitoring under the Water Framework Directive (WFD), with a frequency of 6 years. The first JDS was carried out in 2001 and its success in providing key information for characterisation of the Danube River Basin District as required by WFD led to the organisation of the second JDS in 2007, which was the world's biggest river research expedition in that year. The present paper presents an approach for improving the survey strategy for the next planned survey JDS3 (2013) by means of several multivariate statistical techniques. In order to design the optimum structure in terms of parameters and sampling sites, principal component analysis (PCA), factor analysis (FA) and cluster analysis were applied on JDS2 data for 13 selected physico-chemical and one biological element measured in 78 sampling sites located on the main course of the Danube. Results from PCA/FA showed that most of the dataset variance (above 75%) was explained by five varifactors loaded with 8 out of 14 variables: physical (transparency and total suspended solids), relevant nutrients (N-nitrates and P-orthophosphates), feedback effects of primary production (pH, alkalinity and dissolved oxygen) and algal biomass. Taking into account the representation of the factor scores given by FA versus sampling sites and the major groups generated by the clustering procedure, the spatial network of the next survey could be carefully tailored, leading to a decrease in the number of sampling sites of more than 30%. The approach of target oriented sampling strategy based on the selected multivariate statistics can provide a strong reduction in dimensionality of the original data and corresponding costs as well, without any loss of information.
Multivariate Cluster Analysis.
ERIC Educational Resources Information Center
McRae, Douglas J.
Procedures for grouping students into homogeneous subsets have long interested educational researchers. The research reported in this paper is an investigation of a set of objective grouping procedures based on multivariate analysis considerations. Four multivariate functions that might serve as criteria for adequate grouping are given and…
Pedersen, Mangor; Curwood, Evan K; Archer, John S; Abbott, David F; Jackson, Graeme D
2015-11-01
Lennox-Gastaut syndrome, and the similar but less tightly defined Lennox-Gastaut phenotype, describe patients with severe epilepsy, generalized epileptic discharges, and variable intellectual disability. Our previous functional neuroimaging studies suggest that abnormal diffuse association network activity underlies the epileptic discharges of this clinical phenotype. Herein we use a data-driven multivariate approach to determine the spatial changes in local and global networks of patients with severe epilepsy of the Lennox-Gastaut phenotype. We studied 9 adult patients and 14 controls. In 20 min of task-free blood oxygen level-dependent functional magnetic resonance imaging data, two metrics of functional connectivity were studied: Regional homogeneity or local connectivity, a measure of concordance between each voxel to a focal cluster of adjacent voxels; and eigenvector centrality, a global connectivity estimate designed to detect important neural hubs. Multivariate pattern analysis of these data in a machine-learning framework was used to identify spatial features that classified disease subjects. Multivariate pattern analysis was 95.7% accurate in classifying subjects for both local and global connectivity measures (22/23 subjects correctly classified). Maximal discriminating features were the following: increased local connectivity in frontoinsular and intraparietal areas; increased global connectivity in posterior association areas; decreased local connectivity in sensory (visual and auditory) and medial frontal cortices; and decreased global connectivity in the cingulate cortex, striatum, hippocampus, and pons. Using a data-driven analysis method in task-free functional magnetic resonance imaging, we show increased connectivity in critical areas of association cortex and decreased connectivity in primary cortex. 
This supports previous findings of a critical role for these association cortical regions as a final common pathway in generating the Lennox-Gastaut phenotype. Abnormal function of these areas is likely to be important in explaining the intellectual problems characteristic of this disorder. Wiley Periodicals, Inc. © 2015 International League Against Epilepsy.
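Eigenvector centrality, the global connectivity metric named in this abstract, can be sketched with a shifted power iteration on a small graph. This is a generic illustration on an invented, non-negative adjacency matrix, not the authors' voxel-wise pipeline; adding the identity matrix is a common device assumed here to keep the iteration from oscillating on bipartite graphs.

```python
def eigenvector_centrality(adjacency, iterations=500, tol=1e-12):
    """Leading eigenvector of a symmetric, non-negative adjacency matrix."""
    n = len(adjacency)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Multiply by (A + I): same eigenvectors as A, but stable iteration.
        updated = [
            scores[i] + sum(adjacency[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(v * v for v in updated) ** 0.5
        updated = [v / norm for v in updated]
        if max(abs(u - s) for u, s in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Hypothetical star network: node 0 is a hub linked to three other nodes.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
centrality = eigenvector_centrality(star)
print([round(c, 3) for c in centrality])
```

The hub receives the highest score because it is connected to every other node, which is the sense in which the metric is used to detect important hubs.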
Lo, Kenneth
2011-01-01
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components. PMID:22125375
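The Box-Cox transformation mentioned here can be sketched in its textbook one-parameter form. The paper may use a modified variant (for example, one that handles negative values), so this function is an assumption made for illustration only.

```python
import math


def box_cox(y, lam):
    """Textbook Box-Cox transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("the textbook Box-Cox transform needs y > 0")
    if lam == 0:
        return math.log(y)  # limiting case as lam approaches 0
    return (y ** lam - 1.0) / lam


# lam = 1 only shifts the data; lam = 0.5 behaves like a square root;
# lam = 0 is the log transform, which pulls in a long right tail.
for lam in (1.0, 0.5, 0.0):
    print(lam, [round(box_cox(y, lam), 3) for y in (1.0, 4.0, 100.0)])
```

Estimating `lam` together with the mixture parameters is what lets a model of this kind choose the amount of skewness correction from the data.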
Lo, Kenneth; Gottardo, Raphael
2012-01-01
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.
Evaluation of a Multivariate Syndromic Surveillance System for West Nile Virus.
Faverjon, Céline; Andersson, M Gunnar; Decors, Anouk; Tapprest, Jackie; Tritz, Pierre; Sandoz, Alain; Kutasi, Orsolya; Sala, Carole; Leblond, Agnès
2016-06-01
Various methods are currently used for the early detection of West Nile virus (WNV) but their outputs are not quantitative and/or do not take into account all available information. Our study aimed to test a multivariate syndromic surveillance system to evaluate if the sensitivity and the specificity of detection of WNV could be improved. Weekly time series data on nervous syndromes in horses and mortality in both horses and wild birds were used. Baselines were fitted to the three time series and used to simulate 100 years of surveillance data. WNV outbreaks were simulated and inserted into the baselines based on historical data and expert opinion. Univariate and multivariate syndromic surveillance systems were tested to gauge how well they detected the outbreaks; detection was based on an empirical Bayesian approach. The systems' performances were compared using measures of sensitivity, specificity, and area under receiver operating characteristic curve (AUC). When data sources were considered separately (i.e., univariate systems), the best detection performance was obtained using the data set of nervous symptoms in horses compared to those of bird and horse mortality (AUCs equal to 0.80, 0.75, and 0.50, respectively). A multivariate outbreak detection system that used nervous symptoms in horses and bird mortality generated the best performance (AUC = 0.87). The proposed approach is suitable for performing multivariate syndromic surveillance of WNV outbreaks. This is particularly relevant, given that a multivariate surveillance system performed better than a univariate approach. Such a surveillance system could be especially useful in serving as an alert for the possibility of human viral infections. This approach can be also used for other diseases for which multiple sources of evidence are available.
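The AUC values quoted in this abstract can be read as the probability that a randomly chosen outbreak week receives a higher alarm score than a randomly chosen baseline week. A minimal sketch of that pairwise definition follows; the scores are invented and unrelated to the study's data.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical alarm scores for outbreak weeks and baseline weeks.
outbreak_weeks = [0.9, 0.8, 0.4]
baseline_weeks = [0.7, 0.3, 0.2, 0.1]
area = auc(outbreak_weeks, baseline_weeks)
print(f"AUC = {area:.3f}")
```

On this reading, the reported AUC of 0.50 for horse mortality alone means that source ranks outbreak weeks no better than chance.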
Factors Influencing Cecal Intubation Time during Retrograde Approach Single-Balloon Enteroscopy
Chen, Peng-Jen; Shih, Yu-Lueng; Huang, Hsin-Hung; Hsieh, Tsai-Yuan
2014-01-01
Background and Aim. The predisposing factors for prolonged cecal intubation time (CIT) during colonoscopy have been well identified. However, the factors influencing CIT during retrograde SBE have not been addressed. The aim of this study was to determine the factors influencing CIT during retrograde SBE. Methods. We investigated patients who underwent retrograde SBE at a medical center from January 2011 to March 2014. The medical charts and SBE reports were reviewed. The patients' characteristics and procedure-associated data were recorded. These data were analyzed with univariate analysis as well as multivariate logistic regression analysis to identify the possible predisposing factors. Results. We enrolled 66 patients into this study. The median CIT was 17.4 minutes. With univariate analysis, there was no statistical difference in age, sex, BMI, or history of abdominal surgery, except for bowel preparation (P = 0.021). Multivariate logistic regression analysis showed that inadequate bowel preparation (odds ratio 30.2, 95% confidence interval 4.63–196.54; P < 0.001) was the independent predisposing factor for prolonged CIT during retrograde SBE. Conclusions. For experienced endoscopists, inadequate bowel preparation was the independent predisposing factor for prolonged CIT during retrograde SBE. PMID:25505904
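The reported odds ratio and its confidence interval can be cross-checked for internal consistency. The sketch assumes a Wald interval that is symmetric on the log-odds scale, which is the usual output of logistic regression software but is not stated in the abstract.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Recover the point estimate and log-scale SE implied by a 95% CI."""
    log_lower, log_upper = math.log(lower), math.log(upper)
    # On the log scale the estimate sits midway between the two bounds.
    point = math.exp((log_lower + log_upper) / 2.0)
    standard_error = (log_upper - log_lower) / (2.0 * z)
    return point, standard_error


point, standard_error = log_scale_summary(4.63, 196.54)
print(f"implied odds ratio = {point:.1f}")
print(f"implied SE of log odds ratio = {standard_error:.2f}")
```

The implied point estimate agrees with the reported odds ratio of 30.2, and the large standard error reflects the small sample of 66 patients.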
Çelik, Ecem Evrim; Rubio, Jose Manuel Amigo; Andersen, Mogens L; Gökmen, Vural
2017-12-15
The interactions between free and macromolecule-bound antioxidants were investigated in order to evaluate their combined effects on the antioxidant environment. Dietary fiber (DF), protein and lipid-bound antioxidants, obtained from whole wheat, soybean and olive oil products, respectively and Trolox were used for this purpose. Experimental studies were carried out in autoxidizing liposome medium by monitoring the development of fluorescent products formed by lipid oxidation. Chemometric methods were used both at experimental design and multivariate data analysis stages. Comparison of the simple addition effects of Trolox and bound antioxidants with measured values on lipid oxidation revealed synergetic interactions for DF and refined olive oil-bound antioxidants, and antagonistic interactions for protein and extra virgin olive oil-bound antioxidants with Trolox. A generalized version of logistic function was successfully used for modelling the oxidation curve of liposomes. Principal component analysis revealed two separate phases of liposome autoxidation. Copyright © 2017 Elsevier Ltd. All rights reserved.
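One common parametrization of a generalized logistic (Richards) curve is sketched below. The abstract does not give the exact form or parameter names the authors fitted, so `lower`, `upper`, `growth_rate`, `q`, and `nu` are illustrative assumptions.

```python
import math


def generalized_logistic(t, lower, upper, growth_rate, q=1.0, nu=1.0):
    """Richards curve rising from `lower` to `upper` as t increases."""
    denominator = (1.0 + q * math.exp(-growth_rate * t)) ** (1.0 / nu)
    return lower + (upper - lower) / denominator


# With q = nu = 1 this reduces to the ordinary logistic curve.
for t in (-5.0, 0.0, 5.0):
    print(t, round(generalized_logistic(t, 0.0, 1.0, 1.0), 4))
```

The extra shape parameter `nu` lets the curve be asymmetric about its inflection point, which an ordinary logistic curve cannot represent.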
Valdés, Arantzazu; Vidal, Lorena; Beltrán, Ana; Canals, Antonio; Garrigós, María Carmen
2015-06-10
A microwave-assisted extraction (MAE) procedure to isolate phenolic compounds from almond skin byproducts was optimized. A three-level, three-factor Box-Behnken design was used to evaluate the effect of almond skin weight, microwave power, and irradiation time on total phenolic content (TPC) and antioxidant activity (DPPH). Almond skin weight was the most important parameter in the studied responses. The best extraction was achieved using 4 g, 60 s, 100 W, and 60 mL of 70% (v/v) ethanol. TPC, antioxidant activity (DPPH, FRAP), and chemical composition (HPLC-DAD-ESI-MS/MS) were determined by using the optimized method from seven different almond cultivars. Successful discrimination was obtained for all cultivars by using multivariate linear discriminant analysis (LDA), suggesting the influence of cultivar type on polyphenol content and antioxidant activity. The results show the potential of almond skin as a natural source of phenolics and the effectiveness of MAE for the reutilization of these byproducts.
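A three-level, three-factor Box-Behnken design can be generated in coded units (-1, 0, +1) as sketched below. The number of center points is an assumption, since the abstract does not report it, and mapping coded levels to real weights, powers, and times would need the level values from the full paper.

```python
from itertools import combinations, product


def box_behnken_three_factor(n_center=3):
    """Coded Box-Behnken runs for three factors plus repeated center points."""
    runs = []
    # For each pair of factors take the four (+/-1, +/-1) combinations
    # while holding the remaining factor at its middle level (0).
    for i, j in combinations(range(3), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0, 0, 0]
            run[i], run[j] = a, b
            runs.append(tuple(run))
    runs.extend([(0, 0, 0)] * n_center)  # assumed number of replicates
    return runs


# Columns: almond skin weight, microwave power, irradiation time (coded).
design = box_behnken_three_factor()
for run in design:
    print(run)
```

The design has 12 edge-midpoint runs and never sets all three factors to an extreme at once, which is why it needs fewer runs than a full three-level factorial (27).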
Cai, Li-mei; Ma, Jin; Zhou, Yong-zhang; Huang, Lan-chun; Dou, Lei; Zhang, Cheng-bo; Fu, Shan-ming
2008-12-01
One hundred and eighteen surface soil samples were collected from Dongguan City and analyzed for concentrations of Cu, Zn, Ni, Cr, Pb, Cd, As and Hg, as well as pH and OM. The spatial distribution and sources of soil heavy metals were studied using multivariate geostatistical methods and GIS techniques. The results indicated that concentrations of Cu, Zn, Ni, Pb, Cd and Hg exceeded the soil background content in Guangdong province, and that concentrations of Pb, Cd and Hg in particular greatly exceeded it. Factor analysis grouped Cu, Zn, Ni, Cr and As in Factor 1, Pb and Hg in Factor 2, and Cd in Factor 3. The spatial maps based on geostatistical analysis show a definite association of Factor 1 with the soil parent material, whereas Factor 2 was mainly affected by industries. The spatial distribution of Factor 3 was attributed to anthropogenic influence.
NASA Astrophysics Data System (ADS)
Chen, Po-Hsiung; Shimada, Rintaro; Yabumoto, Sohshi; Okajima, Hajime; Ando, Masahiro; Chang, Chiou-Tzu; Lee, Li-Tzu; Wong, Yong-Kie; Chiou, Arthur; Hamaguchi, Hiro-O.
2016-01-01
We have developed an automatic and objective method for detecting human oral squamous cell carcinoma (OSCC) tissues with Raman microspectroscopy. We measure 196 independent Raman spectra from 196 different points of one oral tissue sample and globally analyze these spectra using a Multivariate Curve Resolution (MCR) analysis. Discrimination of OSCC tissues is automatically and objectively made by spectral matching comparison of the MCR decomposed Raman spectra and the standard Raman spectrum of keratin, a well-established molecular marker of OSCC. We use a total of 24 tissue samples, 10 OSCC and 10 normal tissues from the same 10 patients, 3 OSCC and 1 normal tissues from different patients. Following the newly developed protocol presented here, we have been able to detect OSCC tissues with 77 to 92% sensitivity (depending on how to define positivity) and 100% specificity. The present approach lends itself to a reliable clinical diagnosis of OSCC substantiated by the “molecular fingerprint” of keratin.
NASA Astrophysics Data System (ADS)
Katura, Takusige; Yagyu, Akihiko; Obata, Akiko; Yamazaki, Kyoko; Maki, Atsushi; Abe, Masanori; Tanaka, Naoki
2007-07-01
Strong spontaneous fluctuations around 0.1 and 0.3 Hz have been observed in blood-related brain-function measurements such as functional magnetic resonance imaging and optical topography (or functional near-infrared spectroscopy). These fluctuations seem to reflect the interaction between the cerebral circulation system and the systemic circulation system. We took an energetic viewpoint in our analysis of the interrelationships between fluctuations in cerebral blood volume (CBV), mean arterial blood pressure (MAP), heart rate (HR), and respiratory rhythm based on multivariate autoregressive modeling. This approach involves evaluating the contribution of each fluctuation or rhythm to specific ones by performing multivariate spectral analysis. The results we obtained show MAP and HR can account slightly for the fluctuation around 0.1 Hz in CBV, while the fluctuation around 0.3 Hz is derived mainly from the respiratory rhythm. During our presentation, we will report on the effects of posture on the interrelationship between the fluctuations and the respiratory rhythm.
Mallette, Jennifer R.; Casale, John F.; Jordan, James; Morello, David R.; Beyer, Paul M.
2016-01-01
Previously, geo-sourcing to five major coca growing regions within South America was accomplished. However, the expansion of coca cultivation throughout South America made sub-regional origin determinations increasingly difficult. The former methodology was recently enhanced with additional stable isotope analyses (2H and 18O) to fully characterize cocaine due to the varying environmental conditions in which the coca was grown. An improved data analysis method was implemented with the combination of machine learning and multivariate statistical analysis methods to provide further partitioning between growing regions. Here, we show how the combination of trace cocaine alkaloids, stable isotopes, and multivariate statistical analyses can be used to classify illicit cocaine as originating from one of 19 growing regions within South America. The data obtained through this approach can be used to describe current coca cultivation and production trends, highlight trafficking routes, as well as identify new coca growing regions. PMID:27006288
Ritota, Mena; Casciani, Lorena; Valentini, Massimiliano
2013-05-01
Analytical traceability of PGI and PDO foods (Protected Geographical Indication and Protected Designation of Origin, respectively) is one of the most challenging tasks of current applied research. Here we propose a metabolomic approach based on the combination of ¹H high-resolution magic angle spinning-nuclear magnetic resonance (HRMAS-NMR) spectroscopy with multivariate analysis, i.e. PLS-DA, as a reliable tool for the traceability of Italian PGI chicories (Cichorium intybus L.), i.e. Radicchio Rosso di Treviso and Radicchio Variegato di Castelfranco, also known as red and red-spotted, respectively. The metabolic profile was obtained by means of HRMAS-NMR, and multivariate data analysis allowed us to build statistical models capable of providing clear discrimination between the two varieties and classification according to the geographical origin. Based on Variable Importance in Projection values, the molecular markers for classifying the different types of red chicories analysed were found to account for both the cultivar and the place of origin. © 2012 Society of Chemical Industry.
Genome-Wide Association Analysis of Adaptation Using Environmentally Predicted Traits.
van Heerwaarden, Joost; van Zanten, Martijn; Kruijer, Willem
2015-10-01
Current methods for studying the genetic basis of adaptation evaluate genetic associations with ecologically relevant traits or single environmental variables, under the implicit assumption that natural selection imposes correlations between phenotypes, environments and genotypes. In practice, observed trait and environmental data are manifestations of unknown selective forces and are only indirectly associated with adaptive genetic variation. In theory, improved estimation of these forces could enable more powerful detection of loci under selection. Here we present an approach in which we approximate adaptive variation by modeling phenotypes as a function of the environment and using the predicted trait in multivariate and univariate genome-wide association analysis (GWAS). Based on computer simulations and published flowering time data from the model plant Arabidopsis thaliana, we find that environmentally predicted traits lead to higher recovery of functional loci in multivariate GWAS and are more strongly correlated to allele frequencies at adaptive loci than individual environmental variables. Our results provide an example of the use of environmental data to obtain independent and meaningful information on adaptive genetic variation.
Kilborn, Joshua P; Jones, David L; Peebles, Ernst B; Naar, David F
2017-04-01
Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis testing-based approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine the performance variability based on the algorithm's assumptions or any underlying data structures. Here, we use simulation studies to estimate the statistical error rates for the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA) to estimate the proficiency of clustering with DISPROF as a decision criterion. We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, and grouped data with increasing overlap, overdispersion for ecological data, and correlation among descriptors within groups. Using simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference grouping partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls during the application of methods and the interpretation of results.
Ecological prediction with nonlinear multivariate time-frequency functional data models
Yang, Wen-Hsi; Wikle, Christopher K.; Holan, Scott H.; Wildhaber, Mark L.
2013-01-01
Time-frequency analysis has become a fundamental component of many scientific inquiries. Due to improvements in technology, the amount of high-frequency signals that are collected for ecological and other scientific processes is increasing at a dramatic rate. In order to facilitate the use of these data in ecological prediction, we introduce a class of nonlinear multivariate time-frequency functional models that can identify important features of each signal as well as the interaction of signals corresponding to the response variable of interest. Our methodology is of independent interest and utilizes stochastic search variable selection to improve model selection and performs model averaging to enhance prediction. We illustrate the effectiveness of our approach through simulation and by application to predicting spawning success of shovelnose sturgeon in the Lower Missouri River.
Applying Multivariate Discrete Distributions to Genetically Informative Count Data.
Kirkpatrick, Robert M; Neale, Michael C
2016-03-01
We present a novel method of conducting biometric analysis of twin data when the phenotypes are integer-valued counts, which often show an L-shaped distribution. Monte Carlo simulation is used to compare five likelihood-based approaches to modeling: our multivariate discrete method, when its distributional assumptions are correct, when they are incorrect, and three other methods in common use. With data simulated from a skewed discrete distribution, recovery of twin correlations and proportions of additive genetic and common environment variance was generally poor for the Normal, Lognormal and Ordinal models, but good for the two discrete models. Sex-separate applications to substance-use data from twins in the Minnesota Twin Family Study showed superior performance of two discrete models. The new methods are implemented using R and OpenMx and are freely available.
Bonetti, Jennifer; Quarino, Lawrence
2014-05-01
This study has shown that the combination of simple techniques with the use of multivariate statistics offers the potential for the comparative analysis of soil samples. Five samples were obtained from each of twelve state parks across New Jersey in both the summer and fall seasons. Each sample was examined using particle-size distribution, pH analysis in both water and 1 M CaCl2, and a loss on ignition technique. Data from each of the techniques were combined, and principal component analysis (PCA) and canonical discriminant analysis (CDA) were used for multivariate data transformation. Samples from different locations could be visually differentiated from one another using these multivariate plots. Hold-one-out cross-validation analysis showed error rates as low as 3.33%. Ten blind study samples were analyzed resulting in no misclassifications using Mahalanobis distance calculations and visual examinations of multivariate plots. Seasonal variation was minimal between corresponding samples, suggesting potential success in forensic applications. © 2014 American Academy of Forensic Sciences.
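As a rough illustration of the Mahalanobis-distance step mentioned in the abstract above, the following minimal sketch assigns an unknown sample to the group whose centroid is nearest in Mahalanobis distance. Everything specific here is an assumption made for illustration: the two features (pH and loss on ignition), the site names, the numbers, and the use of a separate covariance matrix per group rather than whatever the authors actually used.

```python
# Hypothetical sketch: nearest-group classification by squared Mahalanobis
# distance for two features. All data and names below are invented.

def mean_vector(rows):
    """Column-wise mean of a list of (x, y) pairs."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

def covariance_2x2(rows, mean):
    """Sample covariance (n - 1 denominator) as (sxx, sxy, syy)."""
    n = len(rows)
    sxx = sum((r[0] - mean[0]) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - mean[1]) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mean[0]) * (r[1] - mean[1]) for r in rows) / (n - 1)
    return sxx, sxy, syy

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def classify(x, groups):
    """Return the name of the group with the smallest squared distance to x."""
    distances = {}
    for name, rows in groups.items():
        mean = mean_vector(rows)
        distances[name] = mahalanobis_sq(x, mean, covariance_2x2(rows, mean))
    return min(distances, key=distances.get)

# Invented (pH, loss on ignition %) measurements for two hypothetical sites.
groups = {
    "site_a": [(5.1, 3.0), (5.3, 3.4), (4.9, 2.9), (5.2, 3.1), (5.0, 3.3)],
    "site_b": [(6.8, 7.9), (7.1, 8.4), (6.9, 8.0), (7.0, 8.6), (7.2, 8.1)],
}

print(classify((5.15, 3.2), groups))
```

Unlike Euclidean distance, this measure scales each direction by the group's own spread and correlation, which is why it suits features measured in different units.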
Bayesian wavelet PCA methodology for turbomachinery damage diagnosis under uncertainty
NASA Astrophysics Data System (ADS)
Xu, Shengli; Jiang, Xiaomo; Huang, Jinzhi; Yang, Shuhua; Wang, Xiaofang
2016-12-01
Centrifugal compressors often suffer various defects such as impeller cracking, resulting in forced outage of the total plant. Damage diagnostics and condition monitoring of such a turbomachinery system has become an increasingly important and powerful tool to prevent potential failure in components and reduce unplanned forced outage and further maintenance costs, while improving reliability, availability and maintainability of a turbomachinery system. This paper presents a probabilistic signal processing methodology for damage diagnostics using multiple time history data collected from different locations of a turbomachine, considering data uncertainty and multivariate correlation. The proposed methodology is based on the integration of three advanced state-of-the-art data mining techniques: discrete wavelet packet transform, Bayesian hypothesis testing, and probabilistic principal component analysis. The multiresolution wavelet analysis approach is employed to decompose a time series signal into different levels of wavelet coefficients. These coefficients represent multiple time-frequency resolutions of a signal. Bayesian hypothesis testing is then applied to each level of wavelet coefficient to remove possible imperfections. The ratio of posterior odds Bayesian approach provides a direct means to assess whether there is imperfection in the decomposed coefficients, thus avoiding over-denoising. Power spectral density estimated by the Welch method is utilized to evaluate the effectiveness of the Bayesian wavelet cleansing method. Furthermore, the probabilistic principal component analysis approach is developed to reduce dimensionality of multiple time series and to address multivariate correlation and data uncertainty for damage diagnostics. The proposed methodology and generalized framework are demonstrated with a set of sensor data collected from a real-world centrifugal compressor with impeller cracks, through both time series and contour analyses of vibration signal and principal components.
Gamagami, R; Dickens, E; Gonzalez, A; D'Amico, L; Richardson, C; Rabaza, J; Kolachalam, R
2018-04-26
To compare the perioperative outcomes of initial, consecutive robotic-assisted transabdominal preperitoneal (R-TAPP) inguinal hernia repair (IHR) cases with consecutive open cases completed by the same surgeons. Multicenter, retrospective, comparative study of perioperative results from open and robotic IHR using standard univariate and multivariate regression analyses for propensity score matched (1:1) cohorts. Seven general surgeons at six institutions contributed 602 consecutive open IHR and 652 consecutive R-TAPP IHR cases. Baseline patient characteristics in the unmatched groups were similar with the exception of previous abdominal surgery, and all baseline characteristics were comparable in the matched cohorts. In matched analyses, postoperative complications prior to discharge were comparable. However, from post discharge through 30 days, fewer patients experienced complications in the R-TAPP group than in the open group [4.3% vs 7.7% (p = 0.047)]. The R-TAPP group had no reoperations post discharge through 30 days of follow-up compared with five patients (1.1%) in the open group (p = 0.062). Multivariate logistic regression analysis demonstrated that patient age > 65 years and the open approach were risk factors for complications within 30 days post discharge in the matched group [age > 65 years: odds ratio (OR) = 3.33 (95% CI 1.89, 5.87; p < 0.0001); open approach: OR = 1.89 (95% CI 1.05, 3.38; p = 0.031)]. In this matched analysis, R-TAPP provides similar postoperative complications prior to discharge and a lower rate of postoperative complications through 30 days compared to open repair. R-TAPP is a promising and reproducible approach, and may facilitate adoption of minimally invasive repairs of inguinal hernias.
de Almeida, John R; Carvalho, Felipe; Vaz Guimaraes Filho, Francisco; Kiehl, Tim-Rasmus; Koutourousiou, Maria; Su, Shirley; Vescan, Allan D; Witterick, Ian J; Zadeh, Gelareh; Wang, Eric W; Fernandez-Miranda, Juan C; Gardner, Paul A; Gentili, Fred; Snyderman, Carl H
2015-11-01
We compare the outcomes and postoperative MRI changes of endoscopic endonasal (EEA) and bifrontal craniotomy (BFC) approaches for olfactory groove meningiomas (OGM). All patients who underwent either BFC or EEA for OGM were eligible. Matched pairs were created by matching tumor volumes of an EEA patient with a BFC patient, and matching the timing of the postoperative scans. The tumor dimensions, peritumoral edema, resectability issues, and frontal lobe changes were recorded based on preoperative and postoperative MRI. Postoperative fluid-attenuated inversion recovery (FLAIR) hyperintensity and residual cystic cavity (porencephalic cave) volume were compared using univariable and multivariable analyses. From a total of 70 patients (46 EEA, 24 BFC), 10 matched pairs (20 patients) were created. Three patients (30%) in the EEA group and two (20%) in the BFC group had postoperative cerebrospinal fluid leaks (p=0.61). Gross total resections were achieved in seven (70%) of the EEA group and nine (90%) of the BFC group (p=0.26), and one patient from each group developed a recurrence. On postoperative MRI, there was no significant difference in FLAIR signal volumes between EEA and BFC approaches (6.9 versus 13.3 cm³; p=0.17) or in porencephalic cave volumes (1.7 versus 5.0 cm³; p=0.11) in univariable analysis. However, in a multivariable analysis, EEA was associated with less postoperative FLAIR change (p=0.02) after adjusting for the volume of preoperative edema. This study provides preliminary evidence that EEA is associated with quantifiable improvements in postoperative frontal lobe imaging. Copyright © 2015 Elsevier Ltd. All rights reserved.
Blended learning in situated contexts: 3-year evaluation of an online peer review project.
Bridges, S; Chang, J W W; Chu, C H; Gardner, K
2014-08-01
Situated and sociocultural perspectives on learning indicate that the design of complex tasks supported by educational technologies holds potential for dental education in moving novices towards closer approximation of the clinical outcomes of their expert mentors. A cross-faculty, student-centred, web-based project in operative dentistry was established within the Universitas 21 (U21) network of higher education institutions to support university goals for internationalisation in clinical learning by enabling distributed interactions across sites and institutions. This paper aims to present an evaluation of one dental faculty's project experience of curriculum redesign for deeper student learning. A mixed-method case study approach was utilised. Three cohorts of second-year students from a 5-year bachelor of dental surgery (BDS) programme were invited to participate in annual surveys and focus group interviews on project completion. Survey data were analysed for differences between years using multivariate logistic regression analysis. Thematic analysis of questionnaire open responses and interview transcripts was conducted. Multivariate logistic regression analysis noted significant differences across items over time indicating learning improvements, attainment of university aims and the positive influence of redesign. Students perceived the enquiry-based project as stimulating and motivating, and building confidence in operative techniques. Institutional goals for greater understanding of others and lifelong learning showed improvement over time. Despite positive scores, students indicated global citizenship and intercultural understanding were conceptually challenging. Establishment of online student learning communities through a blended approach to learning stimulated motivation and intellectual engagement, thereby supporting a situated approach to cognition. Sociocultural perspectives indicate that novice-expert interactions supported student development of professional identities. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Barton, Mitch; Yeatts, Paul E; Henson, Robin K; Martin, Scott B
2016-12-01
There has been a recent call to improve data reporting in kinesiology journals, including the appropriate use of univariate and multivariate analysis techniques. For example, a multivariate analysis of variance (MANOVA) with univariate post hocs and a Bonferroni correction is frequently used to investigate group differences on multiple dependent variables. However, this univariate approach decreases power, increases the risk for Type 1 error, and contradicts the rationale for conducting multivariate tests in the first place. The purpose of this study was to provide a user-friendly primer on conducting descriptive discriminant analysis (DDA), which is a post-hoc strategy to MANOVA that takes into account the complex relationships among multiple dependent variables. A real-world example using the Statistical Package for the Social Sciences syntax and data from 1,095 middle school students on their body composition and body image are provided to explain and interpret the results from DDA. While univariate post hocs increased the risk for Type 1 error to 76%, the DDA identified which dependent variables contributed to group differences and which groups were different from each other. For example, students in the very lean and Healthy Fitness Zone categories for body mass index experienced less pressure to lose weight, more satisfaction with their body, and higher physical self-concept than the Needs Improvement Zone groups. However, perceived pressure to gain weight did not contribute to group differences because it was a suppressor variable. Researchers are encouraged to use DDA when investigating group differences on multiple correlated dependent variables to determine which variables contributed to group differences.
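The Type 1 error inflation described in the abstract above can be illustrated with the standard familywise error rate formula, 1 − (1 − α)^m for m independent tests at level α. This is a textbook sketch, not the authors' calculation: the abstract does not say how the 76% figure was obtained, and the test count of 28 used below is a hypothetical value chosen only to show that an inflation of that size is plausible.

```python
# Textbook familywise error rate for m independent tests (an assumption;
# correlated dependent variables would give a different value).

def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across n_tests tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests

# Hypothetical example: 28 unadjusted tests at alpha = 0.05.
n_tests = 28
print(round(familywise_error_rate(0.05, n_tests), 3))
print(round(bonferroni_alpha(0.05, n_tests), 5))
```

The Bonferroni adjustment caps the familywise rate near α but at the cost of power, which is the trade-off the abstract uses to motivate DDA.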
Gutiérrez-Cacciabue, Dolores; Teich, Ingrid; Poma, Hugo Ramiro; Cruz, Mercedes Cecilia; Balzarini, Mónica; Rajal, Verónica Beatriz
2014-01-01
Several recreational surface waters in Salta, Argentina, were selected to assess their quality. Seventy percent of the measurements exceeded at least one of the limits established by international legislation, making the waters unsuitable for their intended use. To interpret results of complex data, multivariate techniques were applied. Arenales River, due to the variability observed in the data, was divided in two: upstream and downstream representing low and high pollution sites, respectively; and Cluster Analysis supported that differentiation. Arenales River downstream and Campo Alegre Reservoir were the most different environments and Vaqueros and La Caldera Rivers were the most similar. Canonical Correlation Analysis allowed exploration of correlations between physicochemical and microbiological variables except in both parts of Arenales River, and Principal Component Analysis allowed finding relationships among the 9 measured variables in all aquatic environments. Variable loadings showed that Arenales River downstream was impacted by industrial and domestic activities, Arenales River upstream was affected by agricultural activities, Campo Alegre Reservoir was disturbed by anthropogenic and ecological effects, and La Caldera and Vaqueros Rivers were influenced by recreational activities. Discriminant Analysis allowed identification of the subgroup of variables responsible for seasonal and spatial variations. Enterococcus, dissolved oxygen, conductivity, E. coli, pH, and fecal coliforms are sufficient to spatially describe the quality of the aquatic environments. Regarding seasonal variations, dissolved oxygen, conductivity, fecal coliforms, and pH can be used to describe water quality during dry season, while dissolved oxygen, conductivity, total coliforms, E. coli, and Enterococcus during wet season. Thus, the use of multivariate techniques allowed optimizing monitoring tasks and minimizing costs involved. PMID:25190636
Quantifying the impact of between-study heterogeneity in multivariate meta-analyses
Jackson, Dan; White, Ian R; Riley, Richard D
2012-01-01
Measures that quantify the impact of heterogeneity in univariate meta-analysis, including the very popular I2 statistic, are now well established. Multivariate meta-analysis, where studies provide multiple outcomes that are pooled in a single analysis, is also becoming more commonly used. The question of how to quantify heterogeneity in the multivariate setting is therefore raised. It is the univariate R2 statistic, the ratio of the variance of the estimated treatment effect under the random and fixed effects models, that generalises most naturally, so this statistic provides our basis. This statistic is then used to derive a multivariate analogue of I2, which we call . We also provide a multivariate H2 statistic, the ratio of a generalisation of Cochran's heterogeneity statistic and its associated degrees of freedom, with an accompanying generalisation of the usual I2 statistic, . Our proposed heterogeneity statistics can be used alongside all the usual estimates and inferential procedures used in multivariate meta-analysis. We apply our methods to some real datasets and show how our statistics are equally appropriate in the context of multivariate meta-regression, where study level covariate effects are included in the model. Our heterogeneity statistics may be used when applying any procedure for fitting the multivariate random effects model. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22763950
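For orientation, the univariate quantities that the abstract above generalises can be computed as follows. This is a minimal sketch of the usual univariate definitions only (Cochran's Q with inverse-variance weights, H² = Q/(k − 1), I² = (H² − 1)/H² truncated at zero); it does not implement the multivariate statistics proposed in the paper, and the effect sizes and variances are invented.

```python
# Univariate heterogeneity statistics for a fixed-effect meta-analysis.
# The input numbers below are made up for illustration.

def heterogeneity(effects, variances):
    """Return (Q, H2, I2) for study effects with within-study variances."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    h2 = q / df
    # I2 = (Q - df) / Q, truncated at 0 when Q does not exceed its df.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, h2, i2

q, h2, i2 = heterogeneity([0.1, 0.5, 0.9], [0.01, 0.01, 0.01])
print(round(q, 4), round(h2, 4), round(i2, 4))
```

I² is read as the share of total variation across studies attributable to heterogeneity rather than sampling error, so values near 1 indicate that the studies disagree far more than chance would predict.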
Michael S. Balshi; A. David McGuire; Paul Duffy; Mike Flannigan; John Walsh; Jerry Melillo
2009-01-01
We developed temporally and spatially explicit relationships between air temperature and fuel moisture codes derived from the Canadian Fire Weather Index System to estimate annual area burned at 2.5° (latitude × longitude) resolution using a Multivariate Adaptive Regression Spline (MARS) approach across Alaska and Canada. Burned area was...
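A MARS model of the kind named above is built from piecewise-linear hinge functions, max(0, x − t) and max(0, t − x), combined additively. The sketch below shows only how such a fitted model is evaluated for one predictor; the knots, coefficients and the temperature-like input are hypothetical, and the forward/backward term selection that MARS uses to fit a model is not shown.

```python
# Evaluating a one-predictor MARS-style model made of hinge functions.
# All knots and coefficients are hypothetical illustration values.

def hinge(x, knot, direction=1):
    """max(0, x - knot) for direction=+1, max(0, knot - x) for direction=-1."""
    return max(0.0, direction * (x - knot))

def mars_predict(x, intercept, terms):
    """Sum intercept and coef * hinge(x, knot, direction) over all terms."""
    return intercept + sum(
        coef * hinge(x, knot, direction) for coef, knot, direction in terms
    )

# Hypothetical model: response rises above 15 and also rises below 10.
terms = [(0.5, 15.0, 1), (-0.2, 10.0, -1)]
print(mars_predict(20.0, 1.0, terms))
```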
Non-fragile multivariable PID controller design via system augmentation
NASA Astrophysics Data System (ADS)
Liu, Jinrong; Lam, James; Shen, Mouquan; Shu, Zhan
2017-07-01
In this paper, the issue of designing non-fragile H∞ multivariable proportional-integral-derivative (PID) controllers with derivative filters is investigated. In order to obtain the controller gains, the original system is associated with an extended system such that the PID controller design can be formulated as a static output-feedback control problem. By taking the system augmentation approach, the conditions with slack matrices for solving the non-fragile H∞ multivariable PID controller gains are established. Based on the results, linear matrix inequality-based iterative algorithms are provided to compute the controller gains. Simulations are conducted to verify the effectiveness of the proposed approaches.
Rathore, Anurag S; Kumar Singh, Sumit; Pathak, Mili; Read, Erik K; Brorson, Kurt A; Agarabi, Cyrus D; Khan, Mansoor
2015-01-01
Fermentanomics is an emerging field of research and involves understanding the underlying controlled process variables and their effect on process yield and product quality. Although major advancements have occurred in process analytics over the past two decades, accurate real-time measurement of significant quality attributes for a biotech product during production culture is still not feasible. Researchers have used an amalgam of process models and analytical measurements for monitoring and process control during production. This article focuses on using multivariate data analysis as a tool for monitoring the internal bioreactor dynamics, the metabolic state of the cell, and interactions among them during culture. Quality attributes of the monoclonal antibody product that were monitored include glycosylation profile of the final product along with process attributes, such as viable cell density and level of antibody expression. These were related to process variables, raw materials components of the chemically defined hybridoma media, concentration of metabolites formed during the course of the culture, aeration-related parameters, and supplemented raw materials such as glucose, methionine, threonine, tryptophan, and tyrosine. This article demonstrates the utility of multivariate data analysis for correlating the product quality attributes (especially glycosylation) to process variables and raw materials (especially amino acid supplements in cell culture media). The proposed approach can be applied for process optimization to increase product expression, improve consistency of product quality, and target the desired quality attribute profile. © 2015 American Institute of Chemical Engineers.
NASA Astrophysics Data System (ADS)
Braga, Jez Willian Batista; Trevizan, Lilian Cristina; Nunes, Lidiane Cristina; Rufini, Iolanda Aparecida; Santos, Dário, Jr.; Krug, Francisco José
2010-01-01
The application of laser-induced breakdown spectrometry (LIBS) to the direct analysis of plant materials is a great challenge that still requires development and validation effort. To this end, a series of experimental approaches has been carried out to show that LIBS can be used as an alternative to methods based on wet acid digestion for the analysis of agricultural and environmental samples. The large amount of information provided by LIBS spectra for these complex samples increases the difficulty of selecting the most appropriate wavelengths for each analyte. Some applications have suggested that improvements in both accuracy and precision can be achieved by applying multivariate calibration to LIBS data when compared to univariate regression developed with line emission intensities. In the present work, the performance of univariate and multivariate calibration, based on partial least squares regression (PLSR), was compared for analysis of pellets of plant materials made from an appropriate mixture of cryogenically ground samples with cellulose as the binding agent. The development of a specific PLSR model for each analyte and the selection of spectral regions containing only lines of the analyte of interest were the best conditions for the analysis. In this particular application, these models showed a similar performance, but PLSR seemed to be more robust due to a lower occurrence of outliers in comparison to the univariate method. The data suggest that efforts dealing with sample presentation and fitness of standards for LIBS analysis must be made in order to fulfill the boundary conditions for matrix-independent development and validation.
Multivariate Statistical Analysis of MSL APXS Bulk Geochemical Data
NASA Astrophysics Data System (ADS)
Hamilton, V. E.; Edwards, C. S.; Thompson, L. M.; Schmidt, M. E.
2014-12-01
We apply cluster and factor analyses to bulk chemical data of 130 soil and rock samples measured by the Alpha Particle X-ray Spectrometer (APXS) on the Mars Science Laboratory (MSL) rover Curiosity through sol 650. Multivariate approaches such as principal components analysis (PCA), cluster analysis, and factor analysis complement more traditional approaches (e.g., Harker diagrams), with the advantage of simultaneously examining the relationships between multiple variables for large numbers of samples. Principal components analysis has been applied with success to APXS, Pancam, and Mössbauer data from the Mars Exploration Rovers. Factor analysis and cluster analysis have been applied with success to thermal infrared (TIR) spectral data of Mars. Cluster analyses group the input data by similarity, where there are a number of different methods for defining similarity (hierarchical, density, distribution, etc.). For example, without any assumptions about the chemical contributions of surface dust, preliminary hierarchical and K-means cluster analyses clearly distinguish the physically adjacent rock targets Windjana and Stephen as being distinctly different from lithologies observed prior to Curiosity's arrival at The Kimberley. In addition, they are separated from each other, consistent with chemical trends observed in variation diagrams but without requiring assumptions about chemical relationships. We will discuss the variation in cluster analysis results as a function of clustering method and pre-processing (e.g., log transformation, correction for dust cover) and implications for interpreting chemical data. Factor analysis shares some similarities with PCA, and examines the variability among observed components of a dataset so as to reveal variations attributable to unobserved components. Factor analysis has been used to extract the TIR spectra of components that are typically observed in mixtures and only rarely in isolation; there is the potential for similar results with data from APXS. These techniques offer new ways to understand the chemical relationships between the materials interrogated by Curiosity, and potentially their relation to materials observed by APXS instruments on other landed missions.
Ciampi, Antonio; Dyachenko, Alina; Cole, Martin; McCusker, Jane
2011-12-01
The study of mental disorders in the elderly presents substantial challenges due to population heterogeneity, coexistence of different mental disorders, and diagnostic uncertainty. While reliable tools have been developed to collect relevant data, new approaches to study design and analysis are needed. We focus on a new analytic approach. Our framework is based on latent class analysis and hidden Markov chains. From repeated measurements of a multivariate disease index, we extract the notion of underlying state of a patient at a time point. The course of the disorder is then a sequence of transitions among states. States and transitions are not observable; however, the probability of being in a state at a time point, and the transition probabilities from one state to another over time can be estimated. Data from 444 patients with and without diagnosis of delirium and dementia were available from a previous study. The Delirium Index was measured at diagnosis, and at 2 and 6 months from diagnosis. Four latent classes were identified: fairly healthy, moderately ill, clearly sick, and very sick. Dementia and delirium could not be separated on the basis of these data alone. Indeed, as the probability of delirium increased, so did the probability of decline of mental functions. Eight most probable courses were identified, including good and poor stable courses, and courses exhibiting various patterns of improvement. Latent class analysis and hidden Markov chains offer a promising tool for studying mental disorders in the elderly. Its use may show its full potential as new data become available.
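The idea of a disease course as a sequence of transitions among latent states can be made concrete with a small Markov-chain calculation: given a probability distribution over states at one time point and a transition matrix, the distribution at the next time point is their product. The sketch below covers only this propagation step, not the estimation of latent classes or transition probabilities from the Delirium Index. The four state names come from the abstract; every number is hypothetical.

```python
# Propagating state probabilities through a Markov chain.
# The transition matrix and the starting distribution are invented.

STATES = ["fairly healthy", "moderately ill", "clearly sick", "very sick"]

def step(dist, transition):
    """One transition: new[j] = sum over i of dist[i] * transition[i][j]."""
    n = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]

def propagate(dist, transition, n_steps):
    """Apply the transition matrix n_steps times."""
    for _ in range(n_steps):
        dist = step(dist, transition)
    return dist

# Hypothetical row-stochastic matrix (each row sums to 1).
transition = [
    [0.80, 0.15, 0.04, 0.01],
    [0.20, 0.60, 0.15, 0.05],
    [0.05, 0.20, 0.60, 0.15],
    [0.01, 0.04, 0.15, 0.80],
]

# Hypothetical starting point: a patient known to be "moderately ill".
start = [0.0, 1.0, 0.0, 0.0]
for name, p in zip(STATES, propagate(start, transition, 2)):
    print(f"{name}: {p:.3f}")
```

In a hidden Markov model the states are not observed directly, so in practice the starting distribution would itself be a set of estimated class-membership probabilities rather than a known state.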
Carbon financial markets: A time-frequency analysis of CO2 prices
NASA Astrophysics Data System (ADS)
Sousa, Rita; Aguiar-Conraria, Luís; Soares, Maria Joana
2014-11-01
We characterize the interrelation of CO2 prices with energy prices (electricity, gas and coal), and with economic activity. Previous studies have relied on time-domain techniques, such as Vector Auto-Regressions. In this study, we use multivariate wavelet analysis, which operates in the time-frequency domain. Wavelet analysis provides convenient tools to distinguish relations at particular frequencies and at particular time horizons. Our empirical approach has the potential to identify relations getting stronger and then disappearing over specific time intervals and frequencies. We are able to examine the coherency of these variables and lead-lag relations at different frequencies for the time periods in focus.
LinkWinds: An Approach to Visual Data Analysis
NASA Technical Reports Server (NTRS)
Jacobson, Allan S.
1992-01-01
The Linked Windows Interactive Data System (LinkWinds) is a prototype visual data exploration and analysis system resulting from a NASA/JPL program of research into graphical methods for rapidly accessing, displaying and analyzing large multivariate multidisciplinary datasets. It is an integrated multi-application execution environment allowing the dynamic interconnection of multiple windows containing visual displays and/or controls through a data-linking paradigm. This paradigm, which results in a system much like a graphical spreadsheet, is not only a powerful method for organizing large amounts of data for analysis, but provides a highly intuitive, easy to learn user interface on top of the traditional graphical user interface.
NASA Astrophysics Data System (ADS)
Sheykhizadeh, Saheleh; Naseri, Abdolhossein
2018-04-01
Variable selection plays a key role in classification and multivariate calibration. Variable selection methods aim to choose, from a large pool of available predictors, a set of variables relevant to estimating analyte concentrations or to achieving better classification results. Many variable selection techniques have been introduced, among which those based on swarm intelligence optimization have attracted particular attention over the last few decades because they are mainly inspired by nature. In this work, a simple and new variable selection algorithm is proposed based on the invasive weed optimization (IWO) concept. IWO is a bio-inspired metaheuristic that mimics the ecological behavior of weeds in colonizing and finding an appropriate place for growth and reproduction; it has been shown to be very adaptive and robust to environmental changes. In this paper, the first application of IWO, as a very simple and powerful method, to variable selection is reported using different experimental datasets, including FTIR and NIR data, for classification and multivariate calibration tasks. Accordingly, invasive weed optimization-linear discriminant analysis (IWO-LDA) and invasive weed optimization-partial least squares (IWO-PLS) are introduced for multivariate classification and calibration, respectively.
Sheykhizadeh, Saheleh; Naseri, Abdolhossein
2018-04-05
Variable selection plays a key role in classification and multivariate calibration. Variable selection methods aim to choose, from a large pool of available predictors, a set of variables relevant to estimating analyte concentrations or to achieving better classification results. Many variable selection techniques have been introduced, among which those based on swarm intelligence optimization have attracted particular attention over the last few decades because they are mainly inspired by nature. In this work, a simple and new variable selection algorithm is proposed based on the invasive weed optimization (IWO) concept. IWO is a bio-inspired metaheuristic that mimics the ecological behavior of weeds in colonizing and finding an appropriate place for growth and reproduction; it has been shown to be very adaptive and robust to environmental changes. In this paper, the first application of IWO, as a very simple and powerful method, to variable selection is reported using different experimental datasets, including FTIR and NIR data, for classification and multivariate calibration tasks. Accordingly, invasive weed optimization-linear discriminant analysis (IWO-LDA) and invasive weed optimization-partial least squares (IWO-PLS) are introduced for multivariate classification and calibration, respectively. Copyright © 2018 Elsevier B.V. All rights reserved.
Neelon, Brian; Gelfand, Alan E.; Miranda, Marie Lynn
2013-01-01
Summary Researchers in the health and social sciences often wish to examine joint spatial patterns for two or more related outcomes. Examples include infant birth weight and gestational length, psychosocial and behavioral indices, and educational test scores from different cognitive domains. We propose a multivariate spatial mixture model for the joint analysis of continuous individual-level outcomes that are referenced to areal units. The responses are modeled as a finite mixture of multivariate normals, which accommodates a wide range of marginal response distributions and allows investigators to examine covariate effects within subpopulations of interest. The model has a hierarchical structure built at the individual level (i.e., individuals are nested within areal units), and thus incorporates both individual- and areal-level predictors as well as spatial random effects for each mixture component. Conditional autoregressive (CAR) priors on the random effects provide spatial smoothing and allow the shape of the multivariate distribution to vary flexibly across geographic regions. We adopt a Bayesian modeling approach and develop an efficient Markov chain Monte Carlo model fitting algorithm that relies primarily on closed-form full conditionals. We use the model to explore geographic patterns in end-of-grade math and reading test scores among school-age children in North Carolina. PMID:26401059
Matero, Sanni; van Den Berg, Frans; Poutiainen, Sami; Rantanen, Jukka; Pajander, Jari
2013-05-01
The manufacturing of tablets involves many unit operations that possess multivariate and complex characteristics. The interactions between the material characteristics and process-related variation are presently not comprehensively analyzed because of univariate detection methods. As a consequence, current best practice for controlling a typical process is to not allow process-related factors to vary, i.e., to lock the production parameters. The problem of insufficient process understanding remains: the variation within process and material properties is an intrinsic feature and cannot be compensated for with constant process parameters. Instead, a more comprehensive approach based on the use of multivariate tools for investigating processes should be applied. In the pharmaceutical field these methods are referred to as Process Analytical Technology (PAT) tools, which aim to achieve a thorough understanding of and control over the production process. PAT includes the frameworks for measurement as well as data analysis and control for in-depth understanding, leading to more consistent and safer drug products with fewer batch rejections. In the optimal situation, by applying these techniques, destructive end-product testing could be avoided. In this paper the most prominent multivariate data analysis measuring tools within tablet manufacturing and basic research on operations are reviewed. Copyright © 2013 Wiley Periodicals, Inc.
Analysis techniques for multivariate root loci. [a tool in linear control systems]
NASA Technical Reports Server (NTRS)
Thompson, P. M.; Stein, G.; Laub, A. J.
1980-01-01
Analysis and techniques are developed for the multivariable root locus and the multivariable optimal root locus. The generalized eigenvalue problem is used to compute angles and sensitivities for both types of loci, and an algorithm is presented that determines the asymptotic properties of the optimal root locus.
Methods for presentation and display of multivariate data
NASA Technical Reports Server (NTRS)
Myers, R. H.
1981-01-01
Methods for the presentation and display of multivariate data are discussed with emphasis placed on the multivariate analysis of variance problems and the Hotelling T(2) solution in the two-sample case. The methods utilize the concepts of stepwise discrimination analysis and the computation of partial correlation coefficients.
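For orientation, the standard textbook form of the two-sample Hotelling statistic mentioned above can be written out as follows. The notation is an assumption introduced here, not taken from the report: $n_1, n_2$ are the group sizes, $p$ the number of variables, $\bar{\mathbf{x}}_1, \bar{\mathbf{x}}_2$ the sample mean vectors, and $\mathbf{S}_1, \mathbf{S}_2$ the sample covariance matrices.

```latex
\[
  \mathbf{S}_p = \frac{(n_1 - 1)\,\mathbf{S}_1 + (n_2 - 1)\,\mathbf{S}_2}{n_1 + n_2 - 2},
  \qquad
  T^2 = \frac{n_1 n_2}{n_1 + n_2}\,
        (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^{\mathsf{T}}\,
        \mathbf{S}_p^{-1}\,
        (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)
\]
\[
  F = \frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)\,p}\; T^2
  \;\sim\; F_{\,p,\; n_1 + n_2 - p - 1}
  \quad \text{under } H_0:\ \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 .
\]
```

The $F$ transformation holds under the usual assumptions of multivariate normality and a common covariance matrix in the two groups.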
A Primer on Multivariate Analysis of Variance (MANOVA) for Behavioral Scientists
ERIC Educational Resources Information Center
Warne, Russell T.
2014-01-01
Reviews of statistical procedures (e.g., Bangert & Baumberger, 2005; Kieffer, Reese, & Thompson, 2001; Warne, Lazo, Ramos, & Ritter, 2012) show that one of the most common multivariate statistical methods in psychological research is multivariate analysis of variance (MANOVA). However, MANOVA and its associated procedures are often not…
Identifying patients with cost-related medication non-adherence: a big-data approach.
Zhang, James X; Meltzer, David O
2016-08-01
Millions of Americans encounter access barriers to medication due to cost; however, to date, there is no effective screening tool that identifies patients at risk of cost-related medication non-adherence (CRN). By utilizing a big-data approach to combining survey data and electronic health records (EHRs), this study aimed to develop a method of identifying patients at risk of CRN. CRN data were collected by surveying patients about CRN behaviors in the past 3 months. By matching the dates of patients' receipt of monthly Social Security (SS) payments and the dates of prescription orders for 559 Medicare beneficiaries who were primary SS claimants at high risk of hospitalization in an urban academic medical center, this study identified patients who ordered their outpatient prescriptions within 2 days of receipt of monthly SS payments in 2014. The predictive power of this information on CRN was assessed using multivariate logistic regression analysis. Among the 559 Medicare patients at high risk of hospitalization, 137 (25%) reported CRN. Among those with CRN, 96 (70%) had ordered prescriptions on receipt of SS payments one or more times in 2014. The area under the receiver operating characteristic curve was 0.70 using the predictive model in multivariate logistic regression analysis. With a new approach to combining survey data and EHR data, patients' behavior in delaying the filling of prescriptions until funds from SS checks become available can be measured, providing some predictive value for cost-related medication non-adherence. The big-data approach is a valuable tool to identify patients at risk of CRN and can be further expanded to the general population and sub-populations, providing a meaningful risk stratification for CRN and facilitating physician-patient communication to reduce CRN.
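To make the reported area under the curve concrete, the sketch below computes an AUC as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. It is a generic illustration, not the study's code: the function name `roc_auc` and the scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities from a logistic model and the
# corresponding observed outcomes (1 = reported CRN, 0 = did not).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why a value of 0.70 is described as providing some predictive value.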
Robustness of Multiple Objective Decision Analysis Preference Functions
2002-06-01
p, p′: The probability of some event. p_i, q_i: The probability of event i. Π: An aggregation of proportional data used in calculating a test ...statistical tests of the significance of the term and also is conducted in a multivariate framework rather than the ROSA univariate approach. A...residual error is e = y − ŷ (45) The coefficient provides a ready indicator of the contribution for the associated variable and statistical tests
Meta-analysis identifies gene-by-environment interactions as demonstrated in a study of 4,965 mice.
Kang, Eun Yong; Han, Buhm; Furlotte, Nicholas; Joo, Jong Wha J; Shih, Diana; Davis, Richard C; Lusis, Aldons J; Eskin, Eleazar
2014-01-01
Identifying environmentally-specific genetic effects is a key challenge in understanding the structure of complex traits. Model organisms play a crucial role in the identification of such gene-by-environment interactions, as a result of the unique ability to observe genetically similar individuals across multiple distinct environments. Many model organism studies examine the same traits but under varying environmental conditions. For example, knock-out or diet-controlled studies are often used to examine cholesterol in mice. These studies, when examined in aggregate, provide an opportunity to identify genomic loci exhibiting environmentally-dependent effects. However, the straightforward application of traditional methodologies to aggregate separate studies suffers from several problems. First, environmental conditions are often variable and do not fit the standard univariate model for interactions. Additionally, applying a multivariate model results in increased degrees of freedom and low statistical power. In this paper, we jointly analyze multiple studies with varying environmental conditions using a meta-analytic approach based on a random effects model to identify loci involved in gene-by-environment interactions. Our approach is motivated by the observation that methods for discovering gene-by-environment interactions are closely related to random effects models for meta-analysis. We show that interactions can be interpreted as heterogeneity and can be detected without utilizing the traditional uni- or multi-variate approaches for discovery of gene-by-environment interactions. We apply our new method to combine 17 mouse studies containing in aggregate 4,965 distinct animals. We identify 26 significant loci involved in High-density lipoprotein (HDL) cholesterol, many of which are consistent with previous findings. Several of these loci show significant evidence of involvement in gene-by-environment interactions. 
An additional advantage of our meta-analysis approach is that our combined study has significantly higher power and improved resolution compared to any single study thus explaining the large number of loci discovered in the combined study.
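The link between interaction and heterogeneity can be illustrated with the classical heterogeneity quantities from random effects meta-analysis. The sketch below computes Cochran's Q and the DerSimonian-Laird between-study variance for a single locus; it is a generic textbook calculation under assumed inputs, not the authors' method, and the per-study effect sizes and variances are invented.

```python
def cochran_q(effects, variances):
    """Return (Q, pooled) using inverse-variance (fixed effect) weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    return q, pooled


def dersimonian_laird_tau2(effects, variances):
    """DerSimonian-Laird estimate of the between-study variance, floored at 0."""
    weights = [1.0 / v for v in variances]
    q, _ = cochran_q(effects, variances)
    k = len(effects)
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    return max(0.0, (q - (k - 1)) / c)


# Hypothetical effect sizes of one locus in three studies run under
# different environments, each with the same sampling variance.
effects = [0.1, 0.5, 0.9]
variances = [0.04, 0.04, 0.04]

q, pooled = cochran_q(effects, variances)
tau2 = dersimonian_laird_tau2(effects, variances)
print(f"pooled = {pooled:.2f}, Q = {q:.2f}, tau^2 = {tau2:.2f}")
```

A locus whose effect is the same in every environment gives Q near its degrees of freedom and a between-study variance of zero, while an environment-dependent effect inflates both.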
Meta-Analysis Identifies Gene-by-Environment Interactions as Demonstrated in a Study of 4,965 Mice
Joo, Jong Wha J.; Shih, Diana; Davis, Richard C.; Lusis, Aldons J.; Eskin, Eleazar
2014-01-01
Identifying environmentally-specific genetic effects is a key challenge in understanding the structure of complex traits. Model organisms play a crucial role in the identification of such gene-by-environment interactions, as a result of the unique ability to observe genetically similar individuals across multiple distinct environments. Many model organism studies examine the same traits but under varying environmental conditions. For example, knock-out or diet-controlled studies are often used to examine cholesterol in mice. These studies, when examined in aggregate, provide an opportunity to identify genomic loci exhibiting environmentally-dependent effects. However, the straightforward application of traditional methodologies to aggregate separate studies suffers from several problems. First, environmental conditions are often variable and do not fit the standard univariate model for interactions. Additionally, applying a multivariate model results in increased degrees of freedom and low statistical power. In this paper, we jointly analyze multiple studies with varying environmental conditions using a meta-analytic approach based on a random effects model to identify loci involved in gene-by-environment interactions. Our approach is motivated by the observation that methods for discovering gene-by-environment interactions are closely related to random effects models for meta-analysis. We show that interactions can be interpreted as heterogeneity and can be detected without utilizing the traditional uni- or multi-variate approaches for discovery of gene-by-environment interactions. We apply our new method to combine 17 mouse studies containing in aggregate 4,965 distinct animals. We identify 26 significant loci involved in High-density lipoprotein (HDL) cholesterol, many of which are consistent with previous findings. Several of these loci show significant evidence of involvement in gene-by-environment interactions. 
An additional advantage of our meta-analysis approach is that our combined study has significantly higher power and improved resolution compared to any single study thus explaining the large number of loci discovered in the combined study. PMID:24415945
Sasakura, D; Nakayama, K; Sakamoto, T; Chikuma, T
2015-05-01
The use of transmission near infrared spectroscopy (TNIRS) is of particular interest in the pharmaceutical industry. This is because TNIRS does not require sample preparation and can analyze several tens of tablet samples in an hour. It has the capability to measure all relevant information from a tablet, while still on the production line. However, TNIRS has a narrow spectrum range and overtone vibrations often overlap. To perform content uniformity testing in tablets by TNIRS, various properties in the tableting process need to be analyzed by a multivariate prediction model, such as a Partial Least Square Regression modeling. One issue is that typical approaches require several hundred reference samples to act as the basis of the method rather than a strategically designed method. This means that many batches are needed to prepare the reference samples; this requires time and is not cost effective. Our group investigated the concentration dependence of the calibration model with a strategic design. Consequently, we developed a more effective approach to the TNIRS calibration model than the existing methodology.
NASA Astrophysics Data System (ADS)
Metwally, Fadia H.
2008-02-01
The quantitative predictive abilities of the new and simple bivariate spectrophotometric method are compared with the results obtained by the use of multivariate calibration methods [classical least squares (CLS), principal component regression (PCR) and partial least squares (PLS)], using the information contained in the absorption spectra of the appropriate solutions. Mixtures of the two drugs Nifuroxazide (NIF) and Drotaverine hydrochloride (DRO) were resolved by application of the bivariate method. The different chemometric approaches were also applied with previous optimization of the calibration matrix, as they are useful in the simultaneous inclusion of many spectral wavelengths. The results found by application of the bivariate, CLS, PCR and PLS methods for the simultaneous determination of mixtures of both components containing 2-12 μg ml⁻¹ of NIF and 2-8 μg ml⁻¹ of DRO are reported. Both approaches were satisfactorily applied to the simultaneous determination of NIF and DRO in pure form and in pharmaceutical formulation. The results were in accordance with those given by the EVA Pharma reference spectrophotometric method.
NASA Astrophysics Data System (ADS)
Panagopoulos, George P.
2014-10-01
The multivariate statistical techniques conducted on quarterly water consumption data in Mytilene reveal valuable tools that could help the local authorities in assigning strategies aimed at the sustainable development of urban water resources. The proposed methodology is an innovative approach, applied for the first time in the international literature, to handling urban water consumption data in order to analyze statistically the interrelationships among the determinants of urban water use. Factor analysis of demographic, socio-economic and hydrological variables shows that total water consumption in Mytilene is the combined result of increases in (a) income, (b) population, (c) connections and (d) climate parameters. On the other hand, the per connection water demand is influenced by variations in water prices but with different consequences in each consumption class. Increases in water prices are faced by large consumers; they then reduce their consumption rates and transfer to lower consumption blocks. These shifts are responsible for the increase in the average consumption values in the lower blocks despite the increase in the marginal prices.
The Statistical Consulting Center for Astronomy (SCCA)
NASA Technical Reports Server (NTRS)
Akritas, Michael
2001-01-01
The process by which raw astronomical data acquisition is transformed into scientifically meaningful results and interpretation typically involves many statistical steps. Traditional astronomy limits itself to a narrow range of old and familiar statistical methods: means and standard deviations; least-squares methods like chi(sup 2) minimization; and simple nonparametric procedures such as the Kolmogorov-Smirnov tests. These tools are often inadequate for the complex problems and datasets under investigation, and recent years have witnessed an increased usage of maximum-likelihood, survival analysis, multivariate analysis, wavelet and advanced time-series methods. The Statistical Consulting Center for Astronomy (SCCA) assisted astronomers in using sophisticated tools and in matching these tools with specific problems. The SCCA operated with two professors of statistics and a professor of astronomy working together. Questions were received by e-mail, and were discussed in detail with the questioner. Summaries of those questions and answers leading to new approaches were posted on the Web (www.state.psu.edu/ mga/SCCA). In addition to serving individual astronomers, the SCCA established a Web site for general use that provides hypertext links to selected on-line public-domain statistical software and services. The StatCodes site (www.astro.psu.edu/statcodes) provides over 200 links in the areas of: Bayesian statistics; censored and truncated data; correlation and regression; density estimation and smoothing; general statistics packages and information; image analysis; interactive Web tools; multivariate analysis; multivariate clustering and classification; nonparametric analysis; software written by astronomers; spatial statistics; statistical distributions; time series analysis; and visualization tools. StatCodes has received a remarkably high and constant hit rate of 250 hits/week (over 10,000/year) since its inception in mid-1997. 
It is of interest to scientists both within and outside of astronomy. The most popular sections are multivariate techniques, image analysis, and time series analysis. Hundreds of copies of the ASURV, SLOPES and CENS-TAU codes developed by SCCA scientists were also downloaded from the StatCodes site. In addition to formal SCCA duties, SCCA scientists continued a variety of related activities in astrostatistics, including refereeing of statistically oriented papers submitted to the Astrophysical Journal, talks in meetings including Feigelson's talk to science journalists entitled "The reemergence of astrostatistics" at the American Association for the Advancement of Science meeting, and published papers of astrostatistical content.
Quality control for quantitative PCR based on amplification compatibility test.
Tichopad, Ales; Bar, Tzachi; Pecen, Ladislav; Kitchen, Robert R; Kubista, Mikael; Pfaffl, Michael W
2010-04-01
Quantitative qPCR is a routinely used method for the accurate quantification of nucleic acids. Yet it may generate erroneous results if the amplification process is obscured by inhibition or generation of aberrant side-products such as primer dimers. Several methods have been established to control for pre-processing performance that rely on the introduction of a co-amplified reference sequence, however there is currently no method to allow for reliable control of the amplification process without directly modifying the sample mix. Herein we present a statistical approach based on multivariate analysis of the amplification response data generated in real-time. The amplification trajectory in its most resolved and dynamic phase is fitted with a suitable model. Two parameters of this model, related to amplification efficiency, are then used for calculation of the Z-score statistics. Each studied sample is compared to a predefined reference set of reactions, typically calibration reactions. A probabilistic decision for each individual Z-score is then used to identify the majority of inhibited reactions in our experiments. We compare this approach to univariate methods using only the sample specific amplification efficiency as reporter of the compatibility. We demonstrate improved identification performance using the multivariate approach compared to the univariate approach. Finally we stress that the performance of the amplification compatibility test as a quality control procedure depends on the quality of the reference set. Copyright 2010 Elsevier Inc. All rights reserved.
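The advantage of comparing two parameters jointly against a reference set, rather than one at a time, can be shown with a small sketch. The abstract does not spell out the exact statistic, so the squared Mahalanobis distance is used here as one plausible multivariate analogue of a Z-score; the function names and all numbers are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def covariance_2d(xs, ys):
    """Sample variances and covariance (n - 1 denominator) of two parameters."""
    mx, my = mean(xs), mean(ys)
    n = len(xs)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_sq(point, xs, ys):
    """Squared Mahalanobis distance of a 2-D point from the reference set."""
    mx, my = mean(xs), mean(ys)
    sxx, sxy, syy = covariance_2d(xs, ys)
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mx, point[1] - my
    # Closed-form inverse of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# Hypothetical reference set: two fitted curve parameters per calibration
# reaction, positively correlated with each other.
ref_a = [1.0, 2.0, 3.0, 4.0, 5.0]
ref_b = [2.0, 1.0, 4.0, 3.0, 5.0]

consistent = mahalanobis_sq((5.0, 5.0), ref_a, ref_b)  # follows the correlation
suspect = mahalanobis_sq((5.0, 1.0), ref_a, ref_b)     # breaks the correlation
print(f"consistent: {consistent:.2f}, suspect: {suspect:.2f}")
```

Both test points lie within the univariate range of each parameter, yet the second is far more distant from the reference set once the correlation between the parameters is taken into account.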
Rank estimation and the multivariate analysis of in vivo fast-scan cyclic voltammetric data
Keithley, Richard B.; Carelli, Regina M.; Wightman, R. Mark
2010-01-01
Principal component regression has been used in the past to separate current contributions from different neuromodulators measured with in vivo fast-scan cyclic voltammetry. Traditionally, a percent cumulative variance approach has been used to determine the rank of the training set voltammetric matrix during model development, however this approach suffers from several disadvantages including the use of arbitrary percentages and the requirement of extreme precision of training sets. Here we propose that Malinowski’s F-test, a method based on a statistical analysis of the variance contained within the training set, can be used to improve factor selection for the analysis of in vivo fast-scan cyclic voltammetric data. These two methods of rank estimation were compared at all steps in the calibration protocol including the number of principal components retained, overall noise levels, model validation as determined using a residual analysis procedure, and predicted concentration information. By analyzing 119 training sets from two different laboratories amassed over several years, we were able to gain insight into the heterogeneity of in vivo fast-scan cyclic voltammetric data and study how differences in factor selection propagate throughout the entire principal component regression analysis procedure. Visualizing cyclic voltammetric representations of the data contained in the retained and discarded principal components showed that using Malinowski’s F-test for rank estimation of in vivo training sets allowed for noise to be more accurately removed. Malinowski’s F-test also improved the robustness of our criterion for judging multivariate model validity, even though signal-to-noise ratios of the data varied. In addition, pH change was the majority noise carrier of in vivo training sets while dopamine prediction was more sensitive to noise. PMID:20527815
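The arbitrariness of the percent cumulative variance criterion is easy to see in a short sketch. The function below returns the number of principal components needed to reach a chosen fraction of the total variance; the function name, the eigenvalues and the thresholds are hypothetical, and Malinowski's F-test itself is not implemented here.

```python
def rank_by_cumulative_variance(eigenvalues, threshold):
    """Smallest number of components whose variance fraction reaches threshold."""
    total = sum(eigenvalues)
    running = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += ev
        if running / total >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues of a training-set covariance matrix.
eigenvalues = [90.0, 8.0, 1.5, 0.3, 0.2]

for threshold in (0.95, 0.99, 0.999):
    k = rank_by_cumulative_variance(eigenvalues, threshold)
    print(f"threshold {threshold}: keep {k} components")
```

With these made-up eigenvalues the retained rank moves from 2 to 3 to 5 as the threshold is tightened, which illustrates why a criterion grounded in a statistical test of the variance can be preferable to a fixed percentage.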
NASA Astrophysics Data System (ADS)
Ferrera, Elisabetta; Giammanco, Salvatore; Cannata, Andrea; Montalto, Placido
2013-04-01
From November 2009 to April 2011 soil radon activity was continuously monitored using a Barasol® probe located on the upper NE flank of Mt. Etna volcano, close either to the Piano Provenzana fault or to the NE-Rift. Seismic and volcanological data have been analyzed together with radon data. We also analyzed air and soil temperature, barometric pressure, snowfall and rainfall data. In order to find possible correlations among the above parameters, and hence to reveal possible anomalies in the radon time-series, we used different statistical methods: i) multivariate linear regression; ii) cross-correlation; iii) coherence analysis through wavelet transform. Multivariate regression indicated a modest influence on soil radon from environmental parameters (R² = 0.31). When using 100-day time windows, the R² values showed wide variations in time, reaching their maxima (~0.63-0.66) during summer. Cross-correlation analysis over 100-day moving averages showed that, similar to multivariate linear regression analysis, the summer period is characterised by the best correlation between radon data and environmental parameters. Lastly, the wavelet coherence analysis allowed a multi-resolution coherence analysis of the time series acquired. This approach makes it possible to study the relations among different signals in either the time or the frequency domain. It confirmed the results of the previous methods, but also made it possible to recognize correlations between radon and environmental parameters at different observation scales (e.g., radon activity changed during strong precipitation, but also during anomalous variations of soil temperature uncorrelated with seasonal fluctuations). Our work suggests that in order to make an accurate analysis of the relations among distinct signals it is necessary to use different techniques that give complementary analytical information. 
In particular, the wavelet analysis proved to be very effective in discriminating radon changes due to environmental influences from those correlated with impending seismic or volcanic events.
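Of the three methods listed, cross-correlation is the simplest to sketch. The example below finds the lag at which one series best matches another using the Pearson coefficient; it is a minimal illustration on invented data, and the function names and the toy series are assumptions rather than anything taken from the study.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def lagged_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag] for a non-negative lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


def best_lag(x, y, max_lag):
    """Lag in 0..max_lag at which y follows x most closely."""
    return max(range(max_lag + 1), key=lambda k: lagged_correlation(x, y, k))


# Hypothetical driver series (e.g. an environmental parameter) and a response
# series that repeats the driver two samples later.
driver = [0, 1, 0, 2, 0, 3, 0, 1, 0, 2]
response = [0, 0, 0, 1, 0, 2, 0, 3, 0, 1]

lag = best_lag(driver, response, max_lag=4)
print(f"best lag = {lag}, r = {lagged_correlation(driver, response, lag):.2f}")
```

Real monitoring series would first need detrending and gap handling, and a single global lag cannot capture relations that change over time, which is the limitation the wavelet coherence analysis addresses.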
NASA Astrophysics Data System (ADS)
Harris, C. D.; Profeta, Luisa T. M.; Akpovo, Codjo A.; Johnson, Lewis; Stowe, Ashley C.
2017-05-01
A calibration model was created to illustrate the detection capabilities of laser ablation molecular isotopic spectroscopy (LAMIS) discrimination in isotopic analysis. The sample set contained boric acid pellets that varied in isotopic concentrations of ¹⁰B and ¹¹B. Each sample set was interrogated with a Q-switched Nd:YAG ablation laser operating at 532 nm. A minimum of four band heads of the β system B²Σ → X²Σ transitions were identified and verified with previous literature on BO molecular emission lines. Isotopic shifts were observed in the spectra for each transition and used as the predictors in the calibration model. The spectra along with their respective ¹⁰B/¹¹B isotopic ratios were analyzed using Partial Least Squares Regression (PLSR). An IUPAC novel approach for determining a multivariate Limit of Detection (LOD) interval was used to predict the detection of the desired isotopic ratios. The predicted multivariate LOD is dependent on the variation of the instrumental signal and other composites in the calibration model space.
Multivariate pattern dependence
Saxe, Rebecca
2017-01-01
When we perform a cognitive task, multiple brain regions are engaged. Understanding how these regions interact is a fundamental step to uncover the neural bases of behavior. Most research on the interactions between brain regions has focused on the univariate responses in the regions. However, fine grained patterns of response encode important information, as shown by multivariate pattern analysis. In the present article, we introduce and apply multivariate pattern dependence (MVPD): a technique to study the statistical dependence between brain regions in humans in terms of the multivariate relations between their patterns of responses. MVPD characterizes the responses in each brain region as trajectories in region-specific multidimensional spaces, and models the multivariate relationship between these trajectories. We applied MVPD to the posterior superior temporal sulcus (pSTS) and to the fusiform face area (FFA), using a searchlight approach to reveal interactions between these seed regions and the rest of the brain. Across two different experiments, MVPD identified significant statistical dependence not detected by standard functional connectivity. Additionally, MVPD outperformed univariate connectivity in its ability to explain independent variance in the responses of individual voxels. In the end, MVPD uncovered different connectivity profiles associated with different representational subspaces of FFA: the first principal component of FFA shows differential connectivity with occipital and parietal regions implicated in the processing of low-level properties of faces, while the second and third components show differential connectivity with anterior temporal regions implicated in the processing of invariant representations of face identity. PMID:29155809
Chieng, Norman; Trnka, Hjalte; Boetker, Johan; Pikal, Michael; Rantanen, Jukka; Grohganz, Holger
2013-09-15
The purpose of this study is to investigate the use of multivariate data analysis for powder X-ray diffraction-pair-wise distribution function (PXRD-PDF) data to detect phase separation in freeze-dried binary amorphous systems. Polymer-polymer and polymer-sugar binary systems at various ratios were freeze-dried. All samples were analyzed by PXRD, transformed to PDF and analyzed by principal component analysis (PCA). These results were validated by differential scanning calorimetry (DSC) through characterization of the glass transition of the maximally freeze-concentrated solute (Tg'). Analysis of PXRD-PDF data using PCA provides a clearer 'miscible' or 'phase separated' interpretation through the distribution pattern of samples on a score plot than the residual plot method does. In a phase separated system, samples were found to be evenly distributed around the theoretical PDF profile. For systems that were miscible, a clear deviation of samples away from the theoretical PDF profile was observed. Moreover, PCA allows simultaneous analysis of replicate samples. The phase behavior analysis from the PXRD-PDF-PCA method was in agreement with the DSC results. Overall, the combined PXRD-PDF-PCA approach improves the clarity of the PXRD-PDF results and can be used as an alternative explorative data analytical tool in detecting phase separation in freeze-dried binary amorphous systems. Copyright © 2013 Elsevier B.V. All rights reserved.
Evaluation of natural mandibular shape asymmetry: an approach by using elliptical Fourier analysis.
Niño-Sandoval, Tania C; Morantes Ariza, Carlos F; Infante-Contreras, Clementina; Vasconcelos, Belmiro Ce
2018-04-05
The purpose of this study was to demonstrate that asymmetry is a naturally occurring phenomenon in mandibular shape by using elliptical Fourier analysis. A total of 164 digital orthopantomographs from Colombian patients of both sexes aged 18 to 25 years were collected. Curves from the left and right hemimandibles were digitized. An elliptical Fourier analysis was performed with 20 harmonics. For general sexual dimorphism, a principal component analysis (PCA) and a Hotelling's T² test from the multivariate warp space were employed. Exploratory analysis of general asymmetry and sexual dimorphism by side was made with a Procrustes fit. A non-parametric multivariate analysis of variance (MANOVA) was applied to assess differentiation of skeletal classes of each hemimandible, and a Procrustes analysis of variance (ANOVA) was applied to search for any relation between skeletal class and side in both sexes. Significant values were found in general asymmetry, general sexual dimorphism, dimorphism by side (p < 0.0001), asymmetry by sex, and differences between Class I, II, and III (p < 0.005). However, a relation between skeletal class and side was not found. Mandibular shape asymmetry is present in all patients and should not be linked exclusively to pathological processes; therefore, along with sexual dimorphism and differences between skeletal classes, it must be taken into account to improve mandibular prediction systems.
Fernández de la Ossa, Ma Ángeles; Ortega-Ojeda, Fernando; García-Ruiz, Carmen
2014-11-01
This work reports an investigation into the analysis of different paper samples using CE with laser-induced fluorescence (LIF) detection. Papers from four different manufacturers (white-copy paper) and four different paper sources (white and recycled-copy papers, adhesive yellow paper notes and restaurant serviettes) were pulverized by scratching with a surgical scalpel prior to their derivatization with a fluorescent labeling agent, 8-aminopyrene-1,3,6-trisulfonic acid. Methodological conditions were evaluated, specifically the derivatization conditions, with the aim of achieving the best S/N, and the separation conditions, in order to obtain optimum sensitivity and reproducibility. The best conditions, in terms of the fastest and easiest sample preparation procedure, minimal sample consumption, and the simplest and fastest CE procedure for obtaining the best analytical parameters, were applied to the analysis of the different paper samples. The registered electropherograms were pretreated (normalized and aligned) and subjected to multivariate analysis (principal component analysis). A successful discrimination among paper samples without entanglements was achieved. To the best of our knowledge, this work presents the first approach to achieve a successful differentiation among visually similar white-copy paper samples produced by different manufacturers and paper from different paper sources through their direct analysis by CE-LIF and subsequent comparative study of the complete cellulose electropherogram by chemometric tools. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Estimating an Effect Size in One-Way Multivariate Analysis of Variance (MANOVA)
ERIC Educational Resources Information Center
Steyn, H. S., Jr.; Ellis, S. M.
2009-01-01
When two or more univariate population means are compared, the proportion of variation in the dependent variable accounted for by population group membership is eta-squared. This effect size can be generalized by using multivariate measures of association, based on the multivariate analysis of variance (MANOVA) statistics, to establish whether…
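For the univariate case mentioned above, a minimal sketch of eta-squared is the between-group sum of squares divided by the total sum of squares. The function name `eta_squared` and the sample values are illustrative assumptions, not taken from the article, and the multivariate generalization it discusses is not shown.

```python
from statistics import fmean

def eta_squared(groups):
    """Proportion of total variation explained by group membership."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    # Total variation of every observation around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

# Hypothetical scores for three groups.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
effect = eta_squared(groups)
```

A value near 1 means group membership accounts for most of the variation; a value near 0 means the group means barely differ.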
Dangers in Using Analysis of Covariance Procedures.
ERIC Educational Resources Information Center
Campbell, Kathleen T.
Problems associated with the use of analysis of covariance (ANCOVA) as a statistical control technique are explained. Three problems relate to the use of "OVA" methods (analysis of variance, analysis of covariance, multivariate analysis of variance, and multivariate analysis of covariance) in general. These are: (1) the wasting of information when…
Clustering analysis for muon tomography data elaboration in the Muon Portal project
NASA Astrophysics Data System (ADS)
Bandieramonte, M.; Antonuccio-Delogu, V.; Becciani, U.; Costa, A.; La Rocca, P.; Massimino, P.; Petta, C.; Pistagna, C.; Riggi, F.; Riggi, S.; Sciacca, E.; Vitello, F.
2015-05-01
Clustering analysis is one of the multivariate data analysis techniques that allow statistical data units to be gathered into groups so as to minimize the logical distance within each group and maximize the distance between different groups. In these proceedings, the authors present a novel approach to muon tomography data analysis based on clustering algorithms. As a case study we present the Muon Portal project, which aims to build and operate a dedicated particle detector for the inspection of harbor containers to hinder the smuggling of nuclear materials. Clustering techniques, working directly on scattering points, help to detect the presence of suspicious items inside the container, acting, as will be shown, as a filter for a preliminary analysis of the data.
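The grouping principle described above can be sketched with plain k-means on 3D points. This is a generic illustration under stated assumptions, not the clustering algorithm used in the Muon Portal project; the function name `kmeans`, the farthest-point initialization, and the coordinates are all hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

# Hypothetical scattering points forming two well-separated groups.
points = [(0.0, 0.0, 0.0), (0.5, 0.2, 0.1), (0.1, 0.4, 0.3),
          (10.0, 10.0, 10.0), (10.2, 9.8, 10.1), (9.9, 10.3, 9.7)]
labels, centers = kmeans(points, k=2)
```

A dense cluster of scattering points would then be a candidate region for closer inspection; the number of clusters `k` is fixed by hand here, which a real analysis would need to justify.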
Natural selection. VII. History and interpretation of kin selection theory.
Frank, S A
2013-06-01
Kin selection theory is a kind of causal analysis. The initial form of kin selection ascribed cause to costs, benefits and genetic relatedness. The theory then slowly developed a deeper and more sophisticated approach to partitioning the causes of social evolution. Controversy followed because causal analysis inevitably attracts opposing views. It is always possible to separate total effects into different component causes. Alternative causal schemes emphasize different aspects of a problem, reflecting the distinct goals, interests and biases of different perspectives. For example, group selection is a particular causal scheme with certain advantages and significant limitations. Ultimately, to use kin selection theory to analyse natural patterns and to understand the history of debates over different approaches, one must follow the underlying history of causal analysis. This article describes the history of kin selection theory, with emphasis on how the causal perspective improved through the study of key patterns of natural history, such as dispersal and sex ratio, and through a unified approach to demographic and social processes. Independent historical developments in the multivariate analysis of quantitative traits merged with the causal analysis of social evolution by kin selection. © 2013 The Author. Journal of Evolutionary Biology © 2013 European Society For Evolutionary Biology.
Grosse Frie, Kirstin; Janssen, Christian
2009-01-01
Based on the theoretical and empirical approach of Pierre Bourdieu, a multivariate non-linear method is introduced as an alternative way to analyse the complex relationships between social determinants and health. The analysis is based on face-to-face interviews with 695 randomly selected respondents aged 30 to 59. Variables regarding socio-economic status, life circumstances, lifestyles, health-related behaviour and health were chosen for the analysis. In order to determine whether the respondents can be differentiated and described based on these variables, a non-linear canonical correlation analysis (OVERALS) was performed. The results can be described on three dimensions; the eigenvalues add up to a fit of 1.444, which can be interpreted as approximately 50% of explained variance. The three-dimensional space illustrates correspondences between variables and provides a framework for interpretation based on latent dimensions, which can be described by age, education, income and gender. Using non-linear canonical correlation analysis, health characteristics can be analysed in conjunction with socio-economic conditions and lifestyles. Based on Bourdieu's theoretical approach, the complex correlations between these variables can be more substantially interpreted and presented.
Risk factors for parastomal hernia in Japanese patients with permanent colostomy.
Funahashi, Kimihiko; Suzuki, Takayuki; Nagashima, Yasuo; Matsuda, Satoshi; Koike, Junichi; Shiokawa, Hiroyuki; Ushigome, Mitsunori; Arai, Kenichiro; Kaneko, Tomoaki; Kurihara, Akiharu; Kaneko, Hironori
2014-08-01
Although the definitive risk factors for parastomal hernia development remain unclear, potential contributing factors have been reported from Western countries. The aim of this study was to identify the risk factors for parastomal hernia in Japanese patients with permanent colostomies. All patients who received abdominoperineal resection or total pelvic exenteration at our institution between December 2004 and December 2011 were reviewed. Patient-related, operation-related and postoperative variables were evaluated, in both univariate and multivariate analyses, to identify the risk factors for parastomal hernia formation. Of the 80 patients who underwent colostomy, 22 (27.5 %) developed a parastomal hernia during a median follow-up period of 953 days (range 15-2792 days). Hernia development was significantly associated with increasing patient age and body mass index, a laparoscopic surgical approach and the transperitoneal route of colostomy formation. In the multivariate analysis, the body mass index (p = 0.022), the laparoscopic approach (p = 0.043) and transperitoneal stoma creation (p = 0.021) retained statistical significance. Our findings in Japanese ostomates match those from Western countries: a higher body mass index, the use of a laparoscopic approach and a transperitoneal colostomy are significant independent risk factors for parastomal hernia formation. The precise role of the stoma creation route remains unclear.
A hybrid PCA-CART-MARS-based prognostic approach of the remaining useful life for aircraft engines.
Sánchez Lasheras, Fernando; García Nieto, Paulino José; de Cos Juez, Francisco Javier; Mayo Bayón, Ricardo; González Suárez, Victor Manuel
2015-03-23
Prognostics is an engineering discipline that predicts the future health of a system. In this research work, a data-driven approach for prognostics is proposed. Indeed, the present paper describes a data-driven hybrid model for the successful prediction of the remaining useful life of aircraft engines. The approach combines the multivariate adaptive regression splines (MARS) technique with the principal component analysis (PCA), dendrograms and classification and regression trees (CARTs). Elements extracted from sensor signals are used to train this hybrid model, representing different levels of health for aircraft engines. In this way, this hybrid algorithm is used to predict the trends of these elements. Based on this fitting, one can determine the future health state of a system and estimate its remaining useful life (RUL) with accuracy. To evaluate the proposed approach, a test was carried out using aircraft engine signals collected from physical sensors (temperature, pressure, speed, fuel flow, etc.). Simulation results show that the PCA-CART-MARS-based approach can forecast faults long before they occur and can predict the RUL. The proposed hybrid model presents as its main advantage the fact that it does not require information about the previous operation states of the input variables of the engine. The performance of this model was compared with those obtained by other benchmark models (multivariate linear regression and artificial neural networks) also applied in recent years for the modeling of remaining useful life. Therefore, the PCA-CART-MARS-based approach is very promising in the field of prognostics of the RUL for aircraft engines.
A Hybrid PCA-CART-MARS-Based Prognostic Approach of the Remaining Useful Life for Aircraft Engines
Lasheras, Fernando Sánchez; Nieto, Paulino José García; de Cos Juez, Francisco Javier; Bayón, Ricardo Mayo; Suárez, Victor Manuel González
2015-01-01
Prognostics is an engineering discipline that predicts the future health of a system. In this research work, a data-driven approach for prognostics is proposed. Indeed, the present paper describes a data-driven hybrid model for the successful prediction of the remaining useful life of aircraft engines. The approach combines the multivariate adaptive regression splines (MARS) technique with the principal component analysis (PCA), dendrograms and classification and regression trees (CARTs). Elements extracted from sensor signals are used to train this hybrid model, representing different levels of health for aircraft engines. In this way, this hybrid algorithm is used to predict the trends of these elements. Based on this fitting, one can determine the future health state of a system and estimate its remaining useful life (RUL) with accuracy. To evaluate the proposed approach, a test was carried out using aircraft engine signals collected from physical sensors (temperature, pressure, speed, fuel flow, etc.). Simulation results show that the PCA-CART-MARS-based approach can forecast faults long before they occur and can predict the RUL. The proposed hybrid model presents as its main advantage the fact that it does not require information about the previous operation states of the input variables of the engine. The performance of this model was compared with those obtained by other benchmark models (multivariate linear regression and artificial neural networks) also applied in recent years for the modeling of remaining useful life. Therefore, the PCA-CART-MARS-based approach is very promising in the field of prognostics of the RUL for aircraft engines. PMID:25806876
Science Learning Outcomes in Alignment with Learning Environment Preferences
NASA Astrophysics Data System (ADS)
Chang, Chun-Yen; Hsiao, Chien-Hua; Chang, Yueh-Hsia
2011-04-01
This study investigated students' learning environment preferences and compared the relative effectiveness of instructional approaches on students' learning outcomes in achievement and attitude among 10th grade earth science classes in Taiwan. Data collection instruments include the Earth Science Classroom Learning Environment Inventory and Earth Science Learning Outcomes Inventory. The results showed that most students preferred learning in a classroom environment where student-centered and teacher-centered instructional approaches coexisted over a teacher-centered learning environment. A multivariate analysis of covariance also revealed that the STBIM students' cognitive achievement and attitude toward earth science were enhanced when the learning environment was congruent with their learning environment preference.
Kohara, Norihito; Kaneko, Masayuki; Narukawa, Mamoru
2018-01-01
The concept of the risk-based approach has been introduced as an effort to secure the quality of clinical trials. In the risk-based approach, identification and evaluation of risk in advance are considered important. For recently completed clinical trials, we investigated the relationship between study characteristics and protocol deviations leading to the exclusion of subjects from Per Protocol Set (PPS) efficacy analysis. New drugs approved in Japan in the fiscal year 2014-2015 were targeted in the research. The reasons for excluding subjects from the PPS efficacy analysis were described in 102 trials out of 492 in the summary of new drug application documents, which was publicly disclosed after the drug's regulatory approval. The author extracted these reasons along with the numbers of the cases and the study characteristics of each clinical trial. Then, direct comparison, univariate regression analysis, and multivariate regression analysis were carried out based on the exclusion rate. The study characteristics for which exclusion of subjects from the PPS efficacy analysis was frequently observed were: multiregional clinical trials (study region); inhalant and external use (administration route); and Anti-infectives for systemic use, Respiratory system, Dermatologicals, and Nervous system (therapeutic drug class under the Anatomical Therapeutic Chemical Classification). In the multivariate regression analysis, the clinical trial variables of inhalant, Respiratory system, or Dermatologicals were selected as study characteristics leading to a higher exclusion rate. The results suggest characteristics of clinical trials that are likely to cause protocol deviations affecting the efficacy analysis. These studies should be considered for specific attention and priority observation in the trial protocol or its monitoring plan and execution, with specific risk-alleviating measures such as a clear description of inclusion/exclusion criteria in the protocol and the development of training materials for site staff and/or trial subjects.
LSST Astroinformatics And Astrostatistics: Data-oriented Astronomical Research
NASA Astrophysics Data System (ADS)
Borne, Kirk D.; Stassun, K.; Brunner, R. J.; Djorgovski, S. G.; Graham, M.; Hakkila, J.; Mahabal, A.; Paegert, M.; Pesenson, M.; Ptak, A.; Scargle, J.; Informatics, LSST; Statistics Team
2011-01-01
The LSST Informatics and Statistics Science Collaboration (ISSC) focuses on research and scientific discovery challenges posed by the very large and complex data collection that LSST will generate. Application areas include astroinformatics, machine learning, data mining, astrostatistics, visualization, scientific data semantics, time series analysis, and advanced signal processing. Research problems to be addressed with these methodologies include transient event characterization and classification, rare class discovery, correlation mining, outlier/anomaly/surprise detection, improved estimators (e.g., for photometric redshift or early onset supernova classification), exploration of highly dimensional (multivariate) data catalogs, and more. We present sample science results from these data-oriented approaches to large-data astronomical research. We present results from LSST ISSC team members, including the EB (Eclipsing Binary) Factory, the environmental variations in the fundamental plane of elliptical galaxies, and outlier detection in multivariate catalogs.
The role of middle-class status in payday loan borrowing: a multivariate approach.
Lim, Younghee; Bickham, Trey; Broussard, Julia; Dinecola, Cassie M; Gregory, Alethia; Weber, Brittany E
2014-10-01
Payday loans refer to small-dollar, high-interest, short-term loans usually extended to lower-income consumers. Despite much research to the contrary, the payday loan industry asserts that it primarily serves middle-class Americans. This article discusses the authors' investigation of the industry's claim, by analyzing data from a U.S. bankruptcy court serving a Southern district. Results of the multivariate binary logistic regression analysis showed that, controlling for various sociodemographic and economic variables, two middle-class indicators--home-ownership and annual income at or greater than the median income--are associated with a decreased likelihood of using payday loans. The article concludes with a discussion of the implications of the results for social work practice and advocacy in regard to financial capability, particularly asset development, income maintenance, and payday loan regulation.
A conditional Granger causality model approach for group analysis in functional MRI
Zhou, Zhenyu; Wang, Xunheng; Klahr, Nelson J.; Liu, Wei; Arias, Diana; Liu, Hongzhi; von Deneen, Karen M.; Wen, Ying; Lu, Zuhong; Xu, Dongrong; Liu, Yijun
2011-01-01
The Granger causality model (GCM) derived from multivariate vector autoregressive models of data has been employed to identify effective connectivity in the human brain with functional MR imaging (fMRI) and to reveal complex temporal and spatial dynamics underlying a variety of cognitive processes. In the most recent fMRI effective connectivity measures, pairwise GCM has commonly been applied based on single voxel values or average values from special brain areas at the group level. Although a few novel conditional GCM methods have been proposed to quantify the connections between brain areas, our study is the first to propose a viable standardized approach for group analysis of fMRI data with GCM. To compare the effectiveness of our approach with traditional pairwise GCM models, we applied a well-established conditional GCM to pre-selected time series of brain regions resulting from general linear model (GLM) and group spatial kernel independent component analysis (ICA) of an fMRI dataset in the temporal domain. Datasets consisting of one task-related and one resting-state fMRI were used to investigate connections among brain areas with the conditional GCM method. With the GLM-detected brain activation regions in the emotion-related cortex during the block design paradigm, the conditional GCM method was proposed to study the causality of the habituation between the left amygdala and pregenual cingulate cortex during emotion processing. For the resting-state dataset, it is possible to calculate not only the effective connectivity between networks but also the heterogeneity within a single network. Our results have further shown a particular interacting pattern of the default mode network (DMN) that can be characterized as both afferent and efferent influences on the medial prefrontal cortex (mPFC) and posterior cingulate cortex (PCC).
These results suggest that the conditional GCM approach based on a linear multivariate vector autoregressive (MVAR) model can achieve greater accuracy in detecting network connectivity than the widely used pairwise GCM, and this group analysis methodology can be quite useful to extend the information obtainable in fMRI. PMID:21232892
Cell nuclei and cytoplasm joint segmentation using the sliding band filter.
Quelhas, Pedro; Marcuzzo, Monica; Mendonça, Ana Maria; Campilho, Aurélio
2010-08-01
Microscopy cell image analysis is a fundamental tool for biological research. In particular, multivariate fluorescence microscopy is used to observe different aspects of cells in cultures. It is still common practice to perform analysis tasks by visual inspection of individual cells which is time consuming, exhausting and prone to induce subjective bias. This makes automatic cell image analysis essential for large scale, objective studies of cell cultures. Traditionally the task of automatic cell analysis is approached through the use of image segmentation methods for extraction of cells' locations and shapes. Image segmentation, although fundamental, is neither an easy task in computer vision nor is it robust to image quality changes. This makes image segmentation for cell detection semi-automated requiring frequent tuning of parameters. We introduce a new approach for cell detection and shape estimation in multivariate images based on the sliding band filter (SBF). This filter's design makes it adequate to detect overall convex shapes and as such it performs well for cell detection. Furthermore, the parameters involved are intuitive as they are directly related to the expected cell size. Using the SBF filter we detect cells' nucleus and cytoplasm location and shapes. Based on the assumption that each cell has the same approximate shape center in both nuclei and cytoplasm fluorescence channels, we guide cytoplasm shape estimation by the nuclear detections improving performance and reducing errors. Then we validate cell detection by gathering evidence from nuclei and cytoplasm channels. Additionally, we include overlap correction and shape regularization steps which further improve the estimated cell shapes. 
The approach is evaluated using two datasets with different types of data: a 20-image benchmark set of simulated cell culture images, containing 1000 simulated cells; and a 16-image Drosophila melanogaster Kc167 dataset containing 1255 cells, stained for DNA and actin. Both image datasets present a difficult problem due to the high variability of cell shapes and frequent cluster overlap between cells. On the Drosophila dataset our approach achieved a precision/recall of 95%/69% and 82%/90% for nuclei and cytoplasm detection, respectively, and an overall accuracy of 76%.
A time domain frequency-selective multivariate Granger causality approach.
Leistritz, Lutz; Witte, Herbert
2016-08-01
The investigation of effective connectivity is one of the major topics in computational neuroscience to understand the interaction between spatially distributed neuronal units of the brain. Thus, a wide variety of methods has been developed during the last decades to investigate functional and effective connectivity in multivariate systems. Their spectrum ranges from model-based to model-free approaches with a clear separation into time and frequency range methods. We present in this simulation study a novel time domain approach based on Granger's principle of predictability, which allows frequency-selective considerations of directed interactions. It is based on a comparison of prediction errors of multivariate autoregressive models fitted to systematically modified time series. These modifications are based on signal decompositions, which enable a targeted cancellation of specific signal components with specific spectral properties. Depending on the embedded signal decomposition method, a frequency-selective or data-driven signal-adaptive Granger Causality Index may be derived.
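The core idea above, comparing prediction errors of autoregressive models, can be sketched for the plain bivariate time-domain case. This is not the frequency-selective method of the study, which additionally modifies the time series through signal decompositions; the function names, the model order, and the simulated signals are assumptions for illustration.

```python
import numpy as np

def residual_variance(target, predictors):
    """Variance of the least-squares residuals of target regressed on predictors."""
    coef, *_ = np.linalg.lstsq(predictors, target, rcond=None)
    return np.var(target - predictors @ coef)

def granger_index(source, target, order=2):
    """ln(restricted / full prediction-error variance) for source -> target."""
    n = len(target)
    y = target[order:]
    # Column k holds the series delayed by k samples.
    own = np.column_stack([target[order - k:n - k] for k in range(1, order + 1)])
    other = np.column_stack([source[order - k:n - k] for k in range(1, order + 1)])
    ones = np.ones((n - order, 1))
    restricted = np.hstack([ones, own])      # target's own past only
    full = np.hstack([ones, own, other])     # plus the source's past
    return np.log(residual_variance(y, restricted) / residual_variance(y, full))

# Simulated example in which x drives y with a one-sample delay.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.standard_normal(499)

x_to_y = granger_index(x, y)
y_to_x = granger_index(y, x)
```

In this simulation the index for x to y should be clearly larger than the index for y to x, since only the past of x improves the prediction of y. A frequency-selective variant would, as the abstract describes, apply the same comparison to series from which specific spectral components have been removed.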
Shen, Yanna; Cooper, Gregory F
2012-09-01
This paper investigates Bayesian modeling of known and unknown causes of events in the context of disease-outbreak detection. We introduce a multivariate Bayesian approach that models multiple evidential features of every person in the population. This approach models and detects (1) known diseases (e.g., influenza and anthrax) by using informative prior probabilities and (2) unknown diseases (e.g., a new, highly contagious respiratory virus that has never been seen before) by using relatively non-informative prior probabilities. We report the results of simulation experiments which support that this modeling method can improve the detection of new disease outbreaks in a population. A contribution of this paper is that it introduces a multivariate Bayesian approach for jointly modeling both known and unknown causes of events. Such modeling has general applicability in domains where the space of known causes is incomplete. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Chemometrics-assisted chromatographic fingerprinting: An illicit methamphetamine case study.
Shekari, Nafiseh; Vosough, Maryam; Tabar Heidar, Kourosh
2017-03-01
The volatile chemical constituents in complex mixtures can be analyzed using gas chromatography with mass spectrometry. This analysis allows the tentative identification of diverse impurities of an illicit methamphetamine sample. The acquired two-dimensional data of liquid-liquid extraction was resolved by multivariate curve resolution alternating least squares to elucidate the embedded peaks effectively. This is the first report on the application of a curve resolution approach for chromatogram fingerprinting to identify particularly the embedded impurities of a drug of abuse. Indeed, the strong and broad peak of methamphetamine makes identifying the underlying peaks problematic and even impossible. Mathematical separation instead of conventional chromatographic approaches was performed in a way that trace components embedded in the methamphetamine peak were successfully resolved. Comprehensive analysis of the chromatogram, using multivariate curve resolution, resulted in elution profiles and mass spectra for each pure compound. Impurities such as benzaldehyde, benzyl alcohol, benzene, propenyl methyl ketone, benzyl methyl ketone, amphetamine, N-benzyl-2-methylaziridine, phenethylamine, N,N,α-trimethylamine, phenethylamine, N,α,α-trimethylmethamphetamine, N-acetylmethamphetamine, N-formylmethamphetamine, and other chemicals were identified. A route-specific impurity, N-benzyl-2-methylaziridine, indicating a synthesis route based on ephedrine/pseudoephedrine was identified. Moreover, this is the first report on the detection of impurities such as phenethylamine, N,α,α-trimethylamine (a structurally related impurity), and clonitazene (as an adulterant) in an illicit methamphetamine sample. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
3D tooth microwear texture analysis in fishes as a test of dietary hypotheses of durophagy
NASA Astrophysics Data System (ADS)
Purnell, Mark A.; Darras, Laurent P. G.
2016-03-01
An understanding of how extinct animals functioned underpins our understanding of past evolutionary events, including adaptive radiations, and the role of functional innovation and adaptation as drivers of both micro- and macroevolution. Yet analysis of function in extinct animals is fraught with difficulty. Hypotheses that interpret molariform teeth in fishes as evidence of durophagous (shell-crushing) diets provide a good example of the particular problems inherent in the methods of functional morphology. This is because the assumed close coupling of form and function upon which the approach is based is weakened by, among other things, behavioural flexibility and the absence of a clear one to one relationship between structures and functions. Here we show that ISO 25178-2 standard parameters for surface texture, derived from analysis of worn surfaces of molariform teeth of fishes, vary significantly between species that differ in the amount of hard-shelled prey they consume. Two populations of the Sheepshead Seabream (Archosargus probatocephalus) were studied. This fish is not a dietary specialist, and one of the populations is known to consume more vegetation and less hard-shelled prey than the other; this is reflected in significant differences in their microwear textures. The Archosargus populations differ significantly in their microwear from the specialist shell-crusher Anarhichas lupus (the Atlantic Wolffish). Multivariate analysis of these three groups of fishes lends further support to the relationship between diet and tooth microwear, and provides robust validation of the approach. Application of the multivariate models derived from microwear texture in Archosargus and Anarhichas to a third fish species—the cichlid Astatoreochromis alluaudi—successfully separates wild caught fish that ate hard-shelled prey from lab-raised fish that did not. 
This cross-taxon validation demonstrates that quantitative analysis of tooth microwear texture can differentiate between fishes with different diets even when they range widely in size, habitat, and in the structure of their trophic apparatus. The approach thus has great potential as an additional tool for dietary analysis in extant fishes, and for testing dietary hypotheses in ancient and extinct species.
Chen, Gang; Adleman, Nancy E; Saad, Ziad S; Leibenluft, Ellen; Cox, Robert W
2014-10-01
All neuroimaging packages can handle group analysis with t-tests or general linear modeling (GLM). However, they are quite hamstrung when there are multiple within-subject factors or when quantitative covariates are involved in the presence of a within-subject factor. In addition, sphericity is typically assumed for the variance-covariance structure when there are more than two levels in a within-subject factor. To overcome such limitations in the traditional AN(C)OVA and GLM, we adopt a multivariate modeling (MVM) approach to analyzing neuroimaging data at the group level with the following advantages: a) there is no limit on the number of factors as long as sample sizes are deemed appropriate; b) quantitative covariates can be analyzed together with within-subject factors; c) when a within-subject factor is involved, three testing methodologies are provided: traditional univariate testing (UVT) with sphericity assumption (UVT-UC) and with correction when the assumption is violated (UVT-SC), and within-subject multivariate testing (MVT-WS); d) to correct for sphericity violation at the voxel level, we propose a hybrid testing (HT) approach that achieves equal or higher power via combining traditional sphericity correction methods (Greenhouse-Geisser and Huynh-Feldt) with MVT-WS. To validate the MVM methodology, we performed simulations to assess the controllability for false positives and power achievement. A real FMRI dataset was analyzed to demonstrate the capability of the MVM approach. The methodology has been implemented into an open source program 3dMVM in AFNI, and all the statistical tests can be performed through symbolic coding with variable names instead of the tedious process of dummy coding. Our data indicates that the severity of sphericity violation varies substantially across brain regions. 
The differences among various modeling methodologies were addressed through direct comparisons between the MVM approach and some of the GLM implementations in the field, and the following two issues were raised: a) the improper formulation of test statistics in some univariate GLM implementations when a within-subject factor is involved in a data structure with two or more factors, and b) the unjustified presumption of uniform sphericity violation and the practice of estimating the variance-covariance structure through pooling across brain regions. Published by Elsevier Inc.
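The sphericity correction mentioned in this abstract can be illustrated with the Greenhouse-Geisser epsilon. The block below is a minimal sketch of the textbook formula only, not the 3dMVM implementation; the function names and the example scores are hypothetical. Epsilon equals 1 when sphericity holds and falls toward 1/(k-1) as the violation becomes more severe, where k is the number of levels of the within-subject factor.

```python
def covariance_matrix(data):
    """Sample covariance of the k within-subject levels (rows are subjects)."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


def greenhouse_geisser_epsilon(cov):
    """Greenhouse-Geisser epsilon from a k x k covariance matrix."""
    k = len(cov)
    row_means = [sum(row) / k for row in cov]
    grand_mean = sum(row_means) / k
    # Double-center the (symmetric) covariance matrix.
    centered = [
        [cov[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centered[i][i] for i in range(k))
    sum_of_squares = sum(value * value for row in centered for value in row)
    return trace ** 2 / ((k - 1) * sum_of_squares)


# Hypothetical scores: 4 subjects measured at 3 levels of one factor.
scores = [[1.0, 2.0, 4.0], [2.0, 2.5, 3.0], [0.5, 3.0, 3.5], [1.5, 1.0, 5.0]]
print(greenhouse_geisser_epsilon(covariance_matrix(scores)))
```

In a corrected univariate test, both degrees of freedom of the F statistic would be multiplied by this epsilon; computing it per voxel rather than pooling across regions is what the abstract argues for.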
Ng, Andrea K.; Dabaja, Bouthaina S.; Milgrom, Sarah A.; Gunther, Jillian R.; Fuller, C. David; Smith, Grace L.; Abou Yehia, Zeinab; Qiao, Wei; Wogan, Christine F.; Akhtari, Mani; Mawlawi, Osama; Medeiros, L. Jeffrey; Chuang, Hubert H.; Martin-Doyle, William; Armand, Philippe; LaCasce, Ann S.; Oki, Yasuhiro; Fanale, Michelle; Westin, Jason; Neelapu, Sattva; Nastoupil, Loretta
2018-01-01
Dose-adjusted rituximab plus etoposide, prednisone, vincristine, cyclophosphamide, and doxorubicin (DA-R-EPOCH) has produced good outcomes in primary mediastinal B-cell lymphoma (PMBCL), but predictors of resistance to this treatment are unclear. We investigated whether [18F]fluorodeoxyglucose positron emission tomography–computed tomography (PET-CT) findings could identify patients with PMBCL who would not respond completely to DA-R-EPOCH. We performed a retrospective analysis of 65 patients with newly diagnosed stage I to IV PMBCL treated at 2 tertiary cancer centers who had PET-CT scans available before and after frontline therapy with DA-R-EPOCH. Pretreatment variables assessed included metabolic tumor volume (MTV) and total lesion glycolysis (TLG). Optimal cutoff points for progression-free survival (PFS) were determined by a machine learning approach. Univariate and multivariable models were constructed to assess associations between radiographic variables and PFS. At a median follow-up of 36.6 months (95% confidence interval, 28.1-45.1), 2-year PFS and overall survival rates for the 65 patients were 81.4% and 98.4%, respectively. Machine learning–derived thresholds for baseline MTV and TLG were associated with inferior PFS (elevated MTV: hazard ratio [HR], 11.5; P = .019; elevated TLG: HR, 8.99; P = .005); other pretreatment clinical factors, including International Prognostic Index and bulky (>10 cm) disease, were not. On multivariable analysis, only TLG retained statistical significance (P = .049). Univariate analysis of posttreatment variables revealed that residual CT tumor volume, maximum standardized uptake value, and Deauville score were associated with PFS; a Deauville score of 5 remained significant on multivariable analysis (P = .006). A model combining baseline TLG and end-of-therapy Deauville score identified patients at increased risk of progression. PMID:29895624
Sato, Masashi; Yamashita, Okito; Sato, Masa-Aki; Miyawaki, Yoichi
2018-01-01
To understand information representation in human brain activity, it is important to investigate its fine spatial patterns at high temporal resolution. One possible approach is to use source estimation of magnetoencephalography (MEG) signals. Previous studies have mainly quantified accuracy of this technique according to positional deviations and dispersion of estimated sources, but it remains unclear how accurately MEG source estimation restores information content represented by spatial patterns of brain activity. In this study, using simulated MEG signals representing artificial experimental conditions, we performed MEG source estimation and multivariate pattern analysis to examine whether MEG source estimation can restore information content represented by patterns of cortical current in source brain areas. Classification analysis revealed that the corresponding artificial experimental conditions were predicted accurately from patterns of cortical current estimated in the source brain areas. However, accurate predictions were also possible from brain areas whose original sources were not defined. Searchlight decoding further revealed that this unexpected prediction was possible across wide brain areas beyond the original source locations, indicating that information contained in the original sources can spread through MEG source estimation. This phenomenon of "information spreading" may easily lead to false-positive interpretations when MEG source estimation and classification analysis are combined to identify brain areas that represent target information. Real MEG data analyses also showed that presented stimuli were able to be predicted in the higher visual cortex at the same latency as in the primary visual cortex, also suggesting that information spreading took place. These results indicate that careful inspection is necessary to avoid false-positive interpretations when MEG source estimation and multivariate pattern analysis are combined.
Soliman, Essam S; Moawed, Sherif A; Hassan, Rania A
2017-08-01
Bird litter contains unutilized nitrogen in the form of uric acid that is converted into ammonia, which not only affects poultry performance but also harms the health of people around the farm and contributes to environmental degradation. The influence of microclimatic ammonia emissions on Ross and Hubbard broilers reared in different housing systems in two consecutive seasons (fall and winter) was evaluated using a discriminant function analysis to differentiate between Ross and Hubbard breeds. A total of 400 air samples were collected and analyzed for ammonia levels during the experimental period. Data were analyzed using univariate and multivariate statistical methods. Ammonia levels were significantly higher (p<0.01) in the Ross compared to the Hubbard breed farm, although no significant differences (p>0.05) were found between the two farms in body weight, body weight gain, feed intake, feed conversion ratio, and performance index (PI) of broilers. Body weight, weight gain, and PI had increased values (p<0.01) during fall compared to winter irrespective of broiler breed. Ammonia emissions were positively (although weakly) correlated with the ambient relative humidity (r=0.383; p<0.01), but not with the ambient temperature (r=-0.045; p>0.05). The test of significance of the discriminant function analysis did not show a classification based on the studied traits, suggesting that they cannot be used as predictor variables. The percentage of correct classification was 52%, which improved to 57% after deletion of highly correlated traits. The study revealed that broilers' growth was negatively affected by increased microclimatic ammonia concentrations and recommended the analysis of broilers' growth performance parameters data using multivariate discriminant function analysis.
Soliman, Essam S.; Moawed, Sherif A.; Hassan, Rania A.
2017-01-01
Background and Aim: Bird litter contains unutilized nitrogen in the form of uric acid that is converted into ammonia, which not only affects poultry performance but also harms the health of people around the farm and contributes to environmental degradation. The influence of microclimatic ammonia emissions on Ross and Hubbard broilers reared in different housing systems in two consecutive seasons (fall and winter) was evaluated using a discriminant function analysis to differentiate between Ross and Hubbard breeds. Materials and Methods: A total of 400 air samples were collected and analyzed for ammonia levels during the experimental period. Data were analyzed using univariate and multivariate statistical methods. Results: Ammonia levels were significantly higher (p<0.01) in the Ross compared to the Hubbard breed farm, although no significant differences (p>0.05) were found between the two farms in body weight, body weight gain, feed intake, feed conversion ratio, and performance index (PI) of broilers. Body weight, weight gain, and PI had increased values (p<0.01) during fall compared to winter irrespective of broiler breed. Ammonia emissions were positively (although weakly) correlated with the ambient relative humidity (r=0.383; p<0.01), but not with the ambient temperature (r=−0.045; p>0.05). The test of significance of the discriminant function analysis did not show a classification based on the studied traits, suggesting that they cannot be used as predictor variables. The percentage of correct classification was 52%, which improved to 57% after deletion of highly correlated traits. Conclusion: The study revealed that broilers’ growth was negatively affected by increased microclimatic ammonia concentrations and recommended the analysis of broilers’ growth performance parameters data using multivariate discriminant function analysis. PMID:28919677
Pinnix, Chelsea C; Ng, Andrea K; Dabaja, Bouthaina S; Milgrom, Sarah A; Gunther, Jillian R; Fuller, C David; Smith, Grace L; Abou Yehia, Zeinab; Qiao, Wei; Wogan, Christine F; Akhtari, Mani; Mawlawi, Osama; Medeiros, L Jeffrey; Chuang, Hubert H; Martin-Doyle, William; Armand, Philippe; LaCasce, Ann S; Oki, Yasuhiro; Fanale, Michelle; Westin, Jason; Neelapu, Sattva; Nastoupil, Loretta
2018-06-12
Dose-adjusted rituximab plus etoposide, prednisone, vincristine, cyclophosphamide, and doxorubicin (DA-R-EPOCH) has produced good outcomes in primary mediastinal B-cell lymphoma (PMBCL), but predictors of resistance to this treatment are unclear. We investigated whether [18F]fluorodeoxyglucose positron emission tomography-computed tomography (PET-CT) findings could identify patients with PMBCL who would not respond completely to DA-R-EPOCH. We performed a retrospective analysis of 65 patients with newly diagnosed stage I to IV PMBCL treated at 2 tertiary cancer centers who had PET-CT scans available before and after frontline therapy with DA-R-EPOCH. Pretreatment variables assessed included metabolic tumor volume (MTV) and total lesion glycolysis (TLG). Optimal cutoff points for progression-free survival (PFS) were determined by a machine learning approach. Univariate and multivariable models were constructed to assess associations between radiographic variables and PFS. At a median follow-up of 36.6 months (95% confidence interval, 28.1-45.1), 2-year PFS and overall survival rates for the 65 patients were 81.4% and 98.4%, respectively. Machine learning-derived thresholds for baseline MTV and TLG were associated with inferior PFS (elevated MTV: hazard ratio [HR], 11.5; P = .019; elevated TLG: HR, 8.99; P = .005); other pretreatment clinical factors, including International Prognostic Index and bulky (>10 cm) disease, were not. On multivariable analysis, only TLG retained statistical significance (P = .049). Univariate analysis of posttreatment variables revealed that residual CT tumor volume, maximum standardized uptake value, and Deauville score were associated with PFS; a Deauville score of 5 remained significant on multivariable analysis (P = .006). A model combining baseline TLG and end-of-therapy Deauville score identified patients at increased risk of progression. © 2018 by The American Society of Hematology.
Sato, Masashi; Yamashita, Okito; Sato, Masa-aki
2018-01-01
To understand information representation in human brain activity, it is important to investigate its fine spatial patterns at high temporal resolution. One possible approach is to use source estimation of magnetoencephalography (MEG) signals. Previous studies have mainly quantified accuracy of this technique according to positional deviations and dispersion of estimated sources, but it remains unclear how accurately MEG source estimation restores information content represented by spatial patterns of brain activity. In this study, using simulated MEG signals representing artificial experimental conditions, we performed MEG source estimation and multivariate pattern analysis to examine whether MEG source estimation can restore information content represented by patterns of cortical current in source brain areas. Classification analysis revealed that the corresponding artificial experimental conditions were predicted accurately from patterns of cortical current estimated in the source brain areas. However, accurate predictions were also possible from brain areas whose original sources were not defined. Searchlight decoding further revealed that this unexpected prediction was possible across wide brain areas beyond the original source locations, indicating that information contained in the original sources can spread through MEG source estimation. This phenomenon of “information spreading” may easily lead to false-positive interpretations when MEG source estimation and classification analysis are combined to identify brain areas that represent target information. Real MEG data analyses also showed that presented stimuli were able to be predicted in the higher visual cortex at the same latency as in the primary visual cortex, also suggesting that information spreading took place. These results indicate that careful inspection is necessary to avoid false-positive interpretations when MEG source estimation and multivariate pattern analysis are combined. PMID:29912968
Feng, Xiao-Liang; He, Yun-biao; Liang, Yi-Zeng; Wang, Yu-Lin; Huang, Lan-Fang; Xie, Jian-Wei
2013-01-01
Gas chromatography-mass spectrometry and multivariate curve resolution were applied to the differential analysis of the volatile components in Agrimonia eupatoria specimens from different plant parts. After extraction by water distillation, the volatile components in Agrimonia eupatoria leaves and roots were detected by GC-MS. The qualitative and quantitative analysis of the volatile components in the main root of Agrimonia eupatoria was then completed with the help of subwindow factor analysis, which resolves the two-dimensional original data into mass spectra and chromatograms. Of the 87 separated constituents in the total ion chromatogram of the volatile components, 68 were identified and quantified, accounting for about 87.03% of the total content. The common peaks in the leaf were then extracted with the orthogonal projection resolution method. Among the components determined, 52 coexisted in the studied samples, although the relative content of each component differed to some extent. The results showed a fair consistency in their GC-MS fingerprints. This was the first application of the orthogonal projection method to compare different plant parts of Agrimonia eupatoria, and it reduced both the burden and the subjectivity of qualitative analysis. The obtained results proved the combined approach powerful for the analysis of complex Agrimonia eupatoria samples. The developed method can be used for further study and quality control of Agrimonia eupatoria. PMID:24286016
Feng, Xiao-Liang; He, Yun-Biao; Liang, Yi-Zeng; Wang, Yu-Lin; Huang, Lan-Fang; Xie, Jian-Wei
2013-01-01
Gas chromatography-mass spectrometry and multivariate curve resolution were applied to the differential analysis of the volatile components in Agrimonia eupatoria specimens from different plant parts. After extraction by water distillation, the volatile components in Agrimonia eupatoria leaves and roots were detected by GC-MS. The qualitative and quantitative analysis of the volatile components in the main root of Agrimonia eupatoria was then completed with the help of subwindow factor analysis, which resolves the two-dimensional original data into mass spectra and chromatograms. Of the 87 separated constituents in the total ion chromatogram of the volatile components, 68 were identified and quantified, accounting for about 87.03% of the total content. The common peaks in the leaf were then extracted with the orthogonal projection resolution method. Among the components determined, 52 coexisted in the studied samples, although the relative content of each component differed to some extent. The results showed a fair consistency in their GC-MS fingerprints. This was the first application of the orthogonal projection method to compare different plant parts of Agrimonia eupatoria, and it reduced both the burden and the subjectivity of qualitative analysis. The obtained results proved the combined approach powerful for the analysis of complex Agrimonia eupatoria samples. The developed method can be used for further study and quality control of Agrimonia eupatoria.
Marino, S R; Lin, S; Maiers, M; Haagenson, M; Spellman, S; Klein, J P; Binkowski, T A; Lee, S J; van Besien, K
2012-02-01
The identification of important amino acid substitutions associated with low survival in hematopoietic cell transplantation (HCT) is hampered by the large number of observed substitutions compared with the small number of patients available for analysis. Random forest analysis is designed to address these limitations. We studied 2107 HCT recipients with good or intermediate risk hematological malignancies to identify HLA class I amino acid substitutions associated with reduced survival at day 100 post transplant. Random forest analysis and traditional univariate and multivariate analyses were used. Random forest analysis identified amino acid substitutions in 33 positions that were associated with reduced 100-day survival, including HLA-A 9, 43, 62, 63, 76, 77, 95, 97, 114, 116, 152, 156, 166 and 167; HLA-B 97, 109, 116 and 156; and HLA-C 6, 9, 11, 14, 21, 66, 77, 80, 95, 97, 99, 116, 156, 163 and 173. In all, 13 had been previously reported by other investigators using classical biostatistical approaches. Using the same data set, traditional multivariate logistic regression identified only five amino acid substitutions associated with lower day 100 survival. Random forest analysis is a novel statistical methodology for analysis of HLA mismatching and outcome studies, capable of identifying important amino acid substitutions missed by other methods.
Lê Cao, Kim-Anh; Boitard, Simon; Besse, Philippe
2011-06-22
Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits. A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework. sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.
Jiang, Xuejun; Guo, Xu; Zhang, Ning; Wang, Bo
2018-01-01
This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV. PMID:29672555
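The permutation approach described in this abstract can be sketched as follows. This is an illustrative simplification under stated assumptions, not the authors' exact statistics: the test statistic here is the unscaled squared norm of the componentwise two-sample Hodges-Lehmann shift estimates (the median of all pairwise differences in each coordinate), and all function names and data are hypothetical.

```python
import random
from statistics import median


def hl_shift(xs, ys):
    """Two-sample Hodges-Lehmann shift: median of all pairwise differences."""
    return median(x - y for x in xs for y in ys)


def hl_statistic(sample_a, sample_b):
    """Squared Euclidean norm of the componentwise Hodges-Lehmann shifts."""
    dims = len(sample_a[0])
    shifts = [
        hl_shift([row[d] for row in sample_a], [row[d] for row in sample_b])
        for d in range(dims)
    ]
    return sum(s * s for s in shifts)


def permutation_pvalue(sample_a, sample_b, n_perm=999, seed=0):
    """Permutation p-value obtained by reshuffling the group labels."""
    rng = random.Random(seed)
    observed = hl_statistic(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if hl_statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Count the observed labelling itself so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical two-dimensional outcomes for intervention and control groups.
control = [(float(i), 0.5 * i) for i in range(8)]
intervention = [(x + 10.0, y + 10.0) for x, y in control]
print(permutation_pvalue(intervention, control))
```

A bootstrap variant would resample within a pooled, recentred sample instead of shuffling labels; the abstract reports that the permutation version controls Type I error more stringently.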
Can texture analysis of tooth microwear detect within guild niche partitioning in extinct species?
NASA Astrophysics Data System (ADS)
Purnell, Mark; Nedza, Christopher; Rychlik, Leszek
2017-04-01
Recent work shows that tooth microwear analysis can be applied further back in time and deeper into the phylogenetic history of vertebrate clades than previously thought (e.g. niche partitioning in early Jurassic insectivorous mammals; Gill et al., 2014, Nature). Furthermore, quantitative approaches to analysis based on parameterization of surface roughness are increasing the robustness and repeatability of this widely used dietary proxy. Discriminating between taxa within dietary guilds has the potential to significantly increase our ability to determine resource use and partitioning in fossil vertebrates, but how sensitive is the technique? To address this question we analysed tooth microwear texture in sympatric populations of shrew species (Neomys fodiens, Neomys anomalus, Sorex araneus, Sorex minutus) from Białowieża Forest, Poland. These populations are known to exhibit varying degrees of niche partitioning (Churchfield & Rychlik, 2006, J. Zool.) with greatest overlap between the Neomys species. Sorex araneus also exhibits some niche overlap with N. anomalus, while S. minutus is the most specialised. Multivariate analysis based only on tooth microwear textures recovers the same pattern of niche partitioning. Our results also suggest that tooth textures track seasonal differences in diet. Projecting data from fossils into the multivariate dietary space defined using microwear from extant taxa demonstrates that the technique is capable of subtle dietary discrimination in extinct insectivores.
Ristivojević, Petar; Trifković, Jelena; Vovk, Irena; Milojković-Opsenica, Dušanka
2017-01-01
With the introduction of phytochemical fingerprint analysis as a method of screening complex natural products for the presence of the most bioactive compounds, the use of chemometric classification methods, the application of powerful scanning, image capturing and processing devices and algorithms, and advances in the development of novel stationary phases and various separation modalities, high-performance thin-layer chromatography (HPTLC) fingerprinting is becoming an attractive and fruitful field of separation science. Multivariate image analysis is crucial for proper data acquisition. In the current study, different image processing procedures were studied and compared in detail using HPTLC chromatograms of plant resins as an example. Variables such as the gray intensities of pixels along the solvent front, peak areas and mean peak values were used as input data, and the resulting classification models were compared to find the best one. Important steps in image analysis (baseline removal, denoising, target peak alignment and normalization) were pointed out. The numerical data set based on the mean values of selected bands and the intensities of pixels along the solvent front proved to be the most convenient for planar-chromatographic profiling, although it required at least basic knowledge of image processing methodology, and could be proposed for further investigation in HPTLC fingerprinting. Copyright © 2016 Elsevier B.V. All rights reserved.
Methods for spectral image analysis by exploiting spatial simplicity
Keenan, Michael R.
2010-05-25
Several full-spectrum imaging techniques have been introduced in recent years that promise to provide rapid and comprehensive chemical characterization of complex samples. One of the remaining obstacles to adopting these techniques for routine use is the difficulty of reducing the vast quantities of raw spectral data to meaningful chemical information. Multivariate factor analysis techniques, such as Principal Component Analysis and Alternating Least Squares-based Multivariate Curve Resolution, have proven effective for extracting the essential chemical information from high dimensional spectral image data sets into a limited number of components that describe the spectral characteristics and spatial distributions of the chemical species comprising the sample. There are many cases, however, in which those constraints are not effective and where alternative approaches may provide new analytical insights. For many cases of practical importance, imaged samples are "simple" in the sense that they consist of relatively discrete chemical phases. That is, at any given location, only one or a few of the chemical species comprising the entire sample have non-zero concentrations. The methods of spectral image analysis of the present invention exploit this simplicity in the spatial domain to make the resulting factor models more realistic. Therefore, more physically accurate and interpretable spectral and abundance components can be extracted from spectral images that have spatially simple structure.
Methods for spectral image analysis by exploiting spatial simplicity
Keenan, Michael R.
2010-11-23
Several full-spectrum imaging techniques have been introduced in recent years that promise to provide rapid and comprehensive chemical characterization of complex samples. One of the remaining obstacles to adopting these techniques for routine use is the difficulty of reducing the vast quantities of raw spectral data to meaningful chemical information. Multivariate factor analysis techniques, such as Principal Component Analysis and Alternating Least Squares-based Multivariate Curve Resolution, have proven effective for extracting the essential chemical information from high dimensional spectral image data sets into a limited number of components that describe the spectral characteristics and spatial distributions of the chemical species comprising the sample. There are many cases, however, in which those constraints are not effective and where alternative approaches may provide new analytical insights. For many cases of practical importance, imaged samples are "simple" in the sense that they consist of relatively discrete chemical phases. That is, at any given location, only one or a few of the chemical species comprising the entire sample have non-zero concentrations. The methods of spectral image analysis of the present invention exploit this simplicity in the spatial domain to make the resulting factor models more realistic. Therefore, more physically accurate and interpretable spectral and abundance components can be extracted from spectral images that have spatially simple structure.
Chandrasekaran, A; Ravisankar, R; Harikrishnan, N; Satapathy, K K; Prasad, M V R; Kanagasabapathy, K V
2015-02-25
Anthropogenic activities increase the accumulation of heavy metals in the soil environment. Soil pollution significantly reduces environmental quality and affects human health. In the present study, soil samples were collected at different locations of Yelagiri Hills, Tamilnadu, India for heavy metal analysis. The samples were analyzed for twelve selected heavy metals (Mg, Al, K, Ca, Ti, Fe, V, Cr, Mn, Co, Ni and Zn) using energy dispersive X-ray fluorescence (EDXRF) spectroscopy. Heavy metal concentrations in soil were investigated using the enrichment factor (EF), geo-accumulation index (Igeo), contamination factor (CF) and pollution load index (PLI) to determine metal accumulation, distribution and pollution status. Heavy metal toxicity risk was assessed using soil quality guidelines (SQGs) given by the target and intervention values of the Dutch soil standards. The concentrations of Ni, Co, Zn, Cr, Mn, Fe, Ti, K, Al and Mg were mainly controlled by natural sources. Multivariate statistical methods such as correlation matrix, principal component analysis and cluster analysis were applied for the identification of heavy metal sources (anthropogenic/natural origin). Geo-statistical methods such as kriging identified hot spots of metal contamination in road areas influenced mainly by the presence of natural rocks. Copyright © 2014 Elsevier B.V. All rights reserved.
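The four pollution indices named in this abstract have widely used textbook definitions, sketched below. The abstract does not state which background values or reference element the authors used, so the function names, the choice of a reference element, and all concentrations here are hypothetical.

```python
import math


def contamination_factor(concentration, background):
    """CF: measured concentration divided by the background concentration."""
    return concentration / background


def geoaccumulation_index(concentration, background):
    """Igeo = log2(C / (1.5 * B)); the factor 1.5 allows for natural variation."""
    return math.log2(concentration / (1.5 * background))


def pollution_load_index(contamination_factors):
    """PLI: geometric mean of the contamination factors of all metals."""
    n = len(contamination_factors)
    return math.prod(contamination_factors) ** (1.0 / n)


def enrichment_factor(metal, reference, metal_background, reference_background):
    """EF: metal-to-reference ratio in the sample over the same ratio in background."""
    return (metal / reference) / (metal_background / reference_background)


# Hypothetical concentrations (mg/kg) for a single soil sample.
sample = {"Cr": 120.0, "Ni": 40.0, "Zn": 95.0}
background = {"Cr": 90.0, "Ni": 68.0, "Zn": 95.0}
factors = [contamination_factor(sample[m], background[m]) for m in sample]
print(pollution_load_index(factors))
```

A PLI above 1 is conventionally read as indicating deterioration of site quality, and EF values close to 1 point to a predominantly natural (crustal) origin, which is the kind of conclusion the abstract draws for most of the measured metals.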
Extracting galactic structure parameters from multivariated density estimation
NASA Technical Reports Server (NTRS)
Chen, B.; Creze, M.; Robin, A.; Bienayme, O.
1992-01-01
Multivariate statistical analysis, including cluster analysis (unsupervised classification), discriminant analysis (supervised classification) and principal component analysis (a dimensionality reduction method), and nonparametric density estimation have been successfully used to search for meaningful associations in the 5-dimensional space of observables between observed points and the sets of simulated points generated from a synthetic approach to galaxy modelling. These methodologies can be applied as new tools to obtain information about hidden structure that is otherwise unrecognizable, and to place important constraints on the space distribution of various stellar populations in the Milky Way. In this paper, we concentrate on illustrating how to use nonparametric density estimation to substitute for the true densities of both the simulated sample and the real sample in the five-dimensional space. In order to fit model-predicted densities to reality, we derive a set of equations comprising n lines (where n is the total number of observed points) and m unknown parameters (where m is the number of predefined groups). A least-squares estimation allows us to determine the density law of different groups and components in the Galaxy. The output from our software, which can be used in many research fields, also gives the systematic error between the model and the observations by a Bayes rule.
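Nonparametric density estimation in a multidimensional space of observables can be illustrated with a kernel estimator. The abstract does not say which estimator or bandwidths the authors used, so the block below is only a sketch assuming a Gaussian product kernel with one fixed, hand-picked bandwidth per dimension; the function name and sample points are hypothetical.

```python
import math


def gaussian_kde(samples, bandwidths):
    """Return a density function built from a Gaussian product kernel."""
    n = len(samples)
    # Normalising constant: n * prod_d (h_d * sqrt(2 * pi)).
    norm = n * math.prod(h * math.sqrt(2.0 * math.pi) for h in bandwidths)

    def density(point):
        total = 0.0
        for sample in samples:
            exponent = sum(
                ((p - s) / h) ** 2 for p, s, h in zip(point, sample, bandwidths)
            )
            total += math.exp(-0.5 * exponent)
        return total / norm

    return density


# Hypothetical points in a two-dimensional space of observables.
simulated = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.5), (2.0, 1.0)]
estimate = gaussian_kde(simulated, bandwidths=(0.8, 0.8))
print(estimate((1.0, 1.0)))
```

Evaluating such an estimate for each simulated group at every observed point would give the n-by-m system that the abstract then solves by least squares for the group weights.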
Interactive and coordinated visualization approaches for biological data analysis.
Cruz, António; Arrais, Joel P; Machado, Penousal
2018-03-26
The field of computational biology has become largely dependent on data visualization tools to analyze the increasing quantities of data gathered through the use of new and growing technologies. Aside from the volume, which often results in large amounts of noise and complex relationships with no clear structure, the visualization of biological data sets is hindered by their heterogeneity, as data are obtained from different sources and contain a wide variety of attributes, including spatial and temporal information. This requires visualization approaches that are able to not only represent various data structures simultaneously but also provide exploratory methods that allow the identification of meaningful relationships that would not be perceptible through data analysis algorithms alone. In this article, we present a survey of visualization approaches applied to the analysis of biological data. We focus on graph-based visualizations and tools that use coordinated multiple views to represent high-dimensional multivariate data, in particular time series gene expression, protein-protein interaction networks and biological pathways. We then discuss how these methods can be used to help solve the current challenges surrounding the visualization of complex biological data sets.
Using Interactive Graphics to Teach Multivariate Data Analysis to Psychology Students
ERIC Educational Resources Information Center
Valero-Mora, Pedro M.; Ledesma, Ruben D.
2011-01-01
This paper discusses the use of interactive graphics to teach multivariate data analysis to Psychology students. Three techniques are explored through separate activities: parallel coordinates/boxplots; principal components/exploratory factor analysis; and cluster analysis. With interactive graphics, students may perform important parts of the…
Liebenberg, Leandi; L'Abbé, Ericka N; Stull, Kyra E
2015-12-01
The cranium is widely recognized as the most important skeletal element to use when evaluating population differences and estimating ancestry. However, the cranium is not always intact or available for analysis, which emphasizes the need for postcranial alternatives. The purpose of this study was to quantify postcraniometric differences among South Africans that can be used to estimate ancestry. Thirty-nine standard measurements from 11 postcranial bones were collected from 360 modern black, white and coloured South Africans; the sex and ancestry distribution were equal. Group differences were explored with analysis of variance (ANOVA) and Tukey's honestly significant difference (HSD) test. Linear and flexible discriminant analysis (LDA and FDA, respectively) were conducted with bone models as well as numerous multivariate subsets to identify the model and method that yielded the highest correct classifications. Leave-one-out (LDA) and k-fold (k=10; FDA) cross-validation with equal priors were used for all models. ANOVA and Tukey's HSD results reveal statistically significant differences between at least two of the three groups for the majority of the variables, with varying degrees of group overlap. Bone models, which consisted of all measurements per bone, resulted in low accuracies that ranged from 46% to 63% (LDA) and 41% to 66% (FDA). In contrast, the multivariate subsets, which consisted of different variable combinations from all elements, achieved accuracies as high as 85% (LDA) and 87% (FDA). Thus, when using a multivariate approach, the postcranial skeleton can distinguish among three modern South African groups with high accuracy. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Chabukdhara, Mayuri; Gupta, Sanjay Kumar; Kotecha, Yatharth; Nema, Arvind K
2017-07-01
This study aimed to assess the quality of groundwater and the potential health risk due to ingestion of heavy metals in the peri-urban and urban-industrial clusters of Ghaziabad district, Uttar Pradesh, India. Furthermore, the study aimed to evaluate heavy metal sources and their pollution level using multivariate analysis and fuzzy comprehensive assessment (FCA), respectively. Multivariate analysis using principal component analysis (PCA) showed mixed origin for Pb, Cd, Zn, Fe, and Ni, natural source for Cu and Mn and anthropogenic source for Cr. Among all the metals, Pb, Cd, Fe and Ni were above the safe limits of Bureau of Indian Standards (BIS) and World Health Organization (WHO) except Ni. Health risk in terms of hazard quotient (HQ) showed that the HQ values for children were higher than the safe level (HQ = 1) for Pb (2.4) and Cd (2.1) in pre-monsoon, while in post-monsoon the value exceeded the safe level only for Pb (HQ = 1.23). The health risks of heavy metals for adults were well within safe limits. The findings of this study indicate potential health risks to children due to chronic exposure to contaminated groundwater in the region. Based on FCA, groundwater pollution could be categorized as quite high in the peri-urban region, and absolutely high in the urban region of Ghaziabad district. This study showed that different approaches are required for the integrated assessment of groundwater pollution, and provides a scientific basis for strategic future planning and comprehensive management. Copyright © 2017 Elsevier Ltd. All rights reserved.
Patterns and Sequences: Interactive Exploration of Clickstreams to Understand Common Visitor Paths.
Liu, Zhicheng; Wang, Yang; Dontcheva, Mira; Hoffman, Matthew; Walker, Seth; Wilson, Alan
2017-01-01
Modern web clickstream data consists of long, high-dimensional sequences of multivariate events, making it difficult to analyze. Following the overarching principle that the visual interface should provide information about the dataset at multiple levels of granularity and allow users to easily navigate across these levels, we identify four levels of granularity in clickstream analysis: patterns, segments, sequences and events. We present an analytic pipeline consisting of three stages: pattern mining, pattern pruning and coordinated exploration between patterns and sequences. Based on this approach, we discuss properties of maximal sequential patterns, propose methods to reduce the number of patterns and describe design considerations for visualizing the extracted sequential patterns and the corresponding raw sequences. We demonstrate the viability of our approach through an analysis scenario and discuss the strengths and limitations of the methods based on user feedback.
A power analysis for multivariate tests of temporal trend in species composition.
Irvine, Kathryn M; Dinger, Eric C; Sarr, Daniel
2011-10-01
Long-term monitoring programs emphasize power analysis as a tool to determine the sampling effort necessary to effectively document ecologically significant changes in ecosystems. Programs that monitor entire multispecies assemblages require a method for determining the power of multivariate statistical models to detect trend. We provide a method to simulate presence-absence species assemblage data that are consistent with increasing or decreasing directional change in species composition within multiple sites. This step is the foundation for using Monte Carlo methods to approximate the power of any multivariate method for detecting temporal trends. We focus on comparing the power of the Mantel test, permutational multivariate analysis of variance, and constrained analysis of principal coordinates. We find that the power of the various methods we investigate is sensitive to the number of species in the community, univariate species patterns, and the number of sites sampled over time. For increasing directional change scenarios, constrained analysis of principal coordinates was as powerful as or more powerful than permutational multivariate analysis of variance, whereas the Mantel test was the least powerful. However, in our investigation of decreasing directional change, the Mantel test was typically as powerful as or more powerful than the other models.
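The Monte Carlo idea described in this abstract (simulate data with a known trend, apply a test, and count how often the trend is detected) can be sketched in a few lines. The sketch below is a deliberately simplified, univariate stand-in: it is not the authors' assemblage simulator, and it uses a plain permutation test on a single response rather than the Mantel test, PERMANOVA, or constrained analysis of principal coordinates. The function names, the Gaussian noise model, and all default values are assumptions made for illustration.

```python
import random


def perm_test_pvalue(x, y, n_perm, rng):
    """Two-sided permutation p-value for association between x and y."""
    def stat(a, b):
        # Absolute (unnormalized) covariance between the two sequences.
        mean_a = sum(a) / len(a)
        mean_b = sum(b) / len(b)
        return abs(sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)))

    observed = stat(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any real link between time and response
        if stat(x, y_perm) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def estimate_power(slope, n_years, n_sims=200, n_perm=199, alpha=0.05, seed=0):
    """Fraction of simulated data sets in which the trend is detected."""
    rng = random.Random(seed)
    years = list(range(n_years))
    rejections = 0
    for _ in range(n_sims):
        # Hypothetical data model: linear trend plus unit-variance noise.
        y = [slope * t + rng.gauss(0.0, 1.0) for t in years]
        if perm_test_pvalue(years, y, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_sims
```

Calling `estimate_power(0.0, 10)` approximates the type I error rate of the test, while a nonzero slope gives an approximate power for that effect size and sampling effort; a multivariate version would replace the data model and the test statistic but keep the same outer loop.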
Mueller, Daniela; Ferrão, Marco Flôres; Marder, Luciano; da Costa, Adilson Ben; de Cássia de Souza Schneider, Rosana
2013-01-01
The main objective of this study was to use infrared spectroscopy to identify vegetable oils used as raw material for biodiesel production and apply multivariate analysis to the data. Six different vegetable oil sources (canola, cotton, corn, palm, sunflower and soybeans) were used to produce biodiesel batches. The spectra were acquired by Fourier transform infrared spectroscopy using a universal attenuated total reflectance sensor (FTIR-UATR). For the multivariate analysis, principal component analysis (PCA), hierarchical cluster analysis (HCA), interval principal component analysis (iPCA) and soft independent modeling of class analogy (SIMCA) were used. The results indicate that it is possible to develop a methodology to identify vegetable oils used as raw material in the production of biodiesel by FTIR-UATR applying multivariate analysis. It was also observed that the iPCA found the best spectral range for separation of biodiesel batches using FTIR-UATR data, and with this result, the SIMCA method classified 100% of the soybean biodiesel samples. PMID:23539030
Liu, Dungang; Liu, Regina; Xie, Minge
2014-01-01
Meta-analysis has been widely used to synthesize evidence from multiple studies for common hypotheses or parameters of interest. However, it has not yet been fully developed for incorporating heterogeneous studies, which arise often in applications due to different study designs, populations or outcomes. For heterogeneous studies, the parameter of interest may not be estimable for certain studies, and in such a case, these studies are typically excluded from conventional meta-analysis. The exclusion of part of the studies can lead to a non-negligible loss of information. This paper introduces a meta-analysis for heterogeneous studies by combining the confidence density functions derived from the summary statistics of individual studies, hence referred to as the CD approach. It includes all the studies in the analysis and makes use of all information, direct as well as indirect. Under a general likelihood inference framework, this new approach is shown to have several desirable properties, including: i) it is asymptotically as efficient as the maximum likelihood approach using individual participant data (IPD) from all studies; ii) unlike the IPD analysis, it suffices to use summary statistics to carry out the CD approach. Individual-level data are not required; and iii) it is robust against misspecification of the working covariance structure of the parameter estimates. Besides its own theoretical significance, the last property also substantially broadens the applicability of the CD approach. All the properties of the CD approach are further confirmed by data simulated from a randomized clinical trials setting as well as by real data on aircraft landing performance. Overall, one obtains a unifying approach for combining summary statistics, subsuming many of the existing meta-analysis methods as special cases. PMID:26190875
Li, Haocheng; Zhang, Yukun; Carroll, Raymond J; Keadle, Sarah Kozey; Sampson, Joshua N; Matthews, Charles E
2017-11-10
A mixed effect model is proposed to jointly analyze multivariate longitudinal data with continuous, proportion, count, and binary responses. The association of the variables is modeled through the correlation of random effects. We use a quasi-likelihood type approximation for nonlinear variables and transform the proposed model into a multivariate linear mixed model framework for estimation and inference. Via an extension to the EM approach, an efficient algorithm is developed to fit the model. The method is applied to physical activity data, which uses a wearable accelerometer device to measure daily movement and energy expenditure information. Our approach is also evaluated by a simulation study. Copyright © 2017 John Wiley & Sons, Ltd.
Two-sample tests and one-way MANOVA for multivariate biomarker data with nondetects.
Thulin, M
2016-09-10
Testing whether the mean vector of a multivariate set of biomarkers differs between several populations is an increasingly common problem in medical research. Biomarker data is often left censored because some measurements fall below the laboratory's detection limit. We investigate how such censoring affects multivariate two-sample and one-way multivariate analysis of variance tests. Type I error rates, power and robustness to increasing censoring are studied, under both normality and non-normality. Parametric tests are found to perform better than non-parametric alternatives, indicating that the current recommendations for analysis of censored multivariate data may have to be revised. Copyright © 2016 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Yang, Haiqing; Wu, Di; He, Yong
2007-11-01
Near-infrared spectroscopy (NIRS), characterized by high speed, non-destructiveness, high precision and reliable detection data, is a pollution-free, rapid, quantitative and qualitative analysis method. A new approach for variety discrimination of brown sugars using short-wave NIR spectroscopy (800-1050 nm) was developed in this work. The relationship between the absorbance spectra and brown sugar varieties was established. The spectral data were compressed by principal component analysis (PCA). The resulting features can be visualized in principal component (PC) space, which can lead to the discovery of structures correlated with the different classes of spectral samples. It appears to provide a reasonable variety clustering of brown sugars. The 2-D PC plot obtained using the first two PCs can be used for pattern recognition. Least-squares support vector machines (LS-SVM) were applied to solve the multivariate calibration problems in a relatively fast way. The work has shown that the short-wave NIR spectroscopy technique is suitable for the brand identification of brown sugar, and that LS-SVM has better identification ability than PLS when the calibration set is small.
[Referral to internal medicine for alcoholism: influence on follow-up care].
Avila, P; Marcos, M; Avila, J J; Laso, F J
2008-11-01
The problem of high rates of patient drop-out in alcohol treatment programs is frequently reported in the literature. Our aim was to investigate if internal medicine referral could improve abstinence and retention rates in a cohort of alcoholic patients. A retrospective observational study was conducted comparing 200 alcoholic patients attending a psychiatric unit (group 1) with 100 patients attending both this unit and an internal medicine unit (group 2). We collected sociodemographic and clinical variables and analysed differences regarding abstinence and retention rates by means of univariate and multivariate analysis. At 3 and 12 months follow-up, group 2 patients had higher retention and abstinence rates than group 1 patients. Multivariate analysis including potential confounding variables showed that independent predictors of one-year retention were internal medicine referral and being married. Independent predictors of one-year abstinence were being married, age > 44 years and receipt of drug treatment. The higher retention rate found among patients referred to Internal Medicine specialists, a result that has not been previously reported to the best of our knowledge, emphasizes the importance of a multidisciplinary team approach in the treatment of alcoholism.
Rudi, Knut; Zimonja, Monika; Kvenshagen, Bente; Rugtveit, Jarle; Midtvedt, Tore; Eggesbø, Merete
2007-01-01
We present a novel approach for comparing 16S rRNA gene clone libraries that is independent of both DNA sequence alignment and definition of bacterial phylogroups. These steps are the major bottlenecks in current microbial comparative analyses. We used direct comparisons of taxon density distributions in an absolute evolutionary coordinate space. The coordinate space was generated by using alignment-independent bilinear multivariate modeling. Statistical analyses for clone library comparisons were based on multivariate analysis of variance, partial least-squares regression, and permutations. Clone libraries from both adult and infant gastrointestinal tract microbial communities were used as biological models. We reanalyzed a library consisting of 11,831 clones covering complete colons from three healthy adults in addition to a smaller 390-clone library from infant feces. We show that it is possible to extract detailed information about microbial community structures using our alignment-independent method. Our density distribution analysis is also very efficient with respect to computer operation time, meeting the future requirements of large-scale screenings to understand the diversity and dynamics of microbial communities. PMID:17337554
Genome-Wide Association Analysis of Adaptation Using Environmentally Predicted Traits
van Zanten, Martijn
2015-01-01
Current methods for studying the genetic basis of adaptation evaluate genetic associations with ecologically relevant traits or single environmental variables, under the implicit assumption that natural selection imposes correlations between phenotypes, environments and genotypes. In practice, observed trait and environmental data are manifestations of unknown selective forces and are only indirectly associated with adaptive genetic variation. In theory, improved estimation of these forces could enable more powerful detection of loci under selection. Here we present an approach in which we approximate adaptive variation by modeling phenotypes as a function of the environment and using the predicted trait in multivariate and univariate genome-wide association analysis (GWAS). Based on computer simulations and published flowering time data from the model plant Arabidopsis thaliana, we find that environmentally predicted traits lead to higher recovery of functional loci in multivariate GWAS and are more strongly correlated to allele frequencies at adaptive loci than individual environmental variables. Our results provide an example of the use of environmental data to obtain independent and meaningful information on adaptive genetic variation. PMID:26496492
Multivariate modelling of endophenotypes associated with the metabolic syndrome in Chinese twins.
Pang, Z; Zhang, D; Li, S; Duan, H; Hjelmborg, J; Kruse, T A; Kyvik, K O; Christensen, K; Tan, Q
2010-12-01
The common genetic and environmental effects on endophenotypes related to the metabolic syndrome have been investigated using bivariate and multivariate twin models. This paper extends the pairwise analysis approach by introducing independent and common pathway models to Chinese twin data. The aim was to explore the common genetic architecture in the development of these phenotypes in the Chinese population. Three multivariate models including the full saturated Cholesky decomposition model, the common factor independent pathway model and the common factor common pathway model were fitted to 695 pairs of Chinese twins representing six phenotypes including BMI, total cholesterol, total triacylglycerol, fasting glucose, HDL and LDL. Performances of the nested models were compared with that of the full Cholesky model. Cross-phenotype correlation coefficients gave clear indication of common genetic or environmental backgrounds in the phenotypes. Decomposition of phenotypic correlation by the Cholesky model revealed that the observed phenotypic correlation among lipid phenotypes had genetic and unique environmental backgrounds. Both pathway models suggest a common genetic architecture for lipid phenotypes, which is distinct from that of the non-lipid phenotypes. The declining performance with model restriction indicates biological heterogeneity in development among some of these phenotypes. Our multivariate analyses revealed common genetic and environmental backgrounds for the studied lipid phenotypes in Chinese twins. Model performance showed that physiologically distinct endophenotypes may follow different genetic regulations.
Garcia Vicente, A M; Soriano Castrejón, A; Amo-Salas, M; Lopez Fidalgo, J F; Muñoz Sanchez, M M; Alvarez Cabellos, R; Espinosa Aunion, R; Muñoz Madero, V
2016-01-01
To explore the relationship between basal (18)F-FDG uptake in breast tumors and survival in patients with breast cancer (BC) using a molecular phenotype approach. This prospective and multicentre study included 193 women diagnosed with BC. All patients underwent an (18)F-FDG PET/CT prior to treatment. Maximum standardized uptake value (SUVmax) in tumor (T), lymph nodes (N), and the N/T index were obtained in all cases. Metabolic stage was established. As regards biological prognostic parameters, tumors were classified into molecular sub-types and risk categories. Overall survival (OS) and disease-free survival (DFS) were obtained. An analysis was performed on the relationship of semi-quantitative metabolic parameters with molecular phenotypes and risk categories. The effect of molecular sub-type and risk categories on prognosis was analyzed using Kaplan-Meier and univariate and multivariate tests. Statistical differences were found in both SUVT and SUVN, according to the molecular sub-types and risk classifications, with higher semi-quantitative values in more biologically aggressive tumors. No statistical differences were observed with respect to the N/T index. Kaplan-Meier analysis revealed that risk categories were significantly related to DFS and OS. In the multivariate analysis, metabolic stage and risk phenotype showed a significant association with DFS. The high-risk phenotype category showed a worse prognosis than the other categories, with higher SUVmax in primary tumor and lymph nodes. Copyright © 2015 Elsevier España, S.L.U. and SEMNIM. All rights reserved.
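For readers unfamiliar with the Kaplan-Meier analysis mentioned in this abstract, the product-limit estimator can be sketched as follows. This is a generic textbook formulation, not the authors' analysis code; the function name and the toy follow-up data are hypothetical and carry no clinical meaning.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times[i]  : follow-up time of subject i
    events[i] : 1 if the event was observed at times[i], 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed_events = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed_events += data[i][1]
            leaving += 1
            i += 1
        if observed_events > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - observed_events / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        at_risk -= leaving
    return curve


# Toy data (hypothetical): five subjects, two of them censored.
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored subjects contribute to the risk set up to their last follow-up time but never trigger a step in the curve, which is why the estimate drops only at observed event times.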
Multivariate Autoregressive Modeling and Granger Causality Analysis of Multiple Spike Trains
Krumin, Michael; Shoham, Shy
2010-01-01
Recent years have seen the emergence of microelectrode arrays and optical methods allowing simultaneous recording of spiking activity from populations of neurons in various parts of the nervous system. The analysis of multiple neural spike train data could benefit significantly from existing methods for multivariate time-series analysis which have proven to be very powerful in the modeling and analysis of continuous neural signals like EEG signals. However, those methods have not generally been well adapted to point processes. Here, we use our recent results on correlation distortions in multivariate Linear-Nonlinear-Poisson spiking neuron models to derive generalized Yule-Walker-type equations for fitting "hidden" Multivariate Autoregressive models. We use this new framework to perform Granger causality analysis in order to extract the directed information flow pattern in networks of simulated spiking neurons. We discuss the relative merits and limitations of the new method. PMID:20454705
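As background for the Yule-Walker-type equations mentioned above, the classical Yule-Walker relation for a zero-mean first-order multivariate autoregressive process x[t+1] = A x[t] + e[t] is A = C1 C0^(-1), where C0 and C1 are the lag-0 and lag-1 covariance matrices. The sketch below illustrates only this classical, continuous-signal case for two channels; it does not implement the generalized equations for spike trains derived in the paper. The function names, the coupling matrix, and the Gaussian noise are assumptions chosen for illustration.

```python
import random


def simulate_var1(A, n, seed=0):
    """Simulate a 2-channel VAR(1) process with unit-variance Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    series = []
    for _ in range(n):
        x = [
            A[0][0] * x[0] + A[0][1] * x[1] + rng.gauss(0.0, 1.0),
            A[1][0] * x[0] + A[1][1] * x[1] + rng.gauss(0.0, 1.0),
        ]
        series.append(x)
    return series


def yule_walker_var1(series):
    """Estimate A from A = C1 * inverse(C0), assuming a zero-mean process."""
    n = len(series) - 1
    c0 = [[0.0, 0.0], [0.0, 0.0]]  # lag-0 covariance E[x_t x_t^T]
    c1 = [[0.0, 0.0], [0.0, 0.0]]  # lag-1 covariance E[x_{t+1} x_t^T]
    for t in range(n):
        for i in range(2):
            for j in range(2):
                c0[i][j] += series[t][i] * series[t][j] / n
                c1[i][j] += series[t + 1][i] * series[t][j] / n
    det = c0[0][0] * c0[1][1] - c0[0][1] * c0[1][0]
    inv = [
        [c0[1][1] / det, -c0[0][1] / det],
        [-c0[1][0] / det, c0[0][0] / det],
    ]
    return [
        [sum(c1[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


# Hypothetical coupling: channel 0 drives channel 1, but not the reverse.
true_A = [[0.5, 0.0], [0.4, 0.5]]
estimated_A = yule_walker_var1(simulate_var1(true_A, 20000))
```

In this setting an off-diagonal coefficient that is clearly nonzero in one direction and close to zero in the other is the intuition behind Granger causality: the past of channel 0 improves the prediction of channel 1, but not vice versa. A formal analysis would compare prediction errors of full and restricted models rather than inspect coefficients.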
Up-scaling of multi-variable flood loss models from objects to land use units at the meso-scale
NASA Astrophysics Data System (ADS)
Kreibich, Heidi; Schröter, Kai; Merz, Bruno
2016-05-01
Flood risk management increasingly relies on risk analyses, including loss modelling. Most of the flood loss models usually applied in standard practice have in common that complex damaging processes are described by simple approaches like stage-damage functions. Novel multi-variable models significantly improve loss estimation on the micro-scale and may also be advantageous for large-scale applications. However, more input parameters also reveal additional uncertainty, even more so in up-scaling procedures for meso-scale applications, where the parameters need to be estimated on a regional area-wide basis. To gain more knowledge about challenges associated with the up-scaling of multi-variable flood loss models, the following approach is applied: single- and multi-variable micro-scale flood loss models are up-scaled and applied on the meso-scale, namely on the basis of ATKIS land-use units. Application and validation are undertaken in 19 municipalities that were affected during the 2002 flood of the River Mulde in Saxony, Germany, by comparison with official loss data provided by the Saxon Relief Bank (SAB). In the meso-scale, case-study-based model validation, most multi-variable models show smaller errors than the uni-variable stage-damage functions. The results show the suitability of the up-scaling approach and, in accordance with micro-scale validation studies, that multi-variable models are an improvement in flood loss modelling also on the meso-scale. However, uncertainties remain high, stressing the importance of uncertainty quantification. Thus, the development of probabilistic loss models, like BT-FLEMO used in this study, which inherently provide uncertainty information, is the way forward.
Approximate Uncertainty Modeling in Risk Analysis with Vine Copulas
Bedford, Tim; Daneshkhah, Alireza
2015-01-01
Many applications of risk analysis require us to jointly model multiple uncertain quantities. Bayesian networks and copulas are two common approaches to modeling joint uncertainties with probability distributions. This article focuses on new methodologies for copulas by developing work of Cooke, Bedford, Kurowica, and others on vines as a way of constructing higher dimensional distributions that do not suffer from some of the restrictions of alternatives such as the multivariate Gaussian copula. The article provides a fundamental approximation result, demonstrating that we can approximate any density as closely as we like using vines. It further operationalizes this result by showing how minimum information copulas can be used to provide parametric classes of copulas that have such good levels of approximation. We extend previous approaches using vines by considering nonconstant conditional dependencies, which are particularly relevant in financial risk modeling. We discuss how such models may be quantified, in terms of expert judgment or by fitting data, and illustrate the approach by modeling two financial data sets. PMID:26332240
NASA Astrophysics Data System (ADS)
Schölzel, C.; Friederichs, P.
2008-10-01
Probability distributions of multivariate random variables are generally more complex than their univariate counterparts, owing to possible nonlinear dependence between the random variables. One approach to this problem is the use of copulas, which have become popular in recent years, especially in fields like econometrics, finance, risk management, or insurance. Since this newly emerging field includes various practices, a controversial discussion, and a vast body of literature, it is difficult to get an overview. The aim of this paper is therefore to provide a brief overview of copulas for application in meteorology and climate research. We examine the advantages and disadvantages compared to alternative approaches such as mixture models, summarize the current problem of goodness-of-fit (GOF) tests for copulas, and discuss the connection with multivariate extremes. An application to station data shows the simplicity and the capabilities as well as the limitations of this approach. Observations of daily precipitation and temperature are fitted to a bivariate model and demonstrate that copulas are a valuable complement to the commonly used methods.
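The core idea of a copula, separating the dependence structure from the marginal distributions, can be sketched with a bivariate Gaussian copula. The example below is an illustration only and does not reproduce the model fitted in the paper: the correlation parameter, the exponential marginal for precipitation, the normal marginal for temperature, and all function and parameter names are assumptions.

```python
import math
import random
from statistics import NormalDist


def gaussian_copula_sample(rho, n, seed=0):
    """Draw n pairs (u1, u2) on the unit square from a Gaussian copula."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        # Correlated standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # The normal CDF maps each component to a uniform marginal.
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs


def to_marginals(pairs, precip_rate=0.5, temp_mean=15.0, temp_sd=5.0):
    """Apply inverse CDFs to impose (hypothetical) marginal distributions."""
    eps = 1e-12  # guard against u being exactly 0 or 1
    temperature = NormalDist(temp_mean, temp_sd)
    out = []
    for u1, u2 in pairs:
        u1 = min(max(u1, eps), 1.0 - eps)
        u2 = min(max(u2, eps), 1.0 - eps)
        precip = -math.log(1.0 - u1) / precip_rate  # exponential inverse CDF
        out.append((precip, temperature.inv_cdf(u2)))
    return out


uniform_pairs = gaussian_copula_sample(rho=0.7, n=5000)
weather_like = to_marginals(uniform_pairs)
```

Because the marginals are imposed only in the second step, the same copula sample can be combined with any other pair of marginal distributions without changing the rank dependence between the two variables.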
Lin, Deborah S; Greenwood, Paul F; George, Suman; Somerfield, Paul J; Tibbett, Mark
2011-08-01
Soil organic matter (SOM) is known to increase with time as landscapes recover after a major disturbance; however, little is known about the evolution of the chemistry of SOM in reconstructed ecosystems. In this study, we assessed the development of SOM chemistry in a chronosequence (space for time substitution) of restored Jarrah forest sites in Western Australia. Replicated samples were taken at the surface of the mineral soil as well as deeper in the profile at sites of 1, 3, 6, 9, 12, and 17 years of age. A molecular approach was developed to distinguish and quantify numerous individual compounds in SOM. This used accelerated solvent extraction in conjunction with gas chromatography mass spectrometry. A novel multivariate statistical approach was used to assess changes in accelerated solvent extraction (ASE)-gas chromatography-mass spectrometry (GCMS) spectra. This enabled us to track SOM developmental trajectories with restoration time. Results showed that total carbon concentrations approached those of native forest soils by 17 years of restoration. Using the relate protocol in PRIMER, we demonstrated an overall linear relationship with site age at both depths, indicating that changes in SOM chemistry were occurring. The surface soils were seen to approach native molecular compositions while the deeper soil retained a more stable chemical signature, suggesting litter from the developing diverse plant community has altered SOM near the surface. Our new approach for assessing SOM development, combining ASE-GCMS with illuminating multivariate statistical analysis, holds great promise to more fully develop ASE for the characterisation of SOM.
The Multidisciplinary Swallowing Team Approach Decreases Pneumonia Onset in Acute Stroke Patients.
Aoki, Shiro; Hosomi, Naohisa; Hirayama, Junko; Nakamori, Masahiro; Yoshikawa, Mineka; Nezu, Tomohisa; Kubo, Satoshi; Nagano, Yuka; Nagao, Akiko; Yamane, Naoya; Nishikawa, Yuichi; Takamoto, Megumi; Ueno, Hiroki; Ochi, Kazuhide; Maruyama, Hirofumi; Yamamoto, Hiromi; Matsumoto, Masayasu
2016-01-01
Dysphagia occurs in acute stroke patients at high rates, and many of them develop aspiration pneumonia. Team approaches with the cooperation of various professionals have the power to improve the quality of medical care, utilizing the specialized knowledge and skills of each professional. In our hospital, a multidisciplinary participatory swallowing team was organized. The aim of this study was to clarify the influence of a team approach on dysphagia by comparing the rates of pneumonia in acute stroke patients prior to and post team organization. All consecutive acute stroke patients who were admitted to our hospital between April 2009 and March 2014 were registered. We analyzed the difference in the rate of pneumonia onset between the periods before team organization (prior period) and after team organization (post period). Univariate and multivariate analyses were performed using a Cox proportional hazards model to determine the predictors of pneumonia. We recruited 132 acute stroke patients from the prior period and 173 patients from the post period. Pneumonia onset was less frequent in the post period compared with the prior period (6.9% vs. 15.9%, respectively; p = 0.01). Based on a multivariate analysis using a Cox proportional hazards model, it was determined that a swallowing team approach was related to pneumonia onset independent from the National Institutes of Health Stroke Scale score on admission (adjusted hazard ratio 0.41, 95% confidence interval 0.19-0.84, p = 0.02). The multidisciplinary participatory swallowing team effectively decreased the pneumonia onset in acute stroke patients.