Sample records for multivariate normal distribution

  1. An Alternative Method for Computing Mean and Covariance Matrix of Some Multivariate Distributions

    ERIC Educational Resources Information Center

    Radhakrishnan, R.; Choudhury, Askar

    2009-01-01

    This article considers computing the mean and covariance matrix of some multivariate distributions, in particular the multivariate normal distribution and the Wishart distribution. The method involves a matrix transformation of the normal random vector into a random vector whose components are independent normal random variables, and then integrating…
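The transformation the abstract describes (correlated normal vector mapped to independent normal components) can be sketched numerically. A minimal illustration, not the authors' method, using a Cholesky factor Σ = LLᵀ and hypothetical parameter values:

```python
import numpy as np

# Hypothetical sketch: if X ~ N(mu, Sigma) and Sigma = L @ L.T (Cholesky),
# then Y = inv(L) @ (X - mu) has independent standard normal components.
rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

L = np.linalg.cholesky(Sigma)
X = rng.multivariate_normal(mu, Sigma, size=100_000)

# Whitening: the rows of Y should have (near) identity covariance.
Y = np.linalg.solve(L, (X - mu).T).T
print(np.round(np.cov(Y.T), 2))  # close to the 2x2 identity matrix
```

Moments of X can then be recovered from the simpler independent-component representation, which is the spirit of the transformation approach.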

  2. Comparison of Multidimensional Item Response Models: Multivariate Normal Ability Distributions versus Multivariate Polytomous Ability Distributions. Research Report. ETS RR-08-45

    ERIC Educational Resources Information Center

    Haberman, Shelby J.; von Davier, Matthias; Lee, Yi-Hsuan

    2008-01-01

    Multidimensional item response models can be based on multivariate normal ability distributions or on multivariate polytomous ability distributions. For the case of simple structure in which each item corresponds to a unique dimension of the ability vector, some applications of the two-parameter logistic model to empirical data are employed to…

  3. Multivariate stochastic simulation with subjective multivariate normal distributions

    Treesearch

    P. J. Ince; J. Buongiorno

    1991-01-01

    In many applications of Monte Carlo simulation in forestry or forest products, it may be known that some variables are correlated. However, for simplicity, in most simulations it has been assumed that random variables are independently distributed. This report describes an alternative Monte Carlo simulation technique for subjectively assessed multivariate normal...
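The contrast with the independence assumption can be sketched as follows; this is a generic correlated-sampling illustration with hypothetical, subjectively assessed parameters, not the report's own procedure:

```python
import numpy as np

# Hypothetical sketch: Monte Carlo draws with a subjectively assessed
# correlation, versus the common independence assumption.
rng = np.random.default_rng(1)

mu = np.array([10.0, 5.0])   # assumed means
sd = np.array([2.0, 1.0])    # assumed standard deviations
rho = 0.7                    # subjectively assessed correlation

Sigma = np.diag(sd) @ np.array([[1.0, rho], [rho, 1.0]]) @ np.diag(sd)
L = np.linalg.cholesky(Sigma)

Z = rng.standard_normal((50_000, 2))   # independent standard normal draws
X = mu + Z @ L.T                       # correlated multivariate normal draws

print(round(np.corrcoef(X.T)[0, 1], 2))  # near the assessed rho = 0.7
```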

  4. Multivariate Models for Normal and Binary Responses in Intervention Studies

    ERIC Educational Resources Information Center

    Pituch, Keenan A.; Whittaker, Tiffany A.; Chang, Wanchen

    2016-01-01

    Use of multivariate analysis (e.g., multivariate analysis of variance) is common when normally distributed outcomes are collected in intervention research. However, when mixed responses--a set of normal and binary outcomes--are collected, standard multivariate analyses are no longer suitable. While mixed responses are often obtained in…

  5. Statistical analysis of multivariate atmospheric variables. [cloud cover]

    NASA Technical Reports Server (NTRS)

    Tubbs, J. D.

    1979-01-01

    Topics covered include: (1) estimation in discrete multivariate distributions; (2) a procedure to predict cloud cover frequencies in the bivariate case; (3) a program to compute conditional bivariate normal parameters; (4) the transformation of nonnormal multivariate data to near-normal; (5) a test of fit for the extreme value distribution based upon the generalized minimum chi-square; (6) a test of fit for continuous distributions based upon the generalized minimum chi-square; (7) the effect of correlated observations on confidence sets based upon chi-square statistics; and (8) generation of random variates from specified distributions.
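Topic (3), conditional bivariate normal parameters, rests on a standard closed form that is easy to state in code. A minimal sketch (the function name and values are illustrative, not from the report):

```python
# Hypothetical sketch: for (X1, X2) bivariate normal, the conditional
# distribution of X1 given X2 = x2 is normal with
#   mean     mu1 + rho * (s1 / s2) * (x2 - mu2)
#   variance s1**2 * (1 - rho**2)
def conditional_normal(mu1, mu2, s1, s2, rho, x2):
    mean = mu1 + rho * (s1 / s2) * (x2 - mu2)
    var = s1**2 * (1.0 - rho**2)
    return mean, var

mean, var = conditional_normal(0.0, 0.0, 1.0, 2.0, 0.5, 4.0)
print(mean, var)  # 1.0 0.75
```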

  6. A Robust Bayesian Approach for Structural Equation Models with Missing Data

    ERIC Educational Resources Information Center

    Lee, Sik-Yum; Xia, Ye-Mao

    2008-01-01

    In this paper, normal/independent distributions, including but not limited to the multivariate t distribution, the multivariate contaminated distribution, and the multivariate slash distribution, are used to develop a robust Bayesian approach for analyzing structural equation models with complete or missing data. In the context of a nonlinear…

  7. On measures of association among genetic variables

    PubMed Central

    Gianola, Daniel; Manfredi, Eduardo; Simianer, Henner

    2012-01-01

    Summary Systems involving many variables are important in population and quantitative genetics, for example, in multi-trait prediction of breeding values and in exploration of multi-locus associations. We studied departures of the joint distribution of sets of genetic variables from independence. New measures of association based on notions of statistical distance between distributions are presented. These are more general than correlations, which are pairwise measures, and lack a clear interpretation beyond the bivariate normal distribution. Our measures are based on logarithmic (Kullback-Leibler) and on relative ‘distances’ between distributions. Indexes of association are developed and illustrated for quantitative genetics settings in which the joint distribution of the variables is either multivariate normal or multivariate-t, and we show how the indexes can be used to study linkage disequilibrium in a two-locus system with multiple alleles and present applications to systems of correlated beta distributions. Two multivariate beta and multivariate beta-binomial processes are examined, and new distributions are introduced: the GMS-Sarmanov multivariate beta and its beta-binomial counterpart. PMID:22742500

  8. Asymptotic Distribution of the Likelihood Ratio Test Statistic for Sphericity of Complex Multivariate Normal Distribution.

    DTIC Science & Technology

    1981-08-01

    Asymptotic distribution of the likelihood ratio test statistic for sphericity of the complex multivariate normal distribution. C. Fang, P. R. Krishnaiah, B. N. Nagarsenker. August 1981 technical report. …and their applications in time series, the reader is referred to Krishnaiah (1976). Motivated by the applications in the area of inference on multiple… for practical purposes. Here, we note that Krishnaiah, Lee and Chang (1976) approximated the null distribution of a certain power of the likelihood…

  9. Vector wind and vector wind shear models 0 to 27 km altitude for Cape Kennedy, Florida, and Vandenberg AFB, California

    NASA Technical Reports Server (NTRS)

    Smith, O. E.

    1976-01-01

    Techniques are presented to derive several statistical wind models from the properties of the multivariate normal probability function. Assuming that the winds can be considered bivariate normally distributed, then (1) the wind components and conditional wind components are univariate normally distributed, (2) the wind speed is Rayleigh distributed, (3) the conditional distribution of wind speed given a wind direction is Rayleigh distributed, and (4) the frequency of wind direction can be derived. All of these distributions are derived from the five sample parameters of wind for the bivariate normal distribution. By further assuming that the winds at two altitudes are quadrivariate normally distributed, the vector wind shear is bivariate normally distributed and the modulus of the vector wind shear is Rayleigh distributed. The conditional probability of wind component shears given a wind component is normally distributed. Examples of these and other properties of the multivariate normal probability distribution function, as applied to wind data samples from Cape Kennedy, Florida, and Vandenberg AFB, California, are given. A technique to develop a synthetic vector wind profile model of interest in aerospace vehicle applications is presented.
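Property (2) can be checked numerically under the textbook special case of zero-mean, equal-variance, independent components (an assumption of this sketch, not necessarily of the report's general model):

```python
import numpy as np

# Hypothetical check: if the wind components (u, v) are independent
# N(0, sigma^2), the wind speed sqrt(u^2 + v^2) is Rayleigh distributed
# with mean sigma * sqrt(pi / 2).
rng = np.random.default_rng(2)
sigma = 3.0

u = rng.normal(0.0, sigma, 200_000)
v = rng.normal(0.0, sigma, 200_000)
speed = np.hypot(u, v)

print(round(speed.mean(), 2))  # near sigma * sqrt(pi/2) ~ 3.76
```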

  10. Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution

    PubMed Central

    Lo, Kenneth

    2011-01-01

    Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components. PMID:22125375

  11. Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution.

    PubMed

    Lo, Kenneth; Gottardo, Raphael

    2012-01-01

    Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.
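The Box-Cox transformation that introduces skewness handling in this model family is itself simple. A minimal sketch (not the authors' implementation; the example data are hypothetical), assuming positive-valued observations:

```python
import numpy as np

# Hypothetical sketch of the Box-Cox transformation:
#   y = (x**lam - 1) / lam   for lam != 0
#   y = log(x)               for lam == 0
def box_cox(x, lam):
    x = np.asarray(x, dtype=float)
    if lam == 0.0:
        return np.log(x)
    return (x**lam - 1.0) / lam

rng = np.random.default_rng(3)
x = rng.lognormal(mean=0.0, sigma=0.5, size=100_000)  # right-skewed data

# lam = 0 (the log case) should remove the log-normal skewness.
y = box_cox(x, 0.0)
skew = np.mean((y - y.mean())**3) / y.std()**3
print(round(skew, 1))  # near 0 after transformation
```

In the mixture model, a transformation parameter of this kind is estimated per component alongside the t-distribution parameters within the EM iterations.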

  12. Exact Interval Estimation, Power Calculation, and Sample Size Determination in Normal Correlation Analysis

    ERIC Educational Resources Information Center

    Shieh, Gwowen

    2006-01-01

    This paper considers the problem of analysis of correlation coefficients from a multivariate normal population. A unified theorem is derived for the regression model with normally distributed explanatory variables and the general results are employed to provide useful expressions for the distributions of simple, multiple, and partial-multiple…

  13. Approximating Multivariate Normal Orthant Probabilities. ONR Technical Report. [Biometric Lab Report No. 90-1.]

    ERIC Educational Resources Information Center

    Gibbons, Robert D.; And Others

    The probability integral of the multivariate normal distribution (ND) has received considerable attention since W. F. Sheppard's (1900) and K. Pearson's (1901) seminal work on the bivariate ND. This paper evaluates the formula that represents the n × n correlation matrix of the χᵢ and the standardized multivariate…

  14. A new multivariate zero-adjusted Poisson model with applications to biomedicine.

    PubMed

    Liu, Yin; Tian, Guo-Liang; Tang, Man-Lai; Yuen, Kam Chuen

    2018-05-25

    Recently, although advances have been made in modeling multivariate count data, existing models still have several limitations: (i) the multivariate Poisson log-normal model (Aitchison and Ho, ) cannot be used to fit multivariate count data with excess zero-vectors; (ii) the multivariate zero-inflated Poisson (ZIP) distribution (Li et al., 1999) cannot be used to model zero-truncated/deflated count data and is difficult to apply to high-dimensional cases; (iii) the Type I multivariate zero-adjusted Poisson (ZAP) distribution (Tian et al., 2017) can only model multivariate count data with a special correlation structure, in which the correlations between random components are all positive or all negative. In this paper, we first introduce a new multivariate ZAP distribution, based on a multivariate Poisson distribution, which allows a more flexible dependency structure between components; that is, some of the correlation coefficients can be positive while others are negative. We then develop its important distributional properties and provide efficient statistical inference methods for the multivariate ZAP model with or without covariates. Two real data examples in biomedicine are used to illustrate the proposed methods. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. Standard Error of Linear Observed-Score Equating for the NEAT Design with Nonnormally Distributed Data

    ERIC Educational Resources Information Center

    Zu, Jiyun; Yuan, Ke-Hai

    2012-01-01

    In the nonequivalent groups with anchor test (NEAT) design, the standard error of linear observed-score equating is commonly estimated by an estimator derived assuming multivariate normality. However, real data are seldom normally distributed, causing this normal estimator to be inconsistent. A general estimator, which does not rely on the…

  16. Bayesian inference on risk differences: an application to multivariate meta-analysis of adverse events in clinical trials.

    PubMed

    Chen, Yong; Luo, Sheng; Chu, Haitao; Wei, Peng

    2013-05-01

    Multivariate meta-analysis is useful in combining evidence from independent studies which involve several comparisons among groups based on a single outcome. For binary outcomes, the commonly used statistical models for multivariate meta-analysis are multivariate generalized linear mixed effects models, which assume that risks, after some transformation, follow a multivariate normal distribution with possible correlations. In this article, we consider an alternative model for multivariate meta-analysis where the risks are modeled by the multivariate beta distribution proposed by Sarmanov (1966). This model has several attractive features compared to the conventional multivariate generalized linear mixed effects models, including a simple likelihood function, no need to specify a link function, and a closed-form expression of the distribution functions for study-specific risk differences. We investigate the finite sample performance of this model by simulation studies and illustrate its use with an application to multivariate meta-analysis of adverse events of tricyclic antidepressant treatment in clinical trials.

  17. Empirical performance of the multivariate normal universal portfolio

    NASA Astrophysics Data System (ADS)

    Tan, Choon Peng; Pang, Sook Theng

    2013-09-01

    Universal portfolios generated by the multivariate normal distribution are studied with emphasis on the case where the variables are dependent, namely, where the covariance matrix is not diagonal. The moving-order multivariate normal universal portfolio requires very long running time and large computer memory. With the objective of reducing memory and running time, the finite-order universal portfolio is introduced. Some stock-price data sets are selected from the local stock exchange and the finite-order universal portfolio is run on the data sets for small finite order. Empirically, it is shown that the portfolio can outperform the moving-order Dirichlet universal portfolio of Cover and Ordentlich [2] for certain parameters in the selected data sets.

  18. Multivariate meta-analysis: a robust approach based on the theory of U-statistic.

    PubMed

    Ma, Yan; Mazumdar, Madhu

    2011-10-30

    Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously, taking into account the correlation between them. Likelihood-based approaches, in particular the restricted maximum likelihood (REML) method, are commonly utilized in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for meta-analyses with a small number of component studies. The use of REML also requires iterative estimation between parameters, needing moderately high computation time, especially when the dimension of the outcomes is large. A multivariate method of moments (MMM) is available and is shown to perform as well as REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normality. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis based on the theory of the U-statistic, and compare the properties of these three procedures under both normal and skewed data through simulation studies. It is shown that the effect of non-normal data distributions on the estimates from REML is marginal, and that the estimates from the MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. Easy implementation of all three methods is illustrated by their application to data from two published meta-analyses from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on the U-statistic for testing the significance of between-study heterogeneity and for extending the work to the meta-regression setting. Copyright © 2011 John Wiley & Sons, Ltd.

  19. Deterministic annealing for density estimation by multivariate normal mixtures

    NASA Astrophysics Data System (ADS)

    Kloppenburg, Martin; Tavan, Paul

    1997-03-01

    An approach to maximum-likelihood density estimation by mixtures of multivariate normal distributions for large high-dimensional data sets is presented. Conventionally that problem is tackled by notoriously unstable expectation-maximization (EM) algorithms. We remove these instabilities by the introduction of soft constraints, enabling deterministic annealing. Our developments are motivated by the proof that algorithmically stable fuzzy clustering methods that are derived from statistical physics analogs are special cases of EM procedures.

  20. Generating Multivariate Ordinal Data via Entropy Principles.

    PubMed

    Lee, Yen; Kaplan, David

    2018-03-01

    When conducting robustness research where the focus of attention is on the impact of non-normality, the marginal skewness and kurtosis are often used to set the degree of non-normality. Monte Carlo methods are commonly applied to conduct this type of research by simulating data from distributions with skewness and kurtosis constrained to pre-specified values. Although several procedures have been proposed to simulate data from distributions with these constraints, no corresponding procedures have been applied for discrete distributions. In this paper, we present two procedures based on the principles of maximum entropy and minimum cross-entropy to estimate the multivariate observed ordinal distributions with constraints on skewness and kurtosis. For these procedures, the correlation matrix of the observed variables is not specified but depends on the relationships between the latent response variables. With the estimated distributions, researchers can study robustness not only focusing on the levels of non-normality but also on the variations in the distribution shapes. A simulation study demonstrates that these procedures yield excellent agreement between specified parameters and those of estimated distributions. A robustness study concerning the effect of distribution shape in the context of confirmatory factor analysis shows that shape can affect the robust [Formula: see text] and robust fit indices, especially when the sample size is small, the data are severely non-normal, and the fitted model is complex.

  1. Parameter estimation of multivariate multiple regression model using bayesian with non-informative Jeffreys’ prior distribution

    NASA Astrophysics Data System (ADS)

    Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.

    2018-05-01

    The Bayesian method can be used to estimate the parameters of the multivariate multiple regression model. It involves two distributions: the prior and the posterior. The posterior distribution is influenced by the choice of prior distribution; Jeffreys' prior is a non-informative prior, used when information about the parameters is not available. The non-informative Jeffreys' prior is combined with the sample information to yield the posterior distribution, which is then used to estimate the parameters. The purpose of this research is to estimate the parameters of the multivariate regression model using the Bayesian method with the non-informative Jeffreys' prior. Based on the results and discussion, the parameter estimates of β and Σ are obtained from the expected values of the marginal posterior distributions, which are multivariate normal for β and inverse Wishart for Σ. However, the calculation of these expected values involves integrals whose values are difficult to determine. Therefore, an approach is needed that generates random samples according to the posterior distribution characteristics of each parameter, using the Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.

  2. Bayesian inference for multivariate meta-analysis Box-Cox transformation models for individual patient data with applications to evaluation of cholesterol lowering drugs

    PubMed Central

    Kim, Sungduk; Chen, Ming-Hui; Ibrahim, Joseph G.; Shah, Arvind K.; Lin, Jianxin

    2013-01-01

    In this paper, we propose a class of Box-Cox transformation regression models with multidimensional random effects for analyzing multivariate responses for individual patient data (IPD) in meta-analysis. Our modeling formulation uses a multivariate normal response meta-analysis model with multivariate random effects, in which each response is allowed to have its own Box-Cox transformation. Prior distributions are specified for the Box-Cox transformation parameters as well as the regression coefficients in this complex model, and the Deviance Information Criterion (DIC) is used to select the best transformation model. Since the model is quite complex, a novel Monte Carlo Markov chain (MCMC) sampling scheme is developed to sample from the joint posterior of the parameters. This model is motivated by a very rich dataset comprising 26 clinical trials involving cholesterol lowering drugs where the goal is to jointly model the three dimensional response consisting of Low Density Lipoprotein Cholesterol (LDL-C), High Density Lipoprotein Cholesterol (HDL-C), and Triglycerides (TG) (LDL-C, HDL-C, TG). Since the joint distribution of (LDL-C, HDL-C, TG) is not multivariate normal and in fact quite skewed, a Box-Cox transformation is needed to achieve normality. In the clinical literature, these three variables are usually analyzed univariately: however, a multivariate approach would be more appropriate since these variables are correlated with each other. A detailed analysis of these data is carried out using the proposed methodology. PMID:23580436

  3. Bayesian inference for multivariate meta-analysis Box-Cox transformation models for individual patient data with applications to evaluation of cholesterol-lowering drugs.

    PubMed

    Kim, Sungduk; Chen, Ming-Hui; Ibrahim, Joseph G; Shah, Arvind K; Lin, Jianxin

    2013-10-15

    In this paper, we propose a class of Box-Cox transformation regression models with multidimensional random effects for analyzing multivariate responses for individual patient data in meta-analysis. Our modeling formulation uses a multivariate normal response meta-analysis model with multivariate random effects, in which each response is allowed to have its own Box-Cox transformation. Prior distributions are specified for the Box-Cox transformation parameters as well as the regression coefficients in this complex model, and the deviance information criterion is used to select the best transformation model. Because the model is quite complex, we develop a novel Monte Carlo Markov chain sampling scheme to sample from the joint posterior of the parameters. This model is motivated by a very rich dataset comprising 26 clinical trials involving cholesterol-lowering drugs where the goal is to jointly model the three-dimensional response consisting of low density lipoprotein cholesterol (LDL-C), high density lipoprotein cholesterol (HDL-C), and triglycerides (TG) (LDL-C, HDL-C, TG). Because the joint distribution of (LDL-C, HDL-C, TG) is not multivariate normal and in fact quite skewed, a Box-Cox transformation is needed to achieve normality. In the clinical literature, these three variables are usually analyzed univariately; however, a multivariate approach would be more appropriate because these variables are correlated with each other. We carry out a detailed analysis of these data by using the proposed methodology. Copyright © 2013 John Wiley & Sons, Ltd.

  4. Atrial Electrogram Fractionation Distribution before and after Pulmonary Vein Isolation in Human Persistent Atrial Fibrillation-A Retrospective Multivariate Statistical Analysis.

    PubMed

    Almeida, Tiago P; Chu, Gavin S; Li, Xin; Dastagir, Nawshin; Tuan, Jiun H; Stafford, Peter J; Schlindwein, Fernando S; Ng, G André

    2017-01-01

    Purpose: Complex fractionated atrial electrogram (CFAE)-guided ablation after pulmonary vein isolation (PVI) has been used for persistent atrial fibrillation (persAF) therapy. This strategy has shown suboptimal outcomes due to, among other factors, undetected changes in the atrial tissue following PVI. In the present work, we investigate CFAE distribution before and after PVI in patients with persAF using a multivariate statistical model. Methods: 207 pairs of atrial electrograms (AEGs) were collected before and after PVI, respectively, from corresponding LA regions in 18 persAF patients. Twelve attributes were measured from the AEGs before and after PVI. Statistical models based on multivariate analysis of variance (MANOVA) and linear discriminant analysis (LDA) were used to characterize the atrial regions and AEGs. Results: PVI significantly reduced CFAEs in the LA (70 vs. 40%; P < 0.0001). Four types of LA regions were identified based on the AEG characteristics: (i) fractionated before PVI that remained fractionated after PVI (31% of the collected points); (ii) fractionated that converted to normal (39%); (iii) normal prior to PVI that became fractionated (9%); and (iv) normal that remained normal (21%). Individually, the attributes failed to distinguish these LA regions, but multivariate statistical models were effective in their discrimination (P < 0.0001). Conclusion: Our results reveal that some LA regions are resistant to PVI while others are affected by it. Although traditional methods were unable to identify these different regions, the proposed multivariate statistical model discriminated LA regions resistant to PVI from those affected by it, without prior ablation information.

  5. Shape model of the maxillary dental arch using Fourier descriptors with an application in the rehabilitation for edentulous patient.

    PubMed

    Rijal, Omar M; Abdullah, Norli A; Isa, Zakiah M; Noor, Norliza M; Tawfiq, Omar F

    2013-01-01

    The knowledge of teeth positions on the maxillary arch is useful in the rehabilitation of the edentulous patient. A combination of angular (θ) and linear (l) variables representing the positions of four teeth was initially proposed as the shape descriptor of the maxillary dental arch. Three categories of shape were established, each having a multivariate normal distribution. It may be argued that four selected teeth on the standardized digital images of the dental casts are insufficient to represent shape. However, increasing the number of points would create problems with dimensionality, and proof of existence of the multivariate normal distribution is extremely difficult. This study investigates the ability of Fourier descriptors (FD) using all maxillary teeth to find alternative shape models. Eight FD terms were sufficient to represent 21 points on the arch. Using these 8 FD terms as an alternative shape descriptor, three categories of shape were verified, each category having a complex normal distribution.
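The idea of compressing 21 landmark points into a few Fourier descriptor terms can be sketched generically; the toy arch curve below is hypothetical and not the study's data:

```python
import numpy as np

# Hypothetical sketch: Fourier descriptors (FD) of an arch-like curve.
# Landmark points are encoded as complex numbers x + i*y; the FFT
# coefficients serve as shape descriptors, and low-energy terms can be
# dropped for a compact representation.
theta = np.linspace(0.0, np.pi, 21)              # toy "dental arch" landmarks
pts = np.cos(theta) + 1j * 0.8 * np.sin(theta)

coeffs = np.fft.fft(pts)
order = np.argsort(-np.abs(coeffs))              # largest-magnitude terms first

def reconstruct(n_terms):
    kept = np.zeros_like(coeffs)
    kept[order[:n_terms]] = coeffs[order[:n_terms]]
    return np.fft.ifft(kept)

err8 = np.linalg.norm(reconstruct(8) - pts)      # 8 FD terms, as in the abstract
err3 = np.linalg.norm(reconstruct(3) - pts)
print(err8 <= err3)  # keeping more FD terms never worsens the reconstruction
```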

  6. Accumulation risk assessment for the flooding hazard

    NASA Astrophysics Data System (ADS)

    Roth, Giorgio; Ghizzoni, Tatiana; Rudari, Roberto

    2010-05-01

    One of the main consequences of demographic and economic development and of the globalization of markets and trade is the cumulus of risks. In most cases, the cumulus of risks arises from the geographic concentration of a number of vulnerable elements in a single place. For natural events, the cumulus of risks can be associated not only with intensity but also with an event's extension. In this case, the magnitude can be such that large areas, possibly including many regions or even large portions of different countries, are struck by single catastrophic events. Among natural risks, the impact of the flooding hazard cannot be overstated. To cope with it, a variety of mitigation actions can be put in place: from the improvement of monitoring and alert systems to the development of hydraulic structures, through land use restrictions, civil protection, and financial and insurance plans. All of these viable options have social and economic impacts, either positive or negative, whose proper estimate should rely on the assumption of appropriate present and future flood risk scenarios. It is therefore necessary to identify proper statistical methodologies able to describe the multivariate aspects of the involved physical processes and their spatial dependence. In hydrology and meteorology, but also in finance and insurance practice, it was recognized early on that classical statistical theory distributions (e.g., the normal and gamma families) are of restricted use for modeling multivariate spatial data. Recent research efforts have therefore been directed towards developing statistical models capable of describing the forms of asymmetry manifest in data sets, in particular for the quite frequent case of phenomena whose empirical outcome behaves in a non-normal fashion but still maintains some broad similarity with the multivariate normal distribution. Fruitful approaches were recognized in the use of flexible models which include the normal distribution as a special or limiting case (e.g., the skew-normal or skew-t distributions). The present contribution constitutes an attempt to provide a better estimation of the joint probability distribution able to describe flood events in a multi-site, multi-basin fashion. This goal will be pursued through the multivariate skew-t distribution, which allows the joint probability distribution to be defined analytically. The performance of the skew-t distribution will be discussed with reference to the Tanaro River in Northwestern Italy. To enhance the characteristics of the correlation structure, both nested and non-nested gauging stations will be selected, with significantly different contributing areas.

  7. Univariate and multivariate skewness and kurtosis for measuring nonnormality: Prevalence, influence and estimation.

    PubMed

    Cain, Meghan K; Zhang, Zhiyong; Yuan, Ke-Hai

    2017-10-01

    Nonnormality of univariate data has been extensively examined previously (Blanca et al., Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 9(2), 78-84, 2013; Micceri, Psychological Bulletin, 105(1), 156, 1989). However, less is known about the potential nonnormality of multivariate data, although multivariate analysis is commonly used in psychological and educational research. Using univariate and multivariate skewness and kurtosis as measures of nonnormality, this study examined 1,567 univariate distributions and 254 multivariate distributions collected from authors of articles published in Psychological Science and the American Educational Research Journal. We found that 74% of univariate distributions and 68% of multivariate distributions deviated from normal distributions. In a simulation study using typical values of skewness and kurtosis that we collected, we found that the resulting Type I error rates were 17% in a t-test and 30% in a factor analysis under some conditions. Hence, we argue that it is time to routinely report skewness and kurtosis along with other summary statistics such as means and variances. To facilitate future reporting of skewness and kurtosis, we provide a tutorial on how to compute univariate and multivariate skewness and kurtosis using SAS, SPSS, R, and a newly developed Web application.
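As an illustrative aside (not from the record), Mardia's multivariate skewness and kurtosis, the standard measures behind studies like this one, take only a few lines of NumPy; the function name and simulated data are this sketch's own:

```python
import numpy as np

rng = np.random.default_rng(0)

def mardia(X):
    """Mardia's multivariate skewness b_{1,p} and kurtosis b_{2,p}."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                      # ML covariance estimate
    Sinv = np.linalg.inv(S)
    G = Xc @ Sinv @ Xc.T                   # pairwise scaled inner products
    b1 = (G ** 3).sum() / n**2             # skewness
    b2 = (np.diag(G) ** 2).mean()          # kurtosis
    return b1, b2

X = rng.multivariate_normal(np.zeros(3), np.eye(3), size=2000)
b1, b2 = mardia(X)
# Under normality b1 ≈ 0 and b2 ≈ p(p + 2) = 15 for p = 3
print(round(b1, 3), round(b2, 1))
```

For formal testing, n·b1/6 is asymptotically chi-square with p(p+1)(p+2)/6 degrees of freedom, and b2 is asymptotically normal around p(p+2).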

  8. Predicting the required number of training samples. [For remotely sensed image data based on covariance matrix estimate quality criterion of normal distribution]

    NASA Technical Reports Server (NTRS)

    Kalayeh, H. M.; Landgrebe, D. A.

    1983-01-01

    A criterion which measures the quality of the estimate of the covariance matrix of a multivariate normal distribution is developed. Based on this criterion, the necessary number of training samples is predicted. Experimental results which are used as a guide for determining the number of training samples are included. Previously announced in STAR as N82-28109
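The record does not give its criterion, but the underlying effect is easy to reproduce: the quality of a sample covariance estimate improves predictably with the number of training samples. A small Monte Carlo sketch (all names and settings are this illustration's own):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5
Sigma = np.eye(p)

def cov_error(n, reps=200):
    """Average Frobenius-norm error of the sample covariance
    estimated from n multivariate normal training samples."""
    errs = []
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
        S = np.cov(X, rowvar=False)
        errs.append(np.linalg.norm(S - Sigma, "fro"))
    return np.mean(errs)

errors = {n: cov_error(n) for n in (20, 80, 320)}
for n, e in errors.items():
    print(n, round(e, 3))   # error roughly halves as n quadruples
```

The 1/sqrt(n) decay visible here is what makes it possible to predict, in reverse, how many training samples a target estimate quality requires.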

  9. DENBRAN: A basic program for a significance test for multivariate normality of clusters from branching patterns in dendrograms

    NASA Astrophysics Data System (ADS)

    Sneath, P. H. A.

    A BASIC program is presented for significance tests to determine whether a dendrogram is derived from clustering of points that belong to a single multivariate normal distribution. The significance tests are based on statistics of the Kolmogorov-Smirnov type, obtained by comparing the observed cumulative graph of branch levels with a graph for the hypothesis of multivariate normality. The program also permits testing whether the dendrogram could be from a cluster of lower dimensionality due to character correlations. The program makes provision for three similarity coefficients: (1) Euclidean distances, (2) squared Euclidean distances, and (3) Simple Matching Coefficients; and for five cluster methods: (1) WPGMA, (2) UPGMA, (3) Single Linkage (or Minimum Spanning Trees), (4) Complete Linkage, and (5) Ward's Increase in Sums of Squares. The program is entitled DENBRAN.

  10. Measuring Treasury Bond Portfolio Risk and Portfolio Optimization with a Non-Gaussian Multivariate Model

    NASA Astrophysics Data System (ADS)

    Dong, Yijun

    Research on measuring the risk of a bond portfolio and on bond portfolio optimization was previously relatively rare, because the risk factors of bond portfolios were not very volatile. This condition has changed recently, however: the 2008 financial crisis brought high volatility to the risk factors and the related bond securities, even for highly rated U.S. Treasury bonds. Moreover, the risk factors of bond portfolios now show fat tails and asymmetry, like the risk factors of equity portfolios. Therefore, advanced techniques are needed to measure and manage the risk of bond portfolios. In our paper, we first apply an autoregressive moving average-generalized autoregressive conditional heteroscedasticity (ARMA-GARCH) model with multivariate normal tempered stable (MNTS) distributed innovations to predict risk factors of U.S. Treasury bonds, and we demonstrate statistically, via goodness-of-fit tests, that the MNTS distribution captures the properties of these risk factors. Then, based on empirical evidence, we find that the VaR and AVaR estimated under the normal tempered stable distribution are more realistic and reliable than those estimated under the normal distribution, especially for the financial crisis period. Finally, we use mean-risk portfolio optimization to minimize the portfolios' potential risks. The empirical study indicates that the optimized bond portfolios have better risk-adjusted performance than the benchmark portfolios for some periods. Moreover, the optimized bond portfolios obtained under the normal tempered stable distribution improve on those obtained under the normal distribution.

  11. Distribution of the Determinant of the Sample Correlation Matrix: Monte Carlo Type One Error Rates.

    ERIC Educational Resources Information Center

    Reddon, John R.; And Others

    1985-01-01

    Computer sampling from a multivariate normal spherical population was used to evaluate the type one error rates for a test of sphericity based on the distribution of the determinant of the sample correlation matrix. (Author/LMO)
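The Monte Carlo set-up described here is easy to sketch: sample from a spherical (independent-component) normal population and tabulate the determinant of the sample correlation matrix. The sizes and seed below are this illustration's own, not the study's:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, reps = 50, 4, 2000

# Monte Carlo distribution of det(R) under a spherical normal population
dets = []
for _ in range(reps):
    X = rng.standard_normal((n, p))       # spherical: identity covariance
    R = np.corrcoef(X, rowvar=False)
    dets.append(np.linalg.det(R))
dets = np.array(dets)
print(round(dets.mean(), 3), round(dets.std(), 3))   # mean below 1
```

Even under exact sphericity the determinant sits below 1 (sampling error induces spurious correlations), which is why the test's critical values must come from the sampling distribution rather than from the population value det(R) = 1.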

  12. Multivariate Generalizations of Student's t-Distribution. ONR Technical Report. [Biometric Lab Report No. 90-3.]

    ERIC Educational Resources Information Center

    Gibbons, Robert D.; And Others

    In the process of developing a conditionally-dependent item response theory (IRT) model, the problem arose of modeling an underlying multivariate normal (MVN) response process with general correlation among the items. Without the assumption of conditional independence, for which the underlying MVN cdf takes on comparatively simple forms and can be…

  13. Simulating Univariate and Multivariate Burr Type III and Type XII Distributions through the Method of L-Moments

    ERIC Educational Resources Information Center

    Pant, Mohan Dev

    2011-01-01

    The Burr families (Type III and Type XII) of distributions are traditionally used in the context of statistical modeling and for simulating non-normal distributions with moment-based parameters (e.g., skewness and kurtosis). In educational and psychological studies, the Burr families of distributions can be used to simulate extremely asymmetrical and…

  14. Simulation techniques for estimating error in the classification of normal patterns

    NASA Technical Reports Server (NTRS)

    Whitsitt, S. J.; Landgrebe, D. A.

    1974-01-01

    Methods of efficiently generating and classifying samples with specified multivariate normal distributions were discussed. Conservative confidence tables for sample sizes are given for selective sampling. Simulation results are compared with classified training data. Techniques for comparing error and separability measure for two normal patterns are investigated and used to display the relationship between the error and the Chernoff bound.
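A minimal sketch of the comparison described here, assuming two equal-covariance normal patterns (the parameters and sample sizes are this illustration's own): simulate labeled samples, classify with the linear Bayes rule, and compare the simulated error with the Chernoff bound at s = 1/2 (the Bhattacharyya bound).

```python
import numpy as np

rng = np.random.default_rng(3)
p = 2
mu0, mu1 = np.zeros(p), np.array([2.0, 0.0])
Sigma = np.eye(p)

# Simulate labeled samples and classify with the known-parameter Bayes
# rule, which for equal covariances is a linear discriminant.
n = 100000
X0 = rng.multivariate_normal(mu0, Sigma, size=n)
X1 = rng.multivariate_normal(mu1, Sigma, size=n)
w = np.linalg.solve(Sigma, mu1 - mu0)
thresh = w @ (mu0 + mu1) / 2
err = 0.5 * ((X0 @ w > thresh).mean() + (X1 @ w <= thresh).mean())

# Bhattacharyya (Chernoff s = 1/2) bound on the error for equal priors:
# 0.5 * exp(-Delta^2 / 8) with Delta^2 the squared Mahalanobis distance
d2 = (mu1 - mu0) @ np.linalg.solve(Sigma, mu1 - mu0)
bhatta = 0.5 * np.exp(-d2 / 8)
print(round(err, 3), round(bhatta, 3))   # simulated error below the bound
```

For this configuration the exact error is Phi(-Delta/2) ≈ 0.159, comfortably below the bound of about 0.303, which is the kind of gap such comparisons are meant to display.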

  15. Circularly-symmetric complex normal ratio distribution for scalar transmissibility functions. Part I: Fundamentals

    NASA Astrophysics Data System (ADS)

    Yan, Wang-Ji; Ren, Wei-Xin

    2016-12-01

    Recent advances in signal processing and structural dynamics have spurred the adoption of transmissibility functions in academia and industry alike. Due to the inherent randomness of measurement and the variability of environmental conditions, uncertainty affects their application. This study focuses on statistical inference for raw scalar transmissibility functions modeled as complex ratio random variables, a goal pursued through a pair of companion papers. This paper (Part I) is dedicated to the formal mathematical proof. New theorems on the multivariate circularly-symmetric complex normal ratio distribution are proved on the basis of the principle of probabilistic transformation of continuous random vectors. Closed-form distributional formulas for multivariate ratios of correlated circularly-symmetric complex normal random variables are analytically derived. Afterwards, several properties are deduced as corollaries and lemmas to the new theorems. Monte Carlo simulation (MCS) is utilized to verify the accuracy of some representative cases. This work lays the mathematical groundwork for probabilistic models of raw scalar transmissibility functions, which are expounded in detail in Part II of this study.

  16. Estimation of value at risk in currency exchange rate portfolio using asymmetric GJR-GARCH Copula

    NASA Astrophysics Data System (ADS)

    Nurrahmat, Mohamad Husein; Noviyanti, Lienda; Bachrudin, Achmad

    2017-03-01

    In this study, we discuss the problem of measuring the risk of a portfolio based on value at risk (VaR) using an asymmetric GJR-GARCH copula. The approach is based on the consideration that the assumption of normality of returns over time cannot be fulfilled, and that there is non-linear correlation in the dependence structure among the variables, which leads to inaccurate VaR estimates. Moreover, the leverage effect causes an asymmetric response of the dynamic variance and exposes a weakness of standard GARCH models, namely their symmetric effect on the conditional variance. Asymmetric GJR-GARCH models are used to filter the margins, while copulas are used to link them together into a multivariate distribution. Copulas thus allow us to construct flexible multivariate distributions with different marginals and dependence structures, so that the portfolio joint distribution does not depend on assumptions of normality and linear correlation. The VaR obtained by the analysis at the 95% confidence level is 0.005586. This VaR is derived from the best copula model, the Student's t copula with Student's t marginal distributions.

  17. Analysis of vector wind change with respect to time for Cape Kennedy, Florida: Wind aloft profile change vs. time, phase 1

    NASA Technical Reports Server (NTRS)

    Adelfang, S. I.

    1977-01-01

    Wind vector change with respect to time at Cape Kennedy, Florida, is examined according to the theory of multivariate normality. The joint distribution of the four variables represented by the components of the wind vector at an initial time and after a specified elapsed time is hypothesized to be quadravariate normal; the fourteen statistics of this distribution, calculated from fifteen years of twice-daily rawinsonde data, are presented by monthly reference periods for altitudes from 0 to 27 km. The hypotheses that wind component change with respect to time is univariate normal, that the joint distribution of wind component changes is bivariate normal, and that the modulus of vector wind change is Rayleigh distributed have been tested by comparison with observed distributions. Statistics of the conditional bivariate normal distributions of vector wind at a future time, given the vector wind at an initial time, are derived. Wind changes over time periods from one to five hours, calculated from Jimsphere data, are also presented.
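The conditional distribution used in this record follows from the standard partitioned-Gaussian formulas. A sketch with made-up numbers (the mean vector and covariance below are illustrative, not the Cape Kennedy statistics):

```python
import numpy as np

# Conditional distribution of a multivariate normal: partition x = (x1, x2);
# then x1 | x2 = a is normal with
#   mean  mu1 + S12 S22^{-1} (a - mu2)
#   cov   S11 - S12 S22^{-1} S21
mu = np.array([1.0, 2.0, 0.0, 0.0])      # e.g. (u, v) now and (u, v) later
S = np.array([[4.0, 0.5, 2.0, 0.3],
              [0.5, 4.0, 0.2, 2.0],
              [2.0, 0.2, 5.0, 0.4],
              [0.3, 2.0, 0.4, 5.0]])

idx1, idx2 = [2, 3], [0, 1]              # future components given present
S11 = S[np.ix_(idx1, idx1)]
S12 = S[np.ix_(idx1, idx2)]
S22 = S[np.ix_(idx2, idx2)]
a = np.array([3.0, -1.0])                # observed present wind components
K = S12 @ np.linalg.inv(S22)             # regression matrix
cond_mean = mu[idx1] + K @ (a - mu[idx2])
cond_cov = S11 - K @ S12.T
print(cond_mean.round(3))
print(cond_cov.round(3))
```

Note that the conditional covariance does not depend on the observed value a, only on the partition of S, and its diagonal is strictly smaller than that of S11 whenever the blocks are correlated: observing the present wind always reduces the forecast variance.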

  18. The choice of prior distribution for a covariance matrix in multivariate meta-analysis: a simulation study.

    PubMed

    Hurtado Rúa, Sandra M; Mazumdar, Madhu; Strawderman, Robert L

    2015-12-30

    Bayesian meta-analysis is an increasingly important component of clinical research, with multivariate meta-analysis a promising tool for studies with multiple endpoints. Model assumptions, including the choice of priors, are crucial aspects of multivariate Bayesian meta-analysis (MBMA) models. In a given model, two different prior distributions can lead to different inferences about a particular parameter. A simulation study was performed in which the impact of families of prior distributions for the covariance matrix of a multivariate normal random effects MBMA model was analyzed. Inferences about effect sizes were not particularly sensitive to prior choice, but the related covariance estimates were. A few families of prior distributions with small relative biases, tight mean squared errors, and close to nominal coverage for the effect size estimates were identified. Our results demonstrate the need for sensitivity analysis and suggest some guidelines for choosing prior distributions in this class of problems. The MBMA models proposed here are illustrated in a small meta-analysis example from the periodontal field and a medium meta-analysis from the study of stroke. Copyright © 2015 John Wiley & Sons, Ltd.

  19. Multivariate normality

    NASA Technical Reports Server (NTRS)

    Crutcher, H. L.; Falls, L. W.

    1976-01-01

    Sets of experimentally determined or routinely observed data provide information about the past, present and, hopefully, future sets of similarly produced data. An infinite set of statistical models exists which may be used to describe the data sets. The normal distribution is one model. If it serves at all, it serves well. If a data set, or a transformation of the set, representative of a larger population can be described by the normal distribution, then valid statistical inferences can be drawn. There are several tests which may be applied to a data set to determine whether the univariate normal model adequately describes the set. The chi-square test based on Pearson's work in the late nineteenth and early twentieth centuries is often used. Like all tests, it has some weaknesses which are discussed in elementary texts. Extension of the chi-square test to the multivariate normal model is provided. Tables and graphs permit easier application of the test in the higher dimensions. Several examples, using recorded data, illustrate the procedures. Tests of maximum absolute differences, mean sum of squares of residuals, runs and changes of sign are included in these tests. Dimensions one through five with selected sample sizes 11 to 101 are used to illustrate the statistical tests developed.
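One common way to extend the chi-square goodness-of-fit idea to the multivariate normal model, in the spirit of this record, is to bin the squared Mahalanobis distances against their chi-square(p) reference distribution. A hedged sketch (function name, bin count, and data are this illustration's own):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def chi2_mvn_test(X, bins=10):
    """Chi-square goodness-of-fit of squared Mahalanobis distances
    against the chi-square(p) distribution implied by normality."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sinv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", Xc, Sinv, Xc)   # Mahalanobis distances
    # equiprobable bins under the chi-square(p) reference distribution
    edges = stats.chi2.ppf(np.linspace(0, 1, bins + 1), df=p)
    observed, _ = np.histogram(d2, bins=edges)
    expected = np.full(bins, n / bins)
    stat = ((observed - expected) ** 2 / expected).sum()
    pval = stats.chi2.sf(stat, df=bins - 1)
    return stat, pval

X = rng.multivariate_normal(np.zeros(3), np.eye(3), size=1000)
stat, pval = chi2_mvn_test(X)
print(round(pval, 3))   # a large p-value is expected for truly normal data
```

Since the mean and covariance are estimated from the same data, the reference distribution is only approximate; for small samples the reduced-degrees-of-freedom corrections discussed in elementary texts apply.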

  20. Noncentral Chi-Square versus Normal Distributions in Describing the Likelihood Ratio Statistic: The Univariate Case and Its Multivariate Implication

    ERIC Educational Resources Information Center

    Yuan, Ke-Hai

    2008-01-01

    In the literature of mean and covariance structure analysis, noncentral chi-square distribution is commonly used to describe the behavior of the likelihood ratio (LR) statistic under alternative hypothesis. Due to the inaccessibility of the rather technical literature for the distribution of the LR statistic, it is widely believed that the…

  1. Simultaneous calibration of ensemble river flow predictions over an entire range of lead times

    NASA Astrophysics Data System (ADS)

    Hemri, S.; Fundel, F.; Zappa, M.

    2013-10-01

    Probabilistic estimates of future water levels and river discharge are usually simulated with hydrologic models using ensemble weather forecasts as main inputs. As hydrologic models are imperfect and the meteorological ensembles tend to be biased and underdispersed, the ensemble forecasts for river runoff typically are biased and underdispersed, too. Thus, in order to achieve both reliable and sharp predictions, statistical postprocessing is required. In this work Bayesian model averaging (BMA) is applied to statistically postprocess raw ensemble runoff forecasts for a catchment in Switzerland, at lead times ranging from 1 to 240 h. The raw forecasts have been obtained using deterministic and ensemble meteorological forcing models with different forecast lead time ranges. First, BMA is applied based on mixtures of univariate normal distributions, subject to the assumption of independence between distinct lead times. Then, the independence assumption is relaxed in order to estimate multivariate runoff forecasts over the entire range of lead times simultaneously, based on a BMA version that uses multivariate normal distributions. Since river runoff is a highly skewed variable, Box-Cox transformations are applied in order to achieve approximate normality. Both univariate and multivariate BMA approaches are able to generate well calibrated probabilistic forecasts that are considerably sharper than climatological forecasts. Additionally, multivariate BMA provides a promising approach for incorporating temporal dependencies into the postprocessed forecasts. Its major advantage over univariate BMA is an increase in reliability when the forecast system is changing due to model availability.
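The Box-Cox step mentioned here is readily demonstrated with SciPy's `boxcox`, which fits the transformation parameter by maximum likelihood. The lognormal "runoff-like" data below are a stand-in of this sketch's own, not the Swiss catchment data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Runoff-like data: strictly positive and highly right-skewed (lognormal),
# mimicking the skewness of river discharge.
runoff = rng.lognormal(mean=2.0, sigma=0.8, size=2000)

transformed, lam = stats.boxcox(runoff)   # lambda fitted by max. likelihood
print(round(stats.skew(runoff), 2) > 1.0)       # True: raw data clearly skewed
print(abs(stats.skew(transformed)) < 0.3)       # True: near-symmetric after
print(round(lam, 2))   # for lognormal data the fitted lambda sits near 0 (log)
```

After postprocessing in the transformed space, forecasts are mapped back through the inverse transformation, which is what reintroduces the skewness appropriate for discharge.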

  2. Approximations to the distribution of a test statistic in covariance structure analysis: A comprehensive study.

    PubMed

    Wu, Hao

    2018-05-01

    In structural equation modelling (SEM), a robust adjustment to the test statistic or to its reference distribution is needed when its null distribution deviates from a χ2 distribution, which usually arises when data do not follow a multivariate normal distribution. Unfortunately, existing studies on this issue typically focus on only a few methods and neglect the majority of alternative methods in statistics. Existing simulation studies typically consider only non-normal distributions of data that either satisfy asymptotic robustness or lead to an asymptotic scaled χ2 distribution. In this work we conduct a comprehensive study that involves both typical methods in SEM and less well-known methods from the statistics literature. We also propose the use of several novel non-normal data distributions that are qualitatively different from the non-normal distributions widely used in existing studies. We found that several under-studied methods give the best performance under specific conditions, but the Satorra-Bentler method remains the most viable method for most situations. © 2017 The British Psychological Society.

  3. Technology-enhanced Interactive Teaching of Marginal, Joint and Conditional Probabilities: The Special Case of Bivariate Normal Distribution

    PubMed Central

    Dinov, Ivo D.; Kamino, Scott; Bhakhrani, Bilal; Christou, Nicolas

    2014-01-01

    Data analysis requires subtle probability reasoning to answer questions like What is the chance of event A occurring, given that event B was observed? This generic question arises in discussions of many intriguing scientific questions such as What is the probability that an adolescent weighs between 120 and 140 pounds given that they are of average height? and What is the probability of (monetary) inflation exceeding 4% and housing price index below 110? To address such problems, learning some applied, theoretical or cross-disciplinary probability concepts is necessary. Teaching such courses can be improved by utilizing modern information technology resources. Students’ understanding of multivariate distributions, conditional probabilities, correlation and causation can be significantly strengthened by employing interactive web-based science educational resources. Independent of the type of a probability course (e.g. majors, minors or service probability course, rigorous measure-theoretic, applied or statistics course) student motivation, learning experiences and knowledge retention may be enhanced by blending modern technological tools within the classical conceptual pedagogical models. We have designed, implemented and disseminated a portable open-source web-application for teaching multivariate distributions, marginal, joint and conditional probabilities using the special case of bivariate Normal distribution. A real adolescent height and weight dataset is used to demonstrate the classroom utilization of the new web-application to address problems of parameter estimation, univariate and multivariate inference. PMID:25419016

  4. Technology-enhanced Interactive Teaching of Marginal, Joint and Conditional Probabilities: The Special Case of Bivariate Normal Distribution.

    PubMed

    Dinov, Ivo D; Kamino, Scott; Bhakhrani, Bilal; Christou, Nicolas

    2013-01-01

    Data analysis requires subtle probability reasoning to answer questions like What is the chance of event A occurring, given that event B was observed? This generic question arises in discussions of many intriguing scientific questions such as What is the probability that an adolescent weighs between 120 and 140 pounds given that they are of average height? and What is the probability of (monetary) inflation exceeding 4% and housing price index below 110? To address such problems, learning some applied, theoretical or cross-disciplinary probability concepts is necessary. Teaching such courses can be improved by utilizing modern information technology resources. Students' understanding of multivariate distributions, conditional probabilities, correlation and causation can be significantly strengthened by employing interactive web-based science educational resources. Independent of the type of a probability course (e.g. majors, minors or service probability course, rigorous measure-theoretic, applied or statistics course) student motivation, learning experiences and knowledge retention may be enhanced by blending modern technological tools within the classical conceptual pedagogical models. We have designed, implemented and disseminated a portable open-source web-application for teaching multivariate distributions, marginal, joint and conditional probabilities using the special case of bivariate Normal distribution. A real adolescent height and weight dataset is used to demonstrate the classroom utilization of the new web-application to address problems of parameter estimation, univariate and multivariate inference.
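The height/weight question posed in this abstract reduces to a conditional-normal computation. A sketch with hypothetical, illustrative parameters (the means, SDs, and correlation below are this example's own, not the dataset's):

```python
import numpy as np
from scipy import stats

# Hypothetical adolescent height/weight parameters (illustrative only):
mu_h, mu_w = 65.0, 130.0        # inches, pounds
sd_h, sd_w, rho = 3.5, 20.0, 0.5

# Conditional law of weight given height h under a bivariate normal:
#   W | H = h  ~  N(mu_w + rho * sd_w/sd_h * (h - mu_h), sd_w^2 * (1 - rho^2))
h = mu_h                         # "average height"
cond_mu = mu_w + rho * sd_w / sd_h * (h - mu_h)
cond_sd = sd_w * np.sqrt(1 - rho**2)
p = stats.norm.cdf(140, cond_mu, cond_sd) - stats.norm.cdf(120, cond_mu, cond_sd)
print(round(p, 3))   # ≈ 0.44 for these illustrative parameters
```

Conditioning shrinks the spread by the factor sqrt(1 - rho^2), which is exactly the effect the web application visualizes when a student fixes one margin of the bivariate normal.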

  5. Multivariate Bayesian analysis of Gaussian, right censored Gaussian, ordered categorical and binary traits using Gibbs sampling

    PubMed Central

    Korsgaard, Inge Riis; Lund, Mogens Sandø; Sorensen, Daniel; Gianola, Daniel; Madsen, Per; Jensen, Just

    2003-01-01

    A fully Bayesian analysis using Gibbs sampling and data augmentation in a multivariate model of Gaussian, right censored, and grouped Gaussian traits is described. The grouped Gaussian traits are either ordered categorical traits (with more than two categories) or binary traits, where the grouping is determined via thresholds on the underlying Gaussian scale, the liability scale. Allowances are made for unequal models, unknown covariance matrices and missing data. Having outlined the theory, strategies for implementation are reviewed. These include joint sampling of location parameters; efficient sampling from the fully conditional posterior distribution of augmented data, a multivariate truncated normal distribution; and sampling from the conditional inverse Wishart distribution, the fully conditional posterior distribution of the residual covariance matrix. Finally, a simulated dataset was analysed to illustrate the methodology. This paper concentrates on a model where residuals associated with liabilities of the binary traits are assumed to be independent. A Bayesian analysis using Gibbs sampling is outlined for the model where this assumption is relaxed. PMID:12633531
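The data-augmentation step described here, drawing liabilities from a truncated normal given the observed category, can be sketched with SciPy's `truncnorm`. The function name and toy values are this illustration's own, and residual variance is fixed at 1 as on the liability scale:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Data augmentation for a binary trait: given the current linear predictor
# eta and the observed category, the liability is drawn from a normal
# truncated to the positive half-line (y = 1) or the negative half-line (y = 0).
def sample_liability(eta, y, rng):
    lo = np.where(y == 1, 0.0, -np.inf)
    hi = np.where(y == 1, np.inf, 0.0)
    # truncnorm takes bounds standardized relative to loc and scale
    a, b = lo - eta, hi - eta
    return stats.truncnorm.rvs(a, b, loc=eta, scale=1.0, random_state=rng)

eta = np.array([-0.5, 0.2, 1.0])
y = np.array([0, 1, 1])
liab = sample_liability(eta, y, rng)
print((liab > 0) == (y == 1))   # all True: liabilities match the categories
```

Within a Gibbs sampler this draw alternates with updates of the location parameters and the residual covariance (inverse Wishart), each conditional on the freshly augmented liabilities.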

  6. Robust multivariate nonparametric tests for detection of two-sample location shift in clinical trials

    PubMed Central

    Jiang, Xuejun; Guo, Xu; Zhang, Ning; Wang, Bo

    2018-01-01

    This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV. PMID:29672555
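A minimal sketch of the permutation approach for a median-based location statistic, in the spirit of (but much simpler than) the tests in this record; the statistic, sample sizes, and shift are this illustration's own:

```python
import numpy as np

rng = np.random.default_rng(7)

def permutation_test(X, Y, reps=2000, rng=rng):
    """Two-sample permutation test for multivariate location shift,
    using the distance between component-wise medians (a robust statistic)."""
    def stat(A, B):
        return np.linalg.norm(np.median(A, axis=0) - np.median(B, axis=0))
    observed = stat(X, Y)
    Z = np.vstack([X, Y])
    n = len(X)
    exceed = 0
    for _ in range(reps):
        perm = rng.permutation(len(Z))
        if stat(Z[perm[:n]], Z[perm[n:]]) >= observed:
            exceed += 1
    return (exceed + 1) / (reps + 1)   # add-one-smoothed p-value

X = rng.standard_normal((60, 3))
Y = rng.standard_normal((60, 3)) + np.array([2.0, 0.0, 0.0])  # shifted group
pv = permutation_test(X, Y)
print(pv < 0.05)   # True: the location shift is detected
```

Because the reference distribution is built by re-randomizing group labels, no multivariate normality assumption is needed, which is precisely the robustness property the article exploits.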

  7. Problems with Multivariate Normality: Can the Multivariate Bootstrap Help?

    ERIC Educational Resources Information Center

    Thompson, Bruce

    Multivariate normality is required for some statistical tests. This paper explores the implications of violating the assumption of multivariate normality and illustrates a graphical procedure for evaluating multivariate normality. The logic for using the multivariate bootstrap is presented. The multivariate bootstrap can be used when distribution…

  8. Use of collateral information to improve LANDSAT classification accuracies

    NASA Technical Reports Server (NTRS)

    Strahler, A. H. (Principal Investigator)

    1981-01-01

    Methods to improve LANDSAT classification accuracies were investigated, including: (1) the use of prior probabilities in maximum likelihood classification as a methodology to integrate discrete collateral data with continuously measured image density variables; (2) the use of the logit classifier as an alternative to multivariate normal classification, which permits mixing continuous and categorical variables in a single model and fits empirical distributions of observations more closely than the multivariate normal density function; and (3) the use of collateral data in a geographic information system, exercised to model a desired output information layer as a function of input raster-format collateral and image data base layers.

  9. The use of copulas to practical estimation of multivariate stochastic differential equation mixed effects models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rupšys, P.

    A system of stochastic differential equations (SDE) with mixed-effects parameters and a multivariate normal copula density function was used to develop a tree height model for Scots pine trees in Lithuania. A two-step maximum likelihood parameter estimation method is used and computational guidelines are given. After fitting the conditional probability density functions to outside-bark diameter at breast height and total tree height, a bivariate normal copula distribution model was constructed. Predictions from the mixed-effects parameters SDE tree height model calculated during this research were compared to regression tree height equations. The results are implemented in the symbolic computational language MAPLE.

  10. Concurrent generation of multivariate mixed data with variables of dissimilar types.

    PubMed

    Amatya, Anup; Demirtas, Hakan

    2016-01-01

    Data sets originating from a wide range of research studies are composed of multiple variables that are correlated and of dissimilar types, primarily count, binary/ordinal, and continuous attributes. The present paper builds on previous work on multivariate data generation and develops a framework for generating multivariate mixed data with a pre-specified correlation matrix. The generated data consist of components that are marginally count, binary, ordinal, and continuous, where the count and continuous variables follow the generalized Poisson and normal distributions, respectively. The use of the generalized Poisson distribution provides a flexible mechanism that allows under- and over-dispersed count variables, which are commonly encountered in practice. A step-by-step algorithm is provided and its performance is evaluated using simulated and real-data scenarios.
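The general mechanism behind such frameworks can be sketched NORTA-style: draw correlated normals, push each margin through the normal CDF, then through the target inverse CDF. This sketch substitutes an ordinary Poisson for the paper's generalized Poisson, and all parameter values are its own:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)

# NORTA-style sketch: correlated normals -> uniforms -> target margins
# (count, binary, continuous). Note: the normal-scale correlation R is not
# exactly the correlation of the transformed margins; full frameworks
# adjust R to hit a pre-specified target.
R = np.array([[1.0, 0.4, 0.3],
              [0.4, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
Z = rng.multivariate_normal(np.zeros(3), R, size=10000)
U = stats.norm.cdf(Z)

count = stats.poisson.ppf(U[:, 0], mu=3.0)           # count margin
binary = (U[:, 1] > 0.7).astype(int)                 # Bernoulli(0.3) margin
cont = stats.norm.ppf(U[:, 2], loc=10.0, scale=2.0)  # continuous margin

print(round(count.mean(), 1), round(binary.mean(), 2), round(cont.mean(), 1))
```

Each margin hits its nominal mean because the probability-integral transform preserves the marginal law exactly; only the cross-correlations require the iterative adjustment the paper's algorithm provides.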

  11. Applying Multivariate Discrete Distributions to Genetically Informative Count Data.

    PubMed

    Kirkpatrick, Robert M; Neale, Michael C

    2016-03-01

    We present a novel method of conducting biometric analysis of twin data when the phenotypes are integer-valued counts, which often show an L-shaped distribution. Monte Carlo simulation is used to compare five likelihood-based approaches to modeling: our multivariate discrete method, when its distributional assumptions are correct, when they are incorrect, and three other methods in common use. With data simulated from a skewed discrete distribution, recovery of twin correlations and proportions of additive genetic and common environment variance was generally poor for the Normal, Lognormal and Ordinal models, but good for the two discrete models. Sex-separate applications to substance-use data from twins in the Minnesota Twin Family Study showed superior performance of two discrete models. The new methods are implemented using R and OpenMx and are freely available.

  12. A Note on Asymptotic Joint Distribution of the Eigenvalues of a Noncentral Multivariate F Matrix.

    DTIC Science & Technology

    1984-11-01

    Krishnaiah (1982). Now, let us consider the samples drawn from the k multivariate normal populations. Let (X1t, ..., Xpt) denote the mean vector of the t-th ... to multivariate problems. Sankhya, 4, 381-396. [7] Krishnaiah, P. R. (1982). Selection of variables in discriminant analysis. In Handbook of Statistics, Volume 2 (P. R. Krishnaiah, editor), 805-820. North-Holland Publishing Company.

  13. Application of a multivariate normal distribution methodology to the dissociation of doubly ionized molecules: The DMDS (CH3 -SS-CH3 ) case.

    PubMed

    Varas, Lautaro R; Pontes, F C; Santos, A C F; Coutinho, L H; de Souza, G G B

    2015-09-15

    The ion-ion-coincidence mass spectroscopy technique provides useful information about the fragmentation dynamics of doubly and multiply charged ionic species. We advocate the use of a matrix-parameter methodology in order to represent and interpret entire ion-ion spectra associated with the ionic dissociation of doubly charged molecules. This method makes it possible, among other things, to infer fragmentation processes and to extract information about overlapped ion-ion coincidences, an important piece of information that is difficult to obtain from previously described methodologies. A Wiley-McLaren time-of-flight mass spectrometer was used to discriminate the positively charged fragment ions resulting from ionization of the sample by a pulsed 800 eV electron beam. We exemplify the application of this methodology by analyzing the fragmentation and ionic dissociation of the dimethyl disulfide (DMDS) molecule as induced by fast electrons. The doubly charged dissociation was analyzed using the multivariate normal distribution. The ion-ion spectrum of the DMDS molecule was obtained at an incident electron energy of 800 eV and was represented in matrix form using multivariate normal distribution theory. The proposed methodology allows us to distinguish the [CHnSHn]+/[CH3]+ (n = 1-3) fragment ions in the ion-ion coincidence spectra. Using the momenta-balance methodology for the inferred parameters, a secondary decay mechanism is proposed for the [CHS]+ ion formation. As an additional check on the methodology, previously published data on the SiF4 molecule were re-analyzed with the present methodology and the results were shown to be statistically equivalent. The use of a multivariate normal distribution allows for the representation of the whole ion-ion mass spectrum of doubly or multiply ionized molecules as a combination of parameters and the extraction of information from overlapped data.
We have successfully applied this methodology to the analysis of the fragmentation of the DMDS molecule. Copyright © 2015 John Wiley & Sons, Ltd. Copyright © 2015 John Wiley & Sons, Ltd.
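As a sketch of the matrix-parameter idea, each coincidence island in the (t1, t2) time-of-flight map can be modelled as a bivariate normal whose correlation encodes the momentum balance between the two fragments. All parameters below are illustrative, not values from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical parameters for one coincidence island in a (t1, t2)
# time-of-flight map: mean flight times (ns) and a covariance whose
# negative correlation reflects momentum conservation between fragments.
mean = np.array([1500.0, 2300.0])
cov = np.array([[400.0, -300.0],
                [-300.0, 400.0]])
island = multivariate_normal(mean, cov)

# Evaluate the modelled intensity on a coarse grid of coincidence times.
t1, t2 = np.meshgrid(np.linspace(1400, 1600, 50),
                     np.linspace(2200, 2400, 50))
intensity = island.pdf(np.dstack([t1, t2]))

# The fitted correlation (slope of the island) carries the
# momentum-balance information used to infer decay mechanisms.
slope = cov[0, 1] / cov[0, 0]
```

Overlapped islands would be represented as a sum of such components, with each component's parameter set reported in the matrix representation.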

  14. Surrogacy assessment using principal stratification when surrogate and outcome measures are multivariate normal.

    PubMed

    Conlon, Anna S C; Taylor, Jeremy M G; Elliott, Michael R

    2014-04-01

    In clinical trials, a surrogate outcome variable (S) can be measured before the outcome of interest (T) and may provide early information regarding the treatment (Z) effect on T. Using the principal surrogacy framework introduced by Frangakis and Rubin (2002. Principal stratification in causal inference. Biometrics 58, 21-29), we consider an approach that has a causal interpretation and develop a Bayesian estimation strategy for surrogate validation when the joint distribution of potential surrogate and outcome measures is multivariate normal. From the joint conditional distribution of the potential outcomes of T, given the potential outcomes of S, we propose surrogacy validation measures from this model. As the model is not fully identifiable from the data, we propose some reasonable prior distributions and assumptions that can be placed on weakly identified parameters to aid in estimation. We explore the relationship between our surrogacy measures and the surrogacy measures proposed by Prentice (1989. Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine 8, 431-440). The method is applied to data from a macular degeneration study and an ovarian cancer study.
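The conditional distribution of the potential outcomes of T given those of S follows from the standard multivariate normal partition formulas; a minimal sketch with illustrative (not estimated) parameters:

```python
import numpy as np

# Joint (S, T) potential outcomes stacked as [S0, S1, T0, T1]; the
# numbers below are illustrative, not estimates from the paper.
mu = np.array([0.0, 0.5, 0.0, 0.8])
Sigma = np.array([[1.0, 0.6, 0.7, 0.4],
                  [0.6, 1.0, 0.4, 0.7],
                  [0.7, 0.4, 1.0, 0.5],
                  [0.4, 0.7, 0.5, 1.0]])

def conditional_mvn(mu, Sigma, idx_t, idx_s, s_obs):
    """Mean/cov of the T-block given the S-block, by the partition formula."""
    mu_t, mu_s = mu[idx_t], mu[idx_s]
    S_tt = Sigma[np.ix_(idx_t, idx_t)]
    S_ts = Sigma[np.ix_(idx_t, idx_s)]
    S_ss = Sigma[np.ix_(idx_s, idx_s)]
    w = np.linalg.solve(S_ss, s_obs - mu_s)
    cond_mean = mu_t + S_ts @ w
    cond_cov = S_tt - S_ts @ np.linalg.solve(S_ss, S_ts.T)
    return cond_mean, cond_cov

# Conditional distribution of (T0, T1) given (S0, S1) = (1, 1).
cm, cc = conditional_mvn(mu, Sigma, [2, 3], [0, 1], np.array([1.0, 1.0]))
```

Surrogacy measures in this framework are functions of such conditional means and covariances; the cross-world correlations that make the model only partially identifiable enter through the off-diagonal blocks of Sigma.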

  15. Surrogacy assessment using principal stratification when surrogate and outcome measures are multivariate normal

    PubMed Central

    Conlon, Anna S. C.; Taylor, Jeremy M. G.; Elliott, Michael R.

    2014-01-01

    In clinical trials, a surrogate outcome variable (S) can be measured before the outcome of interest (T) and may provide early information regarding the treatment (Z) effect on T. Using the principal surrogacy framework introduced by Frangakis and Rubin (2002. Principal stratification in causal inference. Biometrics 58, 21–29), we consider an approach that has a causal interpretation and develop a Bayesian estimation strategy for surrogate validation when the joint distribution of potential surrogate and outcome measures is multivariate normal. From the joint conditional distribution of the potential outcomes of T, given the potential outcomes of S, we propose surrogacy validation measures from this model. As the model is not fully identifiable from the data, we propose some reasonable prior distributions and assumptions that can be placed on weakly identified parameters to aid in estimation. We explore the relationship between our surrogacy measures and the surrogacy measures proposed by Prentice (1989. Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine 8, 431–440). The method is applied to data from a macular degeneration study and an ovarian cancer study. PMID:24285772

  16. Drought forecasting in Luanhe River basin involving climatic indices

    NASA Astrophysics Data System (ADS)

    Ren, Weinan; Wang, Yixuan; Li, Jianzhu; Feng, Ping; Smith, Ronald J.

    2017-11-01

Drought is regarded as one of the most severe natural disasters globally. This is especially the case in Tianjin City, Northern China, where drought can affect economic development and people's livelihoods. Drought forecasting, the basis of drought management, is an important mitigation strategy. In this paper, we develop a probabilistic forecasting model, which forecasts transition probabilities from a current Standardized Precipitation Index (SPI) value to a future SPI class, based on the conditional distribution of a multivariate normal distribution so as to incorporate two large-scale climatic indices simultaneously, and apply the forecasting model to 26 rain gauges in the Luanhe River basin in North China. The establishment of the model and the derivation of the SPI are based on the hypothesis that aggregated monthly precipitation is normally distributed. Pearson correlation and Shapiro-Wilk normality tests are used to select appropriate SPI time scales and large-scale climatic indices. Findings indicated that longer-term aggregated monthly precipitation, in general, was more likely to be normally distributed, and that forecasting models should be applied to each gauge individually rather than to the whole basin. Taking Liying Gauge as an example, we illustrate the impact of the SPI time scale and lead time on transition probabilities. Then, the controlling climatic indices for every gauge are selected by Pearson correlation test, and the multivariate normality of the SPI, the corresponding climatic indices for the current month, and the SPI 1, 2, and 3 months later is demonstrated using the Shapiro-Wilk normality test. Subsequently, we illustrate the impact of large-scale oceanic-atmospheric circulation patterns on transition probabilities. Finally, we use a score method to evaluate and compare the performance of the three forecasting models and compare them with two traditional models which forecast transition probabilities from a current to a future SPI class. The results show that the three proposed models outperform the two traditional models and that involving large-scale climatic indices can improve forecasting accuracy.
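The core computation in such a model is a conditional normal transition probability; a minimal sketch for a single gauge, assuming standard N(0, 1) SPI margins and an illustrative lag correlation (the drought-class boundaries below are common SPI conventions, not necessarily those of the paper):

```python
import numpy as np
from scipy.stats import norm

# Sketch, assuming (SPI_now, SPI_future) are bivariate normal with
# standard N(0, 1) margins and an illustrative lag correlation rho.
rho = 0.6
spi_now = -1.2          # current SPI value at the gauge

# Conditional distribution of the future SPI given the current value.
cond_mean = rho * spi_now
cond_sd = np.sqrt(1.0 - rho**2)

# Transition probabilities into SPI drought classes.
classes = {"severe/extreme": (-np.inf, -1.5),
           "moderate": (-1.5, -1.0),
           "mild": (-1.0, 0.0),
           "non-drought": (0.0, np.inf)}
probs = {k: norm.cdf(hi, cond_mean, cond_sd) - norm.cdf(lo, cond_mean, cond_sd)
         for k, (lo, hi) in classes.items()}
```

Adding climatic indices extends the conditioning set, so the same conditional-normal calculation runs on a higher-dimensional multivariate normal.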

  17. Estimation and model selection of semiparametric multivariate survival functions under general censorship.

    PubMed

    Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang

    2010-07-01

We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root-n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided.

  18. Estimation and model selection of semiparametric multivariate survival functions under general censorship

    PubMed Central

    Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang

    2013-01-01

    We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root-n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided. PMID:24790286

  19. Positive phase space distributions and uncertainty relations

    NASA Technical Reports Server (NTRS)

    Kruger, Jan

    1993-01-01

    In contrast to a widespread belief, Wigner's theorem allows the construction of true joint probabilities in phase space for distributions describing the object system as well as for distributions depending on the measurement apparatus. The fundamental role of Heisenberg's uncertainty relations in Schroedinger form (including correlations) is pointed out for these two possible interpretations of joint probability distributions. Hence, in order that a multivariate normal probability distribution in phase space may correspond to a Wigner distribution of a pure or a mixed state, it is necessary and sufficient that Heisenberg's uncertainty relation in Schroedinger form should be satisfied.
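For a Gaussian phase-space distribution in (q, p), the stated condition reduces to the Schroedinger-form inequality Var(q)Var(p) - Cov(q,p)^2 >= (hbar/2)^2, i.e. a determinant condition on the covariance matrix. A minimal check in units where hbar = 1, with illustrative covariance matrices:

```python
import numpy as np

HBAR = 1.0  # work in units where hbar = 1

def is_valid_wigner_gaussian(cov):
    """Schroedinger-form uncertainty check for a Gaussian phase-space
    distribution in (q, p): Var(q)Var(p) - Cov(q,p)^2 >= (hbar/2)^2,
    i.e. det(cov) >= (hbar/2)^2."""
    det = float(np.linalg.det(cov))
    return det >= (HBAR / 2.0) ** 2

# A correlated Gaussian that respects the bound (det = 0.32 >= 0.25)...
ok = is_valid_wigner_gaussian(np.array([[0.6, 0.2], [0.2, 0.6]]))
# ...and one too narrow in both variables to be a Wigner function
# of any pure or mixed state (det = 0.01 < 0.25).
bad = is_valid_wigner_gaussian(np.array([[0.1, 0.0], [0.0, 0.1]]))
```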

  20. Photon event distribution sampling: an image formation technique for scanning microscopes that permits tracking of sub-diffraction particles with high spatial and temporal resolutions.

    PubMed

    Larkin, J D; Publicover, N G; Sutko, J L

    2011-01-01

In photon event distribution sampling, an image formation technique for scanning microscopes, the maximum likelihood position of origin of each detected photon is acquired as a data set rather than binning photons in pixels. Subsequently, an intensity-related probability density function describing the uncertainty associated with the photon position measurement is applied to each position, and the individual photon intensity distributions are summed to form an image. Compared to pixel-based images, photon event distribution sampling images exhibit increased signal-to-noise ratios and comparable spatial resolution. Photon event distribution sampling is superior to pixel-based image formation in recognizing the presence of structured (non-random) photon distributions at low photon counts and permits the use of non-raster scanning patterns. A photon event distribution sampling based method for localizing single particles, derived from a multivariate normal distribution, is more precise than statistical (Gaussian) fitting to pixel-based images. Using the multivariate normal distribution method, non-raster scanning and a typical confocal microscope, localizations with 8 nm precision were achieved at 10 ms sampling rates with acquisition of ~200 photons per frame. Single nanometre precision was obtained with a greater number of photons per frame. In summary, photon event distribution sampling provides an efficient way to form images when low numbers of photons are involved and permits particle tracking with confocal point-scanning microscopes with nanometre precision deep within specimens. © 2010 The Authors Journal of Microscopy © 2010 The Royal Microscopical Society.
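The image-formation step can be sketched as summing one Gaussian kernel per detected photon instead of binning photons into pixels; positions, counts, and the uncertainty sigma below are illustrative, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative photon positions (nm) from a sub-diffraction emitter,
# each carrying a localization uncertainty sigma.
true_pos = np.array([250.0, 250.0])
sigma = 100.0
photons = rng.normal(true_pos, sigma, size=(200, 2))

# PEDS-style image: sum one Gaussian probability kernel per photon
# rather than incrementing a pixel count.
grid = np.arange(0, 500, 5.0)
gx, gy = np.meshgrid(grid, grid)
image = np.zeros_like(gx)
for x, y in photons:
    image += np.exp(-((gx - x) ** 2 + (gy - y) ** 2) / (2 * sigma**2))

# Under an isotropic normal model, the maximum likelihood particle
# position is the photon centroid, which the MVN localization exploits.
estimate = photons.mean(axis=0)
```

With ~200 photons the centroid's standard error is roughly sigma/sqrt(200) per axis, consistent with the nanometre-scale precisions quoted in the abstract.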

  1. Multidimensional stochastic approximation using locally contractive functions

    NASA Technical Reports Server (NTRS)

    Lawton, W. M.

    1975-01-01

    A Robbins-Monro type multidimensional stochastic approximation algorithm which converges in mean square and with probability one to the fixed point of a locally contractive regression function is developed. The algorithm is applied to obtain maximum likelihood estimates of the parameters for a mixture of multivariate normal distributions.
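A minimal sketch of a Robbins-Monro iteration of this type, using a hypothetical locally contractive regression function observed with noise (not the mixture-estimation application of the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def g(x):
    """Locally contractive regression function with fixed point (1, -2)."""
    target = np.array([1.0, -2.0])
    return target + 0.5 * (x - target)

# Robbins-Monro iteration x_{n+1} = x_n + a_n (G(x_n) - x_n), where
# G(x_n) is g(x_n) observed with zero-mean noise and a_n = 1/n.
x = np.zeros(2)
for n in range(1, 5001):
    noisy = g(x) + rng.normal(0.0, 0.1, size=2)
    x = x + (1.0 / n) * (noisy - x)
```

The decreasing step sizes satisfy the usual conditions (sum a_n diverges, sum a_n^2 converges), which is what gives mean-square and almost-sure convergence to the fixed point.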

  2. Functional Relationships and Regression Analysis.

    ERIC Educational Resources Information Center

    Preece, Peter F. W.

    1978-01-01

    Using a degenerate multivariate normal model for the distribution of organismic variables, the form of least-squares regression analysis required to estimate a linear functional relationship between variables is derived. It is suggested that the two conventional regression lines may be considered to describe functional, not merely statistical,…

  3. Estimating the Classification Efficiency of a Test Battery.

    ERIC Educational Resources Information Center

    De Corte, Wilfried

    2000-01-01

    Shows how a theorem proven by H. Brogden (1951, 1959) can be used to estimate the allocation average (a predictor based classification of a test battery) assuming that the predictor intercorrelations and validities are known and that the predictor variables have a joint multivariate normal distribution. (SLD)
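The allocation average can also be estimated by Monte Carlo under the stated multivariate normal assumption; the sketch below assigns each person to their highest predicted score, with illustrative intercorrelations rather than values from the article:

```python
import numpy as np

rng = np.random.default_rng(2)

# Sketch of Brogden's setting: each person is assigned to the job whose
# predicted criterion score is highest. The intercorrelation of 0.3 is
# illustrative, not a value from the article.
n_jobs, n = 3, 100_000
corr = np.full((n_jobs, n_jobs), 0.3)
np.fill_diagonal(corr, 1.0)
scores = rng.multivariate_normal(np.zeros(n_jobs), corr, size=n)

# Allocation average: mean of the selected (maximum) predictor score,
# estimated here by simulation rather than by Brogden's analytic result.
allocation_average = scores.max(axis=1).mean()
```

For equicorrelated standard normal predictors the expected maximum shrinks as the intercorrelation grows, which is why classification efficiency depends on the predictor correlation structure.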

  4. Influence assessment in censored mixed-effects models using the multivariate Student’s-t distribution

    PubMed Central

    Matos, Larissa A.; Bandyopadhyay, Dipankar; Castro, Luis M.; Lachos, Victor H.

    2015-01-01

In biomedical studies on HIV RNA dynamics, viral loads generate repeated measures that are often subject to upper and lower detection limits, and hence these responses are either left- or right-censored. Linear and non-linear mixed-effects censored (LMEC/NLMEC) models are routinely used to analyse these longitudinal data, with normality assumptions for the random effects and residual errors. However, the derived inference may not be robust when these underlying normality assumptions are questionable, especially in the presence of outliers and heavy tails. Motivated by this, Matos et al. (2013b) recently proposed an exact EM-type algorithm for LMEC/NLMEC models using a multivariate Student’s-t distribution, with closed-form expressions at the E-step. In this paper, we develop influence diagnostics for LMEC/NLMEC models using the multivariate Student’s-t density, based on the conditional expectation of the complete data log-likelihood. This partially eliminates the complexity associated with the approach of Cook (1977, 1986) for censored mixed-effects models. The new methodology is illustrated via an application to a longitudinal HIV dataset. In addition, a simulation study explores the accuracy of the proposed measures in detecting possible influential observations for heavy-tailed censored data under different perturbation and censoring schemes. PMID:26190871

  5. Multivariate probability distribution for sewer system vulnerability assessment under data-limited conditions.

    PubMed

    Del Giudice, G; Padulano, R; Siciliano, D

    2016-01-01

The lack of geometrical and hydraulic information about sewer networks often excludes the adoption of in-depth modeling tools to obtain prioritization strategies for funds management. The present paper describes a novel statistical procedure for defining a prioritization scheme for preventive maintenance strategies based on a small sample of failure data collected by the Sewer Office of the Municipality of Naples (IT). Novel aspects include, among others, treating sewer parameters as continuous statistical variables and accounting for their interdependences. After a statistical analysis of maintenance interventions, the most important available factors affecting the process are selected and their mutual correlations identified. Then, after a Box-Cox transformation of the original variables, a methodology is provided for the evaluation of a vulnerability map of the sewer network by adopting a joint multivariate normal distribution with different parameter sets. The goodness-of-fit is eventually tested for each distribution by means of a multivariate plotting position. The developed methodology is expected to assist municipal engineers in identifying critical sewers, prioritizing sewer inspections in order to fulfill rehabilitation requirements.
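A sketch of the transform-then-fit step, using synthetic skewed attributes in place of the Naples sewer data: each variable is Box-Cox transformed, a joint multivariate normal is fitted, and low joint density flags atypical (potentially vulnerable) sewers.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Illustrative positive, skewed sewer attributes (e.g. age, diameter,
# depth); these are synthetic stand-ins, not the paper's data.
data = np.column_stack([rng.lognormal(3.0, 0.5, 500),
                        rng.lognormal(0.0, 0.3, 500),
                        rng.lognormal(1.0, 0.4, 500)])

# Box-Cox transform each variable toward normality, then fit a joint
# multivariate normal to the transformed, correlated variables.
transformed = np.column_stack([stats.boxcox(data[:, j])[0]
                               for j in range(data.shape[1])])
mu = transformed.mean(axis=0)
Sigma = np.cov(transformed, rowvar=False)

# Score each sewer by its joint density: low density = atypical.
density = stats.multivariate_normal(mu, Sigma).pdf(transformed)
priority = np.argsort(density)  # most atypical sewers first
```

Fitting the joint distribution, rather than each variable separately, is what lets the interdependences between sewer parameters enter the vulnerability assessment.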

  6. Fitting and Testing Conditional Multinormal Partial Credit Models

    ERIC Educational Resources Information Center

    Hessen, David J.

    2012-01-01

    A multinormal partial credit model for factor analysis of polytomously scored items with ordered response categories is derived using an extension of the Dutch Identity (Holland in "Psychometrika" 55:5-18, 1990). In the model, latent variables are assumed to have a multivariate normal distribution conditional on unweighted sums of item…

  7. A flexible model for multivariate interval-censored survival times with complex correlation structure.

    PubMed

    Falcaro, Milena; Pickles, Andrew

    2007-02-10

    We focus on the analysis of multivariate survival times with highly structured interdependency and subject to interval censoring. Such data are common in developmental genetics and genetic epidemiology. We propose a flexible mixed probit model that deals naturally with complex but uninformative censoring. The recorded ages of onset are treated as possibly censored ordinal outcomes with the interval censoring mechanism seen as arising from a coarsened measurement of a continuous variable observed as falling between subject-specific thresholds. This bypasses the requirement for the failure times to be observed as falling into non-overlapping intervals. The assumption of a normal age-of-onset distribution of the standard probit model is relaxed by embedding within it a multivariate Box-Cox transformation whose parameters are jointly estimated with the other parameters of the model. Complex decompositions of the underlying multivariate normal covariance matrix of the transformed ages of onset become possible. The new methodology is here applied to a multivariate study of the ages of first use of tobacco and first consumption of alcohol without parental permission in twins. The proposed model allows estimation of the genetic and environmental effects that are shared by both of these risk behaviours as well as those that are specific. 2006 John Wiley & Sons, Ltd.

  8. Gaussian Mixture Models of Between-Source Variation for Likelihood Ratio Computation from Multivariate Data

    PubMed Central

    Franco-Pedroso, Javier; Ramos, Daniel; Gonzalez-Rodriguez, Joaquin

    2016-01-01

In forensic science, trace evidence found at a crime scene and on a suspect has to be evaluated from the measurements performed on it, usually in the form of multivariate data (for example, several chemical compounds or physical characteristics). In order to assess the strength of that evidence, the likelihood ratio framework is being increasingly adopted. Several methods have been derived in order to obtain likelihood ratios directly from univariate or multivariate data by modelling both the variation appearing between observations (or features) coming from the same source (within-source variation) and that appearing between observations coming from different sources (between-source variation). In the widely used multivariate kernel likelihood ratio, the within-source distribution is assumed to be normally distributed and constant among different sources, and the between-source variation is modelled through a kernel density function (KDF). In order to better fit the observed distribution of the between-source variation, this paper presents a different approach in which a Gaussian mixture model (GMM) is used instead of a KDF. As will be shown, this approach provides better-calibrated likelihood ratios as measured by the log-likelihood-ratio cost (Cllr) in experiments performed on freely available forensic datasets involving different types of trace evidence: inks, glass fragments and car paints. PMID:26901680
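A minimal sketch of the likelihood-ratio structure, with a hand-specified two-component Gaussian mixture standing in for a fitted GMM (all weights, means, and covariances are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Between-source variation modelled as a two-component Gaussian mixture
# instead of a kernel density; components here are illustrative.
weights = np.array([0.6, 0.4])
components = [multivariate_normal([0.0, 0.0], [[0.5, 0.1], [0.1, 0.5]]),
              multivariate_normal([4.0, 4.0], [[0.8, 0.0], [0.0, 0.8]])]

def between_source_density(y):
    """GMM density that replaces the KDF in the LR denominator."""
    return sum(w * c.pdf(y) for w, c in zip(weights, components))

# Simple likelihood ratio: within-source density of the questioned
# measurement y around the suspect's mean (numerator) over the
# between-source GMM density (denominator).
within = multivariate_normal([0.1, 0.0], [[0.05, 0.0], [0.0, 0.05]])
y = np.array([0.2, -0.1])
lr = within.pdf(y) / between_source_density(y)
```

In practice the mixture parameters are estimated from the background population (e.g. by EM), and calibration is then assessed with Cllr.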

  9. Universal portfolios generated by weakly stationary processes

    NASA Astrophysics Data System (ADS)

    Tan, Choon Peng; Pang, Sook Theng

    2014-12-01

Recently, a universal portfolio generated by a set of independent Brownian motions, in which a finite number of past stock prices are weighted by the moments of the multivariate normal distribution, was introduced and studied. The multivariate normal moments, as polynomials in time, consequently lead to a constant rebalanced portfolio depending on the drift coefficients of the Brownian motions. For a weakly stationary process, a different type of universal portfolio is proposed in which the weights on the stock prices depend only on the time differences of the stock prices. An empirical study is conducted of the returns achieved by universal portfolios generated by the Ornstein-Uhlenbeck process on selected stock-price data sets. Promising results are demonstrated for increasing the wealth of the investor by using the weakly-stationary-process-generated universal portfolios.

  10. Hierarchical Multinomial Processing Tree Models: A Latent-Trait Approach

    ERIC Educational Resources Information Center

    Klauer, Karl Christoph

    2010-01-01

    Multinomial processing tree models are widely used in many areas of psychology. A hierarchical extension of the model class is proposed, using a multivariate normal distribution of person-level parameters with the mean and covariance matrix to be estimated from the data. The hierarchical model allows one to take variability between persons into…

  11. Robust LOD scores for variance component-based linkage analysis.

    PubMed

    Blangero, J; Williams, J T; Almasy, L

    2000-01-01

    The variance component method is now widely used for linkage analysis of quantitative traits. Although this approach offers many advantages, the importance of the underlying assumption of multivariate normality of the trait distribution within pedigrees has not been studied extensively. Simulation studies have shown that traits with leptokurtic distributions yield linkage test statistics that exhibit excessive Type I error when analyzed naively. We derive analytical formulae relating the deviation from the expected asymptotic distribution of the lod score to the kurtosis and total heritability of the quantitative trait. A simple correction constant yields a robust lod score for any deviation from normality and for any pedigree structure, and effectively eliminates the problem of inflated Type I error due to misspecification of the underlying probability model in variance component-based linkage analysis.

  12. Analysis of vector wind change with respect to time for Cape Kennedy, Florida

    NASA Technical Reports Server (NTRS)

    Adelfang, S. I.

    1978-01-01

The joint distribution of the four variables represented by the components of the wind vector at an initial time and after a specified elapsed time is hypothesized to be quadrivariate normal; the fourteen statistics of this distribution, calculated from 15 years of twice-daily rawinsonde data, are presented by monthly reference periods for altitudes from 0 to 27 km. The hypotheses that the wind component change with respect to time is univariate normal, that the joint distribution of wind component changes is bivariate normal, and that the modulus of vector wind change is Rayleigh are tested by comparison with observed distributions. Statistics of the conditional bivariate normal distributions of vector wind at a future time, given the vector wind at an initial time, are derived. Wind changes over time periods from 1 to 5 hours, calculated from Jimsphere data, are presented. Extension of the theoretical prediction (based on rawinsonde data) of the standard deviation of wind component change to time periods of 1 to 5 hours falls (with a few exceptions) within the 95th-percentile confidence band of the population estimate obtained from the Jimsphere sample data. The joint distributions of wind change components, conditional wind components, and 1 km vector wind shear change components are illustrated by probability ellipses at the 95th-percentile level.
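One of the tested hypotheses, that the modulus of the vector wind change is Rayleigh when the component changes are independent zero-mean normal with equal variance, can be checked in a short simulation (the sigma below is hypothetical, not a Cape Kennedy statistic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# If zonal/meridional wind-change components are iid N(0, sigma^2), the
# modulus of the vector wind change is Rayleigh with scale sigma.
sigma = 4.0  # m/s, hypothetical
du = rng.normal(0.0, sigma, 20_000)
dv = rng.normal(0.0, sigma, 20_000)
modulus = np.hypot(du, dv)

# Kolmogorov-Smirnov comparison against the Rayleigh distribution
# with loc = 0 and scale = sigma.
ks_stat, p_value = stats.kstest(modulus, "rayleigh", args=(0, sigma))
```

The Rayleigh mean sigma*sqrt(pi/2) gives a quick sanity check on the simulated moduli before running the formal test.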

  13. Characterizations of linear sufficient statistics

    NASA Technical Reports Server (NTRS)

    Peters, B. C., Jr.; Reoner, R.; Decell, H. P., Jr.

    1977-01-01

Conditions are given under which a surjective bounded linear operator T from a Banach space X to a Banach space Y is a sufficient statistic for a dominated family of probability measures defined on the Borel sets of X. These results were applied to characterize linear sufficient statistics for families of the exponential type, including the Wishart and multivariate normal distributions as special cases. The latter result was used to establish precisely which procedures for sampling from a normal population have the property that the sample mean is a sufficient statistic.

  14. The Effect of the Multivariate Box-Cox Transformation on the Power of MANOVA.

    ERIC Educational Resources Information Center

    Kirisci, Levent; Hsu, Tse-Chi

    Most of the multivariate statistical techniques rely on the assumption of multivariate normality. The effects of non-normality on multivariate tests are assumed to be negligible when variance-covariance matrices and sample sizes are equal. Therefore, in practice, investigators do not usually attempt to remove non-normality. In this simulation…

  15. Two-part models with stochastic processes for modelling longitudinal semicontinuous data: Computationally efficient inference and modelling the overall marginal mean.

    PubMed

    Yiu, Sean; Tom, Brian Dm

    2017-01-01

    Several researchers have described two-part models with patient-specific stochastic processes for analysing longitudinal semicontinuous data. In theory, such models can offer greater flexibility than the standard two-part model with patient-specific random effects. However, in practice, the high dimensional integrations involved in the marginal likelihood (i.e. integrated over the stochastic processes) significantly complicates model fitting. Thus, non-standard computationally intensive procedures based on simulating the marginal likelihood have so far only been proposed. In this paper, we describe an efficient method of implementation by demonstrating how the high dimensional integrations involved in the marginal likelihood can be computed efficiently. Specifically, by using a property of the multivariate normal distribution and the standard marginal cumulative distribution function identity, we transform the marginal likelihood so that the high dimensional integrations are contained in the cumulative distribution function of a multivariate normal distribution, which can then be efficiently evaluated. Hence, maximum likelihood estimation can be used to obtain parameter estimates and asymptotic standard errors (from the observed information matrix) of model parameters. We describe our proposed efficient implementation procedure for the standard two-part model parameterisation and when it is of interest to directly model the overall marginal mean. The methodology is applied on a psoriatic arthritis data set concerning functional disability.
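The computational trick can be sketched directly: the high-dimensional integral becomes an orthant probability, i.e. the CDF of a multivariate normal, for which scipy provides an efficient evaluator. The AR(1) correlation below is merely illustrative of a patient-specific stochastic process, not the paper's model:

```python
import numpy as np
from scipy.stats import multivariate_normal

# A 4-dimensional Gaussian vector with AR(1) correlation, standing in
# for a patient's latent process at four visits (rho = 0.7 illustrative).
mu = np.zeros(4)
ar1 = 0.7 ** np.abs(np.subtract.outer(np.arange(4), np.arange(4)))

# P(Z1 <= 0, ..., Z4 <= 0): the probability that the latent process
# stays below the threshold mapping to a zero (semicontinuous) response.
# The high-dimensional integral is just a multivariate normal CDF.
orthant_prob = multivariate_normal(mean=mu, cov=ar1).cdf(np.zeros(4))
```

With independence the probability would be 0.5^4 = 0.0625; positive serial correlation raises it, which is exactly the dependence the stochastic-process formulation captures.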

  16. A Bayesian joint probability modeling approach for seasonal forecasting of streamflows at multiple sites

    NASA Astrophysics Data System (ADS)

    Wang, Q. J.; Robertson, D. E.; Chiew, F. H. S.

    2009-05-01

    Seasonal forecasting of streamflows can be highly valuable for water resources management. In this paper, a Bayesian joint probability (BJP) modeling approach for seasonal forecasting of streamflows at multiple sites is presented. A Box-Cox transformed multivariate normal distribution is proposed to model the joint distribution of future streamflows and their predictors such as antecedent streamflows and El Niño-Southern Oscillation indices and other climate indicators. Bayesian inference of model parameters and uncertainties is implemented using Markov chain Monte Carlo sampling, leading to joint probabilistic forecasts of streamflows at multiple sites. The model provides a parametric structure for quantifying relationships between variables, including intersite correlations. The Box-Cox transformed multivariate normal distribution has considerable flexibility for modeling a wide range of predictors and predictands. The Bayesian inference formulated allows the use of data that contain nonconcurrent and missing records. The model flexibility and data-handling ability means that the BJP modeling approach is potentially of wide practical application. The paper also presents a number of statistical measures and graphical methods for verification of probabilistic forecasts of continuous variables. Results for streamflows at three river gauges in the Murrumbidgee River catchment in southeast Australia show that the BJP modeling approach has good forecast quality and that the fitted model is consistent with observed data.

  17. Modeling absolute differences in life expectancy with a censored skew-normal regression approach

    PubMed Central

    Clough-Gorr, Kerri; Zwahlen, Marcel

    2015-01-01

    Parameter estimates from commonly used multivariable parametric survival regression models do not directly quantify differences in years of life expectancy. Gaussian linear regression models give results in terms of absolute mean differences, but are not appropriate in modeling life expectancy, because in many situations time to death has a negative skewed distribution. A regression approach using a skew-normal distribution would be an alternative to parametric survival models in the modeling of life expectancy, because parameter estimates can be interpreted in terms of survival time differences while allowing for skewness of the distribution. In this paper we show how to use the skew-normal regression so that censored and left-truncated observations are accounted for. With this we model differences in life expectancy using data from the Swiss National Cohort Study and from official life expectancy estimates and compare the results with those derived from commonly used survival regression models. We conclude that a censored skew-normal survival regression approach for left-truncated observations can be used to model differences in life expectancy across covariates of interest. PMID:26339544
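A sketch of the uncensored core of this approach, fitting a skew-normal to synthetic left-skewed lifetimes by maximum likelihood (the paper's actual contribution, handling censoring and left truncation, is not reproduced here):

```python
import numpy as np
from scipy.stats import skewnorm

rng = np.random.default_rng(6)

# Illustrative ages at death with the negative skew typical of adult
# lifetimes; a skew-normal model reports effects directly in years,
# unlike hazard-scale survival coefficients.
ages = skewnorm.rvs(a=-5, loc=88, scale=12, size=2000, random_state=rng)

# Maximum likelihood fit of shape (a), location, and scale.
a_hat, loc_hat, scale_hat = skewnorm.fit(ages)

# Life expectancy implied by the fitted distribution.
mean_age = skewnorm.mean(a_hat, loc_hat, scale_hat)
```

Covariate effects would enter through the location parameter, so regression coefficients retain the absolute-difference (years of life) interpretation while the shape parameter absorbs the skewness.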

  18. Community health assessment using self-organizing maps and geographic information systems

    PubMed Central

    Basara, Heather G; Yuan, May

    2008-01-01

    Background From a public health perspective, a healthier community environment correlates with fewer occurrences of chronic or infectious diseases. Our premise is that community health is a non-linear function of environmental and socioeconomic effects that are not normally distributed among communities. The objective was to integrate multivariate data sets representing social, economic, and physical environmental factors to evaluate the hypothesis that communities with similar environmental characteristics exhibit similar distributions of disease. Results The SOM algorithm used the intrinsic distributions of 92 environmental variables to classify 511 communities into five clusters. SOM determined clusters were reprojected to geographic space and compared with the distributions of several health outcomes. ANOVA results indicated that the variability between community clusters was significant with respect to the spatial distribution of disease occurrence. Conclusion Our study demonstrated a positive relationship between environmental conditions and health outcomes in communities using the SOM-GIS method to overcome data and methodological challenges traditionally encountered in public health research. Results demonstrated that community health can be classified using environmental variables and that the SOM-GIS method may be applied to multivariate environmental health studies. PMID:19116020

  19. Esophageal wall dose-surface maps do not improve the predictive performance of a multivariable NTCP model for acute esophageal toxicity in advanced stage NSCLC patients treated with intensity-modulated (chemo-)radiotherapy.

    PubMed

    Dankers, Frank; Wijsman, Robin; Troost, Esther G C; Monshouwer, René; Bussink, Johan; Hoffmann, Aswin L

    2017-05-07

In our previous work, a multivariable normal-tissue complication probability (NTCP) model for acute esophageal toxicity (AET) Grade ⩾2 after highly conformal (chemo-)radiotherapy for non-small cell lung cancer (NSCLC) was developed using multivariable logistic regression analysis incorporating clinical parameters and mean esophageal dose (MED). Since the esophagus is a tubular organ, spatial information of the esophageal wall dose distribution may be important in predicting AET. We investigated whether the incorporation of esophageal wall dose-surface data with spatial information improves the predictive power of our established NTCP model. For 149 NSCLC patients treated with highly conformal radiation therapy, esophageal wall dose-surface histograms (DSHs) and polar dose-surface maps (DSMs) were generated. DSMs were used to generate new DSHs and dose-length histograms that incorporate spatial information of the dose-surface distribution. From these histograms dose parameters were derived, and univariate logistic regression analysis showed that they correlated significantly with AET. Following our previous work, new multivariable NTCP models were developed using the most significant dose histogram parameters based on univariate analysis (19 in total). However, the 19 new models incorporating esophageal wall dose-surface data with spatial information did not show improved predictive performance (area under the curve (AUC) range 0.79-0.84) over the established multivariable NTCP model based on conventional dose-volume data (AUC = 0.84). For prediction of AET, based on the proposed multivariable statistical approach, spatial information of the esophageal wall dose distribution is of no added value and it is sufficient to only consider MED as a predictive dosimetric parameter.

  20. Esophageal wall dose-surface maps do not improve the predictive performance of a multivariable NTCP model for acute esophageal toxicity in advanced stage NSCLC patients treated with intensity-modulated (chemo-)radiotherapy

    NASA Astrophysics Data System (ADS)

    Dankers, Frank; Wijsman, Robin; Troost, Esther G. C.; Monshouwer, René; Bussink, Johan; Hoffmann, Aswin L.

    2017-05-01

    In our previous work, a multivariable normal-tissue complication probability (NTCP) model for acute esophageal toxicity (AET) Grade ⩾2 after highly conformal (chemo-)radiotherapy for non-small cell lung cancer (NSCLC) was developed using multivariable logistic regression analysis incorporating clinical parameters and mean esophageal dose (MED). Since the esophagus is a tubular organ, spatial information on the esophageal wall dose distribution may be important in predicting AET. We investigated whether incorporating esophageal wall dose-surface data with spatial information improves the predictive power of our established NTCP model. For 149 NSCLC patients treated with highly conformal radiation therapy, esophageal wall dose-surface histograms (DSHs) and polar dose-surface maps (DSMs) were generated. DSMs were used to generate new DSHs and dose-length histograms that incorporate spatial information of the dose-surface distribution. From these histograms, dose parameters were derived, and univariate logistic regression analysis showed that they correlated significantly with AET. Following our previous work, new multivariable NTCP models were developed using the most significant dose-histogram parameters from the univariate analysis (19 in total). However, the 19 new models incorporating esophageal wall dose-surface data with spatial information did not show improved predictive performance (area under the curve, AUC range 0.79-0.84) over the established multivariable NTCP model based on conventional dose-volume data (AUC = 0.84). For prediction of AET, based on the proposed multivariable statistical approach, spatial information of the esophageal wall dose distribution is of no added value and it is sufficient to consider only MED as a predictive dosimetric parameter.

  1. MCMC Sampling for a Multilevel Model with Nonindependent Residuals within and between Cluster Units

    ERIC Educational Resources Information Center

    Browne, William; Goldstein, Harvey

    2010-01-01

    In this article, we discuss the effect of removing the independence assumptions between the residuals in two-level random effect models. We first consider removing the independence between the Level 2 residuals and instead assume that the vector of all residuals at the cluster level follows a general multivariate normal distribution. We…

  2. Relative Performance of Rescaling and Resampling Approaches to Model Chi Square and Parameter Standard Error Estimation in Structural Equation Modeling.

    ERIC Educational Resources Information Center

    Nevitt, Johnathan; Hancock, Gregory R.

    Though common structural equation modeling (SEM) methods are predicated upon the assumption of multivariate normality, applied researchers often find themselves with data clearly violating this assumption and without sufficient sample size to use distribution-free estimation methods. Fortunately, promising alternatives are being integrated into…

  3. Performance of Modified Test Statistics in Covariance and Correlation Structure Analysis under Conditions of Multivariate Nonnormality.

    ERIC Educational Resources Information Center

    Fouladi, Rachel T.

    2000-01-01

    Provides an overview of standard and modified normal theory and asymptotically distribution-free covariance and correlation structure analysis techniques and details Monte Carlo simulation results on Type I and Type II error control. Demonstrates through the simulation that robustness and nonrobustness of structure analysis techniques vary as a…

  4. Sample Size Calculation for Estimating or Testing a Nonzero Squared Multiple Correlation Coefficient

    ERIC Educational Resources Information Center

    Krishnamoorthy, K.; Xia, Yanping

    2008-01-01

    The problems of hypothesis testing and interval estimation of the squared multiple correlation coefficient of a multivariate normal distribution are considered. It is shown that available one-sided tests are uniformly most powerful, and the one-sided confidence intervals are uniformly most accurate. An exact method of calculating sample size to…

  5. Multivariate methods to visualise colour-space and colour discrimination data.

    PubMed

    Hastings, Gareth D; Rubin, Alan

    2015-01-01

    Despite most modern colour spaces treating colour as three-dimensional (3-D), colour data are usually not visualised in 3-D; two-dimensional (2-D) projection-plane segments and multiple 2-D perspective views are used instead. The objectives of this article are, firstly, to introduce a truly 3-D percept of colour space using stereo-pairs; secondly, to view colour discrimination data on that platform; and thirdly, to apply formal statistics and multivariate methods to analyse the data in 3-D. This is the first demonstration of the software that generated stereo-pairs of RGB colour space, as well as of a new computerised procedure that investigated colour discrimination by measuring colour just-noticeable differences (JND). An initial pilot study and a thorough investigation of instrument repeatability were performed. Thereafter, to demonstrate the capabilities of the software, five colour-normal subjects and one colour-deficient subject were examined using the JND procedure and multivariate methods of data analysis. Scatter plots of responses were meaningfully examined in 3-D and were useful in evaluating multivariate normality as well as identifying outliers. The extent and direction of the difference between each JND response and the stimulus colour point was calculated and appreciated in 3-D. Ellipsoidal surfaces of constant probability density (distribution ellipsoids) were fitted to response data; the volumes of these ellipsoids appeared useful in differentiating the colour-deficient subject from the colour-normals. Hypothesis tests of variances and covariances showed many statistically significant differences between the results of the colour-deficient subject and those of the colour-normals, while far fewer differences were found when comparing within colour-normals. The 3-D visualisation of colour data using stereo-pairs, as well as the statistics and multivariate methods of analysis employed, were found to be unique and useful tools in the representation and study of colour. Many additional studies using these methods along with the JND and other procedures have been identified and will be reported in future publications.

  6. Multivariate non-normally distributed random variables in climate research - introduction to the copula approach

    NASA Astrophysics Data System (ADS)

    Schölzel, C.; Friederichs, P.

    2008-10-01

    Probability distributions of multivariate random variables are generally more complex than their univariate counterparts, owing to possible nonlinear dependence between the random variables. One approach to this problem is the use of copulas, which have become popular over recent years, especially in fields like econometrics, finance, risk management, and insurance. Since this newly emerging field includes various practices, a controversial discussion, and a vast literature, it is difficult to get an overview. The aim of this paper is therefore to provide a brief overview of copulas for application in meteorology and climate research. We examine the advantages and disadvantages compared to alternative approaches such as mixture models, summarize the current problem of goodness-of-fit (GOF) tests for copulas, and discuss the connection with multivariate extremes. An application to station data shows the simplicity and the capabilities, as well as the limitations, of this approach. Observations of daily precipitation and temperature are fitted to a bivariate model, demonstrating that copulas are a valuable complement to the commonly used methods.
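The Gaussian-copula idea the record describes, separating non-normal margins from their dependence structure, can be sketched as follows. The "precipitation" and "temperature" series are simulated stand-ins, not station data; the fit maps each margin to normal scores via ranks and reads the dependence off as a correlation:

```python
import numpy as np
from scipy.stats import norm, rankdata

# Simulated daily precipitation (skewed margin) and temperature (roughly
# normal margin), made dependent through a shared latent Gaussian factor.
rng = np.random.default_rng(1)
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=2000)
precip = np.exp(0.5 * z[:, 0])          # non-normal, right-skewed margin
temp = 10 + 5 * z[:, 1]                 # approximately normal margin

def normal_scores(x):
    """Map a sample to standard-normal scores via its ranks."""
    u = rankdata(x) / (len(x) + 1)      # pseudo-observations in (0, 1)
    return norm.ppf(u)

# Gaussian-copula dependence parameter: correlation of the normal scores
rho = np.corrcoef(normal_scores(precip), normal_scores(temp))[0, 1]
print(round(rho, 2))
```

The estimated rho recovers the latent correlation (0.6 here) even though the raw Pearson correlation of the skewed margins would be attenuated.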

  7. Distribution pattern of urine albumin creatinine ratio and the prevalence of high-normal levels in untreated asymptomatic non-diabetic hypertensive patients.

    PubMed

    Ohmaru, Natsuki; Nakatsu, Takaaki; Izumi, Reishi; Mashima, Keiichi; Toki, Misako; Kobayashi, Asako; Ogawa, Hiroko; Hirohata, Satoshi; Ikeda, Satoru; Kusachi, Shozo

    2011-01-01

    Even high-normal albuminuria is reportedly associated with cardiovascular events. We determined the urine albumin creatinine ratio (UACR) in spot urine samples and analyzed the UACR distribution and the prevalence of high-normal levels. The UACR was determined using immunoturbidimetry in 332 untreated asymptomatic non-diabetic Japanese patients with hypertension and in 69 control subjects. Microalbuminuria and macroalbuminuria were defined as a UACR ≥30 and <300 µg/mg·creatinine and a UACR ≥300 µg/mg·creatinine, respectively. The distribution patterns showed a highly skewed distribution for the lower levels, and a common logarithmic transformation produced a close fit to a Gaussian distribution with median, 25th and 75th percentile values of 22.6, 13.5 and 48.2 µg/mg·creatinine, respectively. When a high-normal UACR was set at >20 to <30 µg/mg·creatinine, 19.9% (66/332) of the hypertensive patients exhibited a high-normal UACR. Microalbuminuria and macroalbuminuria were observed in 36.1% (120/332) and 2.1% (7/332) of the patients, respectively. UACR was significantly correlated with the systolic and diastolic blood pressures and the pulse pressure. A stepwise multivariate analysis revealed that these pressures as well as age were independent factors that increased UACR. The UACR distribution exhibited a highly skewed pattern, with approximately 60% of untreated, non-diabetic hypertensive patients exhibiting a high-normal or larger UACR. Both hypertension and age are independent risk factors that increase the UACR. The present study indicated that a considerable percentage of patients require anti-hypertensive drugs with antiproteinuric effects at the start of treatment.
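The distributional point of this record, that a strongly right-skewed biomarker becomes near-Gaussian after a common-log transform, is easy to demonstrate. The simulated values below only mimic the reported median (~22.6 µg/mg·creatinine); the spread is an assumed value, not the study's:

```python
import numpy as np
from scipy.stats import skew

# Illustrative log-normal UACR values centred on the reported median;
# the log10 standard deviation of 0.4 is an invented spread.
rng = np.random.default_rng(2)
log_uacr = rng.normal(np.log10(22.6), 0.4, 5000)
uacr = 10 ** log_uacr

# Raw values are strongly right-skewed; the log10 scale is symmetric.
print(round(skew(uacr), 1), round(skew(np.log10(uacr)), 1))
```

The first skewness is large and positive, the second is near zero, which is exactly the "close fit to a Gaussian distribution after logarithmic transformation" pattern the abstract reports.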

  8. A multivariate spatial mixture model for areal data: examining regional differences in standardized test scores

    PubMed Central

    Neelon, Brian; Gelfand, Alan E.; Miranda, Marie Lynn

    2013-01-01

    Researchers in the health and social sciences often wish to examine joint spatial patterns for two or more related outcomes. Examples include infant birth weight and gestational length, psychosocial and behavioral indices, and educational test scores from different cognitive domains. We propose a multivariate spatial mixture model for the joint analysis of continuous individual-level outcomes that are referenced to areal units. The responses are modeled as a finite mixture of multivariate normals, which accommodates a wide range of marginal response distributions and allows investigators to examine covariate effects within subpopulations of interest. The model has a hierarchical structure built at the individual level (i.e., individuals are nested within areal units), and thus incorporates both individual- and areal-level predictors as well as spatial random effects for each mixture component. Conditional autoregressive (CAR) priors on the random effects provide spatial smoothing and allow the shape of the multivariate distribution to vary flexibly across geographic regions. We adopt a Bayesian modeling approach and develop an efficient Markov chain Monte Carlo model fitting algorithm that relies primarily on closed-form full conditionals. We use the model to explore geographic patterns in end-of-grade math and reading test scores among school-age children in North Carolina. PMID:26401059
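The building block of this record, a finite mixture of multivariate normals fitted by expectation-maximization (here without the hierarchical structure or spatial CAR priors), can be sketched in plain NumPy. All data and settings below are illustrative:

```python
import numpy as np

def em_mvn_mixture(X, k=2, n_iter=100):
    """Minimal EM fit of a k-component multivariate normal mixture."""
    n, d = X.shape
    # Deterministic init: spread means along the coordinate quantiles
    mu = np.quantile(X, np.linspace(0.1, 0.9, k), axis=0)
    cov = np.array([np.cov(X.T) for _ in range(k)])
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = np.empty((n, k))
        for j in range(k):
            diff = X - mu[j]
            inv = np.linalg.inv(cov[j])
            quad = np.einsum('ni,ij,nj->n', diff, inv, diff)
            const = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov[j]))
            resp[:, j] = pi[j] * np.exp(-0.5 * quad) / const
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted means, covariances and mixing proportions
        nk = resp.sum(axis=0)
        pi = nk / n
        for j in range(k):
            mu[j] = resp[:, j] @ X / nk[j]
            diff = X - mu[j]
            cov[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return pi, mu, cov

# Two well-separated clusters standing in for two latent subpopulations
# of (math, reading) test scores -- purely illustrative data.
rng = np.random.default_rng(3)
X = np.vstack([rng.multivariate_normal([0, 0], np.eye(2), 300),
               rng.multivariate_normal([5, 5], np.eye(2), 300)])
pi_hat, mu_hat, _ = em_mvn_mixture(X)
print(np.round(np.sort(mu_hat[:, 0]), 1))
```

The recovered component means sit near the true cluster centres, and the mixing proportions near 0.5 each; the paper's model adds spatial random effects on top of exactly this mixture skeleton.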

  9. Evaluating online data of water quality changes in a pilot drinking water distribution system with multivariate data exploration methods.

    PubMed

    Mustonen, Satu M; Tissari, Soile; Huikko, Laura; Kolehmainen, Mikko; Lehtola, Markku J; Hirvonen, Arja

    2008-05-01

    The distribution of drinking water generates soft deposits and biofilms in the pipelines of distribution systems. Disturbances in water distribution can detach these deposits and biofilms and thus deteriorate the water quality. We studied the effects of simulated pressure shocks on the water quality with online analysers. The study was conducted with copper and composite plastic pipelines in a pilot distribution system. The online data gathered during the study were evaluated with the Self-Organising Map (SOM) and Sammon's mapping, which are useful methods for exploring large amounts of multivariate data. The objective was to test the usefulness of these methods in pinpointing abnormal water quality changes in the online data. The pressure shocks temporarily increased the number of particles, turbidity and electrical conductivity. SOM and Sammon's mapping were able to separate these situations from the normal data and thus make them visible. These methods therefore make it possible to detect abrupt changes in water quality and to react rapidly to any disturbances in the system. They are useful in developing alert systems and predictive applications connected to online monitoring.

  10. Exact and Approximate Statistical Inference for Nonlinear Regression and the Estimating Equation Approach.

    PubMed

    Demidenko, Eugene

    2017-09-01

    The exact density distribution of the nonlinear least squares estimator in the one-parameter regression model is derived in closed form and expressed through the cumulative distribution function of the standard normal variable. Several proposals to generalize this result are discussed. The exact density is extended to the estimating equation (EE) approach and the nonlinear regression with an arbitrary number of linear parameters and one intrinsically nonlinear parameter. For a very special nonlinear regression model, the derived density coincides with the distribution of the ratio of two normally distributed random variables previously obtained by Fieller (1932), unlike other approximations previously suggested by other authors. Approximations to the density of the EE estimators are discussed in the multivariate case. Numerical complications associated with the nonlinear least squares are illustrated, such as nonexistence and/or multiple solutions, as major factors contributing to poor density approximation. The nonlinear Markov-Gauss theorem is formulated based on the near exact EE density approximation.

  11. The effect of signal variability on the histograms of anthropomorphic channel outputs: factors resulting in non-normally distributed data

    NASA Astrophysics Data System (ADS)

    Elshahaby, Fatma E. A.; Ghaly, Michael; Jha, Abhinav K.; Frey, Eric C.

    2015-03-01

    Model observers are widely used in medical imaging for the optimization and evaluation of instrumentation, acquisition parameters, and image reconstruction and processing methods. The channelized Hotelling observer (CHO) is a commonly used model observer in nuclear medicine and has seen increasing use in other modalities. An anthropomorphic CHO consists of a set of channels that model some aspects of the human visual system, together with the Hotelling observer, which is the optimal linear discriminant. The optimality of the CHO is based on the assumption that the channel outputs for data with and without the signal present have a multivariate normal distribution with equal class covariance matrices. The channel outputs result from the dot product of channel templates with input images and are thus the sum of a large number of random variables. The central limit theorem is thus often used to justify the assumption that the channel outputs are normally distributed. In this work, we aim to examine this assumption for realistically simulated nuclear medicine images when various types of signal variability are present.

  12. A method of using cluster analysis to study statistical dependence in multivariate data

    NASA Technical Reports Server (NTRS)

    Borucki, W. J.; Card, D. H.; Lyle, G. C.

    1975-01-01

    A technique is presented that uses both cluster analysis and a Monte Carlo significance test of clusters to discover associations between variables in multidimensional data. The method is applied to an example of a noisy function in three-dimensional space, to a sample from a mixture of three bivariate normal distributions, and to the well-known Fisher's Iris data.

  13. Multivariate frequency domain analysis of protein dynamics

    NASA Astrophysics Data System (ADS)

    Matsunaga, Yasuhiro; Fuchigami, Sotaro; Kidera, Akinori

    2009-03-01

    Multivariate frequency domain analysis (MFDA) is proposed to characterize collective vibrational dynamics of protein obtained by a molecular dynamics (MD) simulation. MFDA performs principal component analysis (PCA) for a bandpass filtered multivariate time series using the multitaper method of spectral estimation. By applying MFDA to MD trajectories of bovine pancreatic trypsin inhibitor, we determined the collective vibrational modes in the frequency domain, which were identified by their vibrational frequencies and eigenvectors. At near zero temperature, the vibrational modes determined by MFDA agreed well with those calculated by normal mode analysis. At 300 K, the vibrational modes exhibited characteristic features that were considerably different from the principal modes of the static distribution given by the standard PCA. The influences of aqueous environments were discussed based on two different sets of vibrational modes, one derived from a MD simulation in water and the other from a simulation in vacuum. Using the varimax rotation, an algorithm of the multivariate statistical analysis, the representative orthogonal set of eigenmodes was determined at each vibrational frequency.

  14. Using cystoscopy to segment bladder tumors with a multivariate approach in different color spaces.

    PubMed

    Freitas, Nuno R; Vieira, Pedro M; Lima, Estevao; Lima, Carlos S

    2017-07-01

    Nowadays the diagnosis of bladder lesions relies upon cystoscopy examination and depends on the interpreter's experience. State-of-the-art bladder tumor identification is based on 3D reconstruction, using CT images (virtual cystoscopy) or images where the structures are highlighted by pigmentation, but none uses white-light cystoscopy images. An initial attempt to automatically identify tumoral tissue was already developed by the authors, and this paper develops that idea. Traditional cystoscopy image processing has huge potential to improve early tumor detection and allows more effective treatment. This paper describes a multivariate approach to the segmentation of bladder cystoscopy images that will be used to automatically detect tumors and improve physician diagnosis. Each region can be assumed to follow a normal distribution with specific parameters, leading to the assumption that the distribution of intensities is a Gaussian mixture model (GMM). Regions of high-grade and low-grade tumors usually appear with higher intensity than normal regions. This paper proposes a maximum a posteriori (MAP) approach based on pixel intensities read simultaneously in different color channels from the RGB, HSV and CIELab color spaces. The expectation-maximization (EM) algorithm is used to estimate the best multivariate GMM parameters. Experimental results show that the proposed method performs bladder tumor segmentation into two classes more efficiently in RGB, even in cases where the tumor shape is not well defined. Results also show that eliminating component L from the CIELab color space does not allow definition of the tumor shape.
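The MAP classification step the record describes, labelling each pixel by comparing class posteriors under multivariate normal intensity models, can be sketched as follows. The class means, covariance, prior, and three-channel setup are all invented for illustration, not the paper's fitted GMM:

```python
import numpy as np

def log_mvn(x, mu, cov):
    """Log-density of a multivariate normal, vectorized over rows of x."""
    d = len(mu)
    diff = x - mu
    inv = np.linalg.inv(cov)
    quad = np.einsum('...i,ij,...j->...', diff, inv, diff)
    return -0.5 * (quad + np.log(np.linalg.det(cov)) + d * np.log(2 * np.pi))

# Hypothetical two-class pixel model: "normal" vs. "tumor" tissue, each a
# multivariate normal over intensities read from several colour channels
# at once (e.g. R, G, B); tumor regions are given higher mean intensity.
mu_normal = np.array([90.0, 60.0, 55.0])
mu_tumor = np.array([150.0, 95.0, 80.0])
cov = 15.0 ** 2 * np.eye(3)
prior_tumor = 0.2

rng = np.random.default_rng(4)
pixels = np.vstack([rng.multivariate_normal(mu_normal, cov, 800),
                    rng.multivariate_normal(mu_tumor, cov, 200)])
truth = np.r_[np.zeros(800), np.ones(200)]

# MAP rule: pick the class with the larger log-posterior
post_tumor = log_mvn(pixels, mu_tumor, cov) + np.log(prior_tumor)
post_normal = log_mvn(pixels, mu_normal, cov) + np.log(1 - prior_tumor)
pred = (post_tumor > post_normal).astype(float)
accuracy = (pred == truth).mean()
print(round(accuracy, 2))
```

In the paper the class parameters come from EM rather than being known; the decision rule applied per pixel is the same.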

  15. Technical Reports Prepared Under Contract N00014-76-C-0475.

    DTIC Science & Technology

    1987-05-29

    Excerpt from a flattened report list (columns: Technical Report No., Title, Author, Date): 264, "Approximations to Densities in Geometric Probability," H. Solomon and M.A. Stephens, 10/27/78; 265, "Sequential … Certain Multivariate Normal Probabilities," S. Iyengar, 8/12/82; 323, "EDF Statistics for Testing for the Gamma Distribution with …," M.A. Stephens, 8/13/82; "… Nets," …20-85; 360, "Random Sequential Coding by Hamming Distance," Yoshiaki Itoh and Herbert Solomon, 07-11-85; 361, "Transforming Censored Samples and Testing Fit."

  16. Univariate and multivariate methods for chemical mapping of cervical cancer cells

    NASA Astrophysics Data System (ADS)

    Duraipandian, Shiyamala; Zheng, Wei; Huang, Zhiwei

    2012-01-01

    Visualization of cells and subcellular organelles is currently carried out using available microscopy methods such as cryo-electron microscopy and fluorescence microscopy. These methods require external labeling with fluorescent dyes and extensive sample preparation to access the subcellular structures. Raman micro-spectroscopy, however, provides a non-invasive, label-free method for imaging cells with chemical specificity at sub-micrometer spatial resolution. The scope of this paper is to image the biochemical/molecular distributions in cells associated with cancerous changes. Raman map data sets were acquired from human cervical carcinoma cell lines (HeLa) after fixation under a 785 nm excitation wavelength. Individual spectra were recorded by raster-scanning the laser beam over the sample with a 1 μm step size and 10 s exposure time. Images revealing nucleic acids, lipids and proteins (phenylalanine, amide I) were reconstructed using univariate methods. In the near future, the small pixel-to-pixel variations will also be imaged using different multivariate methods (PCA; clustering: HCA, K-means, FCM) to determine the main cellular constituents. The hyperspectral image of the cell was reconstructed utilizing the spectral contrast at different pixels of the cell (due to variation in the biochemical distribution) without using fluorescent dyes. Normal cervical squamous cells will also be imaged in order to differentiate normal and cancerous cells of the cervix using the biochemical changes in different grades of cancer. Based on the information obtained from the pseudo-color maps constructed from the hyperspectral cubes, the primary cellular constituents of normal and cervical cancer cells were identified.

  17. Prediction of Malaysian monthly GDP

    NASA Astrophysics Data System (ADS)

    Hin, Pooi Ah; Ching, Soo Huei; Yeing, Pan Wei

    2015-12-01

    The paper attempts to use a method based on the multivariate power-normal distribution to predict the Malaysian Gross Domestic Product for the next month. Letting r(t) be the vector consisting of the month-t values of m selected macroeconomic variables and GDP, we model the month-(t+1) GDP as dependent on the present and l-1 past values r(t), r(t-1), …, r(t-l+1) via a conditional distribution derived from a [(m+1)l+1]-dimensional power-normal distribution. The 100(α/2)% and 100(1-α/2)% points of the conditional distribution may be used to form an out-of-sample prediction interval. This interval, together with the mean of the conditional distribution, may be used to predict the month-(t+1) GDP. The mean absolute percentage error (MAPE), estimated coverage probability and average length of the prediction interval are used as the criteria for selecting a suitable lag value l-1 and a subset from a pool of 17 macroeconomic variables. It is found that the relatively better models are those for which 2 ≤ l ≤ 3, involving one or two of the macroeconomic variables given by Market Indicative Yield, Oil Prices, Exchange Rate and Import Trade.

  18. A comparison of portfolio selection models via application on ISE 100 index data

    NASA Astrophysics Data System (ADS)

    Altun, Emrah; Tatlidil, Hüseyin

    2013-10-01

    The Markowitz model, a classical approach to the portfolio optimization problem, relies on two important assumptions: the expected returns are multivariate normally distributed and the investor is risk-averse. But this model has not been extensively used in finance. Empirical results show that it is very hard to solve large-scale portfolio optimization problems with the Mean-Variance (M-V) model. An alternative model, the Mean Absolute Deviation (MAD) model proposed by Konno and Yamazaki [7], has been used to remove most of the difficulties of the Markowitz Mean-Variance model. The MAD model does not need to assume that the rates of return are normally distributed and is based on linear programming. Another alternative portfolio model is the Mean-Lower Semi-Absolute Deviation (M-LSAD) model proposed by Speranza [3]. We compare these models to determine which gives the more appropriate solution to investors.
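The Konno-Yamazaki MAD model mentioned in this record is a linear program: minimize the mean absolute deviation of portfolio return subject to full investment and a target mean return, with the absolute value linearized by auxiliary variables. A sketch on simulated returns (not ISE 100 data; all numbers invented):

```python
import numpy as np
from scipy.optimize import linprog

# Simulated daily returns for four hypothetical assets
rng = np.random.default_rng(5)
T, n = 250, 4
mu_true = np.array([0.0008, 0.0005, 0.0010, 0.0003])
rets = mu_true + rng.normal(0, [0.02, 0.01, 0.03, 0.005], (T, n))
mu = rets.mean(axis=0)
dev = rets - mu                     # centered returns
target = mu.mean()                  # required mean portfolio return

# Variables: [x_1..x_n, d_1..d_T]; objective is the mean of the d_t
c = np.r_[np.zeros(n), np.ones(T) / T]
# Linearized absolute value: d_t >= +dev_t.x and d_t >= -dev_t.x
A_ub = np.block([[dev, -np.eye(T)], [-dev, -np.eye(T)]])
b_ub = np.zeros(2 * T)
# Target-return constraint as an inequality: -mu.x <= -target
A_ub = np.vstack([A_ub, np.r_[-mu, np.zeros(T)]])
b_ub = np.r_[b_ub, -target]
# Full investment: sum of weights equals one
A_eq = np.r_[np.ones(n), np.zeros(T)].reshape(1, -1)
b_eq = [1.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (n + T))
weights = res.x[:n]
print(np.round(weights, 2))
```

Because every constraint and the objective are linear, the problem scales to large universes with ordinary LP solvers, which is the practical advantage over the quadratic M-V formulation.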

  19. Some Recent Developments on Complex Multivariate Distributions

    ERIC Educational Resources Information Center

    Krishnaiah, P. R.

    1976-01-01

    In this paper, the author gives a review of the literature on complex multivariate distributions. Some new results on these distributions are also given. Finally, the author discusses the applications of the complex multivariate distributions in the area of the inference on multiple time series. (Author)

  20. Modelling physiological deterioration in post-operative patient vital-sign data.

    PubMed

    Pimentel, Marco A F; Clifton, David A; Clifton, Lei; Watkinson, Peter J; Tarassenko, Lionel

    2013-08-01

    Patients who undergo upper-gastrointestinal surgery have a high incidence of post-operative complications, often requiring admission to the intensive care unit several days after surgery. A dataset comprising observational vital-sign data from 171 post-operative patients taking part in a two-phase clinical trial at the Oxford Cancer Centre was used to explore the trajectory of patients' vital-sign changes during their stay in the post-operative ward, using both univariate and multivariate analyses. A model of normality based on vital-sign data from patients who had a "normal" recovery was constructed using a kernel density estimate and tested with "abnormal" data from patients who deteriorated sufficiently to be re-admitted to the intensive care unit. The vital-sign distributions from "normal" patients were found to vary over time from admission to the post-operative ward to their discharge home, but no significant changes in their distributions were observed from halfway through their stay on the ward to the time of discharge. The model of normality identified patient deterioration when tested with unseen "abnormal" data, suggesting that such techniques may be used to provide early warning of adverse physiological events.
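The kernel-density "model of normality" idea in this record, flagging observations that fall in low-probability regions of the training density, can be sketched as below. The two "vital-sign" channels, their distribution, and the alert threshold are all hypothetical:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical two-channel vital signs (e.g. a heart-rate-like and an
# SpO2-like index) from patients with a "normal" recovery
rng = np.random.default_rng(6)
normal_obs = rng.multivariate_normal([80, 97], [[60, -5], [-5, 2]], 1000)
kde = gaussian_kde(normal_obs.T)        # density model of normality

# Alert threshold: e.g. the 1st percentile of density over training data
threshold = np.percentile(kde(normal_obs.T), 1)

ok = np.array([82.0, 96.5])             # well inside the training cloud
deteriorating = np.array([130.0, 88.0]) # far outside it
print(kde(ok)[0] > threshold, kde(deteriorating)[0] > threshold)
```

Anything whose estimated density falls below the threshold is flagged as "abnormal", which is how such a one-class model can raise an early warning without ever seeing deterioration data during training.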

  1. An Application of Discriminant Analysis to the Selection of Software Cost Estimating Models.

    DTIC Science & Technology

    1984-09-01

    … the PRICE S Users Manual (29:111-25) was used with a slight modification. Based on the experience and advice of Captain Joe Dean, Electronic System… this study, and EXP is the expansion factor listed in the PRICE S User's Manual. Another important factor needing explanation is development cost… coefficients and a unique constant. According to the SPSS manual (26:445), "Under the assumption of a multivariate normal distribution, the …"

  2. Smoothing of the bivariate LOD score for non-normal quantitative traits.

    PubMed

    Buil, Alfonso; Dyer, Thomas D; Almasy, Laura; Blangero, John

    2005-12-30

    Variance component analysis provides an efficient method for performing linkage analysis for quantitative traits. However, type I error of variance components-based likelihood ratio testing may be affected when phenotypic data are non-normally distributed (especially with high values of kurtosis). This results in inflated LOD scores when the normality assumption does not hold. Even though different solutions have been proposed to deal with this problem with univariate phenotypes, little work has been done in the multivariate case. We present an empirical approach to adjust the inflated LOD scores obtained from a bivariate phenotype that violates the assumption of normality. Using the Collaborative Study on the Genetics of Alcoholism data available for the Genetic Analysis Workshop 14, we show how bivariate linkage analysis with leptokurtotic traits gives an inflated type I error. We perform a novel correction that achieves acceptable levels of type I error.

  3. Kullback-Leibler information function and the sequential selection of experiments to discriminate among several linear models

    NASA Technical Reports Server (NTRS)

    Sidik, S. M.

    1972-01-01

    The error variance of the process, prior multivariate normal distributions of the parameters of the models, and prior probabilities of the models being correct are assumed to be specified. A rule for termination of sampling is proposed. Upon termination, the model with the largest posterior probability is chosen as correct. If sampling is not terminated, posterior probabilities of the models and posterior distributions of the parameters are computed, and an experiment is chosen to maximize the expected Kullback-Leibler information function. Monte Carlo simulation experiments were performed to investigate the large- and small-sample behavior of the sequential adaptive procedure.

  4. A Review of Multivariate Distributions for Count Data Derived from the Poisson Distribution.

    PubMed

    Inouye, David; Yang, Eunho; Allen, Genevera; Ravikumar, Pradeep

    2017-01-01

    The Poisson distribution has been widely studied and used for modeling univariate count-valued data. Multivariate generalizations of the Poisson distribution that permit dependencies, however, have been far less popular. Yet, real-world high-dimensional count-valued data found in word counts, genomics, and crime statistics, for example, exhibit rich dependencies, and motivate the need for multivariate distributions that can appropriately model this data. We review multivariate distributions derived from the univariate Poisson, categorizing these models into three main classes: 1) where the marginal distributions are Poisson, 2) where the joint distribution is a mixture of independent multivariate Poisson distributions, and 3) where the node-conditional distributions are derived from the Poisson. We discuss the development of multiple instances of these classes and compare the models in terms of interpretability and theory. Then, we empirically compare multiple models from each class on three real-world datasets that have varying data characteristics from different domains, namely traffic accident data, biological next generation sequencing data, and text data. These empirical experiments develop intuition about the comparative advantages and disadvantages of each class of multivariate distribution that was derived from the Poisson. Finally, we suggest new research directions as explored in the subsequent discussion section.
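One simple way to obtain dependent counts with exact Poisson margins, in the spirit of the constructions this review surveys (though not necessarily a model it analyzes in this exact form), is to push correlated Gaussian latents through the Poisson quantile function:

```python
import numpy as np
from scipy.stats import norm, poisson

# Copula-style construction: correlated Gaussian latents -> dependent
# uniforms -> Poisson counts with exact Poisson(4) and Poisson(10) margins
rng = np.random.default_rng(7)
corr = np.array([[1.0, 0.7], [0.7, 1.0]])
z = rng.multivariate_normal([0, 0], corr, 20000)
u = norm.cdf(z)                                   # dependent uniforms
counts = poisson.ppf(u, mu=[4, 10]).astype(int)   # dependent counts

print(np.round(counts.mean(axis=0), 1),
      round(np.corrcoef(counts.T)[0, 1], 2))
```

The marginal means match the chosen Poisson rates, while the count correlation comes out slightly below the latent 0.7 because of the discretization; that attenuation is one of the trade-offs between the model classes the review compares.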

  5. An analysis of fracture trace patterns in areas of flat-lying sedimentary rocks for the detection of buried geologic structure. [Kansas and Texas

    NASA Technical Reports Server (NTRS)

    Podwysocki, M. H.

    1974-01-01

    Two study areas in a cratonic platform underlain by flat-lying sedimentary rocks were analyzed to determine if a quantitative relationship exists between fracture trace patterns and their frequency distributions and subsurface structural closures which might contain petroleum. Fracture trace lengths and frequency (number of fracture traces per unit area) were analyzed by trend surface analysis and length frequency distributions also were compared to a standard Gaussian distribution. Composite rose diagrams of fracture traces were analyzed using a multivariate analysis method which grouped or clustered the rose diagrams and their respective areas on the basis of the behavior of the rays of the rose diagram. Analysis indicates that the lengths of fracture traces are log-normally distributed according to the mapping technique used. Fracture trace frequency appeared higher on the flanks of active structures and lower around passive reef structures. Fracture trace log-mean lengths were shorter over several types of structures, perhaps due to increased fracturing and subsequent erosion. Analysis of rose diagrams using a multivariate technique indicated lithology as the primary control for the lower grouping levels. Groupings at higher levels indicated that areas overlying active structures may be isolated from their neighbors by this technique while passive structures showed no differences which could be isolated.

  6. Multiple imputation for handling missing outcome data when estimating the relative risk.

    PubMed

    Sullivan, Thomas R; Lee, Katherine J; Ryan, Philip; Salter, Amy B

    2017-09-06

    Multiple imputation is a popular approach to handling missing data in medical research, yet little is known about its applicability for estimating the relative risk. Standard methods for imputing incomplete binary outcomes involve logistic regression or an assumption of multivariate normality, whereas relative risks are typically estimated using log binomial models. It is unclear whether misspecification of the imputation model in this setting could lead to biased parameter estimates. Using simulated data, we evaluated the performance of multiple imputation for handling missing data prior to estimating adjusted relative risks from a correctly specified multivariable log binomial model. We considered an arbitrary pattern of missing data in both outcome and exposure variables, with missing data induced under missing at random mechanisms. Focusing on standard model-based methods of multiple imputation, missing data were imputed using multivariate normal imputation or fully conditional specification with a logistic imputation model for the outcome. Multivariate normal imputation performed poorly in the simulation study, consistently producing estimates of the relative risk that were biased towards the null. Despite outperforming multivariate normal imputation, fully conditional specification also produced somewhat biased estimates, with greater bias observed for higher outcome prevalences and larger relative risks. Deleting imputed outcomes from analysis datasets did not improve the performance of fully conditional specification. Both multivariate normal imputation and fully conditional specification produced biased estimates of the relative risk, presumably since both use a misspecified imputation model. Based on simulation results, we recommend researchers use fully conditional specification rather than multivariate normal imputation and retain imputed outcomes in the analysis when estimating relative risks. However, fully conditional specification is not without its shortcomings, and so further research is needed to identify optimal approaches for relative risk estimation within the multiple imputation framework.
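    As a rough illustration of the fully conditional specification idea described above (a sketch, not the authors' simulation code), the following imputes a missing-at-random binary outcome from a logistic model fitted to complete cases and pools log relative risks across imputations. numpy and scikit-learn are assumed, the data are synthetic, and a proper implementation would also draw the imputation-model parameters anew for each imputation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
x = rng.binomial(1, 0.5, n)                 # binary exposure
p_true = 0.10 * np.exp(np.log(1.5) * x)     # outcome risk; true relative risk = 1.5
y = rng.binomial(1, p_true).astype(float)
y[rng.random(n) < 0.3 * (1 - x)] = np.nan   # outcome missing at random given exposure

obs = ~np.isnan(y)
# FCS-style imputation model for the binary outcome: logistic regression on exposure
imp = LogisticRegression().fit(x[obs, None], y[obs])

log_rrs = []
for _ in range(20):                         # 20 imputed datasets
    y_c = y.copy()
    p_miss = imp.predict_proba(x[~obs, None])[:, 1]
    y_c[~obs] = rng.binomial(1, p_miss)     # draw the missing outcomes
    rr = y_c[x == 1].mean() / y_c[x == 0].mean()
    log_rrs.append(np.log(rr))

# Rubin's-rules point estimate on the log scale (a full analysis would also
# pool the within- and between-imputation variances)
rr_pooled = float(np.exp(np.mean(log_rrs)))
print(rr_pooled)
```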

  7. Buried landmine detection using multivariate normal clustering

    NASA Astrophysics Data System (ADS)

    Duston, Brian M.

    2001-10-01

    A Bayesian classification algorithm is presented for discriminating buried land mines from buried and surface clutter in Ground Penetrating Radar (GPR) signals. This algorithm is based on multivariate normal (MVN) clustering, where feature vectors are used to identify populations (clusters) of mines and clutter objects. The features are extracted from two-dimensional images created from ground penetrating radar scans. MVN clustering is used to determine the number of clusters in the data and to create probability density models for target and clutter populations, producing the MVN clustering classifier (MVNCC). The Bayesian Information Criterion (BIC) is used to evaluate each model to determine the number of clusters in the data. An extension of the MVNCC allows the model to adapt to local clutter distributions by treating each of the MVN cluster components as a Poisson process and adaptively estimating the intensity parameters. The algorithm is developed using data collected by the Mine Hunter/Killer Close-In Detector (MH/K CID) at prepared mine lanes. The Mine Hunter/Killer is a prototype mine detecting and neutralizing vehicle developed for the U.S. Army to clear roads of anti-tank mines.
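    The core of this approach, MVN clustering with BIC-based selection of the number of clusters, can be sketched with scikit-learn's Gaussian mixture model on synthetic two-population data (an illustrative stand-in, not the MH/K CID feature pipeline):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated synthetic "feature vector" populations (e.g. mine vs. clutter)
X = np.vstack([
    rng.normal([0, 0], 0.5, size=(200, 2)),
    rng.normal([5, 5], 0.5, size=(200, 2)),
])

# Fit MVN mixtures with an increasing number of clusters; BIC selects the model order
bics = {}
for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, covariance_type="full",
                          random_state=0).fit(X)
    bics[k] = gmm.bic(X)

best_k = min(bics, key=bics.get)   # model order with the lowest BIC
print(best_k)
```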

  8. Prediction of mortality rates using a model with stochastic parameters

    NASA Astrophysics Data System (ADS)

    Tan, Chon Sern; Pooi, Ah Hin

    2016-10-01

    Prediction of future mortality rates is crucial to insurance companies because they face longevity risks while providing retirement benefits to a population whose life expectancy is increasing. In the past literature, a time series model based on the multivariate power-normal distribution has been applied to mortality data from the United States for the years 1933 to 2000 to forecast mortality rates for the years 2001 to 2010. In this paper, a more dynamic approach based on the multivariate time series is proposed, in which the model uses stochastic parameters that vary with time. The resulting prediction intervals obtained using the model with stochastic parameters perform better: apart from covering the observed future mortality rates well, they also tend to have distinctly shorter interval lengths.

  9. A Review of Multivariate Distributions for Count Data Derived from the Poisson Distribution

    PubMed Central

    Inouye, David; Yang, Eunho; Allen, Genevera; Ravikumar, Pradeep

    2017-01-01

    The Poisson distribution has been widely studied and used for modeling univariate count-valued data. Multivariate generalizations of the Poisson distribution that permit dependencies, however, have been far less popular. Yet, real-world high-dimensional count-valued data found in word counts, genomics, and crime statistics, for example, exhibit rich dependencies, and motivate the need for multivariate distributions that can appropriately model this data. We review multivariate distributions derived from the univariate Poisson, categorizing these models into three main classes: 1) where the marginal distributions are Poisson, 2) where the joint distribution is a mixture of independent multivariate Poisson distributions, and 3) where the node-conditional distributions are derived from the Poisson. We discuss the development of multiple instances of these classes and compare the models in terms of interpretability and theory. Then, we empirically compare multiple models from each class on three real-world datasets that have varying data characteristics from different domains, namely traffic accident data, biological next generation sequencing data, and text data. These empirical experiments develop intuition about the comparative advantages and disadvantages of each class of multivariate distribution that was derived from the Poisson. Finally, we suggest new research directions as explored in the subsequent discussion section. PMID:28983398
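    A minimal example of the second class surveyed above (a mixture of independent Poissons) is the Poisson log-normal construction, sketched below with numpy; the covariance values are illustrative, not taken from the review:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5000, 3

# Latent correlated Gaussian; conditional on it the counts are independent
# Poissons, so the joint count distribution is a (Poisson log-normal) mixture
cov = np.full((d, d), 0.8) + 0.2 * np.eye(d)   # unit variances, 0.8 correlations
z = rng.multivariate_normal(np.zeros(d), cov, size=n)
counts = rng.poisson(np.exp(z))

corr = np.corrcoef(counts, rowvar=False)       # dependence induced between counts
print(corr[0, 1])
```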

  10. Some Modified Integrated Squared Error Procedures for Multivariate Normal Data.

    DTIC Science & Technology

    1982-06-01

    p-dimensional Gaussian. There are a number of measures of qualitative robustness but the most important is the influence function. Most of the other ... measures are derived from the influence function. The influence function is simply proportional to the score function (Huber, 1981, p. 45). The ... influence function at the p-variate Gaussian distribution Np(U, V) is ... (3.6)

  11. Taking the Missing Propensity Into Account When Estimating Competence Scores

    PubMed Central

    Pohl, Steffi; Carstensen, Claus H.

    2014-01-01

    When competence tests are administered, subjects frequently omit items. These missing responses pose a threat to correctly estimating the proficiency level. Newer model-based approaches aim to take nonignorable missing data processes into account by incorporating a latent missing propensity into the measurement model. Two assumptions are typically made when using these models: (1) The missing propensity is unidimensional and (2) the missing propensity and the ability are bivariate normally distributed. These assumptions may, however, be violated in real data sets and could, thus, pose a threat to the validity of this approach. The present study focuses on modeling competencies in various domains, using data from a school sample (N = 15,396) and an adult sample (N = 7,256) from the National Educational Panel Study. Our interest was to investigate whether violations of unidimensionality and the normal distribution assumption severely affect the performance of the model-based approach in terms of differences in ability estimates. We propose a model with a competence dimension, a unidimensional missing propensity and a distributional assumption more flexible than a multivariate normal. Using this model for ability estimation results in different ability estimates compared with a model ignoring missing responses. Implications for ability estimation in large-scale assessments are discussed. PMID:29795844

  12. Two-sample tests and one-way MANOVA for multivariate biomarker data with nondetects.

    PubMed

    Thulin, M

    2016-09-10

    Testing whether the mean vector of a multivariate set of biomarkers differs between several populations is an increasingly common problem in medical research. Biomarker data is often left censored because some measurements fall below the laboratory's detection limit. We investigate how such censoring affects multivariate two-sample and one-way multivariate analysis of variance tests. Type I error rates, power and robustness to increasing censoring are studied, under both normality and non-normality. Parametric tests are found to perform better than non-parametric alternatives, indicating that the current recommendations for analysis of censored multivariate data may have to be revised. Copyright © 2016 John Wiley & Sons, Ltd.

  13. Risk of false decision on conformity of a multicomponent material when test results of the components' content are correlated.

    PubMed

    Kuselman, Ilya; Pennecchi, Francesca R; da Silva, Ricardo J N B; Hibbert, D Brynn

    2017-11-01

    The probability of a false decision on conformity of a multicomponent material due to measurement uncertainty is discussed when test results are correlated. Specification limits of the components' content of such a material generate a multivariate specification interval/domain. When true values of components' content and corresponding test results are modelled by multivariate distributions (e.g. by multivariate normal distributions), a total global risk of a false decision on the material conformity can be evaluated based on calculation of integrals of their joint probability density function. No transformation of the raw data is required for that. A total specific risk can be evaluated as the joint posterior cumulative probability of true values of a specific batch or lot lying outside the multivariate specification domain when the vector of test results, obtained for the lot, is inside this domain. It was shown, using a case study of four components under control in a drug, that the influence of correlation on the risk value is not easily predictable. To assess this influence, the evaluated total risk values were compared with those calculated for independent test results and also with those assuming much stronger correlation than that observed. While the observed statistically significant correlation did not lead to a visible difference in the total risk values in comparison to the independent test results, the stronger correlation among the variables caused the total risk either to decrease or to increase, depending on the actual values of the test results. Copyright © 2017 Elsevier B.V. All rights reserved.
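    The total global risk described above can also be approximated by Monte Carlo simulation rather than explicit integrals of the joint density; the sketch below uses numpy with hypothetical specification limits and covariances for a two-component material:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical two-component material (all values invented for illustration)
mu = np.array([10.0, 20.0])                          # true content means
cov_true = np.array([[1.0, 0.5], [0.5, 1.0]])        # correlated true contents
cov_err = 0.25 * np.array([[1.0, 0.3], [0.3, 1.0]])  # correlated measurement errors
lo = np.array([8.0, 18.0])                           # multivariate specification domain
hi = np.array([12.0, 22.0])

def inside(v):
    return np.all((v >= lo) & (v <= hi), axis=1)

n = 200_000
true = rng.multivariate_normal(mu, cov_true, size=n)
test = true + rng.multivariate_normal(np.zeros(2), cov_err, size=n)

# Total global risk of false acceptance: test result conforms, true content does not
risk = float(np.mean(inside(test) & ~inside(true)))
print(risk)
```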

  14. Multivariable normal tissue complication probability model-based treatment plan optimization for grade 2-4 dysphagia and tube feeding dependence in head and neck radiotherapy.

    PubMed

    Kierkels, Roel G J; Wopken, Kim; Visser, Ruurd; Korevaar, Erik W; van der Schaaf, Arjen; Bijl, Hendrik P; Langendijk, Johannes A

    2016-12-01

    Radiotherapy of the head and neck is challenged by the relatively large number of organs-at-risk close to the tumor. Biologically-oriented objective functions (OFs) could optimally distribute the dose among the organs-at-risk. We aimed to explore OFs based on multivariable normal tissue complication probability (NTCP) models for grade 2-4 dysphagia (DYS) and tube feeding dependence (TFD). One hundred head and neck cancer patients were studied. In addition to the clinical plan, two more plans (an OF(DYS)-plan and an OF(TFD)-plan) were optimized per patient. The NTCP models included up to four dose-volume parameters and other non-dosimetric factors. A fully automatic plan optimization framework was used to optimize the OF(NTCP)-based plans. All OF(NTCP)-based plans were reviewed and classified as clinically acceptable. On average, the Δdose and ΔNTCP were small when comparing the OF(DYS)-plan, OF(TFD)-plan, and clinical plan. For 5% of patients, NTCP(TFD) reduced by >5% using OF(TFD)-based planning compared to the OF(DYS)-plans. Plan optimization using NTCP(DYS)- and NTCP(TFD)-based objective functions resulted in clinically acceptable plans. For patients with considerable risk factors for TFD, the OF(TFD) steered the optimizer towards dose distributions which directly led to slightly lower predicted NTCP(TFD) values compared to the other studied plans. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
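    Multivariable NTCP models of this kind are commonly logistic in a linear predictor of dose-volume and clinical factors; the sketch below illustrates that general shape with invented coefficients (not the published DYS or TFD model parameters):

```python
import math

def ntcp(predictors, coefs, intercept):
    """Multivariable logistic NTCP model: NTCP = 1 / (1 + exp(-S)),
    where S is a linear predictor of dosimetric and clinical factors.
    The coefficients used below are invented, purely for illustration."""
    s = intercept + sum(c * v for c, v in zip(coefs, predictors))
    return 1.0 / (1.0 + math.exp(-s))

# Hypothetical mean doses (Gy) to two organs-at-risk in a low- and a high-dose plan
low = ntcp([20.0, 15.0], coefs=[0.05, 0.03], intercept=-4.0)
high = ntcp([45.0, 40.0], coefs=[0.05, 0.03], intercept=-4.0)
print(low, high)
```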

  15. Analysis of human tissues by total reflection X-ray fluorescence. Application of chemometrics for diagnostic cancer recognition

    NASA Astrophysics Data System (ADS)

    Benninghoff, L.; von Czarnowski, D.; Denkhaus, E.; Lemke, K.

    1997-07-01

    For the determination of trace element distributions of more than 20 elements in malignant and normal tissues of the human colon, tissue samples (approx. 400 mg wet weight) were digested with 3 ml of nitric acid (sub-boiled quality) by use of an autoclave system. The accuracy of measurements has been investigated by using certified materials. The analytical results were evaluated by using a spreadsheet program to give an overview of the element distribution in cancerous samples and in normal colon tissues. A further application, cluster analysis of the analytical results, was introduced to demonstrate the possibility of classification for cancer diagnosis. To confirm the results of cluster analysis, multivariate three-way principal component analysis was performed. Additionally, microtome frozen sections (10 μm) were prepared from the same tissue samples to compare the analytical results, i.e. the mass fractions of elements, according to the preparation method and to exclude systematic errors depending on the inhomogeneity of the tissues.

  16. Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design

    PubMed Central

    Lu, Tsui-Shan; Longnecker, Matthew P.; Zhou, Haibo

    2016-01-01

    Outcome-dependent sampling (ODS) is a cost-effective sampling scheme in which the exposure is observed with a probability that depends on the outcome. Well-known examples are the case-control design for a binary response, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been done for the univariate response case, statistical inference and design for ODS with multivariate outcomes remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (Multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the Multivariate-ODS design is semiparametric, with all the underlying distributions of covariates modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and establish its asymptotic normality. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the Multivariate-ODS or the estimator from a simple random sample with the same sample size. The Multivariate-ODS design, together with the proposed estimator, provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of PCB exposure with hearing loss in children born in the Collaborative Perinatal Study. PMID:27966260

  17. Peripapillary retinal nerve fiber layer thickness in a population of 6-year-old children: findings by optical coherence tomography.

    PubMed

    Huynh, Son C; Wang, Xiu Ying; Rochtchina, Elena; Mitchell, Paul

    2006-09-01

    To study the distribution of retinal nerve fiber layer (RNFL) thickness by ocular and demographic variables in a population-based study of young children. Population-based cross-sectional study. One thousand seven hundred sixty-five of 2238 (78.9%) eligible 6-year-old children participated in the Sydney Childhood Eye Study between 2003 and 2004. Mean age was 6.7 years (50.9% boys). Detailed examination included cycloplegic autorefraction and measurement of axial length. Retinal nerve fiber layer scans using an optical coherence tomographer were performed with a circular scan pattern of 3.4-mm diameter. Multivariate analyses were performed to examine the distribution of RNFL parameters with gender, ethnicity, axial length, and refraction. Peripapillary RNFL thickness and RNFL(estimated integral) (RNFL(EI)), which measures the total cross-sectional area of ganglion cell axons converging onto the optic nerve head. Peripapillary RNFL thickness and RNFL(EI) were normally distributed. The mean+/-standard deviation RNFL average thickness was 103.7+/-11.4 microm and RNFL(EI) was 1.05+/-0.12 mm2. Retinal nerve fiber layer thickness was least for the temporal quadrant (75.7+/-14.7 microm), followed by the nasal (81.7+/-19.6 microm), inferior (127.8+/-20.5 microm), and superior (129.5+/-20.6 microm) quadrants. Multivariate adjusted RNFL average thickness was marginally greater in boys than in girls (104.7 microm vs. 103.2 microm; P = 0.007) and in East Asian than in white children (107.7 microm vs. 102.7 microm; P<0.0001). The RNFL was thinner with greater axial length (P(trend)<0.0001) and less positive spherical equivalent refractions (P(trend) = 0.004). Retinal nerve fiber layer average thickness and RNFL(EI) followed a normal distribution. Retinal nerve fiber layer thickness varied marginally with gender, but differences were more marked between white and East Asian children. Retinal nerve fiber layer thinning was associated with increasing axial length and less positive refractions.

  18. Using monolingual neuropsychological test norms with bilingual Hispanic americans: application of an individual comparison standard.

    PubMed

    Gasquoine, Philip Gerard; Gonzalez, Cassandra Dayanira

    2012-05-01

    Conventional neuropsychological norms developed for monolinguals likely overestimate normal performance in bilinguals on language but not visual-perceptual format tests. This was studied by comparing neuropsychological false-positive rates using the 50th percentile of conventional norms and individual comparison standards (Picture Vocabulary or Matrix Reasoning scores) as estimates of preexisting neuropsychological skill level against the number expected from the normal distribution, for a consecutive sample of 56 neurologically intact, bilingual Hispanic Americans. Participants were tested in separate sessions in Spanish and English, in counterbalanced order, on La Bateria Neuropsicologica and the original English-language tests on which this battery was based. For language format measures, repeated-measures multivariate analysis of variance showed that individual estimates of preexisting skill level in English generated the mean number of false positives closest to that expected from the normal distribution, whereas the 50th percentile of conventional English-language norms did the same for visual-perceptual format measures. When using conventional Spanish or English monolingual norms for language format neuropsychological measures with bilingual Hispanic Americans, individual estimates of preexisting skill level are recommended over the 50th percentile.

  19. A constrained multinomial Probit route choice model in the metro network: Formulation, estimation and application

    PubMed Central

    Zhang, Yongsheng; Wei, Heng; Zheng, Kangning

    2017-01-01

    Because metro network expansion provides more alternative routes, it is attractive to integrate into route choice modeling both the impact of the route set and the interdependency among alternative routes on route choice probability. Therefore, the formulation, estimation and application of a constrained multinomial probit (CMNP) route choice model in the metro network are carried out in this paper. The utility function is formulated as three components: the compensatory component is a function of influencing factors; the non-compensatory component measures the impact of the route set on utility; and, following a multivariate normal distribution, the covariance of the error component is structured into three parts, representing the correlation among routes, the transfer variance of a route, and the unobserved variance, respectively. Because the choice probabilities involve multidimensional integrals of the multivariate normal probability density function, the CMNP model is rewritten in a hierarchical Bayes formulation, and a Markov chain Monte Carlo approach based on the Metropolis-Hastings sampling algorithm is constructed to estimate all parameters. Based on Guangzhou Metro data, reliable estimation results are obtained. Furthermore, the proposed CMNP model also shows good forecasting performance for the calculation of route choice probabilities and good application performance for transfer flow volume prediction. PMID:28591188
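    The Metropolis-Hastings step at the heart of such an MCMC estimation scheme can be illustrated on a deliberately simple stand-in target (a univariate standard normal, far simpler than the actual CMNP posterior):

```python
import numpy as np

rng = np.random.default_rng(6)

def log_post(theta):
    # Stand-in log-posterior: a standard normal target; the actual CMNP
    # posterior over route-choice parameters is far higher-dimensional
    return -0.5 * theta ** 2

theta = 0.0
lp = log_post(theta)
samples = []
for _ in range(20000):
    prop = theta + rng.normal(0.0, 1.0)          # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:      # Metropolis-Hastings acceptance
        theta, lp = prop, lp_prop
    samples.append(theta)

draws = np.array(samples[5000:])                 # discard burn-in
print(draws.mean(), draws.std())
```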

  20. Normalization methods in time series of platelet function assays

    PubMed Central

    Van Poucke, Sven; Zhang, Zhongheng; Roest, Mark; Vukicevic, Milan; Beran, Maud; Lauwereins, Bart; Zheng, Ming-Hua; Henskens, Yvonne; Lancé, Marcus; Marcus, Abraham

    2016-01-01

    Platelet function can be quantitatively assessed by specific assays such as light-transmission aggregometry, multiple-electrode aggregometry measuring the response to adenosine diphosphate (ADP), arachidonic acid, collagen, and thrombin-receptor activating peptide, and viscoelastic tests such as rotational thromboelastometry (ROTEM). The task of extracting meaningful statistical and clinical information from the high-dimensional data spaces of temporal multivariate clinical data represented as multivariate time series is complex. Building insightful visualizations for multivariate time series demands adequate usage of normalization techniques. In this article, various methods for data normalization (z-transformation, range transformation, proportion transformation, and interquartile range) are presented and visualized, and the approach best suited to platelet function data series is discussed. Normalization was calculated per assay (test) for all time points and per time point for all tests. Interquartile range, range transformation, and z-transformation demonstrated the correlation, as calculated by the Spearman correlation test, when normalized per assay (test) for all time points. When normalizing per time point for all tests, no correlation could be extracted from the charts, as was the case when using all data as one dataset for normalization. PMID:27428217
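    The four normalization methods named above admit a direct numpy sketch (per-assay normalization would apply each function along one axis of the data matrix); the sample values are invented:

```python
import numpy as np

def z_transform(x):           # standardize to zero mean, unit variance
    return (x - x.mean()) / x.std(ddof=1)

def range_transform(x):       # rescale to the interval [0, 1]
    return (x - x.min()) / (x.max() - x.min())

def proportion_transform(x):  # each value as a share of the series total
    return x / x.sum()

def iqr_transform(x):         # center on median, scale by interquartile range
    q1, q3 = np.percentile(x, [25, 75])
    return (x - np.median(x)) / (q3 - q1)

x = np.array([12.0, 15.0, 9.0, 22.0, 18.0, 14.0])
print(z_transform(x), range_transform(x))
```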

  1. Probability distributions for multimeric systems.

    PubMed

    Albert, Jaroslav; Rooman, Marianne

    2016-01-01

    We propose a fast and accurate method of obtaining the equilibrium mono-modal joint probability distributions for multimeric systems. The method requires only two assumptions: that the copy number of all species of molecule may be treated as continuous, and that the probability density functions (pdf) are well approximated by multivariate skew normal distributions (MSND). Starting from the master equation, we convert the problem into a set of equations for the statistical moments, which are then expressed in terms of the parameters intrinsic to the MSND. Using an optimization package in Mathematica, we minimize a Euclidean distance function comprising the sum of the squared differences between the left- and right-hand sides of these equations. Comparison of results obtained via our method with those rendered by the Gillespie algorithm demonstrates our method to be highly accurate as well as efficient.

  2. Regional magnetic resonance imaging measures for multivariate analysis in Alzheimer's disease and mild cognitive impairment.

    PubMed

    Westman, Eric; Aguilar, Carlos; Muehlboeck, J-Sebastian; Simmons, Andrew

    2013-01-01

    Automated structural magnetic resonance imaging (MRI) processing pipelines are gaining popularity for Alzheimer's disease (AD) research. They generate regional volumes, cortical thickness measures and other measures, which can be used as input for multivariate analysis. It is not clear which combination of measures and which normalization approach are most useful for AD classification and for predicting mild cognitive impairment (MCI) conversion. The current study includes MRI scans from 699 subjects [AD, MCI and controls (CTL)] from the Alzheimer's Disease Neuroimaging Initiative (ADNI). The Freesurfer pipeline was used to generate regional volume, cortical thickness, gray matter volume, surface area, mean curvature, Gaussian curvature, folding index and curvature index measures. 259 variables were used for orthogonal partial least squares to latent structures (OPLS) multivariate analysis. Normalization approaches were explored and the optimal combination of measures determined. Results indicate that cortical thickness measures should not be normalized, while volumes should probably be normalized by intracranial volume (ICV). Combining regional cortical thickness measures (not normalized) with cortical and subcortical volumes (normalized by ICV) using OPLS gave a prediction accuracy of 91.5% when distinguishing AD versus CTL. This model prospectively predicted future decline from MCI to AD, with 75.9% of converters correctly classified. Normalization strategy did not have a significant effect on the accuracies of multivariate models containing multiple MRI measures for this large dataset. The appropriate choice of input for multivariate analysis in AD and MCI is of great importance. The results support the use of un-normalized cortical thickness measures and volumes normalized by ICV.

  3. NONPARAMETRIC MANOVA APPROACHES FOR NON-NORMAL MULTIVARIATE OUTCOMES WITH MISSING VALUES

    PubMed Central

    He, Fanyin; Mazumdar, Sati; Tang, Gong; Bhatia, Triptish; Anderson, Stewart J.; Dew, Mary Amanda; Krafty, Robert; Nimgaonkar, Vishwajit; Deshpande, Smita; Hall, Martica; Reynolds, Charles F.

    2017-01-01

    Between-group comparisons often entail many correlated response variables. The multivariate linear model, with its assumption of multivariate normality, is the accepted standard tool for these tests. When this assumption is violated, the nonparametric multivariate Kruskal-Wallis (MKW) test is frequently used. However, this test requires complete cases with no missing values in response variables. Deletion of cases with missing values likely leads to inefficient statistical inference. Here we extend the MKW test to retain information from partially-observed cases. Results of simulated studies and analysis of real data show that the proposed method provides adequate coverage and superior power to complete-case analyses. PMID:29416225

  4. Remote sensing of earth terrain

    NASA Technical Reports Server (NTRS)

    Kong, J. A.

    1988-01-01

    Two monographs and 85 journal and conference papers on remote sensing of earth terrain have been published, sponsored by NASA Contract NAG5-270. A multivariate K-distribution is proposed to model the statistics of fully polarimetric data from earth terrain with polarizations HH, HV, VH, and VV. In this approach, correlated polarizations of radar signals, as characterized by a covariance matrix, are treated as the sum of N n-dimensional random vectors; N obeys the negative binomial distribution with a parameter alpha and mean bar N. Subsequently, an n-dimensional K-distribution, with either zero or non-zero mean, is developed in the limit of infinite bar N or illuminated area. The probability density function (PDF) of the K-distributed vector normalized by its Euclidean norm is independent of the parameter alpha and is the same as that derived from a zero-mean Gaussian-distributed random vector. The above model is well supported by experimental data provided by MIT Lincoln Laboratory and the Jet Propulsion Laboratory in the form of polarimetric measurements.

  5. Control-group feature normalization for multivariate pattern analysis of structural MRI data using the support vector machine.

    PubMed

    Linn, Kristin A; Gaonkar, Bilwaj; Satterthwaite, Theodore D; Doshi, Jimit; Davatzikos, Christos; Shinohara, Russell T

    2016-05-15

    Normalization of feature vector values is a common practice in machine learning. Generally, each feature value is standardized to the unit hypercube or by normalizing to zero mean and unit variance. Classification decisions based on support vector machines (SVMs) or by other methods are sensitive to the specific normalization used on the features. In the context of multivariate pattern analysis using neuroimaging data, standardization effectively up- and down-weights features based on their individual variability. Since the standard approach uses the entire data set to guide the normalization, it utilizes the total variability of these features. This total variation is inevitably dependent on the amount of marginal separation between groups. Thus, such a normalization may attenuate the separability of the data in high dimensional space. In this work we propose an alternate approach that uses an estimate of the control-group standard deviation to normalize features before training. We study our proposed approach in the context of group classification using structural MRI data. We show that control-based normalization leads to better reproducibility of estimated multivariate disease patterns and improves the classifier performance in many cases. Copyright © 2016 Elsevier Inc. All rights reserved.
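    A minimal numpy sketch of the contrast between pooled standardization and the control-group normalization proposed above, on synthetic feature matrices (group sizes and effect size are invented):

```python
import numpy as np

rng = np.random.default_rng(4)
controls = rng.normal(0.0, 1.0, size=(50, 10))   # control-group feature matrix
patients = rng.normal(0.5, 1.0, size=(50, 10))   # patient group, shifted mean

# Standard practice: scale by the standard deviation of ALL subjects, which
# absorbs the between-group separation into the scale estimate
pooled = np.vstack([controls, patients])
pooled_scaled = (pooled - pooled.mean(0)) / pooled.std(0, ddof=1)

# Proposed alternative: scale every subject by control-group mean and SD only
mu_c, sd_c = controls.mean(0), controls.std(0, ddof=1)
control_scaled = (pooled - mu_c) / sd_c

print(pooled_scaled.shape, control_scaled.shape)
```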

  6. Multivariate normal maximum likelihood with both ordinal and continuous variables, and data missing at random.

    PubMed

    Pritikin, Joshua N; Brick, Timothy R; Neale, Michael C

    2018-04-01

    A novel method for the maximum likelihood estimation of structural equation models (SEM) with both ordinal and continuous indicators is introduced using a flexible multivariate probit model for the ordinal indicators. A full information approach ensures unbiased estimates for data missing at random. Exceeding the capability of prior methods, up to 13 ordinal variables can be included before integration time increases beyond 1 s per row. The method relies on the axiom of conditional probability to split apart the distribution of continuous and ordinal variables. Due to the symmetry of the axiom, two similar methods are available. A simulation study provides evidence that the two similar approaches offer equal accuracy. A further simulation is used to develop a heuristic to automatically select the most computationally efficient approach. Joint ordinal continuous SEM is implemented in OpenMx, free and open-source software.

  7. The classification of secondary colorectal liver cancer in human biopsy samples using angular dispersive x-ray diffraction and multivariate analysis

    NASA Astrophysics Data System (ADS)

    Theodorakou, Chrysoula; Farquharson, Michael J.

    2009-08-01

    The motivation behind this study is to assess whether angular dispersive x-ray diffraction (ADXRD) data, processed using multivariate analysis techniques, can be used for classifying secondary colorectal liver cancer tissue and normal surrounding liver tissue in human liver biopsy samples. The ADXRD profiles from a total of 60 samples of normal liver tissue and colorectal liver metastases were measured using a synchrotron radiation source. The data were analysed for 56 samples using nonlinear peak-fitting software. Four peaks were fitted to all of the ADXRD profiles, and the amplitude, area, and amplitude and area ratios for three of the four peaks were calculated and used for the statistical and multivariate analysis. The statistical analysis showed that there are significant differences in all the peak-fitting parameters and ratios between the normal and the diseased tissue groups. The technique of soft independent modelling of class analogy (SIMCA) was used to classify normal liver tissue and colorectal liver metastases, resulting in 67% of the normal tissue samples and 60% of the secondary colorectal liver tissue samples being classified correctly. This study has shown that the ADXRD data of normal and secondary colorectal liver cancer are statistically different, and that x-ray diffraction data analysed using multivariate analysis have the potential to be used as a method of tissue classification.
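    Nonlinear peak fitting of the kind applied to the ADXRD profiles can be sketched with scipy's curve_fit on a synthetic single-Gaussian peak (the study fitted four peaks to real diffraction profiles; all values here are illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amp, mu, sigma):
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 200)
y = gaussian(x, 3.0, 4.0, 0.8) + rng.normal(0, 0.05, x.size)  # noisy synthetic peak

# Fit one peak; a multi-peak profile would sum several gaussian() terms
popt, _ = curve_fit(gaussian, x, y, p0=[2.0, 5.0, 1.0])
amp, mu, sigma = popt
area = amp * abs(sigma) * np.sqrt(2 * np.pi)   # analytic area under a Gaussian peak
print(amp, mu, area)
```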

  8. Joint pattern of seasonal hydrological droughts and floods alternation in China's Huai River Basin using the multivariate L-moments

    NASA Astrophysics Data System (ADS)

    Wu, ShaoFei; Zhang, Xiang; She, DunXian

    2017-06-01

    Under the current condition of climate change, droughts and floods occur more frequently, and events in which flooding occurs after a prolonged drought or a drought occurs after an extreme flood may have a more severe impact on natural systems and human lives. This challenges the traditional approach wherein droughts and floods are considered separately, which may largely underestimate the risk of such disasters. In our study, the sudden alternation of drought and flood events (ADFEs) between adjacent seasons is studied using multivariate L-moments theory and bivariate copula functions in the Huai River Basin (HRB) of China, with monthly streamflow data at 32 hydrological stations from 1956 to 2012. The dry and wet conditions are characterized by the standardized streamflow index (SSI) at a 3-month time scale. The results show that: (1) The summer streamflow makes the largest contribution to the annual streamflow, followed by the autumn streamflow and spring streamflow. (2) The entire study area can be divided into five homogeneous sub-regions using the multivariate regional homogeneity test. The generalized logistic distribution (GLO) and log-normal distribution (LN3) are acceptable as the optimal marginal distributions under most conditions, and the Frank copula is more appropriate for the spring-summer and summer-autumn SSI series. Continuous flood events dominate at most sites both in spring-summer and summer-autumn (with average frequencies of 13.78% and 17.06%, respectively), while continuous drought events come second (with average frequencies of 11.27% and 13.79%, respectively). Moreover, seasonal ADFEs are most likely to occur near the mainstream of the HRB, and drought and flood events are more likely to occur in summer-autumn than in spring-summer.
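
    The standardized-index step used above can be sketched as follows. This illustration substitutes an empirical CDF for the fitted GLO/LN3 marginals the study selects, so it is a simplifying assumption rather than the authors' procedure:

```python
import numpy as np
from scipy.stats import norm

def standardized_index(flow, window=3):
    """Standardized streamflow index (SSI) sketch: aggregate streamflow over
    a rolling window, convert to empirical probabilities (Weibull plotting
    positions), then map through the standard normal quantile function.
    Negative values indicate dry conditions, positive values wet ones."""
    flow = np.asarray(flow, float)
    agg = np.convolve(flow, np.ones(window) / window, mode="valid")
    ranks = agg.argsort().argsort() + 1          # ranks 1..n
    p = ranks / (len(agg) + 1.0)                 # Weibull plotting positions
    return norm.ppf(p)                           # standard normal quantiles
```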

  9. φq-field theory for portfolio optimization: “fat tails” and nonlinear correlations

    NASA Astrophysics Data System (ADS)

    Sornette, D.; Simonetti, P.; Andersen, J. V.

    2000-08-01

    Physics and finance are both fundamentally based on the theory of random walks (and their generalizations to higher dimensions) and on the collective behavior of large numbers of correlated variables. The archetype exemplifying this situation in finance is the portfolio optimization problem, in which one desires to diversify over a set of possibly dependent assets to optimize the return and minimize the risks. The standard mean-variance solution introduced by Markowitz and its subsequent developments is basically a mean-field Gaussian solution. It has severe limitations for practical applications due to the strongly non-Gaussian structure of distributions and the nonlinear dependence between assets. Here, we present in detail a general analytical characterization of the distribution of returns for a portfolio constituted of assets whose returns are described by an arbitrary joint multivariate distribution. To this end, we introduce a non-linear transformation that maps the returns onto Gaussian variables whose covariance matrix provides a new measure of dependence between the non-normal returns, generalizing the covariance matrix into a nonlinear covariance matrix. This nonlinear covariance matrix is chiseled to the specific fat tail structure of the underlying marginal distributions, thus ensuring stability and good conditioning. The portfolio distribution is then obtained as the solution of a mapping to a so-called φq field theory in particle physics, of which we offer an extensive treatment using Feynman diagrammatic techniques and large deviation theory, and which we illustrate in detail for multivariate Weibull distributions. The interaction (non-mean field) structure in this field theory is a direct consequence of the non-Gaussian nature of the distribution of asset price returns. We find that minimizing the portfolio variance (i.e. the relatively “small” risks) may often increase the large risks, as measured by higher normalized cumulants.
Extensive empirical tests on the foreign exchange market are presented that satisfactorily validate the theory. For “fat tail” distributions, we show that an adequate prediction of the risks of a portfolio relies much more on the correct description of the tail structure than on the correlations between assets. For the case of asymmetric return distributions, our theory allows us to generalize the return-risk efficient frontier concept to incorporate the dimensions of large risks embedded in the tails of the asset distributions. We demonstrate that it is often possible to increase the portfolio return while decreasing the large risks as quantified by the fourth and higher-order cumulants. Exact theoretical formulas are validated by empirical tests.
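
    The mapping of returns onto Gaussian variables, and the resulting nonlinear covariance matrix, can be sketched with a rank-based (empirical-CDF) transform. The paper works with fitted marginals (e.g. multivariate Weibull), so this is an illustrative approximation rather than the authors' construction:

```python
import numpy as np
from scipy.stats import norm

def nonlinear_covariance(returns):
    """Map each asset's returns to Gaussian variables via the empirical CDF,
    z = Phi^{-1}(F(x)), then take the covariance of the mapped variables.
    The result generalizes the linear covariance matrix: it measures the
    dependence of the Gaussianized returns and is insensitive to the
    marginal fat tails."""
    x = np.asarray(returns, float)            # shape (n_obs, n_assets)
    n = x.shape[0]
    ranks = x.argsort(0).argsort(0) + 1.0     # column-wise ranks 1..n
    z = norm.ppf(ranks / (n + 1.0))           # Gaussianized returns
    return np.cov(z, rowvar=False)
```

    Any monotone (even strongly nonlinear) relation between two assets maps to perfect correlation of the Gaussianized variables, which is exactly the dependence the linear covariance of raw returns would misstate.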

  10. Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design.

    PubMed

    Lu, Tsui-Shan; Longnecker, Matthew P; Zhou, Haibo

    2017-03-15

    The outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme in which one observes the exposure with a probability that depends on the outcome. Well-known such designs are the case-control design for binary responses, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for ODS with multivariate cases remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric: all the underlying distributions of covariates are modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and develop its asymptotic normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample with the same sample size. The multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of polychlorinated biphenyl exposure with hearing loss in children born into the Collaborative Perinatal Study. Copyright © 2016 John Wiley & Sons, Ltd.

  11. A comparison of likelihood ratio tests and Rao's score test for three separable covariance matrix structures.

    PubMed

    Filipiak, Katarzyna; Klein, Daniel; Roy, Anuradha

    2017-01-01

    The problem of testing the separability of a covariance matrix against an unstructured variance-covariance matrix is studied in the context of multivariate repeated measures data using Rao's score test (RST). The RST statistic is developed with the first component of the separable structure as a first-order autoregressive (AR(1)) correlation matrix or an unstructured (UN) covariance matrix under the assumption of multivariate normality. It is shown that the distribution of the RST statistic under the null hypothesis of any separability does not depend on the true values of the mean or the unstructured components of the separable structure. A significant advantage of the RST is that it can be performed for small samples, even smaller than the dimension of the data, where the likelihood ratio test (LRT) cannot be used, and it outperforms the standard LRT in a number of contexts. Monte Carlo simulations are then used to study the comparative behavior of the null distribution of the RST statistic, as well as that of the LRT statistic, in terms of sample size considerations, and for the estimation of the empirical percentiles. Our findings are compared with existing results where the first component of the separable structure is a compound symmetry (CS) correlation matrix. It is also shown by simulations that the empirical null distribution of the RST statistic converges faster than the empirical null distribution of the LRT statistic to the limiting χ2 distribution. The tests are implemented on a real dataset from medical studies. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. Modeling stochastic frontier based on vine copulas

    NASA Astrophysics Data System (ADS)

    Constantino, Michel; Candido, Osvaldo; Tabak, Benjamin M.; da Costa, Reginaldo Brito

    2017-11-01

    This article models a production function and analyzes the technical efficiency of listed companies in the United States, Germany and England between 2005 and 2012 based on the vine copula approach. Traditional estimates of the stochastic frontier assume that data are multivariate normally distributed and that there is no source of asymmetry. The proposed method based on vine copulas allows us to explore different types of asymmetry and multivariate distributions. Using data on product, capital and labor, we measure the relative efficiency of the vine production function and estimate the coefficient used in the stochastic frontier literature for comparison purposes. This production vine copula predicts the value added by firms with given capital and labor in a probabilistic way. It thereby stands in sharp contrast to the production function, where the output of firms is completely deterministic. The results show that, on average, S&P500 companies are more efficient than companies listed in England and Germany, which presented similar average efficiency coefficients. For comparative purposes, the traditional stochastic frontier was estimated, and the results showed discrepancies between the coefficients obtained by the two methods, traditional and frontier-vine, opening new paths of non-linear research.

  13. Usual Dietary Intakes: SAS Macros for Fitting Multivariate Measurement Error Models & Estimating Multivariate Usual Intake Distributions

    Cancer.gov

    The following SAS macros can be used to create a multivariate usual intake distribution for multiple dietary components that are consumed nearly every day or episodically. A SAS macro for performing balanced repeated replication (BRR) variance estimation is also included.
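
    In its simplest form, the balanced repeated replication (BRR) variance estimate the macro provides reduces to the mean squared deviation of half-sample replicate estimates around the full-sample estimate. A minimal sketch (construction of the balanced half-samples themselves is omitted):

```python
import numpy as np

def brr_variance(replicate_estimates, full_estimate):
    """Balanced repeated replication (BRR) variance sketch: given estimates
    from R balanced half-sample replicates and the full-sample estimate,
    the BRR variance is the mean squared deviation of the replicates
    around the full-sample value."""
    reps = np.asarray(replicate_estimates, float)
    return np.mean((reps - full_estimate) ** 2)
```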

  14. Using empirical Bayes predictors from generalized linear mixed models to test and visualize associations among longitudinal outcomes.

    PubMed

    Mikulich-Gilbertson, Susan K; Wagner, Brandie D; Grunwald, Gary K; Riggs, Paula D; Zerbe, Gary O

    2018-01-01

    Medical research is often designed to investigate changes in a collection of response variables that are measured repeatedly on the same subjects. The multivariate generalized linear mixed model (MGLMM) can be used to evaluate random coefficient associations (e.g. simple correlations, partial regression coefficients) among outcomes that may be non-normal and differently distributed, by specifying a multivariate normal distribution for their random effects and then evaluating the latent relationship between them. Empirical Bayes predictors are readily available for each subject from any mixed model and are observable and hence plottable. Here, we evaluate whether second-stage association analyses of empirical Bayes predictors from an MGLMM provide a good approximation and visual representation of these latent association analyses, using medical examples and simulations. Additionally, we compare these results with association analyses of empirical Bayes predictors generated from separate mixed models for each outcome, a procedure that could circumvent computational problems that arise when the dimension of the joint covariance matrix of random effects is large and prohibits estimation of latent associations. As has been shown in other analytic contexts, the p-values for all second-stage coefficients that were determined by naively assuming normality of the empirical Bayes predictors provide a good approximation to p-values determined via permutation analysis. Analyzing interrelated outcomes with separate models in the first stage and then associating the resulting empirical Bayes predictors in a second stage yields mean and covariance parameter estimates that differ from the maximum likelihood estimates generated by an MGLMM. The potential for erroneous inference from using results from these separate models increases as the magnitude of the association among the outcomes increases. Thus, if computable, scatterplots of the conditionally independent empirical Bayes predictors from an MGLMM are always preferable to scatterplots of empirical Bayes predictors generated by separate models, unless the true association between outcomes is zero.

  15. On Some Multiple Decision Problems

    DTIC Science & Technology

    1976-08-01

    parameter space. Some recent results in the area of the subset selection formulation are Gnanadesikan and Gupta [28], Gupta and Studden [43], Gupta and... York, pp. 363-376. [27] Gnanadesikan, M. (1966). Some Selection and Ranking Procedures for Multivariate Normal Populations. Ph.D. Thesis, Dept. of Statist., Purdue Univ., West Lafayette, Indiana 47907. [28] Gnanadesikan, M. and Gupta, S. S. (1970). Selection procedures for multivariate normal

  16. Impact of distributions on the archetypes and prototypes in heterogeneous nanoparticle ensembles.

    PubMed

    Fernandez, Michael; Wilson, Hugh F; Barnard, Amanda S

    2017-01-05

    The magnitude and complexity of the structural and functional data available on nanomaterials require data analytics, statistical analysis and information technology to drive discovery. We demonstrate that multivariate statistical analysis can recognise the sets of truly significant nanostructures and their most relevant properties in heterogeneous ensembles with different probability distributions. The prototypical and archetypal nanostructures of five virtual ensembles of Si quantum dots (SiQDs) with Boltzmann, frequency, normal, Poisson and random distributions are identified using clustering and archetypal analysis, where we find that their diversity is defined by size and shape, regardless of the type of distribution. At the convex hull of the SiQD ensembles, simple configuration archetypes can efficiently describe a large number of SiQDs, whereas more complex shapes are needed to represent the average ordering of the ensembles. This approach provides a route towards the characterisation of computationally intractable virtual nanomaterial spaces, which can convert big data into smart data, and significantly reduce the workload to simulate experimentally relevant virtual samples.

  17. Modified Distribution-Free Goodness-of-Fit Test Statistic.

    PubMed

    Chun, So Yeon; Browne, Michael W; Shapiro, Alexander

    2018-03-01

    Covariance structure analysis and its structural equation modeling extensions have become one of the most widely used methodologies in social sciences such as psychology, education, and economics. An important issue in such analysis is to assess the goodness of fit of a model under analysis. One of the most popular test statistics used in covariance structure analysis is the asymptotically distribution-free (ADF) test statistic introduced by Browne (Br J Math Stat Psychol 37:62-83, 1984). The ADF statistic can be used to test models without any specific distribution assumption (e.g., multivariate normal distribution) of the observed data. Despite its advantage, it has been shown in various empirical studies that unless sample sizes are extremely large, this ADF statistic could perform very poorly in practice. In this paper, we provide a theoretical explanation for this phenomenon and further propose a modified test statistic that improves the performance in samples of realistic size. The proposed statistic deals with the possible ill-conditioning of the involved large-scale covariance matrices.

  18. Semi-nonparametric VaR forecasts for hedge funds during the recent crisis

    NASA Astrophysics Data System (ADS)

    Del Brio, Esther B.; Mora-Valencia, Andrés; Perote, Javier

    2014-05-01

    The need to provide accurate value-at-risk (VaR) forecasting measures has triggered an important literature in econophysics. Although accurate VaR models and methodologies are particularly demanded by hedge fund managers, few articles are specifically devoted to implementing new techniques in hedge fund return VaR forecasting. This article advances these issues by comparing the performance of risk measures based on parametric distributions (the normal, Student’s t and skewed t), semi-nonparametric (SNP) methodologies based on Gram-Charlier (GC) series, and the extreme value theory (EVT) approach. Our results show that the normal-, Student’s t- and skewed t-based methodologies fail to forecast hedge fund VaR, whilst the SNP and EVT approaches succeed at it. We extend these results to the multivariate framework by providing an explicit formula for the GC copula and its density that encompasses the Gaussian copula and accounts for non-linear dependences. We show that the VaR obtained by the meta GC accurately captures portfolio risk and outperforms regulatory VaR estimates obtained through the meta Gaussian and Student’s t distributions.

  19. Spatial and spectral interpolation of ground-motion intensity measure observations

    USGS Publications Warehouse

    Worden, Charles; Thompson, Eric M.; Baker, Jack W.; Bradley, Brendon A.; Luco, Nicolas; Wilson, David

    2018-01-01

    Following a significant earthquake, ground‐motion observations are available for a limited set of locations and intensity measures (IMs). Typically, however, it is desirable to know the ground motions for additional IMs and at locations where observations are unavailable. Various interpolation methods are available, but because IMs or their logarithms are normally distributed, spatially correlated, and correlated with each other at a given location, it is possible to apply the conditional multivariate normal (MVN) distribution to the problem of estimating unobserved IMs. In this article, we review the MVN and its application to general estimation problems, and then apply the MVN to the specific problem of ground‐motion IM interpolation. In particular, we present (1) a formulation of the MVN for the simultaneous interpolation of IMs across space and IM type (most commonly, spectral response at different oscillator periods) and (2) the inclusion of uncertain observation data in the MVN formulation. These techniques, in combination with modern empirical ground‐motion models and correlation functions, provide a flexible framework for estimating a variety of IMs at arbitrary locations.
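
    The conditional MVN at the core of this framework uses the standard partitioned-Gaussian formulas. A minimal sketch (illustrative index handling, not the authors' implementation):

```python
import numpy as np

def conditional_mvn(mu, cov, obs_idx, obs_vals):
    """Conditional multivariate normal: given observed components (indices
    obs_idx with values obs_vals), return the conditional mean and
    covariance of the remaining components via the partitioned formulas
      mu_1|2 = mu_1 + S12 S22^{-1} (y2 - mu_2)
      S_1|2  = S11 - S12 S22^{-1} S21."""
    mu, cov = np.asarray(mu, float), np.asarray(cov, float)
    rem = np.setdiff1d(np.arange(len(mu)), obs_idx)   # unobserved indices
    S11 = cov[np.ix_(rem, rem)]
    S12 = cov[np.ix_(rem, obs_idx)]
    S22 = cov[np.ix_(obs_idx, obs_idx)]
    w = np.linalg.solve(S22, np.asarray(obs_vals, float) - mu[obs_idx])
    cond_mu = mu[rem] + S12 @ w
    cond_cov = S11 - S12 @ np.linalg.solve(S22, S12.T)
    return cond_mu, cond_cov
```

    In the IM-interpolation setting, `mu` and `cov` would come from a ground-motion model plus spatial/spectral correlation functions, and `obs_idx` would index the stations and IM types actually recorded.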

  20. On the Theory of Multivariate Elliptically Contoured Distributions and Their Applications.

    DTIC Science & Technology

    1982-05-01

    elliptically contoured distributions has been studied by several authors: Schoenberg (1938), Kelker (1970), Devlin, Gnanadesikan and Keltenring (1976...theory of ellip- tically contoured distributions, J. Multivariate Analysis, 11, 368-385. Devlin, S. J., Gnanadesikan , R., and Kettenring, J. R. (1976

  1. Study on discrimination of oral cancer from normal using blood plasma based on fluorescence steady and excited state at excitation wavelength 280 nm

    NASA Astrophysics Data System (ADS)

    Rekha, Pachaiappan; Aruna, Prakasa Rao; Ganesan, Singaravelu

    2016-03-01

    Many research works based on fluorescence spectroscopy have proven its potential in the diagnosis of various diseases using the spectral signatures of native key fluorophores such as tryptophan, tyrosine, collagen, NADH, FAD and porphyrin. The distribution, concentration and conformation of these fluorophores may change depending upon the pathological and metabolic conditions of cells and tissues. In this study, we have made an attempt to characterize the blood plasma of normal subjects and oral cancer patients by native fluorescence spectroscopy at 280 nm excitation. Further, the fluorescence data were analyzed by employing a multivariate statistical method, linear discriminant analysis (LDA), with leave-one-out cross-validation. The results illustrate the potential of the fluorescence spectroscopy technique in the diagnosis of oral cancer using blood plasma.

  2. Understanding characteristics in multivariate traffic flow time series from complex network structure

    NASA Astrophysics Data System (ADS)

    Yan, Ying; Zhang, Shen; Tang, Jinjun; Wang, Xiaofei

    2017-07-01

    Discovering dynamic characteristics in traffic flow is a significant step in designing effective traffic management and control strategies for relieving traffic congestion in urban cities. A new method based on complex network theory is proposed to study multivariate traffic flow time series. The data were collected from loop detectors on a freeway during one year. In order to construct a complex network from the original traffic flow, a weighted Frobenius norm is adopted to estimate similarity between multivariate time series, and Principal Component Analysis is implemented to determine the weights. We discuss how to select the optimal critical threshold for networks at different hours in terms of the cumulative probability distribution of degree. Furthermore, two statistical properties of the networks, normalized network structure entropy and cumulative probability of degree, are utilized to explore hourly variation in traffic flow. The results demonstrate that these two statistical quantities express patterns similar to the traffic flow parameters, with morning and evening peak hours. Accordingly, we detect three traffic states, trough, peak and transitional hours, according to the correlation between the two aforementioned properties. The resulting classification of states can represent hourly fluctuation in traffic flow, as shown by analyzing annual average hourly values of traffic volume, occupancy and speed in the corresponding hours.
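
    The weighted Frobenius-norm similarity can be sketched as below. The PCA-derived weights are passed in directly here, a simplification of the paper's procedure, and the function name is illustrative:

```python
import numpy as np

def weighted_frobenius_distance(X, Y, w):
    """Dissimilarity between two multivariate time series (rows: time
    steps, columns: variables) as a column-weighted Frobenius norm of
    their difference. With equal weights this reduces to the ordinary
    Frobenius norm; in the paper the weights come from PCA."""
    D = np.asarray(X, float) - np.asarray(Y, float)
    w = np.asarray(w, float)          # one weight per variable (column)
    return np.sqrt(np.sum(w * D**2))
```

    An edge would then be placed between two time-series nodes whenever this distance falls below the critical threshold chosen from the degree distribution.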

  3. Interpreting support vector machine models for multivariate group wise analysis in neuroimaging

    PubMed Central

    Gaonkar, Bilwaj; Shinohara, Russell T; Davatzikos, Christos

    2015-01-01

    Machine-learning-based classification algorithms like support vector machines (SVMs) have shown great promise for turning high-dimensional neuroimaging data into clinically useful decision criteria. However, tracing imaging-based patterns that contribute significantly to classifier decisions remains an open problem. This is an issue of critical importance in imaging studies seeking to determine which anatomical or physiological imaging features contribute to the classifier’s decision, thereby allowing users to critically evaluate the findings of such machine learning methods and to understand disease mechanisms. The majority of published work addresses the question of statistical inference for support vector classification using permutation tests based on SVM weight vectors. Such permutation testing ignores the SVM margin, which is critical in SVM theory. In this work we emphasize the use of a statistic that explicitly accounts for the SVM margin and show that the null distributions associated with this statistic are asymptotically normal. Further, our experiments show that this statistic is far less conservative than weight-based permutation tests, yet specific enough to tease out multivariate patterns in the data. Thus, we can better understand the multivariate patterns that the SVM uses for neuroimaging-based classification. PMID:26210913

  4. Bayesian soft X-ray tomography using non-stationary Gaussian Processes

    NASA Astrophysics Data System (ADS)

    Li, Dong; Svensson, J.; Thomsen, H.; Medina, F.; Werner, A.; Wolf, R.

    2013-08-01

    In this study, a Bayesian non-stationary Gaussian Process (GP) method for the inference of the soft X-ray emissivity distribution, along with its associated uncertainties, has been developed. For the investigation of equilibrium conditions and fast magnetohydrodynamic behaviors in nuclear fusion plasmas, it is important to infer, especially in the plasma center, spatially resolved soft X-ray profiles from a limited number of noisy line-integral measurements. For this ill-posed inversion problem, Bayesian probability theory can provide a posterior probability distribution over all possible solutions under given model assumptions. Specifically, the use of a non-stationary GP to model the emission allows the model to adapt to the varying length scales of the underlying diffusion process. In contrast to other conventional methods, the prior regularization is realized in probabilistic form, which enhances the capability of uncertainty analysis; in consequence, scientists concerned with the reliability of their results will benefit from it. Under the assumption of normally distributed noise, the posterior distribution evaluated at a discrete number of points becomes a multivariate normal distribution whose mean and covariance are analytically available, making inversions and calculation of uncertainty fast. Additionally, the hyper-parameters embedded in the model assumption can be optimized through a Bayesian Occam's Razor formalism and thereby automatically adjust the model complexity. This method is shown to produce convincing reconstructions and good agreement with independently calculated results from the Maximum Entropy and Equilibrium-Based Iterative Tomography Algorithm methods.
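
    The analytic posterior referred to above — a multivariate normal over the emissivity given line-integral data — follows the standard linear-Gaussian formulas. A minimal sketch with an assumed linear response matrix A (the non-stationary kernel construction is omitted):

```python
import numpy as np

def gp_posterior(K, A, y, noise_var):
    """Linear-Gaussian inversion sketch: emissivity e ~ N(0, K) (GP prior),
    measurements y = A e + noise with noise ~ N(0, noise_var * I).
    The posterior over e is again multivariate normal:
      mean = K A^T (A K A^T + noise_var I)^{-1} y
      cov  = K - K A^T (A K A^T + noise_var I)^{-1} A K"""
    K, A, y = (np.asarray(v, float) for v in (K, A, y))
    AK = A @ K
    S = AK @ A.T + noise_var * np.eye(A.shape[0])   # marginal data covariance
    mean = AK.T @ np.linalg.solve(S, y)
    cov = K - AK.T @ np.linalg.solve(S, AK)
    return mean, cov
```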

  5. Bayesian soft X-ray tomography using non-stationary Gaussian Processes.

    PubMed

    Li, Dong; Svensson, J; Thomsen, H; Medina, F; Werner, A; Wolf, R

    2013-08-01

    In this study, a Bayesian non-stationary Gaussian Process (GP) method for the inference of the soft X-ray emissivity distribution, along with its associated uncertainties, has been developed. For the investigation of equilibrium conditions and fast magnetohydrodynamic behaviors in nuclear fusion plasmas, it is important to infer, especially in the plasma center, spatially resolved soft X-ray profiles from a limited number of noisy line-integral measurements. For this ill-posed inversion problem, Bayesian probability theory can provide a posterior probability distribution over all possible solutions under given model assumptions. Specifically, the use of a non-stationary GP to model the emission allows the model to adapt to the varying length scales of the underlying diffusion process. In contrast to other conventional methods, the prior regularization is realized in probabilistic form, which enhances the capability of uncertainty analysis; in consequence, scientists concerned with the reliability of their results will benefit from it. Under the assumption of normally distributed noise, the posterior distribution evaluated at a discrete number of points becomes a multivariate normal distribution whose mean and covariance are analytically available, making inversions and calculation of uncertainty fast. Additionally, the hyper-parameters embedded in the model assumption can be optimized through a Bayesian Occam's Razor formalism and thereby automatically adjust the model complexity. This method is shown to produce convincing reconstructions and good agreement with independently calculated results from the Maximum Entropy and Equilibrium-Based Iterative Tomography Algorithm methods.

  6. Risk of portfolio with simulated returns based on copula model

    NASA Astrophysics Data System (ADS)

    Razak, Ruzanna Ab; Ismail, Noriszura

    2015-02-01

    The commonly used tool for measuring the risk of a portfolio with equally weighted stocks is the variance-covariance method. Under extreme circumstances, this method leads to significant underestimation of the actual risk due to its assumption of multivariate normality for the joint distribution of stocks. The purpose of this research is to compare the actual risk of a portfolio with the simulated risk of a portfolio in which the joint distribution of two return series is predetermined. The data used are daily stock prices from the ASEAN market for the period January 2000 to December 2012. The copula approach is applied to capture the time-varying dependence among the return series. The results show that the chosen copula families are not suitable to represent the dependence structures of each bivariate return series, with the exception of the Philippines-Thailand pair, for which the t copula appears to be the appropriate choice to depict its dependence. Assuming that the t copula is the joint distribution of each paired series, simulated returns are generated and value-at-risk (VaR) is then applied to evaluate the risk of each portfolio consisting of two simulated return series. The VaR estimates were found to be symmetrical due to the simulation of returns via the elliptical copula-GARCH approach. By comparison, it is found that the actual risks are underestimated for all pairs of portfolios except Philippines-Thailand. This study shows that disregarding the non-normal dependence structure of two series will result in underestimation of the actual risk of the portfolio.
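
    Simulating two-asset portfolio returns from a t copula and reading off the VaR can be sketched as follows. The parameters and the Student-t margins are illustrative assumptions, not the ASEAN data analysis (which couples the copula with GARCH-filtered margins):

```python
import numpy as np
from scipy.stats import t as student_t

def t_copula_portfolio_var(n, nu, rho, alpha, df_margins, seed=0):
    """Simulate equally weighted two-asset portfolio returns whose
    dependence follows a bivariate t copula (correlation rho, nu degrees
    of freedom) with Student-t margins, and return the alpha-level
    value-at-risk as a positive loss quantile."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    z = rng.standard_normal((n, 2)) @ L.T          # correlated normals
    g = rng.chisquare(nu, size=(n, 1)) / nu        # shared chi-square mixer
    tv = z / np.sqrt(g)                            # bivariate t draws
    u = student_t.cdf(tv, df=nu)                   # t copula observations
    r = student_t.ppf(u, df=df_margins)            # map to Student-t margins
    port = r.mean(axis=1)                          # equally weighted portfolio
    return -np.quantile(port, alpha)
```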

  7. Comparative Robustness of Recent Methods for Analyzing Multivariate Repeated Measures Designs

    ERIC Educational Resources Information Center

    Seco, Guillermo Vallejo; Gras, Jaime Arnau; Garcia, Manuel Ato

    2007-01-01

    This study evaluated the robustness of two recent methods for analyzing multivariate repeated measures when the assumptions of covariance homogeneity and multivariate normality are violated. Specifically, the authors' work compares the performance of the modified Brown-Forsythe (MBF) procedure and the mixed-model procedure adjusted by the…

  8. Using Copula Distributions to Support More Accurate Imaging-Based Diagnostic Classifiers for Neuropsychiatric Disorders

    PubMed Central

    Bansal, Ravi; Hao, Xuejun; Liu, Jun; Peterson, Bradley S.

    2014-01-01

    Many investigators have tried to apply machine learning techniques to magnetic resonance images (MRIs) of the brain in order to diagnose neuropsychiatric disorders. Usually the number of brain imaging measures (such as measures of cortical thickness and measures of local surface morphology) derived from the MRIs (i.e., their dimensionality) has been large (e.g. >10) relative to the number of participants who provide the MRI data (<100). Sparse data in a high dimensional space increases the variability of the classification rules that machine learning algorithms generate, thereby limiting the validity, reproducibility, and generalizability of those classifiers. The accuracy and stability of the classifiers can improve significantly if the multivariate distributions of the imaging measures can be estimated accurately. To accurately estimate the multivariate distributions using sparse data, we propose to estimate first the univariate distributions of imaging data and then combine them using a Copula to generate more accurate estimates of their multivariate distributions. We then sample the estimated Copula distributions to generate dense sets of imaging measures and use those measures to train classifiers. We hypothesize that the dense sets of brain imaging measures will generate classifiers that are stable to variations in brain imaging measures, thereby improving the reproducibility, validity, and generalizability of diagnostic classification algorithms in imaging datasets from clinical populations. In our experiments, we used both computer-generated and real-world brain imaging datasets to assess the accuracy of multivariate Copula distributions in estimating the corresponding multivariate distributions of real-world imaging data. 
Our experiments showed that diagnostic classifiers generated using imaging measures sampled from the Copula were significantly more accurate and more reproducible than were the classifiers generated using either the real-world imaging measures or their multivariate Gaussian distributions. Thus, our findings demonstrate that estimated multivariate Copula distributions can generate dense sets of brain imaging measures that can in turn be used to train classifiers, and those classifiers are significantly more accurate and more reproducible than are those generated using real-world imaging measures alone. PMID:25093634
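The copula re-sampling scheme summarized in this record (estimate each univariate marginal, couple them with a Gaussian copula, then sample a dense synthetic set) can be sketched as follows; the synthetic data, dimensions, and seed are illustrative, not from the study:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Sparse "imaging measures": 40 subjects x 2 correlated, non-Gaussian features.
n, true_rho = 40, 0.7
z = rng.multivariate_normal([0, 0], [[1, true_rho], [true_rho, 1]], size=n)
data = np.column_stack([np.exp(z[:, 0]), stats.norm.cdf(z[:, 1])])

# Step 1: transform each margin to normal scores via its empirical CDF (ranks).
u = (stats.rankdata(data, axis=0) - 0.5) / n
normal_scores = stats.norm.ppf(u)

# Step 2: the Gaussian copula parameter is the correlation of the normal scores.
rho = np.corrcoef(normal_scores, rowvar=False)

# Step 3: sample a dense set from the fitted copula.
m = 5000
z_new = rng.multivariate_normal(np.zeros(2), rho, size=m)
u_new = stats.norm.cdf(z_new)

# Step 4: map back through the inverse empirical CDFs (observed-data quantiles),
# yielding a dense set of measures with the original margins and dependence.
dense = np.column_stack([np.quantile(data[:, j], u_new[:, j]) for j in range(2)])
```

The dense sample can then be fed to any classifier in place of the sparse original; the rank-based transform in step 1 is one common way to estimate the marginals nonparametrically.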

  9. Distribution and determinants of QRS rotation of black and white persons in the general population.

    PubMed

    Prineas, Ronald J; Zhang, Zhu-Ming; Stevens, Cladd E; Soliman, Elsayed Z

The prevalence and determinants of QRS transition zones are not well established. We examined the distributions of Normal, clockwise (CW) and counterclockwise (CCW) QRS transition zones and their relations to disease, body size and demographics in 4624 black and white men and women free of cardiovascular disease and major ECG abnormalities enrolled in the NHANES-III survey. CW transition zones were least observed (6.2%) and CCW were most prevalent (60.1%), with Normal in an intermediate position (33.7%). In multivariable logistic regression analysis, the adjusted, significant predictors for CCW compared to Normal were a greater proportion of blacks and women, fewer thin people (BMI<20), a greater ratio of chest depth to chest width, and an LV mass index <80g. By contrast, CW persons were older, had larger QRS/T angles, a smaller ratio of chest depth to chest width, a greater proportion of subjects with low-voltage QRS, more pulmonary disease, a greater proportion with high heart rates, shorter QRS duration, and were more obese (BMI≥30). Normal, rather than being the most prevalent transition zone, was intermediate in frequency between the most frequently encountered CCW and the least frequently encountered CW. Differences in the predictors of CW and CCW exist; further investigation is required to examine how far these differences explain the published prognostic differences between CW and CCW. Copyright © 2017 Elsevier Inc. All rights reserved.

  10. Enhancing Multimedia Imbalanced Concept Detection Using VIMP in Random Forests.

    PubMed

    Sadiq, Saad; Yan, Yilin; Shyu, Mei-Ling; Chen, Shu-Ching; Ishwaran, Hemant

    2016-07-01

Recent developments in social media and cloud storage have led to an exponential growth in the amount of multimedia data, which increases the complexity of managing, storing, indexing, and retrieving information from such big data. Many current content-based concept detection approaches fall short of successfully bridging the semantic gap. To solve this problem, a multi-stage random forest framework is proposed to generate predictor variables based on multivariate regressions using variable importance (VIMP). By fine-tuning the forests and significantly reducing the predictor variables, the concept detection scores are evaluated when the concept of interest is rare and imbalanced, i.e., having little collaboration with other high-level concepts. Using classical multivariate statistics, estimating the value of one coordinate using other coordinates standardizes the covariates, and the estimate depends upon the variance of the correlations instead of the mean. Thus, conditional dependence on the data being normally distributed is eliminated. Experimental results demonstrate that the proposed framework outperforms the comparison approaches in terms of Mean Average Precision (MAP) values.

  11. SPICE: exploration and analysis of post-cytometric complex multivariate datasets.

    PubMed

    Roederer, Mario; Nozzi, Joshua L; Nason, Martha C

    2011-02-01

    Polychromatic flow cytometry results in complex, multivariate datasets. To date, tools for the aggregate analysis of these datasets across multiple specimens grouped by different categorical variables, such as demographic information, have not been optimized. Often, the exploration of such datasets is accomplished by visualization of patterns with pie charts or bar charts, without easy access to statistical comparisons of measurements that comprise multiple components. Here we report on algorithms and a graphical interface we developed for these purposes. In particular, we discuss thresholding necessary for accurate representation of data in pie charts, the implications for display and comparison of normalized versus unnormalized data, and the effects of averaging when samples with significant background noise are present. Finally, we define a statistic for the nonparametric comparison of complex distributions to test for difference between groups of samples based on multi-component measurements. While originally developed to support the analysis of T cell functional profiles, these techniques are amenable to a broad range of datatypes. Published 2011 Wiley-Liss, Inc.

  12. Multivariate Normal Tissue Complication Probability Modeling of Heart Valve Dysfunction in Hodgkin Lymphoma Survivors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cella, Laura, E-mail: laura.cella@cnr.it; Department of Advanced Biomedical Sciences, Federico II University School of Medicine, Naples; Liuzzi, Raffaele

Purpose: To establish a multivariate normal tissue complication probability (NTCP) model for radiation-induced asymptomatic heart valvular defects (RVD). Methods and Materials: Fifty-six patients treated with sequential chemoradiation therapy for Hodgkin lymphoma (HL) were retrospectively reviewed for RVD events. Clinical information along with whole heart, cardiac chambers, and lung dose distribution parameters was collected, and the correlations to RVD were analyzed by means of Spearman's rank correlation coefficient (Rs). For the selection of the model order and parameters for NTCP modeling, a multivariate logistic regression method using resampling techniques (bootstrapping) was applied. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC). Results: When we analyzed the whole heart, a 3-variable NTCP model including the maximum dose, whole heart volume, and lung volume was shown to be the optimal predictive model for RVD (Rs = 0.573, P<.001, AUC = 0.83). When we analyzed the cardiac chambers individually, for the left atrium and for the left ventricle, an NTCP model based on 3 variables including the percentage volume exceeding 30 Gy (V30), cardiac chamber volume, and lung volume was selected as the most predictive model (Rs = 0.539, P<.001, AUC = 0.83; and Rs = 0.557, P<.001, AUC = 0.82, respectively). The NTCP values increase as heart maximum dose or cardiac chambers V30 increase. They also increase with larger volumes of the heart or cardiac chambers and decrease when lung volume is larger. Conclusions: We propose logistic NTCP models for RVD considering not only heart irradiation dose but also the combined effects of lung and heart volumes. Our study establishes the statistical evidence of the indirect effect of lung size on radio-induced heart toxicity.
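The record's 3-variable logistic NTCP model can be sketched with hypothetical coefficients (the fitted values are not reported in the abstract; only the reported signs are reproduced: positive for heart maximum dose and heart volume, negative for lung volume):

```python
import math

def ntcp_logistic(dose_max_gy, heart_vol_cc, lung_vol_cc, coeffs):
    """Logistic NTCP: complication risk rises with heart max dose and heart
    volume and falls with lung volume, per the abstract's qualitative findings.
    `coeffs` = (b0, b_dose, b_heart, b_lung) are hypothetical, not the fitted values."""
    b0, b_dose, b_heart, b_lung = coeffs
    s = b0 + b_dose * dose_max_gy + b_heart * heart_vol_cc + b_lung * lung_vol_cc
    return 1.0 / (1.0 + math.exp(-s))

# Illustrative coefficients with the signs reported in the abstract.
coeffs = (-6.0, 0.08, 0.004, -0.0005)

# Lower dose / smaller heart / larger lung vs. higher dose / larger heart / smaller lung.
low = ntcp_logistic(25, 600, 4000, coeffs)
high = ntcp_logistic(40, 800, 3000, coeffs)
```

With these assumed coefficients the higher-exposure patient gets a markedly larger NTCP, illustrating the monotonic behavior the abstract describes.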

  13. Evaluation of the environmental contamination at an abandoned mining site using multivariate statistical techniques--the Rodalquilar (Southern Spain) mining district.

    PubMed

    Bagur, M G; Morales, S; López-Chicano, M

    2009-11-15

Unsupervised and supervised pattern recognition techniques such as hierarchical cluster analysis, principal component analysis, factor analysis and linear discriminant analysis have been applied to water samples collected in the Rodalquilar mining district (Southern Spain) in order to identify different sources of environmental pollution caused by the abandoned mining industry. The effect of the mining activity on waters was monitored by determining the concentration of eleven elements (Mn, Ba, Co, Cu, Zn, As, Cd, Sb, Hg, Au and Pb) by inductively coupled plasma mass spectrometry (ICP-MS). The Box-Cox transformation was used to bring the data set into normal form, minimizing the effect of the non-normal distribution of the geochemical data. The environmental impact is driven mainly by the mining activity developed in the zone, by acid drainage, and finally by the chemical treatment used for the beneficiation of gold.
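The Box-Cox step described above, normalizing skewed concentrations before multivariate analysis, can be illustrated with synthetic data (the log-normal sample stands in for trace-metal concentrations; it is not the study's data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Skewed, strictly positive "concentrations" (log-normal, like trace metals).
conc = rng.lognormal(mean=1.0, sigma=0.8, size=500)

# Box-Cox requires positive data; the exponent lambda is chosen by maximum
# likelihood so the transformed values are as close to normal as possible.
transformed, lam = stats.boxcox(conc)

skew_before = stats.skew(conc)   # strongly right-skewed
skew_after = stats.skew(transformed)  # near zero after the transform
```

Methods such as PCA and discriminant analysis behave better on the transformed values because their covariance estimates are no longer dominated by a long right tail.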

  14. Diffuse reflection from a stochastically bounded, semi-infinite medium

    NASA Technical Reports Server (NTRS)

    Lumme, K.; Peltoniemi, J. I.; Irvine, W. M.

    1990-01-01

    In order to determine the diffuse reflection from a medium bounded by a rough surface, the problem of radiative transfer in a boundary layer characterized by a statistical distribution of heights is considered. For the case that the surface is defined by a multivariate normal probability density, the propagation probability for rays traversing the boundary layer is derived and, from that probability, a corresponding radiative transfer equation. A solution of the Eddington (two stream) type is found explicitly, and examples are given. The results should be applicable to reflection from the regoliths of solar system bodies, as well as from a rough ocean surface.
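The statistical surface model used in this record, heights drawn from a multivariate normal density, can be sketched by sampling correlated heights along a transect; the exponential correlation function and its parameters are assumptions for illustration, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# Transect of surface points; heights are jointly Gaussian with an assumed
# exponential correlation rho(d) = exp(-d / ell) (correlation model is hypothetical).
x = np.linspace(0.0, 10.0, 50)
ell, sigma = 2.0, 0.5  # correlation length and height standard deviation
cov = sigma**2 * np.exp(-np.abs(x[:, None] - x[None, :]) / ell)

# One realization of the stochastic boundary surface.
heights = rng.multivariate_normal(np.zeros_like(x), cov)

# Nearby points are much more correlated than distant ones.
near = cov[0, 1] / sigma**2
far = cov[0, -1] / sigma**2
</```

Ray-propagation probabilities through such a boundary layer follow from this joint density; the sketch only shows how the multivariate normal height field itself is generated.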

  15. Coping with matrix effects in headspace solid phase microextraction gas chromatography using multivariate calibration strategies.

    PubMed

    Ferreira, Vicente; Herrero, Paula; Zapata, Julián; Escudero, Ana

    2015-08-14

SPME is extremely sensitive to experimental parameters affecting liquid-gas and gas-solid distribution coefficients. Our aims were to measure the weights of these factors and to design a multivariate strategy, based on the addition of a pool of internal standards, to minimize matrix effects. Synthetic but real-like wines containing selected analytes and variable amounts of ethanol, non-volatile constituents and major volatile compounds were prepared following a factorial design. The ANOVA study revealed that, even using a strong matrix dilution, matrix effects are important and additive with non-significant interaction effects, and that the presence of major volatile constituents is the most dominant factor. A single internal standard provided a robust calibration for 15 out of 47 analytes. Then, two different multivariate calibration strategies based on Partial Least Squares Regression were run in order to build calibration functions based on 13 different internal standards able to cope with matrix effects. The first one is based on the calculation of Multivariate Internal Standards (MIS), linear combinations of the normalized signals of the 13 internal standards, which provide the expected area of a given unit of analyte present in each sample. The second strategy is a direct calibration relating concentration to the 13 relative areas measured in each sample for each analyte. Overall, 47 different compounds can be reliably quantified in a single fully automated method with overall uncertainties better than 15%. Copyright © 2015 Elsevier B.V. All rights reserved.
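The Multivariate Internal Standard (MIS) idea, a linear combination of normalized internal-standard signals that predicts the area expected from one unit of analyte in each sample, can be sketched with ordinary least squares on synthetic data (the abstract uses Partial Least Squares Regression; plain least squares and every simulated quantity here are stand-ins):

```python
import numpy as np

rng = np.random.default_rng(8)

# Calibration set: 40 wine-like samples, 13 internal standards whose signals
# all vary with the (unobserved) matrix effect of each sample.
n_samples, n_is = 40, 13
matrix_effect = rng.uniform(0.5, 1.5, n_samples)      # per-sample recovery
is_sensitivity = rng.uniform(0.8, 1.2, n_is)          # per-standard response
is_signals = matrix_effect[:, None] * is_sensitivity[None, :]
is_signals += rng.normal(0, 0.02, is_signals.shape)   # measurement noise

# Target: the area produced by one unit of a given analyte in each sample,
# which also scales with the matrix effect.
unit_area = 2.0 * matrix_effect + rng.normal(0, 0.02, n_samples)

# MIS: a linear combination of the normalized IS signals fitted by least squares.
norm_signals = is_signals / is_signals.mean(axis=0)
w, *_ = np.linalg.lstsq(norm_signals, unit_area, rcond=None)
predicted = norm_signals @ w

rel_err = np.abs(predicted - unit_area) / unit_area
```

Because all 13 standards track the same matrix effect, the fitted combination averages out their individual noise and tracks the per-sample response closely.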

  16. Generating Virtual Patients by Multivariate and Discrete Re-Sampling Techniques.

    PubMed

    Teutonico, D; Musuamba, F; Maas, H J; Facius, A; Yang, S; Danhof, M; Della Pasqua, O

    2015-10-01

Clinical Trial Simulations (CTS) are a valuable tool for decision-making during drug development. However, to obtain realistic simulation scenarios, the patients included in the CTS must be representative of the target population. This is particularly important when covariate effects exist that may affect the outcome of a trial. The objective of our investigation was to evaluate and compare CTS results using re-sampling from a population pool and multivariate distributions to simulate patient covariates. COPD was selected as the paradigm disease for the purposes of our analysis, FEV1 was used as the response measure, and the effects of a hypothetical intervention were evaluated in different populations in order to assess the predictive performance of the two methods. Our results show that the multivariate distribution method produces realistic covariate correlations, comparable to the real population. Moreover, it allows simulation of patient characteristics beyond the limits of inclusion and exclusion criteria in historical protocols. Both methods, discrete resampling and multivariate distributions, generate realistic pools of virtual patients. However, the use of a multivariate distribution enables more flexible simulation scenarios, since it is not necessarily bound to the existing covariate combinations in the available clinical data sets.
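The two virtual-patient generation methods compared in this record can be sketched side by side; the covariates (age and baseline FEV1), their correlation, and all numbers are illustrative, not taken from the study:

```python
import numpy as np

rng = np.random.default_rng(3)

# Observed covariates for 200 "real" patients: age (years) and baseline FEV1 (L),
# negatively correlated, as older patients tend to have lower lung function.
mean_true = np.array([65.0, 1.4])
cov_true = np.array([[64.0, -1.6], [-1.6, 0.09]])
observed = rng.multivariate_normal(mean_true, cov_true, size=200)

# Method 1: discrete re-sampling: draw whole covariate rows with replacement,
# so only covariate combinations already in the pool can appear.
idx = rng.integers(0, len(observed), size=1000)
resampled = observed[idx]

# Method 2: multivariate normal fitted to the observed covariates; this can
# also generate plausible combinations not present in the original data set.
mu = observed.mean(axis=0)
cov = np.cov(observed, rowvar=False)
simulated = rng.multivariate_normal(mu, cov, size=1000)

corr_obs = np.corrcoef(observed, rowvar=False)[0, 1]
corr_sim = np.corrcoef(simulated, rowvar=False)[0, 1]
```

Both virtual pools preserve the negative age-FEV1 correlation, but only the multivariate-distribution pool extends smoothly beyond the observed covariate combinations.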

  17. Bayesian Nonparametric Ordination for the Analysis of Microbial Communities.

    PubMed

    Ren, Boyu; Bacallado, Sergio; Favaro, Stefano; Holmes, Susan; Trippa, Lorenzo

    2017-01-01

    Human microbiome studies use sequencing technologies to measure the abundance of bacterial species or Operational Taxonomic Units (OTUs) in samples of biological material. Typically the data are organized in contingency tables with OTU counts across heterogeneous biological samples. In the microbial ecology community, ordination methods are frequently used to investigate latent factors or clusters that capture and describe variations of OTU counts across biological samples. It remains important to evaluate how uncertainty in estimates of each biological sample's microbial distribution propagates to ordination analyses, including visualization of clusters and projections of biological samples on low dimensional spaces. We propose a Bayesian analysis for dependent distributions to endow frequently used ordinations with estimates of uncertainty. A Bayesian nonparametric prior for dependent normalized random measures is constructed, which is marginally equivalent to the normalized generalized Gamma process, a well-known prior for nonparametric analyses. In our prior, the dependence and similarity between microbial distributions is represented by latent factors that concentrate in a low dimensional space. We use a shrinkage prior to tune the dimensionality of the latent factors. The resulting posterior samples of model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies. Specifically, by combining them with multivariate data analysis techniques we can visualize credible regions in ecological ordination plots. The characteristics of the proposed model are illustrated through a simulation study and applications in two microbiome datasets.

  18. Use of multivariate measures of disability in health surveys.

    PubMed Central

    Charlton, J R; Patrick, D L; Peach, H

    1983-01-01

    It has been claimed that the aggregation of information from several areas of life into a small set of global measures has certain advantages for describing disability. Global measures of disability were constructed from a modified version of an existing health survey instrument and the sickness impact profile (SIP) and their properties were tested. The disability items grouped satisfactorily into five global measures (physical, psychosocial, eating, communication, and work). All disability measures (global and original category scores) were poor predictors of service use by individuals but were related as expected to age and number of medical conditions. The global measures generally had lower standard errors and better repeatability. All scores exhibit J-shaped distributions for cross sectional data but the change in global measures over time was consistent with the normal distribution. Preferably, both global and category measures should be used for comparing changes over time between groups of individuals. PMID:6655420

  19. A framework for multivariate data-based at-site flood frequency analysis: Essentiality of the conjugal application of parametric and nonparametric approaches

    NASA Astrophysics Data System (ADS)

    Vittal, H.; Singh, Jitendra; Kumar, Pankaj; Karmakar, Subhankar

    2015-06-01

In watershed management, flood frequency analysis (FFA) is performed to quantify the risk of flooding at different spatial locations and also to provide guidelines for determining the design periods of flood control structures. Traditional FFA was extensively performed by considering a univariate scenario for both at-site and regional estimation of return periods. However, due to the inherent mutual dependence of the flood variables or characteristics [i.e., peak flow (P), flood volume (V) and flood duration (D), which are random in nature], analysis has been further extended to a multivariate scenario, with some restrictive assumptions. To overcome the assumption of the same family of marginal density function for all flood variables, the concept of the copula has been introduced. Although the advancement from univariate to multivariate analyses drew formidable attention from the FFA research community, the basic limitation was that the analyses were performed with the implementation of only parametric families of distributions. The aim of the current study is to emphasize the importance of nonparametric approaches in the field of multivariate FFA; however, the nonparametric distribution may not always be a good fit, capable of replacing well-implemented multivariate parametric and multivariate copula-based applications. Nevertheless, the potential of obtaining the best fit using nonparametric distributions might be improved because such distributions reproduce the sample's characteristics, resulting in more accurate estimations of the multivariate return period. Hence, the current study shows the importance of conjugating the multivariate nonparametric approach with multivariate parametric and copula-based approaches, thereby resulting in a comprehensive framework for complete at-site FFA. Although the proposed framework is designed for at-site FFA, this approach can also be applied to regional FFA because regional estimations ideally include at-site estimations.
The framework is based on the following steps: (i) comprehensive trend analysis to assess nonstationarity in the observed data; (ii) selection of the best-fit univariate marginal distribution with a comprehensive set of parametric and nonparametric distributions for the flood variables; (iii) multivariate frequency analyses with parametric, copula-based and nonparametric approaches; and (iv) estimation of joint and various conditional return periods. The proposed framework for frequency analysis is demonstrated using 110 years of observed data from the Allegheny River at Salamanca, New York, USA. The results show that for both univariate and multivariate cases, the nonparametric Gaussian kernel provides the best estimate. Further, we perform FFA for twenty major rivers over the continental USA, which shows that for seven rivers all the flood variables follow the nonparametric Gaussian kernel, whereas for the other rivers parametric distributions provide the best fit for one or two flood variables. Thus the summary of results shows that the nonparametric method cannot substitute for the parametric and copula-based approaches, but should be considered during any at-site FFA to provide the broadest choice for best estimation of the flood return periods.
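The nonparametric step (iii), fitting a Gaussian kernel density to the joint flood variables and deriving a joint return period, can be sketched as follows; the synthetic (P, V, D) data, correlation matrix, and thresholds are illustrative, not the Allegheny record:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Synthetic annual flood variables: peak flow P, volume V, duration D (correlated),
# mirroring a 110-year record of positively skewed, mutually dependent variables.
n = 110
z = rng.multivariate_normal([0, 0, 0],
                            [[1.0, 0.8, 0.5],
                             [0.8, 1.0, 0.6],
                             [0.5, 0.6, 1.0]], size=n)
flood = np.exp(z)  # positive and right-skewed, like P, V, D

# Nonparametric joint density via a multivariate Gaussian kernel.
kde = stats.gaussian_kde(flood.T)

# Joint exceedance probability P(P > p0, V > v0, D > d0) by Monte Carlo from the
# fitted KDE; for annual maxima, the joint return period is 1 / exceedance.
samples = kde.resample(20000, seed=5)
thresholds = np.median(flood, axis=0)
exceed = np.mean(np.all(samples > thresholds[:, None], axis=0))
return_period = 1.0 / exceed
```

Because the kernel estimate reproduces the sample's own shape, no marginal family or copula has to be chosen, which is exactly the flexibility the record argues for combining with parametric fits.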

  20. Multivariate η-μ fading distribution with arbitrary correlation model

    NASA Astrophysics Data System (ADS)

    Ghareeb, Ibrahim; Atiani, Amani

    2018-03-01

An extensive analysis of the multivariate η-μ distribution with arbitrary correlation is presented, where novel analytical expressions for the multivariate probability density function, cumulative distribution function and moment generating function (MGF) of arbitrarily correlated and not necessarily identically distributed η-μ power random variables are derived. Also, this paper provides an exact-form expression for the MGF of the instantaneous signal-to-noise ratio at the combiner output in a diversity reception system with maximal-ratio combining and post-detection equal-gain combining operating in slow, frequency-nonselective, arbitrarily correlated, not necessarily identically distributed η-μ fading channels. The average bit error probability of differentially detected quadrature phase shift keying signals with a post-detection diversity reception system over η-μ fading channels with arbitrarily correlated and not necessarily identical fading parameters is determined using the MGF-based approach. The effects of fading correlation between diversity branches, fading severity parameters and diversity level are studied.

  1. Head and facial anthropometry of mixed-race US Army male soldiers for military design and sizing: a pilot study.

    PubMed

    Yokota, Miyo

    2005-05-01

    In the United States, the biologically admixed population is increasing. Such demographic changes may affect the distribution of anthropometric characteristics, which are incorporated into the design of equipment and clothing for the US Army and other large organizations. The purpose of this study was to examine multivariate craniofacial anthropometric distributions between biologically admixed male populations and single racial groups of Black and White males. Multivariate statistical results suggested that nose breadth and lip length were different between Blacks and Whites. Such differences may be considered for adjustments to respirators and chemical-biological protective masks. However, based on this pilot study, multivariate anthropometric distributions of admixed individuals were within the distributions of single racial groups. Based on the sample reported, sizing and designing for the admixed groups are not necessary if anthropometric distributions of single racial groups comprising admixed groups are known.

  2. Use of the Wii Gaming System for Balance Rehabilitation: Establishing Parameters for Healthy Individuals.

    PubMed

    Burns, Melissa K; Andeway, Kathleen; Eppenstein, Paula; Ruroede, Kathleen

    2014-06-01

This study was designed to establish balance parameters for the Nintendo® (Redmond, WA) "Wii Fit™" Balance Board system with three common games, in a sample of healthy adults, and to evaluate the balance measurement reproducibility with separation by age. This was a prospective, multivariate analysis of variance, cohort study design. Seventy-five participants who satisfied all inclusion criteria and completed an informed consent were enrolled. Participants were grouped into age ranges: 21-35 years (n=24), 36-50 years (n=24), and 51-65 years (n=27). Each participant completed the following games three consecutive times, in a randomized order, during one session: "Balance Bubble" (BB) for distance and duration, "Tight Rope" (TR) for distance and duration, and "Center of Balance" (COB) on the left and right sides. COB distributed weight was fairly symmetrical across all subjects and trials; therefore, no influence on or interaction with other "Wii Fit" measurements was assumed. Homogeneity of variance statistics indicated that the assumption of distributional normality of the dependent variables (rates) was tenable. The multivariate analysis of variance included the dependent variables BB and TR rates (distance divided by duration to complete) with age group and trials as the independent variables. The BB rate was statistically significant (F=4.725, P<0.005), but not the TR rate. The youngest group's BB rate was significantly larger than those of the other two groups. "Wii Fit" can discriminate among age groups across trials. The results show promise as a viable tool to measure balance and distance across time (speed) and center of balance distribution.

  3. Informative markers identification and multivariate analysis of selected DxP for the purpose of QTL mapping

    NASA Astrophysics Data System (ADS)

    Hazirah S., Z.; Maizura, I.; Rajinder, S.; Mohd Isa Z., A.; Ismanizan, I.

    2014-09-01

A study was carried out to generate a linkage map of an oil palm dura x pisifera (DxP) population. A subset of samples from a DxP mapping family was screened using 325 SSR primers, of which 221 were informative. To date, 150 SSRs have been genotyped across the entire DxP population via capillary sequencer, where 73 SSRs had a 1:1 segregation ratio, 64 had 1:1:1:1, 3 had 3:1 and ten had 1:2:1 segregation ratios. Kolmogorov-Smirnov tests in SPSS revealed that most of the bunch quality components had a normal distribution, which fulfilled one of the pre-requisites for carrying out phenotype-genotype correlation association.

  4. Infilling and quality checking of discharge, precipitation and temperature data using a copula based approach

    NASA Astrophysics Data System (ADS)

    Anwar, Faizan; Bárdossy, András; Seidel, Jochen

    2017-04-01

    Estimating missing values in a time series of a hydrological variable is an everyday task for a hydrologist. Existing methods such as inverse distance weighting, multivariate regression, and kriging, though simple to apply, provide no indication of the quality of the estimated value and depend mainly on the values of neighboring stations at a given step in the time series. Copulas have the advantage of representing the pure dependence structure between two or more variables (given the relationship between them is monotonic). They rid us of questions such as transforming the data before use or calculating functions that model the relationship between the considered variables. A copula-based approach is suggested to infill discharge, precipitation, and temperature data. As a first step the normal copula is used, subsequently, the necessity to use non-normal / non-symmetrical dependence is investigated. Discharge and temperature are treated as regular continuous variables and can be used without processing for infilling and quality checking. Due to the mixed distribution of precipitation values, it has to be treated differently. This is done by assigning a discrete probability to the zeros and treating the rest as a continuous distribution. Building on the work of others, along with infilling, the normal copula is also utilized to identify values in a time series that might be erroneous. This is done by treating the available value as missing, infilling it using the normal copula and checking if it lies within a confidence band (5 to 95% in our case) of the obtained conditional distribution. Hydrological data from two catchments Upper Neckar River (Germany) and Santa River (Peru) are used to demonstrate the application for datasets with different data quality. The Python code used here is also made available on GitHub. The required input is the time series of a given variable at different stations.
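The normal-copula infilling and quality-checking idea described above can be sketched for two stations; the rank-based normal-score transform, the conditional-normal formula, and the 5-95% band follow the record, while the synthetic discharge series and all parameters are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Two neighboring discharge stations with a monotone relationship between them.
n, rho_true = 1000, 0.85
z = rng.multivariate_normal([0, 0], [[1, rho_true], [rho_true, 1]], size=n)
q_a, q_b = np.exp(z[:, 0]), np.exp(0.8 * z[:, 1] + 0.2)

# Normal-score transform: the normal copula works on rank-based normal scores,
# so no assumption about the marginal distributions is needed.
def to_scores(x):
    return stats.norm.ppf((stats.rankdata(x) - 0.5) / len(x))

za, zb = to_scores(q_a), to_scores(q_b)
rho = np.corrcoef(za, zb)[0, 1]  # the normal-copula parameter

# Conditional distribution of station A's score given station B's score:
# Z_a | Z_b = z  ~  N(rho * z, 1 - rho^2). Values outside the 5-95% band of
# this conditional distribution are flagged as suspect; its median infills gaps.
def conditional_band(z_b, lo=0.05, hi=0.95):
    mu, sd = rho * z_b, np.sqrt(1 - rho**2)
    return stats.norm.ppf(lo, mu, sd), stats.norm.ppf(hi, mu, sd)

lo_band, hi_band = conditional_band(zb)
inside = np.mean((za > lo_band) & (za < hi_band))  # should be close to 0.90
```

A flagged or missing value can then be mapped from the conditional median back to discharge units through the station's empirical quantile function, mirroring the infilling step in the record.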

  5. Comparative multivariate analyses of transient otoacoustic emissions and distorsion products in normal and impaired hearing.

    PubMed

    Stamate, Mirela Cristina; Todor, Nicolae; Cosgarea, Marcel

    2015-01-01

The clinical utility of otoacoustic emissions as a noninvasive objective test of cochlear function has long been studied. Both transient otoacoustic emissions and distorsion products can be used to identify hearing loss, but to what extent they can be used as predictors for hearing loss is still debated. Most studies agree that multivariate analyses have better test performances than univariate analyses. The aim of the study was to determine transient otoacoustic emission and distorsion product performance in identifying normal and impaired hearing, using the pure tone audiogram as a gold standard procedure and different multivariate statistical approaches. The study included 105 adult subjects with normal hearing and hearing loss who underwent the same test battery: pure-tone audiometry, tympanometry, otoacoustic emission tests. We chose logistic regression as the multivariate statistical technique. Three logistic regression models were developed to characterize the relations between different risk factors (age, sex, tinnitus, demographic features, cochlear status defined by otoacoustic emissions) and hearing status defined by pure-tone audiometry. The multivariate analyses allow the calculation of the logistic score, which is a combination of the inputs, weighted by coefficients calculated within the analyses. The accuracy of each model was assessed using receiver operating characteristic curve analysis. We used the logistic score to generate receiver operating curves and to estimate the areas under the curves in order to compare different multivariate analyses. We compared the performance of each otoacoustic emission (transient, distorsion product) using three different multivariate analyses for each ear, when multi-frequency gold standards were used. We demonstrated that all multivariate analyses provided high values of the area under the curve, proving the performance of the otoacoustic emissions.
Each otoacoustic emission test presented high values of area under the curve, suggesting that implementing a multivariate approach to evaluate the performances of each otoacoustic emission test would serve to increase the accuracy in identifying the normal and impaired ears. We encountered the highest area under the curve value for the combined multivariate analysis suggesting that both otoacoustic emission tests should be used in assessing hearing status. Our multivariate analyses revealed that age is a constant predictor factor of the auditory status for both ears, but the presence of tinnitus was the most important predictor for the hearing level, only for the left ear. Age presented similar coefficients, but tinnitus coefficients, by their high value, produced the highest variations of the logistic scores, only for the left ear group, thus increasing the risk of hearing loss. We did not find gender differences between ears for any otoacoustic emission tests, but studies still debate this question as the results are contradictory. Neither gender, nor environment origin had any predictive value for the hearing status, according to the results of our study. Like any other audiological test, using otoacoustic emissions to identify hearing loss is not without error. Even when applying multivariate analysis, perfect test performance is never achieved. Although most studies demonstrated the benefit of using the multivariate analysis, it has not been incorporated into clinical decisions maybe because of the idiosyncratic nature of multivariate solutions or because of the lack of the validation studies.
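The logistic-score and AUC machinery used in this record can be sketched with simulated subjects; the predictors mirror the abstract's risk factors (age, tinnitus, an emission level), but every coefficient and the data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated subjects: age, tinnitus (0/1), and an otoacoustic-emission level.
n = 400
age = rng.normal(50, 12, n)
tinnitus = rng.integers(0, 2, n)
oae = rng.normal(0, 1, n)

# Hearing-loss labels generated from a hypothetical logistic model.
true_logit = -4.0 + 0.06 * age + 1.2 * tinnitus - 0.8 * oae
y = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(int)

# "Logistic score": a linear combination of the inputs weighted by coefficients.
score = 0.06 * age + 1.2 * tinnitus - 0.8 * oae

# AUC equals the Mann-Whitney probability that a randomly chosen positive case
# scores higher than a randomly chosen negative case.
pos, neg = score[y == 1], score[y == 0]
auc = (np.mean(pos[:, None] > neg[None, :])
       + 0.5 * np.mean(pos[:, None] == neg[None, :]))
```

Comparing such AUC values across models is exactly how the record ranks its three multivariate analyses per ear.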

  6. Comparative multivariate analyses of transient otoacoustic emissions and distorsion products in normal and impaired hearing

    PubMed Central

    STAMATE, MIRELA CRISTINA; TODOR, NICOLAE; COSGAREA, MARCEL

    2015-01-01

Background and aim The clinical utility of otoacoustic emissions as a noninvasive objective test of cochlear function has long been studied. Both transient otoacoustic emissions and distorsion products can be used to identify hearing loss, but to what extent they can be used as predictors for hearing loss is still debated. Most studies agree that multivariate analyses have better test performances than univariate analyses. The aim of the study was to determine transient otoacoustic emission and distorsion product performance in identifying normal and impaired hearing, using the pure tone audiogram as a gold standard procedure and different multivariate statistical approaches. Methods The study included 105 adult subjects with normal hearing and hearing loss who underwent the same test battery: pure-tone audiometry, tympanometry, otoacoustic emission tests. We chose logistic regression as the multivariate statistical technique. Three logistic regression models were developed to characterize the relations between different risk factors (age, sex, tinnitus, demographic features, cochlear status defined by otoacoustic emissions) and hearing status defined by pure-tone audiometry. The multivariate analyses allow the calculation of the logistic score, which is a combination of the inputs, weighted by coefficients calculated within the analyses. The accuracy of each model was assessed using receiver operating characteristic curve analysis. We used the logistic score to generate receiver operating curves and to estimate the areas under the curves in order to compare different multivariate analyses. Results We compared the performance of each otoacoustic emission (transient, distorsion product) using three different multivariate analyses for each ear, when multi-frequency gold standards were used. We demonstrated that all multivariate analyses provided high values of the area under the curve, proving the performance of the otoacoustic emissions.
Each otoacoustic emission test presented high values of the area under the curve, suggesting that a multivariate approach to evaluating the performance of each otoacoustic emission test would increase the accuracy of identifying normal and impaired ears. The highest area under the curve was found for the combined multivariate analysis, suggesting that both otoacoustic emission tests should be used in assessing hearing status. Our multivariate analyses revealed that age is a consistent predictor of auditory status for both ears, whereas the presence of tinnitus was the most important predictor of hearing level for the left ear only. Age presented similar coefficients across models, but the tinnitus coefficients, by their high value, produced the largest variations of the logistic scores in the left-ear group, thus increasing the risk of hearing loss. We did not find gender differences between ears for any otoacoustic emission test, though studies still debate this question as the results are contradictory. Neither gender nor environmental origin had any predictive value for hearing status, according to the results of our study. Conclusion Like any other audiological test, using otoacoustic emissions to identify hearing loss is not without error. Even when applying multivariate analysis, perfect test performance is never achieved. Although most studies have demonstrated the benefit of multivariate analysis, it has not been incorporated into clinical decisions, perhaps because of the idiosyncratic nature of multivariate solutions or because of the lack of validation studies. PMID:26733749
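The logistic-score-to-ROC pipeline described in this abstract can be sketched in a few lines. The following is a minimal illustration with hypothetical coefficients and scores (function names and numbers are not taken from the study); the AUC is computed via its Mann-Whitney interpretation.

```python
import math

def logistic_score(x, coef, intercept):
    """Logistic score: inputs combined linearly with fitted weights,
    mapped through the logistic function."""
    z = intercept + sum(c * xi for c, xi in zip(coef, x))
    return 1.0 / (1.0 + math.exp(-z))

def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney statistic: the
    probability that a randomly chosen impaired ear scores higher
    than a randomly chosen normal ear (ties count one half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical example: perfect separation gives AUC = 1.0.
impaired = [logistic_score([60, 1], [0.05, 1.2], -3.0)]  # age, tinnitus
normal = [logistic_score([25, 0], [0.05, 1.2], -3.0)]
```

A higher AUC for the combined model, as the study reports, would correspond to scores from the combined inputs separating the two groups more cleanly.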

  7. Species distribution modelling for plant communities: Stacked single species or multivariate modelling approaches?

    Treesearch

    Emilie B. Henderson; Janet L. Ohmann; Matthew J. Gregory; Heather M. Roberts; Harold S.J. Zald

    2014-01-01

    Landscape management and conservation planning require maps of vegetation composition and structure over large regions. Species distribution models (SDMs) are often used for individual species, but projects mapping multiple species are rarer. We compare maps of plant community composition assembled by stacking results from many SDMs with multivariate maps constructed...

  8. Distributions of Characteristic Roots in Multivariate Analysis

    DTIC Science & Technology

    1976-07-01

    studied by various authors, have been briefly discussed. Such distributional ... of four test criteria and a few less important ones which are ... functions. Ch. roots have further been discussed in view of the power comparisons made in connection with tests of three multivariate hypotheses. In addition ... the one-sample case has also been considered in terms of distributional aspects of the ch. roots and criteria for tests of two hypotheses on the

  9. Simulating Multivariate Nonnormal Data Using an Iterative Algorithm

    ERIC Educational Resources Information Center

    Ruscio, John; Kaczetow, Walter

    2008-01-01

    Simulating multivariate nonnormal data with specified correlation matrices is difficult. One especially popular method is Vale and Maurelli's (1983) extension of Fleishman's (1978) polynomial transformation technique to multivariate applications. This requires the specification of distributional moments and the calculation of an intermediate…
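The Vale-Maurelli idea sketched in this abstract — draw correlated standard normals, then push each margin through Fleishman's cubic — can be illustrated as follows. The Fleishman coefficients and the intermediate correlation are placeholders here; solving the moment equations for them is the hard part the abstract alludes to.

```python
import math
import random

def fleishman(z, b, c, d):
    """Fleishman (1978) cubic transform of a standard normal deviate.
    Coefficients (b, c, d) are solved elsewhere from the target
    skewness/kurtosis; a = -c keeps the mean at zero."""
    return -c + b * z + c * z * z + d * z ** 3

def bivariate_nonnormal(n, rho, coeffs, seed=0):
    """Vale-Maurelli-style sketch: draw correlated standard normals
    via a 2x2 Cholesky factor, then transform each margin with the
    same Fleishman coefficients. In the full method, rho is the
    *intermediate* normal correlation, adjusted so the transformed
    data hit the target correlation."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho * rho) * rng.gauss(0, 1)
        out.append((fleishman(z1, *coeffs), fleishman(z2, *coeffs)))
    return out
```

With coefficients (1, 0, 0) the transform is the identity and the output is simply bivariate normal, which makes the role of the cubic terms easy to see.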

  10. Characteristics of Mild Cognitive Impairment Using the Thai Version of the Consortium to Establish a Registry for Alzheimer's Disease Tests: A Multivariate and Machine Learning Study.

    PubMed

    Tunvirachaisakul, Chavit; Supasitthumrong, Thitiporn; Tangwongchai, Sookjareon; Hemrunroj, Solaphat; Chuchuen, Phenphichcha; Tawankanjanachot, Itthipol; Likitchareon, Yuthachai; Phanthumchinda, Kamman; Sriswasdi, Sira; Maes, Michael

    2018-04-04

    The Consortium to Establish a Registry for Alzheimer's Disease (CERAD) developed a neuropsychological battery (CERAD-NP) to screen patients with Alzheimer's dementia. Mild cognitive impairment (MCI) has received attention as a pre-dementia stage. We aimed to delineate the CERAD-NP features of MCI and their clinical utility for externally validating MCI diagnosis. The study included 60 patients with MCI, diagnosed using the Clinical Dementia Rating, and 63 normal controls. Data were analysed employing receiver operating characteristic analysis, Linear Support Vector Machine, Random Forest, Adaptive Boosting, Neural Network models, and t-distributed stochastic neighbour embedding (t-SNE). MCI patients were best discriminated from normal controls using a combination of Wordlist Recall, Wordlist Memory, and Verbal Fluency Test. Machine learning showed that the CERAD features learned from MCI patients and controls were not strongly predictive of the diagnosis (maximal cross-validation accuracy 77.2%), whilst t-SNE showed that there is a considerable overlap between MCI and controls. The most important features of the CERAD-NP differentiating MCI from normal controls indicate impairments in episodic and semantic memory and recall. While these features significantly discriminate MCI patients from normal controls, the tests are not predictive of MCI. © 2018 S. Karger AG, Basel.

  11. Hot spots of multivariate extreme anomalies in Earth observations

    NASA Astrophysics Data System (ADS)

    Flach, M.; Sippel, S.; Bodesheim, P.; Brenning, A.; Denzler, J.; Gans, F.; Guanche, Y.; Reichstein, M.; Rodner, E.; Mahecha, M. D.

    2016-12-01

    Anomalies in Earth observations might indicate data quality issues, extremes or the change of underlying processes within a highly multivariate system. Thus, considering the multivariate constellation of variables for extreme detection yields crucial additional information over conventional univariate approaches. We highlight areas in which multivariate extreme anomalies are more likely to occur, i.e. hot spots of extremes in global atmospheric Earth observations that impact the Biosphere. In addition, we present the year of the most unusual multivariate extreme between 2001 and 2013 and show that these coincide with well known high impact extremes. Technically speaking, we account for multivariate extremes by using three sophisticated algorithms adapted from computer science applications. Namely an ensemble of the k-nearest neighbours mean distance, a kernel density estimation and an approach based on recurrences is used. However, the impact of atmosphere extremes on the Biosphere might largely depend on what is considered to be normal, i.e. the shape of the mean seasonal cycle and its inter-annual variability. We identify regions with similar mean seasonality by means of dimensionality reduction in order to estimate in each region both the `normal' variance and robust thresholds for detecting the extremes. In addition, we account for challenges like heteroscedasticity in Northern latitudes. Apart from hot spot areas, those anomalies in the atmosphere time series are of particular interest, which can only be detected by a multivariate approach but not by a simple univariate approach. Such an anomalous constellation of atmosphere variables is of interest if it impacts the Biosphere. The multivariate constellation of such an anomalous part of a time series is shown in one case study indicating that multivariate anomaly detection can provide novel insights into Earth observations.
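One of the three detectors mentioned above, the k-nearest-neighbours mean distance, is simple to sketch. This is a toy single-detector version, not the ensemble the authors use: each observation's anomaly score is its mean Euclidean distance to its k nearest neighbours, so a jointly unusual combination of variables scores high even when each variable is individually unremarkable.

```python
import math

def knn_anomaly_scores(points, k):
    """Anomaly score = mean Euclidean distance to the k nearest
    neighbours. Large scores flag multivariate extremes that a
    per-variable (univariate) threshold could miss."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores

# Toy example: three clustered observations and one multivariate outlier.
obs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
scores = knn_anomaly_scores(obs, k=2)  # the last point scores highest
```

In practice the scores would be thresholded per region, matching the abstract's point that what counts as "normal" depends on the local seasonal cycle and variance.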

  12. A system to build distributed multivariate models and manage disparate data sharing policies: implementation in the scalable national network for effectiveness research.

    PubMed

    Meeker, Daniella; Jiang, Xiaoqian; Matheny, Michael E; Farcas, Claudiu; D'Arcy, Michel; Pearlman, Laura; Nookala, Lavanya; Day, Michele E; Kim, Katherine K; Kim, Hyeoneui; Boxwala, Aziz; El-Kareh, Robert; Kuo, Grace M; Resnic, Frederic S; Kesselman, Carl; Ohno-Machado, Lucila

    2015-11-01

    Centralized and federated models for sharing data in research networks currently exist. To build multivariate data analysis for centralized networks, transfer of patient-level data to a central computation resource is necessary. The authors implemented distributed multivariate models for federated networks in which patient-level data is kept at each site and data exchange policies are managed in a study-centric manner. The objective was to implement infrastructure that supports the functionality of some existing research networks (e.g., cohort discovery, workflow management, and estimation of multivariate analytic models on centralized data) while adding additional important new features, such as algorithms for distributed iterative multivariate models, a graphical interface for multivariate model specification, synchronous and asynchronous response to network queries, investigator-initiated studies, and study-based control of staff, protocols, and data sharing policies. Based on the requirements gathered from statisticians, administrators, and investigators from multiple institutions, the authors developed infrastructure and tools to support multisite comparative effectiveness studies using web services for multivariate statistical estimation in the SCANNER federated network. The authors implemented massively parallel (map-reduce) computation methods and a new policy management system to enable each study initiated by network participants to define the ways in which data may be processed, managed, queried, and shared. The authors illustrated the use of these systems among institutions with highly different policies and operating under different state laws. Federated research networks need not limit distributed query functionality to count queries, cohort discovery, or independently estimated analytic models. 
Multivariate analyses can be efficiently and securely conducted without patient-level data transport, allowing institutions with strict local data storage requirements to participate in sophisticated analyses based on federated research networks. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.

  13. Back to Normal! Gaussianizing posterior distributions for cosmological probes

    NASA Astrophysics Data System (ADS)

    Schuhmann, Robert L.; Joachimi, Benjamin; Peiris, Hiranya V.

    2014-05-01

    We present a method to map multivariate non-Gaussian posterior probability densities into Gaussian ones via nonlinear Box-Cox transformations, and generalizations thereof. This is analogous to the search for normal parameters in the CMB, but can in principle be applied to any probability density that is continuous and unimodal. The search for the optimally Gaussianizing transformation amongst the Box-Cox family is performed via a maximum likelihood formalism. We can judge the quality of the found transformation a posteriori: qualitatively via statistical tests of Gaussianity, and more illustratively by how well it reproduces the credible regions. The method permits an analytical reconstruction of the posterior from a sample, e.g. a Markov chain, and simplifies the subsequent joint analysis with other experiments. Furthermore, it permits the characterization of a non-Gaussian posterior in a compact and efficient way. The expression for the non-Gaussian posterior can be employed to find analytic formulae for the Bayesian evidence, and consequently be used for model comparison.
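A minimal sketch of the maximum-likelihood search over the scalar Box-Cox family follows (the paper works with multivariate generalizations of the transform; the function names and the grid search here are illustrative). The profile log-likelihood trades off the variance of the transformed data against the Jacobian of the transform.

```python
import math

def boxcox(x, lam):
    """One-parameter Box-Cox transform; x must be positive."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]

def boxcox_loglik(x, lam):
    """Profile log-likelihood of the Gaussianized data under lam:
    -(n/2) log(var of transformed data) + (lam - 1) * sum(log x)."""
    y = boxcox(x, lam)
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in x)

def best_lambda(x, grid):
    """Maximum-likelihood lambda by grid search (a stand-in for a
    proper optimizer over the Box-Cox family)."""
    return max(grid, key=lambda lam: boxcox_loglik(x, lam))
```

Once the best transformation is found, the posterior sample can be mapped to (approximately) Gaussian coordinates, where credible regions and evidence integrals are analytic, as the abstract describes.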

  14. Analytical Fingerprint of Wolframite Ore Concentrates.

    PubMed

    Gäbler, Hans-Eike; Schink, Wilhelm; Goldmann, Simon; Bahr, Andreas; Gawronski, Timo

    2017-07-01

    Ongoing violent conflicts in Central Africa are fueled by illegal mining and trading of tantalum, tin, and tungsten ores. The credibility of document-based traceability systems can be improved by an analytical fingerprint applied as an independent method to confirm or doubt the documented origin of ore minerals. Wolframite (Fe,Mn)WO4 is the most important ore mineral for tungsten and is subject to artisanal mining in Central Africa. Element concentrations of wolframite grains analyzed by laser ablation-inductively coupled plasma-mass spectrometry are used to establish the analytical fingerprint. The data from ore concentrate samples are multivariate and neither normally nor log-normally distributed. The samples cannot be regarded as representative aliquots of a population. Based on the Kolmogorov-Smirnov distance, a measure of similarity between a sample in question and reference samples from a database is determined. A decision criterion is deduced to recognize samples which do not originate from the declared mine site. © 2017 American Academy of Forensic Sciences.
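The Kolmogorov-Smirnov distance used above as the similarity measure is, for two samples, the largest gap between their empirical distribution functions. A plain-Python sketch (a univariate two-sample version; the paper applies the idea per element concentration):

```python
def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance: the maximum absolute
    gap between the empirical CDFs of the two samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past all observations equal to x,
        # then compare the empirical CDFs just after x.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d
```

A sample in question would be compared against each reference mine site in the database; small distances support the documented origin, large distances cast doubt on it.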

  15. Categorical speech processing in Broca's area: an fMRI study using multivariate pattern-based analysis.

    PubMed

    Lee, Yune-Sang; Turkeltaub, Peter; Granger, Richard; Raizada, Rajeev D S

    2012-03-14

    Although much effort has been directed toward understanding the neural basis of speech processing, the neural processes involved in the categorical perception of speech have been relatively less studied, and many questions remain open. In this functional magnetic resonance imaging (fMRI) study, we probed the cortical regions mediating categorical speech perception using an advanced brain-mapping technique, whole-brain multivariate pattern-based analysis (MVPA). Normal healthy human subjects (native English speakers) were scanned while they listened to 10 consonant-vowel syllables along the /ba/-/da/ continuum. Outside of the scanner, individuals' own category boundaries were measured to divide the fMRI data into /ba/ and /da/ conditions per subject. The whole-brain MVPA revealed that Broca's area and the left pre-supplementary motor area evoked distinct neural activity patterns between the two perceptual categories (/ba/ vs /da/). Broca's area was also found when the same analysis was applied to another dataset (Raizada and Poldrack, 2007), which previously yielded the supramarginal gyrus using a univariate adaptation-fMRI paradigm. The consistent MVPA findings from two independent datasets strongly indicate that Broca's area participates in categorical speech perception, with a possible role of translating speech signals into articulatory codes. The difference in results between univariate and multivariate pattern-based analyses of the same data suggest that processes in different cortical areas along the dorsal speech perception stream are distributed on different spatial scales.

  16. Generalized t-statistic for two-group classification.

    PubMed

    Komori, Osamu; Eguchi, Shinto; Copas, John B

    2015-06-01

    In the classic discriminant model of two multivariate normal distributions with equal variance matrices, the linear discriminant function is optimal both in terms of the log likelihood ratio and in terms of maximizing the standardized difference (the t-statistic) between the means of the two distributions. In a typical case-control study, normality may be sensible for the control sample but heterogeneity and uncertainty in diagnosis may suggest that a more flexible model is needed for the cases. We generalize the t-statistic approach by finding the linear function which maximizes a standardized difference but with data from one of the groups (the cases) filtered by a possibly nonlinear function U. We study conditions for consistency of the method and find the function U which is optimal in the sense of asymptotic efficiency. Optimality may also extend to other measures of discriminatory efficiency such as the area under the receiver operating characteristic curve. The optimal function U depends on a scalar probability density function which can be estimated non-parametrically using a standard numerical algorithm. A lasso-like version for variable selection is implemented by adding L1-regularization to the generalized t-statistic. Two microarray data sets in the study of asthma and various cancers are used as motivating examples. © 2014, The International Biometric Society.
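The classical baseline this paper generalizes — the linear discriminant w = S⁻¹(mean of cases − mean of controls) with a pooled covariance matrix — can be written out explicitly for the bivariate case. This is a sketch of the U = identity special case (no filtering of the case sample), not the authors' estimator.

```python
def fisher_direction(cases, controls):
    """Linear discriminant direction for two bivariate samples with a
    pooled covariance matrix: w = S_pooled^{-1} (m_cases - m_controls).
    The 2x2 inverse is written out by hand."""
    def mean(xs):
        n = len(xs)
        return [sum(r[0] for r in xs) / n, sum(r[1] for r in xs) / n]

    def scatter(xs, m):
        s = [[0.0, 0.0], [0.0, 0.0]]
        for x0, x1 in xs:
            d0, d1 = x0 - m[0], x1 - m[1]
            s[0][0] += d0 * d0; s[0][1] += d0 * d1
            s[1][0] += d1 * d0; s[1][1] += d1 * d1
        return s

    m1, m0 = mean(cases), mean(controls)
    s1, s0 = scatter(cases, m1), scatter(controls, m0)
    dof = len(cases) + len(controls) - 2
    S = [[(s1[i][j] + s0[i][j]) / dof for j in range(2)] for i in range(2)]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [
        (S[1][1] * diff[0] - S[0][1] * diff[1]) / det,
        (-S[1][0] * diff[0] + S[0][0] * diff[1]) / det,
    ]
```

The paper's generalization replaces the case-sample mean by a filtered mean through a nonlinear function U, recovering this direction when U is the identity.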

  17. Novel method for hit-position reconstruction using voltage signals in plastic scintillators and its application to Positron Emission Tomography

    NASA Astrophysics Data System (ADS)

    Raczyński, L.; Moskal, P.; Kowalski, P.; Wiślicki, W.; Bednarski, T.; Białas, P.; Czerwiński, E.; Kapłon, Ł.; Kochanowski, A.; Korcyl, G.; Kowal, J.; Kozik, T.; Krzemień, W.; Kubicz, E.; Molenda, M.; Moskal, I.; Niedźwiecki, Sz.; Pałka, M.; Pawlik-Niedźwiecka, M.; Rudy, Z.; Salabura, P.; Sharma, N. G.; Silarski, M.; Słomski, A.; Smyrski, J.; Strzelecki, A.; Wieczorek, A.; Zieliński, M.; Zoń, N.

    2014-11-01

    Currently, inorganic scintillator detectors are used in all commercial Time of Flight Positron Emission Tomography (TOF-PET) devices. The J-PET collaboration investigates the possibility of constructing a PET scanner from plastic scintillators, which would allow single-bed imaging of the whole human body. This paper describes a novel method of hit-position reconstruction based on sampled signals and an example of an application of the method for a single module with a 30 cm long plastic strip, read out on both ends by Hamamatsu R4998 photomultipliers. The sampling scheme to generate a vector with samples of a PET event waveform with respect to four user-defined amplitudes is introduced. The experimental setup provides irradiation of a chosen position in the plastic scintillator strip with annihilation gamma quanta of energy 511 keV. A statistical test for a multivariate normal (MVN) distribution of measured vectors at a given position is developed, and it is shown that signals sampled at four thresholds in the voltage domain are approximately normally distributed variables. With the presented method of analyzing a vector made out of waveform samples acquired with four thresholds, we obtain a spatial resolution of about 1 cm and a timing resolution of about 80 ps (σ).

  18. Fat distribution in children and adolescents with myelomeningocele.

    PubMed

    Mueske, Nicole M; Ryan, Deirdre D; Van Speybroeck, Alexander L; Chan, Linda S; Wren, Tishya A L

    2015-03-01

    To evaluate fat distribution in children and adolescents with myelomeningocele using dual-energy X-ray absorptiometry (DXA). Cross-sectional DXA measurements of the percentage of fat in the trunk, arms, legs, and whole body were compared between 82 children with myelomeningocele (45 males, 37 females; mean age 9y 8mo, SD 2y 7mo; 22 sacral, 13 low lumbar, 47 mid lumbar and above) and 119 comparison children (65 males, 54 females; mean age 10y 4mo, SD 2y 4mo). Differences in fat distribution between groups were evaluated using univariate and multivariate analyses. Children with myelomeningocele had higher total body fat (34% vs 31%, p=0.02) and leg fat (42% vs 35%, p<0.001) than comparison children, but no differences in trunk or arm fat after adjustment for anthropometric measures. Children with myelomeningocele have higher than normal total body and leg fat, but only children with higher level lesions have increased trunk fat, which may be caused by greater obesity in this group. Quantifying segmental fat distribution may aid in better assessment of excess weight and, potentially, the associated health risks. © 2014 Mac Keith Press.

  19. Fat distribution in children and adolescents with myelomeningocele

    PubMed Central

    Mueske, Nicole M; Ryan, Deirdre D; Van Speybroeck, Alexander L; Chan, Linda S; Al Wren, Tishya

    2014-01-01

    AIM To evaluate quantitatively fat distribution in children and adolescents with myelomeningocele using dual-energy X-ray absorptiometry (DXA). METHOD Cross-sectional DXA measurements of the percentage of fat in the trunk, arms, legs, and whole body were compared between 82 children with myelomeningocele (45 males, 37 females; mean age 9y 8mo, SD 2y 7mo; 22 sacral, 13 low lumbar, 47 mid lumbar and above) and 119 comparison children (65 males, 54 females; mean age 10y 4mo, SD 2y 4mo). Differences in fat distribution between groups were evaluated using univariate and multivariate analyses. RESULTS Children with myelomeningocele had higher total body fat (34% vs 31%, p=0.02) and leg fat (42% vs 35%, p<0.001) than comparison children, but no differences in trunk or arm fat after adjustment for anthropometric measures. INTERPRETATION Children with myelomeningocele have higher than normal total body and leg fat, but only children with higher level lesions have increased trunk fat, which may be caused by greater obesity in this group. Quantifying segmental fat distribution may aid in better assessment of excess weight and, potentially, the associated health risks. PMID:25251828

  20. Posterior propriety for hierarchical models with log-likelihoods that have norm bounds

    DOE PAGES

    Michalak, Sarah E.; Morris, Carl N.

    2015-07-17

    Statisticians often use improper priors to express ignorance or to provide good frequency properties, requiring that posterior propriety be verified. Our paper addresses generalized linear mixed models, GLMMs, when Level I parameters have Normal distributions, with many commonly-used hyperpriors. It provides easy-to-verify sufficient posterior propriety conditions based on dimensions, matrix ranks, and exponentiated norm bounds, ENBs, for the Level I likelihood. Since many familiar likelihoods have ENBs, a property often verifiable via log-concavity and MLE finiteness, our novel use of ENBs permits unification of posterior propriety results and posterior MGF/moment results for many useful Level I distributions, including those commonly used with multilevel generalized linear models, e.g., GLMMs and hierarchical generalized linear models, HGLMs. Furthermore, those who need to verify existence of posterior distributions or of posterior MGFs/moments for a multilevel generalized linear model given a proper or improper multivariate F prior as in Section 1 should find the required results in Sections 1 and 2 and Theorem 3 (GLMMs), Theorem 4 (HGLMs), or Theorem 5 (posterior MGFs/moments).

  1. A preliminary analysis of quantifying computer security vulnerability data in "the wild"

    NASA Astrophysics Data System (ADS)

    Farris, Katheryn A.; McNamara, Sean R.; Goldstein, Adam; Cybenko, George

    2016-05-01

    A system of computers, networks and software has some level of vulnerability exposure that puts it at risk to criminal hackers. Presently, most vulnerability research uses data from software vendors, and the National Vulnerability Database (NVD). We propose an alternative path forward through grounding our analysis in data from the operational information security community, i.e. vulnerability data from "the wild". In this paper, we propose a vulnerability data parsing algorithm and an in-depth univariate and multivariate analysis of the vulnerability arrival and deletion process (also referred to as the vulnerability birth-death process). We find that vulnerability arrivals are best characterized by the log-normal distribution and vulnerability deletions are best characterized by the exponential distribution. These distributions can serve as prior probabilities for future Bayesian analysis. We also find that over 22% of the deleted vulnerability data have a rate of zero, and that the arrival vulnerability data is always greater than zero. Finally, we quantify and visualize the dependencies between vulnerability arrivals and deletions through a bivariate scatterplot and statistical observations.
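The two best-fitting families reported above (log-normal arrivals, exponential deletions) both have closed-form maximum-likelihood fits, sketched here for illustration only; the authors' distribution selection is of course more involved than fitting a single family.

```python
import math

def fit_exponential(x):
    """MLE rate for an exponential distribution: 1 / sample mean."""
    return len(x) / sum(x)

def fit_lognormal(x):
    """MLE parameters (mu, sigma) of a log-normal distribution:
    mean and standard deviation of log(x)."""
    logs = [math.log(v) for v in x]
    n = len(logs)
    mu = sum(logs) / n
    sigma2 = sum((v - mu) ** 2 for v in logs) / n
    return mu, math.sqrt(sigma2)

# Hypothetical daily counts of vulnerability arrivals and deletions.
arrival_mu, arrival_sigma = fit_lognormal([3.0, 7.0, 12.0, 5.0, 9.0])
deletion_rate = fit_exponential([0.5, 2.0, 1.0, 3.5])
```

Fitted this way, the two distributions can serve directly as the prior probabilities for the future Bayesian analysis the abstract mentions.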

  2. Multivariate classification of infrared spectra of cell and tissue samples

    DOEpatents

    Haaland, David M.; Jones, Howland D. T.; Thomas, Edward V.

    1997-01-01

    Multivariate classification techniques are applied to spectra from cell and tissue samples irradiated with infrared radiation to determine if the samples are normal or abnormal (cancerous). Mid and near infrared radiation can be used for in vivo and in vitro classifications using at least different wavelengths.

  3. Local polynomial estimation of heteroscedasticity in a multivariate linear regression model and its applications in economics.

    PubMed

    Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan

    2012-01-01

    Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. First, local polynomial fitting is applied to estimate the heteroscedastic function; then the coefficients of the regression model are obtained using the generalized least squares method. One noteworthy feature of our approach is that we avoid testing for heteroscedasticity by improving the traditional two-stage method. Owing to the nonparametric technique of local polynomial estimation, it is unnecessary to know the form of the heteroscedastic function; we can therefore improve the estimation precision when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients are asymptotically normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is effective in finite-sample situations.
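The two-stage idea described above — estimate the variance function nonparametrically from first-stage residuals, then reweight — can be sketched for a single regressor. A local-constant (kernel-average) variance estimate stands in for the paper's local polynomial fit; all names are illustrative.

```python
import math

def ols(x, y):
    """Simple-regression OLS: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def wls(x, y, w):
    """Weighted least squares with weights ~ 1/variance."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / \
        sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return my - b * mx, b

def two_stage(x, y, bandwidth=1.0):
    """Stage 1: OLS residuals; kernel-smoothed squared residuals give
    a nonparametric variance-function estimate (a local-constant
    stand-in for local polynomial fitting). Stage 2: GLS/WLS."""
    a, b = ols(x, y)
    r2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]

    def var_at(x0):
        k = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
        return sum(ki * ri for ki, ri in zip(k, r2)) / sum(k)

    w = [1.0 / max(var_at(xi), 1e-12) for xi in x]
    return wls(x, y, w)
```

Because the variance function is estimated rather than assumed, no parametric form for the heteroscedasticity (and no pre-test for it) is needed, which is the improvement the abstract claims.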

  4. A new subgrid-scale representation of hydrometeor fields using a multivariate PDF

    DOE PAGES

    Griffin, Brian M.; Larson, Vincent E.

    2016-06-03

    The subgrid-scale representation of hydrometeor fields is important for calculating microphysical process rates. In order to represent subgrid-scale variability, the Cloud Layers Unified By Binormals (CLUBB) parameterization uses a multivariate probability density function (PDF). In addition to vertical velocity, temperature, and moisture fields, the PDF includes hydrometeor fields. Previously, hydrometeor fields were assumed to follow a multivariate single lognormal distribution. Now, in order to better represent the distribution of hydrometeors, two new multivariate PDFs are formulated and introduced. The new PDFs represent hydrometeors using either a delta-lognormal or a delta-double-lognormal shape. The two new PDF distributions, plus the previous single lognormal shape, are compared to histograms of data taken from large-eddy simulations (LESs) of a precipitating cumulus case, a drizzling stratocumulus case, and a deep convective case. In conclusion, the warm microphysical process rates produced by the different hydrometeor PDFs are compared to the same process rates produced by the LES.

  5. Esophageal cancer detection based on tissue surface-enhanced Raman spectroscopy and multivariate analysis

    NASA Astrophysics Data System (ADS)

    Feng, Shangyuan; Lin, Juqiang; Huang, Zufang; Chen, Guannan; Chen, Weisheng; Wang, Yue; Chen, Rong; Zeng, Haishan

    2013-01-01

    The capability of using silver nanoparticle based near-infrared surface enhanced Raman scattering (SERS) spectroscopy combined with principal component analysis (PCA) and linear discriminate analysis (LDA) to differentiate esophageal cancer tissue from normal tissue was presented. Significant differences in Raman intensities of prominent SERS bands were observed between normal and cancer tissues. PCA-LDA multivariate analysis of the measured tissue SERS spectra achieved diagnostic sensitivity of 90.9% and specificity of 97.8%. This exploratory study demonstrated great potential for developing label-free tissue SERS analysis into a clinical tool for esophageal cancer detection.

  6. Analyzing Multivariate Repeated Measures Designs: A Comparison of Two Approximate Degrees of Freedom Procedures

    ERIC Educational Resources Information Center

    Lix, Lisa M.; Algina, James; Keselman, H. J.

    2003-01-01

    The approximate degrees of freedom Welch-James (WJ) and Brown-Forsythe (BF) procedures for testing within-subjects effects in multivariate groups by trials repeated measures designs were investigated under departures from covariance homogeneity and normality. Empirical Type I error and power rates were obtained for least-squares estimators and…

  7. The prognostic impact of germline 46/1 haplotype of Janus kinase 2 in cytogenetically normal acute myeloid leukemia

    PubMed Central

    Nahajevszky, Sarolta; Andrikovics, Hajnalka; Batai, Arpad; Adam, Emma; Bors, Andras; Csomor, Judit; Gopcsa, Laszlo; Koszarska, Magdalena; Kozma, Andras; Lovas, Nora; Lueff, Sandor; Matrai, Zoltan; Meggyesi, Nora; Sinko, Janos; Sipos, Andrea; Varkonyi, Andrea; Fekete, Sandor; Tordai, Attila; Masszi, Tamas

    2011-01-01

    Background Prognostic risk stratification according to acquired or inherited genetic alterations has received increasing attention in acute myeloid leukemia in recent years. A germline Janus kinase 2 haplotype designated as the 46/1 haplotype has been reported to be associated with an inherited predisposition to myeloproliferative neoplasms, and also to acute myeloid leukemia with normal karyotype. The aim of this study was to assess the prognostic impact of the 46/1 haplotype on disease characteristics and treatment outcome in acute myeloid leukemia. Design and Methods Janus kinase 2 rs12343867 single nucleotide polymorphism tagging the 46/1 haplotype was genotyped by LightCycler technology applying melting curve analysis with the hybridization probe detection format in 176 patients with acute myeloid leukemia under 60 years diagnosed consecutively and treated with curative intent. Results The morphological subtype of acute myeloid leukemia with maturation was less frequent among 46/1 carriers than among non-carriers (5.6% versus 17.2%, P=0.018, cytogenetically normal subgroup: 4.3% versus 20.6%, P=0.031), while the morphological distribution shifted towards the myelomonocytoid form in 46/1 haplotype carriers (28.1% versus 14.9%, P=0.044, cytogenetically normal subgroup: 34.0% versus 11.8%, P=0.035). In cytogenetically normal cases of acute myeloid leukemia, the 46/1 carriers had a considerably lower remission rate (78.7% versus 94.1%, P=0.064) and more deaths in remission or in aplasia caused by infections (46.8% versus 23.5%, P=0.038), resulting in the 46/1 carriers having shorter disease-free survival and overall survival compared to the 46/1 non-carriers. In multivariate analysis, the 46/1 haplotype was an independent adverse prognostic factor for disease-free survival (P=0.024) and overall survival (P=0.024) in patients with a normal karyotype. Janus kinase 2 46/1 haplotype had no impact on prognosis in the subgroup with abnormal karyotype. 
Conclusions Janus kinase 2 46/1 haplotype influences morphological distribution, increasing the predisposition towards an acute myelomonocytoid form. It may be a novel, independent unfavorable risk factor in acute myeloid leukemia with a normal karyotype. PMID:21791467

  8. Departure from Normality in Multivariate Normative Comparison: The Cramer Alternative for Hotelling's "T[squared]"

    ERIC Educational Resources Information Center

    Grasman, Raoul P. P. P.; Huizenga, Hilde M.; Geurts, Hilde M.

    2010-01-01

    Crawford and Howell (1998) have pointed out that the common practice of z-score inference on cognitive disability is inappropriate if a patient's performance on a task is compared with relatively few typical control individuals. Appropriate univariate and multivariate statistical tests have been proposed for these studies, but these are only valid…

  9. About normal distribution on SO(3) group in texture analysis

    NASA Astrophysics Data System (ADS)

    Savyolova, T. I.; Filatov, S. V.

    2017-12-01

    This article studies and compares different normal distributions (NDs) on SO(3) group, which are used in texture analysis. Those NDs are: Fisher normal distribution (FND), Bunge normal distribution (BND), central normal distribution (CND) and wrapped normal distribution (WND). All of the previously mentioned NDs are central functions on SO(3) group. CND is a subcase for normal CLT-motivated distributions on SO(3) (CLT here is Parthasarathy’s central limit theorem). WND is motivated by CLT in R 3 and mapped to SO(3) group. A Monte Carlo method for modeling normally distributed values was studied for both CND and WND. All of the NDs mentioned above are used for modeling different components of crystallites orientation distribution function in texture analysis.

  10. Implementation of the Iterative Proportion Fitting Algorithm for Geostatistical Facies Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li Yupeng, E-mail: yupeng@ualberta.ca; Deutsch, Clayton V.

    2012-06-15

    In geostatistics, most stochastic algorithms for the simulation of categorical variables such as facies or rock types require a conditional probability distribution. The multivariate probability distribution of all the grouped locations, including the unsampled location, permits calculation of the conditional probability directly from its definition. In this article, the iterative proportion fitting (IPF) algorithm is implemented to infer this multivariate probability. Using the IPF algorithm, the multivariate probability is obtained by iterative modification of an initial estimated multivariate probability using lower-order bivariate probabilities as constraints. The imposed bivariate marginal probabilities are inferred from profiles along drill holes or wells. In the IPF process, a sparse matrix is used to calculate the marginal probabilities from the multivariate probability, which makes the iterative fitting more tractable and practical. This algorithm can be extended to higher-order marginal probability constraints as used in multiple-point statistics. The theoretical framework is developed and illustrated with estimation and simulation examples.
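The alternating-scaling idea behind IPF can be illustrated in its simplest two-way form, where a joint probability table is iteratively rescaled to match imposed marginals. This is a minimal sketch with hypothetical facies proportions, not the article's multivariate implementation:

```python
import numpy as np

def ipf_2d(joint, row_marginal, col_marginal, n_iter=100):
    """Iterative proportional fitting: rescale a joint probability table
    so its row/column sums match the imposed marginals."""
    p = joint.copy()
    for _ in range(n_iter):
        p *= (row_marginal / p.sum(axis=1))[:, None]   # match row sums
        p *= (col_marginal / p.sum(axis=0))[None, :]   # match column sums
    return p

# Hypothetical initial estimate and target facies-proportion marginals.
initial = np.full((3, 3), 1.0 / 9.0)
rows = np.array([0.5, 0.3, 0.2])
cols = np.array([0.6, 0.3, 0.1])
fitted = ipf_2d(initial, rows, cols)
```

After convergence the fitted table reproduces both imposed marginals while staying as close as possible to the initial estimate.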

  11. Influence of Time-Series Normalization, Number of Nodes, Connectivity and Graph Measure Selection on Seizure-Onset Zone Localization from Intracranial EEG.

    PubMed

    van Mierlo, Pieter; Lie, Octavian; Staljanssens, Willeke; Coito, Ana; Vulliémoz, Serge

    2018-04-26

    We investigated the influence of processing steps in the estimation of multivariate directed functional connectivity during seizures recorded with intracranial EEG (iEEG) on seizure-onset zone (SOZ) localization. We studied the effect of (i) the number of nodes, (ii) time-series normalization, (iii) the choice of multivariate time-varying connectivity measure: Adaptive Directed Transfer Function (ADTF) or Adaptive Partial Directed Coherence (APDC) and (iv) graph theory measure: outdegree or shortest path length. First, simulations were performed to quantify the influence of the various processing steps on the accuracy of localizing the SOZ. Afterwards, the SOZ was estimated from a 113-electrode iEEG seizure recording and compared with the resection that rendered the patient seizure-free. The simulations revealed that ADTF is preferred over APDC to localize the SOZ from ictal iEEG recordings. Normalizing the time series before analysis resulted in a 25-35% increase in correctly localized SOZs, while adding more nodes to the connectivity analysis led to a moderate decrease of 10% when comparing 128 with 32 input nodes. The real-seizure connectivity estimates localized the SOZ inside the resection area using the ADTF coupled to outdegree or shortest path length. Our study showed that normalizing the time series is an important pre-processing step, while adding nodes to the analysis only marginally affected the SOZ localization. The study shows that directed multivariate Granger-based connectivity analysis is feasible with many input nodes (> 100) and that normalization of the time series before connectivity analysis is preferred.
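The final graph-theory step can be sketched minimally: given a directed connectivity matrix (an ADTF-like measure, not computed here), the channel with the largest outdegree is taken as the SOZ candidate. The matrix and the designated driver channel below are synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical directed connectivity matrix: entry [i, j] is the estimated
# influence of channel i on channel j (an ADTF-like measure; the ADTF itself
# is not implemented here).
n_channels = 6
conn = rng.random((n_channels, n_channels))
conn[2, :] += 2.0               # make channel 2 a strong driver, mimicking an SOZ
np.fill_diagonal(conn, 0.0)     # ignore self-connections

# Outdegree: total outgoing connectivity per channel; the strongest driver
# is taken as the seizure-onset-zone candidate.
outdegree = conn.sum(axis=1)
soz_candidate = int(np.argmax(outdegree))
```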

  12. Measuring firm size distribution with semi-nonparametric densities

    NASA Astrophysics Data System (ADS)

    Cortés, Lina M.; Mora-Valencia, Andrés; Perote, Javier

    2017-11-01

    In this article, we propose a new methodology based on a (log) semi-nonparametric (log-SNP) distribution that nests the lognormal and enables better fits in the upper tail of the distribution through the introduction of new parameters. We test the performance of the lognormal and log-SNP distributions in capturing firm size, measured through a sample of US firms in 2004-2015. Taking different levels of aggregation by type of economic activity, our study shows that the log-SNP provides a better fit of the firm size distribution. We also formally introduce the multivariate log-SNP distribution, which encompasses the multivariate lognormal, to analyze the estimation of the joint distribution of the value of the firm's assets and sales. The results suggest that sales are a better firm size measure, as indicated by other studies in the literature.

  13. The Relationship between Visual Impairment and Health-Related Quality of Life in Korean Adults: The Korea National Health and Nutrition Examination Survey (2008–2012)

    PubMed Central

    Park, Yuli; Shin, Jeong Ah; Yang, Suk Woo; Yim, Hyeon Woo; Kim, Hyun Seung; Park, Young-Hoon

    2015-01-01

    Introduction To evaluate health-related quality of life (HRQoL) in Korean adults with visual impairment (VI) using various measures based on a nationally distributed sample. Methods Using the Korea National Health and Nutrition Examination Survey (KNHANES, 2008–2012) data, we compared EuroQol five-dimensional questionnaire (EQ-5D) and EQ-visual analogue scale (VAS) scores after adjusting for socio-demographic and psychosocial factors as well as for comorbidities with VI. Logistic regressions were used to identify determinants of the lowest-quintile HRQoL scales according to VI severity. Uncorrected visual acuity (VA), which reflects vision in ordinary life, was measured using an international standard vision chart based on the Snellen scale. Results 28,825 participants (sum of weights: 37,562,376) were included in the analysis. The mean EQ-5D and EQ-VAS scores were significantly lower in the VI groups than in the normal vision (defined as VA 20/20-20/25) group based on the better or worse seeing eye (P<.0001 and P<.0001, respectively). Participants with moderate (VA 20/80-20/160) and severe VI (VA ≤20/200) had higher multivariate-adjusted odds ratios (aORs) for the lowest quintile than did the normal vision group, which was particularly evident in the EQ-5D results, whereas the mild VI (VA 20/32-20/63) group did not differ significantly from the normal vision group, regardless of classification by the better or the worse seeing eye. Conversely, EQ-VAS revealed significantly higher aORs for the lowest quintile in participants with mild VI for either the better or worse seeing eye. Conclusions The severity of VI was clearly associated with impaired HRQoL compared with the normal vision population. The analyses presented here indicate that even mild VI can deteriorate health-related quality of life (or the subjective perception of health), and therefore therapeutic approaches should also focus on the subjective perception and better management of health conditions. PMID:26192763

  14. Comparison of Two Stochastic Daily Rainfall Models and their Ability to Preserve Multi-year Rainfall Variability

    NASA Astrophysics Data System (ADS)

    Kamal Chowdhury, AFM; Lockart, Natalie; Willgoose, Garry; Kuczera, George; Kiem, Anthony; Parana Manage, Nadeeka

    2016-04-01

    Stochastic simulation of rainfall is often required in the simulation of streamflow and reservoir levels for water security assessment. As reservoir water levels generally vary on monthly to multi-year timescales, it is important that these rainfall series accurately simulate the multi-year variability. However, the underestimation of multi-year variability is a well-known issue in daily rainfall simulation. Focusing on this issue, we developed a hierarchical Markov Chain (MC) model in a traditional two-part MC-Gamma Distribution modelling structure, but with a new parameterization technique. We used the two parameters of a first-order MC process (transition probabilities of wet-to-wet and dry-to-dry days) to simulate the wet and dry days, and two parameters of a Gamma distribution (mean and standard deviation of wet-day rainfall) to simulate wet-day rainfall depths. We found that the use of deterministic Gamma parameter values results in underestimation of the multi-year variability of rainfall depths. Therefore, we calculated the Gamma parameters for each month of each year from the observed data. Then, for each month, we fitted a multivariate normal distribution to the calculated Gamma parameter values. In the model, we stochastically sampled these two Gamma parameters from the multivariate normal distribution for each month of each year and used them to generate rainfall depths on wet days using the Gamma distribution. In another study, Mehrotra and Sharma (2007) proposed a semi-parametric Markov model. They also used a first-order MC process for rainfall occurrence simulation, but the MC parameters were modified by an additional factor to incorporate the multi-year variability. Generally, the additional factor is analytically derived from the rainfall over pre-specified past periods (e.g. the last 30, 180, or 360 days). They used a non-parametric kernel density process to simulate the wet-day rainfall depths.
In this study, we have compared the performance of our hierarchical MC model with the semi-parametric model in preserving rainfall variability at daily, monthly, and multi-year scales. To calibrate the parameters of both models and assess their ability to preserve observed statistics, we have used ground-based data from 15 raingauge stations around Australia, which cover a wide range of climate zones including coastal, monsoonal, and arid climate characteristics. In preliminary results, both models show comparable performance in preserving the multi-year variability of rainfall depth and occurrence. However, the semi-parametric model shows a tendency to overestimate the mean rainfall depth, while our model shows a tendency to overestimate the number of wet days. We will discuss further the relative merits of both models for hydrological simulation in the presentation.
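The hierarchical MC-Gamma structure described above can be sketched as follows; the transition probabilities, Gamma-parameter mean vector, and covariance matrix are illustrative assumptions, not values fitted to any station:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative parameters (not fitted to any real station).
p_ww, p_dd = 0.7, 0.8                    # wet-to-wet and dry-to-dry transition probabilities
mean_sd_mean = np.array([8.0, 10.0])     # mean of (Gamma mean, Gamma sd) for one month
mean_sd_cov = np.array([[4.0, 2.0],      # covariance capturing year-to-year variability
                        [2.0, 9.0]])

def simulate_month(n_days=30):
    # Sample this year's Gamma parameters from the multivariate normal.
    m, s = rng.multivariate_normal(mean_sd_mean, mean_sd_cov)
    m, s = max(m, 0.1), max(s, 0.1)            # guard against non-physical draws
    shape, scale = (m / s) ** 2, s ** 2 / m    # convert mean/sd to Gamma shape/scale
    rain = np.zeros(n_days)
    wet = False
    for t in range(n_days):
        p_wet = p_ww if wet else 1.0 - p_dd    # first-order Markov occurrence
        wet = rng.random() < p_wet
        if wet:
            rain[t] = rng.gamma(shape, scale)  # wet-day rainfall depth
    return rain

series = simulate_month()
```

Re-sampling the Gamma parameters each month of each year is what injects the multi-year variability that fixed parameter values underestimate.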

  15. The Myth of Optimality in Clinical Neuroscience.

    PubMed

    Holmes, Avram J; Patrick, Lauren M

    2018-03-01

    Clear evidence supports a dimensional view of psychiatric illness. Within this framework the expression of disorder-relevant phenotypes is often interpreted as a breakdown or departure from normal brain function. Conversely, health is reified, conceptualized as possessing a single ideal state. We challenge this concept here, arguing that there is no universally optimal profile of brain functioning. The evolutionary forces that shape our species select for a staggering diversity of human behaviors. To support our position we highlight pervasive population-level variability within large-scale functional networks and discrete circuits. We propose that, instead of examining behaviors in isolation, psychiatric illnesses can be best understood through the study of domains of functioning and associated multivariate patterns of variation across distributed brain systems. Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. A Comparison of the Bootstrap-F, Improved General Approximation, and Brown-Forsythe Multivariate Approaches in a Mixed Repeated Measures Design

    ERIC Educational Resources Information Center

    Seco, Guillermo Vallejo; Izquierdo, Marcelino Cuesta; Garcia, M. Paula Fernandez; Diez, F. Javier Herrero

    2006-01-01

    The authors compare the operating characteristics of the bootstrap-F approach, a direct extension of the work of Berkovits, Hancock, and Nevitt, with Huynh's improved general approximation (IGA) and the Brown-Forsythe (BF) multivariate approach in a mixed repeated measures design when normality and multisample sphericity assumptions do not hold.…

  17. Measures of dependence for multivariate Lévy distributions

    NASA Astrophysics Data System (ADS)

    Boland, J.; Hurd, T. R.; Pivato, M.; Seco, L.

    2001-02-01

    Recent statistical analysis of a number of financial databases is summarized. Increasing agreement is found that logarithmic equity returns show a certain type of asymptotic behavior of the largest events, namely that the probability density functions have power law tails with an exponent α≈3.0. This behavior does not vary much over different stock exchanges or over time, despite large variations in trading environments. The present paper proposes a class of multivariate distributions which generalizes the observed qualities of univariate time series. A new consequence of the proposed class is the "spectral measure" which completely characterizes the multivariate dependences of the extreme tails of the distribution. This measure on the unit sphere in M-dimensions, in principle completely general, can be determined empirically by looking at extreme events. If it can be observed and determined, it will prove to be of importance for scenario generation in portfolio risk management.
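The power-law tail behavior described above can be illustrated with a Hill-type tail-exponent estimate on synthetic Pareto "returns" with α = 3; the sample size and choice of k are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic heavy-tailed "returns": classical Pareto with tail exponent 3,
# mimicking the cubic power-law tails reported for equity returns.
alpha_true = 3.0
x = rng.pareto(alpha_true, size=100_000) + 1.0

def hill_estimator(data, k):
    """Hill estimate of the tail exponent from the k largest observations."""
    srt = np.sort(data)
    threshold = srt[-k - 1]          # order statistic just below the tail sample
    tail = srt[-k:]
    return k / np.sum(np.log(tail / threshold))

alpha_hat = hill_estimator(x, k=2000)
```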

  18. The Structure of the Young Star Cluster NGC 6231. II. Structure, Formation, and Fate

    NASA Astrophysics Data System (ADS)

    Kuhn, Michael A.; Getman, Konstantin V.; Feigelson, Eric D.; Sills, Alison; Gromadzki, Mariusz; Medina, Nicolás; Borissova, Jordanka; Kurtev, Radostin

    2017-12-01

    The young cluster NGC 6231 (stellar ages ˜2-7 Myr) is observed shortly after star formation activity has ceased. Using the catalog of 2148 probable cluster members obtained from Chandra, VVV, and optical surveys (Paper I), we examine the cluster’s spatial structure and dynamical state. The spatial distribution of stars is remarkably well fit by an isothermal sphere with moderate elongation, while other commonly used models like Plummer spheres, multivariate normal distributions, or power-law models are poor fits. The cluster has a core radius of 1.2 ± 0.1 pc and a central density of ˜200 stars pc-3. The distribution of stars is mildly mass segregated. However, there is no radial stratification of the stars by age. Although most of the stars belong to a single cluster, a small subcluster of stars is found superimposed on the main cluster, and there are clumpy non-isotropic distributions of stars outside ˜4 core radii. When the size, mass, and age of NGC 6231 are compared to other young star clusters and subclusters in nearby active star-forming regions, it lies at the high-mass end of the distribution but along the same trend line. This could result from similar formation processes, possibly hierarchical cluster assembly. We argue that NGC 6231 has expanded from its initial size but that it remains gravitationally bound.

  19. Post-processing of multi-model ensemble river discharge forecasts using censored EMOS

    NASA Astrophysics Data System (ADS)

    Hemri, Stephan; Lisniak, Dmytro; Klein, Bastian

    2014-05-01

    When forecasting water levels and river discharge, ensemble weather forecasts are used as meteorological input to hydrologic process models. As hydrologic models are imperfect and the input ensembles tend to be biased and underdispersed, the output ensemble forecasts for river runoff typically are biased and underdispersed, too. Thus, statistical post-processing is required in order to achieve calibrated and sharp predictions. Standard post-processing methods such as Ensemble Model Output Statistics (EMOS) that have their origins in meteorological forecasting are now increasingly being used in hydrologic applications. Here we consider two sub-catchments of River Rhine, for which the forecasting system of the Federal Institute of Hydrology (BfG) uses runoff data that are censored below predefined thresholds. To address this methodological challenge, we develop a censored EMOS method that is tailored to such data. The censored EMOS forecast distribution can be understood as a mixture of a point mass at the censoring threshold and a continuous part based on a truncated normal distribution. Parameter estimates of the censored EMOS model are obtained by minimizing the Continuous Ranked Probability Score (CRPS) over the training dataset. Model fitting on Box-Cox transformed data allows us to take account of the positive skewness of river discharge distributions. In order to achieve realistic forecast scenarios over an entire range of lead-times, there is a need for multivariate extensions. To this end, we smooth the marginal parameter estimates over lead-times. In order to obtain realistic scenarios of discharge evolution over time, the marginal distributions have to be linked with each other. To this end, the multivariate dependence structure can either be adopted from the raw ensemble like in Ensemble Copula Coupling (ECC), or be estimated from observations in a training period. 
The censored EMOS model has been applied to multi-model ensemble forecasts issued on a daily basis over a period of three years. For the two catchments considered, this resulted in well calibrated and sharp forecast distributions over all lead-times from 1 to 114 h. Training observations tended to be better indicators for the dependence structure than the raw ensemble.
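The censored forecast distribution described above, a point mass at the censoring threshold plus a continuous normal part above it, can be sketched as follows; the location, scale, and threshold values are hypothetical:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical post-processed parameters and censoring threshold.
mu, sigma, threshold = 1.5, 1.0, 0.8

def censored_cdf(y):
    """CDF of a normal forecast whose mass below the threshold collapses
    into a point mass at the threshold."""
    y = np.asarray(y, dtype=float)
    return np.where(y < threshold, 0.0, norm.cdf(y, mu, sigma))

# Sampling: draw from the underlying normal and censor from below.
rng = np.random.default_rng(3)
samples = np.maximum(rng.normal(mu, sigma, size=10_000), threshold)
point_mass = norm.cdf(threshold, mu, sigma)   # probability lumped at the threshold
```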

  20. Extensions to Multivariate Space Time Mixture Modeling of Small Area Cancer Data.

    PubMed

    Carroll, Rachel; Lawson, Andrew B; Faes, Christel; Kirby, Russell S; Aregay, Mehreteab; Watjou, Kevin

    2017-05-09

    Oral cavity and pharynx cancer, even when considered together, is a fairly rare disease. Implementation of multivariate modeling with lung and bronchus cancer, as well as melanoma cancer of the skin, could lead to better inference for oral cavity and pharynx cancer. The multivariate structure of these models is accomplished via the use of shared random effects, as well as other multivariate prior distributions. The results in this paper indicate that care should be taken when executing these types of models, and that multivariate mixture models may not always be the ideal option, depending on the data of interest.

  1. EXTENDING MULTIVARIATE DISTANCE MATRIX REGRESSION WITH AN EFFECT SIZE MEASURE AND THE ASYMPTOTIC NULL DISTRIBUTION OF THE TEST STATISTIC

    PubMed Central

    McArtor, Daniel B.; Lubke, Gitta H.; Bergeman, C. S.

    2017-01-01

    Person-centered methods are useful for studying individual differences in terms of (dis)similarities between response profiles on multivariate outcomes. Multivariate distance matrix regression (MDMR) tests the significance of associations of response profile (dis)similarities and a set of predictors using permutation tests. This paper extends MDMR by deriving and empirically validating the asymptotic null distribution of its test statistic, and by proposing an effect size for individual outcome variables, which is shown to recover true associations. These extensions alleviate the computational burden of permutation tests currently used in MDMR and render more informative results, thus making MDMR accessible to new research domains. PMID:27738957

  3. Heterogeneity Coefficients for Mahalanobis' D as a Multivariate Effect Size.

    PubMed

    Del Giudice, Marco

    2017-01-01

    The Mahalanobis distance D is the multivariate generalization of Cohen's d and can be used as a standardized effect size for multivariate differences between groups. An important issue in the interpretation of D is heterogeneity, that is, the extent to which contributions to the overall effect size are concentrated in a small subset of variables rather than evenly distributed across the whole set. Here I present two heterogeneity coefficients for D based on the Gini coefficient, a well-known index of inequality among values of a distribution. I discuss the properties and limitations of the two coefficients and illustrate their use by reanalyzing some published findings from studies of gender differences.
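A minimal sketch of the quantities involved: the Mahalanobis D between two groups, and a generic Gini coefficient over the per-variable standardized differences (this generic Gini is a stand-in for illustration, not necessarily either of the paper's two coefficients):

```python
import numpy as np

rng = np.random.default_rng(5)

# Two hypothetical groups: the effect is concentrated in the first variable.
n, p = 500, 4
a = rng.normal(size=(n, p))
b = rng.normal(size=(n, p))
b[:, 0] += 1.0                      # large univariate d on variable 0, ~0 elsewhere

diff = a.mean(axis=0) - b.mean(axis=0)
pooled_cov = (np.cov(a, rowvar=False) + np.cov(b, rowvar=False)) / 2
D = float(np.sqrt(diff @ np.linalg.solve(pooled_cov, diff)))   # Mahalanobis D

def gini(values):
    """Gini coefficient of non-negative values (0 = even, -> 1 = concentrated)."""
    v = np.sort(np.abs(values))
    m = v.size
    return (2 * np.arange(1, m + 1) - m - 1) @ v / (m * v.sum())

# Heterogeneity of the per-variable standardized differences (univariate d values).
d_per_var = np.abs(diff) / np.sqrt(np.diag(pooled_cov))
heterogeneity = gini(d_per_var)
```

Because the overall effect is carried almost entirely by one variable, the heterogeneity coefficient comes out high rather than near zero.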

  4. Multivariate quantile mapping bias correction: an N-dimensional probability density function transform for climate model simulations of multiple variables

    NASA Astrophysics Data System (ADS)

    Cannon, Alex J.

    2018-01-01

    Most bias correction algorithms used in climatology, for example quantile mapping, are applied to univariate time series. They neglect the dependence between different variables. Those that are multivariate often correct only limited measures of joint dependence, such as Pearson or Spearman rank correlation. Here, an image processing technique designed to transfer colour information from one image to another—the N-dimensional probability density function transform—is adapted for use as a multivariate bias correction algorithm (MBCn) for climate model projections/predictions of multiple climate variables. MBCn is a multivariate generalization of quantile mapping that transfers all aspects of an observed continuous multivariate distribution to the corresponding multivariate distribution of variables from a climate model. When applied to climate model projections, changes in quantiles of each variable between the historical and projection period are also preserved. The MBCn algorithm is demonstrated on three case studies. First, the method is applied to an image processing example with characteristics that mimic a climate projection problem. Second, MBCn is used to correct a suite of 3-hourly surface meteorological variables from the Canadian Centre for Climate Modelling and Analysis Regional Climate Model (CanRCM4) across a North American domain. Components of the Canadian Forest Fire Weather Index (FWI) System, a complicated set of multivariate indices that characterizes the risk of wildfire, are then calculated and verified against observed values. Third, MBCn is used to correct biases in the spatial dependence structure of CanRCM4 precipitation fields. Results are compared against a univariate quantile mapping algorithm, which neglects the dependence between variables, and two multivariate bias correction algorithms, each of which corrects a different form of inter-variable correlation structure. 
MBCn outperforms these alternatives, often by a large margin, particularly for annual maxima of the FWI distribution and spatiotemporal autocorrelation of precipitation fields.
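For contrast with MBCn, the univariate baseline it generalizes, empirical quantile mapping applied to one variable at a time, can be sketched on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(11)

# Synthetic example: model output is biased and too narrow relative to observations.
obs = rng.gamma(shape=2.0, scale=3.0, size=5000)
model = rng.gamma(shape=2.0, scale=2.0, size=5000) + 1.0

def quantile_map(x, model_ref, obs_ref):
    """Empirical quantile mapping: replace each model value by the observed
    value at the same quantile of the reference distributions."""
    ranks = np.searchsorted(np.sort(model_ref), x, side="right") / len(model_ref)
    ranks = np.clip(ranks, 1e-6, 1 - 1e-6)
    return np.quantile(obs_ref, ranks)

corrected = quantile_map(model, model, obs)
```

This corrects each marginal distribution but, unlike MBCn, leaves the dependence between variables untouched.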

  5. The return period analysis of natural disasters with statistical modeling of bivariate joint probability distribution.

    PubMed

    Li, Ning; Liu, Xueqin; Xie, Wei; Wu, Jidong; Zhang, Peng

    2013-01-01

    New features of natural disasters have been observed over the last several years. The factors that influence the disasters' formation mechanisms, regularity of occurrence and main characteristics have been revealed to be more complicated and diverse in nature than previously thought. As the uncertainty involved increases, the variables need to be examined further. This article discusses the importance of, and the shortage of, multivariate analysis of natural disasters, and presents a method to estimate the joint probability of the return periods and perform a risk analysis. Severe dust storms from 1990 to 2008 in Inner Mongolia were used as a case study to test this new methodology, as they are normal and recurring climatic phenomena on Earth. Based on the 79 investigated events and a bivariate definition of dust storms, the joint probability distribution of severe dust storms was established using the observed data on maximum wind speed and duration. The joint return periods of severe dust storms were calculated, and the relevant risk was analyzed according to the joint probability. The copula function is able to simulate severe dust storm disasters accurately. The joint return periods generated are closer to those observed in reality than the univariate return periods and thus have more value in severe dust storm disaster mitigation, strategy making, program design, and improvement of risk management. This research may prove useful in risk-based decision making. The exploration of multivariate analysis methods can also lay the foundation for further applications in natural disaster risk analysis. © 2012 Society for Risk Analysis.
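A joint "AND" return period from a bivariate copula can be sketched as follows; the Gumbel family, the dependence parameter, and the marginal probabilities below are assumptions for illustration, since the abstract does not specify the copula used:

```python
import numpy as np

def gumbel_copula(u, v, theta):
    """Gumbel copula C(u, v); theta >= 1 controls upper-tail dependence."""
    return np.exp(-(((-np.log(u)) ** theta + (-np.log(v)) ** theta) ** (1.0 / theta)))

# Hypothetical marginal non-exceedance probabilities of a severe dust storm's
# maximum wind speed and duration, and an assumed dependence parameter.
u, v, theta = 0.98, 0.95, 2.0
events_per_year = 79 / 19.0     # 79 events over 1990-2008

# "AND" joint exceedance: both wind speed and duration exceed their thresholds.
p_joint_exceed = 1.0 - u - v + gumbel_copula(u, v, theta)
joint_return_period = 1.0 / (events_per_year * p_joint_exceed)
```

With positive dependence, the joint exceedance probability is larger than under independence, so the joint return period is shorter than the naive product of marginals would suggest.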

  6. Multivariate multiscale entropy of financial markets

    NASA Astrophysics Data System (ADS)

    Lu, Yunfan; Wang, Jun

    2017-11-01

    In the effort to quantify the dynamical properties of complex phenomena in financial market systems, multivariate financial time series have attracted wide attention. In this work, considering the shortcomings and limitations of univariate multiscale entropy in analyzing multivariate time series, the multivariate multiscale sample entropy (MMSE), which can evaluate the complexity in multiple data channels over different timescales, is applied to quantify the complexity of financial markets. Its effectiveness and advantages are demonstrated in numerical simulations with two well-known synthetic noise signals. For the first time, the complexity of four generated trivariate return series for each stock trading hour in China's stock markets is quantified thanks to the interdisciplinary application of this method. We find that the complexity of the trivariate return series in each hour shows a significant decreasing trend as stock trading time progresses. Further, the shuffled multivariate return series and the absolute multivariate return series are also analyzed. As another new attempt, the complexity of global stock markets (Asia, Europe and America) is quantified by analyzing the multivariate returns from them. Finally, we utilize the multivariate multiscale entropy to assess the relative complexity of normalized multivariate return volatility series with different degrees.

  7. Is Middle-Upper Arm Circumference "normally" distributed? Secondary data analysis of 852 nutrition surveys.

    PubMed

    Frison, Severine; Checchi, Francesco; Kerac, Marko; Nicholas, Jennifer

    2016-01-01

    Wasting is a major public health issue throughout the developing world. Out of the 6.9 million estimated deaths among children under five annually, over 800,000 deaths (11.6 %) are attributed to wasting. Wasting is quantified as low Weight-For-Height (WFH) and/or low Mid-Upper Arm Circumference (MUAC) (since 2005). Many statistical procedures are based on the assumption that the data used are normally distributed. Analyses have been conducted on the distribution of WFH but there are no equivalent studies on the distribution of MUAC. This secondary data analysis assesses the normality of the MUAC distributions of 852 nutrition cross-sectional survey datasets of children from 6 to 59 months old and examines different approaches to normalise "non-normal" distributions. The distribution of MUAC showed no departure from a normal distribution in 319 (37.7 %) distributions using the Shapiro-Wilk test. Out of the 533 surveys showing departure from a normal distribution, 183 (34.3 %) were skewed (D'Agostino test) and 196 (36.8 %) had a kurtosis different to the one observed in the normal distribution (Anscombe-Glynn test). Testing for normality can be sensitive to data quality, design effect and sample size. Out of the 533 surveys showing departure from a normal distribution, 294 (55.2 %) showed high digit preference, 164 (30.8 %) had a large design effect, and 204 (38.3 %) a large sample size. Spline and LOESS smoothing techniques were explored and both techniques work well. After Spline smoothing, 56.7 % of the MUAC distributions showing departure from normality were "normalised" and 59.7 % after LOESS. Box-Cox power transformation had similar results on distributions showing departure from normality with 57 % of distributions approximating "normal" after transformation. Applying Box-Cox transformation after Spline or Loess smoothing techniques increased that proportion to 82.4 and 82.7 % respectively. 
This suggests that statistical approaches relying on the normal distribution assumption can be successfully applied to MUAC. In light of this promising finding, further research is ongoing to evaluate the performance of a normal distribution based approach to estimating the prevalence of wasting using MUAC.
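The testing-and-transformation workflow described above can be sketched on synthetic positively skewed data (a stand-in for a non-normal MUAC sample): a Shapiro-Wilk test followed by a Box-Cox power transformation with maximum-likelihood lambda:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2016)

# Synthetic positively skewed measurements (stand-in for a non-normal MUAC sample).
x = rng.lognormal(mean=5.0, sigma=0.15, size=400)

# Shapiro-Wilk: a small p-value indicates departure from normality.
_, p_raw = stats.shapiro(x)

# Box-Cox power transformation with maximum-likelihood choice of lambda.
x_bc, lam = stats.boxcox(x)
_, p_bc = stats.shapiro(x_bc)

skew_raw = stats.skew(x)    # positive for the raw sample
skew_bc = stats.skew(x_bc)  # should be much closer to zero
```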

  8. Evaluation of the microscopic distribution of florfenicol in feed pellets for salmon by Fourier Transform infrared imaging and multivariate analysis.

    PubMed

    Bastidas, Camila Y; von Plessing, Carlos; Troncoso, José; Del P Castillo, Rosario

    2018-04-15

    Fourier Transform infrared imaging and multivariate analysis were used to identify, at the microscopic level, the presence of florfenicol (FF), a heavily used antibiotic in the salmon industry, supplied to fish in feed pellets for the treatment of salmonid rickettsial septicemia (SRS). The FF distribution was evaluated using Principal Component Analysis (PCA) and Augmented Multivariate Curve Resolution with Alternating Least Squares (augmented MCR-ALS) on the spectra obtained from images with pixel sizes of 6.25 μm × 6.25 μm and 1.56 μm × 1.56 μm, in different zones of feed pellets. Since the concentration of the drug was only 3.44 mg FF/g pellet, this is the first report showing the power of spectroscopic techniques combined with multivariate analysis, especially augmented MCR-ALS, to describe the FF distribution in both the surface and inner parts of feed pellets at low concentration, in a complex matrix and at the microscopic level. The results allow monitoring of the incorporation of the drug into the feed pellets. Copyright © 2018 Elsevier B.V. All rights reserved.

  9. Clustering of change patterns using Fourier coefficients.

    PubMed

    Kim, Jaehee; Kim, Haseong

    2008-01-15

    To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a time period, because biologically related gene groups can share the same change patterns. Many clustering algorithms have been proposed to group observation data. However, because of the complexity of the underlying functions, there have not been many studies on grouping data based on change patterns. In this study, the problem of finding similar change patterns is reduced to clustering with the derivative Fourier coefficients. The sample Fourier coefficients not only provide information about the underlying functions, but also reduce the dimension. In addition, as their limiting distribution is a multivariate normal, a model-based clustering method incorporating statistical properties would be appropriate. This work is aimed at discovering gene groups with similar change patterns that share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. The model-based method is advantageous over other methods in our proposed model because the sample Fourier coefficients asymptotically follow the multivariate normal distribution. Change patterns are automatically estimated with the Fourier representation in our model. Our model was tested in simulations and on real gene data sets. The simulation results showed that the model-based clustering method with the sample Fourier coefficients has a lower clustering error rate than K-means clustering. Even when the number of repeated time points was small, the same results were obtained. We also applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization.
It showed that, as the method clusters with the probability-neighboring data, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns. The R program is available upon the request.
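
    As a rough, hypothetical illustration of the idea in this record, the sketch below summarizes synthetic expression profiles by their first few sample Fourier coefficients and clusters those coefficients. For brevity it clusters the coefficients of the profiles themselves (not derivative estimates) and uses plain k-means in place of the authors' model-based clustering; all data and parameters are invented.

    ```python
    import numpy as np
    from scipy.cluster.vq import kmeans2

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 24, endpoint=False)

    # Two synthetic "gene groups" with opposite change patterns (hypothetical data).
    rising = [np.sin(2 * np.pi * t) + rng.normal(0, 0.1, t.size) for _ in range(20)]
    falling = [-np.sin(2 * np.pi * t) + rng.normal(0, 0.1, t.size) for _ in range(20)]
    X = np.array(rising + falling)

    def fourier_features(x, n_coef=3):
        """First few sample Fourier coefficients (real/imag parts) as a low-dim summary."""
        c = np.fft.rfft(x) / x.size
        feats = []
        for k in range(1, n_coef + 1):
            feats += [c[k].real, c[k].imag]
        return np.array(feats)

    F = np.array([fourier_features(x) for x in X])

    # Stand-in for the paper's model-based clustering: plain k-means on the coefficients.
    _, labels = kmeans2(F, 2, seed=1, minit='++')
    ```

    Profiles with opposite change patterns land in different clusters because their leading Fourier coefficients differ in sign.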

  10. Analysis of quantitative data obtained from toxicity studies showing non-normal distribution.

    PubMed

    Kobayashi, Katsumi

    2005-05-01

    The data obtained from toxicity studies are examined for homogeneity of variance but usually not for normal distribution. In this study, I examined the measured items of a rat carcinogenicity/chronic toxicity study for both homogeneity of variance and normal distribution. Many hematology and biochemistry items showed non-normal distributions. To test the normality of data obtained from toxicity studies, the data of the concurrent control group may be examined; for data that show a non-normal distribution, robust non-parametric tests may be applied.

  11. On the efficacy of procedures to normalize Ex-Gaussian distributions.

    PubMed

    Marmolejo-Ramos, Fernando; Cousineau, Denis; Benites, Luis; Maehara, Rocío

    2014-01-01

    Reaction time (RT) is one of the most common measures used in experimental psychology. Its distribution is not normal (Gaussian) but resembles a convolution of normal and exponential distributions (the Ex-Gaussian). Because a major assumption of parametric tests (such as ANOVA) is that variables are normally distributed, it is widely acknowledged that RT data violate the normality assumption. This paper presents procedures to normalize data sampled from an Ex-Gaussian distribution so that they become suitable for parametric tests based on the normality assumption. Using simulation studies, various outlier-elimination and transformation procedures were tested against the level of normality they provide. The results suggest that transformation methods are better than elimination methods at normalizing positively skewed data, and that the more skewed the distribution, the more effective the transformation methods are. Specifically, the power transformation with parameter lambda = -1 leads to the best results.
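
    A minimal sketch of the transformation the abstract singles out, on simulated data (the Ex-Gaussian parameters below are invented, not taken from the paper): draw Ex-Gaussian reaction times as a normal plus an exponential component, then apply the power transform with lambda = -1 (the reciprocal, negated to preserve order).

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    # Ex-Gaussian RTs: a normal component convolved with an exponential tail
    # (mu, sigma, tau are illustrative values in milliseconds).
    mu, sigma, tau = 400.0, 40.0, 200.0
    rt = rng.normal(mu, sigma, 5000) + rng.exponential(tau, 5000)

    # Power transform with lambda = -1: the reciprocal, negated to keep the ordering.
    transformed = -1.0 / rt

    skew_before = stats.skew(rt)
    skew_after = stats.skew(transformed)
    ```

    The strong positive skew of the raw RTs shrinks markedly after the transform.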

  12. Alkaline phosphatase normalization is a biomarker of improved survival in primary sclerosing cholangitis.

    PubMed

    Hilscher, Moira; Enders, Felicity B; Carey, Elizabeth J; Lindor, Keith D; Tabibian, James H

    2016-01-01

    Recent studies suggest that serum alkaline phosphatase may represent a prognostic biomarker in patients with primary sclerosing cholangitis, but this association remains poorly understood. The aim of this study was therefore to investigate the prognostic significance and clinical correlates of alkaline phosphatase normalization in primary sclerosing cholangitis. This was a retrospective cohort study of patients with a new diagnosis of primary sclerosing cholangitis made at an academic medical center. The primary endpoint was time to hepatobiliary neoplasia, liver transplantation, or liver-related death. Secondary endpoints included occurrence of and time to alkaline phosphatase normalization. Patients who did and did not achieve normalization were compared with respect to clinical characteristics and endpoint-free survival, and the association between normalization and the primary endpoint was assessed with univariate and multivariate Cox proportional-hazards analyses. Eighty-six patients were included in the study, with a total of 755 patient-years of follow-up. Thirty-eight patients (44%) achieved alkaline phosphatase normalization within 12 months of diagnosis. Alkaline phosphatase normalization was associated with longer primary endpoint-free survival (p = 0.0032) and decreased risk of requiring liver transplantation (p = 0.033). Persistent normalization was associated with even fewer adverse endpoints as well as longer survival. In multivariate analyses, alkaline phosphatase normalization (adjusted hazard ratio 0.21, p = 0.012) and baseline bilirubin (adjusted hazard ratio 4.87, p = 0.029) were the only significant predictors of primary endpoint-free survival. Alkaline phosphatase normalization, particularly if persistent, represents a robust biomarker of improved long-term survival and decreased risk of requiring liver transplantation in patients with primary sclerosing cholangitis.

  13. Comparison of Two Procedures for Analyzing Small Sets of Repeated Measures Data

    ERIC Educational Resources Information Center

    Vallejo, Guillermo; Livacic-Rojas, Pablo

    2005-01-01

    This article compares two methods for analyzing small sets of repeated measures data under normal and non-normal heteroscedastic conditions: a mixed model approach with the Kenward-Roger correction and a multivariate extension of the modified Brown-Forsythe (BF) test. These procedures differ in their assumptions about the covariance structure of…

  14. Modelling lifetime data with multivariate Tweedie distribution

    NASA Astrophysics Data System (ADS)

    Nor, Siti Rohani Mohd; Yusof, Fadhilah; Bahar, Arifah

    2017-05-01

    This study aims to measure the dependence between individual lifetimes by applying a multivariate Tweedie distribution to lifetime data. Incorporating dependence between lifetimes into the mortality model is a relatively new idea that has a significant impact on the risk of an annuity portfolio, in contrast to standard actuarial methods, which assume independence between lifetimes. Hence, this paper applies the Tweedie family of distributions to a portfolio of lifetimes to induce dependence between lives. The Tweedie distribution is chosen because it encompasses symmetric and non-symmetric, as well as light-tailed and heavy-tailed, distributions. Parameter estimation is modified, using the method of moments, in order to fit the Tweedie distribution to the data. In addition, a comparison stage checks the adequacy of the fit between observed and expected mortality. Finally, the importance of including systematic mortality risk in the model is justified by Pearson's chi-squared test.

  15. Modeling and Simulation of Upset-Inducing Disturbances for Digital Systems in an Electromagnetic Reverberation Chamber

    NASA Technical Reports Server (NTRS)

    Torres-Pomales, Wilfredo

    2014-01-01

    This report describes a modeling and simulation approach for disturbance patterns representative of the environment experienced by a digital system in an electromagnetic reverberation chamber. The disturbance is modeled by a multivariate statistical distribution based on empirical observations. Extended versions of the Rejection Sampling and Inverse Transform Sampling techniques are developed to generate multivariate random samples of the disturbance. The results show that Inverse Transform Sampling returns samples with higher fidelity relative to the empirical distribution. This work is part of an ongoing effort to develop a resilience assessment methodology for complex safety-critical distributed systems.
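
    A univariate sketch of the Inverse Transform Sampling idea that the report extends to the multivariate case (the Gamma-distributed "observations" below are synthetic stand-ins for the empirical disturbance data): draw uniforms and read off empirical quantiles.

    ```python
    import numpy as np

    rng = np.random.default_rng(7)

    # Empirical observations of a disturbance amplitude (hypothetical data).
    observed = rng.gamma(shape=2.0, scale=1.5, size=2000)

    # Inverse Transform Sampling from the empirical CDF:
    # sort the data, draw U ~ Uniform(0,1), and read off the empirical quantile.
    sorted_obs = np.sort(observed)
    u = rng.uniform(0.0, 1.0, size=5000)
    idx = np.minimum((u * sorted_obs.size).astype(int), sorted_obs.size - 1)
    samples = sorted_obs[idx]
    ```

    The generated samples preserve the marginal shape of the empirical distribution; their mean and range track the observed data closely.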

  16. Comparing interval estimates for small sample ordinal CFA models

    PubMed Central

    Natesan, Prathiba

    2015-01-01

    Robust maximum likelihood (RML) and asymptotically generalized least squares (AGLS) methods have been recommended for fitting ordinal structural equation models. Studies show that some of these methods underestimate standard errors, but they have not investigated the coverage and bias of interval estimates. An estimate with a reasonable standard error could still be severely biased; this can only be known by systematically investigating the interval estimates. The present study compares Bayesian, RML, and AGLS interval estimates of factor correlations in ordinal confirmatory factor analysis (CFA) models for small-sample data. Six sample sizes, 3 factor correlations, and 2 factor score distributions (multivariate normal and multivariate mildly skewed) were studied, along with two Bayesian prior specifications, informative and relatively less informative. Undercoverage of confidence intervals and underestimation of standard errors were common in non-Bayesian methods. Underestimated standard errors may lead to inflated Type-I error rates. Non-Bayesian intervals were more often positively biased than negatively biased; that is, most intervals that did not contain the true value lay above it. Some non-Bayesian methods had non-converging and inadmissible solutions for small samples and non-normal data. Bayesian empirical standard error estimates for informative and relatively less informative priors were closer to the average standard errors of the estimates. The coverage of Bayesian credibility intervals was closer to what was expected, with overcoverage in a few cases. Although some Bayesian credibility intervals were wider, they reflected the statistical uncertainty that comes with the data (e.g., a small sample). Bayesian point estimates were also more accurate than non-Bayesian estimates. 
The results illustrate the importance of analyzing coverage and bias of interval estimates, and how ignoring interval estimates can be misleading. Therefore, editors and policymakers should continue to emphasize the inclusion of interval estimates in research. PMID:26579002

  17. Comparing interval estimates for small sample ordinal CFA models.

    PubMed

    Natesan, Prathiba

    2015-01-01

    Robust maximum likelihood (RML) and asymptotically generalized least squares (AGLS) methods have been recommended for fitting ordinal structural equation models. Studies show that some of these methods underestimate standard errors, but they have not investigated the coverage and bias of interval estimates. An estimate with a reasonable standard error could still be severely biased; this can only be known by systematically investigating the interval estimates. The present study compares Bayesian, RML, and AGLS interval estimates of factor correlations in ordinal confirmatory factor analysis (CFA) models for small-sample data. Six sample sizes, 3 factor correlations, and 2 factor score distributions (multivariate normal and multivariate mildly skewed) were studied, along with two Bayesian prior specifications, informative and relatively less informative. Undercoverage of confidence intervals and underestimation of standard errors were common in non-Bayesian methods. Underestimated standard errors may lead to inflated Type-I error rates. Non-Bayesian intervals were more often positively biased than negatively biased; that is, most intervals that did not contain the true value lay above it. Some non-Bayesian methods had non-converging and inadmissible solutions for small samples and non-normal data. Bayesian empirical standard error estimates for informative and relatively less informative priors were closer to the average standard errors of the estimates. The coverage of Bayesian credibility intervals was closer to what was expected, with overcoverage in a few cases. Although some Bayesian credibility intervals were wider, they reflected the statistical uncertainty that comes with the data (e.g., a small sample). Bayesian point estimates were also more accurate than non-Bayesian estimates. 
The results illustrate the importance of analyzing coverage and bias of interval estimates, and how ignoring interval estimates can be misleading. Therefore, editors and policymakers should continue to emphasize the inclusion of interval estimates in research.

  18. Drunk driving detection based on classification of multivariate time series.

    PubMed

    Li, Zhenlong; Jin, Xue; Zhao, Xiaohua

    2015-09-01

    This paper addresses the problem of detecting drunk driving based on classification of multivariate time series. First, driving performance measures were collected from a test in a driving simulator located in the Traffic Research Center, Beijing University of Technology. Lateral position and steering angle were used to detect drunk driving. Second, multivariate time series analysis was performed to extract features: a piecewise linear representation was used to represent the multivariate time series, a bottom-up algorithm was employed to segment them, and the slope and time interval of each segment were extracted as the features for classification. Third, a support vector machine classifier was used to classify the driver's state into two classes (normal or drunk) according to the extracted features. The proposed approach achieved an accuracy of 80.0%. Drunk driving detection based on the analysis of multivariate time series is thus feasible and effective, and the approach has practical implications for drunk driving detection. Copyright © 2015 Elsevier Ltd and National Safety Council. All rights reserved.
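
    A hypothetical sketch of the bottom-up piecewise linear representation step described above (variable names and the merge criterion are illustrative; the paper's exact algorithm may differ): start from minimal segments and repeatedly merge the adjacent pair with the smallest merged fit error, until no merge stays under the error budget.

    ```python
    import numpy as np

    def fit_error(y):
        """Squared error of a least-squares line through the points of one segment."""
        x = np.arange(len(y))
        if len(y) < 3:
            return 0.0
        slope, intercept = np.polyfit(x, y, 1)
        return float(np.sum((y - (slope * x + intercept)) ** 2))

    def bottom_up_segments(y, max_error):
        """Bottom-up piecewise linear representation: start from tiny segments,
        repeatedly merge the adjacent pair whose merged fit error is smallest."""
        bounds = [(i, min(i + 2, len(y))) for i in range(0, len(y), 2)]
        while len(bounds) > 1:
            costs = [fit_error(y[a:c]) for (a, _), (_, c) in zip(bounds, bounds[1:])]
            k = int(np.argmin(costs))
            if costs[k] > max_error:
                break
            bounds[k] = (bounds[k][0], bounds[k + 1][1])
            del bounds[k + 1]
        return bounds

    # Toy "steering" signal with one slope change.
    y = np.concatenate([np.linspace(0, 10, 50), np.linspace(10, 0, 50)])
    segments = bottom_up_segments(y, max_error=1.0)
    ```

    The slope and duration of each returned segment would then serve as classification features, per the abstract.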

  19. Advanced clinical interpretation of the Delis-Kaplan Executive Function System: multivariate base rates of low scores.

    PubMed

    Karr, Justin E; Garcia-Barrera, Mauricio A; Holdnack, James A; Iverson, Grant L

    2018-01-01

    Multivariate base rates allow for the simultaneous statistical interpretation of multiple test scores, quantifying the normal frequency of low scores on a test battery. This study provides multivariate base rates for the Delis-Kaplan Executive Function System (D-KEFS). The D-KEFS consists of 9 tests with 16 Total Achievement scores (i.e. primary indicators of executive function ability). Stratified by education and intelligence, multivariate base rates were derived for the full D-KEFS and an abbreviated four-test battery (i.e. Trail Making, Color-Word Interference, Verbal Fluency, and Tower Test) using the adult portion of the normative sample (ages 16-89). Multivariate base rates are provided for the full and four-test D-KEFS batteries, calculated using five low score cutoffs (i.e. ≤25th, 16th, 9th, 5th, and 2nd percentiles). Low scores were common in the D-KEFS normative sample, with 82.6% and 71.8% of participants obtaining at least one score ≤16th percentile for the full and four-test batteries, respectively. Intelligence and education were inversely related to low score frequency. The base rates provided herein allow clinicians to interpret multiple D-KEFS scores simultaneously for the full D-KEFS and an abbreviated battery of commonly administered tests. The use of these base rates will support clinicians when differentiating between normal variation in cognitive performance and true executive function deficits.

  20. Towards exaggerated emphysema stereotypes

    NASA Astrophysics Data System (ADS)

    Chen, C.; Sørensen, L.; Lauze, F.; Igel, C.; Loog, M.; Feragen, A.; de Bruijne, M.; Nielsen, M.

    2012-03-01

    Classification is widely used in medical image analysis; to illustrate the mechanism of a classifier, we introduce the notion of an exaggerated image stereotype based on the training data and the trained classifier. The stereotype of an image class of interest should emphasize and exaggerate the characteristic patterns of that class and visualize the information the employed classifier relies on. This is useful for gaining insight into the classification and allows comparison with biological models of disease. In this work, we build exaggerated image stereotypes by optimizing an objective function consisting of a discriminative term based on the classification accuracy and a generative term based on the class distributions. A gradient descent method based on iterated conditional modes (ICM) is employed for optimization. We use this idea with Fisher's linear discriminant rule and assume a multivariate normal distribution for samples within a class. The proposed framework is applied to computed tomography (CT) images of lung tissue with emphysema. The synthesized stereotypes illustrate the exaggerated patterns of emphysematous lung tissue, which is underpinned by three different quantitative evaluation methods.

  1. Stewart analysis of apparently normal acid-base state in the critically ill.

    PubMed

    Moviat, Miriam; van den Boogaard, Mark; Intven, Femke; van der Voort, Peter; van der Hoeven, Hans; Pickkers, Peter

    2013-12-01

    This study aimed to describe Stewart parameters in critically ill patients with an apparently normal acid-base state and to determine the incidence of mixed metabolic acid-base disorders in these patients. We conducted a prospective, observational multicenter study of 312 consecutive Dutch intensive care unit patients with normal pH (7.35 ≤ pH ≤ 7.45) on days 3 to 5. Apparent (SIDa) and effective strong ion difference (SIDe) and strong ion gap (SIG) were calculated from 3 consecutive arterial blood samples. Multivariate linear regression analysis was performed to analyze factors potentially associated with levels of SIDa and SIG. A total of 137 patients (44%) were identified with an apparently normal acid-base state (normal pH and -2 < base excess < 2 and 35 < PaCO2 < 45 mm Hg). In this group, SIDa values were 36.6 ± 3.6 mEq/L, resulting from hyperchloremia (109 ± 4.6 mEq/L, sodium-chloride difference 30.0 ± 3.6 mEq/L); SIDe values were 33.5 ± 2.3 mEq/L, resulting from hypoalbuminemia (24.0 ± 6.2 g/L); and SIG values were 3.1 ± 3.1 mEq/L. During admission, base excess increased secondary to a decrease in SIG levels and, subsequently, an increase in SIDa levels. Levels of SIDa were associated with positive cation load, chloride load, and admission SIDa (multivariate R² = 0.40, P < .001). Levels of SIG were associated with kidney function, sepsis, and SIG levels at intensive care unit admission (multivariate R² = 0.28, P < .001). Intensive care unit patients with an apparently normal acid-base state have an underlying mixed metabolic acid-base disorder characterized by the acidifying effects of a low SIDa (caused by hyperchloremia) and a high SIG, combined with the alkalinizing effect of hypoalbuminemia. © 2013.
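
    For orientation, the Stewart quantities named above can be computed from routine chemistry. The sketch below uses commonly cited simplified (Figge-style) formulas and invented illustrative values; the study's exact computations and coefficients may differ.

    ```python
    # Stewart acid-base parameters from plasma chemistry (mEq/L unless noted).
    # Formulas are common simplified versions (Figge-style); the study's exact
    # regressions may differ. Input values are illustrative, not patient data.
    na, k, ca, mg, cl, lactate = 140.0, 4.0, 2.4, 1.4, 109.0, 1.0  # mEq/L
    albumin_g_l = 24.0       # g/L
    phosphate_mmol_l = 1.0   # mmol/L
    hco3 = 25.0              # mEq/L
    ph = 7.40

    # Apparent strong ion difference: strong cations minus strong anions.
    sid_apparent = (na + k + ca + mg) - (cl + lactate)

    # Effective SID: bicarbonate plus the weak-acid charge of albumin and phosphate.
    albumin_charge = albumin_g_l * (0.123 * ph - 0.631)
    phosphate_charge = phosphate_mmol_l * (0.309 * ph - 0.469)
    sid_effective = hco3 + albumin_charge + phosphate_charge

    # Strong ion gap: unmeasured ions.
    sig = sid_apparent - sid_effective
    ```

    With these example values, SIDa is low because of the high chloride, and a positive SIG remains after accounting for albumin and phosphate, mirroring the pattern the abstract describes.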

  2. Robust tests for multivariate factorial designs under heteroscedasticity.

    PubMed

    Vallejo, Guillermo; Ato, Manuel

    2012-06-01

    The question of how to analyze several multivariate normal mean vectors when normality and covariance homogeneity assumptions are violated is considered in this article. For the two-way MANOVA layout, we address this problem adapting results presented by Brunner, Dette, and Munk (BDM; 1997) and Vallejo and Ato (modified Brown-Forsythe [MBF]; 2006) in the context of univariate factorial and split-plot designs and a multivariate version of the linear model (MLM) to accommodate heterogeneous data. Furthermore, we compare these procedures with the Welch-James (WJ) approximate degrees of freedom multivariate statistics based on ordinary least squares via Monte Carlo simulation. Our numerical studies show that of the methods evaluated, only the modified versions of the BDM and MBF procedures were robust to violations of underlying assumptions. The MLM approach was only occasionally liberal, and then by only a small amount, whereas the WJ procedure was often liberal if the interactive effects were involved in the design, particularly when the number of dependent variables increased and total sample size was small. On the other hand, it was also found that the MLM procedure was uniformly more powerful than its most direct competitors. The overall success rate was 22.4% for the BDM, 36.3% for the MBF, and 45.0% for the MLM.

  3. Multivariate analysis of cytokine profiles in pregnancy complications.

    PubMed

    Azizieh, Fawaz; Dingle, Kamaludin; Raghupathy, Raj; Johnson, Kjell; VanderPlas, Jacob; Ansari, Ali

    2018-03-01

    Immunoregulation that tolerates the semiallogeneic fetus during pregnancy involves a harmonious dynamic balance between anti- and pro-inflammatory cytokines. Several earlier studies reported significantly different levels and/or ratios of several cytokines in complicated pregnancies as compared to normal pregnancy. However, as cytokines operate in networks with potentially complex interactions, it is also informative to compare groups on multi-cytokine data sets using multivariate analysis. Such analysis further examines how great the differences are and which cytokines differ more than others. Various multivariate statistical tools, such as the Cramér test, classification and regression trees, partial least squares regression, the 2-dimensional Kolmogorov-Smirnov test, principal component analysis, and the gap statistic, were used to compare cytokine data of normal vs anomalous groups across different pregnancy complications. Multivariate analysis helped to examine whether the groups were different, how strongly they differed, and in what ways they differed, and it further revealed evidence for subgroups within one group (pregnancy-induced hypertension), possibly indicating multiple causes for the complication. This work contributes to a better understanding of cytokine interactions and may have important implications for targeting cytokine balance modulation or for the design of future medications or interventions that best direct management or prevention from an immunological approach. © 2018 The Authors. American Journal of Reproductive Immunology Published by John Wiley & Sons Ltd.

  4. The discrete Laplace exponential family and estimation of Y-STR haplotype frequencies.

    PubMed

    Andersen, Mikkel Meyer; Eriksen, Poul Svante; Morling, Niels

    2013-07-21

    Estimating haplotype frequencies is important in e.g. forensic genetics, where the frequencies are needed to calculate the likelihood ratio for the evidential weight of a DNA profile found at a crime scene. Estimation is naturally based on a population model, motivating the investigation of the Fisher-Wright model of evolution for haploid lineage DNA markers. An exponential family (a class of probability distributions that is well understood in probability theory such that inference is easily made by using existing software) called the 'discrete Laplace distribution' is described. We illustrate how well the discrete Laplace distribution approximates a more complicated distribution that arises by investigating the well-known population genetic Fisher-Wright model of evolution by a single-step mutation process. It was shown how the discrete Laplace distribution can be used to estimate haplotype frequencies for haploid lineage DNA markers (such as Y-chromosomal short tandem repeats), which in turn can be used to assess the evidential weight of a DNA profile found at a crime scene. This was done by making inference in a mixture of multivariate, marginally independent, discrete Laplace distributions using the EM algorithm to estimate the probabilities of membership of a set of unobserved subpopulations. The discrete Laplace distribution can be used to estimate haplotype frequencies with lower prediction error than other existing estimators. Furthermore, the calculations could be performed on a normal computer. This method was implemented in the freely available open source software R that is supported on Linux, MacOS and MS Windows. Copyright © 2013 Elsevier Ltd. All rights reserved.

  5. A comparison of confidence interval methods for the concordance correlation coefficient and intraclass correlation coefficient with small number of raters.

    PubMed

    Feng, Dai; Svetnik, Vladimir; Coimbra, Alexandre; Baumgartner, Richard

    2014-01-01

    The intraclass correlation coefficient (ICC) with fixed raters or, equivalently, the concordance correlation coefficient (CCC) for continuous outcomes is a widely accepted aggregate index of agreement in settings with a small number of raters. Quantifying the precision of the CCC by constructing its confidence interval (CI) is important in early drug development applications, in particular in the qualification of biomarker platforms. In recent years, several new methods have been proposed for constructing CIs for the CCC, but a comprehensive comparison has not been attempted. The methods considered were the delta method and jackknifing, each with and without Fisher's Z-transformation, and Bayesian methods with vague priors. In this study, we carried out a simulation study, with data simulated from a multivariate normal as well as a heavier-tailed distribution (t-distribution with 5 degrees of freedom), to compare the state-of-the-art methods for assigning a CI to the CCC. When the data were normally distributed, jackknifing with Fisher's Z-transformation (JZ) tended to provide superior coverage, and the difference between it and the closest competitor, the Bayesian method with the Jeffreys prior, was in general minimal. For the nonnormal data, the jackknife methods, especially the JZ method, provided coverage probabilities closest to nominal, in contrast to the others, which yielded overly liberal coverage. Approaches based on the delta method and the Bayesian method with a conjugate prior generally provided slightly narrower intervals and larger lower bounds than the others, though this was offset by their poor coverage. Finally, we illustrate the utility of CIs for the CCC in an example of a wake after sleep onset (WASO) biomarker, which is frequently used in clinical sleep studies of drugs for the treatment of insomnia.
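
    A compact sketch of the CCC and the jackknife-with-Fisher's-Z (JZ) interval discussed above. This is an illustrative reimplementation on synthetic two-rater data, not the authors' code; the two-sided multiplier 2.0 roughly approximates a 95% interval.

    ```python
    import numpy as np

    def ccc(x, y):
        """Lin's concordance correlation coefficient between two raters."""
        mx, my = x.mean(), y.mean()
        vx, vy = x.var(), y.var()  # population (1/n) variances
        sxy = ((x - mx) * (y - my)).mean()
        return 2 * sxy / (vx + vy + (mx - my) ** 2)

    def ccc_ci_jackknife_z(x, y, level=2.0):
        """Jackknife CI for the CCC on Fisher's Z scale (the 'JZ' approach)."""
        n = len(x)
        z = np.arctanh(ccc(x, y))
        # leave-one-out pseudo-values on the Z scale
        loo = np.array([np.arctanh(ccc(np.delete(x, i), np.delete(y, i)))
                        for i in range(n)])
        pseudo = n * z - (n - 1) * loo
        se = pseudo.std(ddof=1) / np.sqrt(n)
        lo, hi = pseudo.mean() - level * se, pseudo.mean() + level * se
        return np.tanh(lo), np.tanh(hi)  # back-transform to the CCC scale

    # Synthetic agreement data: rater 2 = rater 1 plus noise.
    rng = np.random.default_rng(5)
    x = rng.normal(0.0, 1.0, 50)
    y = x + rng.normal(0.0, 0.3, 50)
    point = ccc(x, y)
    lo, hi = ccc_ci_jackknife_z(x, y)
    ```

    The back-transform through tanh keeps the interval inside (-1, 1), which is one reason the Z-scale variant tends to have better coverage.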

  6. The PIT-trap-A "model-free" bootstrap procedure for inference about regression models with discrete, multivariate responses.

    PubMed

    Warton, David I; Thibaut, Loïc; Wang, Yi Alice

    2017-01-01

    Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)-common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of "model-free bootstrap", adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods.
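
    A univariate, hypothetical sketch of the PIT idea behind the PIT-trap, for a Poisson response with known means (in practice the means would be fitted, and the paper resamples rows of multivariate PIT residuals): randomize within the probability mass at each observed count so the residuals are uniform under the model, resample them, and map back through the quantile function.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)

    # Poisson responses with known (here: true) means; in practice mu would be fitted.
    mu = np.array([2.0, 5.0, 1.0, 8.0] * 25)
    y = rng.poisson(mu)

    # Probability integral transform residuals for a discrete response:
    # u ~ Uniform(0,1) under the model, by randomizing within the mass at y.
    v = rng.uniform(size=y.size)
    u = stats.poisson.cdf(y - 1, mu) + v * stats.poisson.pmf(y, mu)

    # PIT-trap-style resample: bootstrap the (approximately pivotal) PIT residuals,
    # then map them back through the model's quantile function.
    u_star = rng.choice(u, size=u.size, replace=True)
    # guard against u rounding to exactly 1 before inverting
    y_star = stats.poisson.ppf(np.clip(u_star, 0.0, 1.0 - 1e-12), mu).astype(int)
    ```

    Because the residuals are pivotal, the resampled responses keep the marginal Poisson shape at each mean, which is the property the paper exploits.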

  7. The PIT-trap—A “model-free” bootstrap procedure for inference about regression models with discrete, multivariate responses

    PubMed Central

    Thibaut, Loïc; Wang, Yi Alice

    2017-01-01

    Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)—common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of “model-free bootstrap”, adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods. PMID:28738071

  8. Meteor localization via statistical analysis of spatially temporal fluctuations in image sequences

    NASA Astrophysics Data System (ADS)

    Kukal, Jaromír; Klimt, Martin; Šihlík, Jan; Fliegel, Karel

    2015-09-01

    Meteor detection is one of the most important procedures in astronomical imaging. A meteor's path through Earth's atmosphere is traditionally reconstructed from a double-station video observation system generating 2D image sequences. However, atmospheric turbulence and other factors cause spatio-temporal fluctuations of the image background, which make localization of the meteor path more difficult. Our approach is based on nonlinear preprocessing of image intensity using the Box-Cox transform, with the logarithmic transform as a particular case. The transformed image sequences are then differentiated along the discrete coordinates to obtain a statistical description of sky background fluctuations, which can be modeled by a multivariate normal distribution. After verification and hypothesis testing, we use the statistical model for outlier detection. While isolated outlier points are ignored, a compact cluster of outliers indicates the presence of a meteoroid after ignition.
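
    A simplified sketch of the outlier-detection step (synthetic features; the construction below is a stand-in for the paper's Box-Cox-transformed, differentiated image sequences): model the background as multivariate normal and flag points whose squared Mahalanobis distance exceeds a chi-square quantile.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(11)

    # Background fluctuation features (synthetic stand-ins for spatial/temporal
    # differences of log-transformed intensity), modeled as multivariate normal.
    n, d = 500, 3
    background = rng.multivariate_normal(np.zeros(d), np.eye(d) + 0.3, size=n)

    # A small cluster of bright outliers standing in for a meteor track.
    meteor = rng.multivariate_normal(np.full(d, 6.0), 0.1 * np.eye(d), size=5)
    X = np.vstack([background, meteor])

    # Fit the multivariate normal model and flag outliers by squared
    # Mahalanobis distance against a chi-square threshold.
    mean = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mean
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
    threshold = stats.chi2.ppf(0.999, df=d)
    outliers = d2 > threshold
    ```

    Isolated false flags among the background are rare at this quantile, while the compact cluster of planted points is flagged reliably, matching the decision rule in the abstract.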

  9. Estimation of value at risk and conditional value at risk using normal mixture distributions model

    NASA Astrophysics Data System (ADS)

    Kamaruzzaman, Zetty Ain; Isa, Zaidi

    2013-04-01

    The normal mixture distribution model has been successfully applied in financial time series analysis. In this paper, we estimate the return distribution, value at risk (VaR) and conditional value at risk (CVaR) for monthly and weekly rates of return of the FTSE Bursa Malaysia Kuala Lumpur Composite Index (FBMKLCI) from July 1990 until July 2010 using a two-component univariate normal mixture distribution model. First, we present the application of the normal mixture model in empirical finance, where we fit our real data. Second, we present its application in risk analysis, where we use the model to evaluate value at risk (VaR) and conditional value at risk (CVaR), with model validation for both risk measures. The empirical results provide evidence that the two-component normal mixture distribution model fits the data well and performs better in estimating value at risk (VaR) and conditional value at risk (CVaR), capturing the stylized facts of non-normality and leptokurtosis in the return distribution.
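
    Given fitted parameters of a two-component normal mixture (the weights, means, and standard deviations below are invented for illustration; in practice they would be estimated by EM from the return series), VaR is a quantile of the mixture and CVaR the expected return below it:

    ```python
    import numpy as np
    from scipy import stats, optimize

    # Two-component normal mixture for monthly returns (illustrative parameters).
    w = np.array([0.8, 0.2])
    mu = np.array([0.01, -0.02])
    sigma = np.array([0.03, 0.09])

    def mixture_cdf(x):
        return float(np.sum(w * stats.norm.cdf(x, mu, sigma)))

    def var_mixture(alpha=0.05):
        """Value at risk: the alpha-quantile of the return distribution."""
        return optimize.brentq(lambda x: mixture_cdf(x) - alpha, -1.0, 1.0)

    def cvar_mixture(alpha=0.05, n=100000):
        """Conditional VaR: expected return given the return falls below the
        VaR level, approximated here by Monte Carlo."""
        rng = np.random.default_rng(0)
        comp = rng.choice(2, size=n, p=w)
        r = rng.normal(mu[comp], sigma[comp])
        v = var_mixture(alpha)
        return r[r <= v].mean()

    v = var_mixture(0.05)
    c = cvar_mixture(0.05)
    ```

    The 5% VaR solves F(x) = 0.05 for the mixture CDF F; CVaR always lies below the VaR level, reflecting the heavier left tail the mixture captures.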

  10. Use of Raman microscopy and multivariate data analysis to observe the biomimetic growth of carbonated hydroxyapatite on bioactive glass.

    PubMed

    Seah, Regina K H; Garland, Marc; Loo, Joachim S C; Widjaja, Effendi

    2009-02-15

    In the present contribution, the biomimetic growth of carbonated hydroxyapatite (HA) on bioactive glass was investigated by Raman microscopy. Bioactive glass samples were immersed in simulated body fluid (SBF) buffered solution at pH 7.40 for up to 17 days at 37 °C. Raman microscopy mapping was performed on the bioglass samples immersed in SBF solution for different periods of time. The collected data were then analyzed using the band-target entropy minimization technique to extract the observable pure component Raman spectral information. In this study, the pure component Raman spectra of the precursor amorphous calcium phosphate, transient octacalcium phosphate, and matured HA were all recovered. In addition, pure component Raman spectra of calcite, silica glass, and some organic impurities were also recovered. The resolved pure component spectra were fitted to the normalized measured Raman data to provide the spatial distribution of these species on the sample surfaces. The current results show that Raman microscopy and multivariate data analysis provide a sensitive and accurate tool to characterize the surface morphology, as well as to give more specific information on the chemical species present and the phase transformation of phosphate species during the formation of HA on bioactive glass.

  11. Evaluation of biomolecular distributions in rat brain tissues by means of ToF-SIMS using a continuous beam of Ar clusters.

    PubMed

    Nakano, Shusuke; Yokoyama, Yuta; Aoyagi, Satoka; Himi, Naoyuki; Fletcher, John S; Lockyer, Nicholas P; Henderson, Alex; Vickerman, John C

    2016-06-08

    Time-of-flight secondary ion mass spectrometry (ToF-SIMS) provides detailed chemical structure information and high spatial resolution images. Therefore, ToF-SIMS is useful for studying biological phenomena such as ischemia. In this study, in order to evaluate cerebral microinfarction, the distribution of biomolecules generated by ischemia was measured with ToF-SIMS. ToF-SIMS data sets were analyzed by means of multivariate analysis for interpreting complex samples containing unknown information and to obtain biomolecular mapping indicated by fragment ions from the target biomolecules. Using conventional ToF-SIMS (primary ion source: Bi cluster ion), it is difficult to detect secondary ions beyond approximately 1000 u. Moreover, the intensity of secondary ions related to biomolecules is not always high enough for imaging because of low concentration even if the masses are lower than 1000 u. However, for the observation of biomolecular distributions in tissues, it is important to detect low amounts of biological molecules from a particular area of tissue. Rat brain tissue samples were measured with ToF-SIMS (J105, Ionoptika, Ltd., Chandlers Ford, UK), using a continuous beam of Ar clusters as a primary ion source. ToF-SIMS with Ar clusters efficiently detects secondary ions related to biomolecules and larger molecules. Molecules detected by ToF-SIMS were examined by analyzing ToF-SIMS data using multivariate analysis. Microspheres (45 μm diameter) were injected into the rat unilateral internal carotid artery (MS rat) to cause cerebral microinfarction. The rat brain was sliced and then measured with ToF-SIMS. The brain samples of a normal rat and the MS rat were examined to find specific secondary ions related to important biomolecules, and then the difference between them was investigated. Finally, specific secondary ions were found around vessels incorporating microspheres in the MS rat. 
The results suggest that important biomolecules related to cerebral microinfarction can be detected by ToF-SIMS.

  12. Fully probabilistic seismic source inversion - Part 2: Modelling errors and station covariances

    NASA Astrophysics Data System (ADS)

    Stähler, Simon C.; Sigloch, Karin

    2016-11-01

    Seismic source inversion, a central task in seismology, is concerned with the estimation of earthquake source parameters and their uncertainties. Estimating uncertainties is particularly challenging because source inversion is a non-linear problem. In a companion paper, Stähler and Sigloch (2014) developed a method of fully Bayesian inference for source parameters, based on measurements of waveform cross-correlation between broadband, teleseismic body-wave observations and their modelled counterparts. This approach yields not only depth and moment tensor estimates but also source time functions. A prerequisite for Bayesian inference is the proper characterisation of the noise afflicting the measurements, a problem we address here. We show that, for realistic broadband body-wave seismograms, the systematic error due to an incomplete physical model affects waveform misfits more strongly than random, ambient background noise. In this situation, the waveform cross-correlation coefficient CC, or rather its decorrelation D = 1 - CC, performs more robustly as a misfit criterion than ℓp norms, which are more commonly used as sample-by-sample measures of misfit between individual time samples. From a set of over 900 user-supervised, deterministic earthquake source solutions treated as a quality-controlled reference, we derive the noise distribution on signal decorrelation D = 1 - CC of the broadband seismogram fits between observed and modelled waveforms. The noise on D is found to approximately follow a log-normal distribution, a fortunate fact that readily accommodates the formulation of an empirical likelihood function for D for our multivariate problem. The first and second moments of this multivariate distribution are shown to depend mostly on the signal-to-noise ratio (SNR) of the CC measurements and on the back-azimuthal distances of seismic stations. 
By identifying and quantifying this likelihood function, we make D and thus waveform cross-correlation measurements usable for fully probabilistic sampling strategies, in source inversion and related applications such as seismic tomography.
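    The empirical log-normal likelihood for D described above can be illustrated on synthetic data; the parameters below are assumptions for the sketch, not the values derived from the 900 reference solutions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic decorrelation values D = 1 - CC, assumed log-normally distributed.
D = rng.lognormal(mean=-2.5, sigma=0.5, size=500)

# Fit the log-normal by estimating Gaussian parameters on the log scale,
# then check the fit there with a normality test.
logD = np.log(D)
mu_hat, sigma_hat = logD.mean(), logD.std(ddof=1)
stat, pvalue = stats.shapiro(logD)

# A log-normal log-likelihood for an observed decorrelation d,
# usable inside a Bayesian sampling scheme.
def log_likelihood(d, mu=mu_hat, sigma=sigma_hat):
    z = (np.log(d) - mu) / sigma
    return -np.log(d * sigma * np.sqrt(2 * np.pi)) - 0.5 * z**2
```

    In the paper the moments of this distribution are further conditioned on SNR and station geometry; here they are simply fitted constants.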

  13. On the efficacy of procedures to normalize Ex-Gaussian distributions

    PubMed Central

    Marmolejo-Ramos, Fernando; Cousineau, Denis; Benites, Luis; Maehara, Rocío

    2015-01-01

    Reaction time (RT) is one of the most common types of measure used in experimental psychology. Its distribution is not normal (Gaussian) but resembles a convolution of normal and exponential distributions (the Ex-Gaussian). One of the major assumptions of parametric tests (such as ANOVAs) is that variables are normally distributed; hence, it is widely acknowledged that the normality assumption is not met. This paper presents different procedures to normalize data sampled from an Ex-Gaussian distribution in such a way that they are suitable for parametric tests based on the normality assumption. Using simulation studies, various outlier-elimination and transformation procedures were tested for the level of normality they provide. The results suggest that transformation methods are better than elimination methods at normalizing positively skewed data, and that the more skewed the distribution, the more effective the transformation methods are. Specifically, transformation with parameter lambda = -1 leads to the best results. PMID:25709588
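    A minimal simulation in the spirit of the study, under assumed Ex-Gaussian parameters: the Box-Cox transformation with lambda = -1 reduces the positive skew of simulated reaction times:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Simulated Ex-Gaussian reaction times (ms): normal component plus exponential tail.
rt = rng.normal(400, 40, size=2000) + rng.exponential(150, size=2000)

# Box-Cox transformation with lambda = -1: (x**lam - 1) / lam = 1 - 1/x.
lam = -1
rt_bc = (rt**lam - 1) / lam

skew_before = stats.skew(rt)     # clearly positive for an Ex-Gaussian
skew_after = stats.skew(rt_bc)   # much closer to zero after the transform
```

    A Shapiro-Wilk test on `rt_bc` versus `rt` would show the same improvement in normality that the simulation studies quantify.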

  14. Apparent Transition in the Human Height Distribution Caused by Age-Dependent Variation during Puberty Period

    NASA Astrophysics Data System (ADS)

    Iwata, Takaki; Yamazaki, Yoshihiro; Kuninaka, Hiroto

    2013-08-01

    In this study, we examine the validity of the transition of the human height distribution from the log-normal distribution to the normal distribution during puberty, as suggested in an earlier study [Kuninaka et al.: J. Phys. Soc. Jpn. 78 (2009) 125001]. Our data analysis reveals that, in late puberty, the variation in height decreases as children grow. Thus, the classification of a height dataset by age at this stage leads us to analyze a mixture of distributions with larger means and smaller variations. This mixture distribution has a negative skewness and is consequently closer to the normal distribution than to the log-normal distribution. The opposite case occurs in early puberty and the mixture distribution is positively skewed, which resembles the log-normal distribution rather than the normal distribution. Thus, this scenario mimics the transition during puberty. Additionally, our scenario is realized through a numerical simulation based on a statistical model. The present study does not support the transition suggested by the earlier study.
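    The mixture argument can be checked numerically. The sketch below uses invented age-group means and standard deviations that merely reproduce the stated pattern (means rising while variation shrinks in late puberty, and grows in early puberty):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Late puberty (illustrative numbers): means increase while SDs shrink with age,
# so pooling age groups yields a negatively skewed mixture.
late = np.concatenate([rng.normal(m, s, 5000)
                       for m, s in [(160, 8), (166, 6), (170, 4)]])

# Early puberty: means increase while SDs grow, giving positive skewness.
early = np.concatenate([rng.normal(m, s, 5000)
                        for m, s in [(140, 4), (146, 6), (152, 8)]])

skew_late = stats.skew(late)    # negative: closer to a normal shape
skew_early = stats.skew(early)  # positive: resembles a log-normal shape
```

    Pooling by age thus changes the sign of the skewness without any individual distribution ever switching between normal and log-normal.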

  15. Understanding a Normal Distribution of Data.

    PubMed

    Maltenfort, Mitchell G

    2015-12-01

    Assuming data follow a normal distribution is essential for many common statistical tests. However, what are normal data and when can we assume that a data set follows this distribution? What can be done to analyze non-normal data?

  16. Some Integrated Squared Error Procedures for Multivariate Normal Data,

    DTIC Science & Technology

    1986-01-01

    ...a linear regression or experimental design model). Our procedures have also been used widely on non-linear models, but we do not address non-linear... goodness of fit, outliers, influence functions, experimental design, cluster analysis, robustness... structured data such as multivariate experimental designs. Several illustrations are provided.

  17. The retest distribution of the visual field summary index mean deviation is close to normal.

    PubMed

    Anderson, Andrew J; Cheng, Allan C Y; Lau, Samantha; Le-Pham, Anne; Liu, Victor; Rahman, Farahnaz

    2016-09-01

    When modelling optimum strategies for determining visual field progression in glaucoma, it is commonly assumed that the summary index mean deviation (MD) is normally distributed on repeated testing. Here we tested whether this assumption is correct. We obtained 42 reliable 24-2 Humphrey Field Analyzer SITA standard visual fields from one eye of each of five healthy young observers, with the first two fields excluded from analysis. Previous work has shown that although MD variability is higher in glaucoma, the shape of the MD distribution is similar to that found in normal visual fields. A Shapiro-Wilk test was used to detect any deviation from normality, and kurtosis values for the distributions were also calculated. Data from each observer passed the Shapiro-Wilk normality test. Bootstrapped 95% confidence intervals for kurtosis encompassed the value for a normal distribution in four of five observers. When examined with quantile-quantile plots, distributions were close to normal and showed no consistent deviations across observers. The retest distribution of MD is not significantly different from normal in healthy observers, and so is likely also normally distributed - or nearly so - in those with glaucoma. Our results increase confidence in the results of influential modelling studies in which a normal distribution for MD was assumed. © 2016 The Authors Ophthalmic & Physiological Optics © 2016 The College of Optometrists.
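    The two checks used in the study, a Shapiro-Wilk test and a bootstrapped kurtosis confidence interval, can be sketched on simulated MD values (the distributional parameters here are assumptions, not the observers' data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Simulated retest MD values (dB) for one observer, roughly normal around -1 dB.
md = rng.normal(-1.0, 0.6, size=40)

# Shapiro-Wilk test of the normality assumption.
w_stat, p_value = stats.shapiro(md)

# Bootstrapped 95% CI for (Fisher) kurtosis; 0 is the normal-distribution value.
boot = np.array([stats.kurtosis(rng.choice(md, size=md.size, replace=True))
                 for _ in range(2000)])
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
normal_kurtosis_in_ci = ci_low <= 0 <= ci_high
```

    A high `p_value` and a kurtosis CI containing 0 correspond to the study's conclusion that the retest distribution is close to normal.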

  18. Observed, unknown distributions of clinical chemical quantities should be considered to be log-normal: a proposal.

    PubMed

    Haeckel, Rainer; Wosniok, Werner

    2010-10-01

    The distributions of many quantities in laboratory medicine are considered to be Gaussian if they are symmetric, although, theoretically, a Gaussian distribution is not plausible for quantities that can attain only non-negative values. If a distribution is skewed, further specification of the type is required, which may be difficult to provide. Skewed (non-Gaussian) distributions found in clinical chemistry usually show only moderately large positive skewness (e.g., the log-normal and χ² distributions). The degree of skewness depends on the magnitude of the empirical biological variation (CV(e)), as demonstrated using the log-normal distribution. A Gaussian distribution with a small CV(e) (e.g., for plasma sodium) is very similar to a log-normal distribution with the same CV(e). In contrast, a relatively large CV(e) (e.g., plasma aspartate aminotransferase) leads to distinct differences between a Gaussian and a log-normal distribution. If the type of an empirical distribution is unknown, it is proposed that a log-normal distribution be assumed. This avoids distributional assumptions that are not plausible and does not contradict the observation that distributions with small biological variation look very similar to a Gaussian distribution.
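    The dependence of log-normal skewness on biological variation can be made explicit: for a log-normal with coefficient of variation CV, exp(σ²) = 1 + CV², so the skewness is (CV² + 3)·CV. The CV values below are illustrative orders of magnitude, not reference values for sodium or aspartate aminotransferase:

```python
def lognormal_skewness(cv):
    # For a log-normal distribution, exp(sigma**2) = 1 + CV**2 and
    # skewness = (exp(sigma**2) + 2) * sqrt(exp(sigma**2) - 1) = (CV**2 + 3) * CV.
    return (cv**2 + 3) * cv

# Small biological variation (CV ~ 1%, sodium-like): skewness ~ 0.03,
# practically indistinguishable from a Gaussian (skewness 0).
skew_small_cv = lognormal_skewness(0.01)

# Large biological variation (CV ~ 40%, transaminase-like): skewness ~ 1.26,
# clearly right-skewed and distinct from a Gaussian.
skew_large_cv = lognormal_skewness(0.40)
```

    This is why assuming a log-normal by default costs essentially nothing for low-variation analytes while remaining plausible for high-variation ones.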

  19. Plasma Electrolyte Distributions in Humans-Normal or Skewed?

    PubMed

    Feldman, Mark; Dickson, Beverly

    2017-11-01

    It is widely believed that plasma electrolyte levels are normally distributed. Statistical tests and calculations using plasma electrolyte data are often reported based on this assumption of normality; examples include t tests, analysis of variance, correlations and confidence intervals. The purpose of our study was to determine whether plasma sodium (Na+), potassium (K+), chloride (Cl-) and bicarbonate (HCO3-) distributions are indeed normally distributed. We analyzed plasma electrolyte data from 237 consecutive adults (137 women and 100 men) who had normal results on a standard basic metabolic panel which included plasma electrolyte measurements. The skewness of each distribution (as a measure of its asymmetry) was compared to the zero skewness of a normal (Gaussian) distribution. The plasma Na+ distribution was skewed slightly to the right, and the plasma Cl- distribution slightly to the left, but in neither case was the skew significantly different from zero. By contrast, both the plasma K+ and HCO3- distributions were significantly skewed to the right (P < 0.01 vs. zero skew). Examination of the frequency distribution curves also suggested that the K+ and HCO3- distributions were bimodal. In adults with a normal basic metabolic panel, plasma potassium and bicarbonate levels are not normally distributed and may be bimodal. Thus, statistical methods used to evaluate these two plasma electrolytes should be nonparametric tests rather than parametric ones that require a normal distribution. Copyright © 2017 Southern Society for Clinical Investigation. Published by Elsevier Inc. All rights reserved.
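    A skewness-versus-zero comparison of this kind can be sketched with D'Agostino's skewness test on simulated electrolyte values (the distributions below are stand-ins, not the study data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Illustrative stand-ins for plasma measurements (n = 237 as in the study):
# sodium roughly symmetric, potassium given a right-skewed (log-normal) shape.
sodium = rng.normal(140, 2, size=237)
potassium = np.exp(rng.normal(np.log(4.2), 0.3, size=237))

# D'Agostino's skewness test: is the sample skewness consistent with zero?
z_na, p_na = stats.skewtest(sodium)
z_k, p_k = stats.skewtest(potassium)
```

    A small `p_k` with a positive sample skewness mirrors the paper's finding of significant right skew for potassium, while `p_na` would typically stay non-significant for the symmetric analyte.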

  20. Analytical probabilistic proton dose calculation and range uncertainties

    NASA Astrophysics Data System (ADS)

    Bangert, M.; Hennig, P.; Oelfke, U.

    2014-03-01

    We introduce the concept of analytical probabilistic modeling (APM) to calculate the mean and the standard deviation of intensity-modulated proton dose distributions under the influence of range uncertainties in closed form. For APM, range uncertainties are modeled with a multivariate normal distribution p(z) over the radiological depths z. A pencil beam algorithm that parameterizes the proton depth dose d(z) with a weighted superposition of ten Gaussians is used. Hence, the integrals ∫ dz p(z) d(z) and ∫ dz p(z) d(z)² required for the calculation of the expected value and standard deviation of the dose remain analytically tractable and can be efficiently evaluated. The means μk, widths δk, and weights ωk of the Gaussian components parameterizing the depth dose curves are found with least squares fits for all available proton ranges. We observe less than 0.3% average deviation of the Gaussian parameterizations from the original proton depth dose curves. Consequently, APM yields high-accuracy estimates of the expected value and standard deviation of intensity-modulated proton dose distributions for two-dimensional test cases. APM can accommodate arbitrary correlation models and account for the different nature of random and systematic errors in fractionated radiation therapy. Beneficial applications of APM in robust planning are feasible.
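    The analytical tractability rests on the fact that integrating one Gaussian component of d(z) against a normal p(z) stays Gaussian in the parameters. A one-dimensional sketch with illustrative parameters, checked against numerical quadrature:

```python
import numpy as np
from scipy.integrate import quad

# One Gaussian component of a depth-dose parameterization:
# omega * exp(-(z - mu_k)**2 / (2 * delta**2)), with illustrative parameters.
omega, mu_k, delta = 1.3, 15.0, 2.0
# Range uncertainty: z ~ N(m, s**2).
m, s = 14.5, 1.0

def component(z):
    return omega * np.exp(-(z - mu_k)**2 / (2 * delta**2))

def range_pdf(z):
    return np.exp(-(z - m)**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

# Closed form for the expected component dose: the Gaussian-times-Gaussian
# integral remains Gaussian, with the variances adding.
analytic = (omega * delta / np.sqrt(delta**2 + s**2)
            * np.exp(-(m - mu_k)**2 / (2 * (delta**2 + s**2))))

# Numerical check by quadrature.
numeric, _ = quad(lambda z: range_pdf(z) * component(z), -np.inf, np.inf)
```

    Summing such terms over the ten components (and the analogous product terms for d(z)²) is what makes the expectation and variance of the dose efficiently computable.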

  1. Bayesian Estimation of Multivariate Latent Regression Models: Gauss versus Laplace

    ERIC Educational Resources Information Center

    Culpepper, Steven Andrew; Park, Trevor

    2017-01-01

    A latent multivariate regression model is developed that employs a generalized asymmetric Laplace (GAL) prior distribution for regression coefficients. The model is designed for high-dimensional applications where an approximate sparsity condition is satisfied, such that many regression coefficients are near zero after accounting for all the model…

  2. Evaluation of Meteorite Amino Acid Analysis Data Using Multivariate Techniques

    NASA Technical Reports Server (NTRS)

    McDonald, G.; Storrie-Lombardi, M.; Nealson, K.

    1999-01-01

    The amino acid distributions in the Murchison carbonaceous chondrite, Mars meteorite ALH84001, and ice from the Allan Hills region of Antarctica are shown, using a multivariate technique known as Principal Component Analysis (PCA), to be statistically distinct from the average amino acid composition of 101 terrestrial protein superfamilies.

  3. Novel adipokines WISP1 and betatrophin in PCOS: relationship to AMH levels, atherogenic and metabolic profile.

    PubMed

    Sahin Ersoy, Gulcin; Altun Ensari, Tugba; Vatansever, Dogan; Emirdar, Volkan; Cevik, Ozge

    2017-02-01

    To determine the levels of WISP1 and betatrophin in normal-weight and obese women with polycystic ovary syndrome (PCOS), and to assess their relationship with anti-Müllerian hormone (AMH) levels, atherogenic profile and metabolic parameters. In this prospective cross-sectional study, the study group comprised 49 normal-weight and 34 obese women with PCOS diagnosed based on the Rotterdam criteria, and the control group comprised 36 normal-weight and 26 obese age-matched non-hyperandrogenemic women with regular menstrual cycles. Serum WISP1, betatrophin, homeostasis model assessment of insulin resistance (HOMA-IR) and AMH levels were evaluated. Univariate and multivariate analyses were performed between betatrophin and WISP1 levels and AMH levels, metabolic and atherogenic parameters. Serum WISP1 and betatrophin values were higher in the PCOS group than in the control group. Moreover, serum WISP1 and betatrophin levels were higher in the obese PCOS subgroup than in the normal-weight and obese control subgroups. Multivariate analyses revealed that body mass index, HOMA-IR and AMH independently and positively predicted WISP1 levels, while serum betatrophin level variability was explained by homocysteine, HOMA-IR and androstenedione levels. WISP1 and betatrophin may play a key role in the pathogenesis of PCOS.

  4. Near-infrared confocal micro-Raman spectroscopy combined with PCA-LDA multivariate analysis for detection of esophageal cancer

    NASA Astrophysics Data System (ADS)

    Chen, Long; Wang, Yue; Liu, Nenrong; Lin, Duo; Weng, Cuncheng; Zhang, Jixue; Zhu, Lihuan; Chen, Weisheng; Chen, Rong; Feng, Shangyuan

    2013-06-01

    The diagnostic capability of using tissue intrinsic micro-Raman signals to obtain biochemical information from human esophageal tissue is presented in this paper. Near-infrared micro-Raman spectroscopy combined with multivariate analysis was applied for discrimination of esophageal cancer tissue from normal tissue samples. Micro-Raman spectroscopy measurements were performed on 54 esophageal cancer tissues and 55 normal tissues in the 400-1750 cm-1 range. The mean Raman spectra showed significant differences between the two groups. Tentative assignments of the Raman bands in the measured tissue spectra suggested some changes in protein structure, a decrease in the relative amount of lactose, and increases in the percentages of tryptophan, collagen and phenylalanine content in esophageal cancer tissue compared to normal tissue. The diagnostic algorithms based on principal component analysis (PCA) and linear discriminant analysis (LDA) achieved a diagnostic sensitivity of 87.0% and specificity of 70.9% for separating cancer from normal esophageal tissue samples. The results demonstrated that near-infrared micro-Raman spectroscopy combined with PCA-LDA analysis could be an effective and sensitive tool for identification of esophageal cancer.

  5. Probabilities of Dilating Vesicoureteral Reflux in Children with First Time Simple Febrile Urinary Tract Infection, and Normal Renal and Bladder Ultrasound.

    PubMed

    Rianthavorn, Pornpimol; Tangngamsakul, Onjira

    2016-11-01

    We evaluated risk factors and assessed predicted probabilities for grade III or higher vesicoureteral reflux (dilating reflux) in children with a first simple febrile urinary tract infection and normal renal and bladder ultrasound. Data for 167 children 2 to 72 months old with a first febrile urinary tract infection and normal ultrasound were compared between those who had dilating vesicoureteral reflux (12 patients, 7.2%) and those who did not. Exclusion criteria consisted of a history of prenatal hydronephrosis or familial reflux and complicated urinary tract infection. A logistic regression model was used to identify independent variables associated with dilating reflux, and predicted probabilities for dilating reflux were assessed. Patient age and the prevalence of non-Escherichia coli bacteria were greater in children who had dilating reflux than in those who did not (p = 0.02 and p = 0.004, respectively). Gender distribution was similar between the 2 groups (p = 0.08). In multivariate analysis older age and non-E. coli bacteria independently predicted dilating reflux, with odds ratios of 1.04 (95% CI 1.01-1.07, p = 0.02) and 3.76 (95% CI 1.05-13.39, p = 0.04), respectively. The impact of non-E. coli bacteria on the predicted probability of dilating reflux increased with patient age. We support the concept of selective voiding cystourethrogram in children with a first simple febrile urinary tract infection and normal ultrasound. Voiding cystourethrogram should be considered in children with late-onset urinary tract infection due to non-E. coli bacteria, since they are at risk for dilating reflux even if the ultrasound is normal. Copyright © 2016 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.

  6. Conducting Privacy-Preserving Multivariable Propensity Score Analysis When Patient Covariate Information Is Stored in Separate Locations.

    PubMed

    Bohn, Justin; Eddings, Wesley; Schneeweiss, Sebastian

    2017-03-15

    Distributed networks of health-care data sources are increasingly being utilized to conduct pharmacoepidemiologic database studies. Such networks may contain data that are not physically pooled but instead are distributed horizontally (separate patients within each data source) or vertically (separate measures within each data source) in order to preserve patient privacy. While multivariable methods for the analysis of horizontally distributed data are frequently employed, few practical approaches have been put forth to deal with vertically distributed health-care databases. In this paper, we propose 2 propensity score-based approaches to vertically distributed data analysis and test their performance using 5 example studies. We found that these approaches produced point estimates close to what could be achieved without partitioning. We further found a performance benefit (i.e., lower mean squared error) for sequentially passing a propensity score through each data domain (called the "sequential approach") as compared with fitting separate domain-specific propensity scores (called the "parallel approach"). These results were validated in a small simulation study. This proof-of-concept study suggests a new multivariable analysis approach to vertically distributed health-care databases that is practical, preserves patient privacy, and warrants further investigation for use in clinical research applications that rely on health-care databases. © The Author 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  7. Applying the log-normal distribution to target detection

    NASA Astrophysics Data System (ADS)

    Holst, Gerald C.

    1992-09-01

    Holst and Pickard experimentally determined that MRT responses tend to follow a log-normal distribution. The log-normal distribution appeared reasonable because nearly all visual psychophysical data are plotted on a logarithmic scale. It has the additional advantage of being bounded to positive values, an important consideration since probability of detection is often plotted in linear coordinates. Review of published data suggests that the log-normal distribution may have universal applicability. Specifically, the log-normal distribution obtained from MRT tests appears to fit the target transfer function and the probability of detection of rectangular targets.
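    A log-normal detection model of this kind can be written directly as a cumulative normal on the logarithm of the stimulus variable; the median threshold and log-scale spread below are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

# Log-normal model for probability of detection as a function of a positive
# stimulus variable (e.g. contrast relative to threshold); parameters are illustrative.
median_threshold = 1.0   # stimulus level giving 50% detection
sigma_log = 0.4          # spread on the logarithmic axis

def p_detect(x):
    # Log-normal CDF: Phi((ln x - ln median) / sigma), bounded to [0, 1]
    # and defined only for positive stimulus levels.
    return norm.cdf((np.log(x) - np.log(median_threshold)) / sigma_log)

levels = np.array([0.25, 0.5, 1.0, 2.0, 4.0])
probs = p_detect(levels)
```

    The curve is symmetric on the log axis about the median threshold, which is exactly the behaviour that makes it a natural fit for detection data plotted logarithmically.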

  8. Insulin Sensitivity Measured With Euglycemic Clamp Is Independently Associated With Glomerular Filtration Rate in a Community-Based Cohort

    PubMed Central

    Nerpin, Elisabet; Risérus, Ulf; Ingelsson, Erik; Sundström, Johan; Jobs, Magnus; Larsson, Anders; Basu, Samar; Ärnlöv, Johan

    2008-01-01

    OBJECTIVE—To investigate the association between insulin sensitivity and glomerular filtration rate (GFR) in the community, with prespecified subgroup analyses in normoglycemic individuals with normal GFR. RESEARCH DESIGN AND METHODS—We investigated the cross-sectional association between insulin sensitivity (M/I, assessed using euglycemic clamp) and cystatin C–based GFR in a community-based cohort of elderly men (Uppsala Longitudinal Study of Adult Men [ULSAM], n = 1,070). We also investigated whether insulin sensitivity predicted the incidence of renal dysfunction at a follow-up examination after 7 years. RESULTS—Insulin sensitivity was directly related to GFR (multivariable-adjusted regression coefficient for 1-unit higher M/I 1.19 [95% CI 0.69–1.68]; P < 0.001) after adjusting for age, glucometabolic variables (fasting plasma glucose, fasting plasma insulin, and 2-h glucose after an oral glucose tolerance test), cardiovascular risk factors (hypertension, dyslipidemia, and smoking), and lifestyle factors (BMI, physical activity, and consumption of tea, coffee, and alcohol). The positive multivariable-adjusted association between insulin sensitivity and GFR also remained statistically significant in participants with normal fasting plasma glucose, normal glucose tolerance, and normal GFR (n = 443; P < 0.02). In longitudinal analyses, higher insulin sensitivity at baseline was associated with lower risk of impaired renal function (GFR <50 ml/min per 1.73 m2) during follow-up independently of glucometabolic variables (multivariable-adjusted odds ratio for 1-unit higher of M/I 0.58 [95% CI 0.40–0.84]; P < 0.004). CONCLUSIONS—Our data suggest that impaired insulin sensitivity may be involved in the development of renal dysfunction at an early stage, before the onset of diabetes or prediabetic glucose elevations. Further studies are needed in order to establish causality. PMID:18509205

  9. Detection of cervical lesions by multivariate analysis of diffuse reflectance spectra: a clinical study.

    PubMed

    Prabitha, Vasumathi Gopala; Suchetha, Sambasivan; Jayanthi, Jayaraj Lalitha; Baiju, Kamalasanan Vijayakumary; Rema, Prabhakaran; Anuraj, Koyippurath; Mathews, Anita; Sebastian, Paul; Subhash, Narayanan

    2016-01-01

    Diffuse reflectance (DR) spectroscopy is a non-invasive, real-time, and cost-effective tool for early detection of malignant changes in squamous epithelial tissues. The present study aims to evaluate the diagnostic power of diffuse reflectance spectroscopy for non-invasive discrimination of cervical lesions in vivo. A clinical trial was carried out on 48 sites in 34 patients by recording DR spectra using a point-monitoring device with white light illumination. The acquired data were analyzed and classified using multivariate statistical analysis based on principal component analysis (PCA) and linear discriminant analysis (LDA). Diagnostic accuracies were validated using random number generators. The receiver operating characteristic (ROC) curves were plotted for evaluating the discriminating power of the proposed statistical technique. An algorithm was developed and used to classify non-diseased (normal) from diseased sites (abnormal) with a sensitivity of 72 % and specificity of 87 %. While low-grade squamous intraepithelial lesion (LSIL) could be discriminated from normal with a sensitivity of 56 % and specificity of 80 %, and high-grade squamous intraepithelial lesion (HSIL) from normal with a sensitivity of 89 % and specificity of 97 %, LSIL could be discriminated from HSIL with 100 % sensitivity and specificity. The areas under the ROC curves were 0.993 (95 % confidence interval (CI) 0.0 to 1) and 1 (95 % CI 1) for the discrimination of HSIL from normal and HSIL from LSIL, respectively. The results of the study show that DR spectroscopy could be used along with multivariate analytical techniques as a non-invasive technique to monitor cervical disease status in real time.

  10. Aspirin and the Risk of Colorectal Cancer in Relation to the Expression of 15-Hydroxyprostaglandin Dehydrogenase (15-PGDH, HPGD)

    PubMed Central

    Fink, Stephen P.; Yamauchi, Mai; Nishihara, Reiko; Jung, Seungyoun; Kuchiba, Aya; Wu, Kana; Cho, Eunyoung; Giovannucci, Edward; Fuchs, Charles S.; Ogino, Shuji; Markowitz, Sanford D.; Chan, Andrew T.

    2014-01-01

    Aspirin use reduces the risk of colorectal neoplasia, at least in part, through inhibition of prostaglandin-endoperoxide synthase 2 (PTGS2, cyclooxygenase 2)-related pathways. Hydroxyprostaglandin dehydrogenase 15-(NAD) (15-PGDH, HPGD) is downregulated in colorectal cancers and functions as a metabolic antagonist of PTGS2. We hypothesized that the effect of aspirin may be antagonized by low 15-PGDH expression in the normal colon. In the Nurses’ Health Study and the Health Professionals Follow-up Study, we collected data on aspirin use and other risk factors every two years and followed up participants for diagnoses of colorectal cancer. Duplication-method Cox proportional, multivariable-adjusted, cause-specific hazards regression for competing risks data was used to compute hazard ratios (HRs) for incident colorectal cancer according to 15-PGDH mRNA expression level measured in normal mucosa from colorectal cancer resections. Among 127,865 participants, we documented 270 colorectal cancer cases that developed during 3,166,880 person-years of follow-up and from which we could assess 15-PGDH expression. Compared with nonuse, regular aspirin use was associated with lower risk of colorectal cancer that developed within a background of colonic mucosa with high 15-PGDH expression (multivariable HR=0.49; 95% CI, 0.34–0.71), but not with low 15-PGDH expression (multivariable HR=0.90; 95% CI, 0.63–1.27) (P for heterogeneity=0.018). Regular aspirin use was associated with lower incidence of colorectal cancers arising in association with high 15-PGDH expression, but not with low 15-PGDH expression in normal colon mucosa. This suggests that 15-PGDH expression level in normal colon mucosa may serve as a biomarker which may predict stronger benefit from aspirin chemoprevention. PMID:24760190

  11. Reproductive Health Assessment of Female Elephants in North American Zoos and Association of Husbandry Practices with Reproductive Dysfunction in African Elephants (Loxodonta africana)

    PubMed Central

    Meehan, Cheryl L.; Hogan, Jennifer N.; Morfeld, Kari A.; Carlstead, Kathy

    2016-01-01

    As part of a multi-institutional study of zoo elephant welfare, we evaluated female elephants managed by zoos accredited by the Association of Zoos and Aquariums and applied epidemiological methods to determine what factors in the zoo environment are associated with reproductive problems, including ovarian acyclicity and hyperprolactinemia. Bi-weekly blood samples were collected from 95 African (Loxodonta africana) and 75 Asian (Elephas maximus) (8–55 years of age) elephants over a 12-month period for analysis of serum progestogens and prolactin. Females were categorized as normal cycling (regular 13- to 17-week cycles), irregular cycling (cycles longer or shorter than normal) or acyclic (baseline progestogens, <0.1 ng/ml throughout), and having Low/Normal (<14 or 18 ng/ml) or High (≥14 or 18 ng/ml) prolactin for Asian and African elephants, respectively. Rates of normal cycling, acyclicity and irregular cycling were 73.2, 22.5 and 4.2% for Asian, and 48.4, 37.9 and 13.7% for African elephants, respectively, all of which differed between species (P < 0.05). For African elephants, univariate assessment found that social isolation decreased and higher enrichment diversity increased the chance a female would cycle normally. The strongest multi-variable models included Age (positive) and Enrichment Diversity (negative) as important factors of acyclicity among African elephants. The Asian elephant data set was not robust enough to support multi-variable analyses of cyclicity status. Additionally, only 3% of Asian elephants were found to be hyperprolactinemic as compared to 28% of Africans, so predictive analyses of prolactin status were conducted on African elephants only. The strongest multi-variable model included Age (positive), Enrichment Diversity (negative), Alternate Feeding Methods (negative) and Social Group Contact (positive) as predictors of hyperprolactinemia. 
In summary, the incidence of ovarian cycle problems and hyperprolactinemia predominantly affects African elephants, and increases in social stability and feeding and enrichment diversity may have positive influences on hormone status. PMID:27416141

  12. Reproductive Health Assessment of Female Elephants in North American Zoos and Association of Husbandry Practices with Reproductive Dysfunction in African Elephants (Loxodonta africana).

    PubMed

    Brown, Janine L; Paris, Stephen; Prado-Oviedo, Natalia A; Meehan, Cheryl L; Hogan, Jennifer N; Morfeld, Kari A; Carlstead, Kathy

    2016-01-01

    As part of a multi-institutional study of zoo elephant welfare, we evaluated female elephants managed by zoos accredited by the Association of Zoos and Aquariums and applied epidemiological methods to determine what factors in the zoo environment are associated with reproductive problems, including ovarian acyclicity and hyperprolactinemia. Bi-weekly blood samples were collected from 95 African (Loxodonta africana) and 75 Asian (Elephas maximus) (8-55 years of age) elephants over a 12-month period for analysis of serum progestogens and prolactin. Females were categorized as normal cycling (regular 13- to 17-week cycles), irregular cycling (cycles longer or shorter than normal) or acyclic (baseline progestogens, <0.1 ng/ml throughout), and having Low/Normal (<14 or 18 ng/ml) or High (≥14 or 18 ng/ml) prolactin for Asian and African elephants, respectively. Rates of normal cycling, acyclicity and irregular cycling were 73.2, 22.5 and 4.2% for Asian, and 48.4, 37.9 and 13.7% for African elephants, respectively, all of which differed between species (P < 0.05). For African elephants, univariate assessment found that social isolation decreased and higher enrichment diversity increased the chance a female would cycle normally. The strongest multi-variable models included Age (positive) and Enrichment Diversity (negative) as important factors of acyclicity among African elephants. The Asian elephant data set was not robust enough to support multi-variable analyses of cyclicity status. Additionally, only 3% of Asian elephants were found to be hyperprolactinemic as compared to 28% of Africans, so predictive analyses of prolactin status were conducted on African elephants only. The strongest multi-variable model included Age (positive), Enrichment Diversity (negative), Alternate Feeding Methods (negative) and Social Group Contact (positive) as predictors of hyperprolactinemia. 
In summary, the incidence of ovarian cycle problems and hyperprolactinemia predominantly affects African elephants, and increases in social stability and feeding and enrichment diversity may have positive influences on hormone status.
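
    The categorization rules described in the abstract can be sketched as a small classifier. This is an illustrative reconstruction using only the thresholds stated above (13-17 week cycles; progestogen baseline <0.1 ng/ml; prolactin cutoffs of 14 ng/ml for Asian and 18 ng/ml for African elephants); function names and input shapes are hypothetical.

```python
def prolactin_status(mean_prolactin_ng_ml, species):
    """Categorize prolactin status with the species-specific cutoffs reported
    in the abstract (14 ng/ml for Asian, 18 ng/ml for African elephants)."""
    cutoff = {"Asian": 14.0, "African": 18.0}[species]
    return "High" if mean_prolactin_ng_ml >= cutoff else "Low/Normal"

def cyclicity_status(progestogens_ng_ml, cycle_lengths_weeks):
    """Categorize ovarian cyclicity: acyclic if serum progestogens stay at
    baseline (<0.1 ng/ml) throughout; normal if every observed cycle falls in
    the regular 13- to 17-week range; irregular otherwise."""
    if all(p < 0.1 for p in progestogens_ng_ml):
        return "acyclic"
    if cycle_lengths_weeks and all(13 <= w <= 17 for w in cycle_lengths_weeks):
        return "normal"
    return "irregular"
```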

  13. Creating Hierarchical Pores by Controlled Linker Thermolysis in Multivariate Metal-Organic Frameworks.

    PubMed

    Feng, Liang; Yuan, Shuai; Zhang, Liang-Liang; Tan, Kui; Li, Jia-Luo; Kirchon, Angelo; Liu, Ling-Mei; Zhang, Peng; Han, Yu; Chabal, Yves J; Zhou, Hong-Cai

    2018-02-14

    Sufficient pore size, appropriate stability, and hierarchical porosity are three prerequisites for open frameworks designed for drug delivery, enzyme immobilization, and catalysis involving large molecules. Herein, we report a powerful and general strategy, linker thermolysis, to construct ultrastable hierarchically porous metal-organic frameworks (HP-MOFs) with tunable pore size distribution. Linker instability, usually an undesirable trait of MOFs, was exploited to create mesopores by generating crystal defects throughout a microporous MOF crystal via thermolysis. The crystallinity and stability of HP-MOFs remain after thermolabile linkers are selectively removed from multivariate metal-organic frameworks (MTV-MOFs) through a decarboxylation process. A domain-based linker spatial distribution was found to be critical for creating hierarchical pores inside MTV-MOFs. Furthermore, linker thermolysis promotes the formation of ultrasmall metal oxide nanoparticles immobilized in an open framework that exhibits high catalytic activity for Lewis acid-catalyzed reactions. Most importantly, this work provides fresh insights into the connection between linker apportionment and vacancy distribution, which may shed light on probing the disordered linker apportionment in multivariate systems, a long-standing challenge in the study of MTV-MOFs.

  14. Multivariate hydrological frequency analysis for extreme events using Archimedean copula. Case study: Lower Tunjuelo River basin (Colombia)

    NASA Astrophysics Data System (ADS)

    Gómez, Wilmar

    2017-04-01

    By analyzing the spatial and temporal variability of extreme precipitation events, threats and risks can be anticipated and reduced. Many water resources projects require joint probability distributions of random variables, such as precipitation intensity and duration, which cannot be assumed independent of each other. The problem of defining a probability model for observations of several dependent variables is greatly simplified by expressing the joint distribution in terms of its marginals through copulas. This document presents a general framework for bivariate and multivariate frequency analysis of extreme hydroclimatological events, such as severe storms, using Archimedean copulas. The analysis was conducted for precipitation events in the lower Tunjuelo River basin in Colombia. The results show that, for a joint study of intensity-duration-frequency, IDF curves can be obtained through copulas, yielding more accurate and reliable information on design storms and the associated risks. They also show how the use of copulas greatly simplifies the study of multivariate distributions and introduces the concept of the joint return period, used to properly represent hydrological design requirements in frequency analysis.
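
    As a minimal sketch of the joint-return-period idea, the Gumbel copula (one common Archimedean family; the abstract does not name which family was fitted) links the marginal non-exceedance probabilities of intensity and duration, from which "OR" and "AND" joint return periods follow. The parameter values below are illustrative, not the basin's fitted values.

```python
import math

def gumbel_copula(u, v, theta):
    """Gumbel (Archimedean) copula C(u, v); theta >= 1 controls dependence,
    theta = 1 giving independence C(u, v) = u * v."""
    return math.exp(-(((-math.log(u)) ** theta
                       + (-math.log(v)) ** theta) ** (1.0 / theta)))

def joint_return_periods(u, v, theta, mu=1.0):
    """Joint return periods for an event whose intensity and duration have
    marginal non-exceedance probabilities u and v; mu is the mean
    interarrival time of events in years."""
    c = gumbel_copula(u, v, theta)
    t_or = mu / (1.0 - c)            # at least one variable exceeds its threshold
    t_and = mu / (1.0 - u - v + c)   # both variables exceed their thresholds
    return t_or, t_and
```

For positively dependent variables the "OR" return period is shorter and the "AND" return period longer than the marginal return period, which is the ordering hydrological design relies on.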

  15. A comparison of maximum likelihood and other estimators of eigenvalues from several correlated Monte Carlo samples

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beer, M.

    1980-12-01

    The maximum likelihood method for the multivariate normal distribution is applied to the case of several individual eigenvalues. Correlated Monte Carlo estimates of the eigenvalue are assumed to follow this prescription and aspects of the assumption are examined. Monte Carlo cell calculations using the SAM-CE and VIM codes for the TRX-1 and TRX-2 benchmark reactors, and SAM-CE full core results are analyzed with this method. Variance reductions of a few percent to a factor of 2 are obtained from maximum likelihood estimation as compared with the simple average and the minimum variance individual eigenvalue. The numerical results verify that the use of sample variances and correlation coefficients in place of the corresponding population statistics still leads to nearly minimum variance estimation for a sufficient number of histories and aggregates.
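
    Under multivariate normality, the maximum likelihood combination of several correlated estimates of one common eigenvalue is the inverse-covariance-weighted mean; a sketch (not the SAM-CE/VIM implementation) follows.

```python
import numpy as np

def ml_combined_estimate(x, cov):
    """Minimum-variance (ML under multivariate normality) combination of
    correlated estimates x of a common quantity, given their (sample)
    covariance matrix cov: est = (1' C^-1 x) / (1' C^-1 1)."""
    x = np.asarray(x, dtype=float)
    w = np.linalg.solve(cov, np.ones_like(x))   # C^-1 1
    var = 1.0 / w.sum()                         # variance of the combination
    est = var * (w @ x)
    return est, var
```

With uncorrelated, equal-variance estimates this reduces to the simple average; with strong positive correlation it down-weights redundant estimates, which is the source of the variance reductions reported above.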

  16. Clustering of the human skeletal muscle fibers using linear programming and angular Hilbertian metrics.

    PubMed

    Neji, Radhouène; Besbes, Ahmed; Komodakis, Nikos; Deux, Jean-François; Maatouk, Mezri; Rahmouni, Alain; Bassez, Guillaume; Fleury, Gilles; Paragios, Nikos

    2009-01-01

    In this paper, we present a manifold clustering method for the classification of fibers obtained from diffusion tensor images (DTI) of the human skeletal muscle. Using a linear programming formulation of prototype-based clustering, we propose a novel fiber classification algorithm over manifolds that circumvents the necessity to embed the data in low dimensional spaces and determines automatically the number of clusters. Furthermore, we propose the use of angular Hilbertian metrics between multivariate normal distributions to define a family of distances between tensors that we generalize to fibers. These metrics are used to approximate the geodesic distances over the fiber manifold. We also discuss the case where only geodesic distances to a reduced set of landmark fibers are available. The experimental validation of the method is done using a significant, manually annotated dataset of DTI of the calf muscle from healthy and diseased subjects.
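
    One concrete member of the angular family can be sketched from the closed-form L2 inner product of two Gaussian densities: normalizing the densities to the unit sphere of L2 and taking the arccos of their inner product gives an angle-valued distance. This is an illustrative construction consistent with the idea above, not necessarily the exact family used in the paper.

```python
import numpy as np

def gauss_inner(mu1, cov1, mu2, cov2):
    """Closed-form L2 inner product of two multivariate normal densities:
    integral of N(x; mu1, cov1) * N(x; mu2, cov2) dx
    = N(mu1 - mu2; 0, cov1 + cov2)."""
    d = np.asarray(mu1, float) - np.asarray(mu2, float)
    s = np.asarray(cov1, float) + np.asarray(cov2, float)
    k = d.size
    quad = d @ np.linalg.solve(s, d)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** k * np.linalg.det(s))

def angular_distance(mu1, cov1, mu2, cov2):
    """Angle between the two densities on the unit sphere of L2 (an angular
    Hilbertian metric): arccos of the normalized inner product."""
    num = gauss_inner(mu1, cov1, mu2, cov2)
    den = np.sqrt(gauss_inner(mu1, cov1, mu1, cov1)
                  * gauss_inner(mu2, cov2, mu2, cov2))
    return float(np.arccos(np.clip(num / den, -1.0, 1.0)))
```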

  17. Accounting for parameter uncertainty in the definition of parametric distributions used to describe individual patient variation in health economic models.

    PubMed

    Degeling, Koen; IJzerman, Maarten J; Koopman, Miriam; Koffijberg, Hendrik

    2017-12-15

    Parametric distributions based on individual patient data can be used to represent both stochastic and parameter uncertainty. Although general guidance is available on how parameter uncertainty should be accounted for in probabilistic sensitivity analysis, there is no comprehensive guidance on reflecting parameter uncertainty in the (correlated) parameters of distributions used to represent stochastic uncertainty in patient-level models. This study aims to provide this guidance by proposing appropriate methods and illustrating the impact of this uncertainty on modeling outcomes. Two approaches, 1) using non-parametric bootstrapping and 2) using multivariate Normal distributions, were applied in a simulation and case study. The approaches were compared based on point-estimates and distributions of time-to-event and health economic outcomes. To assess sample size impact on the uncertainty in these outcomes, sample size was varied in the simulation study and subgroup analyses were performed for the case-study. Accounting for parameter uncertainty in distributions that reflect stochastic uncertainty substantially increased the uncertainty surrounding health economic outcomes, illustrated by larger confidence ellipses surrounding the cost-effectiveness point-estimates and different cost-effectiveness acceptability curves. Although both approaches performed similar for larger sample sizes (i.e. n = 500), the second approach was more sensitive to extreme values for small sample sizes (i.e. n = 25), yielding infeasible modeling outcomes. Modelers should be aware that parameter uncertainty in distributions used to describe stochastic uncertainty needs to be reflected in probabilistic sensitivity analysis, as it could substantially impact the total amount of uncertainty surrounding health economic outcomes. If feasible, the bootstrap approach is recommended to account for this uncertainty.
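
    The two approaches can be sketched for a log-normal time-to-event distribution fitted to hypothetical patient-level data: resample the patients and refit (bootstrap), or draw the fitted parameters from a multivariate normal centered at the MLE with its asymptotic covariance. The data and the diagonal asymptotic covariance below are illustrative assumptions, not the paper's case study.

```python
import numpy as np

rng = np.random.default_rng(1)
logt = rng.normal(4.0, 0.5, size=200)   # hypothetical log survival times

# Approach 1: non-parametric bootstrap of the (mu, sigma) estimates.
boot = np.array([(s.mean(), s.std(ddof=1))
                 for s in (rng.choice(logt, size=logt.size)
                           for _ in range(2000))])

# Approach 2: multivariate normal draws around the MLE, using the
# asymptotic covariance of (mu, sigma) for a (log-)normal sample:
# Var(mu) = sigma^2 / n, Var(sigma) = sigma^2 / (2n), independent.
n, mu_hat, sig_hat = logt.size, logt.mean(), logt.std(ddof=1)
cov = np.diag([sig_hat ** 2 / n, sig_hat ** 2 / (2 * n)])
mvn = rng.multivariate_normal([mu_hat, sig_hat], cov, size=2000)
```

For a sample this size the two parameter clouds nearly coincide; the divergence the authors report arises at much smaller n.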

  18. Evaluation of probabilistic forecasts with the scoringRules package

    NASA Astrophysics Data System (ADS)

    Jordan, Alexander; Krüger, Fabian; Lerch, Sebastian

    2017-04-01

    Over the last decades probabilistic forecasts in the form of predictive distributions have become popular in many scientific disciplines. With the proliferation of probabilistic models arises the need for decision-theoretically principled tools to evaluate the appropriateness of models and forecasts in a generalized way, in order to better understand sources of prediction errors and to improve the models. Proper scoring rules are functions S(F,y) which evaluate the accuracy of a forecast distribution F, given that an outcome y was observed. Consistent with decision-theoretic principles, they allow the comparison of alternative models, a crucial ability given the variety of theories, data sources and statistical specifications that is available in many situations. This contribution presents the software package scoringRules for the statistical programming language R, which provides functions to compute popular scoring rules such as the continuous ranked probability score for a variety of distributions F that come up in applied work. For univariate variables, two main classes are parametric distributions like normal, t, or gamma distributions, and distributions that are not known analytically, but are indirectly described through a sample of simulation draws. For example, ensemble weather forecasts take this form. The scoringRules package aims to be a convenient dictionary-like reference for computing scoring rules. We offer state-of-the-art implementations of several known (but not routinely applied) formulas, and implement closed-form expressions that were previously unavailable. Whenever more than one implementation variant exists, we offer statistically principled default choices. Recent developments include the addition of scoring rules to evaluate multivariate forecast distributions. The use of the scoringRules package is illustrated in an example on post-processing ensemble forecasts of temperature.
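
    As an example of the closed-form expressions such a package implements, the CRPS of a normal forecast N(mu, sigma^2) for observation y has the well-known form sigma * (z(2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi)) with z = (y - mu)/sigma. A standalone Python transcription (the package itself is R):

```python
import math

def crps_normal(y, mu=0.0, sigma=1.0):
    """Continuous ranked probability score of a N(mu, sigma^2) forecast for
    observation y, via the closed form; lower is better."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)      # phi(z)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))             # Phi(z)
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))
```

The score is propriety in action: it worsens as the observation moves into the tails, and rewards sharper forecasts that remain calibrated.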

  19. A Simplified, General Approach to Simulating from Multivariate Copula Functions

    Treesearch

    Barry Goodwin

    2012-01-01

    Copulas have become an important analytic tool for characterizing multivariate distributions and dependence. One is often interested in simulating data from copula estimates. The process can be analytically and computationally complex and usually involves steps that are unique to a given parametric copula. We describe an alternative approach that uses probability...
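
    The copula-specific steps alluded to above can be illustrated with the Clayton family, where the conditional distribution C(v | u) can be inverted in closed form: draw u, w ~ U(0,1) and solve for v. This is a sketch of the standard conditional-inversion method, not the simplified approach the (truncated) abstract goes on to describe.

```python
import random

def clayton_sample(theta, n, seed=0):
    """Simulate n pairs (u, v) from a Clayton copula (theta > 0) by
    conditional inversion: v = [u^-theta (w^(-theta/(1+theta)) - 1) + 1]^(-1/theta)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u, w = rng.random(), rng.random()
        v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0)
             + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs
```

Both margins are uniform on (0, 1), while theta controls the strength of (lower-tail) dependence.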

  20. A novel generalized normal distribution for human longevity and other negatively skewed data.

    PubMed

    Robertson, Henry T; Allison, David B

    2012-01-01

    Negatively skewed data arise occasionally in statistical practice; perhaps the most familiar example is the distribution of human longevity. Although other generalizations of the normal distribution exist, we demonstrate a new alternative that apparently fits human longevity data better. We propose an alternative approach of a normal distribution whose scale parameter is conditioned on attained age. This approach is consistent with previous findings that longevity conditioned on survival to the modal age behaves like a normal distribution. We derive such a distribution and demonstrate its accuracy in modeling human longevity data from life tables. The new distribution is characterized by 1. An intuitively straightforward genesis; 2. Closed forms for the pdf, cdf, mode, quantile, and hazard functions; and 3. Accessibility to non-statisticians, based on its close relationship to the normal distribution.

  1. A Novel Generalized Normal Distribution for Human Longevity and other Negatively Skewed Data

    PubMed Central

    Robertson, Henry T.; Allison, David B.

    2012-01-01

    Negatively skewed data arise occasionally in statistical practice; perhaps the most familiar example is the distribution of human longevity. Although other generalizations of the normal distribution exist, we demonstrate a new alternative that apparently fits human longevity data better. We propose an alternative approach of a normal distribution whose scale parameter is conditioned on attained age. This approach is consistent with previous findings that longevity conditioned on survival to the modal age behaves like a normal distribution. We derive such a distribution and demonstrate its accuracy in modeling human longevity data from life tables. The new distribution is characterized by 1. An intuitively straightforward genesis; 2. Closed forms for the pdf, cdf, mode, quantile, and hazard functions; and 3. Accessibility to non-statisticians, based on its close relationship to the normal distribution. PMID:22623974

  2. Modeling error distributions of growth curve models through Bayesian methods.

    PubMed

    Zhang, Zhiyong

    2016-06-01

    Growth curve models are widely used in social and behavioral sciences. However, typical growth curve models often assume that the errors are normally distributed, although non-normal data may be even more common than normal data. In order to avoid possible statistical inference problems in blindly assuming normality, a general Bayesian framework is proposed to flexibly model normal and non-normal data through the explicit specification of the error distributions. A simulation study shows that when the distribution of the error is correctly specified, one can avoid the loss in the efficiency of standard error estimates. A real example on the analysis of mathematical ability growth data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99 is used to show the application of the proposed methods. Instructions and code on how to conduct growth curve analysis with both normal and non-normal error distributions using the MCMC procedure of SAS are provided.
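
    The efficiency point can be illustrated with a toy simulation (not the paper's SAS/MCMC setup): when errors are heavy-tailed, an estimator matched to the tails beats the normal-theory one. Here Student-t(3) errors are simulated, and the sample median stands in for a correctly specified heavy-tailed model while the mean plays the normality-assuming estimator.

```python
import random, statistics

rng = random.Random(42)

def t3():
    """Student-t draw with 3 df: standard normal over sqrt(chi2_3 / 3)."""
    z = rng.gauss(0, 1)
    chi2 = sum(rng.gauss(0, 1) ** 2 for _ in range(3))
    return z / ((chi2 / 3) ** 0.5)

means, medians = [], []
for _ in range(2000):
    sample = [t3() for _ in range(50)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Sampling variance of each location estimator across replications.
var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)
```

For t(3) errors the median's sampling variance is noticeably smaller than the mean's, mirroring the efficiency loss the abstract warns about when normality is wrongly assumed.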

  3. Multivariate analysis for scanning tunneling spectroscopy data

    NASA Astrophysics Data System (ADS)

    Yamanishi, Junsuke; Iwase, Shigeru; Ishida, Nobuyuki; Fujita, Daisuke

    2018-01-01

    We applied principal component analysis (PCA) to two-dimensional tunneling spectroscopy (2DTS) data obtained on a Si(111)-(7 × 7) surface to explore the effectiveness of multivariate analysis for interpreting 2DTS data. We demonstrated that several components that originated mainly from specific atoms at the Si(111)-(7 × 7) surface can be extracted by PCA. Furthermore, we showed that hidden components in the tunneling spectra can be decomposed (peak separation), which is difficult to achieve with normal 2DTS analysis without the support of theoretical calculations. Our analysis showed that multivariate analysis can be an additional powerful way to analyze 2DTS data and extract hidden information from a large amount of spectroscopic data.
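
    A PCA of spectroscopy data of this kind is typically computed from the SVD of the mean-centered data matrix; a generic sketch (synthetic data, not the Si(111)-(7 × 7) measurements) follows.

```python
import numpy as np

def pca_spectra(X, n_components):
    """PCA of a 2D spectroscopy data set X (n_pixels x n_channels): center
    each spectral channel, then take the leading right singular vectors as
    spectral components and the projections as per-pixel score maps."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]            # spectral loadings
    scores = Xc @ components.T                # per-pixel scores
    explained = (s ** 2 / (s ** 2).sum())[:n_components]
    return components, scores, explained
```

Score maps localize each component on the surface, which is how atom-specific contributions can be separated out.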

  4. A Spatially Constrained Multi-autoencoder Approach for Multivariate Geochemical Anomaly Recognition

    NASA Astrophysics Data System (ADS)

    Lirong, C.; Qingfeng, G.; Renguang, Z.; Yihui, X.

    2017-12-01

    Separating and recognizing geochemical anomalies from the geochemical background is one of the key tasks in geochemical exploration. Many methods have been developed, such as calculating the mean ± 2 standard deviations, and fractal/multifractal models. In recent years, the deep autoencoder, a deep learning approach, has been used for multivariate geochemical anomaly recognition. While able to deal with the non-normal distributions of geochemical concentrations and the non-linear relationships among them, this self-supervised learning method does not take into account the spatial heterogeneity of the geochemical background or the uncertainty induced by the randomly initialized weights of neurons, leading to ineffective recognition of weak anomalies. In this paper, we introduce a spatially constrained multi-autoencoder (SCMA) approach for multivariate geochemical anomaly recognition, which includes two steps: spatial partitioning and anomaly score computation. The first step divides the study area into multiple sub-regions to segregate the geochemical background, by grouping the geochemical samples through K-means clustering, spatial filtering, and spatial constraining rules. In the second step, for each sub-region, a group of autoencoder neural networks are constructed with an identical structure but different initial weights on neurons. Each autoencoder is trained using the geochemical samples within the corresponding sub-region to learn the sub-regional geochemical background. The best autoencoder of a group is chosen as the final model for the corresponding sub-region.
The anomaly score at each location can then be calculated as the Euclidean distance between the observed and reconstructed concentrations of geochemical elements. The experiments using the geochemical data and Fe deposits in the southwestern Fujian province of China showed that our SCMA approach greatly improved the recognition of weak anomalies, achieving an AUC of 0.89, compared with an AUC of 0.77 for a single deep autoencoder approach.

  5. Virtual quantification of metabolites by capillary electrophoresis-electrospray ionization-mass spectrometry: predicting ionization efficiency without chemical standards.

    PubMed

    Chalcraft, Kenneth R; Lee, Richard; Mills, Casandra; Britz-McKibbin, Philip

    2009-04-01

    A major obstacle in metabolomics remains the identification and quantification of a large fraction of unknown metabolites in complex biological samples when purified standards are unavailable. Herein we introduce a multivariate strategy for de novo quantification of cationic/zwitterionic metabolites using capillary electrophoresis-electrospray ionization-mass spectrometry (CE-ESI-MS) based on fundamental molecular, thermodynamic, and electrokinetic properties of an ion. Multivariate calibration was used to derive a quantitative relationship between the measured relative response factor (RRF) of polar metabolites with respect to four physicochemical properties associated with ion evaporation in ESI-MS, namely, molecular volume (MV), octanol-water distribution coefficient (log D), absolute mobility (mu(o)), and effective charge (z(eff)). Our studies revealed that a limited set of intrinsic solute properties can be used to predict the RRF of various classes of metabolites (e.g., amino acids, amines, peptides, acylcarnitines, nucleosides, etc.) with reasonable accuracy and robustness provided that an appropriate training set is validated and ion responses are normalized to an internal standard(s). The applicability of the multivariate model to quantify micromolar levels of metabolites spiked in red blood cell (RBC) lysates was also examined by CE-ESI-MS without significant matrix effects caused by involatile salts and/or major co-ion interferences. This work demonstrates the feasibility for virtual quantification of low-abundance metabolites and their isomers in real-world samples using physicochemical properties estimated by computer modeling, while providing deeper insight into the wide disparity of solute responses in ESI-MS. New strategies for predicting ionization efficiency in silico allow for rapid and semiquantitative analysis of newly discovered biomarkers and/or drug metabolites in metabolomics research when chemical standards do not exist.
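
    The multivariate calibration step amounts to regressing measured RRFs on the four physicochemical descriptors (MV, log D, mu(o), z(eff)) and then predicting the RRF of a metabolite with no chemical standard. The sketch below uses simulated descriptors and an assumed linear model purely for illustration; the paper's actual model form and training data are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
# Hypothetical descriptors for a training set of metabolites:
# molecular volume, log D, absolute mobility, effective charge.
descriptors = rng.normal(size=(n, 4))
# Simulated RRFs following an assumed linear relation plus noise.
true_coef = np.array([0.4, -0.8, 0.3, 0.6])
rrf = 1.0 + descriptors @ true_coef + rng.normal(scale=0.05, size=n)

# Multivariate calibration: least-squares fit of RRF on the descriptors.
A = np.column_stack([np.ones(n), descriptors])
coef, *_ = np.linalg.lstsq(A, rrf, rcond=None)

def predict_rrf(d):
    """De novo RRF prediction for a metabolite from its descriptors alone."""
    return coef[0] + np.asarray(d, float) @ coef[1:]
```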

  6. Demographic and lifestyle factors and survival among patients with esophageal and gastric cancer: The Biobank Japan Project.

    PubMed

    Okada, Emiko; Ukawa, Shigekazu; Nakamura, Koshi; Hirata, Makoto; Nagai, Akiko; Matsuda, Koichi; Ninomiya, Toshiharu; Kiyohara, Yutaka; Muto, Kaori; Kamatani, Yoichiro; Yamagata, Zentaro; Kubo, Michiaki; Nakamura, Yusuke; Tamakoshi, Akiko

    2017-03-01

    Several studies have evaluated associations between the characteristics of patients with esophageal and gastric cancer and survival, but these associations remain unclear. We described the distribution of demographic and lifestyle factors among patients with esophageal and gastric cancer in Japan, and investigated their potential effects on survival. Between 2003 and 2007, 24- to 95-year-old Japanese patients with esophageal and gastric cancer were enrolled in the BioBank Japan Project. The analysis included 365 patients with esophageal squamous cell carcinoma (ESCC) and 1574 patients with gastric cancer. Hazard ratios (HRs) and 95% confidence intervals (CIs) for mortality were estimated using medical institution-stratified Cox proportional hazards models. During follow-up, 213 patients with ESCC (median follow-up, 4.4 years) and 603 patients with gastric cancer (median follow-up, 6.1 years) died. Among patients with ESCC, the mortality risk was higher in ever drinkers versus never drinkers (multivariable HR = 2.37, 95% CI: 1.24, 4.53). Among patients with gastric cancer, the mortality risk was higher in underweight patients versus patients of normal weight (multivariable HR = 1.66, 95% CI: 1.34, 2.05). Compared to patients with gastric cancer with no physical exercise habit, those who exercised ≥3 times/week had a lower mortality risk (multivariate HR = 0.75, 95% CI = 0.61, 0.93). However, lack of stage in many cases was a limitation. Among patients with ESCC, alcohol drinkers have a poor prognosis. Patients with gastric cancer who are underweight also have a poor prognosis, whereas patients with physical exercise habits have a good prognosis. Copyright © 2017 The Authors. Production and hosting by Elsevier B.V. All rights reserved.

  7. Inheritance of Properties of Normal and Non-Normal Distributions after Transformation of Scores to Ranks

    ERIC Educational Resources Information Center

    Zimmerman, Donald W.

    2011-01-01

    This study investigated how population parameters representing heterogeneity of variance, skewness, kurtosis, bimodality, and outlier-proneness, drawn from normal and eleven non-normal distributions, also characterized the ranks corresponding to independent samples of scores. When the parameters of population distributions from which samples were…

  8. Bayesian bivariate meta-analysis of correlated effects: Impact of the prior distributions on the between-study correlation, borrowing of strength, and joint inferences

    PubMed Central

    Bujkiewicz, Sylwia; Riley, Richard D

    2016-01-01

    Multivariate random-effects meta-analysis allows the joint synthesis of correlated results from multiple studies, for example, for multiple outcomes or multiple treatment groups. In a Bayesian univariate meta-analysis of one endpoint, the importance of specifying a sensible prior distribution for the between-study variance is well understood. However, in multivariate meta-analysis, there is little guidance about the choice of prior distributions for the variances or, crucially, the between-study correlation, ρB; for the latter, researchers often use a Uniform(−1,1) distribution assuming it is vague. In this paper, an extensive simulation study and a real illustrative example is used to examine the impact of various (realistically) vague prior distributions for ρB and the between-study variances within a Bayesian bivariate random-effects meta-analysis of two correlated treatment effects. A range of diverse scenarios are considered, including complete and missing data, to examine the impact of the prior distributions on posterior results (for treatment effect and between-study correlation), amount of borrowing of strength, and joint predictive distributions of treatment effectiveness in new studies. Two key recommendations are identified to improve the robustness of multivariate meta-analysis results. First, the routine use of a Uniform(−1,1) prior distribution for ρB should be avoided, if possible, as it is not necessarily vague. Instead, researchers should identify a sensible prior distribution, for example, by restricting values to be positive or negative as indicated by prior knowledge. Second, it remains critical to use sensible (e.g. empirically based) prior distributions for the between-study variances, as an inappropriate choice can adversely impact the posterior distribution for ρB, which may then adversely affect inferences such as joint predictive probabilities. 
These recommendations are especially important with a small number of studies and missing data. PMID:26988929

  9. Gradually truncated log-normal in USA publicly traded firm size distribution

    NASA Astrophysics Data System (ADS)

    Gupta, Hari M.; Campanha, José R.; de Aguiar, Daniela R.; Queiroz, Gabriel A.; Raheja, Charu G.

    2007-03-01

    We study the statistical distribution of firm size for USA and Brazilian publicly traded firms through the Zipf plot technique. Sale size is used to measure firm size. The Brazilian firm size distribution is given by a log-normal distribution without any adjustable parameter. However, we also need to consider different parameters of log-normal distribution for the largest firms in the distribution, which are mostly foreign firms. The log-normal distribution has to be gradually truncated after a certain critical value for USA firms. Therefore, the original hypothesis of proportional effect proposed by Gibrat is valid with some modification for very large firms. We also consider the possible mechanisms behind this distribution.
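
    The Zipf plot technique used above is simply a log(rank) versus log(size) plot of firms ordered by sales; a sketch on an illustrative log-normal sample (not the actual USA or Brazilian firm data) shows how the plot is constructed.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical firm sales drawn from a log-normal distribution.
sales = rng.lognormal(mean=10.0, sigma=2.0, size=5000)

# Zipf plot: rank firms by size (largest = rank 1) and compare
# log(rank) against log(size). For log-normal data the plot is
# curved; gradual truncation would bend its large-firm end further.
size = np.sort(sales)[::-1]
rank = np.arange(1, size.size + 1)
log_rank, log_size = np.log(rank), np.log(size)
```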

  10. Integrated GIS and multivariate statistical analysis for regional scale assessment of heavy metal soil contamination: A critical review.

    PubMed

    Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan

    2017-12-01

    Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0-0.10 m, or 0-0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km², with a median of 0.4 samples per km². The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. 
It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis (PCA) and cluster analysis (CA). Copyright © 2017 Elsevier Ltd. All rights reserved.
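
    Of the two interpolators named above, inverse distance weighting is the simpler; a minimal sketch (coordinates and concentrations are hypothetical) shows the weighting scheme.

```python
def idw(sample_points, query, power=2.0):
    """Inverse distance weighted estimate at `query` from (x, y, value)
    samples: a weighted mean with weights 1 / distance**power."""
    num = den = 0.0
    for x, y, v in sample_points:
        d2 = (x - query[0]) ** 2 + (y - query[1]) ** 2
        if d2 == 0.0:
            return v                       # exactly at a sample location
        w = 1.0 / d2 ** (power / 2.0)
        num += w * v
        den += w
    return num / den
```

Unlike ordinary kriging, IDW uses no variogram model, which is part of why it is so widely (and sometimes uncritically) applied in the reviewed studies.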

  11. A Comparison of the Influences of Verbal-Successive and Spatial-Simultaneous Factors on Achieving Readers in Fourth and Fifth Grade: A Multivariate Correlational Study.

    ERIC Educational Resources Information Center

    Solan, Harold A.

    1987-01-01

    This study involving 38 normally achieving fourth and fifth grade children confirmed previous studies indicating that both spatial-simultaneous (in which perceived stimuli are totally available at one point in time) and verbal-successive (information is presented in serial order) cognitive processing are important in normal learning. (DB)

  12. Is the ML Chi-Square Ever Robust to Nonnormality? A Cautionary Note with Missing Data

    ERIC Educational Resources Information Center

    Savalei, Victoria

    2008-01-01

    Normal theory maximum likelihood (ML) is by far the most popular estimation and testing method used in structural equation modeling (SEM), and it is the default in most SEM programs. Even though this approach assumes multivariate normality of the data, its use can be justified on the grounds that it is fairly robust to the violations of the…

  13. Bayesian transformation cure frailty models with multivariate failure time data.

    PubMed

    Yin, Guosheng

    2008-12-10

    We propose a class of transformation cure frailty models to accommodate a survival fraction in multivariate failure time data. Established through a general power transformation, this family of cure frailty models includes the proportional hazards and the proportional odds modeling structures as two special cases. Within the Bayesian paradigm, we obtain the joint posterior distribution and the corresponding full conditional distributions of the model parameters for the implementation of Gibbs sampling. Model selection is based on the conditional predictive ordinate statistic and deviance information criterion. As an illustration, we apply the proposed method to a real data set from dentistry.

  14. Multivariate Distributions in Reliability Theory and Life Testing.

    DTIC Science & Technology

    1981-04-01

    Downton Distribution This distribution is a special case of a classical bivariate gamma distribution due to Wicksell and to Kibble. See Krishnaiah and...Krishnamoorthy and Parthasarathy (1951) (see also Krishnaiah and Rao (1961) and Krishnaiah (1977)) and also within the framework of the Arnold classes. A...for these distributions and their properties is Johnson and Kotz (1972). Krishnaiah (1977) has specifically discussed multivariate gamma

  15. Multivariate statistical process control (MSPC) using Raman spectroscopy for in-line culture cell monitoring considering time-varying batches synchronized with correlation optimized warping (COW).

    PubMed

    Liu, Ya-Juan; André, Silvère; Saint Cristau, Lydia; Lagresle, Sylvain; Hannas, Zahia; Calvosa, Éric; Devos, Olivier; Duponchel, Ludovic

    2017-02-01

    Multivariate statistical process control (MSPC) is increasingly popular as a response to the challenge posed by the large multivariate datasets that analytical instruments such as Raman spectroscopy produce when monitoring complex cell cultures in the biopharmaceutical industry. However, Raman spectroscopy for in-line monitoring often produces unsynchronized data sets, resulting in time-varying batches. Moreover, unsynchronized data sets are common in cell culture monitoring because spectroscopic measurements are generally recorded in an alternating fashion, with more than one optical probe connected in parallel to the same spectrometer. Synchronized batches are a prerequisite for the application of multivariate analyses such as multi-way principal component analysis (MPCA) in MSPC monitoring. Correlation optimized warping (COW) is a popular alignment method with satisfactory performance; however, it had never before been applied to synchronize the acquisition times of spectroscopic datasets in an MSPC application. In this paper we propose, for the first time, to use COW to synchronize batches of varying duration analyzed with Raman spectroscopy. In a second step, we developed MPCA models at different time intervals based on the normal operating condition (NOC) batches synchronized by COW. New batches are finally projected onto the corresponding MPCA model. We monitored the evolution of the batches using two multivariate control charts based on Hotelling's T² and Q. As the results illustrate, the MSPC model was able to identify abnormal operating conditions, including contaminated batches, which is of prime importance in cell culture monitoring. We proved that Raman-based MSPC monitoring can be used to diagnose batches deviating from the normal condition with higher efficacy than traditional diagnosis, which would save time and money in the biopharmaceutical industry. Copyright © 2016 Elsevier B.V. All rights reserved.
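
The two control charts mentioned in this record are standard in PCA-based MSPC. As a minimal illustration (not the authors' implementation — the data, dimensions, and number of components below are invented), this numpy sketch fits a PCA model to synthetic normal-operating-condition data and computes Hotelling's T² (variation within the model plane) and Q (squared prediction error) for a new observation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic normal-operating-condition (NOC) data: 50 batches x 20 variables
X = rng.normal(size=(50, 20))

# Mean-center and fit a PCA model (via SVD), retaining k components
mu = X.mean(axis=0)
Xc = X - mu
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3
P = Vt[:k].T                       # loadings, shape (20, k)
lam = (s[:k] ** 2) / (len(X) - 1)  # variances of the retained components

def t2_and_q(x):
    """Hotelling's T^2 and Q (squared prediction error) for one observation."""
    xc = x - mu
    t = xc @ P                     # scores in the PCA subspace
    t2 = np.sum(t ** 2 / lam)      # Mahalanobis distance within the model plane
    resid = xc - t @ P.T           # part of x the PCA model cannot explain
    q = resid @ resid
    return t2, q

# A new observation far outside the NOC cloud inflates the statistics
t2_noc, q_noc = t2_and_q(X[0])
t2_bad, q_bad = t2_and_q(X[0] + 10.0)
```

In practice the control limits for T² and Q are set from the NOC batches (e.g. via an F-distribution approximation for T²), and a batch is flagged when either statistic exceeds its limit.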

  16. Multivariate quadrature for representing cloud condensation nuclei activity of aerosol populations

    DOE PAGES

    Fierce, Laura; McGraw, Robert L.

    2017-07-26

    Here, sparse representations of atmospheric aerosols are needed for efficient regional- and global-scale chemical transport models. We introduce a new framework for representing aerosol distributions, based on the quadrature method of moments. Given a set of moment constraints, we show how linear programming, combined with an entropy-inspired cost function, can be used to construct optimized quadrature representations of aerosol distributions. The sparse representations derived from this approach accurately reproduce cloud condensation nuclei (CCN) activity for realistically complex distributions simulated by a particle-resolved model. Additionally, the linear programming techniques described in this study can be used to bound key aerosol properties, such as the number concentration of CCN. Unlike commonly used sparse representations, such as modal and sectional schemes, the maximum-entropy approach described here is not constrained to pre-determined size bins or assumed distribution shapes. This study is a first step toward a particle-based aerosol scheme that will track multivariate aerosol distributions with sufficient computational efficiency for large-scale simulations.
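
The moment-constrained linear program described in this record can be sketched compactly. The toy below is not the authors' code — the target moments, abscissa grid, and cost vector are all invented for illustration. It asks scipy's `linprog` for non-negative weights on a fixed grid that reproduce the first four raw moments of a Uniform[0, 1] distribution; basic LP solutions have at most as many nonzero weights as equality constraints, which is what makes the resulting quadrature sparse:

```python
import numpy as np
from scipy.optimize import linprog

# Target: first four raw moments of a known distribution (here Uniform[0,1])
moments = np.array([1.0, 1 / 2, 1 / 3, 1 / 4])

# Candidate abscissas: a fixed grid on [0, 1]
x = np.linspace(0.0, 1.0, 21)

# Moment-matching constraints: sum_j w_j * x_j^k = m_k, with w_j >= 0
A_eq = np.vstack([x ** k for k in range(4)])

# Placeholder cost standing in for the entropy-inspired objective of the
# paper; any basic feasible solution already has at most 4 nonzero weights
c = np.ones_like(x)

res = linprog(c, A_eq=A_eq, b_eq=moments, bounds=(0, None))
w = res.x  # sparse quadrature weights reproducing the target moments
```

The same pattern — minimize a cost subject to moment equalities and non-negativity — also yields the bounds on CCN-like functionals mentioned in the abstract, by putting the functional itself in the objective.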

  17. Multivariate quadrature for representing cloud condensation nuclei activity of aerosol populations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fierce, Laura; McGraw, Robert L.

    Here, sparse representations of atmospheric aerosols are needed for efficient regional- and global-scale chemical transport models. We introduce a new framework for representing aerosol distributions, based on the quadrature method of moments. Given a set of moment constraints, we show how linear programming, combined with an entropy-inspired cost function, can be used to construct optimized quadrature representations of aerosol distributions. The sparse representations derived from this approach accurately reproduce cloud condensation nuclei (CCN) activity for realistically complex distributions simulated by a particle-resolved model. Additionally, the linear programming techniques described in this study can be used to bound key aerosol properties, such as the number concentration of CCN. Unlike commonly used sparse representations, such as modal and sectional schemes, the maximum-entropy approach described here is not constrained to pre-determined size bins or assumed distribution shapes. This study is a first step toward a particle-based aerosol scheme that will track multivariate aerosol distributions with sufficient computational efficiency for large-scale simulations.

  18. Discriminative analysis of non-linear brain connectivity for leukoaraiosis with resting-state fMRI

    NASA Astrophysics Data System (ADS)

    Lai, Youzhi; Xu, Lele; Yao, Li; Wu, Xia

    2015-03-01

    Leukoaraiosis (LA) describes diffuse white matter abnormalities on CT or MR brain scans, often seen in the normal elderly, in association with vascular risk factors such as hypertension, or in the context of cognitive impairment. The mechanism of the cognitive dysfunction is still unclear. Recent clinical studies have revealed that the severity of LA does not correspond to the level of cognitive impairment, and functional connectivity analysis is an appropriate method to detect the relation between LA and cognitive decline. However, existing functional connectivity analyses of LA have mostly been limited to linear associations. In this investigation, a novel measure utilizing the extended maximal information coefficient (eMIC) was applied to construct non-linear functional connectivity in 44 LA subjects (9 dementia, 25 mild cognitive impairment (MCI), and 10 cognitively normal (CN)). The strength of the non-linear functional connections carrying the first 1% of discriminative power increased in MCI compared with CN and dementia, the opposite of the behavior of their linear counterpart. Further functional network analysis revealed that the changes in non-linear and linear connectivity have similar, but not identical, spatial distributions in the human brain. In multivariate pattern analysis with multiple classifiers, the non-linear functional connectivity generally identified dementia, MCI, and CN within LA with a higher accuracy rate than the linear measure. Our findings reveal that non-linear functional connectivity provides useful discriminative power in the classification of LA, and that the spatially distributed differences between the non-linear and linear measures may indicate the underlying mechanism of cognitive dysfunction in LA.

  19. Fast Detection of Copper Content in Rice by Laser-Induced Breakdown Spectroscopy with Uni- and Multivariate Analysis.

    PubMed

    Liu, Fei; Ye, Lanhan; Peng, Jiyu; Song, Kunlin; Shen, Tingting; Zhang, Chu; He, Yong

    2018-02-27

    Fast detection of heavy metals is very important for ensuring the quality and safety of crops. Laser-induced breakdown spectroscopy (LIBS), coupled with uni- and multivariate analysis, was applied for quantitative analysis of copper in three kinds of rice (Jiangsu rice, regular rice, and Simiao rice). For univariate analysis, three pre-processing methods were applied to reduce fluctuations: background normalization, the internal standard method, and the standard normal variate (SNV). Linear regression models showed a strong correlation between spectral intensity and Cu content, with an R² greater than 0.97. The limit of detection (LOD) was around 5 ppm, lower than the tolerance limit for copper in foods. For multivariate analysis, partial least squares regression (PLSR) showed its advantage in extracting effective information for prediction, and its sensitivity reached 1.95 ppm, while support vector machine regression (SVMR) performed better in both calibration and prediction sets, where Rc² and Rp² reached 0.9979 and 0.9879, respectively. This study showed that LIBS can be considered a constructive tool for the quantification of copper contamination in rice.
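
The standard normal variate (SNV) pre-processing named in this record is simple to state: each spectrum is centered and scaled by its own mean and standard deviation, which removes multiplicative intensity fluctuations between shots. A minimal sketch with invented data — the second "spectrum" differs from the first only by a gain factor, which SNV removes:

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum individually,
    removing per-shot multiplicative intensity fluctuations."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True, ddof=1)
    return (spectra - mean) / std

raw = np.array([[1.0, 2.0, 3.0, 4.0],
                [2.0, 4.0, 6.0, 8.0]])  # same shape, different gain
corrected = snv(raw)
# After SNV the two rows coincide: the gain difference is gone
```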

  20. Fast Detection of Copper Content in Rice by Laser-Induced Breakdown Spectroscopy with Uni- and Multivariate Analysis

    PubMed Central

    Ye, Lanhan; Song, Kunlin; Shen, Tingting

    2018-01-01

    Fast detection of heavy metals is very important for ensuring the quality and safety of crops. Laser-induced breakdown spectroscopy (LIBS), coupled with uni- and multivariate analysis, was applied for quantitative analysis of copper in three kinds of rice (Jiangsu rice, regular rice, and Simiao rice). For univariate analysis, three pre-processing methods were applied to reduce fluctuations: background normalization, the internal standard method, and the standard normal variate (SNV). Linear regression models showed a strong correlation between spectral intensity and Cu content, with an R² greater than 0.97. The limit of detection (LOD) was around 5 ppm, lower than the tolerance limit for copper in foods. For multivariate analysis, partial least squares regression (PLSR) showed its advantage in extracting effective information for prediction, and its sensitivity reached 1.95 ppm, while support vector machine regression (SVMR) performed better in both calibration and prediction sets, where Rc² and Rp² reached 0.9979 and 0.9879, respectively. This study showed that LIBS can be considered a constructive tool for the quantification of copper contamination in rice. PMID:29495445

  1. Bias and Precision of Measures of Association for a Fixed-Effect Multivariate Analysis of Variance Model

    ERIC Educational Resources Information Center

    Kim, Soyoung; Olejnik, Stephen

    2005-01-01

    The sampling distributions of five popular measures of association with and without two bias adjusting methods were examined for the single factor fixed-effects multivariate analysis of variance model. The number of groups, sample sizes, number of outcomes, and the strength of association were manipulated. The results indicate that all five…

  2. Gaussian-based routines to impute categorical variables in health surveys.

    PubMed

    Yucel, Recai M; He, Yulei; Zaslavsky, Alan M

    2011-12-20

    The multivariate normal (MVN) distribution is arguably the most popular parametric model used in imputation and is available in most software packages (e.g., SAS PROC MI, R package norm). When it is applied to categorical variables as an approximation, practitioners often either apply simple rounding techniques for ordinal variables or create a distinct 'missing' category and/or disregard the nominal variable during the imputation phase. All of these practices can potentially lead to biased and/or uninterpretable inferences. In this work, we develop a new rounding methodology, calibrated to preserve observed distributions, for multiply imputing missing categorical covariates. The major attraction of this method is its flexibility to use any 'working' imputation software, particularly software based on the MVN, allowing practitioners to obtain usable imputations with small biases. A simulation study demonstrates the clear advantage of the proposed method in rounding ordinal variables and, in some scenarios, its plausibility in imputing nominal variables. We illustrate our methods on the widely used National Survey of Children with Special Health Care Needs, where incomplete values on race posed a valid threat to inferences pertaining to disparities. Copyright © 2011 John Wiley & Sons, Ltd.
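
The calibrated-rounding idea can be illustrated on a toy problem. The sketch below is my own simplification, not the authors' procedure: a binary variable is imputed from a Gaussian "working" model, and the rounding threshold is then chosen so that the imputed values reproduce the observed proportion of ones, rather than rounding naively at 0.5. All data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)

# Continuous covariate x fully observed; binary y missing for ~30% of cases
n = 2000
x = rng.normal(size=n)
y = (x + rng.normal(size=n) > 0).astype(float)
miss = rng.random(n) < 0.3

# Gaussian "working" imputation: linear regression of y on x, fitted to the
# observed cases, plus a normal draw for the residual
b1, b0 = np.polyfit(x[~miss], y[~miss], 1)
sigma = np.std(y[~miss] - (b0 + b1 * x[~miss]), ddof=2)
y_star = b0 + b1 * x[miss] + sigma * rng.normal(size=miss.sum())

# Calibrated rounding: pick the threshold so the imputed values reproduce
# the observed proportion of ones, instead of cutting at 0.5
p_obs = y[~miss].mean()
thresh = np.quantile(y_star, 1 - p_obs)
y_imp = (y_star >= thresh).astype(float)
```

Naive rounding at 0.5 can distort the marginal distribution badly when the working model is misspecified; calibrating the threshold to the observed marginal is the essence of the distribution-preserving rounding described in the abstract.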

  3. Log-gamma linear-mixed effects models for multiple outcomes with application to a longitudinal glaucoma study

    PubMed Central

    Zhang, Peng; Luo, Dandan; Li, Pengfei; Sharpsten, Lucie; Medeiros, Felipe A.

    2015-01-01

    Glaucoma is a progressive disease due to damage in the optic nerve with associated functional losses. Although the relationship between structural and functional progression in glaucoma is well established, there is disagreement on how this association evolves over time. In addressing this issue, we propose a new class of non-Gaussian linear-mixed models to estimate the correlations among subject-specific effects in multivariate longitudinal studies with a skewed distribution of random effects, to be used in a study of glaucoma. This class provides an efficient estimation of subject-specific effects by modeling the skewed random effects through the log-gamma distribution. It also provides more reliable estimates of the correlations between the random effects. To validate the log-gamma assumption against the usual normality assumption of the random effects, we propose a lack-of-fit test using the profile likelihood function of the shape parameter. We apply this method to data from a prospective observation study, the Diagnostic Innovations in Glaucoma Study, to present a statistically significant association between structural and functional change rates that leads to a better understanding of the progression of glaucoma over time. PMID:26075565

  4. Log-normal distribution from a process that is not multiplicative but is additive.

    PubMed

    Mouri, Hideaki

    2013-10-01

    The central limit theorem ensures that a sum of random variables tends to a Gaussian distribution as their total number tends to infinity. However, for a class of positive random variables, we find that the sum tends faster to a log-normal distribution. Although the sum tends eventually to a Gaussian distribution, the distribution of the sum is always close to a log-normal distribution rather than to any Gaussian distribution if the summands are numerous enough. This is in contrast to the current consensus that any log-normal distribution is due to a product of random variables, i.e., a multiplicative process, or equivalently to nonlinearity of the system. In fact, the log-normal distribution is also observable for a sum, i.e., an additive process that is typical of linear systems. We show conditions for such a sum, an analytical example, and an application to random scalar fields such as those of turbulence.
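
The claim is easy to probe numerically. In the following sketch (synthetic data, not the paper's analytical example), sums of ten i.i.d. log-normal variables remain strongly right-skewed, while the logarithm of the sum is nearly symmetric, i.e. the sum is much closer to log-normal than to Gaussian:

```python
import numpy as np

rng = np.random.default_rng(2)

# Sums of 10 i.i.d. positive (log-normal) variables, repeated many times
sums = rng.lognormal(mean=0.0, sigma=1.0, size=(100_000, 10)).sum(axis=1)

def skew(a):
    """Sample skewness (zero for a symmetric distribution)."""
    a = a - a.mean()
    return (a ** 3).mean() / (a ** 2).mean() ** 1.5

# The sum is clearly right-skewed, while its logarithm is nearly symmetric,
# so the sum of these positive summands is close to log-normal, not Gaussian
s_raw, s_log = skew(sums), skew(np.log(sums))
```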

  5. Harnessing Multivariate Statistics for Ellipsoidal Data in Structural Geology

    NASA Astrophysics Data System (ADS)

    Roberts, N.; Davis, J. R.; Titus, S.; Tikoff, B.

    2015-12-01

    Most structural geology articles do not state significance levels, report confidence intervals, or perform regressions to find trends. This is, in part, because structural data tend to include directions, orientations, ellipsoids, and tensors, which are not treatable by elementary statistics. We describe a full procedural methodology for the statistical treatment of ellipsoidal data. We use a reconstructed dataset of deformed ooids in Maryland from Cloos (1947) to illustrate the process. Normalized ellipsoids have five degrees of freedom and can be represented by a second order tensor. This tensor can be permuted into a five dimensional vector that belongs to a vector space and can be treated with standard multivariate statistics. Cloos made several claims about the distribution of deformation in the South Mountain fold, Maryland, and we reexamine two particular claims using hypothesis testing: 1) octahedral shear strain increases towards the axial plane of the fold; 2) finite strain orientation varies systematically along the trend of the axial trace as it bends with the Appalachian orogen. We then test the null hypothesis that the southern segment of South Mountain is the same as the northern segment. This test illustrates the application of ellipsoidal statistics, which combine both orientation and shape. We report confidence intervals for each test, and graphically display our results with novel plots. This poster illustrates the importance of statistics in structural geology, especially when working with noisy or small datasets.
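
The permutation of a normalized symmetric tensor into a five-dimensional vector can be written down explicitly. The sketch below uses a standard construction (not necessarily the authors' exact convention) for a trace-free symmetric 3×3 tensor, e.g. the matrix logarithm of a volume-normalized ellipsoid tensor; the scaling factors are chosen so that the Euclidean norm of the 5-vector equals the Frobenius norm of the tensor, which is what makes ordinary multivariate statistics applicable:

```python
import numpy as np

def ellipsoid_to_5vector(T):
    """Map a symmetric, trace-free 3x3 tensor (five degrees of freedom)
    to a vector in R^5. The scalings make the map an isometry: the
    Euclidean norm of the 5-vector equals the Frobenius norm of T."""
    r2 = np.sqrt(2.0)
    return np.array([
        r2 * T[0, 1], r2 * T[0, 2], r2 * T[1, 2],
        (T[0, 0] - T[1, 1]) / r2,
        np.sqrt(1.5) * T[2, 2],
    ])

rng = np.random.default_rng(7)

# Build a random symmetric trace-free tensor and check the isometry
A = rng.normal(size=(3, 3))
T = 0.5 * (A + A.T)
T -= np.eye(3) * (np.trace(T) / 3.0)

v = ellipsoid_to_5vector(T)
# sum(T*T) (squared Frobenius norm) equals v @ v
```

Once ellipsoids live in a flat 5-dimensional vector space, means, covariances, confidence regions, and hypothesis tests of the kind described in the abstract reduce to standard multivariate statistics on the vectors.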

  6. A generalized multivariate regression model for modelling ocean wave heights

    NASA Astrophysics Data System (ADS)

    Wang, X. L.; Feng, Y.; Swail, V. R.

    2012-04-01

    In this study, a generalized multivariate linear regression model is developed to represent the relationship between 6-hourly ocean significant wave heights (Hs) and the corresponding 6-hourly mean sea level pressure (MSLP) fields. The model is calibrated using the ERA-Interim reanalysis of Hs and MSLP fields for 1981-2000, and is validated using the ERA-Interim reanalysis for 2001-2010 and the ERA40 reanalysis of Hs and MSLP for 1958-2001. The performance of the fitted model is evaluated in terms of the Pierce skill score, frequency bias index, and correlation skill score. Because wave heights are not normally distributed, they are subjected to a data-adaptive Box-Cox transformation before being used in the model fitting. Also, since 6-hourly data are being modelled, lag-1 autocorrelation must be, and is, accounted for. The models with and without the Box-Cox transformation, and with and without accounting for autocorrelation, are inter-compared in terms of their prediction skills. The fitted MSLP-Hs relationship is then used to reconstruct the historical wave height climate from the 6-hourly MSLP fields taken from the Twentieth Century Reanalysis (20CR, Compo et al. 2011), and to project possible future wave height climates using CMIP5 model simulations of MSLP fields. The reconstructed and projected wave heights, both seasonal means and maxima, are subjected to a trend analysis that allows for non-linear (polynomial) trends.
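
The data-adaptive Box-Cox step is readily available in scipy, where the transformation parameter λ is chosen by maximum likelihood. A sketch on synthetic, wave-height-like (positive, right-skewed) data — the gamma parameters are invented for the example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Wave-height-like data: positive and right-skewed
hs = rng.gamma(shape=2.0, scale=1.5, size=5000)

# Data-adaptive Box-Cox: lambda is chosen by maximum likelihood
hs_t, lam = stats.boxcox(hs)

def skew(a):
    """Sample skewness; near zero after a successful normalization."""
    a = a - a.mean()
    return (a ** 3).mean() / (a ** 2).mean() ** 1.5

# The transformed data are far less skewed, i.e. much closer to normal
```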

  7. Gaussianization for fast and accurate inference from cosmological data

    NASA Astrophysics Data System (ADS)

    Schuhmann, Robert L.; Joachimi, Benjamin; Peiris, Hiranya V.

    2016-06-01

    We present a method to transform multivariate unimodal non-Gaussian posterior probability densities into approximately Gaussian ones via non-linear mappings, such as Box-Cox transformations and generalizations thereof. This permits an analytical reconstruction of the posterior from a point sample, like a Markov chain, and simplifies the subsequent joint analysis with other experiments. This way, a multivariate posterior density can be reported efficiently, by compressing the information contained in Markov chain Monte Carlo samples. Further, the model evidence integral (i.e., the marginal likelihood) can be computed analytically. This method is analogous to the search for normal parameters in the cosmic microwave background, but is more general. The search for the optimally Gaussianizing transformation is performed computationally through a maximum-likelihood formalism; its quality can be judged by how well the credible regions of the posterior are reproduced. We demonstrate that our method outperforms kernel density estimates in this objective. Further, we select marginal posterior samples from Planck data with several distinct, strongly non-Gaussian features, and verify the reproduction of the marginal contours. To demonstrate evidence computation, we Gaussianize the joint distribution of data from weak lensing and baryon acoustic oscillations, for different cosmological models, and find a preference for flat Λ cold dark matter. Comparing to values computed with the Savage-Dickey density ratio and with Population Monte Carlo, we find good agreement of our method within the spread of the other two.

  8. Renal function changes after percutaneous nephrolithotomy in patients with renal calculi with a solitary kidney compared to bilateral kidneys.

    PubMed

    Shi, Xiaolei; Peng, Yonghan; Li, Ling; Li, Xiao; Wang, Qi; Zhang, Wei; Dong, Hao; Shen, Rong; Lu, Chaoyue; Liu, Min; Gao, Xiaofeng; Sun, Yinghao

    2018-05-26

    To evaluate renal function changes and risk factors for acute kidney injury (AKI) after percutaneous nephrolithotomy (PCNL) in patients with renal calculi with a solitary kidney (SK) or normal bilateral kidneys (BKs). Between 2012 and 2016, 859 patients undergoing PCNL were retrospectively reviewed at Changhai Hospital. In all, 53 patients with a SK were paired with 53 patients with normal BKs via a propensity score-matched analysis. Data for the following variables were collected: age, sex, body mass index, stone size, distribution, operation time, perioperative outcomes, and complications. The complications were graded according to the modified Clavien-Dindo system. Univariable and multivariable logistic regression models were constructed to evaluate risk factors for predicting AKI. The SK and BKs groups were comparable in terms of age, sex ratio, stone size, stone location distribution, comorbidities, and American Society of Anesthesiologists Physical Status classification. The initial and final stone-free rates were comparable between the SK and BKs groups (initial: 52.83% vs 58.49%, P = 0.696; final: 84.91% vs 92.45%, P = 0.359). There was no difference between the two groups for complications, according to the Clavien-Dindo grades. The estimated glomerular filtration rate (eGFR) increased dramatically after the stone burden was immediately relieved, and during the 6-month follow-up eGFR was lower in the SK group compared with the BKs group. We found a modest improvement in renal function immediately after PCNL in the BKs group, and renal function gain was delayed in the SK group. Through logistic regression analysis, we discovered that a SK, preoperative creatinine and diabetes were independent risk factors for predicting AKI after PCNL. Considering the overall complication rates, PCNL is generally a safe procedure for treating renal calculi amongst patients with a SK or normal BKs. Follow-up renal function analysis showed a modest improvement in patients of both groups. Compared to patients with normal BKs, patients with a SK were more likely to develop AKI after PCNL. © 2018 The Authors BJU International © 2018 BJU International Published by John Wiley & Sons Ltd.

  9. Accounting for Sampling Error in Genetic Eigenvalues Using Random Matrix Theory.

    PubMed

    Sztepanacz, Jacqueline L; Blows, Mark W

    2017-07-01

    The distribution of genetic variance in multivariate phenotypes is characterized by the empirical spectral distribution of the eigenvalues of the genetic covariance matrix. Empirical estimates of genetic eigenvalues from random effects linear models are known to be overdispersed by sampling error, whereby large eigenvalues are biased upward and small eigenvalues are biased downward. The overdispersion of the leading eigenvalues of sample covariance matrices has been demonstrated to conform to the Tracy-Widom (TW) distribution. Here we show that genetic eigenvalues estimated using restricted maximum likelihood (REML) in a multivariate random effects model with an unconstrained genetic covariance structure will also conform to the TW distribution after empirical scaling and centering. However, where estimation procedures using either REML or MCMC impose boundary constraints, the resulting genetic eigenvalues tend not to be TW distributed. We show that using confidence intervals from sampling distributions of genetic eigenvalues without reference to the TW distribution provides insufficient protection against mistaking sampling error for genetic variance, particularly when eigenvalues are small. By scaling such sampling distributions to the appropriate TW distribution, the critical value of the TW statistic can be used to determine whether the magnitude of a genetic eigenvalue exceeds the sampling error for each eigenvalue in the spectral distribution of a given genetic covariance matrix. Copyright © 2017 by the Genetics Society of America.
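
The overdispersion of sample eigenvalues by sampling error alone is easy to demonstrate. In this numpy sketch (illustrative dimensions, not the genetic data of the paper), the population covariance is the identity, so every true eigenvalue is 1, yet the leading sample eigenvalue is consistently biased upward and the smallest downward:

```python
import numpy as np

rng = np.random.default_rng(4)

# True covariance: identity, so all population eigenvalues equal 1
p, n, reps = 10, 50, 200
top, bottom = [], []
for _ in range(reps):
    X = rng.normal(size=(n, p))
    ev = np.linalg.eigvalsh(np.cov(X, rowvar=False))  # ascending order
    top.append(ev[-1])
    bottom.append(ev[0])

# Sampling error alone overdisperses the spectrum: the largest eigenvalue
# is biased well above 1 and the smallest well below it
mean_top, mean_bottom = np.mean(top), np.mean(bottom)
```

This is exactly the bias the abstract warns about: without a reference distribution such as Tracy-Widom, a large leading eigenvalue can be pure sampling error rather than genuine (genetic) variance.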

  10. Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers

    PubMed Central

    Han, Buhm; Kang, Hyun Min; Eskin, Eleazar

    2009-01-01

    With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose an accurate and efficient method for multiple testing correction in genome-wide association studies—SLIDE. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu. PMID:19381255
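
The core of MVN-based corrections is the null distribution of the maximum statistic over correlated tests. The following small-scale sketch is not SLIDE itself — the AR(1) correlation structure and sample sizes are invented — but it estimates that null distribution by direct multivariate normal sampling and compares the resulting corrected p-value with the Bonferroni bound:

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(5)

# Correlation among m test statistics (e.g. markers in LD): AR(1) structure
m, rho = 100, 0.8
idx = np.arange(m)
R = rho ** np.abs(idx[:, None] - idx[None, :])

# Null distribution of max |z| across the correlated tests, by MVN sampling
Z = rng.multivariate_normal(np.zeros(m), R, size=20_000)
max_abs = np.abs(Z).max(axis=1)

def corrected_p(z_obs):
    """Study-wide p-value for one marker, corrected for m correlated tests."""
    return np.mean(max_abs >= abs(z_obs))

# Correlation makes the exact correction less severe than Bonferroni
p_mvn = corrected_p(3.0)
p_bonf = min(1.0, m * erfc(3 / sqrt(2)))  # erfc(z/sqrt(2)) = P(|Z| > z)
```

Methods like SLIDE avoid sampling the full m-dimensional MVN by exploiting local correlation (a sliding window), but the quantity being approximated is the same maximum-statistic tail probability.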

  11. Subset Selection Procedures: A Review and an Assessment

    DTIC Science & Technology

    1984-02-01

    distance function (Alam and Rizvi, 1966; Gupta, 1966; Gupta and Studden, 1970), generalized variance (Gnanadesikan and Gupta, 1970), and multiple... Gnanadesikan (1966) considered a location-type procedure based on sample component means. Except in the case of the bivariate normal, only a lower bound of the...Frischtak, 1973; Gnanadesikan, 1966) for ranking multivariate normal populations, but the results in these cases are very limited in scope or are asymptotic

  12. Profile and Determinants of Retinal Optical Intensity in Normal Eyes with Spectral Domain Optical Coherence Tomography.

    PubMed

    Chen, Binyao; Gao, Enting; Chen, Haoyu; Yang, Jianling; Shi, Fei; Zheng, Ce; Zhu, Weifang; Xiang, Dehui; Chen, Xinjian; Zhang, Mingzhi

    2016-01-01

    To investigate the profile and determinants of retinal optical intensity in normal subjects using 3D spectral domain optical coherence tomography (SD OCT). A total of 231 eyes from 231 healthy subjects ranging in age from 18 to 80 years were included and underwent a 3D OCT scan. Forty-four eyes were randomly chosen to be scanned by two operators for reproducibility analysis. The distributions of optical intensity in each layer and in the regions specified by the Early Treatment of Diabetic Retinopathy Study (ETDRS) were investigated by analyzing the OCT raw data with our automatic graph-based algorithm. Univariate and multivariate analyses were performed between retinal optical intensity and sex, age, height, weight, spherical equivalent (SE), axial length, image quality, disc area, and rim/disc area ratio (R/D area ratio). For the optical intensity measurements, the intraclass correlation coefficient of each layer ranged from 0.815 to 0.941, indicating good reproducibility. Optical intensity was lowest in the central area of the retinal nerve fiber layer, ganglion cell layer, inner plexiform layer, inner nuclear layer, outer plexiform layer, and photoreceptor layer, but not of the retinal pigment epithelium (RPE). Optical intensity was positively correlated with image quality in all retinal layers (0.553<β<0.851, p<0.01), and negatively correlated with age in most retinal layers (-0.362<β<-0.179, p<0.01), except for the RPE (β = 0.456, p<0.01), outer nuclear layer, and photoreceptor layer (p>0.05). There was no relationship between retinal optical intensity and sex, height, weight, SE, axial length, disc area, or R/D area ratio. There was a specific pattern of distribution of retinal optical intensity across the different regions. The optical intensity was affected by image quality and age. Image quality can be used as a reference for normalization, and the effect of age needs to be taken into consideration when using OCT for diagnosis.

  13. Polynomial Chaos Based Acoustic Uncertainty Predictions from Ocean Forecast Ensembles

    NASA Astrophysics Data System (ADS)

    Dennis, S.

    2016-02-01

    Most significant ocean acoustic propagation occurs over tens of kilometers, at scales small compared to the basin and to most fine-scale ocean modeling. To address the increased emphasis on uncertainty quantification, for example transmission loss (TL) probability density functions (PDFs) within some radius, a polynomial chaos (PC) based method is utilized. To capture uncertainty in ocean modeling, the Navy Coastal Ocean Model (NCOM) now includes ensembles distributed to reflect the ocean analysis statistics. Since the ensembles are included in the data assimilation for the new forecast ensembles, the acoustic modeling uses the ensemble predictions in a similar fashion, creating a sound speed distribution over an acoustically relevant domain. Within an acoustic domain, singular value decomposition of the combined time-space structure of the sound speeds can be used to create Karhunen-Loève expansions of the sound speed, subject to multivariate normality testing. These sound speed expansions serve as a basis for Hermite polynomial chaos expansions of derived quantities, in particular TL. The PC expansion coefficients result from so-called non-intrusive methods, involving evaluation of TL at multi-dimensional Gauss-Hermite quadrature collocation points. Traditional TL calculation from standard acoustic propagation modeling could be prohibitively time-consuming at all multi-dimensional collocation points. This method employs Smolyak order and gridding methods to allow adaptive sub-sampling of the collocation points, determining only the most significant PC expansion coefficients to within a preset tolerance. Practically, the Smolyak order and grid sizes grow only polynomially in the number of Karhunen-Loève terms, alleviating the curse of dimensionality. The resulting TL PC coefficients allow the determination of the normality of the TL PDF and of its mean and standard deviation. In the non-normal case, PC Monte Carlo methods are used to rapidly establish the PDF. This work was sponsored by the Office of Naval Research.
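
The Karhunen-Loève step described here amounts to an SVD of the mean-removed ensemble. A minimal numpy sketch with a synthetic "sound speed" ensemble (all numbers invented): the leading singular vectors are the KL modes, and a short truncation captures most of the ensemble variance:

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy ensemble: 30 members sampled at 40 depth points, with one dominant
# smooth mode of variability plus small measurement-like noise
depth = np.linspace(0.0, 1.0, 40)
members = np.stack([
    1500.0
    + 5.0 * (1.0 + 0.3 * rng.normal()) * np.sin(2.0 * np.pi * depth)
    + 0.5 * rng.normal(size=depth.size)
    for _ in range(30)
])

# Karhunen-Loeve expansion via SVD of the mean-removed ensemble
mean = members.mean(axis=0)
U, s, Vt = np.linalg.svd(members - mean, full_matrices=False)

# Truncate to the k leading modes and reconstruct the ensemble
k = 3
recon = mean + (U[:, :k] * s[:k]) @ Vt[:k]

# Fraction of ensemble variance captured by the truncated expansion
explained = (s[:k] ** 2).sum() / (s ** 2).sum()
```

The KL coefficients (columns of `U[:, :k] * s[:k]`) are the low-dimensional random variables on which a Hermite polynomial chaos expansion of a derived quantity such as TL would then be built.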

  14. Combining Frequency Doubling Technology Perimetry and Scanning Laser Polarimetry for Glaucoma Detection.

    PubMed

    Mwanza, Jean-Claude; Warren, Joshua L; Hochberg, Jessica T; Budenz, Donald L; Chang, Robert T; Ramulu, Pradeep Y

    2015-01-01

    To determine the ability of frequency doubling technology (FDT) and scanning laser polarimetry with variable corneal compensation (GDx-VCC) to detect glaucoma when used individually and in combination. One hundred ten normal and 114 glaucomatous subjects were tested with FDT C-20-5 screening protocol and the GDx-VCC. The discriminating ability was tested for each device individually and for both devices combined using GDx-NFI, GDx-TSNIT, number of missed points of FDT, and normal or abnormal FDT. Measures of discrimination included sensitivity, specificity, area under the curve (AUC), Akaike's information criterion (AIC), and prediction confidence interval lengths. For detecting glaucoma regardless of severity, the multivariable model resulting from the combination of GDx-TSNIT, number of abnormal points on FDT (NAP-FDT), and the interaction GDx-TSNIT×NAP-FDT (AIC: 88.28, AUC: 0.959, sensitivity: 94.6%, specificity: 89.5%) outperformed the best single-variable model provided by GDx-NFI (AIC: 120.88, AUC: 0.914, sensitivity: 87.8%, specificity: 84.2%). The multivariable model combining GDx-TSNIT, NAP-FDT, and interaction GDx-TSNIT×NAP-FDT consistently provided better discriminating abilities for detecting early, moderate, and severe glaucoma than the best single-variable models. The multivariable model including GDx-TSNIT, NAP-FDT, and the interaction GDx-TSNIT×NAP-FDT provides the best glaucoma prediction compared with all other multivariable and univariable models. Combining the FDT C-20-5 screening protocol and GDx-VCC improves glaucoma detection compared with using GDx or FDT alone.

  15. Bladder cancer diagnosis during cystoscopy using Raman spectroscopy

    NASA Astrophysics Data System (ADS)

    Grimbergen, M. C. M.; van Swol, C. F. P.; Draga, R. O. P.; van Diest, P.; Verdaasdonk, R. M.; Stone, N.; Bosch, J. H. L. R.

    2009-02-01

    Raman spectroscopy is an optical technique that can be used to obtain specific molecular information from biological tissues. It has been used successfully to differentiate normal and pre-malignant tissue in many organs. The goal of this study is to determine whether normal tissue can be distinguished from bladder cancer using this system. The endoscopic Raman system consists of a 6 Fr endoscopic probe connected to a 785 nm diode laser and a spectral recording system. A total of 107 tissue samples were obtained from 54 patients with known bladder cancer during transurethral tumor resection. Immediately after surgical removal the samples were placed under the Raman probe and spectra were collected and stored for further analysis. The collected spectra were analyzed using multivariate statistical methods. In total, 2949 Raman spectra were recorded ex vivo from cold cup biopsy samples with a 2-second integration time. A multivariate algorithm allowed differentiation of normal and malignant tissue with a sensitivity and specificity of 78.5% and 78.9%, respectively. The results show the possibility of discerning normal from malignant bladder tissue by means of Raman spectroscopy using a small fiber-based system. Despite the low number of samples, the results indicate that it might be possible to use this technique to grade identified bladder wall lesions during endoscopy.

  16. SMURC: High-Dimension Small-Sample Multivariate Regression With Covariance Estimation.

    PubMed

    Bayar, Belhassen; Bouaynaya, Nidhal; Shterenberg, Roman

    2017-03-01

    We consider a high-dimension low sample-size multivariate regression problem that accounts for correlation of the response variables. The system is underdetermined as there are more parameters than samples. We show that the maximum likelihood approach with covariance estimation is senseless because the likelihood diverges. We subsequently propose a normalization of the likelihood function that guarantees convergence. We call this method small-sample multivariate regression with covariance (SMURC) estimation. We derive an optimization problem and its convex approximation to compute SMURC. Simulation results show that the proposed algorithm outperforms the regularized likelihood estimator with known covariance matrix and the sparse conditional Gaussian graphical model. We also apply SMURC to the inference of the wing-muscle gene network of the Drosophila melanogaster (fruit fly).

  17. Pretransplant cachexia and morbid obesity are predictors of increased mortality after heart transplantation.

    PubMed

    Lietz, K; John, R; Burke, E A; Ankersmit, J H; McCue, J D; Naka, Y; Oz, M C; Mancini, D M; Edwards, N M

    2001-07-27

    Extremes in body weight are a relative contraindication to cardiac transplantation. We retrospectively reviewed 474 consecutive adult patients (377 male, 97 female, mean age 50.3+/-12.2 years), who received 444 primary and 30 heart retransplants between January of 1992 and January of 1999. Of these, 68 cachectic (body mass index [BMI] <20 kg/m2), 113 overweight (BMI >27-30 kg/m2), and 55 morbidly obese (BMI >30 kg/m2) patients were compared with 238 normal-weight recipients (BMI 20-27 kg/m2). We evaluated the influence of pretransplant BMI on morbidity and mortality after cardiac transplantation. Kaplan-Meier survival distribution and Cox proportional hazards models were used for statistical analyses. Morbidly obese as well as cachectic recipients demonstrated nearly twice the 5-year mortality of normal-weight or overweight recipients (53% vs. 27%, respectively, P=0.001). An increase in mortality was seen at 30 days for morbidly obese and cachectic recipients (12.7% and 17.7%, respectively) versus a 30-day mortality rate of 7.6% in normal-weight recipients. Morbidly obese recipients experienced a shorter time to high-grade acute rejection (P=0.004) as well as an increased annual high-grade rejection frequency when compared with normal-weight recipients (P=0.001). By multivariable analysis, the incidence of transplant-related coronary artery disease (TCAD) was not increased in morbidly obese patients, but cachectic patients had a significantly lower incidence of TCAD (P=0.05). Cachectic patients receiving oversized donor hearts had a significantly higher postoperative mortality (P=0.02). The risks of cardiac transplantation are increased in both morbidly obese and cachectic patients compared with normal-weight recipients. However, the results of cardiac transplantation in overweight patients are comparable to those in normal-weight patients. Recipient size should be kept in mind when selecting patients, and the use of oversized donors in cachectic recipients should be avoided.

  18. Evaluation of Kurtosis into the product of two normally distributed variables

    NASA Astrophysics Data System (ADS)

    Oliveira, Amílcar; Oliveira, Teresa; Seijas-Macías, Antonio

    2016-06-01

    Kurtosis (κ) is a measure of the "peakedness" of the distribution of a real-valued random variable. We study the evolution of the kurtosis for the product of two normally distributed variables. The product of two normal variables is a very common problem in several areas of study, such as physics, economics, and psychology. Normal variables have a constant value of kurtosis (κ = 3), independently of the values of the two parameters: mean and variance. In fact, the excess kurtosis is defined as κ − 3, so the kurtosis of the normal distribution is zero in excess terms. The product of two normally distributed variables is a function of the parameters of the two variables and the correlation between them, and the range for kurtosis is [0, 6] for independent variables and [0, 12] when correlation between them is allowed.
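
    A quick Monte Carlo check of the quoted endpoints, reading the [0, 6] range as excess kurtosis (consistent with the κ − 3 definition above); the sample size and parameter choices are illustrative:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(42)
n = 2_000_000

# Product of two independent standard normals: the excess kurtosis is 6
# (E[Z^4] = E[X^4]E[Y^4] = 9 while Var(Z) = 1, so 9/1 - 3 = 6).
z = rng.standard_normal(n) * rng.standard_normal(n)
print(kurtosis(z, fisher=True))   # close to 6

# With means large relative to the standard deviations, the product is
# nearly normal, so the excess kurtosis is close to 0.
w = (10.0 + rng.standard_normal(n)) * (10.0 + rng.standard_normal(n))
print(kurtosis(w, fisher=True))   # close to 0
```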

  19. Distribution Functions of Sizes and Fluxes Determined from Supra-Arcade Downflows

    NASA Technical Reports Server (NTRS)

    McKenzie, D.; Savage, S.

    2011-01-01

    The frequency distributions of sizes and fluxes of supra-arcade downflows (SADs) provide information about the process of their creation. For example, a fractal creation process may be expected to yield a power-law distribution of sizes and/or fluxes. We examine 120 cross-sectional areas and magnetic flux estimates found by Savage & McKenzie for SADs, and find that (1) the areas are consistent with a log-normal distribution and (2) the fluxes are consistent with both a log-normal and an exponential distribution. Neither set of measurements is compatible with either a power-law distribution or a normal distribution. As a demonstration of the applicability of these findings to improved understanding of reconnection, we consider a simple SAD growth scenario with minimal assumptions, capable of producing a log-normal distribution.

  20. Estimating Non-Normal Latent Trait Distributions within Item Response Theory Using True and Estimated Item Parameters

    ERIC Educational Resources Information Center

    Sass, D. A.; Schmitt, T. A.; Walker, C. M.

    2008-01-01

    Item response theory (IRT) procedures have been used extensively to study normal latent trait distributions and have been shown to perform well; however, less is known concerning the performance of IRT with non-normal latent trait distributions. This study investigated the degree of latent trait estimation error under normal and non-normal…

  1. Combining Mixture Components for Clustering*

    PubMed Central

    Baudry, Jean-Patrick; Raftery, Adrian E.; Celeux, Gilles; Lo, Kenneth; Gottardo, Raphaël

    2010-01-01

    Model-based clustering consists of fitting a mixture model to data and identifying each cluster with one of its components. Multivariate normal distributions are typically used. The number of clusters is usually determined from the data, often using BIC. In practice, however, individual clusters can be poorly fitted by Gaussian distributions, and in that case model-based clustering tends to represent one non-Gaussian cluster by a mixture of two or more Gaussian distributions. If the number of mixture components is interpreted as the number of clusters, this can lead to overestimation of the number of clusters. This is because BIC selects the number of mixture components needed to provide a good approximation to the density, rather than the number of clusters as such. We propose first selecting the total number of Gaussian mixture components, K, using BIC and then combining them hierarchically according to an entropy criterion. This yields a unique soft clustering for each number of clusters less than or equal to K. These clusterings can be compared on substantive grounds, and we also describe an automatic way of selecting the number of clusters via a piecewise linear regression fit to the rescaled entropy plot. We illustrate the method with simulated data and a flow cytometry dataset. Supplemental Materials are available on the journal Web site and described at the end of the paper. PMID:20953302

  2. Stability metrics for multi-source biomedical data based on simplicial projections from probability distribution distances.

    PubMed

    Sáez, Carlos; Robles, Montserrat; García-Gómez, Juan M

    2017-02-01

    Biomedical data may be composed of individuals generated from distinct, meaningful sources. Due to possible contextual biases in the processes that generate data, there may exist an undesirable and unexpected variability among the probability distribution functions (PDFs) of the source subsamples, which, when uncontrolled, may lead to inaccurate or unreproducible research results. Classical statistical methods may have difficulty uncovering such variability when dealing with multi-modal, multi-type, multivariate data. This work proposes two metrics for the analysis of stability among multiple data sources, robust to the aforementioned conditions and defined in the context of data quality assessment: a global probabilistic deviation metric and a source probabilistic outlyingness metric. The first provides a bounded degree of the global multi-source variability, designed as an estimator equivalent to the notion of normalized standard deviation of PDFs. The second provides a bounded degree of the dissimilarity of each source to a latent central distribution. The metrics are based on the projection of a simplex geometrical structure constructed from the Jensen-Shannon distances among the source PDFs. The metrics were evaluated on a simulated benchmark and on real multi-source biomedical data from the UCI Heart Disease data set, where they demonstrated correct behaviour. Biomedical data quality assessment based on the proposed stability metrics may improve the efficiency and effectiveness of biomedical data exploitation and research.
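
    The pairwise Jensen-Shannon distances underlying the simplex construction can be computed directly with SciPy. A minimal sketch with three hypothetical sources (the data, grid, and bin count are illustrative; the full metrics additionally project these distances onto a simplex, which is not reproduced here):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(11)

# Three hypothetical data sources; the third is biased relative to the others.
sources = [rng.normal(0.0, 1.0, 1000),
           rng.normal(0.0, 1.0, 1000),
           rng.normal(1.5, 1.0, 1000)]

# Estimate each source's PDF on a common grid, then compute the pairwise
# Jensen-Shannon distances on which the stability metrics are built.
bins = np.linspace(-5.0, 6.0, 61)
pdfs = [np.histogram(s, bins=bins, density=True)[0] for s in sources]

n = len(pdfs)
D = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        D[i, j] = jensenshannon(pdfs[i], pdfs[j], base=2)

print(np.round(D, 3))  # sources 0 and 1 are close; source 2 stands apart
```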

  3. Quantiles for Finite Mixtures of Normal Distributions

    ERIC Educational Resources Information Center

    Rahman, Mezbahur; Rahman, Rumanur; Pearson, Larry M.

    2006-01-01

    Quantiles for finite mixtures of normal distributions are computed. The difference between a linear combination of independent normal random variables and a linear combination of independent normal densities is emphasized. (Contains 3 tables and 1 figure.)
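
    The distinction matters computationally: a linear combination of independent normal variables is itself normal, with closed-form quantiles, whereas a mixture of normal densities has no closed-form quantile function and must be inverted numerically. A minimal sketch with a hypothetical two-component mixture (root-finding stands in for whatever scheme the paper tabulates):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

# Hypothetical two-component normal mixture: weights, means, and sds.
w = np.array([0.3, 0.7])
mu = np.array([-2.0, 1.0])
sd = np.array([1.0, 0.5])

def mixture_cdf(x):
    """CDF of the finite mixture: a weighted sum of component CDFs."""
    return float(np.sum(w * norm.cdf(x, loc=mu, scale=sd)))

def mixture_quantile(p):
    """Invert the mixture CDF numerically; no closed form exists."""
    lo = float(np.min(mu - 10.0 * sd))
    hi = float(np.max(mu + 10.0 * sd))
    return brentq(lambda x: mixture_cdf(x) - p, lo, hi)

median = mixture_quantile(0.5)
print(median, mixture_cdf(median))  # CDF evaluated at the quantile returns 0.5
```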

  4. Braking System Integration in Dual Mode Systems

    DOT National Transportation Integrated Search

    1974-05-01

    An optimal braking system for Dual Mode is a complex product of a vast number of multivariate, interdependent parameters that encompass on-guideway and off-guideway operation as well as normal and emergency braking. Details of, and interrelations amo...

  5. Optimal False Discovery Rate Control for Dependent Data

    PubMed Central

    Xie, Jichun; Cai, T. Tony; Maris, John; Li, Hongzhe

    2013-01-01

    This paper considers the problem of optimal false discovery rate control when the test statistics are dependent. An optimal joint oracle procedure, which minimizes the false non-discovery rate subject to a constraint on the false discovery rate is developed. A data-driven marginal plug-in procedure is then proposed to approximate the optimal joint procedure for multivariate normal data. It is shown that the marginal procedure is asymptotically optimal for multivariate normal data with a short-range dependent covariance structure. Numerical results show that the marginal procedure controls false discovery rate and leads to a smaller false non-discovery rate than several commonly used p-value based false discovery rate controlling methods. The procedure is illustrated by an application to a genome-wide association study of neuroblastoma and it identifies a few more genetic variants that are potentially associated with neuroblastoma than several p-value-based false discovery rate controlling procedures. PMID:23378870

  6. A note on a simplified and general approach to simulating from multivariate copula functions

    Treesearch

    Barry K. Goodwin

    2013-01-01

    Copulas have become an important analytic tool for characterizing multivariate distributions and dependence. One is often interested in simulating data from copula estimates. The process can be analytically and computationally complex and usually involves steps that are unique to a given parametric copula. We describe an alternative approach that uses ‘Probability-...

  7. Testing Mean Differences among Groups: Multivariate and Repeated Measures Analysis with Minimal Assumptions

    PubMed Central

    Bathke, Arne C.; Friedrich, Sarah; Pauly, Markus; Konietschke, Frank; Staffen, Wolfgang; Strobl, Nicolas; Höller, Yvonne

    2018-01-01

    ABSTRACT To date, there is a lack of satisfactory inferential techniques for the analysis of multivariate data in factorial designs, when only minimal assumptions on the data can be made. Presently available methods are limited to very particular study designs or assume either multivariate normality or equal covariance matrices across groups, or they do not allow for an assessment of the interaction effects across within-subjects and between-subjects variables. We propose and methodologically validate a parametric bootstrap approach that does not suffer from any of the above limitations, and thus provides a rather general and comprehensive methodological route to inference for multivariate and repeated measures data. As an example application, we consider data from two different Alzheimer’s disease (AD) examination modalities that may be used for precise and early diagnosis, namely, single-photon emission computed tomography (SPECT) and electroencephalogram (EEG). These data violate the assumptions of classical multivariate methods, and indeed classical methods would not have yielded the same conclusions with regards to some of the factors involved. PMID:29565679

  8. Robust Bayesian Analysis of Heavy-tailed Stochastic Volatility Models using Scale Mixtures of Normal Distributions

    PubMed Central

    Abanto-Valle, C. A.; Bandyopadhyay, D.; Lachos, V. H.; Enriquez, I.

    2009-01-01

    A Bayesian analysis of stochastic volatility (SV) models using the class of symmetric scale mixtures of normal (SMN) distributions is considered. In the face of non-normality, this provides an appealing robust alternative to the routine use of the normal distribution. Specific distributions examined include the normal, Student-t, slash, and variance gamma distributions. Using a Bayesian paradigm, an efficient Markov chain Monte Carlo (MCMC) algorithm is introduced for parameter estimation. Moreover, the mixing parameters obtained as a by-product of the scale mixture representation can be used to identify outliers. The methods developed are applied to analyze daily stock return data on the S&P500 index. Bayesian model selection criteria as well as out-of-sample forecasting results reveal that the SV models based on heavy-tailed SMN distributions provide significant improvement in model fit as well as prediction for the S&P500 index data over the usual normal model. PMID:20730043

  9. Log Normal Distribution of Cellular Uptake of Radioactivity: Statistical Analysis of Alpha Particle Track Autoradiography

    PubMed Central

    Neti, Prasad V.S.V.; Howell, Roger W.

    2008-01-01

    Recently, the distribution of radioactivity among a population of cells labeled with 210Po was shown to be well described by a log normal distribution function (J Nucl Med 47, 6 (2006) 1049-1058) with the aid of an autoradiographic approach. To ascertain the influence of Poisson statistics on the interpretation of the autoradiographic data, the present work reports a detailed statistical analysis of these data. Methods: The measured distributions of alpha particle tracks per cell were subjected to statistical tests with Poisson (P), log normal (LN), and Poisson-log normal (P-LN) models. Results: The LN distribution function best describes the distribution of radioactivity among cell populations exposed to 0.52 and 3.8 kBq/mL 210Po-citrate. When cells were exposed to 67 kBq/mL, the P-LN distribution function gave a better fit; however, the underlying activity distribution remained log normal. Conclusions: The present analysis generally provides further support for the use of LN distributions to describe the cellular uptake of radioactivity. Care should be exercised when analyzing autoradiographic data on activity distributions to ensure that Poisson processes do not distort the underlying LN distribution. PMID:16741316

  10. Mapping of quantitative trait loci using the skew-normal distribution.

    PubMed

    Fernandes, Elisabete; Pacheco, António; Penha-Gonçalves, Carlos

    2007-11-01

    In standard interval mapping (IM) of quantitative trait loci (QTL), the QTL effect is described by a normal mixture model. When this assumption of normality is violated, the most commonly adopted strategy is to use the previous model after data transformation. However, an appropriate transformation may not exist or may be difficult to find. Also this approach can raise interpretation issues. An interesting alternative is to consider a skew-normal mixture model in standard IM, and the resulting method is here denoted as skew-normal IM. This flexible model that includes the usual symmetric normal distribution as a special case is important, allowing continuous variation from normality to non-normality. In this paper we briefly introduce the main peculiarities of the skew-normal distribution. The maximum likelihood estimates of parameters of the skew-normal distribution are obtained by the expectation-maximization (EM) algorithm. The proposed model is illustrated with real data from an intercross experiment that shows a significant departure from the normality assumption. The performance of the skew-normal IM is assessed via stochastic simulation. The results indicate that the skew-normal IM has higher power for QTL detection and better precision of QTL location as compared to standard IM and nonparametric IM.

  11. On the generation of log-Lévy distributions and extreme randomness

    NASA Astrophysics Data System (ADS)

    Eliazar, Iddo; Klafter, Joseph

    2011-10-01

    The log-normal distribution is prevalent across the sciences, as it emerges from the combination of multiplicative processes and the central limit theorem (CLT). The CLT, beyond yielding the normal distribution, also yields the class of Lévy distributions. The log-Lévy distributions are the Lévy counterparts of the log-normal distribution, they appear in the context of ultraslow diffusion processes, and they are categorized by Mandelbrot as belonging to the class of extreme randomness. In this paper, we present a natural stochastic growth model from which both the log-normal distribution and the log-Lévy distributions emerge universally—the former in the case of deterministic underlying setting, and the latter in the case of stochastic underlying setting. In particular, we establish a stochastic growth model which universally generates Mandelbrot’s extreme randomness.

  12. Arm structure in normal spiral galaxies, 1: Multivariate data for 492 galaxies

    NASA Technical Reports Server (NTRS)

    Magri, Christopher

    1994-01-01

    Multivariate data have been collected as part of an effort to develop a new classification system for spiral galaxies, one which is not necessarily based on subjective morphological properties. A sample of 492 moderately bright northern Sa and Sc spirals was chosen for future statistical analysis. New observations were made at 20 and 21 cm; the latter data are described in detail here. Infrared Astronomical Satellite (IRAS) fluxes were obtained from archival data. Finally, new estimates of arm pattern randomness and of local environmental harshness were compiled for most sample objects.

  13. Optimal transformations leading to normal distributions of positron emission tomography standardized uptake values.

    PubMed

    Scarpelli, Matthew; Eickhoff, Jens; Cuna, Enrique; Perlman, Scott; Jeraj, Robert

    2018-01-30

    The statistical analysis of positron emission tomography (PET) standardized uptake value (SUV) measurements is challenging due to the skewed nature of SUV distributions. This limits utilization of powerful parametric statistical models for analyzing SUV measurements. An ad-hoc approach, which is frequently used in practice, is to blindly use a log transformation, which may or may not result in normal SUV distributions. This study sought to identify optimal transformations leading to normally distributed PET SUVs extracted from tumors and to assess the effects of therapy on the optimal transformations. The optimal transformation for producing normal distributions of tumor SUVs was identified by iterating the Box-Cox transformation parameter (λ) and selecting the parameter that maximized the Shapiro-Wilk P-value. Optimal transformations were identified for tumor SUVmax distributions at both pre and post treatment. This study included 57 patients who underwent 18F-fluorodeoxyglucose (18F-FDG) PET scans (publicly available dataset). In addition, to test the generality of our transformation methodology, we included analysis of 27 patients who underwent 18F-Fluorothymidine (18F-FLT) PET scans at our institution. After applying the optimal Box-Cox transformations, neither the pre nor the post treatment 18F-FDG SUV distributions deviated significantly from normality (P > 0.10). Similar results were found for 18F-FLT PET SUV distributions (P > 0.10). For both 18F-FDG and 18F-FLT SUV distributions, the skewness and kurtosis increased from pre to post treatment, leading to a decrease in the optimal Box-Cox transformation parameter from pre to post treatment. There were types of distributions encountered for both 18F-FDG and 18F-FLT for which a log transformation was not optimal for producing normal SUV distributions. Optimization of the Box-Cox transformation offers a solution for identifying normalizing SUV transformations when the log transformation is insufficient. The log transformation is not always the appropriate transformation for producing normally distributed PET SUVs.
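
    The selection rule described above, sweeping the Box-Cox parameter and keeping the λ that maximizes the Shapiro-Wilk P-value, can be sketched as follows; the data are simulated lognormal stand-ins for tumor SUVmax values, and the λ grid is an illustrative choice:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical skewed measurements standing in for tumor SUVmax values.
suv = rng.lognormal(mean=1.0, sigma=0.6, size=57)

def optimal_boxcox_lambda(x, lambdas=np.linspace(-2.0, 2.0, 401)):
    """Return the Box-Cox parameter maximizing the Shapiro-Wilk P-value."""
    best_lam, best_p = float(lambdas[0]), -1.0
    for lam in lambdas:
        y = stats.boxcox(x, lmbda=lam)        # transform with a fixed lambda
        p = stats.shapiro(y).pvalue           # normality of the transformed data
        if p > best_p:
            best_lam, best_p = float(lam), float(p)
    return best_lam, best_p

lam, p = optimal_boxcox_lambda(suv)
print(lam, p)  # for lognormal data the optimum sits near 0, i.e. the log transform
```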

  14. Optimal transformations leading to normal distributions of positron emission tomography standardized uptake values

    NASA Astrophysics Data System (ADS)

    Scarpelli, Matthew; Eickhoff, Jens; Cuna, Enrique; Perlman, Scott; Jeraj, Robert

    2018-02-01

    The statistical analysis of positron emission tomography (PET) standardized uptake value (SUV) measurements is challenging due to the skewed nature of SUV distributions. This limits utilization of powerful parametric statistical models for analyzing SUV measurements. An ad-hoc approach, which is frequently used in practice, is to blindly use a log transformation, which may or may not result in normal SUV distributions. This study sought to identify optimal transformations leading to normally distributed PET SUVs extracted from tumors and to assess the effects of therapy on the optimal transformations. Methods. The optimal transformation for producing normal distributions of tumor SUVs was identified by iterating the Box-Cox transformation parameter (λ) and selecting the parameter that maximized the Shapiro-Wilk P-value. Optimal transformations were identified for tumor SUVmax distributions at both pre and post treatment. This study included 57 patients who underwent 18F-fluorodeoxyglucose (18F-FDG) PET scans (publicly available dataset). In addition, to test the generality of our transformation methodology, we included analysis of 27 patients who underwent 18F-Fluorothymidine (18F-FLT) PET scans at our institution. Results. After applying the optimal Box-Cox transformations, neither the pre nor the post treatment 18F-FDG SUV distributions deviated significantly from normality (P > 0.10). Similar results were found for 18F-FLT PET SUV distributions (P > 0.10). For both 18F-FDG and 18F-FLT SUV distributions, the skewness and kurtosis increased from pre to post treatment, leading to a decrease in the optimal Box-Cox transformation parameter from pre to post treatment. There were types of distributions encountered for both 18F-FDG and 18F-FLT for which a log transformation was not optimal for producing normal SUV distributions. Conclusion. Optimization of the Box-Cox transformation offers a solution for identifying normalizing SUV transformations when the log transformation is insufficient. The log transformation is not always the appropriate transformation for producing normally distributed PET SUVs.

  15. Study on probability distribution of prices in electricity market: A case study of zhejiang province, china

    NASA Astrophysics Data System (ADS)

    Zhou, H.; Chen, B.; Han, Z. X.; Zhang, F. Q.

    2009-05-01

    The study of the probability density function and distribution function of electricity prices helps power suppliers and purchasers assess their positions accurately, and helps the regulator monitor periods that deviate from the normal distribution. Based on the assumption of normally distributed load and the non-linear characteristic of the aggregate supply curve, this paper derives the distribution of electricity prices as a function of the random load variable. The conclusion has been validated with electricity price data from the Zhejiang market. The results show that electricity prices are approximately normally distributed only when the supply-demand balance is loose; otherwise, the prices deviate from the normal distribution and exhibit strong right skewness. Finally, real electricity markets also display a narrow-peak characteristic when undersupply occurs.

  16. A closer look at the effect of preliminary goodness-of-fit testing for normality for the one-sample t-test.

    PubMed

    Rochon, Justine; Kieser, Meinhard

    2011-11-01

    Student's one-sample t-test is a commonly used method when inference about the population mean is made. As advocated in textbooks and articles, the assumption of normality is often checked by a preliminary goodness-of-fit (GOF) test. In a paper recently published by Schucany and Ng, it was shown that, for the uniform distribution, screening of samples by a pretest for normality leads to a more conservative conditional Type I error rate than application of the one-sample t-test without a preliminary GOF test. In contrast, for the exponential distribution, the conditional level is even more elevated than the Type I error rate of the t-test without pretest. We examine the reasons behind these characteristics. In a simulation study, samples drawn from the exponential, lognormal, uniform, Student's t-distribution with 2 degrees of freedom (t(2)), and the standard normal distribution that had passed normality screening, as well as the ingredients of the test statistics calculated from these samples, are investigated. For non-normal distributions, we found that preliminary testing for normality may change the distribution of means and standard deviations of the selected samples, as well as the correlation between them (if the underlying distribution is non-symmetric), thus leading to altered distributions of the resulting test statistics. It is shown that for skewed distributions the excess in Type I error rate may be even more pronounced when testing one-sided hypotheses. ©2010 The British Psychological Society.
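
    The conditional Type I error rate under discussion can be estimated with a short simulation: draw samples under the null from a skewed distribution, keep only those passing a normality pretest, and tabulate t-test rejections among the survivors. The sample size, replication count, and the use of Shapiro-Wilk as the GOF pretest are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, reps, alpha = 20, 10_000, 0.05

# Under H0 the true mean of Exponential(scale=1) is 1. Screen each sample
# with a Shapiro-Wilk pretest, apply the one-sample t-test only to the
# samples that passed, and record the conditional rejection rate.
passed = rejected = 0
for _ in range(reps):
    x = rng.exponential(scale=1.0, size=n)
    if stats.shapiro(x).pvalue > 0.05:                    # pretest passed
        passed += 1
        if stats.ttest_1samp(x, popmean=1.0).pvalue < alpha:
            rejected += 1

conditional_t1 = rejected / passed
print(passed, conditional_t1)  # conditional level exceeds the nominal 0.05
```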

  17. Transformation of arbitrary distributions to the normal distribution with application to EEG test-retest reliability.

    PubMed

    van Albada, S J; Robinson, P A

    2007-04-15

    Many variables in the social, physical, and biosciences, including neuroscience, are non-normally distributed. To improve the statistical properties of such data, or to allow parametric testing, logarithmic or logit transformations are often used. Box-Cox transformations or ad hoc methods are sometimes used for parameters for which no transformation is known to approximate normality. However, these methods do not always give good agreement with the Gaussian. A transformation is discussed that maps probability distributions as closely as possible to the normal distribution, with exact agreement for continuous distributions. To illustrate, the transformation is applied to a theoretical distribution and to quantitative electroencephalographic (qEEG) measures, which are highly non-normal, from repeat recordings of 32 subjects. Agreement with the Gaussian was better than using logarithmic, logit, or Box-Cox transformations. Since normal data have previously been shown to have better test-retest reliability than non-normal data under fairly general circumstances, the implications of our transformation for the test-retest reliability of parameters were investigated. Reliability was shown to improve with the transformation, where the improvement was comparable to that using Box-Cox. An advantage of the general transformation is that it does not require laborious optimization over a range of parameters or a case-specific choice of form.
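
    The core mapping, passing each value through its distribution function and then the inverse normal CDF (y = Φ⁻¹(F(x))), can be approximated with empirical ranks when F is unknown. A minimal sketch on simulated skewed data (the exponential sample is a hypothetical stand-in for non-normal qEEG measures):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical highly non-normal measurements.
x = rng.exponential(scale=2.0, size=500)

# y = Phi^{-1}(F(x)): with the exact CDF this is exactly normal for
# continuous distributions; empirical ranks approximate F here.
ranks = stats.rankdata(x)
y = stats.norm.ppf(ranks / (len(x) + 1))

print(float(stats.skew(x)), float(stats.skew(y)))  # strong skew before, ~0 after
```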

  18. MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION

    EPA Science Inventory

    Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR), which does not require ...

  19. Hybrid least squares multivariate spectral analysis methods

    DOEpatents

    Haaland, David M.

    2004-03-23

    A set of hybrid least squares multivariate spectral analysis methods in which spectral shapes of components or effects not present in the original calibration step are added in a following prediction or calibration step to improve the accuracy of the estimation of the amount of the original components in the sampled mixture. The hybrid method herein means a combination of an initial calibration step with subsequent analysis by an inverse multivariate analysis method. A spectral shape herein means normally the spectral shape of a non-calibrated chemical component in the sample mixture but can also mean the spectral shapes of other sources of spectral variation, including temperature drift, shifts between spectrometers, spectrometer drift, etc. The shape can be continuous, discontinuous, or even discrete points illustrative of the particular effect.

  20. Hybrid least squares multivariate spectral analysis methods

    DOEpatents

    Haaland, David M.

    2002-01-01

    A set of hybrid least squares multivariate spectral analysis methods in which spectral shapes of components or effects not present in the original calibration step are added in a following estimation or calibration step to improve the accuracy of the estimation of the amount of the original components in the sampled mixture. The "hybrid" method herein means a combination of an initial classical least squares analysis calibration step with subsequent analysis by an inverse multivariate analysis method. A "spectral shape" herein means normally the spectral shape of a non-calibrated chemical component in the sample mixture but can also mean the spectral shapes of other sources of spectral variation, including temperature drift, shifts between spectrometers, spectrometer drift, etc. The "shape" can be continuous, discontinuous, or even discrete points illustrative of the particular effect.

  1. Central Body Fat Distribution Associates with Unfavorable Renal Hemodynamics Independent of Body Mass Index

    PubMed Central

    Zelle, Dorien M.; Bakker, Stephan J.L.; Navis, Gerjan

    2013-01-01

    Central distribution of body fat is associated with a higher risk of renal disease, but whether it is the distribution pattern or the overall excess weight that underlies this association is not well understood. Here, we studied the association between waist-to-hip ratio (WHR), which reflects central adiposity, and renal hemodynamics in 315 healthy persons with a mean body mass index (BMI) of 24.9 kg/m2 and a mean 125I-iothalamate GFR of 109 ml/min per 1.73 m2. In multivariate analyses, WHR was associated with lower GFR, lower effective renal plasma flow, and higher filtration fraction, even after adjustment for sex, age, mean arterial pressure, and BMI. Multivariate models produced similar results regardless of whether the hemodynamic measures were indexed to body surface area. Thus, these results suggest that central body fat distribution, independent of BMI, is associated with an unfavorable pattern of renal hemodynamic measures that could underlie the increased renal risk reported in observational studies. PMID:23578944

  2. General models for the distributions of electric field gradients in disordered solids

    NASA Astrophysics Data System (ADS)

    LeCaër, G.; Brand, R. A.

    1998-11-01

    Hyperfine studies of disordered materials often yield the distribution of the electric field gradient (EFG) or the related quadrupole splitting (QS). The question of the structural information that may be extracted from such distributions has been considered for more than fifteen years. Experimentally, most studies have been performed using Mössbauer spectroscopy, especially on ⁵⁷Fe. However, NMR, NQR, EPR and PAC methods have also received some attention. The EFG distribution for a random distribution of electric charges was first investigated by Czjzek et al [1], and a general functional form was derived for the joint (bivariate) distribution of the principal EFG tensor component V_zz and the asymmetry parameter η. The importance of the Gauss distribution for such rotationally invariant structural models was thus evidenced. Extensions of that model, based on degenerate multivariate Gauss distributions for the elements of the EFG tensor, were proposed by Czjzek. These extensions have since been used, particularly in Mössbauer spectroscopy, under the name `shell models'. The mathematical foundations of all the previous models are presented and critically discussed, as evidenced by simple calculations in the case of the EFG tensor. The present article focuses only on those aspects of the EFG distribution in disordered solids which can be discussed without explicitly invoking particular physical mechanisms. We present studies of three different model systems. A reference model directly related to the first model of Czjzek, called the Gaussian isotropic model (GIM), is shown to be the limiting case for many different models with a large number of independent contributions to the EFG tensor, and is not restricted to a point-charge model. The extended validity of the marginal distribution of η in the GIM is discussed. It is also shown that the second model, based on degenerate multivariate normal distributions for the EFG components, yields questionable results and has been overused in experimental studies. The latter models are further discussed in the light of new results. The problems raised by these extensions stem from the fact that the consequences of the statistical invariance by rotation of the EFG tensor have not been sufficiently taken into account. Further difficulties arise because the structural degrees of freedom of the disordered solid under consideration have been confused with the degrees of freedom of QS distributions. The relations which are derived and discussed are further illustrated by the case of the EFG tensor distribution created at the centre of a sphere by m charges randomly distributed on its surface. The third model, a simple extension of the GIM, considers the case of an EFG tensor which is the sum of a fixed part and of a random part with variable weights. The bivariate distribution P(V_zz, η) is calculated exactly in the most symmetric case and the effect of the random part is investigated as a function of its weight. The various models are discussed more particularly in connection with short-range order in disordered solids. An ambiguity problem which arises in the evaluation of bivariate distributions of centre shift (isomer shift) and quadrupole splitting from ⁵⁷Fe Mössbauer spectra is finally considered quantitatively.
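A minimal Monte Carlo sketch of the rotationally invariant Gaussian ensemble behind the GIM (a symmetrized, traceless Gaussian random matrix standing in for the EFG tensor) is shown below; the sample size and scale are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_efg():
    # Symmetrized, traceless Gaussian random matrix: a rotationally invariant
    # Gaussian ensemble for the EFG tensor (the GIM of the text).
    G = rng.normal(size=(3, 3))
    S = (G + G.T) / 2.0
    return S - (np.trace(S) / 3.0) * np.eye(3)

def asymmetry(V):
    # Standard EFG convention |Vzz| >= |Vyy| >= |Vxx|, eta = (Vxx - Vyy) / Vzz,
    # which confines eta to [0, 1] for any traceless symmetric tensor.
    w = np.linalg.eigvalsh(V)
    vxx, vyy, vzz = w[np.argsort(np.abs(w))]
    return (vxx - vyy) / vzz

eta = np.array([asymmetry(random_efg()) for _ in range(20000)])
print("eta range:", eta.min(), eta.max())   # confined to [0, 1]
print("mean eta:", eta.mean())
```

Because eta is scale-invariant, its marginal distribution under this ensemble does not depend on the Gaussian width, which is one aspect of the GIM's robustness discussed in the abstract.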

  3. Dichotomisation using a distributional approach when the outcome is skewed.

    PubMed

    Sauzet, Odile; Ofuya, Mercy; Peacock, Janet L

    2015-04-24

    Dichotomisation of continuous outcomes has been rightly criticised by statisticians because of the loss of information incurred. However, to communicate a comparison of risks, dichotomised outcomes may be necessary. Peacock et al. developed a distributional approach to the dichotomisation of normally distributed outcomes, allowing the presentation of a comparison of proportions with a measure of precision that reflects the comparison of means. Many common health outcomes are skewed, so the distributional method for the dichotomisation of continuous outcomes may not apply. We present a methodology to obtain dichotomised outcomes for skewed variables, illustrated with data from several observational studies. We also report the results of a simulation study which tests the robustness of the method to deviations from normality and assesses the validity of the newly developed method. The review showed that the pattern of dichotomisation varied between outcomes. Birthweight, blood pressure and BMI can either be transformed to normality, so that normal distributional estimates for a comparison of proportions can be obtained, or, better, the skew-normal method can be used. For gestational age, no satisfactory transformation is available and only the skew-normal method is reliable. The normal distributional method remains reliable when there are small deviations from normality. The distributional method, with its applicability to common skewed data, allows researchers to provide both continuous and dichotomised estimates without losing information or precision. This has the effect of providing a practical understanding of the difference in means in terms of proportions.
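The normal distributional estimate at the heart of this approach computes a proportion below a clinical cutpoint from a fitted mean and SD as Φ((c − μ)/σ); a minimal sketch follows, where the birthweight means, SD and cutpoint are hypothetical illustration values:

```python
import math

def normal_prop_below(cutpoint, mean, sd):
    # Distributional estimate of P(X < c) assuming X ~ N(mean, sd^2).
    z = (cutpoint - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical birthweight comparison (grams), low-birthweight cutpoint 2500 g.
p_control = normal_prop_below(2500, 3400, 500)
p_exposed = normal_prop_below(2500, 3200, 500)
diff = p_exposed - p_control
print(f"low-birthweight proportions: {p_control:.3f} vs {p_exposed:.3f}, diff {diff:.3f}")
```

The difference in proportions inherits its precision from the comparison of means, which is what the distributional approach exploits; the skew-normal variant replaces Φ with a skew-normal CDF.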

  4. Combining Frequency Doubling Technology Perimetry and Scanning Laser Polarimetry for Glaucoma Detection

    PubMed Central

    Mwanza, Jean-Claude; Warren, Joshua L.; Hochberg, Jessica T.; Budenz, Donald L.; Chang, Robert T.; Ramulu, Pradeep Y.

    2014-01-01

    Purpose To determine the ability of frequency doubling technology (FDT) and scanning laser polarimetry with variable corneal compensation (GDx-VCC) to detect glaucoma when used individually and in combination. Methods One hundred and ten normal and 114 glaucomatous subjects were tested with the FDT C-20-5 screening protocol and the GDx-VCC. The discriminating ability was tested for each device individually and for both devices combined using GDx-NFI, GDx-TSNIT, number of missed points of FDT, and normal or abnormal FDT. Measures of discrimination included sensitivity, specificity, area under the curve (AUC), Akaike’s information criterion (AIC), and prediction confidence interval lengths (PIL). Results For detecting glaucoma regardless of severity, the multivariable model resulting from the combination of GDx-TSNIT, number of abnormal points on FDT (NAP-FDT), and the interaction GDx-TSNIT * NAP-FDT (AIC: 88.28, AUC: 0.959, sensitivity: 94.6%, specificity: 89.5%) outperformed the best single-variable model provided by GDx-NFI (AIC: 120.88, AUC: 0.914, sensitivity: 87.8%, specificity: 84.2%). The multivariable model combining GDx-TSNIT, NAP-FDT, and the interaction GDx-TSNIT * NAP-FDT consistently provided better discriminating ability for detecting early, moderate and severe glaucoma than the best single-variable models. Conclusions The multivariable model including GDx-TSNIT, NAP-FDT, and the interaction GDx-TSNIT * NAP-FDT provides the best glaucoma prediction compared to all other multivariable and univariable models. Combining the FDT C-20-5 screening protocol and GDx-VCC improves glaucoma detection compared to using GDx or FDT alone. PMID:24777046

  5. The distribution of the intervals between neural impulses in the maintained discharges of retinal ganglion cells.

    PubMed

    Levine, M W

    1991-01-01

    Simulated neural impulse trains were generated by a digital realization of the integrate-and-fire model. The variability in these impulse trains had as its origin a random noise of specified distribution. Three different distributions were used: the normal (Gaussian) distribution (no skew, normokurtic), a first-order gamma distribution (positive skew, leptokurtic), and a uniform distribution (no skew, platykurtic). Despite these differences in the distribution of the variability, the distributions of the intervals between impulses were nearly indistinguishable. These inter-impulse distributions were better fit with a hyperbolic gamma distribution than a hyperbolic normal distribution, although one might expect a better approximation for normally distributed inverse intervals. Consideration of why the inter-impulse distribution is independent of the distribution of the causative noise suggests two putative interval distributions that do not depend on the assumed noise distribution: the log normal distribution, which is predicated on the assumption that long intervals occur with the joint probability of small input values, and the random walk equation, which is the diffusion equation applied to a random walk model of the impulse generating process. Either of these equations provides a more satisfactory fit to the simulated impulse trains than the hyperbolic normal or hyperbolic gamma distributions. These equations also provide better fits to impulse trains derived from the maintained discharges of ganglion cells in the retinae of cats or goldfish. It is noted that both equations are free from the constraint that the coefficient of variation (CV) have a maximum of unity.(ABSTRACT TRUNCATED AT 250 WORDS)
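A minimal sketch of the integrate-and-fire generator described above, driven by three noise distributions with equal means, might look like this; the threshold, step statistics and spike counts are illustrative assumptions rather than the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(2)

def intervals(noise_draw, n_spikes=3000, threshold=10.0):
    # Integrate-and-fire: accumulate noisy inputs until the threshold is
    # crossed, record the interval (number of steps), then reset.
    out, acc, steps = [], 0.0, 0
    while len(out) < n_spikes:
        acc += noise_draw()
        steps += 1
        if acc >= threshold:
            out.append(steps)
            acc, steps = 0.0, 0
    return np.array(out)

# Three input-noise distributions with the same mean (1.0 per step):
gauss = intervals(lambda: rng.normal(1.0, 0.5))   # normokurtic
gamma = intervals(lambda: rng.gamma(1.0, 1.0))    # first-order gamma, skewed
unif = intervals(lambda: rng.uniform(0.0, 2.0))   # platykurtic

for name, iv in [("normal", gauss), ("gamma", gamma), ("uniform", unif)]:
    print(name, "mean interval:", iv.mean(), "CV:", iv.std() / iv.mean())
```

All three noise distributions yield mean intervals near threshold/mean-input, and the interval histograms can then be compared against candidate fits such as the log-normal or random-walk (diffusion) forms discussed in the abstract.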

  6. Geotechnical parameter spatial distribution stochastic analysis based on multi-precision information assimilation

    NASA Astrophysics Data System (ADS)

    Wang, C.; Rubin, Y.

    2014-12-01

    The spatial distribution of the compression modulus (Es), an important geotechnical parameter, contributes considerably to the understanding of the underlying geological processes and to an adequate assessment of the mechanical effects of Es on the differential settlement of large continuous structure foundations. Such analyses should be derived using an assimilation approach that combines in-situ static cone penetration tests (CPT) with borehole experiments. To achieve this, the Es distribution of a silty clay stratum in region A of the China Expo Center (Shanghai) is studied using the Bayesian maximum entropy method. This method rigorously and efficiently integrates geotechnical investigations of different precisions and their sources of uncertainty. Single CPT samplings were modelled as a rational probability density curve by maximum entropy theory. The spatial prior multivariate probability density function (PDF) and the likelihood PDF of the CPT positions were built from borehole experiments and the potential value of the prediction point; then, after numerical integration over the CPT probability density curves, the posterior probability density curve of the prediction point was calculated within the Bayesian reverse interpolation framework. The results were compared between Gaussian sequential stochastic simulation and the Bayesian method. The differences between single CPT samplings under a normal distribution and the simulated probability density curve based on maximum entropy theory were also discussed. It is shown that the study of Es spatial distributions can be improved by properly incorporating CPT sampling variation into the interpolation process, and that more informative estimates are generated by considering CPT uncertainty at the estimation points. The calculation illustrates the significance of stochastic Es characterization in a stratum and identifies limitations associated with inadequate geostatistical interpolation techniques. These characterization results will provide a multi-precision information assimilation method for other geotechnical parameters.

  7. Integrating remote sensing with species distribution models; Mapping tamarisk invasions using the Software for Assisted Habitat Modeling (SAHM)

    USGS Publications Warehouse

    West, Amanda M.; Evangelista, Paul H.; Jarnevich, Catherine S.; Young, Nicholas E.; Stohlgren, Thomas J.; Talbert, Colin; Talbert, Marian; Morisette, Jeffrey; Anderson, Ryan

    2016-01-01

    Early detection of invasive plant species is vital for the management of natural resources and protection of ecosystem processes. The use of satellite remote sensing for mapping the distribution of invasive plants is becoming more common; however, conventional imaging software and classification methods have been shown to be unreliable. In this study, we test and evaluate the use of five species distribution model techniques fit with satellite remote sensing data to map invasive tamarisk (Tamarix spp.) along the Arkansas River in Southeastern Colorado. The models tested included boosted regression trees (BRT), Random Forest (RF), multivariate adaptive regression splines (MARS), generalized linear model (GLM), and Maxent. These analyses were conducted using a newly developed software package called the Software for Assisted Habitat Modeling (SAHM). All models were trained with 499 presence points, 10,000 pseudo-absence points, and predictor variables acquired from the Landsat 5 Thematic Mapper (TM) sensor over an eight-month period to distinguish tamarisk from native riparian vegetation using detection of phenological differences. From the Landsat scenes, we used individual bands and calculated the Normalized Difference Vegetation Index (NDVI), Soil-Adjusted Vegetation Index (SAVI), and tasseled cap transformations. All five models successfully identified current tamarisk distribution on the landscape based on threshold-independent and threshold-dependent evaluation metrics with independent location data. To account for model-specific differences, we produced an ensemble of all five models with map output highlighting areas of agreement and areas of uncertainty. Our results demonstrate the usefulness of species distribution models in analyzing remotely sensed data and the utility of ensemble mapping, and showcase the capability of SAHM in pre-processing and executing multiple complex models.

  8. Integrating Remote Sensing with Species Distribution Models; Mapping Tamarisk Invasions Using the Software for Assisted Habitat Modeling (SAHM).

    PubMed

    West, Amanda M; Evangelista, Paul H; Jarnevich, Catherine S; Young, Nicholas E; Stohlgren, Thomas J; Talbert, Colin; Talbert, Marian; Morisette, Jeffrey; Anderson, Ryan

    2016-10-11

    Early detection of invasive plant species is vital for the management of natural resources and protection of ecosystem processes. The use of satellite remote sensing for mapping the distribution of invasive plants is becoming more common; however, conventional imaging software and classification methods have been shown to be unreliable. In this study, we test and evaluate the use of five species distribution model techniques fit with satellite remote sensing data to map invasive tamarisk (Tamarix spp.) along the Arkansas River in Southeastern Colorado. The models tested included boosted regression trees (BRT), Random Forest (RF), multivariate adaptive regression splines (MARS), generalized linear model (GLM), and Maxent. These analyses were conducted using a newly developed software package called the Software for Assisted Habitat Modeling (SAHM). All models were trained with 499 presence points, 10,000 pseudo-absence points, and predictor variables acquired from the Landsat 5 Thematic Mapper (TM) sensor over an eight-month period to distinguish tamarisk from native riparian vegetation using detection of phenological differences. From the Landsat scenes, we used individual bands and calculated the Normalized Difference Vegetation Index (NDVI), Soil-Adjusted Vegetation Index (SAVI), and tasseled cap transformations. All five models successfully identified current tamarisk distribution on the landscape based on threshold-independent and threshold-dependent evaluation metrics with independent location data. To account for model-specific differences, we produced an ensemble of all five models with map output highlighting areas of agreement and areas of uncertainty. Our results demonstrate the usefulness of species distribution models in analyzing remotely sensed data and the utility of ensemble mapping, and showcase the capability of SAHM in pre-processing and executing multiple complex models.

  9. New spatial upscaling methods for multi-point measurements: From normal to p-normal

    NASA Astrophysics Data System (ADS)

    Liu, Feng; Li, Xin

    2017-12-01

    Careful attention must be given to determining whether the geophysical variables of interest are normally distributed, since the assumption of a normal distribution may not accurately reflect the probability distribution of some variables. As a generalization of the normal distribution, the p-normal distribution and its corresponding maximum likelihood estimation (the least power estimation, LPE) were introduced in upscaling methods for multi-point measurements. Six methods, namely three normal-based methods, i.e., the arithmetic average, least squares estimation and block kriging, and three p-normal-based methods, i.e., LPE, geostatistical LPE and inverse-distance-weighted LPE, are compared in two types of experiments: a synthetic experiment to evaluate the performance of the upscaling methods in terms of accuracy, stability and robustness, and a real-world experiment to produce real-world upscaling estimates using soil moisture data obtained from multi-scale observations. The results show that the p-normal-based methods produced lower mean absolute errors and outperformed the other techniques due to their universality and robustness. We conclude that introducing appropriate statistical parameters into an upscaling strategy can substantially improve the estimation, especially if the raw measurements are disorganized; however, further investigation is required to determine which parameter is the most effective among variance, spatial correlation information and the parameter p.
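The least power estimation (LPE) underlying the p-normal methods minimises Σ|xᵢ − m|ᵖ over the location m; a minimal sketch follows, with a ternary search standing in for whatever optimiser the authors used and with synthetic outlier-contaminated data:

```python
import numpy as np

def lpe_location(x, p, iters=200):
    # Least power estimate: argmin_m sum |x_i - m|**p (convex for p >= 1),
    # found by ternary search over [min(x), max(x)].
    lo, hi = float(np.min(x)), float(np.max(x))
    obj = lambda m: np.sum(np.abs(x - m) ** p)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if obj(m1) < obj(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

rng = np.random.default_rng(3)
x = rng.normal(5.0, 1.0, 200)
x[:5] += 30.0                                    # inject a few outliers

print("p=2 (the mean):   ", lpe_location(x, 2))    # pulled toward the outliers
print("p=1.2 (robust LPE):", lpe_location(x, 1.2)) # stays near the bulk at 5
```

With p = 2 the LPE reduces to the arithmetic mean and with p = 1 to the median; values of p between 1 and 2 trade efficiency under normality against robustness to heavy tails.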

  10. Is Coefficient Alpha Robust to Non-Normal Data?

    PubMed Central

    Sheng, Yanyan; Sheng, Zhaohui

    2011-01-01

    Coefficient alpha has been a widely used measure by which internal consistency reliability is assessed. In addition to essential tau-equivalence and uncorrelated errors, normality has been noted as another important assumption for alpha. Earlier work on evaluating this assumption considered either exclusively non-normal error score distributions, or limited conditions. In view of this and the availability of advanced methods for generating univariate non-normal data, Monte Carlo simulations were conducted to show that non-normal distributions for true or error scores do create problems for using alpha to estimate internal consistency reliability. The sample coefficient alpha is affected by leptokurtic true score distributions, or skewed and/or kurtotic error score distributions. Increased sample sizes, not test lengths, help reduce the bias and improve the accuracy and precision of alpha with non-normal data. PMID:22363306
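For reference, the sample coefficient alpha itself is straightforward to compute; the sketch below checks it on synthetic data with normal true and error scores (all sizes and variances are illustrative assumptions), where the theoretical value is k·ρ/(1 + (k − 1)ρ) ≈ 0.91 for item reliability ρ = 0.5 and k = 10 items:

```python
import numpy as np

def cronbach_alpha(scores):
    # scores: (n_subjects, k_items).
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(4)
n, k = 500, 10
true_score = rng.normal(0, 1, (n, 1))   # normal true scores, shared across items
errors = rng.normal(0, 1, (n, k))       # normal, uncorrelated errors
alpha = cronbach_alpha(true_score + errors)
print("alpha:", alpha)
```

Replacing the normal draws with skewed or heavy-tailed generators is exactly the kind of manipulation the simulations in the abstract use to show how alpha degrades under non-normality.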

  11. A random effects meta-analysis model with Box-Cox transformation.

    PubMed

    Yamaguchi, Yusuke; Maruo, Kazushi; Partlett, Christopher; Riley, Richard D

    2017-07-19

    In a random effects meta-analysis model, true treatment effects for each study are routinely assumed to follow a normal distribution. However, normality is a restrictive assumption, and misspecification of the random effects distribution may result in a misleading estimate of the overall mean treatment effect, an inappropriate quantification of heterogeneity across studies and a wrongly symmetric prediction interval. We focus on problems caused by an inappropriate normality assumption for the random effects distribution, and propose a novel random effects meta-analysis model in which a Box-Cox transformation is applied to the observed treatment effect estimates. The proposed model aims to normalise the overall distribution of observed treatment effect estimates, which is the sum of the within-study sampling distributions and the random effects distribution. When sampling distributions are approximately normal, non-normality in the overall distribution will be mainly due to the random effects distribution, especially when the between-study variation is large relative to the within-study variation. The Box-Cox transformation addresses this flexibly according to the observed departure from normality. We use a Bayesian approach for estimating parameters in the proposed model, and suggest summarising the meta-analysis results by an overall median, an interquartile range and a prediction interval. The model can be applied to any kind of variable once the treatment effect estimate is defined from the variable. A simulation study suggested that when the overall distribution of treatment effect estimates is skewed, the overall mean and conventional I² from the normal random effects model can be inappropriate summaries, and the proposed model helps reduce this issue. We illustrate the proposed model using two examples, which revealed some important differences in summary results, heterogeneity measures and prediction intervals from the normal random effects model. The random effects meta-analysis with the Box-Cox transformation may be an important tool for examining the robustness of traditional meta-analysis results against skewness in the observed treatment effect estimates. Further critical evaluation of the method is needed.
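A minimal sketch of the transformation step (here using SciPy's maximum likelihood Box-Cox fit, with synthetic log-normally distributed effect estimates standing in for real meta-analysis data) is:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Hypothetical positively skewed treatment-effect estimates (Box-Cox requires
# positive data); a real analysis would start from per-study estimates.
effects = rng.lognormal(mean=0.5, sigma=0.6, size=200)

transformed, lam = stats.boxcox(effects)   # maximum likelihood choice of lambda
print("lambda:", lam)
print("skewness before:", stats.skew(effects))
print("skewness after:", stats.skew(transformed))
```

After transformation the distribution is much closer to symmetric, so normal random effects machinery can be applied on the transformed scale and summaries back-transformed to a median and interquartile range, as the abstract suggests.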

  12. On Nonequivalence of Several Procedures of Structural Equation Modeling

    ERIC Educational Resources Information Center

    Yuan, Ke-Hai; Chan, Wai

    2005-01-01

    The normal theory based maximum likelihood procedure is widely used in structural equation modeling. Three alternatives are: the normal theory based generalized least squares, the normal theory based iteratively reweighted least squares, and the asymptotically distribution-free procedure. When data are normally distributed and the model structure…

  13. Robustness of location estimators under t-distributions: a literature review

    NASA Astrophysics Data System (ADS)

    Sumarni, C.; Sadik, K.; Notodiputro, K. A.; Sartono, B.

    2017-03-01

    The assumption of normality is commonly used in the estimation of parameters in statistical modelling, but it is very sensitive to outliers. The t-distribution is more robust than the normal distribution because t-distributions have longer tails. The robustness measures of location estimators under t-distributions are reviewed and discussed in this paper. For the purpose of illustration, we use onion yield data that include outliers as a case study, and show that the t model produces a better fit than the normal model.
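A minimal sketch of the comparison (normal versus t location fits on synthetic outlier-contaminated data, not the paper's onion yield data) is:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical yield data: a normal bulk around 10 plus a few large outliers.
y = np.concatenate([rng.normal(10.0, 1.0, 95), [20.0, 22.0, 25.0, 19.0, 21.0]])

# Fit a normal model and a t model by maximum likelihood; compare log-likelihoods.
mu, sd = stats.norm.fit(y)
df, loc, scale = stats.t.fit(y)

ll_norm = stats.norm.logpdf(y, mu, sd).sum()
ll_t = stats.t.logpdf(y, df, loc, scale).sum()
print("normal location:", mu, " t location:", loc)  # t stays nearer the bulk
print("t fits better:", ll_t > ll_norm)
```

The normal location estimate (the mean) is dragged upward by the outliers, while the t model's longer tails absorb them and leave its location estimate near the bulk of the data, which is the robustness property the review discusses.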

  14. A short note on the maximal point-biserial correlation under non-normality.

    PubMed

    Cheng, Ying; Liu, Haiyan

    2016-11-01

    The aim of this paper is to derive the maximal point-biserial correlation under non-normality. Several widely used non-normal distributions are considered, namely the uniform distribution, t-distribution, exponential distribution, and a mixture of two normal distributions. Results show that the maximal point-biserial correlation, depending on the non-normal continuous variable underlying the binary manifest variable, may not be a function of p (the probability that the dichotomous variable takes the value 1), can be symmetric or non-symmetric around p = .5, and may still lie in the range from -1.0 to 1.0. Therefore researchers should exercise caution when they interpret their sample point-biserial correlation coefficients based on popular beliefs that the maximal point-biserial correlation is always smaller than 1, and that the size of the correlation is always further restricted as p deviates from .5. © 2016 The British Psychological Society.
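For contrast with the non-normal cases above, the classical normal-variable result gives the maximal point-biserial correlation in closed form, r_max = φ(Φ⁻¹(p)) / √(p(1 − p)), which is always below 1 and shrinks as p departs from .5; a short sketch:

```python
import numpy as np
from scipy import stats

def max_point_biserial_normal(p):
    # Maximal point-biserial correlation when the continuous variable underlying
    # the dichotomy is normal and the split probability is p.
    z = stats.norm.ppf(p)
    return stats.norm.pdf(z) / np.sqrt(p * (1 - p))

for p in (0.5, 0.7, 0.9):
    print(p, round(max_point_biserial_normal(p), 3))
```

It is precisely this normality-dependent ceiling that the paper shows can change shape, or disappear, when the underlying continuous variable is uniform, t, exponential, or a normal mixture.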

  15. Imputation of missing data in time series for air pollutants

    NASA Astrophysics Data System (ADS)

    Junger, W. L.; Ponce de Leon, A.

    2015-02-01

    Missing data are a major concern in epidemiological studies of the health effects of environmental air pollutants. This article presents an imputation-based method suitable for multivariate time series data, which uses the EM algorithm under the assumption of a normal distribution. Different approaches are considered for filtering the temporal component. A simulation study was performed to assess the validity and performance of the proposed method in comparison with some frequently used methods. Simulations showed that when the amount of missing data was as low as 5%, the complete-data analysis yielded satisfactory results regardless of the generating mechanism of the missing data, whereas validity began to degenerate when the proportion of missing values exceeded 10%. The proposed imputation method exhibited good accuracy and precision in different settings with respect to the patterns of missing observations. Most of the imputations yielded valid results, even under missing not at random. The methods proposed in this study are implemented as a package called mtsdi for the statistical software system R.

  16. Physical Activity and Blood Lead Concentration in Korea: Study Using the Korea National Health and Nutrition Examination Survey (2008-2013).

    PubMed

    Rhie, Jeongbae; Lee, Hye-Eun

    2016-06-01

    Physical activity normally has a positive influence on health; however, it can be detrimental in the presence of air pollution. Lead, a heavy metal with established adverse health effects, is a major air pollutant. We evaluated the correlation between blood lead concentration and physical activity using data collected from the Korea National Health and Nutrition Examination Survey. Multivariate logistic regression analysis was performed after dividing participants according to whether they were in the top 25% of the distribution of blood lead concentration (i.e., ≥ 2.76 µg/dL), with physical activity level as an independent variable, adjusting for factors such as age, sex, drinking, smoking, body mass index, region, and occupation. The high physical activity group had greater odds of a blood lead concentration above 2.76 µg/dL (odds ratio 1.29, 95% CI 1.11-1.51) than the low physical activity group. Blood lead concentration is thus correlated with increasing physical activity.

  17. Pu239 Cross-Section Variations Based on Experimental Uncertainties and Covariances

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sigeti, David Edward; Williams, Brian J.; Parsons, D. Kent

    2016-10-18

    Algorithms and software have been developed for producing variations in plutonium-239 neutron cross sections based on experimental uncertainties and covariances. The varied cross-section sets may be produced as random samples from the multivariate normal distribution defined by an experimental mean vector and covariance matrix, or they may be produced as Latin-Hypercube/Orthogonal-Array samples (based on the same means and covariances) for use in parametrized studies. The variations obey two classes of constraints that are obligatory for cross-section sets and which put related constraints on the mean vector and covariance matrix that determine the sampling. Because the experimental means and covariances do not obey some of these constraints to sufficient precision, imposing the constraints requires modifying the experimental mean vector and covariance matrix. Modification is done with an algorithm based on linear algebra that minimizes changes to the means and covariances while ensuring that the operations that impose the different constraints do not conflict with each other.
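The basic sampling step (random draws from the multivariate normal defined by a mean vector and covariance matrix, via a Cholesky factor) can be sketched as follows; the mean vector and covariance here are synthetic placeholders, and the report's constraint-imposing modifications are not shown:

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic placeholder for an experimental mean vector and covariance matrix
# over four energy groups (any A @ A.T is a valid positive semidefinite cov).
mean = np.array([1.8, 1.6, 1.5, 1.9])
A = rng.normal(size=(4, 4))
cov = 0.01 * (A @ A.T)

# Multivariate normal sampling via the Cholesky factor of the covariance:
# mean + z @ L.T has exactly the target mean and covariance in expectation.
L = np.linalg.cholesky(cov)
draws = mean + rng.normal(size=(1000, 4)) @ L.T

print("sample mean:", draws.mean(axis=0))
print("max |sample cov - cov|:", np.abs(np.cov(draws.T) - cov).max())
```

A Latin-Hypercube variant would replace the independent standard normal draws with stratified normal scores before applying the same Cholesky factor.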

  18. The Multivariate Largest Lyapunov Exponent as an Age-Related Metric of Quiet Standing Balance

    PubMed Central

    Liu, Kun; Wang, Hongrui; Xiao, Jinzhuang

    2015-01-01

    The largest Lyapunov exponent has been investigated as a metric of balance ability during human quiet standing. However, the sensitivity and accuracy of this measurement method are not good enough for clinical use. The present research proposes a metric of the human body's standing balance ability based on the multivariate largest Lyapunov exponent (MLLE), which can quantify human standing balance. The dynamic multivariate time series of the ankle, knee, and hip were measured by multiple electrical goniometers. Thirty-six normal people of different ages participated in the test. With the acquired data, the MLLE was calculated. Finally, the results of the proposed approach were analysed and compared with the traditional method, for which the largest Lyapunov exponent and power spectral density from the centre of pressure were also calculated. The following conclusions can be drawn. The MLLE has a higher degree of differentiation in distinguishing balance under eyes-closed conditions. The MLLE value reflects the overall coordination between multisegment movements. Individuals of different ages can be distinguished by their MLLE values. Human standing stability decreases with increasing age. PMID:26064182

  19. Empirical analysis on the runners' velocity distribution in city marathons

    NASA Astrophysics Data System (ADS)

    Lin, Zhenquan; Meng, Fan

    2018-01-01

    In recent decades, much research has been performed on human temporal activity and mobility patterns, while few investigations have examined the features of the velocity distributions of human mobility patterns. In this paper, we investigated empirically the velocity distributions of finishers in the New York City, Chicago, Berlin and London marathons. By statistical analyses of the finish-time records, we captured some statistical features of human behaviour in marathons: (1) the velocity distributions of all finishers, and of the subset of finishers in the fastest age group, both follow a log-normal distribution; (2) in the New York City marathon, the velocity distribution of all male runners over eight 5-kilometre internal timing courses undergoes two transitions: from a log-normal distribution at the initial stage (several initial courses) to a Gaussian distribution at the middle stage (several middle courses), and back to a log-normal distribution at the last stage (several last courses); (3) the intensity of the competition, described by the root-mean-square value of the rank changes of all runners, weakens from the initial stage to the middle stage, corresponding to the transition of the velocity distribution from log-normal to Gaussian, and when the competition strengthens in the last course of the middle stage, a transition from Gaussian back to log-normal occurs at the last stage. This study may enrich research on human mobility patterns and draw attention to the velocity features of human mobility.

  20. Analysis of Blood Glucose Distribution Characteristics and Its Risk Factors among a Health Examination Population in Wuhu (China)

    PubMed Central

    Song, Jiangen; Zha, Xiaojuan; Li, Haibo; Guo, Rui; Zhu, Yu; Wen, Yufeng

    2016-01-01

    Background: Diabetes mellitus (DM) and impaired fasting glucose (IFG) represent serious threats to human health; this study therefore aimed at understanding the blood glucose distribution characteristics and risk factors among a large health examination population in China. Methods: An investigation with physical and biochemical examinations and questionnaires was conducted in the physical examination center from 2011 to 2014, enrolling 175,122 physical examination attendees in this study. Multivariate logistic regression was used to explore the factors influencing blood glucose levels. Results: The rates of IFG and DM were 6.0% and 3.8%. The prevalences of IFG/DM were 7.6%/5.1% in males and 5.1%/2.8% in females, respectively; the prevalence of IFG and DM was thus higher in males than in females. In the normal group, high-density lipoprotein (HDL) was significantly higher than in the IFG and DM groups, while the other indexes (age, body mass index (BMI), glucose (Glu), total cholesterol (TC), and triglycerides (TG)) were lower. The proportions of IFG and DM also increased with increases in the proportions of abnormal blood pressure, smoking, and alcohol consumption. Multivariate logistic regression analysis showed that increasing age, high BMI, high TC, high TG, and low HDL increased the risk of diabetes; in males, smoking and drinking also increased the risk of diabetes in addition to the above factors. Blood glucose peaked in males after the age of 65, while in females it continued to rise; the inflection age of the fast rise was younger in males than in females. Conclusion: The study population showed a high prevalence of DM and IFG among adults. Regular physical examination for the early detection of diabetes is recommended in the high-risk population. PMID:27043603

  1. Analysis of Blood Glucose Distribution Characteristics and Its Risk Factors among a Health Examination Population in Wuhu (China).

    PubMed

    Song, Jiangen; Zha, Xiaojuan; Li, Haibo; Guo, Rui; Zhu, Yu; Wen, Yufeng

    2016-03-31

    Diabetes mellitus (DM) and impaired fasting glucose (IFG) represent serious threats to human health; this study therefore aimed at understanding the blood glucose distribution characteristics and risk factors among a large health examination population in China. An investigation with physical and biochemical examinations and questionnaires was conducted in the physical examination center from 2011 to 2014, enrolling 175,122 physical examination attendees in this study. Multivariate logistic regression was used to explore the factors influencing blood glucose levels. The rates of IFG and DM were 6.0% and 3.8%. The prevalences of IFG/DM were 7.6%/5.1% in males and 5.1%/2.8% in females, respectively; the prevalence of IFG and DM was thus higher in males than in females. In the normal group, high-density lipoprotein (HDL) was significantly higher than in the IFG and DM groups, while the other indexes (age, body mass index (BMI), glucose (Glu), total cholesterol (TC), and triglycerides (TG)) were lower. The proportions of IFG and DM also increased with increases in the proportions of abnormal blood pressure, smoking, and alcohol consumption. Multivariate logistic regression analysis showed that increasing age, high BMI, high TC, high TG, and low HDL increased the risk of diabetes; in males, smoking and drinking also increased the risk of diabetes in addition to the above factors. Blood glucose peaked in males after the age of 65, while in females it continued to rise; the inflection age of the fast rise was younger in males than in females. The study population showed a high prevalence of DM and IFG among adults. Regular physical examination for the early detection of diabetes is recommended in the high-risk population.
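    Multivariate logistic regression of the kind used here to screen risk factors can be sketched as a gradient-descent fit, where the exponentiated coefficients give odds ratios. This is a generic illustration on synthetic data with invented "BMI-like" and "HDL-like" predictors, not the study's data or code:

```python
import math, random

def logistic_fit(X, y, lr=0.5, epochs=300):
    """Multivariate logistic regression via full-batch gradient descent.
    Returns [intercept, w1, ..., wp]; exp(wj) is the odds ratio per unit xj."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi   # predicted prob minus label
            grad[0] += err
            for j in range(p):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

# synthetic cohort: risk rises with "BMI-like" x1 and falls with "HDL-like" x2
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(1000)]
y = [1 if random.random() < 1 / (1 + math.exp(-(1.0 * x1 - 0.8 * x2))) else 0
     for x1, x2 in X]
w = logistic_fit(X, y)
or_x1 = math.exp(w[1])   # odds ratio per unit of the "BMI-like" predictor
```

    An odds ratio above 1 (here `or_x1`) marks a factor that increases risk, mirroring how age, BMI, TC, and TG are reported in the abstract.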

  2. Hyperconnectivity in juvenile myoclonic epilepsy: a network analysis.

    PubMed

    Caeyenberghs, K; Powell, H W R; Thomas, R H; Brindley, L; Church, C; Evans, J; Muthukumaraswamy, S D; Jones, D K; Hamandi, K

    2015-01-01

    Juvenile myoclonic epilepsy (JME) is a common idiopathic (genetic) generalized epilepsy (IGE) syndrome characterized by impairments in executive and cognitive control, affecting independent living and psychosocial functioning. There is a growing consensus that JME is associated with abnormal function of diffuse brain networks, typically affecting frontal and fronto-thalamic areas. Using diffusion MRI and a graph theoretical analysis, we examined bivariate (network-based statistic) and multivariate (global and local) properties of structural brain networks in patients with JME (N = 34) and matched controls. Neuropsychological assessment was performed in a subgroup of 14 patients. Neuropsychometry revealed impaired visual memory and naming in JME patients despite a normal full scale IQ (mean = 98.6). Both JME patients and controls exhibited a small world topology in their white matter networks, with no significant differences in the global multivariate network properties between the groups. The network-based statistic approach identified one subnetwork of hyperconnectivity in the JME group, involving primary motor, parietal and subcortical regions. Finally, there was a significant positive correlation in structural connectivity with cognitive task performance. Our findings suggest that structural changes in JME patients are distributed at a network level, beyond the frontal lobes. The identified subnetwork includes key structures in spike wave generation, along with primary motor areas, which may contribute to myoclonic jerks. We conclude that analyzing the affected subnetworks may provide new insights into understanding seizure generation, as well as the cognitive deficits observed in JME patients.

  3. Hyperconnectivity in juvenile myoclonic epilepsy: A network analysis

    PubMed Central

    Caeyenberghs, K.; Powell, H.W.R.; Thomas, R.H.; Brindley, L.; Church, C.; Evans, J.; Muthukumaraswamy, S.D.; Jones, D.K.; Hamandi, K.

    2014-01-01

    Objective Juvenile myoclonic epilepsy (JME) is a common idiopathic (genetic) generalized epilepsy (IGE) syndrome characterized by impairments in executive and cognitive control, affecting independent living and psychosocial functioning. There is a growing consensus that JME is associated with abnormal function of diffuse brain networks, typically affecting frontal and fronto-thalamic areas. Methods Using diffusion MRI and a graph theoretical analysis, we examined bivariate (network-based statistic) and multivariate (global and local) properties of structural brain networks in patients with JME (N = 34) and matched controls. Neuropsychological assessment was performed in a subgroup of 14 patients. Results Neuropsychometry revealed impaired visual memory and naming in JME patients despite a normal full scale IQ (mean = 98.6). Both JME patients and controls exhibited a small world topology in their white matter networks, with no significant differences in the global multivariate network properties between the groups. The network-based statistic approach identified one subnetwork of hyperconnectivity in the JME group, involving primary motor, parietal and subcortical regions. Finally, there was a significant positive correlation in structural connectivity with cognitive task performance. Conclusions Our findings suggest that structural changes in JME patients are distributed at a network level, beyond the frontal lobes. The identified subnetwork includes key structures in spike wave generation, along with primary motor areas, which may contribute to myoclonic jerks. We conclude that analyzing the affected subnetworks may provide new insights into understanding seizure generation, as well as the cognitive deficits observed in JME patients. PMID:25610771

  4. Multisite-multivariable sensitivity analysis of distributed watershed models: enhancing the perceptions from computationally frugal methods

    USDA-ARS?s Scientific Manuscript database

    This paper assesses the impact of different likelihood functions in identifying sensitive parameters of the highly parameterized, spatially distributed Soil and Water Assessment Tool (SWAT) watershed model for multiple variables at multiple sites. The global one-factor-at-a-time (OAT) method of Morr...

  5. BLURRING OF BIOGEOGRAPHIC BOUNDARIES: A MULTIVARIATE ANALYSIS OF THE REGIONAL PATTERNS OF NATIVE AND NONINDIGENOUS SPECIES ASSEMBLAGES IN PACIFIC COAST ESTUARIES

    EPA Science Inventory

    Many, if not most, invaders have wide physiological tolerance limits and generalist habitat requirements. Consequently as a group nonindigenous species should have wider geographic distributions compared to native fauna. In turn, these broader distributions of nonindigenous speci...

  6. A climate-based multivariate extreme emulator of met-ocean-hydrological events for coastal flooding

    NASA Astrophysics Data System (ADS)

    Camus, Paula; Rueda, Ana; Mendez, Fernando J.; Tomas, Antonio; Del Jesus, Manuel; Losada, Iñigo J.

    2015-04-01

    Atmosphere-ocean general circulation models (AOGCMs) are useful for analyzing large-scale climate variability (long-term historical periods, future climate projections). However, applications such as coastal flood modeling require climate information at a finer scale. Moreover, flooding events depend on multiple climate conditions: waves, surge levels from the open ocean, and river discharge caused by precipitation. A multivariate statistical downscaling approach is therefore adopted, since it reproduces the relationships between variables at low computational cost. The proposed method can be considered a hybrid approach that combines a probabilistic weather-type downscaling model with a stochastic weather generator component. Predictand distributions are reproduced by modeling their relationship with AOGCM predictors on the basis of a physical classification into weather types (Camus et al., 2012). The multivariate dependence structure of the predictand (extreme events) is introduced by linking the independent marginal distributions of the variables through a probabilistic copula regression (Ben Alaya et al., 2014). This hybrid approach is applied to the downscaling of AOGCM data to daily precipitation, maximum significant wave height, and storm surge at different locations along the Spanish coast. Reanalysis data are used to assess the proposed method. A common predictor for the three variables involved is classified using a regression-guided clustering algorithm. The most appropriate statistical model (generalized extreme value distribution, Pareto distribution) for daily conditions is fitted. Stochastic simulation of the present climate is performed, yielding the set of hydraulic boundary conditions needed for high-resolution coastal flood modeling. References: Camus, P., Menéndez, M., Méndez, F.J., Izaguirre, C., Espejo, A., Cánovas, V., Pérez, J., Rueda, A., Losada, I.J., Medina, R. (2014b). A weather-type statistical downscaling framework for ocean wave climate. Journal of Geophysical Research, doi: 10.1002/2014JC010141. Ben Alaya, M.A., Chebana, F., Ouarda, T.B.M.J. (2014). Probabilistic Gaussian Copula Regression Model for Multisite and Multivariable Downscaling, Journal of Climate, 27, 3331-3347.

  7. Exploring image data assimilation in the prospect of high-resolution satellite oceanic observations

    NASA Astrophysics Data System (ADS)

    Durán Moro, Marina; Brankart, Jean-Michel; Brasseur, Pierre; Verron, Jacques

    2017-07-01

    Satellite sensors increasingly provide high-resolution (HR) observations of the ocean. They supply observations of sea surface height (SSH) and of tracers of the dynamics such as sea surface salinity (SSS) and sea surface temperature (SST). In particular, the Surface Water Ocean Topography (SWOT) mission will provide measurements of the surface ocean topography at very high resolution, delivering unprecedented information on the meso-scale and submeso-scale dynamics. This study investigates the feasibility of using these measurements to reconstruct meso-scale features simulated by numerical models, in particular along the vertical dimension. A methodology to reconstruct three-dimensional (3D) multivariate meso-scale scenes is developed using an HR numerical model of the Solomon Sea region. An inverse problem is defined in the framework of a twin experiment in which synthetic observations are used. A true state, taken as the reference, is chosen among the 3D multivariate states. In order to correct a first guess of this true state, a two-step analysis is carried out. A probability distribution of the first guess is defined and updated at each step of the analysis: (i) the first step applies the analysis scheme of a reduced-order Kalman filter to update the first-guess probability distribution using the SSH observation; (ii) the second step minimizes a cost function using observations of HR image structure, and a new probability distribution is estimated. The analysis is extended to the vertical dimension using 3D multivariate empirical orthogonal functions (EOFs), and the probabilistic approach allows the probability distribution to be updated through the two-step analysis. Experiments show that the proposed technique succeeds in correcting a multivariate state using the meso-scale and submeso-scale information contained in HR SSH and image structure observations. It also demonstrates how surface information can be used to reconstruct the ocean state below the surface.

  8. A Cyber-Attack Detection Model Based on Multivariate Analyses

    NASA Astrophysics Data System (ADS)

    Sakai, Yuto; Rinsaka, Koichiro; Dohi, Tadashi

    In the present paper, we propose a novel cyber-attack detection model that applies two multivariate-analysis methods to the audit data observed on a host machine. The statistical techniques used here are the well-known Hayashi's quantification method IV and the cluster analysis method. We quantify the observed qualitative audit event sequence via quantification method IV, and group similar audit event sequences together via cluster analysis. Simulation experiments show that our model can improve cyber-attack detection accuracy in realistic cases where normal and attack activities are intermingled.

  9. Combining markers with and without the limit of detection

    PubMed Central

    Dong, Ting; Liu, Catherine Chunling; Petricoin, Emanuel F.; Tang, Liansheng Larry

    2014-01-01

    In this paper, we consider the combination of markers with and without a limit of detection (LOD). An LOD is often encountered when measuring proteomic markers: because of the limited detection ability of the equipment or instrument, it is difficult to measure markers at relatively low levels. Suppose that, after some monotonic transformation, the marker values approximately follow multivariate normal distributions. We propose to estimate the distribution parameters while taking the LOD into account, and then combine the markers using the results of linear discriminant analysis. Our simulation results show that the ROC curve parameter estimates generated by the proposed method are much closer to the truth than those from simply applying linear discriminant analysis without considering the LOD. In addition, we propose a procedure to select and combine a subset of markers when many candidate markers are available. The procedure, based on the correlation among markers, differs from the common understanding that the most accurate markers should be selected for combination. The simulation studies show that the accuracy of a combined marker can be largely impacted by the correlation of marker measurements. Our methods are applied to a protein pathway dataset to combine proteomic biomarkers to distinguish cancer patients from non-cancer patients. PMID:24132938
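    A hedged sketch of the combination step only: linear discriminant weights from a pooled covariance and a nonparametric AUC on a synthetic two-marker example. LOD censoring is deliberately omitted (the paper handles it through censored-likelihood parameter estimation), and all names and data are illustrative assumptions.

```python
import math, random

def fisher_weights(cases, controls):
    """Linear discriminant weights w = S^-1 (mean_case - mean_control)
    for two markers, with S the pooled within-group covariance."""
    def mean(pts):
        n = len(pts)
        return [sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n]
    m1, m0 = mean(cases), mean(controls)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for pts, m in ((cases, m1), (controls, m0)):
        for p in pts:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    n = len(cases) + len(controls) - 2
    a, b, c, d2 = s[0][0] / n, s[0][1] / n, s[1][0] / n, s[1][1] / n
    det = a * d2 - b * c
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [(d2 * diff[0] - b * diff[1]) / det,
            (-c * diff[0] + a * diff[1]) / det]

def auc(case_scores, control_scores):
    """Nonparametric AUC (Mann-Whitney): correctly ordered pair fraction."""
    wins = sum((x > y) + 0.5 * (x == y)
               for x in case_scores for y in control_scores)
    return wins / (len(case_scores) * len(control_scores))

# synthetic correlated markers (corr 0.3), mean-shifted in cases
random.seed(3)
def draw(shift):
    u, v = random.gauss(0, 1), random.gauss(0, 1)
    return [u + shift, 0.3 * u + math.sqrt(1 - 0.09) * v + shift]
controls = [draw(0.0) for _ in range(500)]
cases = [draw(1.0) for _ in range(500)]
w = fisher_weights(cases, controls)
score = lambda p: w[0] * p[0] + w[1] * p[1]
auc_combined = auc([score(p) for p in cases], [score(p) for p in controls])
```

    As the abstract notes, the covariance between markers drives how much such a combination gains over the best single marker.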

  10. Forecasts of non-Gaussian parameter spaces using Box-Cox transformations

    NASA Astrophysics Data System (ADS)

    Joachimi, B.; Taylor, A. N.

    2011-09-01

    Forecasts of statistical constraints on model parameters using the Fisher matrix abound in many fields of astrophysics. The Fisher matrix formalism involves the assumption of Gaussianity in parameter space and hence fails to predict complex features of posterior probability distributions. Combining the standard Fisher matrix with Box-Cox transformations, we propose a novel method that accurately predicts arbitrary posterior shapes. The Box-Cox transformations are applied to parameter space to render it approximately multivariate Gaussian, and the Fisher matrix calculation is performed on the transformed parameters. We demonstrate that, after the Box-Cox parameters have been determined from an initial likelihood evaluation, the method correctly predicts changes in the posterior when varying various parameters of the experimental setup and the data analysis, with marginally higher computational cost than a standard Fisher matrix calculation. We apply the Box-Cox-Fisher formalism to forecast cosmological parameter constraints by future weak gravitational lensing surveys. The characteristic non-linear degeneracy between the matter density parameter and the normalization of matter density fluctuations is reproduced for several cases, and the capability of weak-lensing three-point statistics to break this degeneracy is investigated. Possible applications of Box-Cox transformations of posterior distributions are discussed, including the prospects for performing statistical data analysis steps in the transformed Gaussianized parameter space.
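    The Gaussianizing step can be sketched by choosing the Box-Cox parameter that maximizes the profile log-likelihood. This is a minimal stdlib illustration on synthetic log-normal samples (where the optimal lambda is near 0, i.e. a log transform); function names and the grid search are illustrative assumptions, not the paper's implementation:

```python
import math, random

def boxcox(x, lam):
    """Box-Cox transform for x > 0; lam = 0 reduces to the log transform."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def profile_loglik(xs, lam):
    """Profile log-likelihood of the Box-Cox parameter lambda:
    -n/2 log(var of transformed data) plus the Jacobian term."""
    ys = [boxcox(x, lam) for x in xs]
    n = len(ys)
    m = sum(ys) / n
    var = sum((y - m) ** 2 for y in ys) / n
    return -0.5 * n * math.log(var) + (lam - 1) * sum(math.log(x) for x in xs)

def best_lambda(xs, grid):
    """Grid search for the Gaussianizing lambda."""
    return max(grid, key=lambda lam: profile_loglik(xs, lam))

# log-normal samples: the search should pick lambda near 0
random.seed(2)
xs = [random.lognormvariate(0.0, 0.5) for _ in range(2000)]
lam = best_lambda(xs, [i / 10 for i in range(-10, 11)])
```

    In the paper's setting the same idea is applied per parameter of the posterior before the Fisher matrix is evaluated in the transformed space.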

  11. Assessment of methane emissions from oil and gas production pads using mobile measurements.

    PubMed

    Brantley, Halley L; Thoma, Eben D; Squier, William C; Guven, Birnur B; Lyon, David

    2014-12-16

    A new mobile methane emissions inspection approach, Other Test Method (OTM) 33A, was used to quantify short-term emission rates from 210 oil and gas production pads during eight two-week field studies in Texas, Colorado, and Wyoming from 2010 to 2013. Emission rates were log-normally distributed with geometric means and 95% confidence intervals (CIs) of 0.33 (0.23, 0.48), 0.14 (0.11, 0.19), and 0.59 (0.47, 0.74) g/s in the Barnett, Denver-Julesburg, and Pinedale basins, respectively. This study focused on sites with emission rates above 0.01 g/s and included short-term (i.e., condensate tank flashing) and maintenance-related emissions. The results fell within the upper ranges of the distributions observed in recent onsite direct measurement studies. Considering data across all basins, a multivariate linear regression was used to assess the relationship of methane emissions to well age, gas production, and hydrocarbon liquids (oil or condensate) production. Methane emissions were positively correlated with gas production, but only approximately 10% of the variation in emission rates was explained by variation in production levels. The weak correlation between emission and production rates may indicate that maintenance-related stochastic variables and design of production and control equipment are factors determining emissions.

  12. Characterization of sildenafil citrate tablets of different sources by near infrared chemical imaging and chemometric tools.

    PubMed

    Sabin, Guilherme P; Lozano, Valeria A; Rocha, Werickson F C; Romão, Wanderson; Ortiz, Rafael S; Poppi, Ronei J

    2013-11-01

    The chemical imaging technique by near-infrared spectroscopy was applied to characterize formulations of sildenafil citrate tablets from six different sources. Five formulations were provided by the Brazilian Federal Police and correspond to several trademarks of prohibited marketing, and one was an authentic sample of Viagra. In the first step of the study, multivariate curve resolution was chosen to estimate the distribution map of the concentration of the active ingredient in tablets from different sources, where the chemical composition of all excipient constituents was not truly known. In such cases, it is very difficult to establish an appropriate calibration technique in which only the information on sildenafil is considered independently of the excipients. This determination was possible only by exploiting the second-order advantage, whereby the analyte can be quantified in the presence of unknown interferences. In the second step, the normalized histograms of the active-ingredient images were grouped according to their similarities by hierarchical cluster analysis. Finally, it was possible to recognize the patterns of the concentration distribution maps of sildenafil citrate, distinguishing the true Viagra formulation. This concept can be used to improve knowledge of industrial products and processes, as well as for the characterization of counterfeit drugs. Copyright © 2013. Published by Elsevier B.V.

  13. Assessment of CT image quality using a Bayesian approach

    NASA Astrophysics Data System (ADS)

    Reginatto, M.; Anton, M.; Elster, C.

    2017-08-01

    One of the most promising approaches for evaluating CT image quality is task-specific quality assessment. This involves a simplified version of a clinical task, e.g. deciding whether an image belongs to the class of images that contain the signature of a lesion or not. Task-specific quality assessment can be done by model observers, which are mathematical procedures that carry out the classification task. The most widely used figure of merit for CT image quality is the area under the ROC curve (AUC), a quantity which characterizes the performance of a given model observer. In order to estimate AUC from a finite sample of images, different approaches from classical statistics have been suggested. The goal of this paper is to introduce task-specific quality assessment of CT images to metrology and to propose a novel Bayesian estimation of AUC for the channelized Hotelling observer (CHO) applied to the task of detecting a lesion at a known image location. It is assumed that signal-present and signal-absent images follow multivariate normal distributions with the same covariance matrix. The Bayesian approach results in a posterior distribution for the AUC of the CHO which provides in addition a complete characterization of the uncertainty of this figure of merit. The approach is illustrated by its application to both simulated and experimental data.
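    The AUC figure of merit discussed here has two standard estimators worth sketching: the nonparametric Mann-Whitney statistic on observer scores, and the closed form under the equal-covariance normal model assumed in the abstract, AUC = Phi(d'/sqrt(2)). The helper names below are illustrative, and this sketch omits the paper's Bayesian posterior over the AUC:

```python
import math

def auc_mann_whitney(present, absent):
    """Nonparametric AUC: fraction of (signal-present, signal-absent)
    score pairs ranked correctly, with ties counted as half."""
    wins = sum((p > a) + 0.5 * (p == a) for p in present for a in absent)
    return wins / (len(present) * len(absent))

def auc_binormal(d_prime):
    """AUC under the equal-variance normal model: Phi(d' / sqrt(2)),
    written via erf since Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(d_prime / 2.0))
```

    For a channelized Hotelling observer, d' is the observer SNR computed from the channel means and common covariance; with finite samples the Mann-Whitney estimate fluctuates around the binormal value, which is what motivates a posterior characterization of the uncertainty.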

  14. Anatomic distribution of culprit lesions in patients with non-ST-segment elevation myocardial infarction and normal ECG.

    PubMed

    Moustafa, Abdelmoniem; Abi-Saleh, Bernard; El-Baba, Mohammad; Hamoui, Omar; AlJaroudi, Wael

    2016-02-01

    In patients presenting with non-ST-elevation myocardial infarction (NSTEMI), left anterior descending (LAD) coronary artery and three-vessel disease are the most commonly encountered culprit lesions in the presence of ST depression, while one third of patients with left circumflex (LCX) artery-related infarction have a normal ECG. We sought to determine the predictors of the presence of a culprit lesion in NSTEMI patients based on ECG, echocardiographic, and clinical characteristics. Patients admitted to the coronary care unit with a diagnosis of NSTEMI between June 2012 and December 2013 were retrospectively identified. The admission ECG was interpreted by an electrophysiologist blinded to the result of the coronary angiogram. Patients were dichotomized into either the normal or the abnormal ECG group. The primary endpoint was the presence of a culprit lesion. Secondary endpoints included length of stay, re-hospitalization within 60 days, and in-hospital mortality. A total of 118 patients were identified: 47 with normal and 71 with abnormal ECG. At least one culprit lesion was identified in 101 patients (86%), and significantly more often among those with abnormal ECG (91.5% vs. 76.6%, P=0.041). The LAD was the most frequently detected culprit lesion in both groups. There was a higher incidence of two- and three-vessel disease in the abnormal ECG group (P=0.041). On the other hand, there was a trend toward higher LCX involvement (25% vs. 13.8%, P=0.18) and more normal coronary arteries in the normal ECG group (23.4% vs. 8.5%, P=0.041). On multivariate analysis, prior history of coronary artery disease (CAD) [odds ratio (OR) 6.4 (0.8-52)], male gender [OR 5.0 (1.5-17)], and abnormal admission ECG [OR 3.6 (1.12-12)] were independent predictors of a culprit lesion. There was no difference in secondary endpoints between those with normal and abnormal ECG. Among patients presenting with NSTEMI, prior history of CAD, male gender, and abnormal admission ECG were independent predictors of a culprit lesion. An abnormal ECG was significantly associated with two- and three-vessel disease, while a normal ECG was more associated with LCX involvement or a normal angiogram. The admission ECG did not impact secondary outcomes.

  15. Data driven discrete-time parsimonious identification of a nonlinear state-space model for a weakly nonlinear system with short data record

    NASA Astrophysics Data System (ADS)

    Relan, Rishi; Tiels, Koen; Marconato, Anna; Dreesen, Philippe; Schoukens, Johan

    2018-05-01

    Many real-world systems exhibit quasi-linear or weakly nonlinear behavior during normal operation, and a hard saturation effect for high peaks of the input signal. In this paper, a methodology is proposed to identify a parsimonious discrete-time nonlinear state-space (NLSS) model of a nonlinear dynamical system from a relatively short data record. The capability of the NLSS model structure is demonstrated by introducing two different initialisation schemes, one of them using multivariate polynomials. In addition, a method using first-order information of the multivariate polynomials and tensor decomposition is employed to obtain a parsimonious decoupled representation of the set of multivariate real polynomials estimated during the identification of the NLSS model. Finally, the model structure is verified experimentally on the cascaded water tanks benchmark identification problem.

  16. Notes on power of normality tests of error terms in regression models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Střelec, Luboš

    2015-03-10

    Normality is one of the basic assumptions in applying statistical procedures. For example, in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results from the usual statistical inference techniques such as the t-test or F-test. Thus, error terms should be normally distributed in order to allow exact inferences. Normally distributed stochastic errors are consequently necessary for inferences that are not misleading, which explains the necessity and importance of robust tests of normality. The aim of this contribution is therefore to discuss normality testing of error terms in regression models. We introduce the general RT class of robust normality tests, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.
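    As a baseline for the moment-based normality tests discussed here, the classical Jarque-Bera statistic on residuals can be sketched in a few lines (this is a standard textbook test used for illustration, not the RT class proposed in the contribution):

```python
import math, random

def jarque_bera(residuals):
    """Jarque-Bera normality statistic from sample skewness and kurtosis;
    under normality it is asymptotically chi-square with 2 df."""
    n = len(residuals)
    m = sum(residuals) / n
    s2 = sum((r - m) ** 2 for r in residuals) / n
    skew = sum((r - m) ** 3 for r in residuals) / (n * s2 ** 1.5)
    kurt = sum((r - m) ** 4 for r in residuals) / (n * s2 ** 2)
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

# normal residuals should give a small statistic, skewed ones a large statistic
random.seed(4)
normal_res = [random.gauss(0, 1) for _ in range(1000)]
skewed_res = [random.expovariate(1.0) - 1.0 for _ in range(1000)]
jb_normal, jb_skewed = jarque_bera(normal_res), jarque_bera(skewed_res)
```

    Comparing the statistic to the chi-square(2) critical value (about 5.99 at the 5% level) gives the usual accept/reject decision; robust variants replace the raw moments with outlier-resistant estimates.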

  17. Influence of malnutrition on the course of childhood bacterial meningitis.

    PubMed

    Roine, Irmeli; Weisstaub, Gerardo; Peltola, Heikki

    2010-02-01

    Malnutrition may be an important cofactor explaining the poor outcome of childhood bacterial meningitis (BM) in developing countries. We examined its effect in Latin American children. The weight-for-age z score was determined for 482 children with BM aged 2 months to 5 years. Normal-weight (z score from >-1 to <+1), underweight (z score <-1), and overweight (z score >+1) children were compared on admission, in hospital, and at discharge. Using uni- and multivariate analyses, we sought associations between malnutrition and 3 different outcomes. The mean z score was -0.41 +/- 1.54, with a normal distribution. Overall, 260 (54%) patients were of normal weight, 151 (31%) underweight, and 71 (15%) overweight. Compared with the others, underweight patients had on admission a lower Glasgow coma score (P = 0.0006) and cerebrospinal fluid glucose concentration (P = 0.03), and a slower capillary filling time (P = 0.02). Their death rate was higher (P = 0.0004) and they survived with more neurological sequelae (P = 0.04), but with a similar frequency of hearing impairment (P > 0.05). The odds of death increased 1.98-fold with mild (95% confidence interval [CI], 1.03-3.83; P = 0.04), 2.55-fold with moderate (95% CI, 1.05-6.17; P = 0.04), and 5.85-fold with severe underweight (95% CI, 2.53-13.50; P < 0.0001). Overweight was not associated with adverse outcomes (P > 0.05). Children who are underweight at the onset of BM have a substantially increased probability of neurological sequelae and death.

  18. Modeling Error Distributions of Growth Curve Models through Bayesian Methods

    ERIC Educational Resources Information Center

    Zhang, Zhiyong

    2016-01-01

    Growth curve models are widely used in social and behavioral sciences. However, typical growth curve models often assume that the errors are normally distributed although non-normal data may be even more common than normal data. In order to avoid possible statistical inference problems in blindly assuming normality, a general Bayesian framework is…

  19. Polynomial compensation, inversion, and approximation of discrete time linear systems

    NASA Technical Reports Server (NTRS)

    Baram, Yoram

    1987-01-01

    The least-squares transformation of a discrete-time multivariable linear system into a desired one by convolving the first with a polynomial system yields optimal polynomial solutions to the problems of system compensation, inversion, and approximation. The polynomial coefficients are obtained from the solution to a so-called normal linear matrix equation, whose coefficients are shown to be the weighting patterns of certain linear systems. These, in turn, can be used in the recursive solution of the normal equation.
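    The normal-equation solution described here can be sketched for the scalar case: finding a least-squares FIR inverse of a system's impulse response by forming and solving A^T A h = A^T d, where the columns of A are shifted copies of the impulse response and d is a unit impulse. Function names and the toy system below are illustrative assumptions, not the paper's formulation:

```python
def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lsq_inverse(g, order):
    """Least-squares FIR inverse of impulse response g via normal equations:
    minimize ||g * h - delta||^2 over FIR filters h of the given length."""
    m = order + len(g) - 1                 # length of the convolution g * h
    target = [1.0] + [0.0] * (m - 1)       # unit impulse
    # columns of A are shifted copies of g: (A h)_i = sum_k g[i-k] h[k]
    A = [[g[i - k] if 0 <= i - k < len(g) else 0.0 for k in range(order)]
         for i in range(m)]
    AtA = [[sum(A[i][r] * A[i][c] for i in range(m)) for c in range(order)]
           for r in range(order)]
    Atb = [sum(A[i][r] * target[i] for i in range(m)) for r in range(order)]
    return solve(AtA, Atb)

# toy system 1 + 0.5 z^-1: its LS inverse approximates (-0.5)^k
h = lsq_inverse([1.0, 0.5], 8)
conv = convolve([1.0, 0.5], h)
err = sum((c - t) ** 2 for c, t in zip(conv, [1.0] + [0.0] * 8))
```

    The same normal-equation structure appears in the compensation and approximation problems, with the unit impulse replaced by the desired response.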

  20. A coupled classification - evolutionary optimization model for contamination event detection in water distribution systems.

    PubMed

    Oliker, Nurit; Ostfeld, Avi

    2014-03-15

    This study describes a decision support system that alerts for contamination events in water distribution systems. The developed model comprises a weighted support vector machine (SVM) for the detection of outliers, followed by a sequence analysis for the classification of contamination events. The contribution of this study is an improvement in contamination event detection ability and a multi-dimensional analysis of the data, differing from the parallel one-dimensional analyses conducted so far. The multivariate analysis examines the relationships between water quality parameters and detects changes in their mutual patterns. The weights of the SVM model accomplish two goals: compensating for the difference in size between the two classes' data sets (as there are many more normal/regular than event-time measurements), and incorporating the time factor through a time-decay coefficient that ascribes higher importance to recent observations when classifying a time-step measurement. All model parameters were determined by data-driven optimization, so the calibration of the model was completely autonomous. The model was trained and tested on a real water distribution system (WDS) data set with randomly simulated events superimposed on the original measurements. The model is prominent in its ability to detect events that were only partly expressed in the data (i.e., affecting only some of the measured parameters). The model showed high accuracy and better detection ability compared to previous modeling attempts at contamination event detection. Copyright © 2013 Elsevier Ltd. All rights reserved.

  1. Pleiotropy Analysis of Quantitative Traits at Gene Level by Multivariate Functional Linear Models

    PubMed Central

    Wang, Yifan; Liu, Aiyi; Mills, James L.; Boehnke, Michael; Wilson, Alexander F.; Bailey-Wilson, Joan E.; Xiong, Momiao; Wu, Colin O.; Fan, Ruzong

    2015-01-01

    In genetics, pleiotropy describes the genetic effect of a single gene on multiple phenotypic traits. A common approach is to analyze the phenotypic traits separately using univariate analyses and combine the test results through multiple comparisons. This approach may lead to low power. Multivariate functional linear models are developed to connect genetic variant data to multiple quantitative traits adjusting for covariates for a unified analysis. Three types of approximate F-distribution tests based on Pillai–Bartlett trace, Hotelling–Lawley trace, and Wilks’s Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and optimal sequence kernel association test (SKAT-O). Extensive simulations were performed to evaluate the false positive rates and power performance of the proposed models and tests. We show that the approximate F-distribution tests control the type I error rates very well. Overall, simultaneous analysis of multiple traits can increase power performance compared to an individual test of each trait. The proposed methods were applied to analyze (1) four lipid traits in eight European cohorts, and (2) three biochemical traits in the Trinity Students Study. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and SKAT-O for the three biochemical traits. The approximate F-distribution tests of the proposed functional linear models are more sensitive than those of the traditional multivariate linear models that in turn are more sensitive than SKAT-O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detects more association than SKAT-O in the univariate case. PMID:25809955

  2. Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models.

    PubMed

    Wang, Yifan; Liu, Aiyi; Mills, James L; Boehnke, Michael; Wilson, Alexander F; Bailey-Wilson, Joan E; Xiong, Momiao; Wu, Colin O; Fan, Ruzong

    2015-05-01

    In genetics, pleiotropy describes the genetic effect of a single gene on multiple phenotypic traits. A common approach is to analyze the phenotypic traits separately using univariate analyses and combine the test results through multiple comparisons. This approach may lead to low power. Multivariate functional linear models are developed to connect genetic variant data to multiple quantitative traits adjusting for covariates for a unified analysis. Three types of approximate F-distribution tests based on Pillai-Bartlett trace, Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and optimal sequence kernel association test (SKAT-O). Extensive simulations were performed to evaluate the false positive rates and power performance of the proposed models and tests. We show that the approximate F-distribution tests control the type I error rates very well. Overall, simultaneous analysis of multiple traits can increase power performance compared to an individual test of each trait. The proposed methods were applied to analyze (1) four lipid traits in eight European cohorts, and (2) three biochemical traits in the Trinity Students Study. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and SKAT-O for the three biochemical traits. The approximate F-distribution tests of the proposed functional linear models are more sensitive than those of the traditional multivariate linear models that in turn are more sensitive than SKAT-O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detects more association than SKAT-O in the univariate case. © 2015 WILEY PERIODICALS, INC.
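
    The three multivariate statistics named above (Pillai-Bartlett trace, Hotelling-Lawley trace, Wilks's Lambda) are all functions of the hypothesis and error SSCP matrices of a multivariate regression. A bare-bones numpy sketch on invented data (the paper's functional linear models add basis expansions of the genetic region, omitted here):

    ```python
    import numpy as np

    # Sketch of the three MANOVA statistics used to test association between
    # multiple traits Y and a predictor block X (here a single genetic score).
    def manova_stats(Y, X):
        X1 = np.column_stack([np.ones(len(X)), X])     # design with intercept
        B, *_ = np.linalg.lstsq(X1, Y, rcond=None)     # multivariate OLS fit
        resid = Y - X1 @ B
        E = resid.T @ resid                            # error SSCP matrix
        Yc = Y - Y.mean(axis=0)
        T = Yc.T @ Yc                                  # total SSCP matrix
        H = T - E                                      # hypothesis SSCP matrix
        pillai = np.trace(H @ np.linalg.inv(H + E))
        hotelling = np.trace(H @ np.linalg.inv(E))
        wilks = np.linalg.det(E) / np.linalg.det(H + E)
        return pillai, hotelling, wilks

    rng = np.random.default_rng(1)
    n = 100
    x = rng.normal(size=(n, 1))                        # one genetic predictor
    # Three traits all loading on x: a pleiotropic effect (coefficients invented)
    Y = x @ np.array([[0.5, 0.3, 0.4]]) + rng.normal(size=(n, 3))
    pillai, hotelling, wilks = manova_stats(Y, x)
    ```

    Each statistic maps to an approximate F distribution in the usual MANOVA way; the single joint test replaces three separate univariate regressions.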

  3. Normalization of High Dimensional Genomics Data Where the Distribution of the Altered Variables Is Skewed

    PubMed Central

    Landfors, Mattias; Philip, Philge; Rydén, Patrik; Stenberg, Per

    2011-01-01

    Genome-wide analysis of gene expression or protein binding patterns using different array or sequencing based technologies is now routinely performed to compare different populations, such as treatment and reference groups. It is often necessary to normalize the data obtained to remove technical variation introduced in the course of conducting experimental work, but standard normalization techniques are not capable of eliminating technical bias in cases where the distribution of the truly altered variables is skewed, i.e. when a large fraction of the variables are either positively or negatively affected by the treatment. However, several experiments are likely to generate such skewed distributions, including ChIP-chip experiments for the study of chromatin, gene expression experiments for the study of apoptosis, and SNP-studies of copy number variation in normal and tumour tissues. A preliminary study using spike-in array data established that the capacity of an experiment to identify altered variables and generate unbiased estimates of the fold change decreases as the fraction of altered variables and the skewness increases. We propose the following work-flow for analyzing high-dimensional experiments with regions of altered variables: (1) Pre-process raw data using one of the standard normalization techniques. (2) Investigate if the distribution of the altered variables is skewed. (3) If the distribution is not believed to be skewed, no additional normalization is needed. Otherwise, re-normalize the data using a novel HMM-assisted normalization procedure. (4) Perform downstream analysis. Here, ChIP-chip data and simulated data were used to evaluate the performance of the work-flow. It was found that skewed distributions can be detected by using the novel DSE-test (Detection of Skewed Experiments). Furthermore, applying the HMM-assisted normalization to experiments where the distribution of the truly altered variables is skewed results in considerably higher sensitivity and lower bias than can be attained using standard and invariant normalization methods. PMID:22132175

  4. Automatic and objective oral cancer diagnosis by Raman spectroscopic detection of keratin with multivariate curve resolution analysis

    NASA Astrophysics Data System (ADS)

    Chen, Po-Hsiung; Shimada, Rintaro; Yabumoto, Sohshi; Okajima, Hajime; Ando, Masahiro; Chang, Chiou-Tzu; Lee, Li-Tzu; Wong, Yong-Kie; Chiou, Arthur; Hamaguchi, Hiro-O.

    2016-01-01

    We have developed an automatic and objective method for detecting human oral squamous cell carcinoma (OSCC) tissues with Raman microspectroscopy. We measure 196 independent Raman spectra from 196 different points of one oral tissue sample and globally analyze these spectra using a Multivariate Curve Resolution (MCR) analysis. Discrimination of OSCC tissues is made automatically and objectively by spectral matching comparison of the MCR-decomposed Raman spectra against the standard Raman spectrum of keratin, a well-established molecular marker of OSCC. We use a total of 24 tissue samples: 10 OSCC and 10 normal tissues from the same 10 patients, and 3 OSCC and 1 normal tissue from different patients. Following the newly developed protocol presented here, we have been able to detect OSCC tissues with 77 to 92% sensitivity (depending on how positivity is defined) and 100% specificity. The present approach lends itself to a reliable clinical diagnosis of OSCC substantiated by the “molecular fingerprint” of keratin.
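
    A rough sketch of the decompose-then-match idea on synthetic spectra, using scikit-learn's NMF as a stand-in for MCR (both recover nonnegative component spectra and concentration profiles; the "keratin" reference band and all numbers here are invented):

    ```python
    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(10)
    # Two synthetic "pure" spectra: a stand-in "keratin" band and a background band
    axis = np.linspace(0, 1, 150)
    keratin = np.exp(-((axis - 0.3) / 0.05) ** 2)
    background = np.exp(-((axis - 0.7) / 0.1) ** 2)
    # 196 measurement points, each a nonnegative mixture of the two components
    conc = rng.random((196, 2))
    spectra = conc @ np.vstack([keratin, background]) + rng.normal(0, 0.01, (196, 150)).clip(0)

    # NMF decomposition: concentration profiles C and component spectra S
    mcr = NMF(n_components=2, init="nndsvda", max_iter=500)
    C = mcr.fit_transform(spectra)
    S = mcr.components_

    # Spectral matching: cosine similarity of each component to the keratin reference
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    match = max(cos(S[0], keratin), cos(S[1], keratin))
    ```

    A measurement point would then be flagged as keratin-positive when its concentration of the matched component exceeds a threshold; real MCR-ALS adds constraints beyond the nonnegativity used here.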

  5. Classification of Fusarium-Infected Korean Hulled Barley Using Near-Infrared Reflectance Spectroscopy and Partial Least Squares Discriminant Analysis

    PubMed Central

    Lim, Jongguk; Kim, Giyoung; Mo, Changyeun; Oh, Kyoungmin; Yoo, Hyeonchae; Ham, Hyeonheui; Kim, Moon S.

    2017-01-01

    The purpose of this study is to use near-infrared reflectance (NIR) spectroscopy equipment to nondestructively and rapidly discriminate Fusarium-infected hulled barley. Both normal hulled barley and Fusarium-infected hulled barley were scanned by using a NIR spectrometer with a wavelength range of 1175 to 2170 nm. Multiple mathematical pretreatments were applied to the reflectance spectra obtained for Fusarium discrimination and the multivariate analysis method of partial least squares discriminant analysis (PLS-DA) was used for discriminant prediction. The PLS-DA prediction model developed by applying the second-order derivative pretreatment to the reflectance spectra obtained from the side of hulled barley without crease achieved 100% accuracy in discriminating the normal hulled barley and the Fusarium-infected hulled barley. These results demonstrated the feasibility of rapid discrimination of the Fusarium-infected hulled barley by combining multivariate analysis with the NIR spectroscopic technique, which is utilized as a nondestructive detection method. PMID:28974012
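
    The pipeline of second-derivative pretreatment followed by PLS-DA can be sketched as follows (synthetic spectra over the paper's 1175-2170 nm range; the band positions, noise level, and the 0.5 decision threshold are my assumptions):

    ```python
    import numpy as np
    from scipy.signal import savgol_filter
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(2)
    wl = np.linspace(1175, 2170, 200)          # wavelength axis from the paper

    def spectrum(shift):
        # Synthetic reflectance: a Gaussian band whose position shifts with infection
        return np.exp(-((wl - (1600 + shift)) / 80) ** 2) + rng.normal(0, 0.01, wl.size)

    X = np.array([spectrum(0) for _ in range(30)] + [spectrum(25) for _ in range(30)])
    y = np.array([0] * 30 + [1] * 30)          # 0 = normal, 1 = Fusarium-infected

    # Second-derivative pretreatment (Savitzky-Golay), one common implementation
    X_d2 = savgol_filter(X, window_length=11, polyorder=2, deriv=2, axis=1)

    pls = PLSRegression(n_components=3)
    pls.fit(X_d2, y)
    pred = (pls.predict(X_d2).ravel() > 0.5).astype(int)  # PLS-DA: threshold the score
    acc = (pred == y).mean()
    ```

    PLS-DA is just PLS regression on a 0/1 class label with a cutoff applied to the continuous prediction.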

  6. Impact of statistical learning methods on the predictive power of multivariate normal tissue complication probability models.

    PubMed

    Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A; van't Veld, Aart A

    2012-03-15

    To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended. Copyright © 2012 Elsevier Inc. All rights reserved.
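
    A small sketch of LASSO-style variable selection for a binary complication endpoint, via L1-penalized logistic regression in scikit-learn (the candidate predictors, effect sizes, and penalty strength are invented, not the study's xerostomia data):

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    n, p = 200, 10
    X = rng.normal(size=(n, p))   # hypothetical candidate predictors (dose metrics, etc.)
    # Only the first two predictors truly drive complication probability
    logit = 1.5 * X[:, 0] - 1.0 * X[:, 1]
    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    # L1 penalty shrinks irrelevant coefficients exactly to zero, so the fitted
    # model doubles as a variable-selection step (LASSO-style NTCP modeling)
    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    lasso.fit(X, y)
    selected = np.flatnonzero(lasso.coef_.ravel() != 0)
    ```

    The surviving nonzero coefficients give the kind of sparse, directly interpretable model the abstract contrasts with BMA.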

  7. Potential of non-invasive esophagus cancer detection based on urine surface-enhanced Raman spectroscopy

    NASA Astrophysics Data System (ADS)

    Huang, Shaohua; Wang, Lan; Chen, Weisheng; Feng, Shangyuan; Lin, Juqiang; Huang, Zufang; Chen, Guannan; Li, Buhong; Chen, Rong

    2014-11-01

    Non-invasive esophagus cancer detection based on urine surface-enhanced Raman spectroscopy (SERS) analysis was presented. Urine SERS spectra were measured on esophagus cancer patients (n = 56) and healthy volunteers (n = 36) for control analysis. Tentative assignments of the urine SERS spectra indicated some interesting esophagus cancer-specific biomolecular changes, including a decrease in the relative content of urea and an increase in the percentage of uric acid in the urine of esophagus cancer patients compared to that of healthy subjects. Principal component analysis (PCA) combined with linear discriminant analysis (LDA) was employed to analyze and differentiate the SERS spectra between normal and esophagus cancer urine. The diagnostic algorithms utilizing a multivariate analysis method achieved a diagnostic sensitivity of 89.3% and specificity of 83.3% for separating esophagus cancer samples from normal urine samples. These results from the explorative work suggested that silver nano particle-based urine SERS analysis coupled with PCA-LDA multivariate analysis has potential for non-invasive detection of esophagus cancer.
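
    The PCA-LDA diagnostic algorithm can be sketched as a two-step pipeline (synthetic stand-in spectra; the group sizes 56/36 are taken from the abstract, everything else is assumed):

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(4)
    # Toy stand-in for urine SERS spectra: 300-point spectra, cancer vs normal
    n_cancer, n_normal = 56, 36               # group sizes from the paper
    base = np.sin(np.linspace(0, 6, 300))
    X_cancer = base + 0.15 + rng.normal(0, 0.1, (n_cancer, 300))
    X_normal = base - 0.15 + rng.normal(0, 0.1, (n_normal, 300))
    X = np.vstack([X_cancer, X_normal])
    y = np.array([1] * n_cancer + [0] * n_normal)

    # PCA compresses the spectra to a few components; LDA then separates the classes
    model = make_pipeline(PCA(n_components=5), LinearDiscriminantAnalysis())
    model.fit(X, y)
    sensitivity = model.score(X_cancer, np.ones(n_cancer, dtype=int))
    specificity = model.score(X_normal, np.zeros(n_normal, dtype=int))
    ```

    In practice the reported sensitivity and specificity would come from cross-validation rather than the resubstitution estimate shown here.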

  8. Application of Maxent Multivariate Analysis to Define Climate-Change Effects on Species Distributions and Changes

    DTIC Science & Technology

    2014-09-01

    approaches. Ecological Modelling Volume 200, Issues 1–2, 10, pp 1–19. Buhlmann, Kurt A., Thomas S.B. Akre, John B. Iverson, Deno Karapatakis, Russell A. ...statistical multivariate analysis to define the current and projected future range probability for species of interest to Army land managers. A software...15 Figure 4. RCW omission rate and predicted area as a function of the cumulative threshold

  9. Development of multivariate NTCP models for radiation-induced hypothyroidism: a comparative analysis.

    PubMed

    Cella, Laura; Liuzzi, Raffaele; Conson, Manuel; D'Avino, Vittoria; Salvatore, Marco; Pacelli, Roberto

    2012-12-27

    Hypothyroidism is a frequent late side effect of radiation therapy of the cervical region. The purpose of this work is to develop multivariate normal tissue complication probability (NTCP) models for radiation-induced hypothyroidism (RHT) and to compare them with existing NTCP models for RHT. Fifty-three patients treated with sequential chemo-radiotherapy for Hodgkin's lymphoma (HL) were retrospectively reviewed for RHT events. Clinical information along with thyroid gland dose distribution parameters were collected and their correlation to RHT was analyzed by Spearman's rank correlation coefficient (Rs). A multivariate logistic regression method using resampling (bootstrapping) was applied to select the model order and parameters for NTCP modeling. Model performance was evaluated through the area under the receiver operating characteristic curve (AUC). Models were tested against external published data on RHT and compared with other published NTCP models. When the thyroid volume exceeding X Gy was expressed as a percentage (Vx(%)), a two-variable NTCP model including V30(%) and gender proved to be the optimal predictive model for RHT (Rs = 0.615, p < 0.001; AUC = 0.87). Conversely, when the absolute thyroid volume exceeding X Gy (Vx(cc)) was analyzed, an NTCP model based on three variables, V30(cc), thyroid gland volume, and gender, was selected as the most predictive model (Rs = 0.630, p < 0.001; AUC = 0.85). The three-variable model performs better when tested on an external cohort characterized by large inter-individual variation in thyroid volumes (AUC = 0.914, 95% CI 0.760-0.984). A comparable performance was found between our model and that proposed in the literature based on thyroid gland mean dose and volume (p = 0.264). The absolute volume of thyroid gland exceeding 30 Gy, in combination with thyroid gland volume and gender, provides an NTCP model for RHT with improved prediction capability not only within our patient population but also in an external cohort.

  10. Comparative forensic soil analysis of New Jersey state parks using a combination of simple techniques with multivariate statistics.

    PubMed

    Bonetti, Jennifer; Quarino, Lawrence

    2014-05-01

    This study has shown that the combination of simple techniques with the use of multivariate statistics offers the potential for the comparative analysis of soil samples. Five samples were obtained from each of twelve state parks across New Jersey in both the summer and fall seasons. Each sample was examined using particle-size distribution, pH analysis in both water and 1 M CaCl2 , and a loss on ignition technique. Data from each of the techniques were combined, and principal component analysis (PCA) and canonical discriminant analysis (CDA) were used for multivariate data transformation. Samples from different locations could be visually differentiated from one another using these multivariate plots. Hold-one-out cross-validation analysis showed error rates as low as 3.33%. Ten blind study samples were analyzed resulting in no misclassifications using Mahalanobis distance calculations and visual examinations of multivariate plots. Seasonal variation was minimal between corresponding samples, suggesting potential success in forensic applications. © 2014 American Academy of Forensic Sciences.

  11. Impacts of rising health care costs on families with employment-based private insurance: a national analysis with state fixed effects.

    PubMed

    Yu, Hao; Dick, Andrew W

    2012-10-01

    Given the rapid growth of health care costs, some experts were concerned with erosion of employment-based private insurance (EBPI). This empirical analysis aims to quantify the concern. Using the National Health Account, we generated a cost index to represent state-level annual cost growth. We merged it with the 1996-2003 Medical Expenditure Panel Survey. The unit of analysis is the family. We conducted both bivariate and multivariate logistic analyses. The bivariate analysis found a significant inverse association between the cost index and the proportion of families receiving an offer of EBPI. The multivariate analysis showed that the cost index was significantly negatively associated with the likelihood of receiving an EBPI offer for the entire sample and for families in the first, second, and third quartiles of income distribution. The cost index was also significantly negatively associated with the proportion of families with EBPI for the entire year for each family member (EBPI-EYEM). The multivariate analysis confirmed significance of the relationship for the entire sample, and for families in the second and third quartiles of income distribution. Among the families with EBPI-EYEM, there was a positive relationship between the cost index and this group's likelihood of having out-of-pocket expenditures exceeding 10 percent of family income. The multivariate analysis confirmed significance of the relationship for the entire group and for families in the second and third quartiles of income distribution. Rising health costs reduce EBPI availability and enrollment, and the financial protection provided by it, especially for middle-class families. © Health Research and Educational Trust.

  12. Impacts of Rising Health Care Costs on Families with Employment-Based Private Insurance: A National Analysis with State Fixed Effects

    PubMed Central

    Yu, Hao; Dick, Andrew W

    2012-01-01

    Background: Given the rapid growth of health care costs, some experts were concerned with erosion of employment-based private insurance (EBPI). This empirical analysis aims to quantify the concern. Methods: Using the National Health Account, we generated a cost index to represent state-level annual cost growth. We merged it with the 1996–2003 Medical Expenditure Panel Survey. The unit of analysis is the family. We conducted both bivariate and multivariate logistic analyses. Results: The bivariate analysis found a significant inverse association between the cost index and the proportion of families receiving an offer of EBPI. The multivariate analysis showed that the cost index was significantly negatively associated with the likelihood of receiving an EBPI offer for the entire sample and for families in the first, second, and third quartiles of income distribution. The cost index was also significantly negatively associated with the proportion of families with EBPI for the entire year for each family member (EBPI-EYEM). The multivariate analysis confirmed significance of the relationship for the entire sample, and for families in the second and third quartiles of income distribution. Among the families with EBPI-EYEM, there was a positive relationship between the cost index and this group's likelihood of having out-of-pocket expenditures exceeding 10 percent of family income. The multivariate analysis confirmed significance of the relationship for the entire group and for families in the second and third quartiles of income distribution. Conclusions: Rising health costs reduce EBPI availability and enrollment, and the financial protection provided by it, especially for middle-class families. PMID:22417314

  13. A randomised approach for NARX model identification based on a multivariate Bernoulli distribution

    NASA Astrophysics Data System (ADS)

    Bianchi, F.; Falsone, A.; Prandini, M.; Piroddi, L.

    2017-04-01

    The identification of polynomial NARX models is typically performed by incremental model building techniques. These methods assess the importance of each regressor based on the evaluation of partial individual models, which may ultimately lead to erroneous model selections. A more robust assessment of the significance of a specific model term can be obtained by considering ensembles of models, as done by the RaMSS algorithm. In that context, the identification task is formulated in a probabilistic fashion and a Bernoulli distribution is employed to represent the probability that a regressor belongs to the target model. Then, samples of the model distribution are collected to gather reliable information to update it, until convergence to a specific model. The basic RaMSS algorithm employs multiple independent univariate Bernoulli distributions associated with the different candidate model terms, thus overlooking the correlations between different terms, which are typically important in the selection process. Here, a multivariate Bernoulli distribution is employed, in which the sampling of a given term is conditioned by the sampling of the others. The added complexity inherent in considering the regressor correlation properties is more than compensated for by the achievable improvements in terms of accuracy of the model selection process.
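
    One simple way to sample correlated Bernoulli inclusion indicators is through a Gaussian copula, sketched below (my construction for illustration; the paper's multivariate extension of RaMSS conditions the sampling differently, and all probabilities and correlations here are invented):

    ```python
    import numpy as np
    from scipy.stats import norm

    # Correlated Bernoulli draws via a Gaussian copula: marginal probabilities p
    # set per-variable thresholds, and a correlation matrix R couples the draws.
    def correlated_bernoulli(p, R, n_samples, rng):
        L = np.linalg.cholesky(R)
        z = rng.normal(size=(n_samples, len(p))) @ L.T   # rows ~ N(0, R)
        return (z < norm.ppf(p)).astype(int)              # threshold at marginal quantiles

    rng = np.random.default_rng(5)
    p = np.array([0.7, 0.7, 0.3])                 # marginal inclusion probabilities
    R = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])               # regressors 1 and 2 tend to co-occur
    samples = correlated_bernoulli(p, R, 20000, rng)
    marginals = samples.mean(axis=0)
    ```

    The thresholding preserves each marginal inclusion probability exactly while the copula correlation induces joint selection of related terms, which is the effect the multivariate distribution is meant to capture.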

  14. Modeling Multi-Variate Gaussian Distributions and Analysis of Higgs Boson Couplings with the ATLAS Detector

    NASA Astrophysics Data System (ADS)

    Krohn, Olivia; Armbruster, Aaron; Gao, Yongsheng; Atlas Collaboration

    2017-01-01

    Software tools developed for the purpose of modeling CERN LHC pp collision data to aid in its interpretation are presented. Some measurements are not adequately described by a Gaussian distribution; thus an interpretation assuming Gaussian uncertainties will inevitably introduce bias, necessitating analytical tools to recreate and evaluate non-Gaussian features. One example is the measurements of Higgs boson production rates in different decay channels, and the interpretation of these measurements. The ratios of data to Standard Model expectations (μ) for five arbitrary signals were modeled by building five Poisson distributions with mixed signal contributions such that the measured values of μ are correlated. Algorithms were designed to recreate probability distribution functions of μ as multi-variate Gaussians, where the standard deviation (σ) and correlation coefficients (ρ) are parametrized. There was good success with modeling 1-D likelihood contours of μ, and the multi-dimensional distributions were well modeled within 1σ, but the model began to diverge after 2σ due to unmerited assumptions in developing ρ. Future plans to improve the algorithms and develop a user-friendly analysis package will also be discussed. NSF International Research Experiences for Students
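
    The core construction, a multivariate Gaussian assembled from parametrized standard deviations σ and correlation coefficients ρ, can be sketched as follows (all numbers hypothetical):

    ```python
    import numpy as np

    # Build a multivariate Gaussian model of signal-strength measurements mu from
    # per-channel standard deviations sigma and a correlation matrix rho.
    sigma = np.array([0.12, 0.20, 0.15, 0.25, 0.18])   # invented per-channel sigmas
    rho = np.full((5, 5), 0.3) + 0.7 * np.eye(5)       # uniform 0.3 off-diagonal correlation
    cov = np.outer(sigma, sigma) * rho                 # covariance from sigma and rho

    mu_hat = np.ones(5)                                # central values of the five mu's
    rng = np.random.default_rng(11)
    draws = rng.multivariate_normal(mu_hat, cov, size=50000)
    ```

    Sampling from the model (or evaluating its density) then stands in for the full likelihood, which is exactly where the abstract notes the approximation degrades beyond 2σ when ρ is mis-modeled.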

  15. Non-normal Distributions Commonly Used in Health, Education, and Social Sciences: A Systematic Review

    PubMed Central

    Bono, Roser; Blanca, María J.; Arnau, Jaume; Gómez-Benito, Juana

    2017-01-01

    Statistical analysis is crucial for research and the choice of analytical technique should take into account the specific distribution of data. Although the data obtained from health, educational, and social sciences research are often not normally distributed, there are very few studies detailing which distributions are most likely to represent data in these disciplines. The aim of this systematic review was to determine the frequency of appearance of the most common non-normal distributions in the health, educational, and social sciences. The search was carried out in the Web of Science database, from which we retrieved the abstracts of papers published between 2010 and 2015. The selection was made on the basis of the title and the abstract, and was performed independently by two reviewers. The inter-rater reliability for article selection was high (Cohen’s kappa = 0.84), and agreement regarding the type of distribution reached 96.5%. A total of 262 abstracts were included in the final review. The distribution of the response variable was reported in 231 of these abstracts, while in the remaining 31 it was merely stated that the distribution was non-normal. In terms of their frequency of appearance, the most-common non-normal distributions can be ranked in descending order as follows: gamma, negative binomial, multinomial, binomial, lognormal, and exponential. In addition to identifying the distributions most commonly used in empirical studies, these results will help researchers to decide which distributions should be included in simulation studies examining statistical procedures. PMID:28959227

  16. Log-Normal Distribution of Cosmic Voids in Simulations and Mocks

    NASA Astrophysics Data System (ADS)

    Russell, E.; Pycke, J.-R.

    2017-01-01

    Following up on previous studies, we complete here a full analysis of the void size distributions of the Cosmic Void Catalog based on three different simulation and mock catalogs: dark matter (DM), haloes, and galaxies. Based on this analysis, we attempt to answer two questions: Is a three-parameter log-normal distribution a good candidate to satisfy the void size distributions obtained from different types of environments? Is there a direct relation between the shape parameters of the void size distribution and the environmental effects? In an attempt to answer these questions, we find here that all void size distributions of these data samples satisfy the three-parameter log-normal distribution whether the environment is dominated by DM, haloes, or galaxies. In addition, the shape parameters of the three-parameter log-normal void size distribution seem highly affected by environment, particularly existing substructures. Therefore, we show two quantitative relations given by linear equations between the skewness and the maximum tree depth, and between the variance of the void size distribution and the maximum tree depth, directly from the simulated data. In addition to this, we find that the percentage of voids with nonzero central density in the data sets has a critical importance. If the number of voids with nonzero central density reaches ≥3.84% in a simulation/mock sample, then a second population is observed in the void size distributions. This second population emerges as a second peak in the log-normal void size distribution at larger radius.
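
    The three-parameter log-normal family used here is a location shift plus an ordinary log-normal; scipy parametrizes it as (s, loc, scale). A sketch on synthetic "void radii" (all parameter values invented):

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    # Three-parameter log-normal radii: a location shift r0 plus a log-normal part
    # with shape sigma and scale s (hypothetical values, not the catalog's fits)
    sigma, r0, s = 0.5, 2.0, 5.0
    radii = r0 + s * rng.lognormal(mean=0.0, sigma=sigma, size=5000)

    # After subtracting the location parameter, log radii are Normal, so the
    # skewness of log(radii - r0) should vanish while log(radii) stays skewed
    skew_shifted = stats.skew(np.log(radii - r0))
    skew_unshifted = stats.skew(np.log(radii))

    # scipy's lognorm implements exactly this three-parameter family
    shape_hat, loc_hat, scale_hat = stats.lognorm.fit(radii)
    ```

    The fitted (shape, loc, scale) triple is what the abstract relates to environmental quantities such as the maximum tree depth.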

  17. SENSITIVITY OF NORMAL THEORY METHODS TO MODEL MISSPECIFICATION IN THE CALCULATION OF UPPER CONFIDENCE LIMITS ON THE RISK FUNCTION FOR CONTINUOUS RESPONSES. (R825385)

    EPA Science Inventory

    Normal theory procedures for calculating upper confidence limits (UCL) on the risk function for continuous responses work well when the data come from a normal distribution. However, if the data come from an alternative distribution, the application of the normal theory procedure...

  18. Log-normal distribution of the trace element data results from a mixture of stochastic input and deterministic internal dynamics.

    PubMed

    Usuda, Kan; Kono, Koichi; Dote, Tomotaro; Shimizu, Hiroyasu; Tominaga, Mika; Koizumi, Chisato; Nakase, Emiko; Toshina, Yumi; Iwai, Junko; Kawasaki, Takashi; Akashi, Mitsuya

    2002-04-01

    In a previous article, we showed a log-normal distribution of boron and lithium in human urine. This type of distribution is common in both biological and nonbiological applications. It can be observed when the effects of many independent variables are combined, each of which may have any underlying distribution. Although elemental excretion depends on many variables, the one-compartment open model following a first-order process can be used to explain the elimination of elements. The rate of excretion is proportional to the amount of any given element present; that is, the same percentage of an existing element is eliminated per unit time, and the element concentration is represented by a deterministic negative exponential function of time over the elimination time-course. Sampling is of a stochastic nature, so the set of time points in the elimination phase at which samples were obtained is expected to show a Normal distribution. The time variable appears in the exponent, so a concentration histogram is that of an exponential transformation of Normally distributed time. This is why the element concentration shows a log-normal distribution. The distribution is determined not by the element concentration itself, but by the time variable that enters the pharmacokinetic equation.
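
    The argument can be reproduced numerically in a few lines: deterministic first-order elimination evaluated at Normally distributed sampling times yields log-normally distributed concentrations (the rate constant and sampling-time distribution below are hypothetical):

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    # One-compartment, first-order elimination: C(t) = C0 * exp(-k * t)
    C0, k = 100.0, 0.3
    t = rng.normal(loc=8.0, scale=2.0, size=500)   # stochastic sampling times (h)
    C = C0 * np.exp(-k * t)                        # deterministic elimination kinetics

    # log C = log C0 - k * t is linear in the Normal time variable, so the
    # concentrations themselves are log-normally distributed
    _, p_raw = stats.shapiro(C)
    _, p_log = stats.shapiro(np.log(C))
    ```

    Normality is rejected for the raw concentrations but not for their logarithms, exactly as the abstract's mechanism predicts.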

  19. Descriptive Statistics for Modern Test Score Distributions: Skewness, Kurtosis, Discreteness, and Ceiling Effects.

    PubMed

    Ho, Andrew D; Yu, Carol C

    2015-06-01

    Many statistical analyses benefit from the assumption that unconditional or conditional distributions are continuous and normal. More than 50 years ago in this journal, Lord and Cook chronicled departures from normality in educational tests, and Micceri similarly showed that the normality assumption is met rarely in educational and psychological practice. In this article, the authors extend these previous analyses to state-level educational test score distributions that are an increasingly common target of high-stakes analysis and interpretation. Among 504 scale-score and raw-score distributions from state testing programs from recent years, nonnormal distributions are common and are often associated with particular state programs. The authors explain how scaling procedures from item response theory lead to nonnormal distributions as well as unusual patterns of discreteness. The authors recommend that distributional descriptive statistics be calculated routinely to inform model selection for large-scale test score data, and they illustrate consequences of nonnormality using sensitivity studies that compare baseline results to those from normalized score scales.
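
    The distributional descriptive statistics the authors recommend are one-liners with scipy; the sketch below shows how a ceiling effect alone produces marked nonnormality (the score distribution is simulated, not state testing data):

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    # Simulate a test with a ceiling effect: scores capped at the maximum (100)
    raw = rng.normal(loc=90, scale=15, size=2000)
    scores = np.clip(raw, 0, 100)

    skewness = stats.skew(scores)   # the ceiling piles mass at 100 -> negative skew
    kurt = stats.kurtosis(scores)   # excess kurtosis relative to the normal
    ```

    Computing these two numbers routinely, as recommended, flags distributions like this one before a normality-assuming model is applied.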

  20. Effects of Missing Data Methods in SEM under Conditions of Incomplete and Nonnormal Data

    ERIC Educational Resources Information Center

    Li, Jian; Lomax, Richard G.

    2017-01-01

    Using Monte Carlo simulations, this research examined the performance of four missing data methods in SEM under different multivariate distributional conditions. The effects of four independent variables (sample size, missing proportion, distribution shape, and factor loading magnitude) were investigated on six outcome variables: convergence rate,…

  1. Square Root Graphical Models: Multivariate Generalizations of Univariate Exponential Families that Permit Positive Dependencies

    PubMed Central

    Inouye, David I.; Ravikumar, Pradeep; Dhillon, Inderjit S.

    2016-01-01

    We develop Square Root Graphical Models (SQR), a novel class of parametric graphical models that provides multivariate generalizations of univariate exponential family distributions. Previous multivariate graphical models (Yang et al., 2015) did not allow positive dependencies for the exponential and Poisson generalizations. However, in many real-world datasets, variables clearly have positive dependencies. For example, the airport delay time in New York—modeled as an exponential distribution—is positively related to the delay time in Boston. With this motivation, we give an example of our model class derived from the univariate exponential distribution that allows for almost arbitrary positive and negative dependencies with only a mild condition on the parameter matrix—a condition akin to the positive definiteness of the Gaussian covariance matrix. Our Poisson generalization allows for both positive and negative dependencies without any constraints on the parameter values. We also develop parameter estimation methods using node-wise regressions with ℓ1 regularization and likelihood approximation methods using sampling. Finally, we demonstrate our exponential generalization on a synthetic dataset and a real-world dataset of airport delay times. PMID:27563373

  2. WE-H-207A-03: The Universality of the Lognormal Behavior of [F-18]FLT PET SUV Measurements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Scarpelli, M; Eickhoff, J; Perlman, S

    Purpose: Log transforming [F-18]FDG PET standardized uptake values (SUVs) has been shown to lead to normal SUV distributions, which allows utilization of powerful parametric statistical models. This study identified the optimal transformation leading to normally distributed [F-18]FLT PET SUVs from solid tumors and offers an example of how normal distributions permit analysis of non-independent/correlated measurements. Methods: Forty patients with various metastatic diseases underwent up to six FLT PET/CT scans during treatment. Tumors were identified by a nuclear medicine physician and manually segmented. Average uptake was extracted for each patient, giving a global SUVmean (gSUVmean) for each scan. The Shapiro-Wilk test was used to test distribution normality. One-parameter Box-Cox transformations were applied to each of the six gSUVmean distributions, and the optimal transformation was found by selecting the parameter that maximized the Shapiro-Wilk test statistic. The relationship between gSUVmean and a serum biomarker (VEGF) collected at imaging timepoints was determined using a linear mixed effects model (LMEM), which accounted for correlated/non-independent measurements from the same individual. Results: Untransformed gSUVmean distributions were found to be significantly non-normal (p<0.05). The optimal transformation parameter had a value of 0.3 (95%CI: −0.4 to 1.6). Given that the optimal parameter was close to zero (which corresponds to the log transformation), the data were subsequently log transformed. All log-transformed gSUVmean distributions were normally distributed (p>0.10 for all timepoints). Log-transformed data were incorporated into the LMEM. VEGF serum levels significantly correlated with gSUVmean (p<0.001), revealing a log-linear relationship between SUVs and underlying biology. Conclusion: Failure to account for correlated/non-independent measurements can lead to invalid conclusions and motivated the transformation to normally distributed SUVs. The log transformation was found to be close to optimal and sufficient for obtaining normally distributed FLT PET SUVs. These transformations allow utilization of powerful LMEMs when analyzing quantitative imaging metrics.
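
The transformation-selection step described in this record can be sketched with SciPy. The SUV values below are synthetic stand-ins rather than the study's data, and `scipy.stats.boxcox` picks its parameter by maximum likelihood rather than by maximizing the Shapiro-Wilk statistic as in the abstract:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical tumour SUVmean values: log-normally distributed by construction
suv = rng.lognormal(mean=0.8, sigma=1.0, size=40)

# Shapiro-Wilk normality test on the raw values
w_raw, p_raw = stats.shapiro(suv)

# One-parameter Box-Cox; a fitted lambda near 0 corresponds to the log transform
transformed, lam = stats.boxcox(suv)

# Apply the log transform explicitly and re-test normality
w_log, p_log = stats.shapiro(np.log(suv))
```

Because the simulated data are exactly log-normal, the fitted Box-Cox parameter lands near zero, mirroring the paper's conclusion that the log transform is close to optimal.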

  3. Non-Gaussian Distributions Affect Identification of Expression Patterns, Functional Annotation, and Prospective Classification in Human Cancer Genomes

    PubMed Central

    Marko, Nicholas F.; Weil, Robert J.

    2012-01-01

    Introduction Gene expression data is often assumed to be normally-distributed, but this assumption has not been tested rigorously. We investigate the distribution of expression data in human cancer genomes and study the implications of deviations from the normal distribution for translational molecular oncology research. Methods We conducted a central moments analysis of five cancer genomes and performed empiric distribution fitting to examine the true distribution of expression data both on the complete-experiment and on the individual-gene levels. We used a variety of parametric and nonparametric methods to test the effects of deviations from normality on gene calling, functional annotation, and prospective molecular classification using a sixth cancer genome. Results Central moments analyses reveal statistically-significant deviations from normality in all of the analyzed cancer genomes. We observe as much as 37% variability in gene calling, 39% variability in functional annotation, and 30% variability in prospective, molecular tumor subclassification associated with this effect. Conclusions Cancer gene expression profiles are not normally-distributed, either on the complete-experiment or on the individual-gene level. Instead, they exhibit complex, heavy-tailed distributions characterized by statistically-significant skewness and kurtosis. The non-Gaussian distribution of this data affects identification of differentially-expressed genes, functional annotation, and prospective molecular classification. These effects may be reduced in some circumstances, although not completely eliminated, by using nonparametric analytics. This analysis highlights two unreliable assumptions of translational cancer gene expression analysis: that “small” departures from normality in the expression data distributions are analytically-insignificant and that “robust” gene-calling algorithms can fully compensate for these effects. PMID:23118863
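
The per-gene central-moments screen reported here can be sketched with SciPy; the heavy-tailed expression matrix below is simulated (Student-t draws), not cancer-genome data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# 200 "genes" x 30 samples with heavy tails, mimicking non-Gaussian expression
expr = rng.standard_t(df=3, size=(200, 30))

skew = stats.skew(expr, axis=1)
kurt = stats.kurtosis(expr, axis=1)            # excess kurtosis; 0 under normality
_, p_norm = stats.normaltest(expr, axis=1)     # D'Agostino-Pearson omnibus test
frac_nonnormal = (p_norm < 0.05).mean()        # share of genes flagged non-Gaussian
```

A screen like this, run gene by gene, is what reveals the statistically significant skewness and kurtosis the abstract describes.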

  4. A quantitative trait locus mixture model that avoids spurious LOD score peaks.

    PubMed Central

    Feenstra, Bjarke; Skovgaard, Ib M

    2004-01-01

    In standard interval mapping of quantitative trait loci (QTL), the QTL effect is described by a normal mixture model. At any given location in the genome, the evidence of a putative QTL is measured by the likelihood ratio of the mixture model compared to a single normal distribution (the LOD score). This approach can occasionally produce spurious LOD score peaks in regions of low genotype information (e.g., widely spaced markers), especially if the phenotype distribution deviates markedly from a normal distribution. Such peaks are not indicative of a QTL effect; rather, they are caused by the fact that a mixture of normals always produces a better fit than a single normal distribution. In this study, a mixture model for QTL mapping that avoids the problems of such spurious LOD score peaks is presented. PMID:15238544

  5. A quantitative trait locus mixture model that avoids spurious LOD score peaks.

    PubMed

    Feenstra, Bjarke; Skovgaard, Ib M

    2004-06-01

    In standard interval mapping of quantitative trait loci (QTL), the QTL effect is described by a normal mixture model. At any given location in the genome, the evidence of a putative QTL is measured by the likelihood ratio of the mixture model compared to a single normal distribution (the LOD score). This approach can occasionally produce spurious LOD score peaks in regions of low genotype information (e.g., widely spaced markers), especially if the phenotype distribution deviates markedly from a normal distribution. Such peaks are not indicative of a QTL effect; rather, they are caused by the fact that a mixture of normals always produces a better fit than a single normal distribution. In this study, a mixture model for QTL mapping that avoids the problems of such spurious LOD score peaks is presented.
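
The likelihood-ratio comparison behind the LOD score can be sketched as follows. This is the standard two-component normal mixture of interval mapping, fit by EM, not the corrected model the paper proposes; `p1` (the genotype-derived mixing proportion) and the data are illustrative:

```python
import numpy as np
from scipy import stats

def lod_mixture_vs_single(y, p1, n_em=50):
    """LOD score: two-component normal mixture (mixing proportion p1)
    versus a single normal fitted to the phenotypes y."""
    mu1 = y.mean() - y.std() / 2
    mu2 = y.mean() + y.std() / 2
    sd = y.std()
    for _ in range(n_em):                          # EM iterations
        d1 = p1 * stats.norm.pdf(y, mu1, sd)
        d2 = (1 - p1) * stats.norm.pdf(y, mu2, sd)
        w = d1 / (d1 + d2)                         # posterior for component 1
        mu1 = (w * y).sum() / w.sum()
        mu2 = ((1 - w) * y).sum() / (1 - w).sum()
        sd = np.sqrt((w * (y - mu1) ** 2 + (1 - w) * (y - mu2) ** 2).mean())
    ll_mix = np.log(p1 * stats.norm.pdf(y, mu1, sd)
                    + (1 - p1) * stats.norm.pdf(y, mu2, sd)).sum()
    ll_null = stats.norm.logpdf(y, y.mean(), y.std()).sum()
    return (ll_mix - ll_null) / np.log(10)
```

Because a mixture can always fit at least as well as a single normal, this statistic is non-negative even without a QTL, which is exactly the source of the spurious peaks discussed in the abstract.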

  6. Estimating sales and sales market share from sales rank data for consumer appliances

    NASA Astrophysics Data System (ADS)

    Touzani, Samir; Van Buskirk, Robert

    2016-06-01

    Our motivation in this work is to find an adequate probability distribution to fit sales volumes of different appliances. This distribution allows for the translation of sales rank into sales volume. This paper shows that the log-normal distribution, and specifically its truncated version, is well suited for this purpose. We demonstrate that sales proxies derived from a calibrated truncated log-normal distribution function can produce realistic estimates of market average product prices and product attributes. We show that market averages calculated with sales proxies derived from the calibrated, truncated log-normal distribution provide better market average estimates than sales proxies estimated with simpler distribution functions.
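
The rank-to-volume translation can be sketched by inverting a truncated log-normal CDF; every parameter value below is a hypothetical placeholder, not a calibrated value from the paper:

```python
import numpy as np
from scipy import stats

def sales_from_rank(ranks, n_products, mu, sigma, lower, upper):
    """Map sales ranks to sales-volume proxies by inverting a truncated
    log-normal sales distribution. Rank 1 = highest-selling product."""
    dist = stats.lognorm(s=sigma, scale=np.exp(mu))
    lo, hi = dist.cdf(lower), dist.cdf(upper)
    # plotting position of each rank inside the truncated CDF
    q = (n_products - np.asarray(ranks, dtype=float) + 0.5) / n_products
    return dist.ppf(lo + q * (hi - lo))
```

Inverting the fitted distribution this way yields a sales proxy for every ranked product, which can then be used to weight product prices and attributes into market averages.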

  7. Discrimination and prediction of cultivation age and parts of Panax ginseng by Fourier-transform infrared spectroscopy combined with multivariate statistical analysis.

    PubMed

    Lee, Byeong-Ju; Kim, Hye-Youn; Lim, Sa Rang; Huang, Linfang; Choi, Hyung-Kyoon

    2017-01-01

    Panax ginseng C.A. Meyer is a herb used for medicinal purposes, and its discrimination according to cultivation age has been an important and practical issue. This study employed Fourier-transform infrared (FT-IR) spectroscopy with multivariate statistical analysis to obtain a prediction model for discriminating cultivation ages (5 and 6 years) and three different parts (rhizome, tap root, and lateral root) of P. ginseng. The optimal partial-least-squares regression (PLSR) models for discriminating ginseng samples were determined by selecting normalization methods, number of partial-least-squares (PLS) components, and variable influence on projection (VIP) cutoff values. The best prediction model for discriminating 5- and 6-year-old ginseng was developed using tap root, vector normalization applied after the second differentiation, one PLS component, and a VIP cutoff of 1.0 (based on the lowest root-mean-square error of prediction value). In addition, for discriminating among the three parts of P. ginseng, optimized PLSR models were established using data sets obtained from vector normalization, two PLS components, and VIP cutoff values of 1.5 (for 5-year-old ginseng) and 1.3 (for 6-year-old ginseng). To our knowledge, this is the first study to provide a novel strategy for rapidly discriminating the cultivation ages and parts of P. ginseng using FT-IR by selected normalization methods, number of PLS components, and VIP cutoff values.
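
The PLS-component/VIP-cutoff workflow described in this record can be sketched with a minimal NIPALS PLS1 and the standard VIP formula. This is a generic illustration on synthetic data, not the authors' FT-IR pipeline, and the cutoff values are theirs only by analogy:

```python
import numpy as np

def pls1(X, y, n_comp):
    """Minimal NIPALS PLS1: returns weights W, scores T and y-loadings Q."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    W, T, Q = [], [], []
    for _ in range(n_comp):
        w = X.T @ y
        w = w / np.linalg.norm(w)
        t = X @ w
        p = X.T @ t / (t @ t)
        q = (y @ t) / (t @ t)
        X = X - np.outer(t, p)            # deflate X
        y = y - q * t                     # deflate y
        W.append(w); T.append(t); Q.append(q)
    return np.array(W), np.array(T), np.array(Q)

def vip(W, T, Q):
    """Variable influence on projection; VIP above a cutoff flags a variable."""
    ss = Q ** 2 * (T ** 2).sum(axis=1)    # y-variance explained per component
    p = W.shape[1]
    return np.sqrt(p * (ss[:, None] * W ** 2).sum(axis=0) / ss.sum())
```

A useful sanity check: the squared VIP scores always average to 1 (their sum equals the number of predictors), so a cutoff of 1.0 keeps the above-average variables.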

  8. Discrimination and prediction of cultivation age and parts of Panax ginseng by Fourier-transform infrared spectroscopy combined with multivariate statistical analysis

    PubMed Central

    Lim, Sa Rang; Huang, Linfang

    2017-01-01

    Panax ginseng C.A. Meyer is a herb used for medicinal purposes, and its discrimination according to cultivation age has been an important and practical issue. This study employed Fourier-transform infrared (FT-IR) spectroscopy with multivariate statistical analysis to obtain a prediction model for discriminating cultivation ages (5 and 6 years) and three different parts (rhizome, tap root, and lateral root) of P. ginseng. The optimal partial-least-squares regression (PLSR) models for discriminating ginseng samples were determined by selecting normalization methods, number of partial-least-squares (PLS) components, and variable influence on projection (VIP) cutoff values. The best prediction model for discriminating 5- and 6-year-old ginseng was developed using tap root, vector normalization applied after the second differentiation, one PLS component, and a VIP cutoff of 1.0 (based on the lowest root-mean-square error of prediction value). In addition, for discriminating among the three parts of P. ginseng, optimized PLSR models were established using data sets obtained from vector normalization, two PLS components, and VIP cutoff values of 1.5 (for 5-year-old ginseng) and 1.3 (for 6-year-old ginseng). To our knowledge, this is the first study to provide a novel strategy for rapidly discriminating the cultivation ages and parts of P. ginseng using FT-IR by selected normalization methods, number of PLS components, and VIP cutoff values. PMID:29049369

  9. The law of distribution of light beam direction fluctuations in telescopes. [normal density functions

    NASA Technical Reports Server (NTRS)

    Divinskiy, M. L.; Kolchinskiy, I. G.

    1974-01-01

    The distribution of deviations from mean star trail directions was studied on the basis of 105 star trails. It was found that about 93% of the trails yield a distribution in agreement with the normal law. About 4% of the star trails agree with the Charlier distribution.

  10. Root Cause Analysis of Quality Defects Using HPLC-MS Fingerprint Knowledgebase for Batch-to-batch Quality Control of Herbal Drugs.

    PubMed

    Yan, Binjun; Fang, Zhonghua; Shen, Lijuan; Qu, Haibin

    2015-01-01

    The batch-to-batch quality consistency of herbal drugs has always been an important issue. The aim of this study was to propose a methodology for batch-to-batch quality control based on HPLC-MS fingerprints and a process knowledgebase. The extraction process of Compound E-jiao Oral Liquid was taken as a case study. After establishing the HPLC-MS fingerprint analysis method, the fingerprints of the extract solutions produced under normal and abnormal operation conditions were obtained. Multivariate statistical models were built for fault detection, and a discriminant analysis model was built using the probabilistic discriminant partial-least-squares method for fault diagnosis. Based on multivariate statistical analysis, process knowledge was acquired and the cause-effect relationship between process deviations and quality defects was revealed. The quality defects were detected successfully by multivariate statistical control charts, and the types of process deviations were diagnosed correctly by discriminant analysis. This work has demonstrated the benefits of combining HPLC-MS fingerprints, process knowledge, and multivariate analysis for the quality control of herbal drugs. Copyright © 2015 John Wiley & Sons, Ltd.

  11. The association between body mass index and severe biliary infections: a multivariate analysis.

    PubMed

    Stewart, Lygia; Griffiss, J McLeod; Jarvis, Gary A; Way, Lawrence W

    2012-11-01

    Obesity has been associated with worse infectious disease outcomes. It is a risk factor for cholesterol gallstones, but little is known about associations between body mass index (BMI) and biliary infections. We studied this using factors associated with biliary infections. A total of 427 patients with gallstones were studied. Gallstones, bile, and blood (as applicable) were cultured. Illness severity was classified as follows: none (no infection or inflammation), systemic inflammatory response syndrome (fever, leukocytosis), severe (abscess, cholangitis, empyema), or multi-organ dysfunction syndrome (bacteremia, hypotension, organ failure). Associations between BMI and biliary bacteria, bacteremia, gallstone type, and illness severity were examined using bivariate and multivariate analysis. BMI inversely correlated with pigment stones, biliary bacteria, bacteremia, and increased illness severity on bivariate and multivariate analysis. Obesity correlated with less severe biliary infections. BMI inversely correlated with pigment stones and biliary bacteria; multivariate analysis showed an independent correlation between lower BMI and illness severity. Most patients with severe biliary infections had a normal BMI, suggesting that obesity may be protective in biliary infections. This study examined the correlation between BMI and biliary infection severity. Published by Elsevier Inc.

  12. Does tip-of-the-tongue for proper names discriminate amnestic mild cognitive impairment?

    PubMed

    Juncos-Rabadán, Onésimo; Facal, David; Lojo-Seoane, Cristina; Pereiro, Arturo X

    2013-04-01

    Difficulty in retrieving people's names is very common in the early stages of Alzheimer's disease and mild cognitive impairment. Such difficulty is often observed as the tip-of-the-tongue (TOT) phenomenon. The main aim of this study was to explore whether a famous people's naming task that elicited the TOT state can be used to discriminate between amnestic mild cognitive impairment (aMCI) patients and normal controls. Eighty-four patients with aMCI and 106 normal controls aged over 50 years performed a task involving naming 50 famous people shown in pictures. Univariate and multivariate regression analyses were used to study the relationships between aMCI and semantic and phonological measures in the TOT paradigm. Univariate regression analyses revealed that all TOT measures significantly predicted aMCI. Multivariate analysis of all these measures correctly classified 70% of controls (specificity) and 71.6% of aMCI patients (sensitivity), with an AUC (area under the ROC curve) value of 0.74, though only the phonological measure remained significant. This classification value was similar to that obtained with the semantic verbal fluency test. TOTs for proper names may effectively discriminate aMCI patients from normal controls through measures that represent one of the naming processes affected, that is, phonological access.

  13. Dose response explorer: an integrated open-source tool for exploring and modelling radiotherapy dose volume outcome relationships

    NASA Astrophysics Data System (ADS)

    El Naqa, I.; Suneja, G.; Lindsay, P. E.; Hope, A. J.; Alaly, J. R.; Vicic, M.; Bradley, J. D.; Apte, A.; Deasy, J. O.

    2006-11-01

    Radiotherapy treatment outcome models are a complicated function of treatment, clinical and biological factors. Our objective is to provide clinicians and scientists with an accurate, flexible and user-friendly software tool to explore radiotherapy outcomes data and build statistical tumour control or normal tissue complications models. The software tool, called the dose response explorer system (DREES), is based on Matlab, and uses a named-field structure array data type. DREES/Matlab in combination with another open-source tool (CERR) provides an environment for analysing treatment outcomes. DREES provides many radiotherapy outcome modelling features, including (1) fitting of analytical normal tissue complication probability (NTCP) and tumour control probability (TCP) models, (2) combined modelling of multiple dose-volume variables (e.g., mean dose, max dose, etc) and clinical factors (age, gender, stage, etc) using multi-term regression modelling, (3) manual or automated selection of logistic or actuarial model variables using bootstrap statistical resampling, (4) estimation of uncertainty in model parameters, (5) performance assessment of univariate and multivariate analyses using Spearman's rank correlation and chi-square statistics, boxplots, nomograms, Kaplan-Meier survival plots, and receiver operating characteristics curves, and (6) graphical capabilities to visualize NTCP or TCP prediction versus selected variable models using various plots. DREES provides clinical researchers with a tool customized for radiotherapy outcome modelling. DREES is freely distributed. We expect to continue developing DREES based on user feedback.

  14. Finite Element Simulation and Experimental Verification of Internal Stress of Quenched AISI 4140 Cylinders

    NASA Astrophysics Data System (ADS)

    Liu, Yu; Qin, Shengwei; Hao, Qingguo; Chen, Nailu; Zuo, Xunwei; Rong, Yonghua

    2017-03-01

    The study of internal stress in quenched AISI 4140 medium-carbon steel is of practical importance in engineering. In this work, finite element simulation (FES) was employed to predict the distribution of internal stress in quenched AISI 4140 cylinders of two diameters, based on an exponent-modified (Ex-Modified) normalized function. The results indicate that FES based on the proposed Ex-Modified normalized function is more consistent with X-ray diffraction measurements of the stress distribution than FES based on the normalized functions proposed by Abrassart, Desalos, and Leblond, which is attributed to the Ex-Modified normalized function better describing transformation plasticity. The effect of the temperature distribution on phase formation, the origin of the residual stress distribution, and the effect of the transformation plasticity function on the residual stress distribution are further discussed.

  15. Smooth quantile normalization.

    PubMed

    Hicks, Stephanie C; Okrah, Kwame; Paulson, Joseph N; Quackenbush, John; Irizarry, Rafael A; Bravo, Héctor Corrada

    2018-04-01

    Between-sample normalization is a critical step in genomic data analysis to remove systematic bias and unwanted technical variation in high-throughput data. Global normalization methods are based on the assumption that observed variability in global properties is due to technical reasons and are unrelated to the biology of interest. For example, some methods correct for differences in sequencing read counts by scaling features to have similar median values across samples, but these fail to reduce other forms of unwanted technical variation. Methods such as quantile normalization transform the statistical distributions across samples to be the same and assume global differences in the distribution are induced by only technical variation. However, it remains unclear how to proceed with normalization if these assumptions are violated, for example, if there are global differences in the statistical distributions between biological conditions or groups, and external information, such as negative or control features, is not available. Here, we introduce a generalization of quantile normalization, referred to as smooth quantile normalization (qsmooth), which is based on the assumption that the statistical distribution of each sample should be the same (or have the same distributional shape) within biological groups or conditions, but allowing that they may differ between groups. We illustrate the advantages of our method on several high-throughput datasets with global differences in distributions corresponding to different biological conditions. We also perform a Monte Carlo simulation study to illustrate the bias-variance tradeoff and root mean squared error of qsmooth compared to other global normalization methods. A software implementation is available from https://github.com/stephaniehicks/qsmooth.
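
For contrast with qsmooth, classical quantile normalization, the method whose global assumption qsmooth relaxes, can be sketched in a few lines (a generic implementation, not the qsmooth package itself):

```python
import numpy as np

def quantile_normalize(X):
    """Classical quantile normalization (rows = features, columns = samples):
    force every sample to share one distribution, namely the across-sample
    mean of the sorted values. Ties are broken arbitrarily by argsort."""
    ranks = X.argsort(axis=0).argsort(axis=0)   # per-sample ranks, 0..n-1
    ref = np.sort(X, axis=0).mean(axis=1)       # mean reference distribution
    return ref[ranks]
```

After this step every column has identical sorted values, which is precisely the behavior qsmooth softens when global differences are driven by biology rather than technical variation.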

  16. Multi-Sample Cluster Analysis Using Akaike’s Information Criterion.

    DTIC Science & Technology

    1982-12-20

    Intervals. For more details on these test procedures refer to Gabriel [7], Krishnaiah ([10], [11]), Srivastava [16], and others. As noted in Consul... [4] Consul, P. C. (1969), "The Exact Distributions of Likelihood Criteria for Different Hypotheses," in P. R. Krishnaiah (Ed.), Multivariate... [7] Gabriel, K. R. (1969), "A Comparison of Some Methods of Simultaneous Inference in MANOVA," in P. R. Krishnaiah (Ed.), Multivariate Analysis-II

  17. A hybrid clustering approach for multivariate time series - A case study applied to failure analysis in a gas turbine.

    PubMed

    Fontes, Cristiano Hora; Budman, Hector

    2017-11-01

    A clustering problem involving multivariate time series (MTS) requires the selection of similarity metrics. This paper shows the limitations of the PCA similarity factor (SPCA) as a single metric in nonlinear problems where there are differences in magnitude of the same process variables due to expected changes in operation conditions. A novel method for clustering MTS based on a combination between SPCA and the average-based Euclidean distance (AED) within a fuzzy clustering approach is proposed. Case studies involving either simulated or real industrial data collected from a large scale gas turbine are used to illustrate that the hybrid approach enhances the ability to recognize normal and fault operating patterns. This paper also proposes an oversampling procedure to create synthetic multivariate time series that can be useful in commonly occurring situations involving unbalanced data sets. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
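
The PCA similarity factor whose limitation this paper addresses can be sketched as Krzanowski's subspace measure; the usage below also demonstrates its scale-invariance, which is exactly why the authors pair it with an average-based Euclidean distance:

```python
import numpy as np

def pca_similarity(X1, X2, k=2):
    """S_PCA between two multivariate time series: average squared cosine
    between their leading k principal directions (a value in [0, 1])."""
    def leading(X, k):
        X = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        return Vt[:k].T                    # p x k orthonormal basis
    L1, L2 = leading(X1, k), leading(X2, k)
    return np.trace(L1.T @ L2 @ L2.T @ L1) / k
```

Because S_PCA compares only the orientation of principal subspaces, rescaling a series leaves it unchanged, so two operating regimes that differ only in magnitude look identical to it.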

  18. Optimal moment determination in POME-copula based hydrometeorological dependence modelling

    NASA Astrophysics Data System (ADS)

    Liu, Dengfeng; Wang, Dong; Singh, Vijay P.; Wang, Yuankun; Wu, Jichun; Wang, Lachun; Zou, Xinqing; Chen, Yuanfang; Chen, Xi

    2017-07-01

    Copula has been commonly applied in multivariate modelling in various fields where marginal distribution inference is a key element. To develop a flexible, unbiased mathematical inference framework in hydrometeorological multivariate applications, the principle of maximum entropy (POME) is being increasingly coupled with copula. However, in previous POME-based studies, determination of optimal moment constraints has generally not been considered. The main contribution of this study is the determination of optimal moments for POME for developing a coupled optimal moment-POME-copula framework to model hydrometeorological multivariate events. In this framework, margins (marginals, or marginal distributions) are derived with the use of POME, subject to optimal moment constraints. Then, various candidate copulas are constructed according to the derived margins, and finally the most probable one is determined, based on goodness-of-fit statistics. This optimal moment-POME-copula framework is applied to model the dependence patterns of three types of hydrometeorological events: (i) single-site streamflow-water level; (ii) multi-site streamflow; and (iii) multi-site precipitation, with data collected from Yichang and Hankou in the Yangtze River basin, China. Results indicate that the optimal-moment POME is more accurate in margin fitting and the corresponding copulas reflect a good statistical performance in correlation simulation. Also, the derived copulas, capturing more patterns which traditional correlation coefficients cannot reflect, provide an efficient way in other applied scenarios concerning hydrometeorological multivariate modelling.
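
The margin-plus-copula construction described here can be sketched with a Gaussian copula. In the paper the margins are derived from POME under optimal moment constraints; empirical CDFs stand in for them below, and all data are synthetic:

```python
import numpy as np
from scipy import stats

def fit_gaussian_copula(X):
    """Map each margin to uniform via its empirical CDF, then to standard
    normal, and estimate the dependence (correlation) in Normal space."""
    n, d = X.shape
    U = stats.rankdata(X, axis=0) / (n + 1)     # pseudo-observations in (0, 1)
    Z = stats.norm.ppf(U)
    return np.corrcoef(Z, rowvar=False)

def sample_gaussian_copula(R, margins, size, rng):
    """Draw dependent samples: correlated normals -> uniforms -> margin ppf."""
    L = np.linalg.cholesky(R)
    Z = rng.standard_normal((size, R.shape[0])) @ L.T
    U = stats.norm.cdf(Z)
    return np.column_stack([m.ppf(U[:, j]) for j, m in enumerate(margins)])
```

Swapping the empirical CDFs for POME-derived margins (and the Gaussian copula for another candidate family) recovers the structure of the framework in the abstract.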

  19. A Semi-parametric Transformation Frailty Model for Semi-competing Risks Survival Data

    PubMed Central

    Jiang, Fei; Haneuse, Sebastien

    2016-01-01

    In the analysis of semi-competing risks data interest lies in estimation and inference with respect to a so-called non-terminal event, the observation of which is subject to a terminal event. Multi-state models are commonly used to analyse such data, with covariate effects on the transition/intensity functions typically specified via the Cox model and dependence between the non-terminal and terminal events specified, in part, by a unit-specific shared frailty term. To ensure identifiability, the frailties are typically assumed to arise from a parametric distribution, specifically a Gamma distribution with mean 1.0 and variance, say, σ2. When the frailty distribution is misspecified, however, the resulting estimator is not guaranteed to be consistent, with the extent of asymptotic bias depending on the discrepancy between the assumed and true frailty distributions. In this paper, we propose a novel class of transformation models for semi-competing risks analysis that permit the non-parametric specification of the frailty distribution. To ensure identifiability, the class restricts to parametric specifications of the transformation and the error distribution; the latter are flexible, however, and cover a broad range of possible specifications. We also derive the semi-parametric efficient score under the complete data setting and propose a non-parametric score imputation method to handle right censoring; consistency and asymptotic normality of the resulting estimators is derived and small-sample operating characteristics evaluated via simulation. Although the proposed semi-parametric transformation model and non-parametric score imputation method are motivated by the analysis of semi-competing risks data, they are broadly applicable to any analysis of multivariate time-to-event outcomes in which a unit-specific shared frailty is used to account for correlation. 
Finally, the proposed model and estimation procedures are applied to a study of hospital readmission among patients diagnosed with pancreatic cancer. PMID:28439147

  20. Continuous Sub-daily Rainfall Simulation for Regional Flood Risk Assessment - Modelling of Spatio-temporal Correlation Structure of Extreme Precipitation in the Austrian Alps

    NASA Astrophysics Data System (ADS)

    Salinas, J. L.; Nester, T.; Komma, J.; Bloeschl, G.

    2017-12-01

    Generation of realistic synthetic spatial rainfall is of pivotal importance for assessing regional hydroclimatic hazard, as the input for long-term rainfall-runoff simulations. Correct reproduction of observed rainfall characteristics, such as regional intensity-duration-frequency curves and spatial and temporal correlations, is necessary to adequately model the magnitude and frequency of flood peaks, by reproducing antecedent soil moisture conditions before extreme rainfall events and the joint probability of flood waves at confluences. This work presents a modification of the model of Bardossy and Plate (1992), in which precipitation is first modeled on a station basis as a multivariate autoregressive model (mAr) in a Normal space. The spatial and temporal correlation structures are imposed in the Normal space, allowing a different temporal autocorrelation parameter for each station while ensuring the positive-definiteness of the correlation matrix of the mAr errors. The Normal rainfall is then transformed to a Gamma-distributed space, with parameters varying monthly according to a sinusoidal function, in order to reproduce the observed rainfall seasonality. One of the main differences from the original model is the simulation time step, reduced from 24 h to 6 h. Because daily rainfall data are more widely available than sub-daily (e.g., hourly) data, the parameters of the Gamma distributions are calibrated to reproduce simultaneously a series of daily rainfall characteristics (mean daily rainfall, standard deviation of daily rainfall, and 24-h intensity-duration-frequency [IDF] curves), as well as other aggregated rainfall measures (mean annual rainfall and monthly rainfall). The spatial and temporal correlation parameters are calibrated so that the catchment-averaged IDF curves aggregated at different temporal scales fit the measured ones.
    The rainfall model is used to generate 10,000 years of synthetic precipitation, fed into a rainfall-runoff model to derive the flood frequency in the Tirolean Alps in Austria. Given the number of generated events, the simulation framework generates a large variety of rainfall patterns and reproduces the variograms of relevant extreme rainfall events in the region of interest.
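
The Normal-space autoregression with a Gamma transform described above can be sketched as follows; the correlation matrix, autocorrelation parameters, and Gamma parameters are illustrative placeholders, and the monthly sinusoidal parameter variation is omitted:

```python
import numpy as np
from scipy import stats

def simulate_rainfall(n_steps, R, phi, shape, scale, rng):
    """Lag-1 multivariate autoregression in Normal space (spatial error
    correlation R, station-wise autocorrelation phi), followed by a
    probability-integral transform to station-wise Gamma margins."""
    n_sta = R.shape[0]
    L = np.linalg.cholesky(R)
    z = np.zeros(n_sta)
    out = np.empty((n_steps, n_sta))
    for t in range(n_steps):
        eps = rng.standard_normal(n_sta) @ L.T       # spatially correlated noise
        z = phi * z + np.sqrt(1 - phi ** 2) * eps    # stationary AR(1) per station
        u = stats.norm.cdf(z)
        out[t] = stats.gamma.ppf(u, a=shape, scale=scale)
    return out
```

The per-station `phi` mirrors the paper's station-specific temporal autocorrelation, while `R` carries the spatial dependence imposed in the Normal space.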

  1. Considerations in cross-validation type density smoothing with a look at some data

    NASA Technical Reports Server (NTRS)

    Schuster, E. F.

    1982-01-01

    Experience gained in applying nonparametric maximum likelihood techniques of density estimation to judge the comparative quality of various estimators is reported. Two univariate data sets of one hundred samples each (one Cauchy, one normal) are considered, as well as studies in the multivariate case.

  2. Simultaneous Inference Procedures for Means.

    ERIC Educational Resources Information Center

    Krishnaiah, P. R.

    Some aspects of simultaneous tests for means are reviewed. Specifically, the comparison of univariate or multivariate normal populations based on the values of the means or mean vectors when the variances or covariance matrices are equal is discussed. Tukey's and Dunnett's tests for multiple comparisons of means, Scheffe's method of examining…

  3. Disfluency in Spasmodic Dysphonia: A Multivariate Analysis.

    ERIC Educational Resources Information Center

    Cannito, Michael P.; Burch, Annette Renee; Watts, Christopher; Rappold, Patrick W.; Hood, Stephen B.; Sherrard, Kyla

    1997-01-01

    This study examined visual analog scaling judgments of disfluency by normal listeners in response to oral reading by 20 adults with spasmodic dysphonia (SD) and nondysphonic controls. Findings suggest that although dysfluency is not a defining feature of SD, it does contribute significantly to the overall clinical impression of severity of the…

  4. LOG-NORMAL DISTRIBUTION OF COSMIC VOIDS IN SIMULATIONS AND MOCKS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Russell, E.; Pycke, J.-R., E-mail: er111@nyu.edu, E-mail: jrp15@nyu.edu

    2017-01-20

    Following up on previous studies, we complete here a full analysis of the void size distributions of the Cosmic Void Catalog based on three different simulation and mock catalogs: dark matter (DM), haloes, and galaxies. Based on this analysis, we attempt to answer two questions: Is a three-parameter log-normal distribution a good candidate to satisfy the void size distributions obtained from different types of environments? Is there a direct relation between the shape parameters of the void size distribution and the environmental effects? In an attempt to answer these questions, we find here that all void size distributions of these data samples satisfy the three-parameter log-normal distribution whether the environment is dominated by DM, haloes, or galaxies. In addition, the shape parameters of the three-parameter log-normal void size distribution seem highly affected by environment, particularly existing substructures. Therefore, we show two quantitative relations given by linear equations between the skewness and the maximum tree depth, and between the variance of the void size distribution and the maximum tree depth, directly from the simulated data. In addition to this, we find that the percentage of voids with nonzero central density in the data sets is of critical importance. If the number of voids with nonzero central density reaches ≥3.84% in a simulation/mock sample, then a second population is observed in the void size distributions. This second population emerges as a second peak in the log-normal void size distribution at larger radius.
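
The three-parameter log-normal fit described in this record can be sketched with `scipy.stats.lognorm`, whose `loc` argument supplies the third (shift) parameter; the void radii below are simulated placeholders, not catalog values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical void effective radii: shifted log-normal with shape 0.5,
# location (shift) 2.0 and scale 8.0 -- illustrative values only
radii = stats.lognorm.rvs(0.5, loc=2.0, scale=8.0, size=2000, random_state=rng)

# Three-parameter maximum-likelihood fit (shape, loc, scale)
shape, loc, scale = stats.lognorm.fit(radii)

# Sample skewness: the quantity the paper relates linearly to maximum tree depth
skewness = stats.skew(radii)
```

The fitted `loc` must sit below the smallest radius for the support to contain the data, which is a quick sanity check on any shifted log-normal fit.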

  5. Application of multivariate Gaussian detection theory to known non-Gaussian probability density functions

    NASA Astrophysics Data System (ADS)

    Schwartz, Craig R.; Thelen, Brian J.; Kenton, Arthur C.

    1995-06-01

    A statistical parametric multispectral sensor performance model was developed by ERIM to support mine field detection studies, multispectral sensor design/performance trade-off studies, and target detection algorithm development. The model assumes target detection algorithms and their performance models which are based on data assumed to obey multivariate Gaussian probability distribution functions (PDFs). The applicability of these algorithms and performance models can be generalized to data having non-Gaussian PDFs through the use of transforms which convert non-Gaussian data to Gaussian (or near-Gaussian) data. An example of one such transform is the Box-Cox power law transform. In practice, such a transform can be applied to non-Gaussian data prior to the introduction of a detection algorithm that is formally based on the assumption of multivariate Gaussian data. This paper presents an extension of these techniques to the case where the joint multivariate probability density function of the non-Gaussian input data is known, and where the joint estimate of the multivariate Gaussian statistics, under the Box-Cox transform, is desired. The jointly estimated multivariate Gaussian statistics can then be used to predict the performance of a target detection algorithm which has an associated Gaussian performance model.
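
As a sketch of the Box-Cox step mentioned above (not ERIM's actual processing chain), SciPy's `boxcox` chooses the power-law exponent by maximum likelihood and substantially Gaussianizes skewed, positive-valued data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical skewed, positive "sensor" data (gamma-distributed, non-Gaussian).
x = rng.gamma(shape=2.0, scale=3.0, size=10000)

# Box-Cox power-law transform; lambda is chosen by maximum likelihood.
y, lam = stats.boxcox(x)

# The transformed sample should have skewness near zero.
print(stats.skew(x), stats.skew(y), lam)
```

A Gaussian-based detector can then be applied to the transformed data, as the abstract describes.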

  6. Caspase-3 activity, response to chemotherapy and clinical outcome in patients with colon cancer.

    PubMed

    de Oca, Javier; Azuara, Daniel; Sanchez-Santos, Raquel; Navarro, Matilde; Capella, Gabriel; Moreno, Victor; Sola, Anna; Hotter, Georgina; Biondo, Sebastiano; Osorio, Alfonso; Martí-Ragué, Joan; Rafecas, Antoni

    2008-01-01

    The prognostic value of the degree of apoptosis in colorectal cancer is controversial. This study evaluates the putative clinical usefulness of measuring caspase-3 activity as a prognostic factor in colonic cancer patients receiving 5-fluorouracil adjuvant chemotherapy. We evaluated caspase-3-like protease activity in tumours and in normal colon tissue. Specimens were studied from 54 patients. These patients had either stage III cancer (Dukes stage C) or high-risk stage II cancer (Dukes stage B2 with invasion of adjacent organs, lymphatic or vascular infiltration or carcinoembryonic antigen [CEA] >5). Median follow-up was 73 months. Univariate analysis was first performed to explore the relation of different variables (age, sex, preoperative CEA, tumour size, Dukes stage, vascular invasion, lymphatic invasion, caspase-3 activity in tumour and caspase-3 activity in normal mucosa) to tumour recurrence after chemotherapy treatment. Subsequently, a multivariate Cox regression model was fitted. Median values of caspase-3 activity in tumours were more than twice those in normal mucosa (88.1 vs 40.6 U, p=0.001), showing a statistically significant correlation (r=0.34). Significant prognostic factors of recurrence in multivariate analysis were: male sex (odds ratio, OR=3.53 [1.13-10.90], p=0.02), age (OR=1.09 [1.01-1.18], p=0.03), Dukes stage (OR=1.93 [1.01-3.70]), caspase-3 activity in normal mucosa (OR=1.02 [1.01-1.04], p=0.017) and caspase-3 activity in tumour (OR=1.02 [1.01-1.03], p=0.013). Low caspase-3 activity in the normal mucosa and tumour are independent prognostic factors of tumour recurrence in patients receiving adjuvant 5-fluorouracil-based treatment in colon cancer, correlating with poor disease-free survival and a higher recurrence rate.

  7. Time-independent models of asset returns revisited

    NASA Astrophysics Data System (ADS)

    Gillemot, L.; Töyli, J.; Kertesz, J.; Kaski, K.

    2000-07-01

    In this study we investigate various well-known time-independent models of asset returns, namely the simple normal distribution, Student t-distribution, Lévy, truncated Lévy, general stable distribution, mixed diffusion jump, and compound normal distribution. For this we use Standard and Poor's 500 index data of the New York Stock Exchange, Helsinki Stock Exchange index data describing a small volatile market, and artificial data. The results indicate that all models except the simple normal distribution are at least quite reasonable descriptions of the data. Furthermore, the use of differences instead of logarithmic returns tends to make the data look more Lévy-like than it actually is. This phenomenon is especially evident in the artificial data, which were generated by an inflated random walk process.
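
The kind of model comparison run in this study can be sketched with a likelihood-based criterion. The snippet below is illustrative only, using synthetic heavy-tailed data rather than the S&P 500 or Helsinki series; it shows the Student t beating the simple normal on AIC:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
# Hypothetical heavy-tailed "returns" (Student t, 4 d.o.f.), standing in for index data.
r = stats.t.rvs(df=4, size=20000, random_state=rng)

def aic(dist, data, n_params):
    """Fit by maximum likelihood and return the Akaike information criterion."""
    params = dist.fit(data)
    return 2 * n_params - 2 * dist.logpdf(data, *params).sum()

aic_normal = aic(stats.norm, r, 2)    # loc, scale
aic_student = aic(stats.t, r, 3)      # df, loc, scale
print(aic_normal, aic_student)        # heavy tails favor the t model
```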

  8. [Multivariate geostatistics and GIS-based approach to study the spatial distribution and sources of heavy metals in agricultural soil in the Pearl River Delta, China].

    PubMed

    Cai, Li-mei; Ma, Jin; Zhou, Yong-zhang; Huang, Lan-chun; Dou, Lei; Zhang, Cheng-bo; Fu, Shan-ming

    2008-12-01

    One hundred and eighteen surface soil samples were collected from Dongguan City and analyzed for concentrations of Cu, Zn, Ni, Cr, Pb, Cd, As and Hg, as well as pH and OM. The spatial distribution and sources of soil heavy metals were studied using multivariate geostatistical methods and GIS techniques. The results indicated that concentrations of Cu, Zn, Ni, Pb, Cd and Hg were above the soil background content of Guangdong province; concentrations of Pb, Cd and Hg in particular greatly exceeded it. Factor analysis grouped Cu, Zn, Ni, Cr and As into Factor 1, Pb and Hg into Factor 2, and Cd into Factor 3. The spatial maps based on geostatistical analysis showed a definite association of Factor 1 with the soil parent material, whereas Factor 2 was mainly affected by industry. The spatial distribution of Factor 3 was attributed to anthropogenic influence.

  9. Constructing inverse probability weights for continuous exposures: a comparison of methods.

    PubMed

    Naimi, Ashley I; Moodie, Erica E M; Auger, Nathalie; Kaufman, Jay S

    2014-03-01

    Inverse probability-weighted marginal structural models with binary exposures are common in epidemiology. Constructing inverse probability weights for a continuous exposure can be complicated by the presence of outliers, and the need to identify a parametric form for the exposure and account for nonconstant exposure variance. We explored the performance of various methods to construct inverse probability weights for continuous exposures using Monte Carlo simulation. We generated two continuous exposures and binary outcomes using data sampled from a large empirical cohort. The first exposure followed a normal distribution with homoscedastic variance. The second exposure followed a contaminated Poisson distribution, with heteroscedastic variance equal to the conditional mean. We assessed six methods to construct inverse probability weights using: a normal distribution, a normal distribution with heteroscedastic variance, a truncated normal distribution with heteroscedastic variance, a gamma distribution, a t distribution (1, 3, and 5 degrees of freedom), and a quantile binning approach (based on 10, 15, and 20 exposure categories). We estimated the marginal odds ratio for a single-unit increase in each simulated exposure in a regression model weighted by the inverse probability weights constructed using each approach, and then computed the bias and mean squared error for each method. For the homoscedastic exposure, the standard normal, gamma, and quantile binning approaches performed best. For the heteroscedastic exposure, the quantile binning, gamma, and heteroscedastic normal approaches performed best. Our results suggest that the quantile binning approach is a simple and versatile way to construct inverse probability weights for continuous exposures.
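
A minimal sketch of stabilized inverse probability weights for a continuous exposure, assuming (as one of the six methods above does) a normal conditional model. The variable names and data-generating process here are hypothetical, not the paper's cohort:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 5000
L = rng.normal(size=n)                       # confounder
A = 0.5 * L + rng.normal(size=n)             # continuous exposure

# Denominator: normal density of A given L, from a linear model fit.
beta = np.polyfit(L, A, 1)
fitted = np.polyval(beta, L)
resid = A - fitted
dens_cond = stats.norm.pdf(A, loc=fitted, scale=resid.std(ddof=2))

# Numerator (stabilizing): marginal normal density of A.
dens_marg = stats.norm.pdf(A, loc=A.mean(), scale=A.std(ddof=1))

sw = dens_marg / dens_cond                   # stabilized inverse probability weights
print(sw.mean())                             # should be close to 1
```

These weights would then enter a weighted outcome regression, as in the marginal structural models the abstract describes.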

  10. Generating an Empirical Probability Distribution for the Andrews-Pregibon Statistic.

    ERIC Educational Resources Information Center

    Jarrell, Michele G.

    A probability distribution was developed for the Andrews-Pregibon (AP) statistic. The statistic, developed by D. F. Andrews and D. Pregibon (1978), identifies multivariate outliers. It is a ratio of the determinant of the data matrix with an observation deleted to the determinant of the entire data matrix. Although the AP statistic has been used…

  11. A general approach to double-moment normalization of drop size distributions

    NASA Astrophysics Data System (ADS)

    Lee, G. W.; Sempere-Torres, D.; Uijlenhoet, R.; Zawadzki, I.

    2003-04-01

    Normalization of drop size distributions (DSDs) is re-examined here. First, we present an extension of the scaling normalization that uses one moment of the DSD as a parameter (as introduced by Sempere-Torres et al., 1994) to a scaling normalization that uses two moments as parameters of the normalization. It is shown that the normalization of Testud et al. (2001) is a particular case of the two-moment scaling normalization. Thus, a unified vision of the question of DSD normalization and a good model representation of DSDs are given. Data analysis shows that, from the point of view of moment estimation, least-squares regression is slightly more effective than moment estimation from the normalized average DSD.

  12. A Bayesian Nonparametric Meta-Analysis Model

    ERIC Educational Resources Information Center

    Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G.

    2015-01-01

    In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall…

  13. Elemental analysis of tissue pellets for the differentiation of epidermal lesion and normal skin by laser-induced breakdown spectroscopy

    PubMed Central

    Moon, Youngmin; Han, Jung Hyun; Shin, Sungho; Kim, Yong-Chul; Jeong, Sungho

    2016-01-01

    By laser-induced breakdown spectroscopy (LIBS) analysis of epidermal lesion and dermis tissue pellets of hairless mouse, it is shown that the Ca intensity in the epidermal lesion is higher than that in dermis, whereas the Na and K intensities show the opposite tendency. It is demonstrated that epidermal lesion and normal dermis can be differentiated with high selectivity by either univariate or multivariate analysis of LIBS spectra, with an intensity ratio differing by a factor of 8 or a classification accuracy above 0.995, respectively. PMID:27231610

  14. Type I error rates of rare single nucleotide variants are inflated in tests of association with non-normally distributed traits using simple linear regression methods.

    PubMed

    Schwantes-An, Tae-Hwi; Sung, Heejong; Sabourin, Jeremy A; Justice, Cristina M; Sorant, Alexa J M; Wilson, Alexander F

    2016-01-01

    In this study, the effects of (a) the minor allele frequency of the single nucleotide variant (SNV), (b) the degree of departure from normality of the trait, and (c) the position of the SNVs on type I error rates were investigated in the Genetic Analysis Workshop (GAW) 19 whole exome sequence data. To test the distribution of the type I error rate, 5 simulated traits were considered: standard normal and gamma distributed traits; 2 transformed versions of the gamma trait (log10 and rank-based inverse normal transformations); and trait Q1 provided by GAW 19. Each trait was tested with 313,340 SNVs. Tests of association were performed with simple linear regression and average type I error rates were determined for minor allele frequency classes. Rare SNVs (minor allele frequency < 0.05) showed inflated type I error rates for non-normally distributed traits that increased as the minor allele frequency decreased. The inflation of average type I error rates increased as the significance threshold decreased. Normally distributed traits did not show inflated type I error rates with respect to the minor allele frequency for rare SNVs. There was no consistent effect of transformation on the uniformity of the distribution of the location of SNVs with a type I error.

  15. Alternatives for using multivariate regression to adjust prospective payment rates

    PubMed Central

    Sheingold, Steven H.

    1990-01-01

    Multivariate regression analysis has been used in structuring three of the adjustments to Medicare's prospective payment rates. Because the indirect-teaching adjustment, the disproportionate-share adjustment, and the adjustment for large cities are responsible for distributing approximately $3 billion in payments each year, the specification of regression models for these adjustments is of critical importance. In this article, the application of regression for adjusting Medicare's prospective rates is discussed, and the implications that differing specifications could have for these adjustments are demonstrated. PMID:10113271

  16. Computation of distribution of minimum resolution for log-normal distribution of chromatographic peak heights.

    PubMed

    Davis, Joe M

    2011-10-28

    General equations are derived for the distribution of minimum resolution between two chromatographic peaks, when peak heights in a multi-component chromatogram follow a continuous statistical distribution. The derivation draws on published theory by relating the area under the distribution of minimum resolution to the area under the distribution of the ratio of peak heights, which in turn is derived from the peak-height distribution. Two procedures are proposed for the equations' numerical solution. The procedures are applied to the log-normal distribution, which recently was reported to describe the distribution of component concentrations in three complex natural mixtures. For published statistical parameters of these mixtures, the distribution of minimum resolution is similar to that for the commonly assumed exponential distribution of peak heights used in statistical-overlap theory. However, these two distributions of minimum resolution can differ markedly, depending on the scale parameter of the log-normal distribution. Theory for the computation of the distribution of minimum resolution is extended to other cases of interest. With the log-normal distribution of peak heights as an example, the distribution of minimum resolution is computed when small peaks are lost due to noise or detection limits, and when the height of at least one peak is less than an upper limit. The distribution of minimum resolution shifts slightly to lower resolution values in the first case and to markedly larger resolution values in the second one. The theory and numerical procedure are confirmed by Monte Carlo simulation. Copyright © 2011 Elsevier B.V. All rights reserved.
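
The link the derivation relies on, that the distribution of the ratio of peak heights follows from the peak-height distribution, is easy to check by simulation: for log-normal heights the ratio is again log-normal, with log-scale parameter multiplied by sqrt(2). An illustrative sketch with an arbitrary scale parameter:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sigma = 0.8                                   # log-normal scale parameter of peak heights
h1 = stats.lognorm.rvs(s=sigma, size=200000, random_state=rng)
h2 = stats.lognorm.rvs(s=sigma, size=200000, random_state=rng)

# The ratio of two i.i.d. log-normal heights is log-normal with
# log-standard deviation sigma * sqrt(2).
ratio = h1 / h2
log_ratio_sd = np.log(ratio).std()
print(log_ratio_sd, sigma * np.sqrt(2))
```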

  17. Interpreting the concordance statistic of a logistic regression model: relation to the variance and odds ratio of a continuous explanatory variable.

    PubMed

    Austin, Peter C; Steyerberg, Ewout W

    2012-06-20

    When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in the combined sample of those with and without the condition. Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, and uniform in the entire sample of those with and without the condition. The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population.
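
The equal-variance binormal case can be verified empirically: the c-statistic equals the standard normal CDF of the standardized mean difference divided by sqrt(2), and the empirical c-statistic is the Mann-Whitney estimate of P(X1 > X0). A sketch with arbitrary simulated parameters:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
mu0, mu1, sd = 0.0, 1.0, 1.0
x0 = rng.normal(mu0, sd, 20000)   # explanatory variable, condition absent
x1 = rng.normal(mu1, sd, 20000)   # explanatory variable, condition present

# Closed form under binormality with equal variances.
c_theory = stats.norm.cdf((mu1 - mu0) / (sd * np.sqrt(2)))

# Empirical c-statistic = P(x1 > x0), i.e. Mann-Whitney U / (n0 * n1).
u = stats.mannwhitneyu(x1, x0, alternative='greater').statistic
c_emp = u / (len(x0) * len(x1))
print(c_theory, c_emp)
```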

  18. Application of a truncated normal failure distribution in reliability testing

    NASA Technical Reports Server (NTRS)

    Groves, C., Jr.

    1968-01-01

    The statistical truncated normal distribution function is applied as a time-to-failure distribution function in equipment reliability estimations. Age-dependent characteristics of the truncated function provide a basis for formulating a system of high-reliability testing that effectively merges statistical, engineering, and cost considerations.
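
A time-to-failure distribution truncated at zero, as used here, can be sketched with SciPy's `truncnorm`. The mean and spread below are hypothetical values for illustration, not figures from the report:

```python
import numpy as np
from scipy import stats

# Hypothetical time-to-failure model: normal(mean=1000 h, sd=300 h),
# truncated at t = 0 since negative lifetimes are impossible.
mu, sd = 1000.0, 300.0
a, b = (0.0 - mu) / sd, np.inf        # scipy expects bounds in standard units
ttf = stats.truncnorm(a, b, loc=mu, scale=sd)

# Reliability (survival probability) at 1200 h.
R = ttf.sf(1200.0)
print(R)
```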

  19. The transmembrane gradient of the dielectric constant influences the DPH lifetime distribution.

    PubMed

    Konopásek, I; Kvasnicka, P; Amler, E; Kotyk, A; Curatola, G

    1995-11-06

    The fluorescence lifetime distribution of 1,6-diphenyl-1,3,5-hexatriene (DPH) and 1-[4-(trimethylamino)phenyl]-6-phenyl-1,3,5-hexatriene (TMA-DPH) in egg-phosphatidylcholine liposomes was measured in normal and heavy water. The lower dielectric constant (by approximately 12%) of heavy water compared with normal water was employed to provide direct evidence that the drop of the dielectric constant along the membrane normal shifts the centers of the distributions of both DPH and TMA-DPH to higher values and sharpens the widths of the distributions. The profile of the dielectric constant along the membrane normal was not found to be a linear gradient (in contrast to [1]) but a more complex function. The presence of cholesterol in liposomes further shifted the centers of the distributions to higher values and sharpened them. In addition, it resulted in a more gradient-like profile of the dielectric constant (i.e. linearization) along the normal of the membrane. The effect of the change of dielectric constant on the membrane proteins is discussed.

  20. Noninvasive Characterization of Indeterminate Pulmonary Nodules Detected on Chest High-Resolution Computed Tomography

    DTIC Science & Technology

    2016-10-01

    the nodule. The discriminability of benign and malignant nodules was analyzed using a t-test and the normal distribution of the individual metric value...22 Surround Distribution: distribution of the 7 parenchymal exemplars (normal, honeycomb, reticular, ground glass, mild low attenuation area)...the distribution of honeycomb, reticular and ground glass surrounding the nodule. 0.001

  1. 29 CFR 4044.73 - Lump sums and other alternative forms of distribution in lieu of annuities.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... distribution is the present value of the normal form of benefit provided by the plan payable at normal... 29 Labor 9 2010-07-01 2010-07-01 false Lump sums and other alternative forms of distribution in... Benefits and Assets Non-Trusteed Plans § 4044.73 Lump sums and other alternative forms of distribution in...

  2. Detection and Parameter Estimation of Chirped Radar Signals.

    DTIC Science & Technology

    2000-01-10

    Wigner-Ville distribution (WVD): The WVD belongs to Cohen's class of energy distributions...length. 28 6. Pseudo Wigner-Ville distribution (PWVD): The PWVD introduces a time window to the WVD definition, thereby reducing the interferences...Frequency normalized to sampling frequency 26 Figure V.2: Wigner-Ville distribution; time normalized to the pulse length 28 Figure V.3:

  3. A cutoff value based on analysis of a reference population decreases overestimation of the prevalence of nocturnal polyuria.

    PubMed

    van Haarst, Ernst P; Bosch, J L H Ruud

    2012-09-01

    We sought criteria for nocturnal polyuria in asymptomatic, nonurological adults of all ages by reporting reference values of the ratio of daytime and nighttime urine volumes, and finding nocturia predictors. Data from a database of frequency-volume charts from a reference population of 894 nonurological, asymptomatic volunteers of all age groups were analyzed. The nocturnal polyuria index and the nocturia index were calculated and factors influencing these values were determined by multivariate analysis. The nocturnal polyuria index had wide variation but a normal distribution with a mean ± SD of 30% ± 12%. The 95th percentile of the values was 53%. Above this cutoff a patient had nocturnal polyuria. This value contrasts with the International Continence Society definition of 33% but agrees with several other reports. On multivariate regression analysis with the nocturnal polyuria index as the dependent variable, sleeping time, maximum voided volume and age were the covariates. However, the increase in the nocturnal polyuria index by age was small. Excluding polyuria and nocturia from the analysis did not alter the results in a relevant way. The nocturnal voiding frequency depended on sleeping time and maximum voided volume but most of all on the nocturia index. The prevalence of nocturnal polyuria is overestimated. We suggest a new cutoff value for the nocturnal polyuria index, that is, nocturnal polyuria exists when the nocturnal polyuria index exceeds 53%. The nocturia index is the best predictor of nocturia. Copyright © 2012 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.

  4. Generalization of the normal-exponential model: exploration of a more accurate parametrisation for the signal distribution on Illumina BeadArrays.

    PubMed

    Plancade, Sandra; Rozenholc, Yves; Lund, Eiliv

    2012-12-11

    Illumina BeadArray technology includes nonspecific negative control features that allow a precise estimation of the background noise. As an alternative to the background subtraction proposed in BeadStudio, which leads to an important loss of information by generating negative values, a background correction method modeling the observed intensities as the sum of an exponentially distributed signal and normally distributed noise has been developed. Nevertheless, Wang and Ye (2012) display a kernel-based estimator of the signal distribution on Illumina BeadArrays and suggest that a gamma distribution would represent a better modeling of the signal density. Hence, the normal-exponential modeling may not be appropriate for Illumina data, and background corrections derived from this model may lead to wrong estimation. We propose a more flexible modeling based on a gamma-distributed signal and normally distributed background noise, and develop the associated background correction, implemented in the R package NormalGamma. Our model proves to be markedly more accurate for Illumina BeadArrays: on the one hand, it is shown on two types of Illumina BeadChips that this model offers a more correct fit of the observed intensities. On the other hand, the comparison of the operating characteristics of several background correction procedures on spike-in and on normal-gamma simulated data shows high similarities, reinforcing the validation of the normal-gamma modeling. The performance of the background corrections based on the normal-gamma and normal-exponential models is compared on two dilution data sets, through testing procedures representing various experimental designs. Surprisingly, we observe that the implementation of a more accurate parametrisation in the model-based background correction does not increase the sensitivity. These results may be explained by the operating characteristics of the estimators: the normal-gamma background correction offers an improvement in terms of bias, but at the cost of a loss in precision. This paper addresses the lack of fit of the usual normal-exponential model by proposing a more flexible parametrisation of the signal distribution as well as the associated background correction. This new model proves to be considerably more accurate for Illumina microarrays, but the improvement in terms of modeling does not lead to a higher sensitivity in differential analysis. Nevertheless, this realistic modeling makes way for future investigations, in particular to examine the characteristics of pre-processing strategies.
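
For intuition about the normal-exponential model this paper generalizes: under that model the corrected intensity is the posterior mean of the signal given the observation, which is strictly positive, unlike naive background subtraction. The sketch below uses assumed parameters (alpha, mu, sigma are illustrative) and the standard truncated-normal posterior-mean formula, not the NormalGamma package's own code:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 20000
alpha, mu, sigma = 100.0, 50.0, 10.0       # assumed, illustrative parameters
signal = rng.exponential(alpha, n)         # exponentially distributed signal
noise = rng.normal(mu, sigma, n)           # normally distributed background noise
x = signal + noise                         # observed probe intensities

# Naive background subtraction can go negative.
naive = x - mu

# Model-based correction: signal | observation is a normal truncated at 0
# with mean m and sd sigma, so its posterior mean is always positive.
m = x - mu - sigma**2 / alpha
corrected = m + sigma * stats.norm.pdf(m / sigma) / stats.norm.cdf(m / sigma)
print((naive < 0).any(), (corrected > 0).all())
```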

  5. A New Approach in Generating Meteorological Forecasts for Ensemble Streamflow Forecasting using Multivariate Functions

    NASA Astrophysics Data System (ADS)

    Khajehei, S.; Madadgar, S.; Moradkhani, H.

    2014-12-01

    The reliability and accuracy of hydrological predictions are subject to various sources of uncertainty, including meteorological forcing, initial conditions, model parameters and model structure. To reduce the total uncertainty in hydrological applications, one approach is to reduce the uncertainty in the meteorological forcing by using statistical methods based on conditional probability density functions (pdf). However, one of the requirements of current methods is to assume a Gaussian distribution for the marginal distributions of the observed and modeled meteorology. Here we propose a Bayesian approach based on copula functions to develop the conditional distribution of the precipitation forecast needed to drive a hydrologic model for a sub-basin in the Columbia River Basin. Copula functions are introduced as an alternative approach for capturing the uncertainties related to meteorological forcing. Copulas are multivariate joint distributions built from univariate marginal distributions, capable of modeling the joint behavior of variables with any level of correlation and dependency. The method is applied to the monthly CPC forecast with 0.25x0.25 degree resolution to reproduce the PRISM dataset over 1970-2000. Results are compared with the Ensemble Pre-Processor approach, a common procedure used by National Weather Service River Forecast Centers, in reproducing the observed climatology during a ten-year verification period (2000-2010).
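
The core copula mechanics, joining arbitrary marginals through a dependence structure, can be sketched with a Gaussian copula. The gamma marginals and the correlation value here are placeholders, not the CPC/PRISM setup:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
rho = 0.7
cov = [[1.0, rho], [rho, 1.0]]

# Gaussian copula: correlated normals -> correlated uniforms -> arbitrary marginals.
z = rng.multivariate_normal([0.0, 0.0], cov, size=100000)
u = stats.norm.cdf(z)                        # uniform marginals, still correlated
obs = stats.gamma.ppf(u[:, 0], a=2.0)        # e.g. an "observed" precipitation margin
fcst = stats.gamma.ppf(u[:, 1], a=3.0)       # e.g. a "forecast" precipitation margin

# Rank (Spearman) correlation is preserved by the monotone marginal transforms.
r, _ = stats.spearmanr(obs, fcst)
print(r)
```

The conditional distribution of one margin given the other, which the abstract uses for forecast post-processing, follows from the same construction.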

  6. Self-tuning multivariable pole placement control of a multizone crystal growth furnace

    NASA Technical Reports Server (NTRS)

    Batur, C.; Sharpless, R. B.; Duval, W. M. B.; Rosenthal, B. N.

    1992-01-01

    This paper presents the design and implementation of a multivariable self-tuning temperature controller for the control of lead bromide crystal growth. The crystal grows inside a multizone transparent furnace. There are eight interacting heating zones shaping the axial temperature distribution inside the furnace. A multi-input, multi-output furnace model is identified on-line by a recursive least squares estimation algorithm. A multivariable pole placement controller based on this model is derived and implemented. Comparison between single-input, single-output and multi-input, multi-output self-tuning controllers demonstrates that zone-to-zone interactions can be minimized better by a multi-input, multi-output controller design. This directly affects the quality of the grown crystal.

  7. Logistic Approximation to the Normal: The KL Rationale

    ERIC Educational Resources Information Center

    Savalei, Victoria

    2006-01-01

    A rationale is proposed for approximating the normal distribution with a logistic distribution using a scaling constant based on minimizing the Kullback-Leibler (KL) information, that is, the expected amount of information available in a sample to distinguish between two competing distributions using a likelihood ratio (LR) test, assuming one of…
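
The scaling constant can be found numerically by minimizing the KL divergence from the standard normal to a scaled logistic; a sketch (the bracketing interval is arbitrary, and this is a generic numerical reconstruction, not the paper's derivation):

```python
import numpy as np
from scipy import stats, integrate, optimize

def kl(k):
    """KL divergence from N(0,1) to a logistic with scale 1/k."""
    f = lambda x: stats.norm.pdf(x) * (
        stats.norm.logpdf(x) - stats.logistic.logpdf(x, scale=1.0 / k))
    return integrate.quad(f, -10, 10)[0]

# Search for the KL-optimal scaling constant on an arbitrary bracket.
res = optimize.minimize_scalar(kl, bounds=(1.0, 2.5), method='bounded')
print(res.x)
```

The commonly cited constant 1.702 instead minimizes the maximum absolute CDF difference; the KL criterion yields a nearby but distinct value.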

  8. Inference for the Bivariate and Multivariate Hidden Truncated Pareto(type II) and Pareto(type IV) Distribution and Some Measures of Divergence Related to Incompatibility of Probability Distribution

    ERIC Educational Resources Information Center

    Ghosh, Indranil

    2011-01-01

    Consider a discrete bivariate random variable (X, Y) with possible values x[subscript 1], x[subscript 2],..., x[subscript I] for X and y[subscript 1], y[subscript 2],..., y[subscript J] for Y. Further suppose that the corresponding families of conditional distributions, for X given values of Y and of Y for given values of X are available. We…

  9. Bivariate normal, conditional and rectangular probabilities: A computer program with applications

    NASA Technical Reports Server (NTRS)

    Swaroop, R.; Brownlow, J. D.; Ashworth, G. R.; Winter, W. R.

    1980-01-01

    Some results for the bivariate normal distribution analysis are presented. Computer programs for conditional normal probabilities, marginal probabilities, as well as joint probabilities for rectangular regions are given; routines for computing fractile points and distribution functions are also presented. Some examples from a closed circuit television experiment are included.
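
The rectangular-region computation reduces to inclusion-exclusion on the joint CDF; a sketch with an assumed correlation of 0.5 (not values from the experiment):

```python
import numpy as np
from scipy.stats import multivariate_normal

mean = [0.0, 0.0]
cov = [[1.0, 0.5], [0.5, 1.0]]           # assumed correlation 0.5
mvn = multivariate_normal(mean, cov)

# Rectangular probability P(a1 < X < b1, a2 < Y < b2) by inclusion-exclusion.
a, b = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
p = (mvn.cdf(b) - mvn.cdf([a[0], b[1]])
     - mvn.cdf([b[0], a[1]]) + mvn.cdf(a))
print(p)
```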

  10. Descriptive Statistics for Modern Test Score Distributions: Skewness, Kurtosis, Discreteness, and Ceiling Effects

    ERIC Educational Resources Information Center

    Ho, Andrew D.; Yu, Carol C.

    2015-01-01

    Many statistical analyses benefit from the assumption that unconditional or conditional distributions are continuous and normal. More than 50 years ago in this journal, Lord and Cook chronicled departures from normality in educational tests, and Micceri similarly showed that the normality assumption is met rarely in educational and psychological…

  11. The comparison of proportional hazards and accelerated failure time models in analyzing the first birth interval survival data

    NASA Astrophysics Data System (ADS)

    Faruk, Alfensi

    2018-03-01

    Survival analysis is a branch of statistics which focuses on the analysis of time-to-event data. In multivariate survival analysis, the proportional hazards (PH) model is the most popular for analyzing the effects of several covariates on the survival time. However, the assumption of constant hazards in the PH model is not always satisfied by the data. Violation of the PH assumption leads to misinterpretation of the estimation results and decreases the power of the related statistical tests. On the other hand, the accelerated failure time (AFT) models do not assume constant hazards in the survival data as the PH model does. The AFT models, moreover, can be used as an alternative to the PH model if the constant hazards assumption is violated. The objective of this research was to compare the performance of the PH model and the AFT models in analyzing the significant factors affecting the first birth interval (FBI) data in Indonesia. In this work, the discussion was limited to three AFT models based on the Weibull, exponential, and log-normal distributions. The analysis using a graphical approach and a statistical test showed that non-proportional hazards exist in the FBI data set. Based on the Akaike information criterion (AIC), the log-normal AFT model was the most appropriate among the considered models. Results of the best fitted model (log-normal AFT model) showed that covariates such as women’s educational level, husband’s educational level, contraceptive knowledge, access to mass media, wealth index, and employment status were among the factors affecting the FBI in Indonesia.

  12. Stream biogeochemical resilience in the age of Anthropocene

    NASA Astrophysics Data System (ADS)

    Dong, H.; Creed, I. F.

    2017-12-01

    Recent evidence indicates that biogeochemical cycles are being pushed beyond the tolerance limits of the earth system in the age of the Anthropocene, placing terrestrial and aquatic ecosystems at risk. Here, we explored the question: Is there empirical evidence of global atmospheric changes driving losses in stream biogeochemical resilience towards a new normal? Stream biogeochemical resilience is the capacity to return to equilibrium conditions after a disturbance and can be measured using three metrics: reactivity (the highest initial response after a disturbance), return rate (the rate of return to equilibrium after reactive changes), and variance of the stationary distribution (the signal-to-noise ratio). Multivariate autoregressive models were used to derive the three metrics for streams along a disturbance gradient, from natural systems where global drivers would dominate to relatively managed or modified systems where global and local drivers would interact. We observed a loss of biogeochemical resilience in all streams. The key biogeochemical constituents that may be driving this loss were identified from the time series of stream biogeochemical constituents. Non-stationary trends (detected by Mann-Kendall analysis) and stationary cycles (revealed through Morlet wavelet analysis) were removed, and the standard deviation (SD) of the remaining residuals was analyzed to determine whether an increase in SD over time indicated a pending shift towards a new normal. We observed that nitrate-N and total phosphorus showed behaviours indicative of a pending shift in natural and managed forest systems, but not in agricultural systems. This study provides empirical support that stream ecosystems are showing signs of exceeding planetary boundary tolerance levels and shifting towards a "new normal" in response to global changes, which can be exacerbated by local management activities. Future work will consider the potential for cascading effects on downstream systems.
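The Mann-Kendall step described above, detecting a monotonic non-stationary trend in a constituent series, is equivalent to testing Kendall's tau between the series and time. A minimal sketch on synthetic data, not the authors' stream chemistry:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 120  # e.g. ten years of monthly samples (hypothetical)
t = np.arange(n)
# Synthetic constituent series with an injected monotonic trend.
concentration = 0.05 * t + rng.normal(0.0, 1.0, n)

def mann_kendall(series):
    """Mann-Kendall trend test, computed as Kendall's tau against time."""
    tau, p_value = stats.kendalltau(np.arange(len(series)), series)
    return tau, p_value

tau, p = mann_kendall(concentration)
print(tau > 0, p < 0.05)  # → True True (an increasing trend is detected)
```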

  13. Relationship of Echocardiographic Z Scores Adjusted for Body Surface Area to Age, Sex, Race, and Ethnicity: The Pediatric Heart Network Normal Echocardiogram Database.

    PubMed

    Lopez, Leo; Colan, Steven; Stylianou, Mario; Granger, Suzanne; Trachtenberg, Felicia; Frommelt, Peter; Pearson, Gail; Camarda, Joseph; Cnota, James; Cohen, Meryl; Dragulescu, Andreea; Frommelt, Michele; Garuba, Olukayode; Johnson, Tiffanie; Lai, Wyman; Mahgerefteh, Joseph; Pignatelli, Ricardo; Prakash, Ashwin; Sachdeva, Ritu; Soriano, Brian; Soslow, Jonathan; Spurney, Christopher; Srivastava, Shubhika; Taylor, Carolyn; Thankavel, Poonam; van der Velde, Mary; Minich, LuAnn

    2017-11-01

    Published nomograms of pediatric echocardiographic measurements are limited by insufficient sample sizes to assess the effects of age, sex, race, and ethnicity. Variable methodologies have resulted in a wide range of Z scores for a single measurement. This multicenter study sought to determine Z scores for common measurements adjusted for body surface area (BSA) and stratified by age, sex, race, and ethnicity. Data collected from healthy nonobese children ≤18 years of age with a normal echocardiogram at 19 centers included age, sex, race, ethnicity, height, weight, echocardiographic images, and measurements performed at the Core Laboratory. Z score models involved indexed parameters (X/BSA^α) that were normally distributed without residual dependence on BSA. The models were tested for the effects of age, sex, race, and ethnicity. Raw measurements from models with and without these effects were compared, and a <5% difference was considered clinically insignificant because interobserver variability for echocardiographic measurements is reported as ≥5%. Of the 3566 subjects, 90% had measurable images. Appropriate BSA transformations (BSA^α) were selected for each measurement. Multivariable regression revealed statistically significant effects of age, sex, race, and ethnicity for all outcomes, but all effects were clinically insignificant based on comparisons of models with and without them, resulting in Z scores independent of age, sex, race, and ethnicity for each measurement. Echocardiographic Z scores based on BSA were derived from a large, diverse, and healthy North American population. Age, sex, race, and ethnicity have small effects on the Z scores that are statistically significant but not clinically important. © 2017 American Heart Association, Inc.
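The Z-score construction described here, indexing a measurement by BSA^α and then standardizing, can be sketched as below; the α exponent and reference mean/SD are hypothetical stand-ins for the population models derived in the study:

```python
def bsa_z_score(measurement, bsa, alpha, ref_mean, ref_sd):
    """Z score for an echo measurement indexed as X / BSA**alpha.

    alpha, ref_mean and ref_sd would come from the study's fitted models;
    the values used below are invented for illustration only.
    """
    indexed = measurement / bsa ** alpha
    return (indexed - ref_mean) / ref_sd

# Hypothetical measurement (cm) for a child with BSA 0.95 m^2.
z = bsa_z_score(measurement=1.8, bsa=0.95, alpha=0.5, ref_mean=1.9, ref_sd=0.15)
print(round(z, 2))  # → -0.35
```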

  14. [Rank distributions in community ecology from the statistical viewpoint].

    PubMed

    Maksimov, V N

    2004-01-01

    Traditional statistical methods for defining empirical distribution functions of species abundance (population, biomass, production, etc.) in a community are applicable to processing the multivariate data contained in such quantitative indices of communities. In particular, evaluating the moments of a distribution suffices to summarize the data contained in a list of species and their abundances. At the same time, species should be ranked in the list in ascending rather than descending order of population, and the distribution models should be analyzed on the basis of data on abundant species only.

  15. Faà di Bruno's formula and the distributions of random partitions in population genetics and physics.

    PubMed

    Hoppe, Fred M

    2008-06-01

    We show that the formula of Faà di Bruno for the derivative of a composite function gives, in special cases, the sampling distributions in population genetics that are due to Ewens and to Pitman. The composite function is the same in each case. Other sampling distributions also arise in this way, such as those arising from Dirichlet, multivariate hypergeometric, and multinomial models, special cases of which correspond to Bose-Einstein, Fermi-Dirac, and Maxwell-Boltzmann distributions in physics. Connections are made to compound sampling models.
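The Ewens sampling distribution mentioned above has a closed form that is easy to compute directly. A small sketch, with the partition encoded as {part size: multiplicity} and θ the mutation parameter:

```python
from math import factorial, prod

def ewens_probability(partition, theta):
    """Ewens sampling formula.

    `partition` maps part size j to its multiplicity a_j (a_j allele types
    seen in exactly j of the n sampled genes); theta is the mutation
    parameter.
    """
    n = sum(j * a for j, a in partition.items())
    rising = prod(theta + i for i in range(n))  # rising factorial theta^(n)
    weight = prod((theta / j) ** a / factorial(a) for j, a in partition.items())
    return factorial(n) / rising * weight

# Sanity check: for n = 2 the only partitions are two singletons or one pair,
# so their probabilities must sum to one.
p_singletons = ewens_probability({1: 2}, theta=1.5)
p_pair = ewens_probability({2: 1}, theta=1.5)
print(round(p_singletons, 3), round(p_pair, 3))  # → 0.6 0.4
```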

  16. PI-RADS version 2: Preoperative role in the detection of normal-sized pelvic lymph node metastasis in prostate cancer.

    PubMed

    Park, Sung Yoon; Shin, Su-Jin; Jung, Dae Chul; Cho, Nam Hoon; Choi, Young Deuk; Rha, Koon Ho; Hong, Sung Joon; Oh, Young Taik

    2017-06-01

    To analyze whether Prostate Imaging Reporting and Data System version 2 (PI-RADSv2) scores are associated with a risk of normal-sized pelvic lymph node metastasis (PLNM) in prostate cancer (PCa). A consecutive series of 221 patients who underwent magnetic resonance imaging and radical prostatectomy with pelvic lymph node dissection (PLND) for PCa was retrospectively analyzed with institutional review board approval. No patients had enlarged (≥0.8 cm in short-axis diameter) lymph nodes. Clinical parameters [prostate-specific antigen (PSA), greatest percentage of biopsy core, and percentage of positive cores] and the PI-RADSv2 scores from two independent readers were analyzed with multivariate logistic regression and receiver operating characteristic curves for PLNM. The diagnostic performance of PI-RADSv2 and the Briganti nomogram was compared, and the weighted kappa for PI-RADSv2 scoring was investigated. Normal-sized PLNM was found in 9.5% (21/221) of patients. In multivariate analysis, PI-RADSv2 (reader 1, p=0.009; reader 2, p=0.026) and PSA (reader 1, p=0.008; reader 2, p=0.037) were predictive of normal-sized PLNM. The threshold for PI-RADSv2 was a score of 5, at which PI-RADSv2 was associated with high sensitivity (reader 1, 95.2% [20/21]; reader 2, 90.5% [19/21]) and negative predictive value (reader 1, 99.2% [124/125]; reader 2, 98.6% [136/138]). However, the diagnostic performance of PI-RADSv2 (AUC=0.786-0.788) was significantly lower than that of the Briganti nomogram (AUC=0.890) for normal-sized PLNM (p<0.05). Inter-reader agreement on dichotomized PI-RADSv2 (score 5 vs. <5) was excellent (weighted kappa=0.804). PI-RADSv2 scores may be associated with the risk of normal-sized PLNM in PCa. Copyright © 2017. Published by Elsevier B.V.
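The reported sensitivity and negative predictive value follow directly from the quoted counts; a two-line check using reader 1's figures from the abstract:

```python
def sensitivity(true_pos, false_neg):
    return true_pos / (true_pos + false_neg)

def negative_predictive_value(true_neg, false_neg):
    return true_neg / (true_neg + false_neg)

# Reader 1: 20 of 21 metastases scored PI-RADSv2 = 5; 124 of the 125
# patients scored below the threshold were truly free of metastasis.
print(round(100 * sensitivity(20, 1), 1))                 # → 95.2
print(round(100 * negative_predictive_value(124, 1), 1))  # → 99.2
```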

  17. Parallel Planes Information Visualization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bush, Brian

    2015-12-26

    This software presents a user-provided multivariate dataset as an interactive three-dimensional visualization so that the user can explore the correlations between variables in the observations and the distribution of observations among the variables.

  18. Rural retention of doctors graduating from the rural medical education project to increase rural doctors in Thailand: a cohort study.

    PubMed

    Pagaiya, Nonglak; Kongkam, Lalitaya; Sriratana, Sanya

    2015-03-01

    In Thailand, the inequitable distribution of doctors between rural and urban areas has a major impact on access to care for those living in rural communities. The rural medical education programme 'Collaborative Project to Increase Rural Doctors (CPIRD)' was implemented in 1994 with the aim of attracting and retaining rural doctors. This study examined the impact of CPIRD on doctor retention in rural areas and in the public health service. Baseline data consisting of age, sex and date of entry to the Ministry of Health (MoH) service were collected from 7,157 doctors graduating between 2000 and 2007: 1,093 graduates from the CPIRD track and 6,064 who graduated through normal channels. Follow-up data, consisting of workplace, number of years spent in rural districts and years within the MoH service, were retrieved from June 2000 to July 2011. The Kaplan-Meier method of survival analysis and Cox proportional hazards ratios were used to interpret the data. Female subjects slightly outnumbered their male counterparts. Almost half of the normal track (48%) and 33% of the CPIRD doctors eventually left the MoH. The retention rate at rural hospitals was 29% for the CPIRD doctors compared to 18% for those from the normal track. Survival curves indicated a dramatic drop after 3 years in service for both groups, but the normal track decreased at a faster rate. Multivariate Cox proportional hazards modelling revealed that the normal track doctors had a significantly higher risk of leaving rural areas, about 1.3 times that of the CPIRD doctors. The predicted median survival time in rural hospitals was 4.2 years for the CPIRD group and 3.4 years for the normal track. The normal track doctors also had a significantly higher risk of leaving public service, about 1.5 times that of the CPIRD doctors. The project evaluation results showed a positive impact in that CPIRD doctors were more likely to stay longer in rural areas and in public service than their counterparts. However, turnover has been increasing in recent years for both groups. There is a need for the MoH to review and improve upon the project implementation.
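The Kaplan-Meier estimator used in the study can be sketched in a few lines: the product-limit survival probability drops at each event time by the fraction of at-risk doctors who leave. The toy data below are hypothetical, not the cohort's:

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit survival estimate.

    events[i] = 1 if subject i left at times[i] (years), 0 if censored.
    Returns (event_time, survival_probability) pairs.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    surv, curve = 1.0, []
    for t in np.unique(times[events == 1]):
        at_risk = np.sum(times >= t)
        leavers = np.sum((times == t) & (events == 1))
        surv *= 1.0 - leavers / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical years in rural service for eight doctors.
years = [1, 2, 3, 3, 4, 5, 6, 7]
left = [1, 0, 1, 1, 0, 1, 0, 1]  # 1 = left, 0 = still serving (censored)
for t, s in kaplan_meier(years, left):
    print(t, round(s, 3))
```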

  19. Fast-NPS-A Markov Chain Monte Carlo-based analysis tool to obtain structural information from single-molecule FRET measurements

    NASA Astrophysics Data System (ADS)

    Eilert, Tobias; Beckers, Maximilian; Drechsler, Florian; Michaelis, Jens

    2017-10-01

    The analysis tool and software package Fast-NPS can be used to analyse smFRET data to obtain quantitative structural information about macromolecules in their natural environment. In the algorithm, a Bayesian model gives rise to a multivariate probability distribution describing the uncertainty of the structure determination. Since Fast-NPS aims to be an easy-to-use general-purpose analysis tool for a large variety of smFRET networks, we established an MCMC-based sampling engine that approximates the target distribution and requires no parameter specification by the user at all. For efficient local exploration we automatically adapt the multivariate proposal kernel according to the shape of the target distribution. In order to handle multimodality, the sampler is equipped with a parallel tempering scheme that is fully adaptive with respect to temperature spacing and number of chains. Since the molecular surrounding of a dye molecule affects its spatial mobility and thus the smFRET efficiency, we introduce dye models which can be selected for every dye molecule individually. These models allow the user to represent the smFRET network in great detail, leading to an increased localisation precision. Finally, a tool to validate the chosen model combination is provided. Programme files doi: http://dx.doi.org/10.17632/7ztzj63r68.1. Licensing provisions: Apache-2.0. Programming language: GUI in MATLAB (The MathWorks); core sampling engine in C++. Nature of problem: sampling of highly diverse multivariate probability distributions in order to solve for macromolecular structures from smFRET data. Solution method: MCMC algorithm with fully adaptive proposal kernel and parallel tempering scheme.
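The idea of adapting the multivariate proposal kernel to the shape of the target can be illustrated with a simplified random-walk Metropolis sampler whose proposal covariance is periodically re-fit to the chain's own history (a common adaptive-MCMC recipe; Fast-NPS's actual engine, with parallel tempering and dye models, is far more elaborate):

```python
import numpy as np

rng = np.random.default_rng(1)
COV_INV = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))

def log_target(x):
    # Stand-in for the smFRET posterior: a correlated 2-D Gaussian.
    return -0.5 * x @ COV_INV @ x

def adaptive_metropolis(log_p, x0, n_steps=20000, adapt_every=500):
    """Random-walk Metropolis whose proposal covariance is periodically
    re-fit to the samples drawn so far (a basic shape-adaptive kernel)."""
    d = len(x0)
    prop_cov = 0.5 * np.eye(d)
    x = np.array(x0, dtype=float)
    lp = log_p(x)
    samples = []
    for i in range(n_steps):
        cand = rng.multivariate_normal(x, prop_cov)
        lp_cand = log_p(cand)
        if np.log(rng.random()) < lp_cand - lp:   # Metropolis acceptance
            x, lp = cand, lp_cand
        samples.append(x.copy())
        if (i + 1) % adapt_every == 0 and i > 1000:
            emp_cov = np.cov(np.array(samples).T)
            prop_cov = (2.38**2 / d) * emp_cov + 1e-6 * np.eye(d)
    return np.array(samples)

chain = adaptive_metropolis(log_target, [3.0, -3.0])
corr = np.corrcoef(chain[5000:].T)[0, 1]
print(round(corr, 1))  # recovers the target's 0.8 correlation
```

The 2.38²/d scaling of the empirical covariance is the classic adaptive-Metropolis heuristic; the small diagonal term keeps the proposal covariance positive definite.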

  20. Finding Groups Using Model-Based Cluster Analysis: Heterogeneous Emotional Self-Regulatory Processes and Heavy Alcohol Use Risk

    ERIC Educational Resources Information Center

    Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2008-01-01

    Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…
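Model-based clustering of the kind described selects the number of mixture components by comparing an information criterion across fitted models. A one-dimensional, numpy-only sketch using BIC (the paper's setting is multivariate, with richer covariance structures):

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical heterogeneous sample: two latent groups on one index.
data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(2.0, 1.0, 300)])

def gmm_loglik(x, k, n_iter=200):
    """Fit a 1-D Gaussian mixture by plain EM; return the max log-likelihood."""
    mu = np.quantile(x, np.linspace(0.2, 0.8, k))
    sd = np.full(k, x.std())
    pi = np.full(k, 1.0 / k)

    def densities():
        return pi * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

    for _ in range(n_iter):
        resp = densities()
        resp /= resp.sum(axis=1, keepdims=True)   # E-step: responsibilities
        nk = resp.sum(axis=0)                     # M-step: weights, means, SDs
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.maximum(np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk), 1e-3)
    return np.log(densities().sum(axis=1)).sum()

def bic(loglik, n_params, n):
    return n_params * np.log(n) - 2.0 * loglik

# A k-component 1-D mixture has k means, k SDs and k-1 free weights.
bics = {k: bic(gmm_loglik(data, k), 3 * k - 1, len(data)) for k in (1, 2, 3)}
best_k = min(bics, key=bics.get)
print(best_k)  # the two-group model should be preferred
```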

  1. Measurement of Physiologic Glucose Levels Using Raman Spectroscopy in a Rabbit Aqueous Humor Model

    NASA Technical Reports Server (NTRS)

    Lambert, J.; Storrie-Lombardi, M.; Borchert, M.

    1998-01-01

    We have elicited a reliable glucose signature in mammalian physiological ranges using near-infrared Raman laser excitation at 785 nm and multivariate analysis. In a recent series of experiments we measured glucose levels in an artificial aqueous humor in the range from 0.5 to 13× normal values.

  2. Fine-Tuning Cross-Battery Assessment Procedures: After Follow-Up Testing, Use All Valid Scores, Cohesive or Not

    ERIC Educational Resources Information Center

    Schneider, W. Joel; Roman, Zachary

    2018-01-01

    We used data simulations to test whether composites consisting of cohesive subtest scores are more accurate than composites consisting of divergent subtest scores. We demonstrate that when multivariate normality holds, divergent and cohesive scores are equally accurate. Furthermore, excluding divergent scores results in biased estimates of…

  3. Estimation of Latent Group Effects: Psychometric Technical Report No. 2.

    ERIC Educational Resources Information Center

    Mislevy, Robert J.

    Conventional methods of multivariate normal analysis do not apply when the variables of interest are not observed directly but must be inferred from fallible or incomplete data. For example, responses to mental test items may depend upon latent aptitude variables, which are modeled in turn as functions of demographic effects in the population. A…

  4. Robust Optimum Invariant Tests for Random MANOVA Models.

    DTIC Science & Technology

    1986-10-01

    are assumed to be independent normal with zero mean and dispersions σ² and τ² respectively, Roy and Gnanadesikan (1959) considered the problem of... Part II: The multivariate case. Ann. Math. Statist. 31, 939-968. [7] Roy, S.N. and Gnanadesikan, R. (1959). Some contributions to ANOVA in one or more

  5. Sworn testimony of the model evidence: Gaussian Mixture Importance (GAME) sampling

    NASA Astrophysics Data System (ADS)

    Volpi, Elena; Schoups, Gerrit; Firmani, Giovanni; Vrugt, Jasper A.

    2017-07-01

    What is the "best" model? The answer to this question lies in part in the eyes of the beholder; nevertheless, a good model must blend rigorous theory with redeeming qualities such as parsimony and quality of fit. Model selection is used to make inferences, via weighted averaging, from a set of K candidate models, Mk, k = 1, …, K, and helps identify which model is most supported by the observed data, Ỹ = (ỹ1, …, ỹn). Here, we introduce a new and robust estimator of the model evidence, p(Ỹ|Mk), which acts as the normalizing constant in the denominator of Bayes' theorem and provides a single quantitative measure of relative support for each hypothesis that integrates model accuracy, uncertainty, and complexity. However, p(Ỹ|Mk) is analytically intractable for most practical modeling problems. Our method, coined GAussian Mixture importancE (GAME) sampling, uses bridge sampling of a mixture distribution fitted to samples of the posterior model parameter distribution derived from MCMC simulation. We benchmark the accuracy and reliability of GAME sampling by application to a diverse set of multivariate target distributions (up to 100 dimensions) with known values of p(Ỹ|Mk) and to hypothesis testing using numerical modeling of the rainfall-runoff transformation of the Leaf River watershed in Mississippi, USA. These case studies demonstrate that GAME sampling provides robust and unbiased estimates of the evidence at a relatively small computational cost, outperforming commonly used estimators. The GAME sampler is implemented in the MATLAB package of DREAM and considerably simplifies scientific inquiry through hypothesis testing and model selection.
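A stripped-down cousin of the evidence estimator described above is plain importance sampling with a Gaussian proposal fitted to posterior draws; GAME's bridge-sampling machinery and mixture proposals are not reproduced here. On a toy Gaussian "posterior" the true evidence is known, so the estimate can be checked:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
d = 2
mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
Sigma_inv = np.linalg.inv(Sigma)

def log_unnorm_posterior(x):
    """Unnormalized toy 'posterior' whose normalizing constant (the model
    evidence) is known analytically, so the estimate can be checked."""
    diff = x - mu
    return -0.5 * diff @ Sigma_inv @ diff

true_evidence = (2.0 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))

# Pretend these are MCMC draws from the posterior and fit the proposal to
# them; the covariance is inflated so the proposal's tails cover the target.
draws = rng.multivariate_normal(mu, Sigma, size=4000)
proposal = stats.multivariate_normal(draws.mean(axis=0), 1.5 * np.cov(draws.T))

x = proposal.rvs(size=20000, random_state=rng)
log_w = np.array([log_unnorm_posterior(xi) for xi in x]) - proposal.logpdf(x)
evidence_hat = np.exp(log_w).mean()
print(round(evidence_hat / true_evidence, 2))  # should be close to 1
```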

  6. Evaluating effects of methylphenidate on brain activity in cocaine addiction: a machine-learning approach

    NASA Astrophysics Data System (ADS)

    Rish, Irina; Bashivan, Pouya; Cecchi, Guillermo A.; Goldstein, Rita Z.

    2016-03-01

    The objective of this study is to investigate effects of methylphenidate on brain activity in individuals with cocaine use disorder (CUD) using functional MRI (fMRI). Methylphenidate hydrochloride (MPH) is an indirect dopamine agonist commonly used for treating attention deficit/hyperactivity disorders; it was also shown to have some positive effects on CUD subjects, such as improved stop-signal reaction times associated with better control/inhibition [1], as well as normalized task-related brain activity [2] and resting-state functional connectivity in specific areas [3]. While prior fMRI studies of MPH in CUDs have focused on mass-univariate statistical hypothesis testing, this paper evaluates multivariate, whole-brain effects of MPH as captured by the generalization (prediction) accuracy of different classification techniques applied to features extracted from resting-state functional networks (e.g., node degrees). Our multivariate predictive results based on the resting-state data from [3] suggest that MPH tends to normalize network properties such as voxel degrees in CUD subjects, thus providing additional evidence for potential benefits of MPH in treating cocaine addiction.

  7. Predicting Potential Changes in Suitable Habitat and Distribution by 2100 for Tree Species of the Eastern United States

    Treesearch

    Louis R Iverson; Anantha M. Prasad; Mark W. Schwartz; Mark W. Schwartz

    2005-01-01

    We predict current distribution and abundance for tree species present in eastern North America, and subsequently estimate potential suitable habitat for those species under a changed climate with 2×CO2. We used a series of statistical models (i.e., Regression Tree Analysis (RTA), Multivariate Adaptive Regression Splines (MARS), Bagging Trees (...

  8. Macro-Econophysics

    NASA Astrophysics Data System (ADS)

    Aoyama, Hideaki; Fujiwara, Yoshi; Ikeda, Yuichi; Iyetomi, Hiroshi; Souma, Wataru; Yoshikawa, Hiroshi

    2017-07-01

    Preface; Foreword; Acknowledgements; List of tables; List of figures; Prologue; 1. Introduction: reconstructing macroeconomics; 2. Basic concepts in statistical physics and stochastic models; 3. Income and firm-size distributions; 4. Productivity distribution and related topics; 5. Multivariate time-series analysis; 6. Business cycles; 7. Price dynamics and inflation/deflation; 8. Complex network, community analysis, visualization; 9. Systemic risks; Appendix A: Computer program for beginners; Epilogue; Bibliography; Index.

  9. The association between a body shape index and cardiovascular risk in overweight and obese children and adolescents.

    PubMed

    Mameli, Chiara; Krakauer, Nir Y; Krakauer, Jesse C; Bosetti, Alessandra; Ferrari, Chiara Matilde; Moiana, Norma; Schneider, Laura; Borsani, Barbara; Genoni, Teresa; Zuccotti, Gianvincenzo

    2018-01-01

    A Body Shape Index (ABSI) and normalized hip circumference (Hip Index, HI) have recently been shown to be strong risk factors for mortality and cardiovascular disease in adults. We conducted an observational cross-sectional study to evaluate the relationship between ABSI, HI and cardiometabolic risk factors and obesity-related comorbidities in overweight and obese children and adolescents aged 2-18 years. We performed multivariate linear and logistic regression analyses with age- and sex-normalized z scores of BMI, ABSI, and HI as predictors to examine the association with cardiometabolic risk markers (systolic and diastolic blood pressure, fasting glucose and insulin, total cholesterol and its components, transaminases, and fat mass percentage measured by bioelectrical impedance analysis) and obesity-related conditions (including hepatic steatosis and metabolic syndrome). We recruited 217 patients (114 males) with a mean age of 11.3 years. Multivariate linear regression showed a significant association of the ABSI z score with 10 of 15 risk markers expressed as continuous variables, while the BMI z score showed a significant correlation with 9 and HI with only 1. In multivariate logistic regression to predict the occurrence of obesity-related conditions and above-threshold values of risk factors, the BMI z score was significantly correlated with 7 of 12, ABSI with 5, and HI with 1. Overall, ABSI is an independent anthropometric index that was significantly associated with cardiometabolic risk markers in a pediatric population affected by overweight and obesity.
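ABSI itself has a simple closed form (Krakauer & Krakauer, 2012): waist circumference divided by BMI^(2/3)·height^(1/2), in SI units. Converting it to the age- and sex-normalized z scores used in the study would additionally require pediatric reference tables, which are not reproduced here:

```python
def bmi(weight_kg, height_m):
    return weight_kg / height_m ** 2

def absi(waist_m, weight_kg, height_m):
    """A Body Shape Index: waist circumference normalized for BMI and height."""
    return waist_m / (bmi(weight_kg, height_m) ** (2 / 3) * height_m ** 0.5)

# Hypothetical adolescent: 60 kg, 1.50 m tall, 75 cm waist.
print(round(absi(0.75, 60.0, 1.50), 4))  # → 0.0686
```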

  10. Body composition status and the risk of migraine: A meta-analysis.

    PubMed

    Gelaye, Bizu; Sacco, Simona; Brown, Wendy J; Nitchie, Haley L; Ornello, Raffaele; Peterlin, B Lee

    2017-05-09

    To evaluate the association between migraine and body composition status as estimated based on body mass index and WHO physical status categories. Systematic electronic database searches were conducted for relevant studies. Two independent reviewers performed data extraction and quality appraisal. Odds ratios (OR) and confidence intervals (CI) were pooled using a random effects model. Significant values, weighted effect sizes, and tests of homogeneity of variance were calculated. A total of 12 studies, encompassing data from 288,981 unique participants, were included. The age- and sex-adjusted pooled risk of migraine in those with obesity was increased by 27% compared with those of normal weight (odds ratio [OR] 1.27; 95% confidence interval [CI] 1.16-1.37, p < 0.001) and remained increased after multivariate adjustments. Although the age- and sex-adjusted pooled migraine risk was increased in overweight individuals (OR 1.08; 95% CI 1.04, 1.12, p < 0.001), significance was lost after multivariate adjustments. The age- and sex-adjusted pooled risk of migraine in underweight individuals was marginally increased by 13% compared with those of normal weight (OR 1.13; 95% CI 1.02, 1.24, p < 0.001) and remained increased after multivariate adjustments. The current body of evidence shows that the risk of migraine is increased in obese and underweight individuals. Studies are needed to confirm whether interventions that modify obesity status decrease the risk of migraine. © 2017 American Academy of Neurology.
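The random-effects pooling of odds ratios described above is commonly done with the DerSimonian-Laird estimator; a sketch with hypothetical study-level ORs (not the meta-analysis's actual data):

```python
import numpy as np

def pooled_or_random_effects(odds_ratios, variances):
    """DerSimonian-Laird random-effects pooling of study-level odds ratios.

    `variances` are the variances of the log odds ratios.
    Returns the pooled OR with a 95% confidence interval.
    """
    y = np.log(odds_ratios)
    w = 1.0 / np.asarray(variances)
    fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - fixed) ** 2)                 # Cochran's Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)          # between-study variance
    w_star = 1.0 / (np.asarray(variances) + tau2)
    pooled = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return np.exp(pooled), np.exp(pooled - 1.96 * se), np.exp(pooled + 1.96 * se)

# Hypothetical study-level ORs and log-OR variances.
or_pooled, ci_low, ci_high = pooled_or_random_effects(
    [1.6, 0.9, 1.3, 1.1, 1.7], [0.01, 0.02, 0.015, 0.03, 0.02]
)
print(round(or_pooled, 2), round(ci_low, 2), round(ci_high, 2))
```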

  11. Preliminary shape analysis of the outline of the baropodometric foot: patterns of covariation, allometry, sex and age differences, and loading variations.

    PubMed

    Bruner, E; Mantini, S; Guerrini, V; Ciccarelli, A; Giombini, A; Borrione, P; Pigozzi, F; Ripani, M

    2009-09-01

    Baropodometric digital techniques map the pressures exerted on the sole of the foot during both static and dynamic loading. The study of the distribution of these pressures makes it possible to evaluate postural and locomotor biomechanics together with their pathological variations. This paper is aimed at evaluating the integration between baropodometric analysis (pressure distribution) and geometrical models (shape of the footprints), investigating the pattern of variation associated with normal plantar morphology. The sample includes 91 individuals (47 males, 44 females), ranging from 5 to 85 years of age (mean ± SD = 40 ± 24). The first component of variation is largely associated with the breadth of the isthmus, along a continuous gradient of increasing/decreasing flattening of the sole. As this character is dominant over the whole set of morphological components even in a non-pathological sample, such multivariate computation may represent a good diagnostic tool to quantify its degree of expression in individual subjects or group samples. Sexual differences are not significant, and allometric variations associated with increasing plantar surface or stature are not quantitatively relevant. There are some differences between adult and young individuals, associated in the latter with a widening of the medial and posterior areas. These results provide a geometrical framework for baropodometric analysis, suggesting possible future applications in diagnosis and basic research.

  12. Preoperative Red Cell Distribution Width and 30-day mortality in older patients undergoing non-cardiac surgery: a retrospective cohort observational study.

    PubMed

    Abdullah, H R; Sim, Y E; Sim, Y T; Ang, A L; Chan, Y H; Richards, T; Ong, B C

    2018-04-18

    Increased red cell distribution width (RDW) is associated with poorer outcomes in various patient populations. We investigated the association of preoperative RDW and anaemia with 30-day postoperative mortality among elderly patients undergoing non-cardiac surgery. Medical records of 24,579 patients aged 65 and older who underwent surgery under anaesthesia between 1 January 2012 and 31 October 2016 were retrospectively analysed. Patients who died within 30 days had a higher median RDW (15.0%) than those who survived (13.4%). In multivariate logistic regression in our cohort of elderly patients undergoing non-cardiac surgery, moderate/severe preoperative anaemia (aOR 1.61, p = 0.04) and high preoperative RDW levels in the 3rd quartile (>13.4% and ≤14.3%; aOR 2.12, p = 0.02) and 4th quartile (>14.3%; aOR 2.85, p = 0.001) were significantly associated with increased odds of 30-day mortality, after adjusting for the effects of transfusion, surgical severity, priority of surgery, and comorbidities. Patients with high RDW, defined as >15.7% (90th centile), and preoperative anaemia had higher odds of 30-day mortality than patients with anaemia and normal RDW. Thus, preoperative RDW independently increases the risk of 30-day postoperative mortality, and future risk stratification strategies should include RDW as a factor.

  13. The contribution of collective attack tactics in differentiating handball score efficiency.

    PubMed

    Rogulj, Nenad; Srhoj, Vatromir; Srhoj, Ljerka

    2004-12-01

    The prevalence of 19 elements of collective tactics in score-efficient and score-inefficient teams was analyzed in 90 First Croatian Handball League (Men) games during the 1998-1999 season. Prediction variables were used to describe the duration, continuity, system, organization and spatial direction of attacks. Analysis of the basic descriptive and distributional statistical parameters revealed a normal distribution of all variables and thus the possibility of using multivariate methods. Canonical discriminant analysis and analysis of variance showed that the use of collective tactical elements in attacks differed statistically significantly between the winning and losing teams. Counter-attacks and uninterrupted attacks predominated in winning teams. Other types of attacks, such as the long position attack, multiply interrupted attack, attack with one circle runner (pivot player), attack based on basic principles, attack based on group cooperation, attack based on independent action, attack based on group maneuvering, rightward-directed attack and leftward-directed attack, predominated in losing teams. Winning teams were clearly characterized by quick attacks against unorganized defense, whereas prolonged, interrupted position attacks against organized defense, along with frequent and diverse tactical actions, were characteristic of losing teams. The choice and frequency of a particular tactical activity in position attack do not guarantee score efficiency but are usually consequences of the limited anthropologic potential and low level of individual technical-tactical skills of players in low-quality teams.

  14. A Stochastic Point Cloud Sampling Method for Multi-Template Protein Comparative Modeling.

    PubMed

    Li, Jilong; Cheng, Jianlin

    2016-05-10

    Generating tertiary structural models for a target protein from the known structures of its homologous template proteins and their pairwise sequence alignments is a key step in protein comparative modeling. Here, we developed a new stochastic point cloud sampling method, called MTMG, for multi-template protein model generation. The method first superposes the backbones of the template structures; the Cα atoms of the superposed templates then form a point cloud for each position of the target protein, which is represented by a three-dimensional multivariate normal distribution. MTMG stochastically resamples the positions of Cα atoms for residues whose positions are uncertain from this distribution, and accepts or rejects each new position according to a simulated annealing protocol, which effectively removes the atomic clashes commonly encountered in multi-template comparative modeling. We benchmarked MTMG on 1,033 sequence alignments generated for CASP9, CASP10 and CASP11 targets. Using multiple templates with MTMG improves the GDT-TS score and TM-score of structural models by 2.96-6.37% and 2.42-5.19%, respectively, on the three datasets over using single templates. MTMG's performance was comparable to Modeller in terms of GDT-TS score, TM-score, and GDT-HA score, while the average RMSD was improved by the new sampling approach. The MTMG software is freely available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/mtmg.html.
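The core MTMG move, resampling an uncertain Cα position from the template point cloud's fitted multivariate normal and accepting or rejecting it under a simulated-annealing clash criterion, can be caricatured as follows. Everything here (coordinates, cutoff, cooling schedule) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(11)

# Toy point cloud: Calpha positions (angstroms, invented) of one residue in
# three superposed templates, summarized by a 3-D multivariate normal.
cloud = np.array([[1.0, 0.2, -0.3], [1.4, 0.0, 0.1], [0.8, -0.1, 0.2]])
mu = cloud.mean(axis=0)
cov = np.cov(cloud.T) + 1e-3 * np.eye(3)  # regularize the rank-deficient cov

fixed_atom = np.array([1.1, 0.1, 3.0])  # a neighbouring, already-placed atom
CLASH_CUTOFF = 3.0                       # hypothetical hard-sphere cutoff

def clash_energy(pos):
    """Quadratic penalty for coming closer than the cutoff to the fixed atom."""
    gap = CLASH_CUTOFF - np.linalg.norm(pos - fixed_atom)
    return max(0.0, gap) ** 2

def anneal(n_steps=3000, t0=1.0):
    pos = rng.multivariate_normal(mu, cov)
    energy = clash_energy(pos)
    for step in range(n_steps):
        temp = t0 * (1.0 - step / n_steps) + 1e-3   # linear cooling schedule
        cand = rng.multivariate_normal(mu, cov)     # resample from the cloud
        e_cand = clash_energy(cand)
        if e_cand < energy or rng.random() < np.exp((energy - e_cand) / temp):
            pos, energy = cand, e_cand
    return pos, energy

pos, energy = anneal()
print(round(energy, 3))  # the accepted position should be essentially clash-free
```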

  16. Nonlinear relationship between waist to hip ratio, weight and strength in elders: is gender the key?

    PubMed

    Castillo, Carmen; Carnicero, José A; de la Torre, Mari Ángeles; Amor, Solange; Guadalupe-Grau, Amelia; Rodríguez-Mañas, Leocadio; García-García, Francisco J

    2015-10-01

    Visceral fat has high metabolic activity with deleterious effects on health, contributing to risk for the frailty syndrome. We studied the association between the waist-to-hip ratio (an indirect measure of visceral fat stores) and upper- and lower-extremity strength. 1741 individuals aged ≥65 participated in this study. The data were obtained from the Toledo Study for Healthy Aging. For each gender, we studied the relationship between the waist-to-hip ratio (WHR), body mass index (BMI), and regional muscle strength (grip, shoulder, knee, and hip) using multivariate linear regression and kernel regression statistical models. WHR was higher in men than in women (0.98 ± 0.07 vs. 0.91 ± 0.08, respectively, P < 0.05). In women with a high WHR, we observed a decrease in strength, especially in those with a normal BMI. As the WHR decreased, strength increased regardless of BMI. In men, lower strength was generally related to the lowest and highest WHRs. Maximum strength in men corresponded to a WHR around 1 and the highest BMI. Muscle strength depends on the joint distribution of WHR and BMI according to gender. In consequence, sex, WHR, and BMI should be analyzed conjointly to study the relationship among fat distribution, weight, and muscle strength.

  17. An alternative derivation of the stationary distribution of the multivariate neutral Wright-Fisher model for low mutation rates with a view to mutation rate estimation from site frequency data.

    PubMed

    Schrempf, Dominik; Hobolth, Asger

    2017-04-01

    Recently, Burden and Tang (2016) provided an analytical expression for the stationary distribution of the multivariate neutral Wright-Fisher model with low mutation rates. In this paper we present a simple, alternative derivation that illustrates the approximation. Our proof is based on the discrete multivariate boundary mutation model, which has three key ingredients. First, the decoupled Moran model is used to describe genetic drift. Second, low mutation rates are assumed by limiting mutations to monomorphic states. Third, the mutation rate matrix is separated into a time-reversible part and a flux part, as suggested by Burden and Tang (2016). An application of our result to data from several great apes reveals that the assumption of stationarity may be inadequate or that other evolutionary forces like selection or biased gene conversion are acting. Furthermore, we find that the model with a reversible mutation rate matrix provides a reasonably good fit to the data compared to the one with a non-reversible mutation rate matrix.

  18. Statistical analysis of the 70 meter antenna surface distortions

    NASA Technical Reports Server (NTRS)

    Kiedron, K.; Chian, C. T.; Chuang, K. L.

    1987-01-01

    Statistical analysis of surface distortions of the 70 meter NASA/JPL antenna, located at Goldstone, was performed. The purpose of this analysis is to verify whether deviations due to gravity loading can be treated as quasi-random variables with normal distribution. Histograms of the RF pathlength error distribution for several antenna elevation positions were generated. The results indicate that the deviations from the ideal antenna surface are not normally distributed. The observed density distribution for all antenna elevation angles is taller and narrower than the normal density, which results in large positive values of kurtosis and a significant amount of skewness. The skewness of the distribution changes from positive to negative as the antenna elevation changes from zenith to horizon.
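
    The diagnostics used above (skewness and excess kurtosis of the pathlength-error histogram relative to a normal density) are standard moment statistics. A minimal sketch of computing them from a sample, as an illustration only:

```python
import numpy as np

def skew_excess_kurtosis(x):
    """Sample skewness and excess kurtosis (both 0 for a normal distribution).
    A taller-and-narrower-than-normal density gives positive excess kurtosis."""
    z = (x - x.mean()) / x.std()
    return float((z ** 3).mean()), float((z ** 4).mean() - 3.0)
```

    For a normal sample both statistics are near 0; a right-skewed sample (e.g., exponential) gives skewness near 2, the kind of departure the antenna analysis detected.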

  19. An inexact log-normal distribution-based stochastic chance-constrained model for agricultural water quality management

    NASA Astrophysics Data System (ADS)

    Wang, Yu; Fan, Jie; Xu, Ye; Sun, Wei; Chen, Dong

    2018-05-01

    In this study, an inexact log-normal-based stochastic chance-constrained programming model was developed for solving the non-point source pollution issues caused by agricultural activities. Compared to the general stochastic chance-constrained programming model, the main advantage of the proposed model is that it allows random variables to be expressed as a log-normal distribution, rather than a general normal distribution. Possible deviations in solutions caused by irrational parameter assumptions were avoided. The agricultural system management in the Erhai Lake watershed was used as a case study, where critical system factors, including rainfall and runoff amounts, show characteristics of a log-normal distribution. Several interval solutions were obtained under different constraint-satisfaction levels, which were useful in evaluating the trade-off between system economy and reliability. The applied results show that the proposed model could help decision makers to design optimal production patterns under complex uncertainties. The successful application of this model is expected to provide a good example for agricultural management in many other watersheds.
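
    A log-normal chance constraint of the kind used above has a simple deterministic equivalent: requiring P(R ≥ c) ≥ α for R ~ LogNormal(μ, σ) is the same as c ≤ exp(μ + σ·Φ⁻¹(1 − α)). A sketch using only the Python standard library; the function and parameter names are illustrative, not from the paper's model.

```python
import math
from statistics import NormalDist

def lognormal_chance_bound(mu, sigma, alpha):
    """Largest allocation c satisfying P(R >= c) >= alpha when the random
    resource R (e.g., runoff) is LogNormal(mu, sigma):
    P(R >= c) >= alpha  <=>  c <= exp(mu + sigma * Phi^{-1}(1 - alpha))."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(1.0 - alpha))
```

    At α = 0.5 the bound is the median exp(μ); raising the constraint-satisfaction level shrinks the allowable allocation, which is exactly the economy-versus-reliability trade-off the abstract evaluates.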

  20. Scoring in genetically modified organism proficiency tests based on log-transformed results.

    PubMed

    Thompson, Michael; Ellison, Stephen L R; Owen, Linda; Mathieson, Kenneth; Powell, Joanne; Key, Pauline; Wood, Roger; Damant, Andrew P

    2006-01-01

    The study considers data from 2 UK-based proficiency schemes and includes data from a total of 29 rounds and 43 test materials over a period of 3 years. The results from the 2 schemes are similar and reinforce each other. The amplification process used in quantitative polymerase chain reaction determinations predicts a mixture of normal, binomial, and lognormal distributions dominated by the latter 2. As predicted, the study results consistently follow a positively skewed distribution. Log-transformation prior to calculating z-scores is effective in establishing near-symmetric distributions that are sufficiently close to normal to justify interpretation on the basis of the normal distribution.
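
    The log-transform-then-score step can be sketched as below; `assigned_value` and the target standard deviation `sigma_p` (on the log scale) are illustrative names for this example, not the schemes' actual parameters.

```python
import numpy as np

def log_z_scores(results, assigned_value, sigma_p):
    """z-scores computed on log10-transformed results, appropriate when
    participant results follow a positively skewed (near log-normal) law."""
    return (np.log10(results) - np.log10(assigned_value)) / sigma_p

def sample_skewness(x):
    z = (x - x.mean()) / x.std()
    return float((z ** 3).mean())
```

    Log-transforming a log-normal sample removes the positive skew, so the resulting scores can be interpreted against the normal distribution, as the study concludes.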

  1. Inference for multivariate regression model based on multiply imputed synthetic data generated via posterior predictive sampling

    NASA Astrophysics Data System (ADS)

    Moura, Ricardo; Sinha, Bimal; Coelho, Carlos A.

    2017-06-01

    The recent popularity of synthetic data as a Statistical Disclosure Control technique has enabled the development of several methods for generating and analyzing such data, but these almost always rely on asymptotic distributions and are consequently not adequate for small-sample datasets. Thus, a likelihood-based exact inference procedure is derived for the matrix of regression coefficients of the multivariate regression model, for multiply imputed synthetic data generated via Posterior Predictive Sampling. Since it is based on exact distributions, this procedure may even be used on small-sample datasets. Simulation studies compare the results obtained from the proposed exact inferential procedure with the results obtained from an adaptation of Reiter's combination rule to multiply imputed synthetic datasets, and an application to the 2000 Current Population Survey is discussed.

  2. MANCOVA for one way classification with homogeneity of regression coefficient vectors

    NASA Astrophysics Data System (ADS)

    Mokesh Rayalu, G.; Ravisankar, J.; Mythili, G. Y.

    2017-11-01

    MANOVA and MANCOVA are the extensions of the univariate ANOVA and ANCOVA techniques to multidimensional or vector-valued observations. The assumption of a Gaussian distribution is replaced with a multivariate Gaussian distribution for the observation vectors and residual terms in the statistical models of these techniques. The objective of MANCOVA is to determine whether there are statistically reliable mean differences between groups after adjusting for the covariates. When random assignment of samples or subjects to groups is not possible, multivariate analysis of covariance (MANCOVA) provides statistical matching of groups by adjusting dependent variables as if all subjects scored the same on the covariates. In this research article, the MANCOVA technique is extended to a larger number of covariates, and homogeneity of the regression coefficient vectors is also tested.

  3. One Hundred Ways to be Non-Fickian - A Rigorous Multi-Variate Statistical Analysis of Pore-Scale Transport

    NASA Astrophysics Data System (ADS)

    Most, Sebastian; Nowak, Wolfgang; Bijeljic, Branko

    2015-04-01

    Fickian transport in groundwater flow is the exception rather than the rule. Transport in porous media is frequently simulated via particle methods (i.e., particle tracking random walk (PTRW) or continuous time random walk (CTRW)). These methods formulate transport as a stochastic process of particle position increments. At the pore scale, geometry and micro-heterogeneities prohibit the commonly made assumption of independent and normally distributed increments to represent dispersion. Many recent particle methods seek to loosen this assumption. Hence, it is important to get a better understanding of the processes at the pore scale. For our analysis we track the positions of 10,000 particles migrating through the pore space over time. The data we use come from micro-CT scans of a homogeneous sandstone and encompass about 10 grain sizes. Based on those images we discretize the pore structure and simulate flow at the pore scale based on the Navier-Stokes equation. This flow field realistically describes flow inside the pore space, and we do not need to add artificial dispersion during the transport simulation. Next, we use particle tracking random walk to simulate pore-scale transport. Finally, we use the obtained particle trajectories to perform a multivariate statistical analysis of the particle motion at the pore scale. Our analysis is based on copulas. Every multivariate joint distribution is a combination of its univariate marginal distributions. The copula represents the dependence structure of those univariate marginals and is therefore useful for observing correlation and non-Gaussian interactions (i.e., non-Fickian transport). The first goal of this analysis is to better understand the validity regions of commonly made assumptions. We investigate three different transport distances: 1) the distance beyond which the statistical dependence between particle increments can be modelled as an order-one Markov process (the Markovian distance for the process, where the validity of yet-unexplored non-Gaussian-but-Markovian random walks starts); 2) the distance where bivariate statistical dependence simplifies to a multi-Gaussian dependence based on simple linear correlation (validity of correlated PTRW/CTRW); 3) the distance of complete statistical independence (validity of classical PTRW/CTRW). The second objective is to reveal the characteristic dependencies influencing transport the most. Those dependencies can be very complex. Copulas are highly capable of representing linear as well as non-linear dependence. With that tool we are able to detect persistent characteristics dominating transport even across different scales. The results derived from our experimental data set suggest that there are many more non-Fickian aspects of pore-scale transport than the univariate statistics of longitudinal displacements. Non-Fickianity can also be found in transverse displacements and in the relations between increments at different time steps. Also, the dependence we find is non-linear (i.e., beyond simple correlation) and persists over long distances. Thus, our results strongly support the further refinement of techniques like correlated PTRW or correlated CTRW towards non-linear statistical relations.
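
    The copula-based dependence analysis rests on transforming each marginal to pseudo-observations (normalized ranks), on which linear and nonlinear dependence can then be probed. A minimal sketch; the synthetic data stand in for particle-trajectory increments and are not from the study.

```python
import numpy as np

def pseudo_observations(x):
    """Map a sample to (0,1) via normalized ranks: the empirical-copula margins."""
    n = len(x)
    return (np.argsort(np.argsort(x)) + 1) / (n + 1)

def spearman_rho(x, y):
    """Rank (Spearman) correlation: Pearson correlation of the pseudo-observations."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    return float(np.corrcoef(u, v)[0, 1])
```

    Strongly dependent successive increments give |rho| near 1, independent increments near 0; structure in the joint pseudo-observations beyond what rho captures is a signature of the non-Gaussian (non-Fickian) dependence discussed above.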

  4. Differential models of twin correlations in skew for body-mass index (BMI).

    PubMed

    Tsang, Siny; Duncan, Glen E; Dinescu, Diana; Turkheimer, Eric

    2018-01-01

    Body Mass Index (BMI), like most human phenotypes, is substantially heritable. However, BMI is not normally distributed; the skew appears to be structural, and increases as a function of age. Moreover, twin correlations for BMI commonly violate the assumptions of the most common variety of the classical twin model, with the MZ twin correlation greater than twice the DZ correlation. This study aimed to decompose twin correlations for BMI using more general skew-t distributions. Same sex MZ and DZ twin pairs (N = 7,086) from the community-based Washington State Twin Registry were included. We used latent profile analysis (LPA) to decompose twin correlations for BMI into multiple mixture distributions. LPA was performed using the default normal mixture distribution and the skew-t mixture distribution. Similar analyses were performed for height as a comparison. Our analyses are then replicated in an independent dataset. A two-class solution under the skew-t mixture distribution fits the BMI distribution for both genders. The first class consists of a relatively normally distributed, highly heritable BMI with a mean in the normal range. The second class is a positively skewed BMI in the overweight and obese range, with lower twin correlations. In contrast, height is normally distributed, highly heritable, and is well-fit by a single latent class. Results in the replication dataset were highly similar. Our findings suggest that two distinct processes underlie the skew of the BMI distribution. The contrast between height and weight is in accord with subjective psychological experience: both are under obvious genetic influence, but BMI is also subject to behavioral control, whereas height is not.
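
    Latent profile analysis with skew-t mixtures requires specialized software, but the underlying idea of decomposing one observed distribution into latent classes can be illustrated with a two-component normal-mixture EM fit. This is a hedged sketch of the mixture-decomposition idea, not the authors' skew-t procedure; the initialization and iteration count are arbitrary choices for the example.

```python
import numpy as np

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def em_two_normals(x, iters=200):
    """EM for a two-component univariate Gaussian mixture:
    returns (weights, means, sds) sorted by mean."""
    # crude initialization from the lower/upper halves of the sorted data
    xs = np.sort(x)
    mu = np.array([xs[: len(x) // 2].mean(), xs[len(x) // 2:].mean()])
    sd = np.array([x.std(), x.std()])
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: per-observation class responsibilities
        dens = w * np.stack([normal_pdf(x, m, s) for m, s in zip(mu, sd)], axis=1)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted parameter updates
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    order = np.argsort(mu)
    return w[order], mu[order], sd[order]
```

    On BMI-like data this recovers a dominant class near the normal range and a smaller, higher-mean class, mirroring the two-process interpretation above (the skew-t version additionally lets each class be asymmetric).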

  5. General Multivariate Linear Modeling of Surface Shapes Using SurfStat

    PubMed Central

    Chung, Moo K.; Worsley, Keith J.; Nacewicz, Brendon, M.; Dalton, Kim M.; Davidson, Richard J.

    2010-01-01

    Although there are many imaging studies on traditional ROI-based amygdala volumetry, there are very few studies on modeling amygdala shape variations. This paper presents a unified computational and statistical framework for modeling amygdala shape variations in a clinical population. The weighted spherical harmonic representation is used to parameterize, smooth, and normalize amygdala surfaces. The representation is subsequently used as an input for multivariate linear models accounting for nuisance covariates such as age and brain size difference using the SurfStat package, which completely avoids the complexity of specifying design matrices. The methodology has been applied to quantify abnormal local amygdala shape variations in 22 high-functioning autistic subjects. PMID:20620211

  6. The Effect of Non-Normal Distributions on the Integrated Moving Average Model of Time-Series Analysis.

    ERIC Educational Resources Information Center

    Doerann-George, Judith

    The Integrated Moving Average (IMA) model of time series, and the analysis of intervention effects based on it, assume random shocks which are normally distributed. To determine the robustness of the analysis to violations of this assumption, empirical sampling methods were employed. Samples were generated from three populations; normal,…

  7. An Evaluation of Normal versus Lognormal Distribution in Data Description and Empirical Analysis

    ERIC Educational Resources Information Center

    Diwakar, Rekha

    2017-01-01

    Many existing methods of statistical inference and analysis rely heavily on the assumption that the data are normally distributed. However, the normality assumption is not fulfilled when dealing with data which does not contain negative values or are otherwise skewed--a common occurrence in diverse disciplines such as finance, economics, political…

  8. Computer program determines exact two-sided tolerance limits for normal distributions

    NASA Technical Reports Server (NTRS)

    Friedman, H. A.; Webb, S. R.

    1968-01-01

    Computer program determines by numerical integration the exact statistical two-sided tolerance limits, when the proportion between the limits is at least a specified number. The program is limited to situations in which the underlying probability distribution for the population sampled is the normal distribution with unknown mean and variance.

  9. Normal versus Noncentral Chi-Square Asymptotics of Misspecified Models

    ERIC Educational Resources Information Center

    Chun, So Yeon; Shapiro, Alexander

    2009-01-01

    The noncentral chi-square approximation of the distribution of the likelihood ratio (LR) test statistic is a critical part of the methodology in structural equation modeling. Recently, it was argued by some authors that in certain situations normal distributions may give a better approximation of the distribution of the LR test statistic. The main…

  10. Bias and Efficiency in Structural Equation Modeling: Maximum Likelihood versus Robust Methods

    ERIC Educational Resources Information Center

    Zhong, Xiaoling; Yuan, Ke-Hai

    2011-01-01

    In the structural equation modeling literature, the normal-distribution-based maximum likelihood (ML) method is most widely used, partly because the resulting estimator is claimed to be asymptotically unbiased and most efficient. However, this may not hold when data deviate from normal distribution. Outlying cases or nonnormally distributed data,…

  11. 1H NMR Metabolomics Study of Spleen from C57BL/6 Mice Exposed to Gamma Radiation

    PubMed Central

    Xiao, X; Hu, M; Liu, M; Hu, JZ

    2016-01-01

    Due to the potential risk of accidental exposure to gamma radiation, it is critical to identify biomarkers in radiation-exposed organisms. In the present study, NMR-based metabolomics was combined with multivariate data analysis to evaluate the metabolite changes in the C57BL/6 mouse spleen 4 days after whole-body exposure to 3.0 Gy and 7.8 Gy gamma radiation. Principal component analysis (PCA) and orthogonal projection to latent structures analysis (OPLS) were employed for classification and for identifying potential biomarkers associated with gamma irradiation. Two different strategies for NMR spectral data reduction (i.e., spectral binning and spectral deconvolution) were each combined with normalization to constant sum and to unit weight before multivariate data analysis. The combination of spectral deconvolution and normalization to unit weight was the best approach for identifying discriminatory metabolites between the irradiated and control groups; normalization to constant sum may yield pseudo-biomarkers. The PCA and OPLS results showed that the exposed groups can be well separated from the control group. Leucine, 2-aminobutyrate, valine, lactate, arginine, glutathione, 2-oxoglutarate, creatine, tyrosine, phenylalanine, π-methylhistidine, taurine, myo-inositol, glycerol, and uracil were significantly elevated, while ADP was significantly decreased. These significantly changed metabolites are associated with multiple metabolic pathways and may be potential biomarkers in the spleen exposed to gamma irradiation. PMID:27019763

  12. 1H NMR metabolomics study of spleen from C57BL/6 mice exposed to gamma radiation

    DOE PAGES

    Xiao, Xiongjie; Hu, M.; Liu, M.; ...

    2016-01-27

    Due to the potential risk of accidental exposure to gamma radiation, it is critical to identify biomarkers in radiation-exposed organisms. In the present study, NMR-based metabolomics was combined with multivariate data analysis to evaluate the metabolite changes in the C57BL/6 mouse spleen 4 days after whole-body exposure to 3.0 Gy and 7.8 Gy gamma radiation. Principal component analysis (PCA) and orthogonal projection to latent structures analysis (OPLS) were employed for classification and for identifying potential biomarkers associated with gamma irradiation. Two different strategies for NMR spectral data reduction (i.e., spectral binning and spectral deconvolution) were each combined with normalization to constant sum and to unit weight before multivariate data analysis. The combination of spectral deconvolution and normalization to unit weight was the best approach for identifying discriminatory metabolites between the irradiated and control groups; normalization to constant sum may yield pseudo-biomarkers. The PCA and OPLS results showed that the exposed groups can be well separated from the control group. Leucine, 2-aminobutyrate, valine, lactate, arginine, glutathione, 2-oxoglutarate, creatine, tyrosine, phenylalanine, π-methylhistidine, taurine, myo-inositol, glycerol, and uracil were significantly elevated, while ADP was significantly decreased. As a result, these significantly changed metabolites are associated with multiple metabolic pathways and may be potential biomarkers in the spleen exposed to gamma irradiation.

  13. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments

    PubMed Central

    Avalappampatty Sivasamy, Aneetha; Sundan, Bose

    2015-01-01

    The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T2 method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T2 statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better. PMID:26357668

  14. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments.

    PubMed

    Sivasamy, Aneetha Avalappampatty; Sundan, Bose

    2015-01-01

    The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T(2) method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T(2) statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better.
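
    The T-square distance at the heart of this approach is straightforward to compute: for a new traffic profile x and a training sample with mean μ and covariance S, T² = (x − μ)ᵀ S⁻¹ (x − μ), and profiles whose T² exceeds a threshold are flagged as attacks. A minimal sketch; the chi-square-quantile threshold used here is a simple stand-in for the paper's central-limit-theorem-based range.

```python
import numpy as np

def hotelling_t2(train, x_new):
    """Squared Mahalanobis (Hotelling T^2-style) distance of one observation
    from the training sample's mean, scaled by the sample covariance."""
    mu = train.mean(axis=0)
    S = np.cov(train, rowvar=False)
    d = x_new - mu
    return float(d @ np.linalg.solve(S, d))

def classify(train, x_new, threshold):
    """Flag the observation as 'attack' when its T^2 exceeds the threshold."""
    return "attack" if hotelling_t2(train, x_new) > threshold else "normal"
```

    Observations near the training mean yield small T² values; a profile several standard deviations away in every feature yields a T² far above any reasonable quantile threshold.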

  15. Multivariate two-part statistics for analysis of correlated mass spectrometry data from multiple biological specimens.

    PubMed

    Taylor, Sandra L; Ruhaak, L Renee; Weiss, Robert H; Kelly, Karen; Kim, Kyoungmi

    2017-01-01

    High-throughput mass spectrometry (MS) is now being used to profile small molecular compounds across multiple biological sample types from the same subjects, with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data, and imputation can impact between-biospecimen correlation and multivariate analysis results. We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, we showed that multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds are differentially regulated in multiple biospecimens, but univariate methods can be more powerful if compounds are differentially regulated in only one biospecimen. We provide R functions to implement and illustrate our method as supplementary information. Contact: sltaylor@ucdavis.edu. Supplementary data are available at Bioinformatics online.

  16. The Distribution of the Product Explains Normal Theory Mediation Confidence Interval Estimation.

    PubMed

    Kisbu-Sakarya, Yasemin; MacKinnon, David P; Miočević, Milica

    2014-05-01

    The distribution of the product has several useful applications. One of these is its use to form confidence intervals for the indirect effect as the product of 2 regression coefficients. The purpose of this article is to investigate how the moments of the distribution of the product explain normal-theory mediation confidence interval coverage and imbalance. Values of the critical ratio for each random variable are used to demonstrate how the moments of the distribution of the product change across values of the critical ratio observed in research studies. Results of the simulation study showed that as skewness in absolute value increases, coverage decreases, and as skewness in absolute value and kurtosis increase, imbalance increases. The difference between testing the significance of the indirect effect using the normal theory versus the asymmetric distribution of the product is further illustrated with a real data example. This article is the first study to show the direct link between the distribution of the product and indirect effect confidence intervals, and it clarifies the results of previous simulation studies by showing why normal-theory confidence intervals for indirect effects are often less accurate than those obtained from the asymmetric distribution of the product or from resampling methods.
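
    The asymmetric confidence interval from the distribution of the product is easy to approximate by Monte Carlo: draw the two coefficients from their estimated sampling distributions, multiply, and take quantiles. A sketch of that idea only, not the article's analytic computation of the product distribution's moments.

```python
import numpy as np

def product_ci(a, se_a, b, se_b, alpha=0.05, n_draws=200_000, seed=0):
    """Monte Carlo CI for the indirect effect a*b using the (generally
    asymmetric) distribution of the product of two normal coefficients."""
    rng = np.random.default_rng(seed)
    prod = rng.normal(a, se_a, n_draws) * rng.normal(b, se_b, n_draws)
    lo, hi = np.quantile(prod, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(lo), float(hi)
```

    Unlike the normal-theory interval a·b ± z·SE, this interval need not be symmetric about a·b, which is why its coverage behaves differently as the skewness of the product distribution grows.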

  17. Stick-slip behavior in a continuum-granular experiment.

    PubMed

    Geller, Drew A; Ecke, Robert E; Dahmen, Karin A; Backhaus, Scott

    2015-12-01

    We report moment distribution results from a laboratory experiment, similar in character to an isolated strike-slip earthquake fault, consisting of sheared elastic plates separated by a narrow gap filled with a two-dimensional granular medium. Local measurement of strain displacements of the plates at 203 spatial points located adjacent to the gap allows direct determination of the event moments and their spatial and temporal distributions. We show that events consist of spatially coherent, larger motions and spatially extended (noncoherent), smaller events. The noncoherent events have a probability distribution of event moment consistent with an M^(-3/2) power-law scaling with Poisson-distributed recurrence times. Coherent events have a log-normal moment distribution and mean temporal recurrence. As the applied normal pressure increases, there are more coherent events and their log-normal distribution broadens and shifts to larger average moment.

  18. A method for analyzing clustered interval-censored data based on Cox's model.

    PubMed

    Kor, Chew-Teng; Cheng, Kuang-Fu; Chen, Yi-Hau

    2013-02-28

    Methods for analyzing interval-censored data are well established. Unfortunately, these methods are inappropriate for studies with correlated data. In this paper, we focus on developing a method for analyzing clustered interval-censored data. Our method is based on Cox's proportional hazards model with a piecewise-constant baseline hazard function. The correlation structure of the data can be modeled by using Clayton's copula or an independence model with proper adjustment in the covariance estimation. We establish estimating equations for the regression parameters and baseline hazards (and a parameter in the copula) simultaneously. Simulation results confirm that the point estimators follow a multivariate normal distribution and that our proposed variance estimations are reliable. In particular, we found that the approach with the independence model worked well even when the true correlation model was derived from Clayton's copula. We applied our method to a family-based cohort study of pandemic H1N1 influenza in Taiwan during 2009-2010. Using the proposed method, we investigated the impact of vaccination and family contacts on the incidence of pH1N1 influenza.

  19. Improved parameter inference in catchment models: 1. Evaluating parameter uncertainty

    NASA Astrophysics Data System (ADS)

    Kuczera, George

    1983-10-01

    A Bayesian methodology is developed to evaluate parameter uncertainty in catchment models fitted to a hydrologic response such as runoff, the goal being to improve the chance of successful regionalization. The catchment model is posed as a nonlinear regression model with stochastic errors possibly being both autocorrelated and heteroscedastic. The end result of this methodology, which may use Box-Cox power transformations and ARMA error models, is the posterior distribution, which summarizes what is known about the catchment model parameters. This can be simplified to a multivariate normal provided a linearization in parameter space is acceptable; means of checking and improving this assumption are discussed. The posterior standard deviations give a direct measure of parameter uncertainty, and study of the posterior correlation matrix can indicate what kinds of data are required to improve the precision of poorly determined parameters. Finally, a case study involving a nine-parameter catchment model fitted to monthly runoff and soil moisture data is presented. It is shown that use of ordinary least squares when its underlying error assumptions are violated gives an erroneous description of parameter uncertainty.

  20. HMO selection and Medicare costs: Bayesian MCMC estimation of a robust panel data tobit model with survival.

    PubMed

    Hamilton, B H

    1999-08-01

    The fraction of US Medicare recipients enrolled in health maintenance organizations (HMOs) has increased substantially over the past 10 years. However, the impact of HMOs on health care costs is still hotly debated. In particular, it is argued that HMOs achieve cost reduction through 'cream-skimming' and enrolling relatively healthy patients. This paper develops a Bayesian panel data tobit model of HMO selection and Medicare expenditures for recent US retirees that accounts for mortality over the course of the panel. The model is estimated using Markov Chain Monte Carlo (MCMC) simulation methods, and is novel in that a multivariate t-link is used in place of normality to allow for the heavy-tailed distributions often found in health care expenditure data. The findings indicate that HMOs select individuals who are less likely to have positive health care expenditures prior to enrollment. However, there is no evidence that HMOs disenrol high cost patients. The results also indicate the importance of accounting for survival over the panel, since high mortality probabilities are associated with higher health care expenditures in the last year of life.

  1. Strongdeco: Expansion of analytical, strongly correlated quantum states into a many-body basis

    NASA Astrophysics Data System (ADS)

    Juliá-Díaz, Bruno; Graß, Tobias

    2012-03-01

    We provide a Mathematica code for decomposing strongly correlated quantum states described by a first-quantized, analytical wave function into many-body Fock states. Within them, the single-particle occupations refer to the subset of Fock-Darwin functions with no nodes. Such states, commonly appearing in two-dimensional systems subjected to gauge fields, were first discussed in the context of quantum Hall physics and are nowadays very relevant in the field of ultracold quantum gases. As important examples, we explicitly apply our decomposition scheme to the prominent Laughlin and Pfaffian states. This allows for easily calculating the overlap between arbitrary states and these highly correlated test states, and thus provides a useful tool to classify correlated quantum systems. Furthermore, we can directly read off the angular momentum distribution of a state from its decomposition. Finally, we make use of our code to calculate the normalization factors for Laughlin's famous quasi-particle/quasi-hole excitations, from which we gain insight into the intriguing fractional behavior of these excitations.

    Program summary
    Program title: Strongdeco
    Catalogue identifier: AELA_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AELA_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 5475
    No. of bytes in distributed program, including test data, etc.: 31 071
    Distribution format: tar.gz
    Programming language: Mathematica
    Computer: Any computer on which Mathematica can be installed
    Operating system: Linux, Windows, Mac
    Classification: 2.9
    Nature of problem: Analysis of strongly correlated quantum states.
    Solution method: The program makes use of the tools developed in Mathematica to deal with multivariate polynomials to decompose analytical strongly correlated states of bosons and fermions into a standard many-body basis. Operations with polynomials, determinants and permanents are the basic tools.
    Running time: The distributed notebook takes a couple of minutes to run.

  2. Improved Root Normal Size Distributions for Liquid Atomization

    DTIC Science & Technology

    2015-11-01

    Jackson, Primary Breakup of Round Aerated-Liquid Jets in Supersonic Crossflows, Atomization and Sprays, 16(6), 657-672, 2006; H. C. Simmons, The...Breakup in Liquid-Gas Mixing Layers, Atomization and Sprays, 1, 421-440, 1991; P.-K. Wu, L.-K. Tseng, and G. M. Faeth, Primary Breakup in Gas/Liquid...Improved Root Normal Size Distributions for Liquid Atomization. Distribution Statement A. Approved for public release; distribution is unlimited.

  3. Bayesian alternative to the ISO-GUM's use of the Welch Satterthwaite formula

    NASA Astrophysics Data System (ADS)

    Kacker, Raghu N.

    2006-02-01

    In certain disciplines, uncertainty is traditionally expressed as an interval about an estimate for the value of the measurand. Development of such uncertainty intervals with a stated coverage probability based on the International Organization for Standardization (ISO) Guide to the Expression of Uncertainty in Measurement (GUM) requires a description of the probability distribution for the value of the measurand. The ISO-GUM propagates the estimates and their associated standard uncertainties for various input quantities through a linear approximation of the measurement equation to determine an estimate and its associated standard uncertainty for the value of the measurand. This procedure does not yield a probability distribution for the value of the measurand. The ISO-GUM suggests that under certain conditions motivated by the central limit theorem the distribution for the value of the measurand may be approximated by a scaled-and-shifted t-distribution with effective degrees of freedom obtained from the Welch-Satterthwaite (W-S) formula. The approximate t-distribution may then be used to develop an uncertainty interval with a stated coverage probability for the value of the measurand. We propose an approximate normal distribution based on a Bayesian uncertainty as an alternative to the t-distribution based on the W-S formula. A benefit of the approximate normal distribution based on a Bayesian uncertainty is that it greatly simplifies the expression of uncertainty by eliminating altogether the need for calculating effective degrees of freedom from the W-S formula. In the special case where the measurand is the difference between two means, each evaluated from statistical analyses of independent normally distributed measurements with unknown and possibly unequal variances, the probability distribution for the value of the measurand is known to be a Behrens-Fisher distribution. 
We compare the performance of the approximate normal distribution based on a Bayesian uncertainty and the approximate t-distribution based on the W-S formula with respect to the Behrens-Fisher distribution. The approximate normal distribution is simpler and better in this case. A thorough investigation of the relative performance of the two approximate distributions would require comparison for a range of measurement equations by numerical methods.
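The two routes can be contrasted in a short sketch. The sample sizes and standard deviations below are invented, and the inflation factor sqrt((n-1)/(n-3)), the standard deviation of the scaled t posterior for a mean under a noninformative prior, is one common Bayesian Type A evaluation rather than necessarily the paper's exact prescription.

```python
import math
from statistics import NormalDist

# Two independent Type A evaluations (illustrative numbers): sample sizes
# n1, n2 and sample standard deviations s1, s2 for two measured means.
n1, s1 = 5, 1.2
n2, s2 = 8, 0.9
u1, u2 = s1 / math.sqrt(n1), s2 / math.sqrt(n2)   # standard uncertainties
nu1, nu2 = n1 - 1, n2 - 1

# ISO-GUM route: combined uncertainty for the difference of the two means,
# with Welch-Satterthwaite effective degrees of freedom.
u = math.sqrt(u1 ** 2 + u2 ** 2)
nu_eff = u ** 4 / (u1 ** 4 / nu1 + u2 ** 4 / nu2)

# Bayesian route sketched in the abstract: each Type A standard uncertainty
# is replaced by the posterior standard deviation u*sqrt((n-1)/(n-3)), and a
# normal distribution is used, so no effective degrees of freedom are needed.
ub = math.sqrt(u1 ** 2 * nu1 / (nu1 - 2) + u2 ** 2 * nu2 / (nu2 - 2))
k95 = NormalDist().inv_cdf(0.975)                 # coverage factor ~1.96
expanded_bayes = k95 * ub
```

The Bayesian interval trades the table lookup of a t quantile at nu_eff degrees of freedom for a fixed normal coverage factor applied to a slightly inflated combined uncertainty.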

  4. Serum potassium is a predictor of incident diabetes in African Americans with normal aldosterone: the Jackson Heart Study12

    PubMed Central

    Chatterjee, Ranee; Davenport, Clemontina A; Svetkey, Laura P; Batch, Bryan C; Lin, Pao-Hwa; Ramachandran, Vasan S; Fox, Ervin R; Harman, Jane; Yeh, Hsin-Chieh; Selvin, Elizabeth; Correa, Adolfo; Butler, Kenneth; Edelman, David

    2017-01-01

    Background: Low-normal potassium is a risk factor for diabetes and may account for some of the racial disparity in diabetes risk. Aldosterone affects serum potassium and is associated with insulin resistance. Objectives: We sought to confirm the association between potassium and incident diabetes in an African-American cohort, and to determine the effect of aldosterone on this association. Design: We studied participants from the Jackson Heart Study, an African-American adult cohort, who were without diabetes at baseline. With the use of logistic regression, we characterized the associations of serum, dietary, and urinary potassium with incident diabetes. In addition, we evaluated aldosterone as a potential effect modifier of these associations. Results: Of 2157 participants, 398 developed diabetes over 8 y. In a minimally adjusted model, serum potassium was a significant predictor of incident diabetes (OR: 0.83; 95% CI: 0.74, 0.92 per SD increment in serum potassium). In multivariable models, we found a significant interaction between serum potassium and aldosterone (P = 0.046). In stratified multivariable models, in those with normal aldosterone (<9 ng/dL, n = 1163), participants in the highest 2 potassium quartiles had significantly lower odds of incident diabetes than did those in the lowest potassium quartile [OR (95% CI): 0.61 (0.39, 0.97) and 0.54 (0.33, 0.90), respectively]. Among those with high-normal aldosterone (≥9 ng/dL, n = 202), we found no significant association between serum potassium and incident diabetes. In these stratified models, serum aldosterone was not a significant predictor of incident diabetes. We found no statistically significant associations between dietary or urinary potassium and incident diabetes. Conclusions: In this African-American cohort, we found that aldosterone may modify the association between serum potassium and incident diabetes. 
In participants with normal aldosterone, high-normal serum potassium was associated with a lower risk of diabetes than was low-normal serum potassium. Additional studies are warranted to determine whether serum potassium is a modifiable risk factor that could be a target for diabetes prevention. This trial was registered at clinicaltrials.gov as NCT00415415. PMID:27974310

  5. Imaging of polysaccharides in the tomato cell wall with Raman microspectroscopy

    PubMed Central

    2014-01-01

    Background The primary cell wall of fruits and vegetables is a structure composed mainly of polysaccharides (pectins, hemicelluloses, cellulose). The polysaccharides are assembled into a network and linked together. The relative proportions of these components in the plant cell wall are thought to have an important influence on the mechanical properties of fruits and vegetables. Results In this study, Raman microspectroscopy was applied to visualize the distribution of polysaccharides in the fruit cell wall. The methodology of sample preparation, measurement with a Raman microscope, and multivariate image analysis is discussed. Single-band imaging (for preliminary analysis) and multivariate image analysis methods (principal component analysis and multivariate curve resolution) were used to identify and localize the components of the primary cell wall. Conclusions Raman microspectroscopy supported by multivariate image analysis methods is useful for distinguishing cellulose and pectins in the tomato cell wall. The study also shows that localization of these biopolymers was possible with minimally prepared samples. PMID:24917885

  6. [Distribution of individuals by spontaneous frequencies of lymphocytes with micronuclei. Particularity and consequences].

    PubMed

    Serebrianyĭ, A M; Akleev, A V; Aleshchenko, A V; Antoshchina, M M; Kudriashova, O V; Riabchenko, N I; Semenova, L P; Pelevina, I I

    2011-01-01

    Using the micronucleus (MN) assay with cytochalasin B cytokinesis block, the mean frequency of blood lymphocytes with MN was determined in 76 Moscow inhabitants, 35 people from Obninsk, and 122 from the Chelyabinsk region. In contrast to the distribution of individuals by spontaneous frequency of cells with aberrations, which was shown to be binomial (Kusnetzov et al., 1980), the distribution of individuals by spontaneous frequency of cells with MN in all three cohorts can be regarded as log-normal (chi2 test). The distributions for the combined Moscow and Obninsk cohorts and for the pooled set of all subjects can be regarded with high reliability as log-normal (0.70 and 0.86, respectively), but cannot be regarded as Poisson, binomial, or normal. Given that a log-normal distribution of children by spontaneous frequency of lymphocytes with MN was also observed in a survey of 473 children from different Moscow kindergartens, we conclude that log-normality is a regularity inherent in this type of damage to the lymphocyte genome. By contrast, the distribution of individuals by the frequency of lymphocytes with MN induced by in vitro irradiation must in most cases be regarded as normal. This distribution pattern indicates that the appearance of damage (genomic instability) in a single lymphocyte of an individual increases the probability of damage appearing in other lymphocytes. We propose that damaged lymphocyte-progenitor stem cells exchange information with undamaged cells, a process of the bystander-effect type. It may also be supposed that damage is transmitted to daughter cells during stem cell division.

  7. Fisher information for two gamma frailty bivariate Weibull models.

    PubMed

    Bjarnason, H; Hougaard, P

    2000-03-01

    The asymptotic properties of frailty models for multivariate survival data are not well understood. To study this aspect, the Fisher information is derived in the standard bivariate gamma frailty model, where the survival distribution is of Weibull form conditional on the frailty. For comparison, the Fisher information is also derived in the bivariate gamma frailty model, where the marginal distribution is of Weibull form.

  8. Diagonal dominance for the multivariable Nyquist array using function minimization

    NASA Technical Reports Server (NTRS)

    Leininger, G. G.

    1977-01-01

    A new technique for the design of multivariable control systems using the multivariable Nyquist array method was developed. A conjugate direction function minimization algorithm is utilized to achieve a diagonal dominant condition over the extended frequency range of the control system. The minimization is performed on the ratio of the moduli of the off-diagonal terms to the moduli of the diagonal terms of either the inverse or direct open loop transfer function matrix. Several new feedback design concepts were also developed, including: (1) dominance control parameters for each control loop; (2) compensator normalization to evaluate open loop conditions for alternative design configurations; and (3) an interaction index to determine the degree and type of system interaction when all feedback loops are closed simultaneously. This new design capability was implemented on an IBM 360/75 in a batch mode but can be easily adapted to an interactive computer facility. The method was applied to the Pratt and Whitney F100 turbofan engine.
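The quantity being minimized above can be illustrated for a hypothetical 2x2 open-loop transfer function matrix; the conjugate-direction minimization itself is omitted, and only the objective, the ratio of off-diagonal to diagonal moduli along the frequency axis, is computed. The matrix entries are invented, not the F100 engine model.

```python
# Hypothetical 2x2 open-loop transfer function matrix G(s). Row diagonal
# dominance requires |off-diagonal| < |diagonal| in each row at each
# frequency of interest.
def G(s):
    return [[1 / (s + 1),   0.3 / (s + 2)],
            [0.2 / (s + 3), 2 / (s + 1)]]

def row_dominance_ratios(omega):
    """Off-diagonal to diagonal moduli ratio for each row at s = j*omega."""
    g = G(complex(0.0, omega))
    return [abs(g[0][1]) / abs(g[0][0]),
            abs(g[1][0]) / abs(g[1][1])]

# The design objective corresponds to minimizing the worst such ratio over
# the extended frequency range; here we only evaluate it on a grid.
worst = max(max(row_dominance_ratios(w))
            for w in [0.01, 0.1, 1.0, 10.0, 100.0])
```

A worst-case ratio below 1 on the frequency grid indicates the matrix is diagonally dominant there, the condition the function minimization is driving toward.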

  9. [Monitoring method of extraction process for Schisandrae Chinensis Fructus based on near infrared spectroscopy and multivariate statistical process control].

    PubMed

    Xu, Min; Zhang, Lei; Yue, Hong-Shui; Pang, Hong-Wei; Ye, Zheng-Liang; Ding, Li

    2017-10-01

    An on-line monitoring method was established for the extraction process of Schisandrae Chinensis Fructus, the formula medicinal material of Yiqi Fumai lyophilized injection, by combining near infrared spectroscopy with multivariate data analysis technology. The multivariate statistical process control (MSPC) model was established based on 5 normal production batches, and 2 test batches were monitored by PC score, DModX, and Hotelling T2 control charts. The results showed that the MSPC model had good monitoring ability for the extraction process. Application of the MSPC model to the actual production process could effectively achieve on-line monitoring of the extraction process of Schisandrae Chinensis Fructus and can reflect changes in material properties during production in real time. This process monitoring method could serve as a reference for applying process analytical technology to the process quality control of traditional Chinese medicine injections. Copyright© by the Chinese Pharmaceutical Association.

  10. Accuracy and uncertainty analysis of soil Bbf spatial distribution estimation at a coking plant-contaminated site based on normalization geostatistical technologies.

    PubMed

    Liu, Geng; Niu, Junjie; Zhang, Chao; Guo, Guanlin

    2015-12-01

    Data distribution is usually skewed severely by the presence of hot spots in contaminated sites. This causes difficulties for accurate geostatistical data transformation. Three typical normal-distribution transformation methods, the normal score, Johnson, and Box-Cox transformations, were applied to compare the effects of spatial interpolation with normal-transformed data of benzo(b)fluoranthene in a large-scale coking plant-contaminated site in north China. All three normal transformation methods decreased the skewness and kurtosis of the benzo(b)fluoranthene data, and all the transformed data passed the Kolmogorov-Smirnov test threshold. Cross validation showed that Johnson ordinary kriging had a minimum root-mean-square error of 1.17 and a mean error of 0.19, which was more accurate than the other two models. The areas with fewer sampling points and with high levels of contamination showed the largest prediction standard errors on the Johnson ordinary kriging prediction map. We introduce an ideal normal transformation method prior to geostatistical estimation for severely skewed data, which enhances the reliability of risk estimation and improves the accuracy of determining remediation boundaries.
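The normal score transformation, one of the three methods compared, can be sketched as a rank-based mapping of empirical quantiles onto standard normal quantiles. The skewed "concentration" data below are simulated, not the site's benzo(b)fluoranthene measurements.

```python
import math, random
from statistics import NormalDist, mean, stdev

random.seed(0)
# Severely right-skewed concentration-like data with hot spots
# (a synthetic log-normal sample, standing in for the site data).
data = [math.exp(random.gauss(0, 1.2)) for _ in range(500)]

def skewness(xs):
    m, s = mean(xs), stdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def normal_score_transform(xs):
    """Rank-based normal scores: map empirical quantiles to N(0,1)."""
    nd, n = NormalDist(), len(xs)
    ranks = {x: i for i, x in enumerate(sorted(xs))}  # values distinct here
    return [nd.inv_cdf((ranks[x] + 0.5) / n) for x in xs]

scores = normal_score_transform(data)
```

After the transform the data are symmetric by construction, which is exactly what kriging's normality assumptions want; the back-transform of kriged estimates is the delicate step the paper's comparison addresses.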

  11. Multivariate flood risk assessment: reinsurance perspective

    NASA Astrophysics Data System (ADS)

    Ghizzoni, Tatiana; Ellenrieder, Tobias

    2013-04-01

    For insurance and reinsurance purposes, knowledge of the spatial characteristics of fluvial flooding is fundamental. The probability of simultaneous flooding at different locations during one event, and the associated severity and losses, have to be estimated in order to set premiums and to control accumulation (Probable Maximum Loss calculation). Therefore, a statistical model able to describe the multivariate joint distribution of flood events at multiple locations must be identified. In this context, copulas can be viewed as alternative tools for multivariate simulation, as they make it possible to formalize the dependence structure of random vectors. An application of copula functions to flood scenario generation is presented for Australia (Queensland, New South Wales and Victoria), where 100,000 possible flood scenarios covering approximately 15,000 years were simulated.
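A copula-based scenario generator can be sketched for two gauges: draw correlated normals, push them through the normal CDF to get dependent uniforms, then apply each site's marginal inverse CDF. The Gaussian copula, Gumbel margins, correlation, and flood thresholds below are illustrative assumptions, not the study's calibrated values.

```python
import math, random
from statistics import NormalDist

random.seed(42)
nd = NormalDist()

def gumbel_inv_cdf(u, mu, beta):
    """Inverse CDF of a Gumbel distribution, a common flood-peak margin."""
    return mu - beta * math.log(-math.log(u))

# Gaussian copula with correlation rho links flood peaks at two gauges.
rho = 0.7
peaks_a, peaks_b = [], []
for _ in range(5000):
    z1 = random.gauss(0, 1)
    z2 = rho * z1 + math.sqrt(1 - rho ** 2) * random.gauss(0, 1)
    u1, u2 = nd.cdf(z1), nd.cdf(z2)          # correlated uniforms
    peaks_a.append(gumbel_inv_cdf(u1, 100.0, 20.0))
    peaks_b.append(gumbel_inv_cdf(u2, 80.0, 15.0))

# Probability that both sites exceed their flood thresholds in one event,
# the joint-exceedance quantity accumulation control cares about.
p_joint = sum(a > 130 and b > 100
              for a, b in zip(peaks_a, peaks_b)) / 5000
```

With independent sites the joint exceedance here would be about 0.2 x 0.23, so a markedly larger simulated p_joint is the dependence effect the copula is there to capture.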

  12. Frequency distribution of lithium in leaves of Lycium andersonii

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Romney, E.M.; Wallace, A.; Kinnear, J.

    1977-01-01

    Lycium andersonii A. Gray is an accumulator of Li. Assays were made of 200 samples of it collected from six different locations within the Northern Mojave Desert. Mean concentrations of Li varied from location to location, tended not to follow a log-normal (log_e) distribution, and followed a normal distribution only poorly. There was some negative skewness in the log_e distribution that did exist. The results imply that the variation in accumulation of Li depends upon the native supply of Li. Possibly the Li supply and the ability of L. andersonii plants to accumulate it are both log-normally distributed. The mean leaf concentration of Li across all locations was 29 µg/g, but the maximum was 166 µg/g.

  13. The stochastic distribution of available coefficient of friction for human locomotion of five different floor surfaces.

    PubMed

    Chang, Wen-Ruey; Matz, Simon; Chang, Chien-Chi

    2014-05-01

    The maximum coefficient of friction that can be supported at the shoe and floor interface without a slip is usually called the available coefficient of friction (ACOF) for human locomotion. The probability of a slip could be estimated using a statistical model by comparing the ACOF with the required coefficient of friction (RCOF), assuming that both coefficients have stochastic distributions. An investigation of the stochastic distributions of the ACOF of five different floor surfaces under dry, water, and glycerol conditions is presented in this paper. One hundred friction measurements were performed on each floor surface under each surface condition. The Kolmogorov-Smirnov goodness-of-fit test was used to determine if the distribution of the ACOF was a good fit with the normal, log-normal, and Weibull distributions. The results indicated that the ACOF distributions had a slightly better match with the normal and log-normal distributions than with the Weibull in only three of the 15 cases with statistical significance. The results are far more complex than what had previously been published, and different scenarios could emerge. Since the ACOF is compared with the RCOF for the estimation of slip probability, the distribution of the ACOF in seven cases could be considered a constant for this purpose when the ACOF is much lower or higher than the RCOF. A few cases could be represented by a normal distribution for practical reasons based on their skewness and kurtosis values, without statistical significance. No representation could be found in three of the 15 cases. Copyright © 2013 Elsevier Ltd and The Ergonomics Society. All rights reserved.
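The goodness-of-fit comparison can be sketched with the one-sample Kolmogorov-Smirnov statistic. Fitting by moments and using the known-parameter critical value are simplifications, and the 100 friction readings below are simulated, not the study's measurements.

```python
import math, random
from statistics import NormalDist, mean, stdev

random.seed(7)
# Synthetic ACOF-like sample: 100 friction readings for one surface/condition.
acof = [max(0.05, random.gauss(0.45, 0.08)) for _ in range(100)]

def ks_statistic(xs, cdf):
    """One-sample Kolmogorov-Smirnov statistic D_n against a given CDF."""
    n, xs = len(xs), sorted(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

# Fit normal and log-normal candidates by moments and compare D_n.
norm_fit = NormalDist(mean(acof), stdev(acof))
logs = [math.log(x) for x in acof]
lognorm_fit = NormalDist(mean(logs), stdev(logs))

d_norm = ks_statistic(acof, norm_fit.cdf)
d_lognorm = ks_statistic(acof, lambda x: lognorm_fit.cdf(math.log(x)))
crit = 1.36 / math.sqrt(len(acof))   # ~5% critical value, known parameters
```

Since the parameters are estimated from the same sample, a stricter (Lilliefors-type) critical value would be needed for a formal test; the sketch only shows how the candidate distributions are ranked by D_n.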

  14. Interpreting the concordance statistic of a logistic regression model: relation to the variance and odds ratio of a continuous explanatory variable

    PubMed Central

    2012-01-01

    Background When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. Methods An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal, or uniform distribution in the combined sample of those with and without the condition. Results Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, and uniform in the entire sample of those with and without the condition. Conclusions The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population. PMID:22716998
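The binormal closed form, AUC = Phi((mu1 - mu0) / sqrt(sd0^2 + sd1^2)), can be checked against the empirical c-statistic in its Mann-Whitney form. The means, common variance, and sample sizes below are illustrative.

```python
import math, random
from statistics import NormalDist

random.seed(3)
nd = NormalDist()

# Binormal model: explanatory variable is N(mu0, sd) in those without the
# condition and N(mu1, sd) in those with it (equal variances assumed).
mu0, mu1, sd, n = 0.0, 1.0, 1.0, 1000
x0 = [random.gauss(mu0, sd) for _ in range(n)]
x1 = [random.gauss(mu1, sd) for _ in range(n)]

# Closed-form c-statistic under binormality.
auc_theory = nd.cdf((mu1 - mu0) / math.sqrt(2 * sd ** 2))

# Empirical c-statistic: probability a randomly chosen case scores higher
# than a randomly chosen non-case (Mann-Whitney estimator).
wins = sum((a > b) + 0.5 * (a == b) for a in x1 for b in x0)
auc_emp = wins / (len(x0) * len(x1))
```

With a standardized mean difference of 1, both values land near 0.76, illustrating the abstract's point that discrimination is governed by the standardized difference, not by the odds ratio alone.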

  15. Feasibility Study on the Use of On-line Multivariate Statistical Process Control for Safeguards Applications in Natural Uranium Conversion Plants

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ladd-Lively, Jennifer L

    2014-01-01

    The objective of this work was to determine the feasibility of using on-line multivariate statistical process control (MSPC) for safeguards applications in natural uranium conversion plants. Multivariate statistical process control is commonly used throughout industry for the detection of faults. For safeguards applications in uranium conversion plants, faults could include the diversion of intermediate products such as uranium dioxide, uranium tetrafluoride, and uranium hexafluoride. This study was limited to a 100 metric ton of uranium (MTU) per year natural uranium conversion plant (NUCP) using the wet solvent extraction method for the purification of uranium ore concentrate. A key component in the multivariate statistical methodology is the Principal Component Analysis (PCA) approach for the analysis of data, development of the base case model, and evaluation of future operations. The PCA approach was implemented through the use of singular value decomposition of the data matrix, where the data matrix represents normal operation of the plant. Component mole balances were used to model each of the process units in the NUCP. However, this approach could be applied to any data set. The monitoring framework developed in this research could be used to determine whether or not a diversion of material has occurred at an NUCP as part of an International Atomic Energy Agency (IAEA) safeguards system. This approach can be used to identify the key monitoring locations, as well as locations where monitoring is unimportant. Detection limits at the key monitoring locations can also be established using this technique. Several faulty scenarios were developed to test the monitoring framework after the base case or normal operating conditions of the PCA model were established. In all of the scenarios, the monitoring framework was able to detect the fault. Overall this study was successful at meeting the stated objective.
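For two monitored streams the fault-detection idea reduces to a Hotelling T^2 chart on a component mole balance, a special case where the 2x2 covariance can be inverted in closed form instead of via SVD. The feed/product model, loss fraction, and control limit below are illustrative assumptions, not the study's 100 MTU/yr flowsheet.

```python
import math, random

random.seed(11)

# Normal operation of a hypothetical two-stream mole balance:
# product tracks feed with ~2% process loss plus measurement noise.
normal_ops = []
for _ in range(200):
    feed = random.gauss(100.0, 2.0)
    prod = 0.98 * feed + random.gauss(0.0, 0.5)
    normal_ops.append((feed, prod))

# Base-case model: means and 2x2 covariance from normal operation.
n = len(normal_ops)
mf = sum(f for f, _ in normal_ops) / n
mp = sum(p for _, p in normal_ops) / n
sff = sum((f - mf) ** 2 for f, _ in normal_ops) / (n - 1)
spp = sum((p - mp) ** 2 for _, p in normal_ops) / (n - 1)
sfp = sum((f - mf) * (p - mp) for f, p in normal_ops) / (n - 1)
det = sff * spp - sfp ** 2

def hotelling_t2(feed, prod):
    """T^2 = x' S^-1 x, using the closed-form inverse of the 2x2 covariance."""
    df, dp = feed - mf, prod - mp
    return (spp * df * df - 2 * sfp * df * dp + sff * dp * dp) / det

limit = -2 * math.log(0.001)          # chi-squared(2) 0.999 quantile ~13.8
t2_ok = hotelling_t2(100.0, 98.0)     # consistent with the mole balance
t2_bad = hotelling_t2(100.0, 93.0)    # 5 units unaccounted for: a "fault"
```

The in-family point stays far below the limit while the diversion-like point blows past it, because the balance direction has a very tight variance in the base-case model.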

  16. Feature combinations and the divergence criterion

    NASA Technical Reports Server (NTRS)

    Decell, H. P., Jr.; Mayekar, S. M.

    1976-01-01

    Classifying large quantities of multidimensional remotely sensed agricultural data requires efficient and effective classification techniques and the construction of transformations of a dimension-reducing, information-preserving nature. The construction of transformations that minimally degrade information (i.e., class separability) is described. Linear dimension-reducing transformations for multivariate normal populations are presented. Information content is measured by divergence.
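For two multivariate normal populations with equal (here, identity) covariance, a simplifying assumption, the divergence reduces to the squared Mahalanobis distance between the class means, and a single well-chosen projection direction preserves it exactly while any orthogonal direction destroys it:

```python
import math

# Two multivariate normal classes with identity covariance; the symmetric
# divergence J = KL(p||q) + KL(q||p) then equals the squared distance
# between the class means.
m0 = [0.0, 0.0]
m1 = [2.0, 1.0]
delta = [b - a for a, b in zip(m0, m1)]

j_full = sum(d * d for d in delta)      # divergence in the full 2-D space

def j_projected(w):
    """Divergence after projecting both classes onto unit vector w."""
    norm = math.hypot(*w)
    w = [c / norm for c in w]
    return (w[0] * delta[0] + w[1] * delta[1]) ** 2

j_best = j_projected(delta)                  # project onto the mean difference
j_bad = j_projected([-delta[1], delta[0]])   # orthogonal direction
```

Here the 2-D to 1-D map along the mean difference loses no divergence (j_best equals j_full), which is the "information-preserving" property the abstract's transformations are constructed to approximate in higher dimensions and with general covariances.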

  17. Determining the Number of Component Clusters in the Standard Multivariate Normal Mixture Model Using Model-Selection Criteria.

    DTIC Science & Technology

    1983-06-16

    has been advocated by Gnanadesikan and Wilk (1969), and others in the literature. This suggests that, if we use the formal significance test type...American Statistical Asso., 62, 1159-1178. Gnanadesikan, R., and Wilk, M. B. (1969). Data Analytic Methods in Multivariate Statistical Analysis. In

  18. Squeezing Interval Change From Ordinal Panel Data: Latent Growth Curves With Ordinal Outcomes

    ERIC Educational Resources Information Center

    Mehta, Paras D.; Neale, Michael C.; Flay, Brian R.

    2004-01-01

    A didactic on latent growth curve modeling for ordinal outcomes is presented. The conceptual aspects of modeling growth with ordinal variables and the notion of threshold invariance are illustrated graphically using a hypothetical example. The ordinal growth model is described in terms of 3 nested models: (a) multivariate normality of the…

  19. Undiagnosed Small Fiber Polyneuropathy: Is it a Component of Gulf War Illness?

    DTIC Science & Technology

    2012-07-01

    laboratory. After informed consent, a site (10 cm above the ankle) is anesthetized and one or two 2- or 3-mm diameter skin punches are removed using...of the scope of this study, the biopsy results of the youngsters anchor the lower end of the normal biopsy curve from which the multivariate

  20. Multiple Imputation of Item Scores in Test and Questionnaire Data, and Influence on Psychometric Results

    ERIC Educational Resources Information Center

    van Ginkel, Joost R.; van der Ark, L. Andries; Sijtsma, Klaas

    2007-01-01

    The performance of five simple multiple imputation methods for dealing with missing data was compared. In addition, random imputation and multivariate normal imputation were used as lower and upper benchmarks, respectively. Test data were simulated and item scores were deleted such that they were either missing completely at random, missing at…
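Multivariate normal imputation, the upper benchmark above, can be sketched in the bivariate case: estimate the conditional distribution of the incomplete variable from complete cases, then draw each missing value from it (a proper draw with residual noise, not a deterministic fill-in). This is a single imputation draw; full multiple imputation would repeat it, with parameter draws, m times. All data below are simulated.

```python
import math, random
from statistics import mean, stdev

random.seed(5)

# Synthetic bivariate-normal "item scores": x fully observed, y partly missing.
rho = 0.6
data = []
for _ in range(400):
    x = random.gauss(0, 1)
    y = rho * x + math.sqrt(1 - rho ** 2) * random.gauss(0, 1)
    data.append([x, y])

# Delete 100 y-values completely at random (MCAR).
missing = set(random.sample(range(400), 100))

# Estimate the regression of y on x from complete cases.
obs = [r for i, r in enumerate(data) if i not in missing]
mx = mean(r[0] for r in obs)
my = mean(r[1] for r in obs)
sxx = sum((r[0] - mx) ** 2 for r in obs) / (len(obs) - 1)
sxy = sum((r[0] - mx) * (r[1] - my) for r in obs) / (len(obs) - 1)
slope = sxy / sxx
res_sd = math.sqrt(sum((r[1] - my - slope * (r[0] - mx)) ** 2 for r in obs)
                   / (len(obs) - 2))

# Draw each missing y from its conditional normal given the observed x.
for i in missing:
    data[i][1] = my + slope * (data[i][0] - mx) + random.gauss(0, res_sd)
```

Adding the residual noise is what keeps the imputed variable's variance and its correlation with x near the complete-data values, the property that makes this a sensible upper benchmark for the simpler methods.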
